Imran Lab’s Paper Reading Group

Fall 2024 Discussions

Dec 11, 2024: [N. Munia], VCoder: Versatile Vision Encoders for Multimodal Large Language Models

Dec 4, 2024: [K. Rifa], ZigMa: A DiT-style Zigzag Mamba Diffusion Model

Nov 20, 2024: [T. Ward], Diffuse, Attend, and Segment: Unsupervised Zero-Shot Segmentation using Stable Diffusion

Nov 13, 2024: [N. Munia], What Matters in Transformers? Not All Attention is Needed

Nov 6, 2024: [M. Shah], Context-Aware Layout to Image Generation With Enhanced Object Appearance

Oct 30, 2024: [M. Massey], Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs

Oct 23, 2024: [K. Rifa], Dimba: Transformer-Mamba Diffusion Models

Oct 16, 2024: [T. Ward], From SAM to CAMs: Exploring Segment Anything Model for Weakly Supervised Semantic Segmentation

Oct 2, 2024: [N. Munia], Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model

Sep 18, 2024: [M. Shah], ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation

Sep 11, 2024: [M. Massey], Bridging Remote Sensors with Multisensor Geospatial Foundation Models

Sep 4, 2024: [K. Rifa], Blind CT Image Quality Assessment Using DDPM-derived Content and Transformer-based Evaluator

Aug 28, 2024: [T. Ward], Medical SAM 2: Segment medical images as video via Segment Anything Model 2

Aug 20, 2024: [N. Munia], LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation