I’m Dongjie Cheng (程东杰), a PhD student in the Department of Computing at The Hong Kong Polytechnic University (PolyU), advised by Prof. Wenjie Li and Prof. Yongqi Li. I received my B.Eng. in Artificial Intelligence from Sichuan University.
My research interests lie in Multimodal Large Language Models and Unified Multimodal Models. I keep an open mind and am always willing to learn new things. You can find my main research interests on

🔥 News

2025.03: 🎉🎉 I’ll be joining PolyU as a PhD student in Fall 2025—see you in 🇭🇰!
2024.10: 🎉🎉 Thanks to all collaborators and mentors, my Google Scholar citations have now reached 100.
2024.09: 🎉🎉 CSR was accepted by NeurIPS 2024.
2024.08: 🎉🎉 TV-SAM was accepted by Big Data Mining and Analytics (JCR Q1, CAS Zone 1, IF=7.7).
2024.07: 🎉🎉 The short version of CSR was presented at the ICML 2024 FM-Wild Workshop.

📝 Research Publication

Calibrated self-rewarding vision language models

Co-First Author, Accepted, NeurIPS 2024

(The short version was presented at the ICML 2024 FM-Wild Workshop)

ArXiv, abs/2405.14622

TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation

Co-First Author, Accepted, Big Data Mining and Analytics (JCR Q1, CAS Zone 1 TOP, IF=7.7)

ArXiv, abs/2402.15759

SAM on Medical Images: A Comprehensive Study on Three Prompt Modes

Co-First Author, Cited by: 427

ArXiv, abs/2305.00035

Evaluating Hallucination in Text-to-Image Diffusion Models with Scene-Graph based Question-Answering Agent

Co-First Author

ArXiv, abs/2412.05722

📖 Education

Ph.D. in Computing, 2025-Now, The Hong Kong Polytechnic University (PolyU)
B.S. in Artificial Intelligence, 2021-2025, Sichuan University, GPA (Compulsory): 3.9/4, 91.51/100

🎖 Honors & Awards

National Scholarship (1/48 that year)
2023.11
National Third Prize, “China Software Cup - Finals”
2023.08
Second Prize, “RoboMaster - North Region Competition”
2023.06
Second Prize, “National College Mathematics Competition”
2021.12

💻 Experience

PolyU

PhD student in Prof. Wenjie Li’s Lab
September, 2025 - Present
Supervisors: Prof. Wenjie Li, Prof. Yongqi Li

WEST CHINA HOSPITAL – BIG DATA CENTER

Research Assistant in Prof. Kang Li’s Lab
February, 2023 - August, 2024
Supervisors: Prof. Kang Li, Prof. Qicheng Lao

UNC-CHAPEL HILL

Remote Intern in Prof. Huaxiu Yao’s Lab
March, 2024 - September, 2024
Supervisor: Prof. Huaxiu Yao

🧩 Projects

🎙 VLM self-rewarding project

📝 Calibrated Self-Rewarding Vision Language Models

🔗 Project

Our work addresses misalignment challenges in large vision-language models (LVLMs) by proposing the Calibrated Self-Rewarding (CSR) approach, which lets the model improve itself by iteratively generating candidate responses, evaluating the reward for each response, and curating preference data for fine-tuning. For reward modeling, we use a step-wise strategy and incorporate visual constraints into the self-rewarding process to place greater emphasis on the visual input. Empirical results show that CSR enhances performance and reduces hallucinations across ten benchmarks and tasks, improving over existing methods by 7.62%. These results are further supported by a rigorous theoretical analysis which, under mild assumptions, verifies the effectiveness of introducing visual constraints into the self-rewarding paradigm.
I was responsible for implementing and optimizing the CSR method, as well as core tasks such as DPO and SFT training of the VLM.
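
The curation step described above can be illustrated with a minimal sketch. This is not the project's actual code: the names `calibrated_reward`, `curate_pair`, `text_scorer`, `visual_scorer`, and the weight `lam` are hypothetical, and both the convex combination of the two scores and the per-sentence summation are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class PreferencePair:
    """One curated training example for preference fine-tuning (e.g. DPO)."""
    prompt: str
    chosen: str
    rejected: str


def calibrated_reward(text_score: float, visual_score: float, lam: float = 0.9) -> float:
    # Assumed form: a convex mix that weights the image-grounded score
    # more heavily than the model's own text-only score.
    return lam * visual_score + (1.0 - lam) * text_score


def curate_pair(
    prompt: str,
    candidates: Sequence[str],
    text_scorer: Callable[[str], float],    # hypothetical: the model's self-reward for a sentence
    visual_scorer: Callable[[str], float],  # hypothetical: image-text relevance for a sentence
    lam: float = 0.9,
) -> PreferencePair:
    """Score each candidate step by step and keep the best and worst as a pair."""
    if len(candidates) < 2:
        raise ValueError("need at least two candidate responses")

    def response_reward(response: str) -> float:
        # Step-wise scoring (assumed): split into sentences and sum their rewards.
        steps = [s.strip() for s in response.split(".") if s.strip()]
        return sum(
            calibrated_reward(text_scorer(s), visual_scorer(s), lam) for s in steps
        )

    ranked = sorted(candidates, key=response_reward)
    return PreferencePair(prompt=prompt, chosen=ranked[-1], rejected=ranked[0])
```

In an iterative setup, pairs collected this way would be used for one round of preference fine-tuning, after which the updated model generates the next round of candidates.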

👨‍⚕️ SAM project

📝 SAM on Medical Images: A Comprehensive Study on Three Prompt Modes

📝 TV-SAM: Increasing Zero-Shot Segmentation Performance on Multimodal Medical Images Using GPT-4 Generated Descriptive Prompts Without Human Annotation