Left: LLaVA 1.5 models of different parameter sizes improve steadily over successive CSR iterations. Right: The change in image-relevance scores before and after applying CSR.
Compared with the original model and vanilla self-rewarding approaches, CSR consistently improves performance across various benchmarks.
CSR also outperforms other data-driven preference learning methods and self-rewarding approaches [1].
LLaVA 1.5, optimized through CSR, outperforms other open-source LVLMs across various benchmarks [2].
Additionally, as the model continues to learn iteratively online, CSR progressively reduces model hallucinations and enhances overall capability [3-4]. A rough sketch of one such iteration follows.
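To make the iterative loop above concrete, here is a minimal, hypothetical Python sketch of one CSR-style iteration. All helpers (`generate_candidates`, `self_reward`, `image_relevance`, `preference_update`) and the mixing weight `alpha` are placeholder assumptions, not the CSR codebase or the paper's exact procedure; the sketch only illustrates the general idea of calibrating a model's self-reward with an image-relevance score before building preference pairs for training.

```python
import random

# --- Placeholder stubs; NOT the CSR implementation. ---

def generate_candidates(model, image, prompt, k=4):
    """Sample k candidate responses from the model (stubbed here)."""
    return [f"response_{i}" for i in range(k)]

def self_reward(model, image, prompt, response):
    """Model's own confidence-style score for a response (stubbed)."""
    return random.random()

def image_relevance(image, response):
    """Visual-grounding score, e.g. from a CLIP-style scorer (stubbed)."""
    return random.random()

def preference_update(model, pairs):
    """Preference-learning step (e.g. DPO) on (chosen, rejected) pairs."""
    return model  # no-op in this sketch

def csr_iteration(model, batch, alpha=0.5):
    """One online iteration: score candidates with a calibrated reward that
    blends self-reward with image relevance, then train on preference pairs."""
    pairs = []
    for image, prompt in batch:
        candidates = generate_candidates(model, image, prompt)
        scored = sorted(
            candidates,
            key=lambda c: alpha * self_reward(model, image, prompt, c)
                          + (1 - alpha) * image_relevance(image, c),
            reverse=True,
        )
        # Best-scoring response becomes "chosen", worst becomes "rejected".
        pairs.append((image, prompt, scored[0], scored[-1]))
    return preference_update(model, pairs)

# Toy usage: one iteration over a single (image, prompt) pair.
model = csr_iteration(model=None, batch=[("img.png", "Describe the image.")])
```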
@article{zhou2024calibrated,
title={Calibrated Self-Rewarding Vision Language Models},
author={Zhou, Yiyang and Fan, Zhiyuan and Cheng, Dongjie and Yang, Sihan and Chen, Zhaorun and Cui, Chenhang and Wang, Xiyao and Li, Yun and Zhang, Linjun and Yao, Huaxiu},
journal={arXiv preprint arXiv:2405.14622},
year={2024}
}