Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training 2026-06-12 07:38:27 7 min read