
All Posts (26)

Paper Review | Learning to Refuse: Refusal-Aware Reinforcement Fine-Tuning for Hard-Irrelevant Queries in Video Temporal Grounding (2025)
https://arxiv.org/abs/2511.23151
Video Temporal Grounding (VTG) aims to localize a temporal segment in a video corresponding to a natural language query. However, existing VTG models assume that a relevant segment always exists, causing them to always predict a target segment even when no relevant segment exists.
2026. 1. 30.
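The failure mode above is concrete enough to sketch. Below is a minimal illustration of what a refusal-aware grounding reward could look like, assuming an RL setup where the model may output either a time span or an explicit refusal; the function names and the exact reward shaping are illustrative assumptions, not taken from the paper's code.

```python
# Illustrative sketch only: a refusal-aware reward of the kind the paper's
# title suggests. Rewards correct refusal on hard-irrelevant queries and
# temporal IoU otherwise. Names and reward values are hypothetical.
from typing import Optional, Tuple

Segment = Tuple[float, float]  # (start_sec, end_sec)

def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two time spans."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = max(a[1], b[1]) - min(a[0], b[0])
    return inter / union if union > 0 else 0.0

def refusal_aware_reward(pred: Optional[Segment], gt: Optional[Segment]) -> float:
    """gt is None for a hard-irrelevant query (no relevant segment exists)."""
    if gt is None:                          # query is irrelevant to the video
        return 1.0 if pred is None else 0.0  # reward an explicit refusal
    if pred is None:                        # wrongly refused an answerable query
        return 0.0
    return temporal_iou(pred, gt)           # standard grounding reward

print(refusal_aware_reward(None, None))              # 1.0: correct refusal
print(refusal_aware_reward((2.0, 5.0), (3.0, 6.0)))  # 0.5: IoU of the spans
```

The point of the asymmetry is that a model trained only on answerable queries never sees the first branch, which is exactly why it always predicts a segment.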
Paper Review | Visual Agentic Reinforcement Fine-Tuning (2025)
https://arxiv.org/abs/2505.14246
A key trend in Large Reasoning Models (e.g., OpenAI's o3) is the native agentic ability to use external tools, such as web browsers for searching and writing/executing code for image manipulation, in order to think with images.
2026. 1. 22.
Paper Review | Teaching Vision-Language Models to Ask: Resolving Ambiguity in Visual Questions (ACL 2025)
https://arxiv.org/abs/2507.13773
In the visual question answering (VQA) context, users often pose ambiguous questions to vision-language models (VLMs) due to varying expression habits. Existing research addresses such ambiguities primarily by rephrasing questions.
2026. 1. 19.
Paper Review | Learn to Refuse: Making Large Language Models More Controllable and Reliable through Knowledge Scope Limitation and Refusal Mechanism (EMNLP 2024)
https://arxiv.org/abs/2311.01041
Large language models (LLMs) have demonstrated impressive language understanding and generation capabilities, enabling them to answer a wide range of questions across various domains. However, these models are not flawless.
2026. 1. 19.
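In the spirit of the title's "knowledge scope limitation and refusal mechanism", here is a minimal sketch of scope-limited QA with a refusal fallback; the matching logic and all names below are hypothetical simplifications, not the paper's actual mechanism.

```python
# Illustrative sketch: answer only when the question falls inside an
# explicitly limited knowledge scope; otherwise refuse rather than guess.
# The naive substring match stands in for a real retrieval/scope check.
def answer_or_refuse(question: str, knowledge: dict[str, str],
                     refusal: str = "I don't know.") -> str:
    """Return a stored answer if the question matches the knowledge scope,
    else an explicit refusal."""
    for topic, answer in knowledge.items():
        if topic.lower() in question.lower():  # naive in-scope test
            return answer
    return refusal  # out of scope: refusing beats hallucinating

kb = {"capital of France": "Paris is the capital of France."}
print(answer_or_refuse("What is the capital of France?", kb))  # answered
print(answer_or_refuse("Who won the 2030 World Cup?", kb))     # refused
```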
Paper Review | MoHoBench: Assessing Honesty of Multimodal Large Language Models via Unanswerable Visual Questions (AAAI 2026)
https://arxiv.org/abs/2507.21503
Recently, Multimodal Large Language Models (MLLMs) have achieved considerable advancements in vision-language tasks, yet can produce potentially harmful or untrustworthy content.
2026. 1. 10.
Paper Review | R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization (ICCV 2025)
https://arxiv.org/abs/2503.12937
Recent studies generally enhance MLLMs' reasoning capabilities via supervised fine-tuning on high-quality chain-of-thought reasoning data, which often leads models to merely imitate successful reasoning paths without understanding what the wrong reasoning paths are.
2026. 1. 4.
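Since the method's name centers on a group-relative advantage, a minimal sketch of that computation may help: the vanilla GRPO normalization, which the paper's step-wise variant (StepGRPO) extends with per-step rewards. The function below is an illustrative assumption, not the paper's code.

```python
# Minimal sketch of the group-relative advantage underlying GRPO.
# R1-VL's StepGRPO replaces the single outcome reward with step-wise
# rewards; the normalization idea is the same. Names are illustrative.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float]) -> list[float]:
    """Normalize each sampled response's reward against its own group,
    removing the need for a separate learned value (critic) model."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0.0:  # all rewards equal: this group carries no signal
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

# Rewards for a group of 4 reasoning paths sampled for one prompt:
# the above-average paths get positive advantages, the rest negative.
print(group_relative_advantages([1.0, 0.0, 0.5, 0.0]))
```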