Paper Review | Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding (AAAI 2026)

[ Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding (AAAI 2026) ]

MLLMs suffer performance drops under real-world visual degradation.

Existing methods focus on training or adapting the visual encoder, which leaves two problems: limited interpretability (the impact of degradation on semantic information cannot be observed) and isolated optimization (the degradation-propagation relation between the visual encoder and the LLM is ignored).

The paper proposes Robust-R1, which mitigates these problems through structured reasoning.

✦ Methodology

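A conventional MLLM takes a degraded visual input and a text prompt and generates an output.

The goal is to optimize a robust MLLM so that, given the same degraded input, its output is close to what the original MLLM produces from the clean input.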
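In symbols, this objective can be sketched roughly as below. The notation is assumed here for illustration and is not taken from the paper: $M_\theta$ is the robust MLLM, $M$ the original MLLM, $X$ the clean image, $X_d$ its degraded version, $P$ the text prompt, and $\mathcal{L}$ some distance between outputs.

$$
\theta^{*} = \arg\min_{\theta}\ \mathcal{L}\big(M_{\theta}(X_d, P),\ M(X, P)\big)
$$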

This involves an explicit degradation-aware reasoning process.

(Degradation perception -> Reconstruction -> Generate output)

$D_d$ : type and intensity of the i-th degradation

$\Delta_d$ : impact of that degradation

$T_X$ : semantic representation of the clean input X, reconstructed using the degradation information

$Y_d$ : robust output generated conditioned on the degradation-aware reasoning chain

Supervised Fine-Tuning (SFT) trains the model to generate the structured reasoning chain sequentially, which gives it degradation-aware reasoning ability.
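To make the ordering of the chain concrete, here is a minimal sketch of how an SFT target string could be assembled. The bracketed markers, the field names, and the `build_sft_target` helper are all hypothetical; the actual format used in the paper is not shown in these notes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Degradation:
    kind: str         # degradation type, e.g. "blur" (vocabulary is hypothetical)
    intensity: float  # assumed to be normalized to [0, 1]
    impact: str       # free-text description of the semantic impact


def build_sft_target(degradations: List[Degradation],
                     reconstruction: str,
                     answer: str) -> str:
    """Serialize perception -> reconstruction -> output as one target string."""
    lines = []
    for i, d in enumerate(degradations, start=1):
        # Step 1: degradation perception (type, intensity, then its impact)
        lines.append(f"[degradation {i}] type={d.kind} intensity={d.intensity:.2f}")
        lines.append(f"[impact {i}] {d.impact}")
    # Step 2: reconstructed clean-image semantics
    lines.append(f"[reconstruction] {reconstruction}")
    # Step 3: final robust output
    lines.append(f"[answer] {answer}")
    return "\n".join(lines)
```

Because the target is a single sequence, ordinary next-token SFT loss is enough to teach the model to emit the steps in this order.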

Reward for Accurate Degradation Parameters

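However, SFT alone does not let the model perceive degradation parameters accurately.

A reward function that operates in the degradation parameter space is designed.

$\tau_d$ : the i-th degradation type predicted by the model

$s_d$ : intensity of that degradation

$\delta$ : Kronecker delta function (1 if equal, 0 otherwise)

A type mismatch yields -1; a type match yields a reward proportional to intensity accuracy (1 minus the intensity prediction error). These terms are summed over all degradations.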
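The rule above can be sketched in a few lines of Python. This is an illustration under assumptions, not the paper's implementation: intensities are assumed to lie in [0, 1], predictions are matched to ground truth by type name, and the function name `degradation_reward` is made up for this example.

```python
from typing import List, Tuple

# (type, intensity) pair; intensity is assumed to be normalized to [0, 1]
DegradationParam = Tuple[str, float]


def degradation_reward(predicted: List[DegradationParam],
                       ground_truth: List[DegradationParam]) -> float:
    """Sum per-degradation rewards: -1 on type mismatch, 1 - |error| on match."""
    gt_by_type = dict(ground_truth)  # assumption: each type appears at most once
    total = 0.0
    for kind, intensity in predicted:
        if kind in gt_by_type:
            # Type matches (delta = 1): reward grows with intensity accuracy
            total += 1.0 - abs(intensity - gt_by_type[kind])
        else:
            # Type does not match any ground-truth degradation (delta = 0)
            total -= 1.0
    return total


# One correct type with an intensity error of 0.25, plus one wrong type
print(degradation_reward([("blur", 0.5), ("noise", 0.9)], [("blur", 0.75)]))  # -0.25
```

The penalty for a wrong type is what keeps the model from listing many degradations in the hope that some of them match.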

Reward for Suitable Reasoning Path

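Overthinking (reasoning longer than necessary) lowers inference efficiency without improving output quality.

Analysis shows that higher degradation intensity requires a longer reasoning chain.

The reward is +1 when the length of the model's output equals the ground-truth length, and it decreases linearly with the length difference.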
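A minimal sketch of such a length reward follows. The notes only say the reward falls off linearly, so the slope is an assumption here: the difference is normalized by the ground-truth length, and no clipping is applied. The name `length_reward` is hypothetical.

```python
def length_reward(pred_len: int, gt_len: int) -> float:
    """Return 1.0 for an exact length match, decreasing linearly with the gap.

    Normalizing by gt_len is an assumption made for this sketch.
    """
    if gt_len <= 0:
        raise ValueError("gt_len must be positive")
    return 1.0 - abs(pred_len - gt_len) / gt_len


print(length_reward(100, 100))  # 1.0
print(length_reward(150, 100))  # 0.5
```

Since the ground-truth chain is longer for stronger degradations, the same reward pushes toward short reasoning on mild inputs and longer reasoning on severe ones.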

The two reward functions are combined.

Using GRPO, a group of G candidate responses is generated for each input.

The model is trained to maximize the reward.
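The sketch below shows one way the pieces could fit together. The weighted sum and its weights (`w_deg`, `w_len`) are assumptions, since the notes do not say how the two rewards are combined. The group-relative advantage (each reward standardized against the mean and standard deviation of its own group) is the standard GRPO formulation.

```python
from statistics import mean, pstdev
from typing import List


def total_reward(r_deg: float, r_len: float,
                 w_deg: float = 1.0, w_len: float = 1.0) -> float:
    """Combine the two rewards; the weighted sum is an assumption of this sketch."""
    return w_deg * r_deg + w_len * r_len


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Standardize rewards within one group of G candidates for the same input."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group
    # eps avoids division by zero when every candidate gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# G = 4 candidates for one input: (degradation reward, length reward)
group = [(1.0, 1.0), (0.5, 0.5), (-1.0, 0.5), (0.0, 1.0)]
rewards = [total_reward(r_deg, r_len) for r_deg, r_len in group]
advantages = group_relative_advantages(rewards)
```

Candidates that score above their group's average get a positive advantage and are reinforced; no separate value network is needed.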

✦ Experiments

Qwen2.5-VL-3B is used as the backbone; 25% of the training data is used for SFT and 75% for RL.

The vision encoder and visual projection are frozen, and full-parameter fine-tuning is applied to the rest of the model.
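As a rough illustration of this setup, the helper below decides which parameters stay trainable based on their names. The prefixes `vision_encoder.` and `visual_projection.` are placeholders, not the real module names in Qwen2.5-VL, and `trainable_flags` is a hypothetical helper.

```python
from typing import Dict, Iterable, Tuple

# Placeholder prefixes; the actual module names depend on the model implementation
FROZEN_PREFIXES: Tuple[str, ...] = ("vision_encoder.", "visual_projection.")


def trainable_flags(param_names: Iterable[str],
                    frozen_prefixes: Tuple[str, ...] = FROZEN_PREFIXES) -> Dict[str, bool]:
    """Map each parameter name to True (trainable) or False (frozen)."""
    # str.startswith accepts a tuple and returns True if any prefix matches
    return {name: not name.startswith(frozen_prefixes) for name in param_names}


names = [
    "vision_encoder.blocks.0.attn.weight",
    "visual_projection.mlp.weight",
    "language_model.layers.0.mlp.weight",
]
flags = trainable_flags(names)
```

In PyTorch, the same decision would typically be applied by iterating over `model.named_parameters()` and setting `param.requires_grad` to the corresponding flag.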

Robust-R1 shows strong performance on every degradation.

Muad'Dib
