Media Summary: CVPR26 Poster: Recurrent Reasoning with Vision-Language Models for Estimating Long-Horizon Embodied Task Progress. Hyun Lee, Hyemin Jeong, Yejin Kim, Hyungwook Choi, Hyunsoo Cho, Soo Kyung Kim, Joonseok Lee. A More Word-like Image ... VIMCAN: Visual-Inertial 3D Human Pose Estimation with Hybrid Mamba-Cross-Attention Network.
CVPR 26 R2VLM - Detailed Analysis & Overview
Kiseok Choi, Hyeongjun Cho, Inchul Kim, Min H. Kim (2026) “Revisiting Pose Sensitivity in Splat-based Computed Tomography ...
Disentangle-then-Align: Non-Iterative Hybrid Multimodal Image Registration via Cross-Scale Feature Disentanglement.
[CVPR 2026] tttLRM: Test-Time Training for Long Context and Autoregressive 3D Reconstruction
Video2Robo: 3DGS-based Synthetic Data from One Video Enables Scalable Robot Learning. Authors/Affiliations: [Seungho ...
DiffusionFF: A Diffusion-based Framework for Joint Face Forgery Detection and Fine-Grained Artifact Localization
Hakyeong Kim, Ruicheng Wang, Chengtang Yao, Jiaolong Yang, Min H. Kim (2026) “Dense Metric Depth Completion from ...
In this video, we introduce a novel video object detection framework called D2FANet. D2FANet is the first framework to jointly ...