Mono-ViM: A Self-Supervised Mamba Framework for Monocular Depth Estimation in Endoscopic Scenes.
Self-supervised depth estimation methods recover scene depth from monocular endoscopic images and can thereby assist endoscopic navigation. However, existing monocular endoscopic depth estimation methods generally fail to capture the inherent continuity of depth in intestinal structures. To address this limitation, this work presents Mono-ViM, a CNN-Mamba hybrid architecture that improves depth estimation accuracy through a depth-first scanning mechanism. The framework comprises a Depth Local Visual Mamba module, which uses depth-first scanning to extract rich structural features, and a cross-query layer, which reframes depth estimation as a soft classification problem to improve robustness and uncertainty handling in complex endoscopic environments. Experiments on the SimCol dataset and C3VD show that the proposed method achieves an absolute relative error (Abs Rel) of 0.070 and 0.084, respectively. These results correspond to error reductions of 16.7% and 19.4% compared to existing methods. [ABSTRACT FROM AUTHOR]
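
The abstract does not give implementation details for the Abs Rel metric or the soft classification formulation. The following is a minimal sketch of how these are commonly computed, not the authors' code. The function names (`abs_rel`, `soft_bin_depth`, `relative_reduction`), the use of a softmax over per-pixel bin logits, and all numeric inputs are illustrative assumptions.

```python
import math


def abs_rel(pred, gt):
    """Mean absolute relative error over pixels with valid (positive) ground truth."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    return sum(abs(p - g) / g for p, g in pairs) / len(pairs)


def soft_bin_depth(logits, bin_centers):
    """Depth for one pixel as the probability-weighted mean of depth-bin centers.

    Assumption: probabilities come from a softmax over the bin logits, which is
    one common way to treat depth regression as soft classification.
    """
    if len(logits) != len(bin_centers):
        raise ValueError("logits and bin_centers must have the same length")
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return sum(e / total * c for e, c in zip(exps, bin_centers))


def relative_reduction(baseline, ours):
    """Relative error reduction of `ours` with respect to `baseline`."""
    return (baseline - ours) / baseline


# Hypothetical values, chosen only to illustrate the arithmetic.
print(abs_rel([1.1, 1.8], [1.0, 2.0]))
print(soft_bin_depth([0.0, 0.0], [1.0, 3.0]))
print(relative_reduction(0.10, 0.08))
```

Under this reading, a reported percentage error reduction is the drop in Abs Rel divided by the baseline's Abs Rel. Whether the paper computes its 16.7% and 19.4% figures this way is an assumption.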