VISION: Visual Inspection System with Intelligent Observation and Navigation

Yash Turkar, Yashom Dighe, Karthik Dantu
* Equal contribution
Center for Embodied Autonomy and Robotics (CEAR)
University at Buffalo
Code: coming soon

Preliminary results at Culvert 110 (Gasport, NY). A query image (right) is processed by an open-vocabulary VLM to produce region proposals (red boxes) with natural-language rationales and normalized follow-up probabilities, distributing attention across the scene. The viewpoint planner (center) uses geometry/diameter estimates to generate gimbal-feasible next-best views in the culvert coordinate frame. Executing these poses yields targeted, high-resolution inspection imagery (left), which documents features such as concentrated cracking and rough joints, possible micro-cracking/spalling near the throat, white efflorescence/mineral deposits consistent with moisture ingress, and dark wet staining/pooling indicative of seepage. Together the panels illustrate VISION’s closed-loop see → decide → move → re-image workflow from proposals to actionable, validated observations.
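To make the planner's role concrete, the following is a minimal geometric sketch in Python, not the released VISION code: it models the culvert as a cylinder of the estimated diameter, assumes the camera sits near the culvert axis, and converts a proposed wall region into a viewing direction that is checked against assumed gimbal limits. The function name, frame convention, and limit values are illustrative assumptions only.

import numpy as np

def next_best_view(theta, z_region, diameter, z_robot,
                   pan_limit=np.radians(170.0), tilt_limit=np.radians(60.0)):
    """Illustrative sketch: candidate view for a wall region in a cylindrical
    culvert frame (z along the culvert axis). All conventions are assumptions,
    not VISION's actual planner interface."""
    r = diameter / 2.0
    # Proposed region on the culvert wall: angle `theta` around the axis,
    # axial position `z_region`.
    wall_point = np.array([r * np.cos(theta), r * np.sin(theta), z_region])
    # Camera assumed near the culvert axis at axial position `z_robot`.
    camera = np.array([0.0, 0.0, z_robot])
    view = wall_point - camera
    view /= np.linalg.norm(view)
    # Decompose the viewing direction into an azimuth in the cross-section
    # plane and an elevation along the culvert axis, then check them against
    # assumed gimbal pan/tilt limits.
    azimuth = np.arctan2(view[1], view[0])
    elevation = np.arcsin(np.clip(view[2], -1.0, 1.0))
    feasible = abs(azimuth) <= pan_limit and abs(elevation) <= tilt_limit
    return camera, azimuth, elevation, feasible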

Abstract

The culverts beneath the Erie Canal demand frequent, high-fidelity inspection due to age, wear, and heterogeneous environments. Long-tailed, site-specific degradation modes, limited labeled data, and shifting imaging conditions undermine closed-set detection and segmentation approaches. Open-vocabulary vision–language models (VLMs) offer a path around taxonomy lock-in, but remain difficult to adapt and fine-tune for niche infrastructure domains. We introduce VISION, an end-to-end autonomous inspection pipeline that couples web-scale VLMs with viewpoint planning to close the loop—see → decide → move → re-image. Deployed at Culvert 110 (Gasport, NY), VISION repeatedly and accurately localized, prioritized, and re-imaged defects, capturing targeted, high-resolution inspection imagery while producing structured descriptions that support downstream condition assessment.
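As a minimal sketch of the closed loop the abstract describes, the snippet below normalizes proposal scores into follow-up probabilities, selects the highest-probability region, moves to a planned view, and re-images. The vlm, planner, and robot objects and their methods, and the proposal dictionary keys, are hypothetical placeholders under stated assumptions, not VISION's released API.

import numpy as np

def softmax(scores, temperature=1.0):
    """Normalize raw proposal scores into follow-up probabilities."""
    z = np.asarray(scores, dtype=float) / temperature
    z -= z.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def inspection_step(vlm, planner, robot, query_image):
    # See: open-vocabulary region proposals with rationales and raw scores.
    proposals = vlm.propose_regions(query_image,
                                    prompt="signs of structural defects")
    # Decide: normalize scores into probabilities and pick a target region.
    probs = softmax([p["score"] for p in proposals])
    target = proposals[int(np.argmax(probs))]
    # Move: plan and execute a gimbal-feasible next-best view for the region.
    pose = planner.next_best_view(target["region"])
    robot.move_to(pose)
    # Re-image: capture targeted, high-resolution imagery for documentation.
    return robot.capture(), target["rationale"]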

BibTeX

@inproceedings{turkar2025VISION,
  title={VISION: Visual Inspection System with Intelligent Observation and Navigation},
  author={Yash Turkar and Yashom Dighe and Karthik Dantu},
  year={2025}
}