MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain

ECCV 2024
(European Conference on Computer Vision)

University at Buffalo

Patch-based features of space terrain exhibit extreme inter-class similarity and are observed from widely varying viewpoints, which makes them difficult for metric learning to tell apart when attention focus differs between views. We propose Multi-view Attention Regularizations (MARs) to alleviate this issue and drive the attention of arbitrary viewpoints together.

Abstract

The visual detection and tracking of surface terrain is required for spacecraft to safely land on or navigate within close proximity to celestial objects. Current approaches rely on template matching with pre-gathered patch-based features, which are expensive to obtain and a limiting factor in perceptual capability. While recent literature has focused on in-situ detection methods to enhance navigation and operational autonomy, robust description is still needed. In this work, we explore metric learning as the lightweight feature description mechanism and find that current solutions fail to address inter-class similarity and multi-view observational geometry. We attribute this to the view-unaware attention mechanism and introduce Multi-view Attention Regularizations (MARs) to constrain the channel and spatial attention across multiple feature views, regularizing the what and where of attention focus. We thoroughly analyze many modern metric learning losses with and without MARs and demonstrate improved terrain-feature recognition performance by upwards of 85%. We additionally introduce the Luna-1 dataset, consisting of Moon crater landmarks and reference navigation frames from NASA mission data to support future research in this difficult task.

Method

MARs embeds attention information into height and width-disparate similarity spaces, driving the what and where of multi-view attention focus together while avoiding the trivial solution (e.g., both attention maps are zero everywhere).
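As a rough illustration of the idea of aligning the "where" of attention across views, the sketch below compares two spatial attention maps along the height and width axes separately and penalizes disagreement. This is a toy example under stated assumptions, not the formulation from the paper: the function names (`axis_profiles`, `cosine`, `alignment_penalty`), the mean pooling, and the use of cosine similarity are all hypothetical choices made for illustration, and the channel ("what") side of the method is not covered.

```python
import math


def axis_profiles(attn):
    """Collapse an H x W attention map (list of rows) into a
    height profile (length H) and a width profile (length W) by averaging."""
    h, w = len(attn), len(attn[0])
    height_profile = [sum(row) / w for row in attn]
    width_profile = [sum(attn[i][j] for i in range(h)) / h for j in range(w)]
    return height_profile, width_profile


def cosine(u, v, eps=1e-8):
    """Cosine similarity; eps keeps the all-zero case finite (it evaluates to 0)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def alignment_penalty(attn_a, attn_b):
    """Hypothetical regularizer: one (1 - cosine) term per axis.
    For non-negative maps the result lies in [0, 2]."""
    height_a, width_a = axis_profiles(attn_a)
    height_b, width_b = axis_profiles(attn_b)
    return (1.0 - cosine(height_a, height_b)) + (1.0 - cosine(width_a, width_b))


# Two views attending to the same column, to different columns, and nowhere.
right_column = [[0.0, 1.0], [0.0, 1.0]]
left_column = [[1.0, 0.0], [1.0, 0.0]]
all_zero = [[0.0, 0.0], [0.0, 0.0]]

print(alignment_penalty(right_column, right_column))
print(alignment_penalty(right_column, left_column))
print(alignment_penalty(all_zero, all_zero))
```

In this toy setup, identical maps give a penalty close to 0, maps that agree along height but not along width give a penalty close to 1, and a pair of all-zero maps receives the maximum penalty of 2. The last case shows one simple way a similarity-based penalty can avoid rewarding the trivial solution, since an all-zero map has zero similarity with everything, including itself.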

The Luna-1 Dataset

To facilitate future research in the challenging space landmark recognition problem, we introduce the Luna-1 dataset, consisting of 5,067 Moon crater landmarks and 2,161 emulated orbital navigation frames from real-world NASA data.

Results

MARs leads to improved attention alignment compared with rotational equivariant and spatial attention layers (RIC CA):

and maintains this alignment across different environment features:

MARs can also converge on challenging data where traditional methods fail:

During training, MARs (bottom) yields greater stability and faster, more uniform convergence than rotational equivariant and spatial attention layers (RIC CA, top):

Training Evolution

Attention alignment evolution during training for MARs (bottom row) against rotational equivariant and spatial attention layers (RIC CA, top row) for Mars Crater (left), Moon Crater (middle), and Earth Stadium (right) features.

BibTeX

@inproceedings{chase2024mars,
  title={MARs: Multi-view Attention Regularizations for Patch-based Feature Recognition of Space Terrain},
  author={Timothy Chase Jr and Karthik Dantu},
  year={2024},
  booktitle={ECCV},
}
