PIXER is a lightweight, self-supervised feature utility network trained with a Bayesian formulation, designed to predict a pixel-wise reliability score indicating how likely a feature keypoint at that pixel is to survive matching. Top-left: raw keypoints from two conventional detectors, SIFT (blue) and ORB (green), illustrate the well-known disagreement between handcrafted methods. Top-right: after passing the same detections through PIXER, the reliability mask F removes low-utility points (red), leaving a compact set of high-value features. Bottom: dense reliability heatmaps are produced in a single forward pass; thresholding this map yields mask F. When this mask is applied as a pre-processing filter, any downstream detector/descriptor benefits: the number of keypoints is roughly halved and drift in visual-odometry pipelines is reduced.
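The filtering step described above can be sketched in a few lines: given keypoints from any detector and a dense reliability map, threshold the map to get mask F and keep only the keypoints that land on reliable pixels. The threshold value and function names below are illustrative assumptions, not taken from the PIXER release.

```python
import numpy as np

def filter_keypoints(keypoints, reliability, threshold=0.5):
    """Keep only keypoints whose pixel passes the reliability mask F.

    keypoints   : (N, 2) array of (x, y) pixel coordinates from any detector.
    reliability : (H, W) dense reliability map in [0, 1], one forward pass.
    threshold   : cut-off for mask F (illustrative value; not from the paper).
    """
    mask = reliability >= threshold              # binary mask F
    xs = keypoints[:, 0].astype(int)
    ys = keypoints[:, 1].astype(int)
    keep = mask[ys, xs]                          # look up mask at each keypoint
    return keypoints[keep]

# toy example: reliability is high only on the right half of the image
rel = np.zeros((8, 8))
rel[:, 4:] = 0.9
kps = np.array([[1, 1], [5, 1], [6, 6], [2, 7]])
print(filter_keypoints(kps, rel))  # keeps [5, 1] and [6, 6]
```

Because the filter only consumes pixel coordinates, it is agnostic to the detector/descriptor pair, which is what allows a single reliability map to serve all eight detectors evaluated in the paper.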
Robust feature detection and matching are fundamental for visual odometry and SLAM, yet most methods lack a principled measure of a feature's reliability prior to downstream use. We present PIXER, a learning-based method that predicts the interest reliability of each pixel for feature-based visual navigation. PIXER is designed as a lightweight, single-shot model that outputs dense reliability maps from a single image, using a generalized Bayesian formulation without requiring Monte Carlo sampling. These outputs are used to selectively filter low-utility features prior to matching, improving downstream matching and pose estimation. Integrated into a standard visual odometry pipeline, PIXER improves average trajectory accuracy by 31% while reducing feature usage by 49% across eight different feature detectors. Our results demonstrate that pre-filtering input imagery based on learned reliability enhances the robustness and efficiency of SLAM systems. Code, models, and datasets will be made publicly available upon publication.
Coming Soon!
The training of PIXER is a three-step process. First, we train a network with a general understanding of interestingness (i.e., feature point detection); in this work we make use of SiLK (top left). Next, we convert this model to a Bayesian Neural Network (BNN) and train it again with the addition of probabilistic losses (e.g., KL divergence, top middle). Finally, we train a specialized uncertainty head using feature variance computed via Monte Carlo supervision from the BNN (top right). The PIXER inference model then comprises the joint feature-point probability and uncertainty networks (bottom middle). The combination of pixel-wise probability and uncertainty forms our definition of featureness F (bottom right), used to describe the general utility of the visual information.
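The final step combines pixel-wise probability and uncertainty into featureness F. The exact combination rule is not reproduced here, so the sketch below assumes one simple form: interest probability down-weighted by predicted variance. The function name, the exponential form, and the `beta` parameter are all illustrative assumptions.

```python
import numpy as np

def featureness(prob, var, beta=1.0):
    """One plausible featureness F: high interest probability combined with
    low predicted uncertainty. The exponential down-weighting and `beta`
    are illustrative choices, not taken from the paper."""
    return prob * np.exp(-beta * var)

# three example pixels: confident+certain, confident+uncertain, weak+certain
prob = np.array([0.9, 0.9, 0.2])
var  = np.array([0.0, 2.0, 0.0])
f = featureness(prob, var)
# the confident, low-variance pixel scores highest; high variance
# suppresses the score even when interest probability is high
```

Under any such combination, a pixel with a strong detector response but high epistemic uncertainty is demoted, which is what lets thresholding F discard keypoints that would likely fail to match.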
We evaluate PIXER-aided visual odometry on a custom dataset, named "Davis", collected using a ZED 2i camera and a Mosaic X5 GNSS receiver mounted on a Boston Dynamics Spot quadruped. Results in the table below show superior estimation performance, with a mean RMSE improvement of 34% and a mean feature reduction of 41%.
@inproceedings{turkar2026enhancing,
  title={Enhancing Visual Odometry with Reliable Pixel Masking},
  author={Yash Turkar and Timothy Chase and Christo Aluckal and Karthik K Dantu},
  booktitle={23rd Conference on Robots and Vision},
  year={2026},
  url={https://openreview.net/forum?id=ZgWAW6mIQh}
}