Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling


In European Conference on Computer Vision (ECCV), 2020

Abstract
Monocular visual odometry (VO) suffers severely from error accumulation during frame-to-frame pose estimation. In this paper, we present a self-supervised learning method for VO with special consideration for consistency over longer sequences. To this end, we model the long-term dependency in pose prediction using a pose network that features a two-layer convolutional LSTM module. We train the networks with purely self-supervised losses, including a cycle consistency loss that mimics the loop closure module in geometric VO. Inspired by prior geometric systems, we allow the networks to see beyond a small temporal window during training, through a novel loss that incorporates temporally distant (e.g., O(100)) frames. Given GPU memory constraints, we propose a stage-wise training mechanism, where the first stage operates in a local time window and the second stage refines the poses with a "global" loss given the first-stage features. We demonstrate competitive results on several standard VO datasets, including KITTI and TUM RGB-D.
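The abstract above names the main ingredients: a pose network with a two-layer convolutional LSTM head and self-supervised losses that include a cycle-consistency term. Below is a minimal sketch of how such a recurrent pose network and a simplified cycle-consistency loss could look. It is not the released implementation; the encoder layers, channel sizes, output scaling, and the small-motion approximation in the loss are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """One convolutional LSTM cell operating on spatial feature maps."""

    def __init__(self, in_ch, hid_ch, kernel_size=3):
        super().__init__()
        self.hid_ch = hid_ch
        # A single convolution produces the input, forget, cell, and output gates.
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch,
                               kernel_size, padding=kernel_size // 2)

    def init_state(self, x):
        b, _, h, w = x.shape
        zeros = x.new_zeros(b, self.hid_ch, h, w)
        return zeros, zeros

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class RecurrentPoseNet(nn.Module):
    """Predicts one relative 6-DoF pose per frame pair, with memory across time."""

    def __init__(self, feat_ch=256, hid_ch=256):
        super().__init__()
        # Placeholder encoder over a concatenated image pair (6 input channels).
        self.encoder = nn.Sequential(
            nn.Conv2d(6, 64, 7, stride=2, padding=3), nn.ReLU(inplace=True),
            nn.Conv2d(64, feat_ch, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )
        self.lstm1 = ConvLSTMCell(feat_ch, hid_ch)
        self.lstm2 = ConvLSTMCell(hid_ch, hid_ch)
        # 6 outputs per step: 3 for translation, 3 for rotation (axis-angle).
        self.pose_head = nn.Conv2d(hid_ch, 6, 1)

    def forward(self, image_pairs):
        # image_pairs: (B, T, 6, H, W); each step is two RGB frames stacked.
        poses = []
        s1 = s2 = None
        for t in range(image_pairs.shape[1]):
            feat = self.encoder(image_pairs[:, t])
            s1 = self.lstm1(feat, s1 if s1 is not None else self.lstm1.init_state(feat))
            s2 = self.lstm2(s1[0], s2 if s2 is not None else self.lstm2.init_state(s1[0]))
            # Average over the spatial map; small scaling for training stability.
            poses.append(0.01 * self.pose_head(s2[0]).mean(dim=[2, 3]))
        return torch.stack(poses, dim=1)  # (B, T, 6)


def cycle_consistency_loss(pose_fwd, pose_bwd):
    """Forward and backward relative poses of the same pair should cancel out.

    Both inputs are (B, 6) translation + axis-angle vectors; requiring them to be
    negatives of each other is a small-motion approximation of composing the two
    SE(3) transforms and comparing against the identity.
    """
    return (pose_fwd + pose_bwd).abs().mean()
```

For example, an input batch of shape (2, 7, 6, 128, 416), i.e., seven consecutive frame pairs per sequence, yields a (2, 7, 6) tensor of relative poses, with the ConvLSTM states carrying information across the temporal window.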

Papers

ECCV 2020
Citation

Yuliang Zou, Pan Ji, Quoc-Huy Tran, Jia-Bin Huang, and Manmohan Chandraker, "Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling", In Proceedings of the European Conference on Computer Vision (ECCV), 2020.


Bibtex
@inproceedings{zou2020learning,
    author    = {Zou, Yuliang and Ji, Pan and Tran, Quoc-Huy and Huang, Jia-Bin and Chandraker, Manmohan},
    title     = {Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling}, 
    booktitle = {European Conference on Computer Vision},
    year      = {2020}
}
Download

I am applying to the company for permission to release the code; the authors cannot control whether or when the code will be released.

NOTE:

1) For trajectories in the training set and Seq. 09-10, files with a "TUM" suffix are in the TUM format and not scale-aligned, while files with a "KITTI" suffix are in the KITTI format and scale-aligned (a minimal loading sketch for both formats is given after these notes).

2) For trajectories in the test set, all files are in the KITTI format and ready for submission to the KITTI odometry benchmark.
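For reference, the sketch below shows one way to read the two trajectory formats mentioned in the notes; the path arguments are placeholders. The TUM loader assumes the standard "timestamp tx ty tz qx qy qz qw" layout, and the KITTI loader assumes 12 values per line giving the top 3x4 block of a camera-to-world pose matrix in row-major order.

```python
import numpy as np


def load_kitti_trajectory(path):
    """KITTI format: 12 values per line, the top 3x4 of a 4x4 pose matrix."""
    poses = []
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue
            mat = np.array(line.split(), dtype=np.float64).reshape(3, 4)
            pose = np.eye(4)
            pose[:3, :4] = mat
            poses.append(pose)
    return poses


def quat_to_rot(qx, qy, qz, qw):
    """Rotation matrix from a unit quaternion given as (x, y, z, w)."""
    q = np.array([qx, qy, qz, qw], dtype=np.float64)
    x, y, z, w = q / np.linalg.norm(q)
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ])


def load_tum_trajectory(path):
    """TUM format: 'timestamp tx ty tz qx qy qz qw' per line; '#' starts a comment."""
    stamps, poses = [], []
    with open(path) as f:
        for line in f:
            if line.startswith("#") or not line.strip():
                continue
            t, tx, ty, tz, qx, qy, qz, qw = map(float, line.split())
            pose = np.eye(4)
            pose[:3, :3] = quat_to_rot(qx, qy, qz, qw)
            pose[:3, 3] = [tx, ty, tz]
            stamps.append(t)
            poses.append(pose)
    return stamps, poses
```

Standard trajectory-evaluation tools for the KITTI and TUM RGB-D benchmarks expect these same layouts, so the downloaded files should be usable with them directly.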

