Learning Representational Invariances for Data-Efficient Action Recognition

Abstract

Data augmentation is a ubiquitous technique for improving image classification when labeled data is scarce. Constraining the model predictions to be invariant to diverse data augmentations effectively injects the desired representational invariances to the model (e.g., invariance to photometric variations) and helps improve accuracy. Compared to image data, the appearance variations in videos are far more complex due to the additional temporal dimension. Yet, data augmentation methods for videos remain under-explored. This paper investigates various data augmentation strategies that capture different video invariances, including photometric, geometric, temporal, and actor/scene augmentations. When integrated with existing semi-supervised learning frameworks, we show that our data augmentation strategy leads to promising performance on the Kinetics-100/400, Mini-Something-v2, UCF-101, and HMDB-51 datasets in the low-label regime. We also validate our data augmentation strategy in the fully supervised setting and demonstrate improved performance.

Results

Semi-Supervised Learning (FixMatch with 3D ResNet-18)

Semi-Supervised Learning on Mini-Something-v2 (TCL with TSM ResNet-18)

Semi-Supervised Learning on Kinetics-400 (TCL with TSM ResNet-18)

Supervised Learning (FixMatch with R(2+1)D ResNet-34)

Cross-Dataset Semi-Supervised Learning (FixMatch with R(2+1)D ResNet-34)

Papers

arXiv

Supplementary Material

Bibtex

@article{zou2021learning,
  title={Learning Representational Invariances for Data-Efficient Action Recognition},
  author={Zou, Yuliang and Choi, Jinwoo and Wang, Qitong and Huang, Jia-Bin},
  journal={arXiv preprint arXiv:2103.16565},
  year={2021}
}

Learning Representational Invariances for
Data-Efficient Action Recognition

Yuliang Zou

Virginia Tech

Jinwoo Choi*

Kyung Hee University

Qitong Wang

University of Delaware

Jia-Bin Huang

University of Maryland, College Park

* Corresponding author

Learning Representational Invariances for Data-Efficient Action Recognition

Yuliang Zou Virginia Tech

Jinwoo Choi* Kyung Hee University

Qitong Wang University of Delaware