Using Navigational Information to Learn Visual Representations
Lizhen Zhu, Brad Wyble, James Z. Wang
The Pennsylvania State University, USA
Abstract:
Children learn to build a visual representation of the world from
unsupervised exploration, and we hypothesize that a key part of this
learning ability is the use of self-generated navigational information
as a similarity label to drive a learning objective for self-supervised
learning. The goal of this work is to exploit navigational information
in a visual environment to achieve training performance that
exceeds state-of-the-art self-supervised training. Here, we show
that using spatial and temporal information in the pretraining stage
of contrastive learning can improve the performance of downstream
classification relative to conventional contrastive learning
approaches that rely on instance discrimination, i.e., on
distinguishing two alterations of the same image from alterations
of different images. We
designed a pipeline to generate egocentric-vision images from a
photorealistic ray-tracing environment (ThreeDWorld) and record
relevant navigational information for each image. Modifying the
Momentum Contrast (MoCo) model, we introduced spatial and temporal
information to evaluate the similarity of two views in the pretraining
stage, replacing instance discrimination. This work reveals the
effectiveness and efficiency of contextual information for improving
representation learning. The work informs our understanding of the
means by which children might learn to see the world without external
supervision.
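The core idea, replacing instance-identity labels with navigational
similarity during contrastive pretraining, can be sketched as follows.
This is an illustrative sketch, not the authors' implementation: the
function names, the distance and time thresholds, and the NumPy-based
InfoNCE-style loss are assumptions made for illustration only.

```python
import numpy as np

def navigational_similarity(pos_a, pos_b, t_a, t_b, d_max=2.0, t_max=5.0):
    """Label two egocentric views as a positive pair if they were
    captured within d_max distance units and t_max time units of each
    other (both thresholds are hypothetical)."""
    spatial = np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b)) <= d_max
    temporal = abs(t_a - t_b) <= t_max
    return bool(spatial and temporal)

def contrastive_loss(q, k, positives, temperature=0.07):
    """InfoNCE-style loss: q is one query embedding, k a batch of key
    embeddings, and positives a boolean mask marking which keys count
    as navigational positives (instead of the usual single
    same-instance positive)."""
    q = q / np.linalg.norm(q)
    k = k / np.linalg.norm(k, axis=1, keepdims=True)
    logits = k @ q / temperature                 # cosine similarities
    log_prob = logits - np.log(np.exp(logits).sum())
    return -log_prob[np.asarray(positives)].mean()
```

In a conventional instance-discrimination setup, the positive mask
would select only the other augmentation of the same image; here it
can select any view taken nearby in space and time.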
Full color PDF file (6.8 MB)
Poster and video brief (conference site)
Paper on arxiv.org
Citation:
Lizhen Zhu, Brad Wyble, and James Z. Wang, "Using Navigational
Information to Learn Visual Representations," Proceedings of the
International Conference on Computational and Systems Neuroscience
(COSYNE), extended abstract, 2022.
Copyright 2022. Personal use of this material is
permitted. However, permission to reprint/republish this material for
advertising or promotional purposes or for creating new collective
works for resale or redistribution to servers or lists, or to reuse
any copyrighted component of this work in other works, must be
obtained from the authors.
Last Modified:
February 17, 2022
© 2022