Result: Learning Representations of Satellite Images From Metadata Supervision

Title:
Learning Representations of Satellite Images From Metadata Supervision
Contributors:
Preligens [Paris], Apprentissage de modèles à partir de données massives (Thoth), Centre Inria de l'Université Grenoble Alpes, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Jean Kuntzmann (LJK), Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes (UGA)-Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP), Université Grenoble Alpes (UGA), CIFRE PhD grant with Preligens, 2021 - 2024, European Computer Vision Association, MIAI @ Grenoble Alpes, ANR-19-P3IA-0003,MIAI,MIAI @ Grenoble Alpes(2019)
Source:
ECCV 2024 - 18th European Conference on Computer Vision. :54-71
Publisher Information:
CCSD; Springer, 2024.
Publication Year:
2024
Collection:
collection:UGA
collection:CNRS
collection:INRIA
collection:INPG
collection:INRIA-RHA
collection:INSMI
collection:INRIA_TEST
collection:LJK
collection:LJK_GI
collection:TESTALAIN1
collection:INRIA2
collection:GENCI
collection:LJK-GI-THOTH
collection:INRIA-RENGRE
collection:MIAI
collection:PNRIA
collection:UGA-EPE
collection:ANR
collection:ANR-IA-19
collection:ANR-IA
collection:TEST-UGA
Original Identifier:
HAL: hal-04709749
Document Type:
Conference conferenceObject
Conference papers
Language:
English
Relation:
info:eu-repo/semantics/altIdentifier/doi/10.1007/978-3-031-73383-3_4
DOI:
10.1007/978-3-031-73383-3_4
Rights:
info:eu-repo/semantics/OpenAccess
URL: http://creativecommons.org/licenses/by/
Accession Number:
edshal.hal.04709749v1
Database:
HAL

Further Information

ECCV camera-ready version
Self-supervised learning is increasingly applied to Earth observation problems that leverage satellite and other remotely sensed data. Within satellite imagery, metadata such as time and location often hold significant semantic information that improves scene understanding. In this paper, we introduce Satellite Metadata-Image Pretraining (SatMIP), a new approach for harnessing metadata in the pretraining phase through a flexible and unified multimodal learning objective. SatMIP represents metadata as textual captions and aligns images with metadata in a shared embedding space by solving a metadata-image contrastive task. Our model learns a non-trivial image representation that can effectively handle recognition tasks. We further enhance this model by combining image self-supervision and metadata supervision, introducing SatMIPS. As a result, SatMIPS improves over its image-image pretraining baseline, SimCLR, and accelerates convergence. Comparisons against four recent contrastive and masked autoencoding-based methods for remote sensing also highlight the efficacy of our approach. Furthermore, our framework enables multimodal classification with metadata to improve the performance of visual features, and yields more robust hierarchical pretraining. Code and pretrained models will be made available at: https://github.com/preligens-lab/satmip.
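
The metadata-image contrastive task described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the caption format, the encoders, or the exact loss, so the sketch assumes a simple "key: value" caption template and a CLIP-style symmetric InfoNCE loss. The function names (metadata_to_caption, contrastive_loss), the temperature value, and the toy two-dimensional embeddings are all hypothetical and stand in for real encoder outputs.

```python
import math


def metadata_to_caption(metadata):
    """Render a metadata dict as a text caption (assumed format, not the paper's)."""
    return ", ".join(f"{key}: {value}" for key, value in sorted(metadata.items()))


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, meta_embs, temperature=0.1):
    """Symmetric InfoNCE loss (assumed) over a batch of paired embeddings.

    image_embs[i] and meta_embs[i] are treated as the matching pair; every
    other combination in the batch acts as a negative.
    """
    image_embs = [l2_normalize(v) for v in image_embs]
    meta_embs = [l2_normalize(v) for v in meta_embs]
    # logits[i][j] = cosine similarity of image i and caption j, scaled by temperature
    logits = [
        [sum(a * b for a, b in zip(img, meta)) / temperature for meta in meta_embs]
        for img in image_embs
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the image-to-metadata and metadata-to-image directions
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(transposed))


# Toy usage with hand-written embeddings in place of real encoder outputs.
caption = metadata_to_caption({"month": 7, "latitude": 45.19, "longitude": 5.72})
aligned = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the loss is close to zero when each image embedding matches its own caption embedding and large when the pairs are swapped, which is the behavior a contrastive alignment objective is meant to reward. In practice the embeddings would come from an image encoder and a text encoder trained jointly.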