Post Date
May 27 2022

Image Retrieval using Cross-view Matching for Remote Sensing Imagery

Dr. Murtaza Taj
Numan Khurshid
Electrical Engineering
Advances in deep learning have brought about a paradigm shift in computer vision, especially in some of its core problems such as content-based image retrieval (CBIR). CBIR has a wide range of applications, including scene recognition, digital image repository search, organization of image databases, and 3D reconstruction. However, robust and accurate retrieval of images from large remote sensing databases remains an open problem. In particular, the challenges in multi-view image retrieval arise not only from geometric and photometric variations between images taken from different views, but also from severe visual overlap between the query image and irrelevant database images. The result is large intra-class variation and high inter-class similarity between semantic categories. To address these concerns, most recent methods employ supervised deep learning based image representations and thus depend on labelled image collections, which are very scarce in remote sensing.
This dissertation explores learning unsupervised visual descriptors in combination with deep metric learning (DML) as a replacement for conventional distance measures in same-view and cross-view image retrieval (CVIR). To this end, multiple unsupervised visual representations and metric learning techniques are exploited through the introduction of novel deep models for both same-view and cross-view retrieval. Moreover, to avoid the vanishing gradient and diminishing feature reuse problems inherent in deep models, we propose a new residual unit termed the residual-dyad.
Deep unsupervised features usually have large memory footprints and are prone to the curse of dimensionality. Traditional feature pruning schemes that aggregate these learned visual descriptors lead to diminished performance. To resolve this in same-view retrieval, we also propose a stacked autoencoder based solution that compresses unsupervised features without significantly affecting their discriminative and regenerative characteristics. Results demonstrate that our proposed solution achieves a 25-fold reduction in feature size with only 0.8 times depletion of retrieval score.
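To make the size reduction concrete, the following is a minimal sketch of the encoder half of a stacked autoencoder, assuming hypothetical layer sizes (50, 10, 2) chosen only so that the output is 25 times smaller than the input. The weights are random and untrained, and the names `make_layer` and `encode` are illustrative; this is not the architecture used in the dissertation.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create one dense layer with small random weights (untrained, illustrative)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def encode(features, layers):
    """Pass a feature vector through each stacked encoder layer in turn."""
    for weights, biases in layers:
        features = [
            math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(weights, biases)
        ]
    return features


rng = random.Random(0)

# Hypothetical sizes: a 50-dimensional descriptor squeezed to 2 dimensions.
layer_sizes = [50, 10, 2]
encoder = [
    make_layer(n_in, n_out, rng)
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]

descriptor = [rng.random() for _ in range(layer_sizes[0])]  # stand-in for a learned feature
code = encode(descriptor, encoder)

reduction = len(descriptor) / len(code)
print(len(descriptor), len(code), reduction)
```

In a trained autoencoder, a mirrored decoder would reconstruct the original descriptor from `code`, and the reconstruction error would drive the weight updates; that training loop is omitted here.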
Cross-view image retrieval, introduced here for the first time, is addressed through the development of a 9-class benchmark dataset named CrossViewRet. We leverage the idea of cross-modal retrieval to handle cross-view retrieval through unsupervised as well as supervised visual representations. In addition, an adversarial feature learning technique (ADML) is proposed. It aims to find a feature space as well as a common semantic space in which street-view images are compared directly to satellite-view images (and vice versa). For this comparison, a novel deep metric learning based solution is proposed. Experimental evaluation illustrates the superiority of the proposed methods in same-view and cross-view image retrieval. We believe that the introduction of the novel CVIR task and the developed dataset will also serve as a baseline for future research.
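The idea of comparing street-view and satellite-view images in a common space can be sketched as a nearest-neighbour ranking over embeddings. The example below assumes that both views have already been mapped into a shared 3-dimensional space; the embedding values, the image identifiers, and the `retrieve` helper are hypothetical, and plain Euclidean distance stands in for the learned metric described above.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, gallery, top_k=3):
    """Return the ids of the top_k gallery items closest to the query embedding."""
    ranked = sorted(gallery, key=lambda item: euclidean(query, item[1]))
    return [image_id for image_id, _ in ranked[:top_k]]


# Hypothetical satellite-view embeddings in the shared space.
satellite_gallery = [
    ("sat_stadium", [0.9, 0.1, 0.0]),
    ("sat_bridge", [0.1, 0.8, 0.2]),
    ("sat_airport", [0.0, 0.2, 0.9]),
]

# Hypothetical street-view query embedding in the same space.
street_query = [0.2, 0.7, 0.1]

top_matches = retrieve(street_query, satellite_gallery, top_k=2)
print(top_matches)
```

Swapping the roles of the two lists gives retrieval in the opposite direction (satellite-view query against a street-view gallery), since both views share one space.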