Hybrid GATs for remote sensing analysis
With the rapid growth of the population, and in order to inform future planning and the study of geography, it is crucial to analyze construction and cultivation using past and present spatial data. Surveys of this magnitude require the ability to monitor large areas remotely, efficiently, and at minimal cost. Satellite imagery provides visual information about these vast areas, but monitoring regions at planetary scale means going through far more imagery than any human could inspect in a reasonable time. To address this, we propose a faster and cheaper way to analyze the spatio-temporal information in satellite images at large scale, so that land-use transitions can be monitored across wide areas such as cities and forests.
In this thesis, we propose three approaches based on the Graph Attention Network (GAT), taking advantage of its self-attention mechanism, which allows it to learn the relative importance of irregularly structured neighbors. GATs have a very small memory footprint and are much faster than the 3D convolution models proposed by Bhimra et al. We therefore use GATs to classify satellite images and recognize whether a specific patch of land has undergone destruction, cultivation, construction, or de-cultivation. To apply graph neural networks to images, the images must first be represented as graphs: we used SLIC and SLICO to generate superpixels and built the corresponding region adjacency graphs (RAGs). We also propose a novel representation of an image series as a spatio-temporal graph, called a Temporal-RAG (T-RAG).
We propose three approaches to transition classification. In the first, we trained a GATv1 model on the Asia14 dataset to classify each image into one of 14 land-use classes; to determine the transition, we ran inference on the RAG of each geospatial image separately and labeled the transition from those predictions using a simple voting policy. In the second, we fed the T-RAGs to GATv1 directly and changed the number of output units of the MLP from 14 to 4. Finally, we modified GATv1 to produce a separate readout for each RAG inside a T-RAG and concatenated these readouts into an embedding three times the size of a single readout. We also present qualitative and quantitative analyses of our approaches and compare their computational cost and performance with previous models.
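Once SLIC (or SLICO) has assigned a superpixel label to every pixel, building a region adjacency graph amounts to averaging pixel features per superpixel and linking superpixels that touch. The sketch below is a minimal pure-Python illustration, assuming 4-connectivity and mean pixel values as node features; the function name `build_rag` is hypothetical, and the thesis pipeline may use different adjacency and feature choices.

```python
from collections import defaultdict


def build_rag(labels, image):
    """Build a region adjacency graph (RAG) from a superpixel label map.

    labels: 2D list of ints, one superpixel id per pixel (e.g. SLIC output).
    image:  2D list of per-pixel feature tuples (e.g. RGB), same shape as labels.
    Returns (features, edges): the mean feature tuple of each superpixel and a
    set of undirected edges (a, b), a < b, between 4-connected superpixels.
    """
    rows, cols = len(labels), len(labels[0])
    sums = {}                    # superpixel id -> per-channel running sum
    counts = defaultdict(int)    # superpixel id -> number of pixels
    edges = set()
    for r in range(rows):
        for c in range(cols):
            lab, px = labels[r][c], image[r][c]
            if lab not in sums:
                sums[lab] = [0.0] * len(px)
            for k, v in enumerate(px):
                sums[lab][k] += v
            counts[lab] += 1
            # Checking only the right and lower neighbours visits every
            # 4-connected pixel pair exactly once.
            for nr, nc in ((r, c + 1), (r + 1, c)):
                if nr < rows and nc < cols:
                    other = labels[nr][nc]
                    if other != lab:
                        edges.add((min(lab, other), max(lab, other)))
    features = {lab: tuple(s / counts[lab] for s in sums[lab]) for lab in sums}
    return features, edges
```

In practice, scikit-image's `slic` function (with `slic_zero=True` for SLICO) produces such a label map from an image array.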
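The abstract does not spell out how per-image RAGs are joined into a T-RAG, so the following is only one plausible construction, offered as an assumption: take the disjoint union of the RAGs of consecutive images (renumbering nodes globally) and add a temporal edge between superpixels of consecutive frames that share at least one pixel location. The function name `build_t_rag` and this overlap rule are hypothetical, not the thesis's definition.

```python
def build_t_rag(frames):
    """Assemble a spatio-temporal graph from per-frame RAGs (hypothetical rule).

    frames: list of (labels, features, edges) per time step, where labels is
    the 2D superpixel map, features maps superpixel id -> feature tuple, and
    edges is a set of spatial (a, b) pairs between superpixel ids.
    Returns (node_features, spatial_edges, temporal_edges, node_time).
    """
    node_features, node_time = [], []
    spatial, temporal = set(), set()
    prev_ids, prev_labels = None, None
    for t, (labels, feats, edges) in enumerate(frames):
        # Give every superpixel of this frame a fresh global node id.
        ids = {}
        for lab in sorted(feats):
            ids[lab] = len(node_features)
            node_features.append(feats[lab])
            node_time.append(t)
        for a, b in edges:
            spatial.add((ids[a], ids[b]))
        # Assumed temporal rule: link superpixels of consecutive frames that
        # cover at least one common pixel location.
        if prev_labels is not None:
            for r, row in enumerate(labels):
                for c, lab in enumerate(row):
                    temporal.add((prev_ids[prev_labels[r][c]], ids[lab]))
        prev_ids, prev_labels = ids, labels
    return node_features, spatial, temporal, node_time
```

Keeping spatial and temporal edges in separate sets leaves open whether a model treats them identically or as different edge types.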
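The self-attention that lets a GAT weigh irregular neighbors can be illustrated with a single-head layer in the style of the original GAT (v1) formulation: each node's projected feature is paired with each neighbor's, scored by a learned vector `a`, passed through a LeakyReLU, softmax-normalized over the neighborhood, and used to weight a sum of the neighbors' projected features. This pure-Python sketch omits multi-head attention, biases, and the output nonlinearity; the function and argument names are illustrative, not the thesis's code.

```python
import math


def gat_layer(h, neighbors, W, a, slope=0.2):
    """One single-head GAT (v1) layer, without output nonlinearity.

    h:         list of node feature vectors, each of length F
    neighbors: dict node index -> list of neighbour indices (include i itself
               for a self-loop)
    W:         F' x F weight matrix as a list of F' rows
    a:         attention vector of length 2 * F'
    """
    # Linear projection W h_i for every node.
    Wh = [[sum(w * x for w, x in zip(row, hi)) for row in W] for hi in h]
    out = []
    for i in range(len(h)):
        # e_ij = LeakyReLU(a^T [W h_i || W h_j]) for each neighbour j.
        scores = []
        for j in neighbors[i]:
            z = sum(ak * v for ak, v in zip(a, Wh[i] + Wh[j]))
            scores.append(z if z > 0 else slope * z)
        # Softmax over the neighbourhood (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        alphas = [e / total for e in exps]
        # Attention-weighted sum of the neighbours' projected features.
        out.append([sum(al * Wh[j][k] for al, j in zip(alphas, neighbors[i]))
                    for k in range(len(W))])
    return out
```

With `a` set to all zeros the attention becomes uniform and the layer reduces to neighborhood mean aggregation, which is a convenient sanity check.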
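The third approach produces one readout per RAG inside a T-RAG and concatenates them, so a T-RAG built from three images yields an embedding three times as long as a single-graph readout. The sketch below assumes mean pooling as the readout and that each node carries the index of the image it came from (`node_time`); the thesis does not state which readout its modified GATv1 uses, and the function name is hypothetical.

```python
def per_frame_readout(node_embeddings, node_time, num_frames):
    """Mean-pool node embeddings separately for each frame of a T-RAG and
    concatenate the per-frame vectors into one list of num_frames * d values.
    """
    dim = len(node_embeddings[0])
    sums = [[0.0] * dim for _ in range(num_frames)]
    counts = [0] * num_frames
    for emb, t in zip(node_embeddings, node_time):
        counts[t] += 1
        for k, v in enumerate(emb):
            sums[t][k] += v
    readout = []
    for t in range(num_frames):
        if counts[t] == 0:
            raise ValueError(f"frame {t} has no nodes")
        # Append this frame's mean embedding after the previous frames'.
        readout.extend(s / counts[t] for s in sums[t])
    return readout
```

Because the frame order is preserved in the concatenation, a downstream MLP can tell which part of the embedding describes the earlier and later images, which a single pooled readout over the whole T-RAG would not.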
Dr. Murtaza Taj (advisor)
Dr. Agha Ali Raza