March 1, 2024
Master Thesis
Title
Monocular Depth Estimation for Aerial Vehicles Using Self-Supervised Deep Learning
Abstract
This thesis addresses the problem of metric depth estimation from a single camera, specifically for application in low-altitude autonomous aerial vehicles. In the context of this thesis, low altitude refers to heights ranging from ground level to 120 metres.
Monocular (single-camera) depth estimation is an ill-posed problem: because the scale of the 3D scene is unknown, the same image can be reconstructed into more than one 3D scene. This is known as the scale ambiguity issue. Current state-of-the-art solutions use unsupervised/self-supervised deep learning to solve the monocular depth estimation problem and address the scale ambiguity in one of two ways. Either they first estimate scale-ambiguous depths and then use measurements from expensive range sensors to scale them to metric depths, or they use instantaneous velocity measurements from the Inertial Measurement Unit (IMU) in the training loss to learn the true scale of the 3D scene directly.
This thesis adopts and extends the latter method to perform metric monocular depth estimation on aerial imagery, addressing two challenges: scale recovery and scene-agnostic depth estimation. Scale recovery is addressed by integrating velocity information into the depth network, with appropriate architecture modifications. Scene-agnostic depth estimation is addressed by using multiple sequential images as input to the depth network, again with appropriate architecture modifications.
The evaluation results presented in this thesis show that multiple sequential images and velocity measurements can be used to perform scene-agnostic metric depth estimation on challenging aerial scenes. Because the sequential images and velocity measurements are already available in the flight controller, the methods investigated in this thesis can be implemented at no additional financial cost.
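The scale ambiguity and the velocity-based scale cue described in the abstract can be illustrated with a small sketch. It is not the thesis's implementation: the pinhole camera model, the function names (project, scale_scene, velocity_scale, velocity_loss), the absolute-difference form of the loss, and all sample values are assumptions chosen for illustration only.

```python
import math


def project(point, focal=1.0):
    """Pinhole projection of a 3D point (x, y, z) onto the image plane."""
    x, y, z = point
    return (focal * x / z, focal * y / z)


def scale_scene(points, s):
    """Scale every 3D point of a scene by the same factor s."""
    return [(s * x, s * y, s * z) for x, y, z in points]


def norm(vec):
    """Euclidean norm of a vector given as a tuple."""
    return math.sqrt(sum(c * c for c in vec))


def velocity_scale(pred_translation, velocity, dt):
    """Ratio of the metric distance travelled (|v| * dt) to the norm of a
    scale-ambiguous predicted camera translation between two frames."""
    return norm(velocity) * dt / norm(pred_translation)


def velocity_loss(pred_translation, velocity, dt):
    """Hypothetical training penalty: absolute difference between the
    predicted translation norm and the distance implied by the velocity."""
    return abs(norm(pred_translation) - norm(velocity) * dt)


# Scale ambiguity: a scene and a 4x larger copy project to the same pixels.
scene = [(1.0, 2.0, 10.0), (-3.0, 0.5, 25.0)]
bigger = scale_scene(scene, 4.0)
same_image = all(
    math.isclose(a, b)
    for p, q in zip(scene, bigger)
    for a, b in zip(project(p), project(q))
)

# Velocity cue: 10 m/s over 0.1 s is 1 m of motion, so a predicted
# translation of norm 0.5 is off by a factor of 2.
scale = velocity_scale((0.0, 0.0, 0.5), (0.0, 0.0, 10.0), 0.1)
loss = velocity_loss((0.0, 0.0, 0.5), (0.0, 0.0, 10.0), 0.1)
```

In this toy setting the image alone cannot distinguish the two scenes, whereas the measured velocity fixes the metric length of the camera motion and therefore the scale. A penalty of this kind vanishes only when the predicted translation has the metric length implied by the velocity.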
Thesis Note
Hamburg, TU, Master Thesis, 2024
Author(s)
Advisor(s)