Vision, Machine Learning, MEng, AI • 9th Jun, 16
Reliably detecting location re-visitations is vital to building mapping algorithms that perform well in a wide variety of environments. We present an evaluation of methodologies available for detecting and closing loops independently of any localisation system using an RGB-D camera. We first investigate purely visual and depth-based loop closure before introducing a novel dual-channel method which analyses both data sources in parallel. These methodologies are then tested on numerous datasets of varying visual and physical detail. We show that a method which utilises both channels performs more accurate loop closure detection than those which rely on a single source of information.
For my final year project at the University of York I investigated and developed novel solutions for the navigation of environments by autonomous robots equipped with both RGB and depth sensors (using the Kinect). Specifically, I focused on the detection of location re-visitations: recognising when a system has returned to a previously seen location and adjusting the generated map accordingly, a problem known as loop closure detection. The majority of the coding was done in C++ and Matlab, with the former used for low-level image pre-processing such as lens distortion removal and feature detection.
Figure 1 : Loop Closure Detection System Architecture
Several subsystems were needed to explore loop closure with the Kinect. First, salient and pose-invariant features are detected in either RGB images or depth-based point clouds. These form a reduced representation of the scene, which is then used to build a vocabulary of descriptors. As an environment is navigated this vocabulary is refined through machine learning to place greater emphasis on distinctive features and less on those which occur frequently. During navigation, keyframes are also created when the scene is judged to have changed sufficiently. This segmentation of a dataset allows for a vast reduction in the computational power needed to check for loops.
Using these keyframes, and the vocabulary built whilst navigating an environment, we can check whether the current location sufficiently matches those seen previously, indicating a loop. To do this, linear algebra is used to calculate the shared occurrences of visual features between poses, which are then further weighted by word occurrence. This provides a measure of the similarity between scenes which is robust to changes in lighting and camera angle, whilst also taking into account how much information each feature provides about location.
Figure 2 : Scene Transformation Calculation
Finally, once loops have been detected, the change in both visual and depth-based sensor data can be used to calculate a transformation in 3D space between loop endpoints. This transformation can then be used to correct the discrepancy between scene data and close the loop, adjusting the map to account for the error which builds up over time as a robot navigates an environment. Figure 3 shows the application of an initial visual transformation before it is refined using an iterative adjustment method.
Figure 3 : Scene Alignment using Epipolar geometry and Iterative Refinement
We found that our method worked well in environments with varying levels of lighting and both visual and physical detail. If you're interested, a full set of results can be found in the report. This project gave me a broad understanding of the mechanisms behind computer vision, and of how to use machine learning to reduce the search space of otherwise computationally intractable algorithms.