Generation of training examples using OSM data applied for remote sensed landcover classification
Deriving training data is an important step in almost all supervised landcover classification tasks. Because manual annotation is very time-consuming, it is useful to simplify and support this process. We developed a semiautomatic approach that generates training data for landcover classification from color information (a four-channel aerial image) and elevation information, using freely available GIS data (OSM shapefiles). We distinguish four landcover classes: low vegetation, high vegetation, roads, and buildings. Each pixel of the aerial image is to be assigned to one of these classes or marked as unlabeled. To generate training data, we rely on labeled image regions. In OSM shapefiles, buildings and croplands are represented by polygons, which allows labels to be assigned to large areas of the data. Other classes, such as traffic infrastructure or trees, are mostly denoted by polygonal chains or points. To rasterize these data and to increase the number of labeled pixels, a graph-based segmentation algorithm is applied to the image, and the OSM data are intersected with the resulting segments. To avoid inaccuracies and false assignments, we check the selected segments against assumptions derived from features such as the normalized difference vegetation index (NDVI) and the normalized digital surface model (NDSM). With the generated data, we train a Random Forest classifier to assign every segment to one landcover class.
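The plausibility check on OSM-labeled segments can be illustrated with a minimal sketch. It is not the implementation behind the approach: the thresholds `NDVI_VEG` and `HEIGHT_HIGH`, the class names, the rule table, and the helper functions are all hypothetical, and the elevation feature is assumed to be height above ground in metres. Only the NDVI formula, (NIR - R) / (NIR + R), is standard.

```python
import numpy as np

# Hypothetical thresholds; the values actually used are not stated in the text.
NDVI_VEG = 0.3     # mean NDVI above this: segment treated as vegetated
HEIGHT_HIGH = 2.0  # mean height above ground (m) above this: treated as elevated


def ndvi(nir, red):
    """NDVI = (NIR - R) / (NIR + R); pixels with NIR + R == 0 are set to 0."""
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


def segment_means(values, segments, n_segments):
    """Mean of `values` within each segment id in 0..n_segments-1."""
    ids = segments.ravel()
    sums = np.bincount(ids, weights=values.ravel(), minlength=n_segments)
    counts = np.bincount(ids, minlength=n_segments)
    return sums / np.maximum(counts, 1)


def is_plausible(label, mean_ndvi, mean_height):
    """Check an OSM-derived label against assumed NDVI/height rules."""
    vegetated = bool(mean_ndvi > NDVI_VEG)
    elevated = bool(mean_height > HEIGHT_HIGH)
    rules = {
        "low_vegetation": vegetated and not elevated,
        "high_vegetation": vegetated and elevated,
        "road": not vegetated and not elevated,
        "building": not vegetated and elevated,
    }
    return rules[label]


# Toy 2x2 scene with two segments: segment 0 is flat and green,
# segment 1 is tall and not green. OSM claims both are buildings.
segments = np.array([[0, 0], [1, 1]])
nir = np.array([[200, 200], [50, 50]])
red = np.array([[50, 50], [50, 50]])
height = np.array([[0.1, 0.1], [6.0, 6.0]])
osm_labels = {0: "building", 1: "building"}

mean_ndvi = segment_means(ndvi(nir, red), segments, 2)
mean_height = segment_means(height, segments, 2)

# Keep only the segments whose OSM label agrees with the feature-based rules.
kept = {
    seg: label
    for seg, label in osm_labels.items()
    if is_plausible(label, mean_ndvi[seg], mean_height[seg])
}
print(kept)
```

In this toy scene, segment 0 is rejected as a building because it is vegetated and flat, while segment 1 is kept. In a full pipeline the surviving segments, together with their features, would form the training set for the Random Forest classifier.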