

Maxar is excited to continue our support of SpaceNet, a non-profit organization we helped launch in August 2016 that’s dedicated to accelerating open source, artificial intelligence (AI), and machine learning (ML) applied research for geospatial applications. SpaceNet is a collaborative initiative between Maxar Technologies, CosmiQ Works, Intel® AI, Amazon Web Services, and Capella Space.

We are especially excited about the release of the SpaceNet 5 dataset, announced August 22, which revisits the challenge of automated road network detection and routing while adding the complexity of estimating travel time based on distance and road type. Routing is an important use case since it is essential for many humanitarian, civil, military and commercial applications.

Keeping foundational maps up to date, including roads, remains a labor- and cost-intensive aspect of generating map data. Over the last three years, SpaceNet has shown how AI/ML computer vision algorithms applied to overhead imagery can automate tasks such as building footprint and road network extraction. That said, fully automated map production still has limitations, chiefly the quality and generalizability of outputs for any area in the world.

Generating labeled training datasets is one of the toughest aspects of SpaceNet. Given Maxar’s role as part of the team that produced the SpaceNet 5 dataset, we thought this would be a good opportunity to share a glimpse “behind the scenes.”

It’s worth explaining that the SpaceNet team creates hand-labeled datasets since they are intended to serve as AI/ML training datasets. Though semi-automated means of dataset creation are being researched, thus far we’ve found that hand-labeling by dedicated, expert labelers is the most effective approach.

Team members from Maxar’s AI/ML training datasets labeling team reviewing the SpaceNet 5 roads dataset.

As a prerequisite to labeling, the SpaceNet production team develops a detailed requirement document and production guide that outlines general requirements, topology rules and the attribute fields to label. This document provides examples of how and how not to label the main features, and addresses any unusual edge cases a labeler might encounter.

After years of creating training datasets for AI/ML, we generally find that it’s a best practice to limit the number of features (or attributes) being labeled to 3-5 to avoid overload. The labeling team for this dataset consisted of 15 team members split between digitizers and expert analysts. Depending on the size and complexity, which typically relates to geographic area and number of features, it generally takes 3-6 months to create a training dataset.

To begin production, the SpaceNet team considers the areas that currently have training data coverage and evaluates what new areas of interest (AOIs) to label to expand the geographic diversity of SpaceNet. The SpaceNet partners aim to label new geographic areas to encourage the development of algorithms that can be more geographically generalizable. Once the candidate AOIs are set, suitable satellite imagery is gathered from Maxar’s 100+ PB library and provisioned to dedicated labelers via Maxar’s collaborative mapping software and task manager tool, which Maxar developed based on the iD editor. This tool allows us to divide the mapping workload across the labeling team and monitor progress along the way.

More than 1,951 km of roads were labeled for Mumbai, including attributes for road type, surface, speed limits, and number of lanes. Speed limit is visualized in the example image (red = 65 mph for motorways to green = 25 mph for residential).

The SpaceNet 5 dataset contains 2,879 square kilometers of Maxar 30 cm satellite imagery and 8,160 kilometers (~5,070 miles) of labeled roads.
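
To make the travel-time element of SpaceNet 5 concrete, the short Python sketch below shows how a road segment's length and speed limit combine into a traversal time, and how segment times add up along a route. It is only an illustration, not the SpaceNet labeling schema or challenge code. The two speeds (65 mph for motorways, 25 mph for residential roads) come from this post, while the dictionary keys, function names, and sample route are hypothetical.

```python
KM_PER_MILE = 1.609344

# Speed limits in mph. The two values are the ones quoted in this post;
# the key names are illustrative and are not SpaceNet attribute values.
SPEED_LIMIT_MPH = {
    "motorway": 65.0,
    "residential": 25.0,
}


def travel_time_minutes(length_km, road_type, speed_limits=SPEED_LIMIT_MPH):
    """Minutes needed to traverse one segment at its labeled speed limit."""
    speed_kmh = speed_limits[road_type] * KM_PER_MILE
    return length_km / speed_kmh * 60.0


def route_time_minutes(segments, speed_limits=SPEED_LIMIT_MPH):
    """Total minutes for a route given as (length_km, road_type) pairs."""
    return sum(
        travel_time_minutes(length_km, road_type, speed_limits)
        for length_km, road_type in segments
    )


# Hypothetical route: 5 km of motorway followed by 1.2 km of residential road.
example_route = [(5.0, "motorway"), (1.2, "residential")]
example_total = route_time_minutes(example_route)
print(f"Estimated travel time: {example_total:.2f} minutes")
```

The sketch assumes free-flow travel at the speed limit, so the fastest route between two points is not necessarily the shortest one. This is why speed-related attributes matter in the labels as well as road geometry.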
