
September 13, 2022

Picterra pioneers first-of-its-kind data curation technology for geospatial imagery

New industry-first innovation helps users reveal visual patterns in data & build better detectors

Lausanne, Switzerland, September 13, 2022 – Picterra, the leading provider of geospatial machine learning software, today announced powerful new data curation technology that allows users to get a better understanding of their datasets and improve model accuracy. This industry-first innovation gives GIS and data science teams an indication of where – based on image variability – they need to place new training areas in order to achieve maximum model accuracy. This is a powerful tool to speed up and streamline the annotation process and reduce the likelihood of false positives.

This latest technology release builds upon Picterra’s recent market and platform momentum, in which the company announced the closing of a $6.5M investment and introduced powerful collaboration functionality. The company now serves more than 100 enterprises globally, helping leaders from General Motors to The World Bank to innovate operations, improve internal processes, and realize the strategic importance of Earth Observation (EO) data.

Visualizing data is the first step in any machine learning (ML) workflow and can often be challenging to perform when working with large and complex aerial imagery on a global scale.

Dataset recommendation is an industry-first innovation that helps users reveal visual patterns in their data and provides key insights for building better, more robust detectors.

“Dataset exploration and recommendation are game changers for Picterra users. They are advanced data curation tools that will enable users to effortlessly take the performance of their detectors to the next level.”
Julien Rebetez
CTO, Picterra

Accessible alongside the training report, the dataset recommendation report allows a quick assessment of the training coverage and identifies the areas the user should concentrate on in future iterations.

  • Quickly assess training area coverage & distribution – ensure that annotation efforts focus on the dataset’s most impactful & representative images/regions.
  • Identify unrepresented parts of the dataset & get recommendations on training area distribution to efficiently determine where to focus future iterations & re-training of the detector.
  • Improve dataset quality by making sure the data covers the variety of contexts in which the objects of interest appear.
  • Ensure the validation set covers the variety of the dataset, so that the validation score is more representative of how well the model will perform in production on new data.

The features are based on unsupervised learning and clustering techniques and allow a user to evaluate the distribution of their dataset. This is important because it allows users to spot “annotation gaps” in their datasets.

The report divides large imagery into small tiles, then groups the tiles by visual similarity (e.g., forest, water, urban). The grouped tiles are visualized in the interactive report, so users can see which regions the current training dataset covers and make adjustments where necessary. Each time a report is created, markers are placed within the imagery to indicate recommended locations for new training areas.
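The tile-group-recommend flow described above can be illustrated with a minimal sketch. It is not Picterra's implementation: the function names are hypothetical, mean tile brightness stands in for whatever visual features the product actually uses, and a tiny one-dimensional k-means stands in for its clustering method. The sketch recommends one tile from every visual group that has no annotated tile yet.

```python
from statistics import mean


def split_into_tiles(image, tile_size):
    """Cut a 2D grid of pixel values into non-overlapping square tiles."""
    tiles = {}
    for r in range(0, len(image) - tile_size + 1, tile_size):
        for c in range(0, len(image[0]) - tile_size + 1, tile_size):
            rows = image[r:r + tile_size]
            tiles[(r // tile_size, c // tile_size)] = [row[c:c + tile_size] for row in rows]
    return tiles


def tile_feature(tile):
    """Assumed stand-in for a visual descriptor: mean brightness of the tile."""
    return mean(v for row in tile for v in row)


def nearest_centre(value, centres):
    """Index of the centre closest to value (lowest index wins ties)."""
    return min(range(len(centres)), key=lambda i: abs(value - centres[i]))


def kmeans_1d(values, k, iterations=20):
    """Tiny 1D k-means with a deterministic, evenly spaced initialisation."""
    ordered = sorted(values)
    centres = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in values:
            groups[nearest_centre(v, centres)].append(v)
        # Keep the old centre if a group ends up empty.
        centres = [mean(g) if g else centres[i] for i, g in enumerate(groups)]
    return centres


def recommend_tiles(image, tile_size, k, annotated):
    """Group tiles by similarity and recommend one tile per uncovered group."""
    tiles = split_into_tiles(image, tile_size)
    features = {pos: tile_feature(t) for pos, t in tiles.items()}
    centres = kmeans_1d(list(features.values()), k)

    clusters = {}
    for pos, feature in features.items():
        clusters.setdefault(nearest_centre(feature, centres), []).append(pos)

    recommendations = []
    for label, members in sorted(clusters.items()):
        if not any(pos in annotated for pos in members):
            # Pick the most typical tile of the group (closest to its centre).
            best = min(members, key=lambda p: (abs(features[p] - centres[label]), p))
            recommendations.append(best)
    return clusters, recommendations


# Toy 4x6 "image": dark, medium, and bright vertical bands.
image = [[10, 10, 100, 100, 200, 200] for _ in range(4)]
annotated = {(0, 0), (1, 1)}  # tiles that already contain training areas
clusters, recommendations = recommend_tiles(image, tile_size=2, k=3, annotated=annotated)
print(recommendations)
```

In this toy case the dark and medium bands each contain an annotated tile, so only a tile from the bright band is recommended. A real pipeline would use richer features and a threshold on coverage rather than a simple covered/uncovered test.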

Dataset recommendation can also support “data curation” workflows, in which a team of annotators has to be assigned to the images that need annotating. Selecting the regions to annotate from the dataset recommendation report distributes the annotation workforce as efficiently as possible, because annotators work on the regions that maximize the diversity of appearance covered by the dataset. This leads to more robust detectors.
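One way to picture this kind of workload distribution is the sketch below. It is an illustration under stated assumptions, not the product's logic: the region names, group labels, annotator names, and both function names are hypothetical. Candidate regions are picked round-robin across visual groups, so a limited annotation budget touches as many groups as possible, and the picked regions are then dealt out to annotators in turn.

```python
from collections import defaultdict
from itertools import zip_longest


def pick_diverse_regions(region_groups, budget):
    """Pick up to `budget` regions, cycling through visual groups in turn."""
    by_group = defaultdict(list)
    for region, group in sorted(region_groups.items()):
        by_group[group].append(region)
    # zip_longest yields the first region of every group, then the second, ...
    rounds = zip_longest(*(by_group[g] for g in sorted(by_group)))
    interleaved = [r for one_round in rounds for r in one_round if r is not None]
    return interleaved[:budget]


def assign_to_annotators(regions, annotators):
    """Deal the selected regions out to annotators one at a time."""
    assignment = {name: [] for name in annotators}
    for i, region in enumerate(regions):
        assignment[annotators[i % len(annotators)]].append(region)
    return assignment


# Hypothetical candidate regions, labelled with the visual group they belong to.
region_groups = {
    "r1": "forest", "r2": "forest", "r3": "forest",
    "r4": "water", "r5": "urban",
}
selected = pick_diverse_regions(region_groups, budget=3)
print(assign_to_annotators(selected, ["ana", "ben"]))
```

With a budget of three regions, the selection covers forest, urban, and water once each instead of spending the whole budget on the three forest regions.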

In one example, adding the new training areas recommended by the report produced a 10% increase in accuracy score.

To learn more, watch our recent on-demand webinars, in which we introduced dataset exploration and its evolution into dataset recommendation.
