/

May 3, 2023

Why accuracy areas are vital to building successful ML projects

accuracy areas

With the release of the dataset recommendation feature on the platform, we’d like to dedicate a blog post to discussing the importance of accuracy areas. The dataset recommendation feature recommends locations for users to annotate training areas, but we’ve also recently added accuracy area recommendations. We did this because accuracy areas are equally important as training areas. While training areas are what your detector will actually use to learn from, without accuracy areas, you will not be able to have any kind of quantitative measure of how accurate your detector is.

But first a little reminder on what accuracy areas are and what accuracy itself means.

There are many parallels between training areas and accuracy areas. Just as you need to annotate all objects of interest in your training areas, you must create annotations in your accuracy areas. Accuracy areas act as a scorecard for your detector. Your detector will train ONLY your training areas but then generate detections on the accuracy areas and then compare its results to the scorecard (a.k.a your annotations) within that accuracy area.

Tree holes
You can see an example of one accuracy area here. Note that the score of 91.76% here is not coming from just this one accuracy area but from all the 20 accuracy areas in this detector (as seen in the panel on the right)

And just as you need a set of training areas that covers the breadth of variety in your training imagery, you also need the same for accuracy areas. This boils down to a key point of understanding about detector accuracy: there is no such thing as the inherent accuracy of your detector on its own.

The accuracy of a detector depends on the data you want to run it on. If you want your detector to run on the entirety of the earth and thus you want an accuracy score that reflects how well your detector performs on the entirety of the earth then you will have to create accuracy areas across the entire variety of visual contexts the earth has to offer. There is no way around that, that’s why datasets for things like self-driving cars are massive. These companies are trying to get their cars to be able to drive across any road, and for them, it’s still a real challenge to get full coverage in their datasets. How prevalent are self-driving cars right now? Not very.

The good news is that for the geospatial domain, most users with a specific task or project in mind do not need to detect across the entire world at all, just a specific part of the world. The first step to creating a successful project using deep learning is to define the scope of your project well.

Deep learning can work wonders, but only if you’re asking it the right questions in the first place!
Roger Fong
Roger Fong
Lead ML engineer

Bringing it back to dataset recommendation, an important goal of the feature is to help guide your decision-making on where to put more accuracy areas by focusing on the most effective regions (most diverse) where you should remember to assess your model within your training imagery. That should help make the task of deciding where to annotate a bit less daunting.

In the platform screenshot below we see the new dataset recommendation navigation UI. You can now directly generate and navigate through your recommendations from the side panel in the main training UI and you’ll also note that two types of recommendations are generated. The yellow pins are for training areas and the blue pins are for accuracy areas.

Tree holes
In this example, the snowy forest region was apparently missing from both our existing training area and accuracy set so the system produced recommendations in the same region for both area types.

You will still need to bite the bullet and fully annotate those accuracy areas. but that’s better than relying on some pre-trained model that has no guarantees of accuracy on your target region and has been trained on data you’ve never seen. We at Picterra will try to provide tools to make this annotation process smoother and faster by improving our annotations, introducing an AI magic wand, and improving our area recommendation systems so that you’re doing less redundant annotation work from the model’s point of view.

At the end of the day though 90% of machine learning is about the dataset. Make a good one.

Want to learn more?

sign up to our newsletter