Labeling best practices

To get the highest accuracy out of your AI model, it is crucial that the tiles you mark as completed are labeled to the highest quality. Even a few bad labels can dramatically degrade a model's performance.
It is far better to label less data precisely than to label a greater amount of data poorly.


One quick sanity test for whether an AI analysis will be able to analyze your image is whether the objects are identifiable by the human eye. If a human cannot distinguish them, it is likely very difficult to create a high-quality set of labels, which in turn makes training an accurate AI model very difficult.

Label all objects of the same class in all finalized tiles

This doesn't mean labeling all classes of objects present in your dataset. If you have different types of objects in your images (e.g. glomeruli, tubules, single cells, cancer regions in renal tissue slides), you can pick one class and complete the labels for that one class.
In fact, we suggest starting with one object class or a few object classes (e.g. glomeruli and tubules) for initial AI training, then adding more classes in subsequent training.
In order to achieve good AI performance, all Fully labeled tiles must have every object of each included class completely labeled. If some tiles are missing labels, the result will be poor AI performance. If you have half-labeled tiles, make sure to remove them from the Completed column before proceeding to training.
When training, our optimization algorithms penalize the model for predicting an object where there is no label. If you miss labels on your images, you should expect significantly reduced performance.
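One way to guard against half-labeled tiles slipping into training is a small pre-training filter. The sketch below is illustrative only: the record fields ("status", "labels") are hypothetical and not from any specific product API; the key idea is that only tiles explicitly reviewed as complete are kept, while empty tiles (zero objects, fully reviewed) are still valid training data.

```python
# Hypothetical sketch: exclude partially labeled tiles before training.
# Field names ("status", "labels") are illustrative assumptions.

def training_tiles(tiles):
    """Keep only tiles whose labels were reviewed as complete."""
    return [t for t in tiles if t["status"] == "completed"]

tiles = [
    {"id": "t1", "status": "completed", "labels": 12},
    {"id": "t2", "status": "partial", "labels": 3},   # half-labeled: drop
    {"id": "t3", "status": "completed", "labels": 0}, # empty tile: keep
]
print([t["id"] for t in training_tiles(tiles)])  # ['t1', 't3']
```

Note that the empty tile `t3` is kept: a reviewed tile with no objects is useful background data, whereas a partially labeled tile actively misleads the optimizer.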

Label all objects accurately

When you are annotating objects of interest, you need to ensure that you outline the object consistently and accurately. For instance, if you are annotating glomeruli in histology images, you should outline the glomerulus around the edge without overestimating or underestimating the edges.
Examples of good and bad annotations.
It's important to be consistent with how you create your labels. The model will try to make predictions that are similar to the labels that you create.
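A simple way to spot inconsistent outlining, for example between two annotators labeling the same object, is intersection-over-union (IoU). The sketch below uses axis-aligned bounding boxes for brevity; it is a generic illustration, not a feature of any particular labeling tool. A low IoU flags objects whose outlines are worth reviewing.

```python
# Hypothetical sketch: IoU between two annotations of the same object,
# given as bounding boxes (x0, y0, x1, y1). Low IoU = inconsistent outlining.

def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

print(round(iou((0, 0, 10, 10), (5, 0, 15, 10)), 3))  # 0.333
```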

Include objects at the edge of the tile or image

When objects of interest are cut off by the edges of a tile and only partially visible, it is extremely important not to omit them during annotation.
There are two objects of interest (glomeruli) at the edge of the tile shown in the example below. Despite the fact that they are partially visible, we annotated them properly.

Data best practices

Different magnifications

In microscopy, images with different magnifications have different pixel resolutions and object sizes. If you only want to analyze 20x magnification images, you should only include 20x images.
If instead you need to analyze images with different magnifications using one AI model, it is crucial to include images at those magnifications in the ground truth labels.
Example of images with different magnifications. a) 10x magnification b) 40x magnification.
In the above example, the 10x magnification image covers a much larger tissue area, but does not allow you to see cell-level information, such as nuclei. It is unlikely that you will get good performance on 10x images, even with a smaller tile size.
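The relationship between magnification and pixel size can be made concrete with a resampling factor. The sketch below is a minimal illustration, assuming example microns-per-pixel values; in practice you should read the true pixel size from your scanner's image metadata rather than the mapping shown here.

```python
# Hypothetical sketch: normalizing images from different magnifications
# to a common pixel size. The magnification -> microns-per-pixel mapping
# uses example values; real values come from the scanner metadata.

MPP = {"10x": 1.0, "20x": 0.5, "40x": 0.25}  # microns per pixel (assumed)

def scale_factor(src_mag, target_mag="20x"):
    """Factor to resize a src_mag image so its pixel size matches target_mag."""
    return MPP[src_mag] / MPP[target_mag]

print(scale_factor("40x"))  # 0.5 -> downscale 40x images by half
print(scale_factor("10x"))  # 2.0 -> upscale 10x images (lost detail is not recovered)
```

Note that upscaling a 10x image does not recover cell-level detail that was never captured, which is why 10x performance on nucleus-scale objects remains poor even after resampling.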

Include tiles with no objects or only a few objects

It's important to include tiles that do not contain any objects if you expect such regions to appear at analysis time. Otherwise, the AI's performance will suffer when it encounters these regions for the first time during actual image analysis.
For example, if you are building an AI model to identify and segment glomeruli, you should include different regions without any glomeruli, or even regions without any tissue.
Example of images with different tissue regions: a) kidney tissue region with glomeruli; b) fat tissue region without any glomeruli or kidney tissue; c) region of the image devoid of any tissue or objects.
This will help reduce the occurrence of false positives when your model encounters these three types of images during testing and actual image analysis.
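One practical way to keep such regions represented is to mix a fraction of background (object-free) tiles in with the object-bearing ones when assembling the training set. The sketch below is a generic illustration; the 30% background fraction is an arbitrary assumption, not a recommendation from any specific tool.

```python
# Hypothetical sketch: mix object-bearing tiles with a fraction of
# background tiles so the model sees fat tissue, empty glass, etc.
import random

def build_training_set(positive, background, background_fraction=0.3, seed=0):
    """Return positive tiles plus a sampled share of background tiles."""
    rng = random.Random(seed)  # fixed seed for a reproducible sample
    n_bg = int(len(positive) * background_fraction)
    return positive + rng.sample(background, min(n_bg, len(background)))
```

For example, with 10 glomeruli-bearing tiles and a pool of background tiles, this adds 3 background tiles, helping reduce false positives on object-free regions.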