7 Best Practices to Use While Annotating Images
This is a guest article by tech writer Melanie Johnson
No matter how big or small your machine learning (ML) project might be, the overall output depends on the quality of the data used to train the ML models. Data annotation plays a pivotal role in this process: it is the practice of marking content so that machines can recognize it, whether for computer vision or natural language processing (NLP), across formats including text, images, and video.
Now, the primary function of data labeling is tagging objects in raw data to help the ML model make accurate predictions and estimations. That said, data annotation is key to training ML models if you want to achieve high-quality outputs. If the data is accurately annotated, it won’t matter whether you deploy the model for speech recognition or chatbots; you will get the best results imaginable.
In this article, we are going to look at some of the best practices to use while annotating images for a computer vision project. This blog entry is particularly helpful to anyone who wants to
- understand the different image annotation types,
- learn about the challenges that data labelers encounter during image annotation for ML,
- know some of the best practices to use while annotating images for ML, including the pros and cons for each, and
- know what the future holds for data annotation as an industry.
So, if you have an ML project in mind or underway, you’ve come to the right place to get some profound insights and essentially everything you need to be well-versed in image annotation.
Explaining Data Annotation for ML
Data annotation follows a meticulous process of adding metadata to a dataset. This metadata takes the form of tags, which can be added to various data types such as text, images, and video. The overarching idea when developing a training dataset for ML is to add comprehensive and consistent tags. Data scientists understand the fundamental importance of using clean, annotated data to train ML models: only by processing enough labeled examples can an ML algorithm learn to recognize the recurring patterns in the data.
Therefore, it is the data annotator’s task to teach the ML model to interpret its environment, basically showing the model what output to predict. In other words, all necessary features within a dataset must be accurately labeled so the model can recognize them on its own and relate them to unannotated data from a real-world environment.
Image annotation is central to Artificial Intelligence (AI) development in creating training data for ML. Objects in images are recognizable to machines through annotated images as training data, increasing the accuracy level of predictions. Before going deep into some of the difficulties data annotators face routinely during the image annotation process, it is important to know the various types of image annotation out there.
Bounding boxes. This is the most commonly used image annotation type in computer vision. Bounding boxes are rectangles drawn to define the target object’s location. This annotation type is particularly useful in object detection and localization.
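To make the format concrete, here is a minimal sketch of what a single bounding-box annotation might look like, loosely following the widely used COCO convention of `[x, y, width, height]` in pixels; the image and category ids are hypothetical:

```python
# One object annotation, COCO-style: bbox is [x, y, width, height],
# measured in pixels from the image's top-left corner.
annotation = {
    "image_id": 42,       # hypothetical image identifier
    "category_id": 1,     # e.g., 1 = "car" in an assumed label map
    "bbox": [120.0, 80.0, 64.0, 48.0],
}

def bbox_corners(bbox):
    """Convert [x, y, w, h] to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)

print(bbox_corners(annotation["bbox"]))  # (120.0, 80.0, 184.0, 128.0)
```

Other tools store corner coordinates directly, so converting between the two conventions, as `bbox_corners` does, is a routine step in annotation pipelines.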
An example of using bounding boxes
Polygonal segmentation. Much like bounding boxes, polygons define the shape and location of target objects, but they are used where objects do not have a rectangular shape. Complex polygons are common when annotating images of sporting activities, where target objects appear in varied, complex shapes.
3D cuboids. The only difference between 3D cuboids and bounding boxes is the depth of information about the target object. 3D cuboids offer more, including a 3D representation of the object, giving machines distinguishable features such as position and volume in 3D space.
Semantic segmentation. Semantic segmentation is pixel-wise annotation: each pixel in the image is assigned to a class and carries a meaning. Examples of classes could be cars, traffic lights, or pedestrians. Semantic segmentation is important in use cases where environmental context matters. For example, driverless cars rely on such information to understand their environment.
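The pixel-wise idea can be illustrated with a tiny mask where every cell holds a class id; the class ids below are assumptions for the sketch, not a standard mapping:

```python
# A tiny semantic-segmentation mask: every pixel holds a class id.
# The ids are assumptions: 0 = background, 1 = road, 2 = car.
mask = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [1, 2, 2, 1],
    [1, 1, 1, 1],
]

def class_pixel_counts(mask):
    """Count how many pixels belong to each class id."""
    counts = {}
    for row in mask:
        for pixel in row:
            counts[pixel] = counts.get(pixel, 0) + 1
    return counts

print(class_pixel_counts(mask))  # {0: 3, 1: 9, 2: 4}
```

Per-class pixel counts like these are often used to spot class imbalance in a segmentation dataset before training.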
With that in mind, image annotation is not an easy process; it demands in-depth knowledge and skills to accurately label data for ML training. The data labeling process, therefore, has its fair share of challenges that data labelers face.
Challenges in the Image Annotation Process for ML
Below are some of the challenges this AI workforce faces from time to time.
Automated vs. human annotation. The cost of data annotation depends on the method used. Automated annotation can be quick and inexpensive, but it risks imprecision because the actual degree of accuracy stays unknown until it is investigated. Human annotation, on the other hand, takes more time and costs more but is more accurate.
Guaranteeing high-quality data with consistency. High-quality training data gives the best outputs for any ML model, and producing it is a challenge in itself. An ML model can only make accurate predictions if the data is both good and consistent. Subjective data, for example, is hard for data labelers from different geographical regions to interpret uniformly due to differences in culture, beliefs, and even biases, which can yield different answers to the same recurring tasks.
Choosing the right annotation tool. Producing high-quality training datasets demands a combination of the right data annotation tools and a well-trained workforce. Different types of data are used for labeling, so knowing what factors to consider when picking an annotation tool is important.
7 Best Practices for Annotating Images for ML
Now we know that only high-quality datasets bring about exceptional model performance. A model’s strong performance is attributed to the accurate and careful data labeling process covered earlier in this article. However, it’s important to know that data labelers deploy a few “tactics” that sharpen the labeling process for outstanding output. Note that every dataset demands unique labeling instructions for its labels. With that in mind as you go through these practices, think of a dataset as an evolving phenomenon.
Use Tight Bounding Boxes
The secret behind using tight boxes around objects of interest is to help the model learn, accurately, which pixels count as relevant and which don’t. However, data labelers should be careful not to make the boxes so tight that they cut off a portion of the object. The box should be as small as possible while still enclosing the whole object.
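When the pixels belonging to an object are known (for instance, from a segmentation mask), the tightest axis-aligned box follows mechanically from the coordinate extremes. A minimal sketch, with made-up pixel coordinates:

```python
# The tightest axis-aligned box around an object is just the
# min/max of its pixel coordinates.
def tight_bbox(pixels):
    """pixels: iterable of (x, y); returns (x_min, y_min, x_max, y_max)."""
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

object_pixels = [(14, 22), (15, 22), (14, 23), (16, 25)]  # toy example
print(tight_bbox(object_pixels))  # (14, 22, 16, 25)
```

Comparing a labeler's hand-drawn box against this computed extent is one way review tools flag boxes that are too loose or that clip the object.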
Tag or Label Occluded Objects
What are occluded objects? Sometimes an object is partially blocked in an image and kept out of view, constituting an occlusion. In that case, ensure the occluded object is labeled in full, as if it were entirely visible. A common mistake is drawing the bounding box around only the visible part of the object. Note that boxes can overlap when more than one occluded object of interest appears, and that is okay; it should not bother you as long as all objects are properly labeled.
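Overlap between full-extent boxes can be quantified with intersection-over-union (IoU), the standard measure for box overlap. The sketch below, with invented coordinates, shows that two correctly drawn boxes for occluding objects can have a nonzero IoU without anything being wrong:

```python
# Boxes are (x_min, y_min, x_max, y_max). Full-extent boxes for
# occluding objects may overlap; IoU quantifies by how much.
def iou(a, b):
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # intersection width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # intersection height
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

front_car = (0, 0, 10, 10)   # fully visible object
back_car = (5, 5, 15, 15)    # partially occluded, still boxed in full
print(iou(front_car, back_car))  # nonzero overlap is expected here
```

Review pipelines sometimes use very high IoU between two boxes of the same class as a signal of accidental duplicate labels, while moderate overlap like this is normal for occlusions.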
Maintain Consistency Across Images
The truth is, almost all objects of interest involve some degree of ambiguity when identifying them, and that demands a high level of consistency during the annotation process. For example, the extent of damage to a vehicle body part required to call it a “crack” must be uniform across all images.
Tag All Objects of Interest in Each Image
Ever heard of false negatives in ML models? Computer vision models learn which patterns of pixels in an image correspond to an object of interest. Every appearance of an object should therefore be labeled in every image; any instance left unlabeled effectively teaches the model that those pixels are background, producing false negatives and hurting the model’s precision.
Label Objects of Interest in Their Entirety
One of the most basic and significant best practices when labeling images is ensuring the bounding boxes cover the whole object of interest. A computer vision model can easily be confused about what constitutes a full object if only a portion of it is labeled. In addition, ensure completeness: all objects from all categories in an image should be labeled. Failing to annotate any object in an image hampers the ML model’s learning.
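Completeness is hard to verify automatically, but a simple cross-check against a reviewer's expectations can catch obvious gaps. The sketch below assumes a hypothetical review artifact listing which labels each image must contain:

```python
# A sanity check: flag any image whose annotations are missing a
# category that a reviewer says should be present. The expected-labels
# map is a hypothetical review artifact, not a standard format.
def find_missing_labels(annotations, expected):
    """annotations: {image_id: set of labels found};
    expected: {image_id: set of labels that must appear}."""
    problems = {}
    for image_id, must_have in expected.items():
        found = annotations.get(image_id, set())
        missing = must_have - found
        if missing:
            problems[image_id] = missing
    return problems

annotations = {"img_001": {"car", "pedestrian"}, "img_002": {"car"}}
expected = {"img_001": {"car", "pedestrian"},
            "img_002": {"car", "traffic_light"}}
print(find_missing_labels(annotations, expected))
# {'img_002': {'traffic_light'}}
```

Checks like this only catch missing categories, not missing instances of an already-present category, so they complement rather than replace human review.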
Keep Crystal Clear Labeling Instructions
Since labeling instructions are not cast in stone, they should remain clear and shareable for future model improvements. Fellow data labelers who later need to add more data to a dataset will rely on that set of clear instructions, stored somewhere safe and accessible, to create and maintain high-quality datasets.
Use Specific Label Names in Your Images
It’s strongly advised to be exhaustive and specific when giving an object a label name. In fact, it’s better to be overly specific than not specific enough, because it makes any later relabeling easier. If you are building a milk-breed cow detector, for example, it is advisable to include separate classes for Friesian and Jersey even though every object of interest is a milk-breed cow. If that level of specificity turns out to be unnecessary, all the labels can simply be merged into a single milk-breed-cow class, which is far better than realizing too late that individual breeds matter and having to relabel the entire dataset.
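The asymmetry is easy to see in code: collapsing specific labels into a broader class is a one-line mapping, while splitting a broad class back into specifics means re-annotating. A minimal sketch using the cow example (label names are illustrative):

```python
# Specific labels can always be collapsed into a broader class later;
# the reverse requires relabeling. Label names are illustrative.
SPECIFIC_TO_BROAD = {
    "friesian": "milk_breed_cow",
    "jersey": "milk_breed_cow",
}

def merge_labels(labels, mapping):
    """Map each specific label to its broader class if one is defined."""
    return [mapping.get(label, label) for label in labels]

dataset_labels = ["friesian", "jersey", "friesian", "tractor"]
print(merge_labels(dataset_labels, SPECIFIC_TO_BROAD))
# ['milk_breed_cow', 'milk_breed_cow', 'milk_breed_cow', 'tractor']
```

Labels with no entry in the mapping pass through unchanged, so the merge is safe to run over a mixed dataset.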
Today’s innovators have embraced complex ML models with grit because they understand that high-quality data is all that matters. While there exist different types of image annotation, we have learned that the process of labeling images comes with a myriad of challenges, which, thankfully, data labelers have learned to overcome, at least in part. Nonetheless, the elephant in the room has been how to ensure ML models perform at their optimum after the annotation process is complete. To this end, it is no secret that the seven best practices discussed play a significant role in producing high-quality training datasets for ML models.
Melanie Johnson is an AI and computer vision enthusiast with a wealth of experience in technical writing. Passionate about innovation and AI-powered solutions, she loves sharing expert insights and educating individuals on tech.
Want to write an article for our blog? Read our requirements and guidelines to become a contributor.