With the help of neural networks, object recognition is finding its way into more and more areas. Neural networks are modeled on the structure of the human brain: they consist of many interconnected neurons that perform mathematical operations.
A neural network is trained with data prepared for it (e.g., labeled cars in traffic situations). Training adjusts the parameters of the mathematical operations and the connections between the neurons, after which the network can recognize and annotate the objects in arbitrary images. This is used in mobility, medicine, logistics, and agriculture. Faster adaptability and better scalability make neural networks better suited for many uses than traditional image recognition algorithms.
Just as important as the network architecture itself is the training data used to train the neural network. This training data can be, for example, images showing different traffic situations, including road users such as cars, cyclists, and pedestrians. Each image is also assigned a text file listing the position, size, and object class of every annotated object. The principle of “garbage in, garbage out” (GIGO) applies here: a neural network may be perfectly designed for a specific application, but it will still not serve its purpose if the training data is unsuitable or insufficient. High-quality training datasets are already freely available online for many objects, mainly everyday objects such as cars, bicycles, and people. These datasets can be downloaded and fed into the appropriate software for training. However, if new training data is required, such as in logistics, a procedure for the initial generation of training data is needed.
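Such a label file is often a simple text format. The widely used YOLO convention, for instance, stores one line per object: a class index followed by a bounding box normalized to the image size. A minimal parsing sketch in Python (the class list and example values are illustrative, not taken from a real dataset):

```python
# Parse a YOLO-style label file: each line is
# "<class_id> <x_center> <y_center> <width> <height>",
# with coordinates normalized to [0, 1] relative to the image size.

def parse_yolo_labels(text, class_names):
    """Return a list of (class_name, x_center, y_center, width, height)."""
    annotations = []
    for line in text.strip().splitlines():
        class_id, x, y, w, h = line.split()
        annotations.append(
            (class_names[int(class_id)], float(x), float(y), float(w), float(h))
        )
    return annotations

# Example label content for one image: a car and a pedestrian.
labels = "0 0.50 0.60 0.20 0.15\n1 0.10 0.55 0.05 0.20"
print(parse_yolo_labels(labels, ["car", "pedestrian"]))
```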
Traditional Types Of Data Annotation
Data annotation is required to generate training data. First, the conditions or representations that a data object must meet are specified; these are determined by the architecture of the neural network and the training process. Then the annotations for the corresponding images are generated. There are a variety of solutions and procedures for data annotation.
These solutions usually include a range of functions to support users with annotation:
1. Single Data Generation
Every tool should offer manual annotation of individual images of a video, so-called frames. This is used to correct individual annotations when one of the more intelligent methods does not deliver the desired result. Manual annotation can also be supported by a segmentation method or edge detection, for example, to adjust the size of the bounding box (the limiting object frame). Segmentation divides the image into similar areas to make annotation easier for the user; clearly delineated objects, such as people, can thus be separated from the background.
2. Crowdsourcing

Some providers offer so-called projects for generating individual data. Such a project collects images of the objects from many contributors to create the desired training data: everyone can independently take and annotate pictures of the objects in question. This can result in diverse and extensive datasets, making object recognition very robust and usable in many different situations. The disadvantage is that it often takes a long time before a sufficiently large dataset is generated.
3. Partially Automated Procedures
The established methods include the propagation and linear interpolation of annotations. Propagation here means copying annotations from frame to frame, which is particularly useful for static objects and camera perspectives. Linear interpolation covers constant movement of the camera or the object over several frames: the annotation is set in two keyframes, and the intermediate frames are filled in automatically.
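Linear interpolation of annotations can be sketched in a few lines: given a box annotated in two keyframes, the boxes in between are blended proportionally. This assumes roughly constant motion between the keyframes; the coordinates and frame numbers below are illustrative:

```python
# Linearly interpolate a bounding box between two manually annotated
# keyframes. Boxes are (x, y, width, height) in pixels; intermediate
# frames get a proportional blend of the two annotations.

def interpolate_box(box_a, box_b, frame, frame_a, frame_b):
    t = (frame - frame_a) / (frame_b - frame_a)  # 0.0 at frame_a, 1.0 at frame_b
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

# Object annotated in frame 0 and frame 10; frame 5 is interpolated.
start = (100, 50, 40, 30)
end = (200, 70, 40, 30)
print(interpolate_box(start, end, 5, 0, 10))  # (150.0, 60.0, 40.0, 30.0)
```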
Current tools also use intelligent instruments such as object trackers and pre-trained models. An object tracker can follow a once-annotated object across multiple frames, so ideally it only needs to be annotated in the first frame. Pre-trained models have already been trained on annotated data and can therefore recognize and annotate the target objects with high probability. The prerequisite is that relevant data and models already exist for the objects in question.
4. Data Augmentation
Data augmentation is used to artificially expand existing datasets. Annotated images are, for example, rotated, mirrored, distorted, or overlaid with noise. Depending on the application, the annotated objects can also be partially occluded. This method is beneficial when there is little training data.
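A small illustration of why augmentation must transform the annotations along with the images: when an image is mirrored horizontally, the bounding box has to be mirrored too. A minimal sketch with an assumed image width:

```python
# When an annotated image is mirrored horizontally, the bounding box
# must be transformed with it: x is reflected within the image width,
# while y, width, and height stay unchanged.

def flip_box_horizontal(box, image_width):
    x, y, w, h = box  # top-left corner plus size, in pixels
    return (image_width - x - w, y, w, h)

# In a 640-pixel-wide image, a box near the left edge moves to the right.
print(flip_box_horizontal((10, 20, 100, 50), 640))  # (530, 20, 100, 50)
```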
5. Synthetic Data
Another way to generate training data artificially is synthetic data. The objects to be annotated are modeled as closely as possible to reality using appropriate tools so that training data can be created automatically and with random variation. However, certain aspects, special features, or specifics of the real objects are sometimes not captured. As a result, the synthetic data itself is recognized very well, but errors can occur when recognizing real data. Similar to data augmentation, synthetic data generation makes sense in areas where new training data is difficult to obtain.
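The appeal of synthetic data is that the annotation comes for free: because the generator decides where an object is placed, it already knows the bounding box. A toy sketch of this idea, with a stand-in compositing step and a hypothetical "parcel" class (scene and object sizes are illustrative):

```python
# Synthetic training data: place a rendered object at a random position
# in a background scene. Because the placement is chosen by the program,
# the bounding box annotation is known without any manual labeling.

import random

def place_object(scene_w, scene_h, obj_w, obj_h, rng):
    x = rng.randint(0, scene_w - obj_w)
    y = rng.randint(0, scene_h - obj_h)
    # Here the object image would be composited into the scene;
    # the annotation is simply the chosen placement.
    return {"class": "parcel", "box": (x, y, obj_w, obj_h)}

rng = random.Random(42)  # fixed seed for reproducible examples
samples = [place_object(640, 480, 80, 60, rng) for _ in range(3)]
for s in samples:
    print(s)
```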
These established methods for data annotation form the core of the most common solutions. However, some of them can only be used if training data already exists.
Challenge With New Objects
However, there are always areas and objects for which data must be recorded and annotated entirely from scratch – and in most cases in a short time. This includes, for example, new goods in a logistics center. There are no annotated datasets for these objects, so pre-trained models and data augmentation are not an option at first. New data must be collected on site using video or photo recordings and then annotated using propagation, linear interpolation, and intelligent object trackers. Crowdsourcing and synthetic data generation are time-consuming and resource-intensive and therefore rarely used here.
The Idea Of Mobile Data Annotation
Initial concepts for mobile data annotation already exist: data recording and annotation are carried out simultaneously on a mobile device. Today's smartphones and tablets have one or even several high-resolution cameras and ever-improving computing power. Objects can thus be recorded and annotated image by image, on site and at any time.
Object trackers and segmentation algorithms support this. Users mark the object shown in the mobile device's camera image with a bounding box. The segmentation algorithm then adjusts the bounding box to the distinguishable contours in the camera image. In this way, a bounding box that is as precise as possible is created, and laboriously drawing an exact box on the touchscreen is no longer necessary. In the second step, the object tracker follows the object across frames – i.e., from image to image. This process can be further improved by regularly re-segmenting and adjusting the bounding box. Many current object trackers already use segmentation algorithms internally so as not to lose the object, and thus improve efficiency.
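The interplay of tracker and segmentation can be sketched as a simple loop. Both steps below are stand-in stubs, not a real tracker or segmentation algorithm: the stub tracker accumulates a small drift, and the stub segmentation simply snaps the box back to the object's true position at a fixed interval (the interval of 5 frames is an assumption for illustration):

```python
# Sketch of the annotate-track-resegment loop: the tracker predicts the
# box from frame to frame and slowly drifts; every few frames a
# segmentation step snaps the box back onto the object.

RESEGMENT_EVERY = 5  # re-run segmentation every 5 frames (assumed interval)

def track_step(box, motion=(3, 0)):
    # Stub tracker: follow the estimated motion, with 1 pixel of drift.
    x, y, w, h = box
    return (x + motion[0] + 1, y + motion[1], w, h)

def segment(true_box):
    # Stub segmentation: in reality this would find the object outline
    # in the current camera image; here it just returns the true box.
    return true_box

box = true_box = (100, 100, 40, 40)
annotations = []
for frame in range(1, 11):
    true_box = (true_box[0] + 3, true_box[1], 40, 40)  # object moves right
    box = track_step(box)
    if frame % RESEGMENT_EVERY == 0:
        box = segment(true_box)  # drift correction
    annotations.append(box)

print(annotations[-1])  # after re-segmentation the box matches the object again
```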
The object can thus be recorded from different angles and distances, while the bounding box is only defined once at the start of the recording. Once an object has been captured from all perspectives, the procedure is repeated for the next object until a sufficiently large training dataset has been collected.
Each annotated image is stored offline on the mobile device and transmitted as soon as there is an internet connection; with an existing mobile connection, it can also be synchronized directly with online storage for further processing. With a connection, individual outliers or blurred images can also be sorted out immediately. In addition, the target model can already be trained with the new data during recording and annotation. The model trained in this way can then be transferred back to the mobile device, where its functionality is tested live and it serves as a pre-trained model for further data annotation.
The efficiency of this method depends on the performance and suitability of the mobile device, the object tracker, and the segmentation algorithm. If one building block is unsuitable for the application, each image must be considered individually and cannot be analyzed across frames. If, however, all components are sufficiently performant and well suited, it only needs to be checked afterward that no unexpectedly incorrect annotations were made – that is, that no bounding box was defined too small, too large, or in the wrong place, and that each object was marked with the correct class.
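Such a plausibility check can be partially automated, for example by comparing each annotation against a reference box (such as a freshly re-segmented one) via intersection over union (IoU). The threshold one would apply, commonly 0.5, is a convention rather than anything prescribed:

```python
# Intersection over union (IoU) of two boxes (x, y, width, height):
# 1.0 means identical boxes, 0.0 means no overlap. Annotations with a
# low IoU against a reference box can be flagged for manual review.

def iou(a, b):
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))  # overlap width
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))  # overlap height
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union

print(iou((0, 0, 10, 10), (0, 0, 10, 10)))  # 1.0  (identical boxes)
print(iou((0, 0, 10, 10), (5, 0, 10, 10)))  # 0.3333...  (half-shifted boxes)
```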
For example, a tablet is more suitable than a smartphone, thanks to better usability and performance. In addition, multiple object trackers and powerful segmentation algorithms lead to higher efficiency and high-quality annotations.
The finished toolchain should be prepared for changing conditions such as lighting, object size and type, and weather. This means providing several interchangeable object trackers, segmentation algorithms, and machine learning models that can be switched on the fly if necessary.
Other Fields Of Application
This form of object recognition cannot be used everywhere, however: in some areas, no training data can be recorded via cameras, for example in medical imaging procedures such as ultrasound, MRI, or PET-CT. Here, though, the basic concept can be transferred to annotating healthy and abnormal tissue structures with the help of live recordings. For example, medical staff could mark tissue during routine examinations to generate data, which could then be used to train a model that later supports the detection of diseases.
Training data generation with mobile devices has the potential to annotate new data quickly and efficiently when a new object recognition system is to be developed. The generation of training data still requires a lot of manual work; although various partially automated algorithms support it, the degree of automation could increase with the approach described here. With more powerful object trackers, segmentation algorithms, and end devices in the future, this approach could become an alternative to conventional training data generation in certain areas.