The success of any machine learning project stands or falls with the quality of its data. At Kognic, we are well known for our high-quality data annotations. To reach that level of quality, it is essential to understand the challenges self-driving car companies face. In this blog post, we share insights into what it takes to annotate with high quality, and the questions customers must think through when setting up an annotation project.
The first question you should ask yourself when setting up an annotation project is whether you are annotating the sensor data, or whether you are annotating reality. People often assume that annotating data gives you “ground truth”, but this isn’t necessarily true. What our human annotators see is a snapshot of reality captured through the lens of one or more sensors (such as a camera or a LiDAR). This means that what an annotator sees is only a representation of the real world.
Sensor data is subject to multiple kinds of distortion. These distortions differ between camera sensors and LiDAR sensors, and even vary between brands of cameras and LiDARs. Some distortions are easy to spot and correct for, but others are harder to understand. This blog post goes over a few common ones, such as the rolling shutter effect in cameras, and the blooming effect and duplicate appearances of objects in LiDAR data.
One possible distortion in a camera image is the rolling shutter effect. Destin from the YouTube channel 'Smarter Every Day' explains this perfectly in this video. The result is that a car in a camera image appears wider or narrower depending on both the readout speed of the camera's shutter and the speed of the car. Even the most careful annotation of an image will therefore not give an accurate representation of reality, unless you compensate for the rolling shutter effect.
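To get a feeling for the size of this effect, here is a minimal back-of-the-envelope sketch (ours, with made-up numbers) of how much a moving car can appear stretched or compressed, assuming the shutter sweep is aligned with the car's motion and takes a fixed time to traverse the car:

```python
# First-order model of the rolling shutter effect (illustrative only).
# Assumption: the sensor sweeps across the car along the same axis as its motion.

def apparent_length(true_length_m: float,
                    relative_speed_mps: float,
                    readout_time_s: float,
                    same_direction: bool) -> float:
    """Estimate how long a moving object appears in the image.

    If the readout sweep moves in the same direction as the object, the shutter
    "chases" it and the object appears stretched; if they move in opposite
    directions, the object appears compressed.
    """
    shift = relative_speed_mps * readout_time_s
    return true_length_m + shift if same_direction else true_length_m - shift

# A 4.5 m car passing at 20 m/s relative speed, with a 10 ms sweep over the car:
print(apparent_length(4.5, 20.0, 0.010, same_direction=True))   # ~4.7 m
print(apparent_length(4.5, 20.0, 0.010, same_direction=False))  # ~4.3 m
```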
LiDARs suffer from issues similar to those of rolling shutters, since a LiDAR scans the environment with its lasers over a period of time rather than capturing it all at once. Depending on the scan pattern, cars can appear deformed in strange ways. You can see that in the following image, where the van coming towards us is deformed. This makes it harder to estimate the true length of the vehicle.
In the images below we show the camera view and the point cloud of a Luminar LiDAR side by side. The data is a snapshot from the Cirrus dataset. The van on the left is coming towards us and has a large speed difference relative to our own vehicle. You can see that in the point cloud the van appears longer than it most likely is in the real world. It also has a diagonal outline at both the front and the back. This nicely illustrates how objects in LiDAR space are not a precise representation of objects in the real world.
If you have multiple LiDARs on your car and the scanned car is moving, it is also likely that you see the same car multiple times at different locations in your LiDAR scan, because each LiDAR saw the car at a different place at a different time. This also makes it harder to estimate the true length of a vehicle, since you are actually seeing the same vehicle at multiple locations in the point cloud. You can see this in Audi's A2D2 dataset, which contains data from five LiDARs. There, the car moving towards us is captured multiple times. In this video, this is especially clear if you focus on the bright points, which correspond to the vehicle's number plate.
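As a rough illustration (with made-up numbers, not values taken from the A2D2 setup), the separation between the apparent positions of the same car is simply the closing speed multiplied by the time gap between the two LiDAR scans:

```python
# Hypothetical sketch: how far apart the same oncoming car can appear when two
# LiDARs on the ego vehicle scan it at slightly different moments.

def ghost_offset_m(closing_speed_mps: float, scan_time_gap_s: float) -> float:
    """Separation between the two apparent positions of the same moving car."""
    return closing_speed_mps * scan_time_gap_s

# An oncoming car closing at 50 m/s, scanned by the two LiDARs 50 ms apart:
print(ghost_offset_m(50.0, 0.050))  # 2.5 m between the two "copies" of the car
```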
A third effect is due to the way a LiDAR measures distance: the sensor fires a laser pulse and measures how long it takes the reflected photons to return to the scanner. One problem is that highly reflective surfaces often appear bigger in LiDAR than they actually are. This is called the blooming effect. Although a LiDAR normally has a range error of around 2 centimeters, when the blooming effect occurs the outline of a reflective object is noticeably overestimated in the LiDAR data. You can see that in the image below, where a few points to the right of the truck are actually LiDAR noise. An untrained annotator might have included these in the bounding box, overestimating the actual width of the truck.
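For intuition, here is a minimal sketch of the time-of-flight principle (illustrative, not Kognic or vendor code). The timing of the returning pulse is the only measurement, which is why spurious returns from a strongly reflective surface translate directly into extra, misleading points:

```python
# A LiDAR fires a pulse and turns the round-trip time of the returning photons
# into a range.

SPEED_OF_LIGHT_MPS = 299_792_458.0

def range_from_round_trip(round_trip_time_s: float) -> float:
    """The pulse travels to the target and back, so halve the total path."""
    return SPEED_OF_LIGHT_MPS * round_trip_time_s / 2.0

# A pulse that returns after roughly 667 nanoseconds travelled about 200 m,
# i.e. the target is about 100 m away.
print(range_from_round_trip(667e-9))  # ~100 m

# Blooming: a highly reflective surface (a number plate, for example) can return
# enough light to register even in beams that only graze it, so the cluster of
# points ends up wider than the physical object.
```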
Now let’s circle back to the question at hand: what are you annotating? Sensor data, or reality? If you are annotating the sensor data, you ask an annotator to draw a bounding box around all the pixels and points that belong to an object. If you are attempting to annotate reality, you draw the bounding box at the location where you estimate the object actually is, which means it may not overlap with every detected LiDAR point or image pixel. We actively train our annotators, in our Annotator Academy, to understand which points are likely noise and which should actually be annotated. Because our annotators work exclusively with automotive data, the quality of our annotations is higher and closer to reality.
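To make the distinction concrete, here is a small, purely illustrative sketch (made-up coordinates and a crude nearest-gap filter, not our production logic) of how the estimated width of the truck changes when likely noise is excluded before fitting the box:

```python
import numpy as np

# "Annotate the data" vs. "annotate reality" for the width of the truck above.
truck_returns = np.array([10.0, 10.1, 10.3, 10.6, 11.0, 11.4, 11.9, 12.3, 12.5])
blooming_noise = np.array([13.4, 13.6])  # spurious points to the right of the truck

all_points = np.sort(np.concatenate([truck_returns, blooming_noise]))

# Annotating the sensor data: the box spans every return, noise included.
print(all_points.max() - all_points.min())   # 3.6 m

# Annotating reality (crude sketch): cut the box off at the first gap larger
# than 0.5 m, assuming the noise sits beyond the main cluster.
gaps = np.diff(all_points)
core = all_points[: np.argmax(gaps > 0.5) + 1] if np.any(gaps > 0.5) else all_points
print(core.max() - core.min())               # 2.5 m
```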
This leads to the next question you must ask yourself: how accurate do I want my annotations to be, and what accuracy can I expect?
Now that we know there is a difference between annotating reality and annotating data, we were wondering: how precise can these annotations be? Unfortunately, knowing how precise your annotations are is impossible unless you can compare them to a geo-referenced trajectory of another vehicle whose exact dimensions you know. This is why we came up with an alternative metric: inter-annotator agreement. The idea is that we annotate the same data twice and look at how much the annotators agree with each other. Ideally they agree as much as possible, since they have the same information available to them.
We therefore ran an experiment with our trained annotators, using instructions similar to those we use in production, in which the same sequence of highway LiDAR data was annotated twice, each time by different annotators. This gives us two bounding boxes for each object on the road, from which we can calculate the inter-annotator agreement.
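As a sketch of what such a metric could look like (the function and the numbers below are hypothetical, not our production code), one can match each object between the two passes and compare the box dimensions pairwise:

```python
from statistics import mean

def dimension_agreement(pass_a: list[dict], pass_b: list[dict]) -> dict:
    """Mean absolute difference per dimension, for boxes already matched by index."""
    diffs = {dim: [] for dim in ("width", "height", "length")}
    for box_a, box_b in zip(pass_a, pass_b):
        for dim in diffs:
            diffs[dim].append(abs(box_a[dim] - box_b[dim]))
    return {dim: mean(values) for dim, values in diffs.items()}

# Two annotation passes over the same three vehicles (made-up numbers, in metres):
pass_a = [{"width": 1.85, "height": 1.50, "length": 4.60},
          {"width": 2.00, "height": 1.55, "length": 4.90},
          {"width": 2.50, "height": 3.20, "length": 12.10}]
pass_b = [{"width": 1.90, "height": 1.48, "length": 4.35},
          {"width": 1.98, "height": 1.60, "length": 5.10},
          {"width": 2.45, "height": 3.15, "length": 12.60}]

print(dimension_agreement(pass_a, pass_b))
# Roughly 4 cm disagreement on width and height, ~32 cm on length.
```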
What makes this problem difficult is that a LiDAR sensor returns sparse information about the world around it. For example, if you have a Luminar Hydra, the default horizontal resolution is 0.07 degrees. This means that 100 meters away from your sensor there are 12.2 centimeters between consecutive scanned points, which puts a limit on how precisely you can determine the width of a vehicle at a large distance. If the measured car is driving towards you, and you are both driving 30 m/s (108 km/h), the car has come 6 meters closer to you during the 100 ms it took to scan it, which puts a limit on how precisely you can determine the length of a vehicle coming towards you.
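Both bounds follow from a couple of lines of arithmetic (using a small-angle approximation for the point spacing):

```python
import math

# 1) Lateral spacing between neighbouring points for a 0.07 degree horizontal
#    resolution at 100 m range (small-angle approximation):
angular_resolution_rad = math.radians(0.07)
print(100.0 * angular_resolution_rad)  # ~0.122 m, i.e. 12.2 cm between points

# 2) How much an oncoming car closes in during a single 100 ms scan when both
#    vehicles drive 30 m/s (108 km/h):
closing_speed_mps = 30.0 + 30.0
print(closing_speed_mps * 0.100)       # 6.0 m
```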
In our experiment, we first looked at two parameters that are easy to estimate on the highway: the width and height of a vehicle. Because you see the cars from behind, it is relatively easy to estimate precisely how wide and high a vehicle is. There are some difficulties, especially further away (up to 150 meters ahead of the ego vehicle), but overall we expected annotators to agree on these parameters. We show the results in plot one and plot two, below. We see that the agreement between annotators is indeed high, and it is extremely rare for annotators to disagree by more than 30 centimeters on the width or height of a car.
We also looked at how much annotators agree on the length of a vehicle. Seeing the vehicles from behind makes it much harder to estimate this value accurately. We plotted the results in plot three, below. As you can see, annotators mostly agreed, but outliers with differences of up to 80 centimeters do occur. This gives us great insight into how much we can trust our annotators and how well they align on the dimensions of a vehicle.
Now, there are annotation tools which go one step further and try to approximate ground truth better than the traditional bounding boxes in the examples above. One such tool is the “nyc3dcars-labeler” developed by Kevin James Matzen, a web annotation tool built for the construction of the NYC3DCars dataset. In the tool, annotators see images which are aligned to the world around them, and place 3D CAD models of cars, projected onto the ground plane, to estimate the location of each object in the 2D image. Placing CAD models is a way to gain confidence in the information your annotations contain. However, it is also dangerous to annotate all your data with 3D CAD models: the format gives you confidence that the CAD model actually represents your ground truth, while that assumption could be completely wrong. A good example of this is the paper “3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model”. At first glance the results look fantastic and trustworthy, but on closer examination some of the predictions point in the wrong direction or predict the wrong type of car.
Some results from the paper “3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model”. In the image on the left, the 3D estimate of the van fits perfectly. At first glance the vehicles on the right fit pretty well too, but on a second look the white vehicle seems to be closer to the camera than the other two. In the image on the right, the car at the front-left fits well, while the car at the front-right is rotated 180 degrees. At first glance you don’t spot the 180-degree flip, which gives you a false sense of confidence in the predictions.
This brings us to the final question: “What type of annotation is Kognic pursuing?” Our annotation tool supports both annotating sensor data and annotating reality. We also train our annotators extensively in our own tooling and in the data they work with. As you have read in our previous blog articles, we can measure the quality of our data effectively, and we have interactive automation to create better and faster annotations. We focus on the automotive sector, where we push our product every day to improve both the annotation of sensor data and the accuracy of annotating towards reality. Whether you should pursue one type of annotation or the other depends entirely on your use case for the data.
Now that we have discussed the difference between sensor data and reality, there is one last thing to address: automating annotations. We believe that precisely because there is a difference between sensor data and reality, and because of the need to distinguish between them, one can never fully automate the creation of annotations if you want to keep quality high and the roads safe. The world is very complex, with strange new situations popping up on a daily basis. The only way we can instill trust in our system is by guaranteeing that every annotation has been reviewed and verified by a human trained in understanding sensor data. And we are able to deliver the quality of data automotive projects require by building the best tooling to make humans as effective as possible.
Hopefully you gained new insights from reading this blog post. Want to set up your own annotation project with us? Please get in touch!