Obstacles you can encounter when working with your model - and what to do about them
Safety has been a cornerstone of safety-critical industries for decades. And the automotive industry is no exception! For increasingly automated vehicles to become accepted by society, they need to be perceived as safe, reliable and trustworthy by the public. We’re also talking here about certifiability, traceability and transparency of algorithms.
You, who work in the field, know what a chimera this is. As of today, technical teams meet many obstacles along the machine learning lifecycle on their way to developing safe perception systems. It is while resolving those daily challenges that one can learn valuable lessons, though. But is it possible to make this journey a little simpler and more seamless? Bearing in mind that real magic doesn’t exist (though many magicians will disagree!), the best thing you can do is to accept reality as it is and work on what is under your control. And that’s where we come in. We might not be wizards (yet), but we can share our knowledge, and there are actually a few things you can keep an eye on so that the development process of your safe perception system goes as smoothly as possible.
Don’t see issues! See opportunities to improve your way of working
In the following paragraphs, we will go through some of the pitfalls you can encounter in the different steps of the machine learning lifecycle, a key component of automated and autonomous driving. Developing a machine learning model ready to be used for perception in a vehicle is an iterative process with a few distinct stages. It is iterative because, well, it is by nature impossible to tell in advance how a particular model will perform, so you will most likely find yourself going back and forth through the different phases before you reach something that is ready. These phases are: requirements (in particular, guidelines); data management; model development; model testing and verification; and model deployment.
Up next, you will find advice on some of the most common problems in each of those phases that can lead to errors in the model and make a system less safe, together with some recommendations on how your team can minimize these unwanted outcomes. We present the stages as if they formed a linear process but, as we said, there is little chance that you will complete a successful machine learning project with a “waterfall process” where you do each phase only once. Instead, you will have to go back and forth and iterate until you’ve understood your problem and found a good solution.
Requirements, and in particular, annotation guidelines
A house is built on solid foundations. Automated driving, too. These foundations include specifying the so-called operational design domain (ODD), meaning laying out under what circumstances the machine learning model is expected to work. They also include specifying how to measure how well the model works: you will most likely have to decide later which of two alternative models to choose, and to make that choice you will need a way of putting numbers (that you can compare) on their performance. Depending on the context, these numbers can be called metrics, key performance indicators (KPIs) or safety performance indicators (SPIs).
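To make this concrete, here is a minimal sketch of how two candidate models could be put side by side using simple detection KPIs. The helper function and the counts are entirely made up; in practice they come from your own evaluation pipeline.

```python
# Minimal, made-up sketch: putting comparable numbers on two candidate models.

def precision_recall(tp: int, fp: int, fn: int) -> tuple[float, float]:
    """Two basic KPIs that let you compare models evaluated on the same data."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Hypothetical detection counts for two models on the same validation set
model_a = precision_recall(tp=870, fp=40, fn=130)
model_b = precision_recall(tp=910, fp=95, fn=90)
print(f"Model A: precision={model_a[0]:.2f}, recall={model_a[1]:.2f}")
print(f"Model B: precision={model_b[0]:.2f}, recall={model_b[1]:.2f}")
```

Which model is “best” then depends on which KPI matters most for your application; that is exactly why the requirements phase should pin this down.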
Most important for automated and autonomous driving, however, are perhaps the annotation guidelines. Machine learning is “programming by example”, and it is through those examples that the machine learning model learns how to interpret the world. That interpretation is encoded in the annotations of your data. It is through the guidelines that you specify whether your perception system should recognize cats, or whether it should make a distinction between fire trucks and tow trucks. That’s why writing adequate guidelines is so important.
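As a tangible illustration, here is a small, made-up sketch of what an excerpt of such a guideline could look like once captured in machine-readable form. Every class name, threshold and rule below is purely illustrative.

```python
# Made-up, machine-readable excerpt of an annotation guideline.
GUIDELINE_EXCERPT = {
    "vehicle": {
        "subclasses": ["car", "fire_truck", "tow_truck"],  # distinguish, don't merge
        "min_pixel_height": 20,            # smaller objects are marked "don't care"
        "occlusion_levels": ["none", "partial", "heavy"],
    },
    "animal": {
        "subclasses": ["cat", "dog", "other"],
        "annotate_only_within_drivable_area": True,  # an explicit scope rule
    },
}
```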
Failing in this phase could, for example, lead to:
- Problems with dataset distribution, due to an incomplete specification of the intended ODD.
- Later not picking the best model because of poorly chosen metrics/KPIs.
- Systematic detection failures because of repeated mistakes in the annotations.
- Unreliable uncertainty estimates because of ambiguous annotations.
How to avoid this? Again, it all starts with spending some effort on thinking through the requirements and writing adequate guidelines for your use case. We have written some articles about that, so we recommend reading them if you haven’t already.
Data management and data annotation phase
This phase is about going from specifications to a ready-to-use dataset. It’s about collecting and selecting data, as well as annotating it. Or, alternatively, simulating it. Or both. In other words, this phase is largely about making sure your requirements and guidelines materialize into actual data that you can use for training later on. Of course, it is important to make sure the data represents the ODD and that its annotations follow the guidelines. But there is also room for additional, more advanced concepts, such as using active learning to annotate only the most informative data (so as not to waste annotation resources).
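To give a flavor of the active learning idea, here is a minimal sketch of uncertainty sampling. It assumes a hypothetical model that outputs class probabilities for a pool of unlabeled frames; the numbers are invented.

```python
# Minimal sketch of active learning via uncertainty sampling.
import numpy as np

def select_for_annotation(probabilities: np.ndarray, budget: int) -> np.ndarray:
    """Pick the frames the model is least sure about (lowest max class probability)."""
    confidence = probabilities.max(axis=1)   # per-frame confidence
    return np.argsort(confidence)[:budget]   # least confident first

# Example: 5 unlabeled frames, 3 classes, budget for annotating 2 frames
pool_probs = np.array([
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],   # uncertain -> good annotation candidate
    [0.80, 0.15, 0.05],
    [0.34, 0.33, 0.33],   # very uncertain
    [0.90, 0.05, 0.05],
])
print(select_for_annotation(pool_probs, budget=2))  # -> indices 3 and 1
```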
Failing in this phase can lead to:
- Again, problems with dataset distribution. It is not enough to specify your ODD properly; the data also needs to follow the ODD specification.
- Again, downstream problems with detection or uncertainty estimates because of poor annotations that did not follow the guideline.
- Not having enough training data. There are many aspects to what makes good training data, and the sheer amount of data alone is not sufficient. But having too little data will never work either. The amount does matter as well.
The antidote is to closely monitor your data coverage and annotation quality, and also to make sure you have enough data. Though this is easier said than done. 😅
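One simple way to start monitoring coverage is to tag every frame with its ODD conditions and compare the actual shares against your targets. The tags and target shares below are made up for illustration.

```python
# Minimal sketch of monitoring dataset coverage against ODD targets.
from collections import Counter

ODD_TARGETS = {"daytime": 0.6, "night": 0.2, "rain": 0.1, "snow": 0.1}  # illustrative

def coverage_report(frame_tags: list[str]) -> dict[str, float]:
    """Difference between each condition's share in the data and its target."""
    counts = Counter(frame_tags)
    total = len(frame_tags)
    return {tag: counts.get(tag, 0) / total - target
            for tag, target in ODD_TARGETS.items()}  # negative -> under-covered

tags = ["daytime"] * 700 + ["night"] * 250 + ["rain"] * 50   # no snow collected yet
print(coverage_report(tags))   # snow comes out at -0.10: go collect more snow data
```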
Model development phase
This phase is at the heart of machine learning. It’s about arriving at the best model, and well, you probably already know this game because this is what each and every course and textbook on machine learning is about. We don’t have much to add. Set aside validation data, start simple, iterate systematically and compare against baselines, and off you go 🚀
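If you want a concrete picture of that habit, here is a minimal sketch using scikit-learn on stand-in data. The point is only the pattern of a held-out validation split plus a trivial baseline, not the specific models or library.

```python
# Minimal sketch: hold out validation data and compare against a trivial baseline.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, random_state=0)   # stand-in data
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
candidate = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print("baseline :", accuracy_score(y_val, baseline.predict(X_val)))
print("candidate:", accuracy_score(y_val, candidate.predict(X_val)))
```

If your candidate barely beats the dumb baseline, that tells you more than any amount of hyperparameter tuning will.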
Model testing and verification
Does your model live up to all the high expectations? How will it react to fresh input data? And, honestly, how much did you overfit to your validation data 😉? It’s time to find out! Even if you did find the best model you could in the previous phase, you might have acquired a bit of tunnel vision by focusing solely on improving your KPIs. Now is the time to ask: even if these KPIs were the best I could obtain with the given data, are they really good enough for my application? It also makes a lot of sense to take a broader look at the model’s performance; there are probably more aspects you should investigate than what you managed to describe with your KPIs. A common sense test of the predictions is one example.
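What could such a common sense test look like? Here is a minimal sketch with a hypothetical detect() function and hand-picked frames; the checks themselves are illustrative and complement, rather than replace, your KPIs.

```python
# Minimal sketch of "common sense" checks on top of the usual KPIs.
def common_sense_checks(detect, frames):
    failures = []
    for frame in frames:
        detections = detect(frame["image"])
        # Check 1: a frame known to contain pedestrians must yield at least one
        if frame["contains_pedestrians"] and not any(
                d["class"] == "pedestrian" for d in detections):
            failures.append((frame["id"], "missed all pedestrians"))
        # Check 2: no detection should have a physically implausible size
        failures += [(frame["id"], "implausible box")
                     for d in detections if d.get("height_m", 0) > 10]
    return failures

# Tiny stub usage: a fake detector that misses everything
frames = [{"id": 1, "image": None, "contains_pedestrians": True}]
print(common_sense_checks(lambda img: [], frames))  # -> [(1, 'missed all pedestrians')]
```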
If everything goes according to plan, the outcome of this phase is a verified model, one that can later be deemed fit for the intended application based on its predictions and that will help you argue your safety case. Chances are, however, that things do not go fully according to plan. Maybe you find out that you need more data, data with a better fit to the ODD, better annotations, or all three. This can also be the moment you discover unintended consequences of the annotation guidelines. Most probably, you will have to iterate back to an earlier phase.
Failing in this step might give you unpleasant surprises later… So take it seriously and play it safe!
Model deployment phase
If the previous phases played out well, it’s MLOps time with integration and deployment. And monitoring. And, probably, also seizing the opportunity to collect more data and later re-training the model with it. Can something go wrong in this phase? Well:
- Compatibility issues across platforms, both in terms of compute platforms (e.g. different hardware architectures or access to different standard libraries) and sensors (e.g. different camera resolution).
- Real-time performance issues. In the previous phases, you had (almost) all the time in the world to run your model and detect obstacles. In a vehicle traveling at 120 km/h, you may have less than a second before you crash into the obstacle.
- Skewed data distribution. If 2% of your data contained unpaved roads and your model made relatively more mistakes there, you might still have found the overall performance satisfying. But what if you end up driving 80% of the time on unpaved roads?
- Undetected so-called ODD-exits. Machine learning models themselves are notoriously bad at telling if what they see is something they haven’t seen before.
- Regressions after re-training, where the newly added data inadvertently hurts some aspect of performance.
Is there something teams can do to avoid these issues? Ensuring the trained model is deployed on an adequate platform; having mechanisms for knowing when you are operating inside the ODD; and always evaluating performance after re-training, before deploying, are, of course, some ways to start!
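For the last point, a simple regression gate goes a long way. Here is a minimal sketch; the per-slice KPIs are assumed to come from your own evaluation pipeline, and the numbers are made up.

```python
# Minimal sketch of a regression gate before deploying a re-trained model.
def safe_to_deploy(old_kpis: dict, new_kpis: dict, tolerance: float = 0.01) -> bool:
    """Block deployment if any slice got noticeably worse after re-training."""
    regressions = {}
    for slice_name, old_value in old_kpis.items():
        new_value = new_kpis.get(slice_name, 0.0)
        if new_value < old_value - tolerance:
            regressions[slice_name] = (old_value, new_value)
    if regressions:
        print("Regressions found:", regressions)
    return not regressions

old = {"paved_road": 0.93, "unpaved_road": 0.81, "night": 0.88}
new = {"paved_road": 0.94, "unpaved_road": 0.76, "night": 0.89}  # unpaved got worse
print(safe_to_deploy(old, new))  # -> False: investigate before shipping
```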
Minimizing all risks and unexpected outcomes makes all effort worthwhile
There’s no doubt that developing machine learning models for safe automated driving is an arduous task that requires time, resources, skilled teams and the right tooling.
As if that were not enough, teams can encounter many pitfalls along the way, which makes continuous iteration a natural part of the whole process. The challenge isn't just building an ML model for your safe perception system, but rather building an integrated ML system and having it operate in production continuously without errors. Add to that the iterative nature of developing perception, where development is rarely linear and you often need to go back to an earlier stage and then jump ahead again, and you’ll realize the true scale of the challenge.
Having said that, all these efforts must be made in order to avoid hazards and the risks they imply, for reliability is the only way increasingly automated driving will gain universal acceptance. We all have a responsibility to test, identify issues, solve them, and iterate until we can argue the safety of the perception system and, ultimately, launch a safe product.
As a last comment, if you want to keep reading about this topic, we are afraid there is not much written on the machine learning lifecycle for autonomous and automated driving specifically, but here are some of our favorite resources on these aspects of machine learning in general:
https://developers.google.com/machine-learning/guides/rules-of-ml
http://www.mlebook.com
https://www.mlyearning.org/
https://arxiv.org/abs/2108.02497