The buzz of the self-driving world this week was a video by Mark Rober, one of YouTube’s biggest stars, comparing Tesla’s camera-based Autopilot to a LIDAR-equipped car on a set of obstacles, including, for amusement, a foam wall printed with a photograph of an empty road ahead. In Rober’s video, Autopilot failed to detect the wall, while the LIDAR car, as expected, detected it easily.
There were a number of flaws in Rober’s tests, but the core issue of just how these different ways of sensing work is an important one. Another YouTuber, a Tesla fan named Kyle Paul, made a response video testing the two main versions of Tesla “Full Self Driving,” the system that Rober should have tested. Nobody cares about Autopilot or expects it to handle such tests well, but FSD has to face a vastly higher bar. In Paul’s test, FSD version 12 on Tesla’s 3rd-generation hardware also failed to detect his wall, but a Cybertruck, with FSD version 13 on Tesla’s 4th-generation hardware, did stop for it.
Paul tested the right Tesla systems, which is important, but his test also has some flaws. With a much lower budget, his wall isn’t nearly as good as Rober’s. It is noticeably lighter than the real road, sky and terrain, and this changes with the light between his two tests. The wall has several defects where gaps between the photos are obvious. Unfortunately, one thing that computer vision systems, even machine-learning ones, focus on is “edges,” and Paul’s wall is full of clear edges, making it much more likely that CV will detect it. Rober’s wall has its faults as well: it also differs in color, and its outer edges are edges, but it does not have gaps between its strips.
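To illustrate why those seams matter, here is a minimal sketch of classical edge detection, the kind of low-level cue a vision pipeline can pick up on. It assumes OpenCV and a hypothetical photo of the test scene; modern neural networks learn their own features, but long straight seams on a printed wall are exactly the sort of thing that stands out.

```python
# Minimal sketch: classical edge detection on a photo of the scene.
# Assumes OpenCV (cv2) and a hypothetical image file "wall_test.jpg".
import cv2

img = cv2.imread("wall_test.jpg", cv2.IMREAD_GRAYSCALE)

# Blur slightly so sensor noise doesn't register as edges.
blurred = cv2.GaussianBlur(img, (5, 5), 0)

# Canny finds strong intensity gradients -- the seams and brightness
# mismatches on a printed wall show up as long straight edge lines
# that a real road scene would not contain.
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)

cv2.imwrite("wall_edges.png", edges)
```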
This suggests that detection of this wall is a borderline case, as the old software and hardware failed and the new generation worked properly. It tells us that different walls might be detected by both, or by neither.
Rober did not respond to requests about what version of Tesla Autopilot he tested. Autopilot is a much simpler system than FSD, though it uses the same hardware. (The latest versions of Autopilot are more similar, being derived from the FSD software; the older ones are a completely different system.) But Autopilot is at its core a driver-assist system, essentially a fancy adaptive cruise control. There are lots of things cruise controls don’t see, and they depend on the driver to handle them. Teslas have infamously run right into the broadsides of trucks on Autopilot, but it has regularly been ruled that this is not any sort of defect in Autopilot, which is not designed or expected to be perfect at that task. FSD is another story. Today it’s also a driver-assist tool, not expected to be reliable, but Tesla has promised that in June they will deploy it with nobody in the car, and then it will need to be very reliable (though not perfect) and not miss key obstacles.
FSD includes a “virtual lidar” that attempts to calculate distance from the camera image; Tesla calls it an “occupancy network.” Such tools can work, though they don’t match the very high reliability of LIDAR at that job. Humans also discern distance from our 2-D “cameras,” but we use the skill of the human mind to do it. Indeed, a photographic wall is aimed at exactly the things such a tool will get wrong. But it also has a good chance of detecting flaws in the wall, like edges. A perfect video wall would probably fool this network, and could also fool a human, but would not fool LIDAR or radar.
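For a rough sense of what a camera-only depth system does, here is a minimal sketch of monocular depth estimation using the publicly available MiDaS model (it assumes PyTorch, OpenCV and the timm package, plus a hypothetical input image). Tesla’s occupancy network is a different, proprietary system, but the underlying idea, predicting a depth value for every pixel from a single image, is the same.

```python
# Minimal sketch of monocular depth estimation, the same idea behind a
# camera-only "virtual lidar": predict depth for every pixel from one image.
# Assumes PyTorch, OpenCV, the timm package, and the public MiDaS model;
# the image file name is hypothetical.
import torch
import cv2

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()

transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

img = cv2.cvtColor(cv2.imread("road_ahead.jpg"), cv2.COLOR_BGR2RGB)
batch = transform(img)

with torch.no_grad():
    # Output is relative inverse depth per pixel. A printed wall that
    # looks like an empty road can fool this kind of estimate, where
    # LIDAR or radar would measure the true distance directly.
    depth = midas(batch)

print(depth.shape)
```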
Another potential issue is that Paul held up his wall by parking a truck behind it. He made certain he would not crash through that wall, of course. Tesla’s HW4 has a version which includes a new-generation radar, known as the Phoenix radar. It is present in the new Model X and S, but not in the Model 3 and Y. There are conflicting reports about whether it is present in the Cybertruck, so that is certainly not confirmed, but if it were, a radar would of course strongly detect the truck and respond to it. Almost all self-driving cars use not just cameras, but also radar and LIDAR. Teslas had an older automotive radar, but it was removed from new cars and its use disabled in earlier ones.
That old automotive radar might not see the truck, big and bright as it is. That’s because these radars are very low resolution, so they aren’t very useful on stopped objects, though they have some ability with them. The Phoenix radar is an imaging radar which would detect it clearly. Tesla should include it in their FSD cars, not to detect walls painted by coyotes, but for more reliable detection of other obstacles and their speed.
The Cybertruck is also alone among Teslas in having an extra forward camera on the front. It is not known how this camera is used by FSD 13. It may only be used by the forward collision warning and avoidance system (and for off-roading). If it is used by FSD, its “stereo” viewpoint would assist in detecting something like this wall (though again, FSD should see the wall easily with just one camera). However, at present there is no confirmation that FSD uses that camera.
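If the second camera is used, the benefit is easy to sketch: with two views, the disparity between them gives distance directly, and a printed wall sitting at one fixed distance looks nothing like a road receding to the horizon. This is a minimal OpenCV sketch assuming two hypothetical rectified frames, not a description of Tesla’s actual pipeline.

```python
# Minimal sketch of stereo depth from two forward cameras.
# Assumes OpenCV and two hypothetical, already-rectified grayscale frames.
import cv2

left = cv2.imread("front_left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("front_right.png", cv2.IMREAD_GRAYSCALE)

# Block-matching stereo: disparity is inversely proportional to distance.
# A flat printed wall produces a near-constant disparity patch, while a
# real road shows a smooth gradient toward the horizon.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right)

print(disparity.min(), disparity.max())
```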
This wall remains a silly and contrived test. Rober did it for the fun of recreating the Road Runner cartoon scene, not because it’s an important thing for self-driving cars to detect. There is a field of research where people experiment with deliberate tricks to fool cars, and some of the research in that area points to real dangers, but most of the issues are more academic than real, and this is one of them. There have been real-world problems of this class, such as trucks and billboards with photographs painted on them that confused some camera-based systems; most teams are aware of this and work to make sure it doesn’t cause problems. There are also adversarial attacks on radar and LIDAR which have been experimented with and published.
The debate about what sensors to use is a real one. Many, including myself, have been critical of Tesla for using only cameras. I first published an article on these issues in 2012, and in spite of massive improvements in machine learning that have aided the camera systems, most of the issues remain today. Tesla wants to work with the hardware they already ship, which is low cost. Most other teams feel they should use all available tools, including the superhuman abilities of LIDAR, radar and even thermal cameras, to make the system as safe as they can, and then rely on the extremely likely vast reduction in the cost of electronics at scale to make the safe solution cheaper. Tesla feels that any really working solution needs so much AI intelligence that it will be able to do the job with the cheapest sensors, and that extra sensors will just make the job harder. Nobody is actually certain which of these is the answer, though the multi-sensor choice is the overwhelming favorite in the industry today, and the approach used by all teams that have actually made a working robocar, including Waymo, which has been on the road with no safety driver for over six years.
In the end, it doesn’t matter much whether Tesla sees a wall like this. What matters is what it does with real-world objects and situations. For this, we want to know how often the latest Tesla FSD needs a human to take over to prevent a safety event, as well as to keep it from being a bad road citizen (blocking traffic, etc.). Tesla should know these statistics very well. It should be looking at all interventions, re-running them in simulation and determining what would have happened if the human driver had not intervened. Tesla has refused to provide this data, which is frankly surprising. If they had good numbers on this, we might expect them to shout it from the rooftops, particularly as they need to convince the public it’s safe to deploy FSD unsupervised in a vehicle with nobody on board, as Tesla has said it will do in June.
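The statistic in question is simple to state: something like miles per critical intervention, where “critical” means the simulation re-run shows the car would have caused a safety event had the human not stepped in. This tiny sketch uses entirely made-up numbers and a hypothetical log format, just to show the shape of the number Tesla could publish.

```python
# Minimal sketch of a "miles per critical intervention" statistic.
# The log format and all numbers here are hypothetical, for illustration only.
interventions = [
    # (miles driven since the previous intervention, was it critical per sim re-run)
    (412.0, False),
    (1893.5, True),
    (77.2, False),
    (2540.8, True),
]

total_miles = sum(miles for miles, _ in interventions)
critical = sum(1 for _, is_critical in interventions if is_critical)

print(f"Miles per critical intervention: {total_miles / critical:,.0f}")
```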
Most other robotaxi projects have situations they can’t handle well, such as snow or the extremely heavy rain seen in the Rober video. At present, they simply do not operate in those situations, and they work on expanding what they can do so they can operate in more places and at more times. The results of these tests are neither heartening nor scary, for only statistics over a very wide range of situations actually matter.