
How neural networks power robots at Starship

Starship is building a fleet of robots to deliver packages locally on demand. To achieve this, the robots need to be safe, polite, and fast. But how can that be done with modest computing resources and without expensive sensors such as LIDAR? This is the technical reality you have to tackle, unless you live in a world where customers are happy to pay $100 for a delivery.

The robots begin by sensing the world with radars, a multitude of cameras, and ultrasonic sensors.

However, the challenge is that most of this sensory data is low-level and non-semantic. For example, a robot may sense that an object is ten meters away, but without knowing the object’s category it is difficult to make safe driving decisions.

Machine learning through neural networks is surprisingly useful in converting this unstructured low-level data into higher-level information.

To understand the surrounding environment in real time, a central component of the robot is the object detection module – a program that takes in images and returns a list of object bounding boxes.
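As a rough sketch, the interface of such a module might look like the following (the class and field names here are illustrative, not Starship’s actual code):

```python
from dataclasses import dataclass
from typing import List
import numpy as np

@dataclass
class BoundingBox:
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str         # e.g. "car", "pedestrian", "cyclist"
    confidence: float  # how sure the model is, in [0, 1]

class ObjectDetector:
    def detect(self, image: np.ndarray) -> List[BoundingBox]:
        """Take an H x W x 3 camera image, return the objects found in it."""
        raise NotImplementedError  # the hard part, discussed below
```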

That’s great, but how do you write such a program?

An image is a large three-dimensional array made up of a myriad of numbers representing pixel intensities. These values change considerably when the image is taken at night instead of during the day; when the color, scale, or position of the object changes; or when the object itself is clipped or occluded.

Left – what the human sees. Right – what the computer sees.
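A toy illustration of that point (made-up values, not Starship code): the same scene produces very different raw numbers depending on lighting.

```python
import numpy as np

# A 480 x 640 RGB image: three intensity values (0-255) for every pixel.
day_image = np.random.randint(100, 256, size=(480, 640, 3), dtype=np.uint8)
night_image = (day_image * 0.2).astype(np.uint8)  # same scene, darker pixels

print(day_image.shape)    # (480, 640, 3)
print(day_image[0, 0])    # intensities of the top-left pixel during the day
print(night_image[0, 0])  # the same pixel at night: very different numbers
```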

In the robot’s software we have a set of trainable units, mainly neural networks, where the code is effectively written by the model itself. The program is represented by a set of weights.

At the beginning, these numbers are initialized randomly and the program’s output is random as well. Engineers show the model examples of what they would like it to predict and ask the network to do better the next time it sees a similar input. By iteratively changing the weights, the optimization algorithm searches for programs that predict bounding boxes more and more accurately.

Evolution of the programs found by the optimization procedure.
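A minimal sketch of such an optimization loop, written here with PyTorch on a tiny made-up network and synthetic data (the real detector and training set are of course far more complex):

```python
import torch
import torch.nn as nn

model = nn.Sequential(                 # the weights start out random
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 4),                   # predict one box: x, y, width, height
)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

images = torch.rand(16, 3, 64, 64)     # stand-in for annotated camera images
target_boxes = torch.rand(16, 4)       # stand-in for human-labelled boxes

for step in range(100):
    predicted = model(images)
    loss = loss_fn(predicted, target_boxes)
    optimizer.zero_grad()
    loss.backward()                    # compute how to change each weight
    optimizer.step()                   # nudge weights toward better predictions
```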

However, one needs to think deeply about the examples used to train the model.

  • Should the model be penalized or rewarded when it detects a car in a window reflection?
  • What should it do when it detects an image of a human on a poster?
  • Should a car trailer full of cars be annotated as one entity, or should each of the cars be annotated separately?

These are all examples that occurred during the construction of the object detection module in our robots.

The neural network detects objects in the reflections and on the posters. A bug or a feature?

The team has to deliberately mine for concrete hard examples and rare cases, otherwise the model stops improving. Starship operates in several different countries, and varying weather conditions enrich the set of examples. Many people were surprised that Starship delivery robots kept running during the snowstorm “Emma” in the UK, while airports and schools remained closed.

Robots deliver packages in various weather conditions.
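One common way to mine such hard cases is to keep the frames where the current model is least confident and send them for annotation. The sketch below is illustrative only (reusing the hypothetical detect() interface from above), not Starship’s actual pipeline:

```python
def select_hard_examples(frames, detector, max_confidence=0.6, top_k=100):
    """Return the frames on which the detector is least confident."""
    scored = []
    for frame in frames:
        detections = detector.detect(frame)
        best = max((d.confidence for d in detections), default=0.0)
        if best < max_confidence:          # the model is unsure about this frame
            scored.append((best, frame))
    scored.sort(key=lambda pair: pair[0])  # least confident first
    return [frame for _, frame in scored[:top_k]]
```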

At the same time, annotating data takes time and resources. Ideally, it is better to train and improve models with less data. This is where architecture engineering comes in: we encode prior knowledge into the architecture and the optimization process to narrow the search space to programs that are more likely in the real world.

We integrate the prior knowledge into neural network architectures to obtain better models.

In some computer vision applications, such as pixel-wise segmentation, it is useful for the model to know whether the robot is on a sidewalk or at a railway crossing. To provide a hint, we encode global image-level clues into the neural network architecture; the model then decides whether to use them, without having to learn them from scratch.
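A hedged sketch of what that can look like: a pooled, image-level summary is broadcast back and concatenated to every pixel’s features, so the network can use the global clue if it helps. Layer sizes and names below are illustrative, not the architecture we actually ship.

```python
import torch
import torch.nn as nn

class GlobalContextSegmenter(nn.Module):
    def __init__(self, num_classes=3):
        super().__init__()
        self.local = nn.Conv2d(3, 16, kernel_size=3, padding=1)  # per-pixel features
        self.global_pool = nn.AdaptiveAvgPool2d(1)               # whole-image summary
        self.classify = nn.Conv2d(16 + 16, num_classes, kernel_size=1)

    def forward(self, image):
        local = torch.relu(self.local(image))        # B x 16 x H x W
        global_clue = self.global_pool(local)        # B x 16 x 1 x 1
        global_clue = global_clue.expand_as(local)   # broadcast to every pixel
        combined = torch.cat([local, global_clue], dim=1)
        return self.classify(combined)               # per-pixel class scores

segmenter = GlobalContextSegmenter()
scores = segmenter(torch.rand(1, 3, 64, 64))         # shape: 1 x 3 x 64 x 64
```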

Starship wants deliveries to be low-cost, which means our hardware has to be inexpensive. This is the same reason Starship does not use LIDAR (a detection system that works on the radar principle, but uses light from a laser), which would make understanding the world easier – but we don’t want our customers to pay more than necessary for delivery.

State-of-the-art object detection systems published in academic papers run at around 5 frames per second [MaskRCNN], and even real-time object detection papers do not report rates significantly above 100 FPS [Light-Head R-CNN, tiny-YOLO, tiny-DSOD]. Moreover, these figures are reported for a single image, whereas we need a 360-degree understanding of the surroundings (the equivalent of processing about 5 single images).

To put this in perspective, Starship’s models run at over 2,000 FPS when measured on a consumer GPU, processing a full 360-degree panoramic image in a single pass. This equates to 10,000 FPS when counted as 5 single images processed with batch size 1.
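The arithmetic behind that comparison is simple: a full panorama covers roughly what five single camera images would, so panoramas per second translate into single-image FPS by multiplying by five.

```python
panorama_fps = 2000          # full panoramas processed per second (figure above)
images_per_panorama = 5      # one 360-degree panorama ≈ five single camera views
single_image_fps = panorama_fps * images_per_panorama
print(single_image_fps)      # 10000
```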

Potential problems in the object detection module. How to solve these problems?

Fixing these bugs is a challenge.

Neural networks are seen as black boxes that are difficult to analyze and understand. However, to improve the model, engineers must understand the failure cases and delve into the specifics of what the model has learned.

The model is represented by a set of weights, and one can visualize what each specific neuron is trying to detect. For example, the first layers of the Starship network activate on simple patterns such as horizontal and vertical edges. The next block of layers detects more complex textures, while the top layers detect car parts and complete objects.

Visualizing what the neural network we use in the robots has learned helps us understand images better.
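A simple way to take such a peek (an illustrative sketch, not our actual tooling) is to push an image through the first convolutional layer and inspect where each filter responds:

```python
import torch
import torch.nn as nn

first_layer = nn.Conv2d(3, 8, kernel_size=3, padding=1)  # stand-in for the real first layer
image = torch.rand(1, 3, 64, 64)                          # a camera frame

with torch.no_grad():
    activations = torch.relu(first_layer(image))          # 1 x 8 x 64 x 64

# Each of the 8 feature maps shows where its filter "fires" on the image;
# in a trained network, early filters respond to edges and simple patterns.
for i in range(activations.shape[1]):
    fmap = activations[0, i]
    print(f"filter {i}: mean={fmap.mean().item():.3f}, max={fmap.max().item():.3f}")
```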

There are dozens of components that consume the output of the object detection model, each requiring a different precision and recall, tuned against the existing model. However, a new model may behave differently in various ways: for example, the output probability distribution could be biased towards higher values or be wider. Even if the average performance is better, it can be worse for a specific group, such as large cars. To avoid these pitfalls, the team calibrates the probabilities and checks for regressions on several stratified datasets.

Average performance doesn’t tell you the whole story of the model.
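A sketch of what checking for regressions on stratified data can look like; the metric (precision), the threshold, and the stratum names are illustrative choices, not Starship’s exact procedure:

```python
def precision(predictions, labels):
    """Fraction of positive predictions that were actually correct."""
    true_pos = sum(p and l for p, l in zip(predictions, labels))
    predicted_pos = sum(predictions)
    return true_pos / predicted_pos if predicted_pos else 0.0

def find_regressions(old_preds, new_preds, labels, strata, min_delta=-0.02):
    """Flag strata (e.g. 'large cars') where the new model's precision drops."""
    regressions = []
    for stratum in set(strata):
        idx = [i for i, s in enumerate(strata) if s == stratum]
        old_p = precision([old_preds[i] for i in idx], [labels[i] for i in idx])
        new_p = precision([new_preds[i] for i in idx], [labels[i] for i in idx])
        if new_p - old_p < min_delta:
            regressions.append((stratum, old_p, new_p))
    return regressions
```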

Monitoring trainable software components poses a different set of challenges than monitoring standard software. Inference time and memory usage are of little concern, as they are mostly constant.

Dataset shift, however, becomes the primary concern – the distribution of the data used to train the model differs from the distribution of the data the model encounters where it is currently deployed.

For example, electric scooters suddenly start riding on the sidewalks. If the model was not trained with this class in mind, it will have trouble classifying them correctly. The information coming from the object detection module will then disagree with other sensory information, leading to requests for assistance from human operators and therefore slower deliveries.

A major concern in hands-on machine learning – training and testing data comes from different distributions.
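One illustrative way to watch for such shift in production (an assumption about monitoring in general, not a description of Starship’s system) is to compare the distribution of predicted classes in deployment against the distribution seen during training; a large divergence is a signal to collect and annotate new data.

```python
import math
from collections import Counter

def class_distribution(predicted_labels):
    counts = Counter(predicted_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

def kl_divergence(deployed, training, eps=1e-6):
    """KL(deployed || training) over the union of observed classes."""
    labels = set(deployed) | set(training)
    return sum(
        deployed.get(l, eps) * math.log(deployed.get(l, eps) / training.get(l, eps))
        for l in labels
    )

train_dist = class_distribution(["car", "car", "pedestrian", "cyclist"])
live_dist = class_distribution(["car", "scooter", "scooter", "pedestrian"])
print(kl_divergence(live_dist, train_dist))  # a large value hints at shift
```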

Starship robots achieve all of this on inexpensive hardware, which poses plenty of engineering challenges but makes robot deliveries a reality today. Starship robots make real deliveries seven days a week in cities around the world, and it’s gratifying to see how our technology keeps bringing more convenience to people’s lives.

