In practice, using a DNN for image analysis breaks down into two areas: training and inference. For supervised learning, a network is trained with samples from an image sensor, as a stream of pixels, or from a Lidar or radar sensor, where the outputs are a point cloud. The key is that these inputs are tagged, highlighting any areas of interest. These areas of interest can be images of pedestrians, other vehicles, animals, road markings or even wind turbine blades.

This tagged data is run through the network during the training process, and the weights are adjusted until the network's output matches the tags. That creates a framework of weights that can be used to identify similar images when they are presented, and this is the process of inference. The greater the number of sample images provided, the higher the probability of accurate recognition. A minimal sketch of this train-then-infer flow appears at the end of this section.

These weights can be in a 32-bit floating-point format (FP32), 16-bit (FP16) or 8-bit (FP8). The process of reducing the precision of these weights so that they can be useful for an uncrewed system is called quantisation (also sketched below). This reduces the size of the framework, requiring less memory for inference and allowing faster operation. However, it can lead to less accurate results, which is critical for an application such as a driverless car, so the tools for working with the frameworks are key to the safety of the end system.

Another issue is that the tagging process is not very scalable, as people have to tag the structures in millions of images or point clouds. An alternative approach is to use 'synthetic data' generated by a simulation. Synthetic data tools run a simulation of vehicles in an environment to provide data that is already tagged for training the neural network. That avoids having to create images for every kind of environment, whether urban, rural or motorway driving for a driverless car, or the images a UAV would see from the air for navigation. The tools can also create specific scenarios that might not occur very often, for example one vehicle obscuring another. These rare 'edge cases' and 'corner cases' can be created on demand by the simulation tools (see the synthetic-data sketch below).

This training is increasingly performed by supercomputers using custom processor chips or off-the-shelf graphics processing units (GPUs), taking millions of images as their input to build a framework of weights. These make use of the multiply-accumulate (MAC) units in GPUs that were previously used for graphics applications; a single MAC is sketched below. There are also custom chips that increase the number of MACs to boost training performance, but this requires a different chip and interconnect architecture.

Other tools can then take the frameworks and optimise them for the inference engines. The engines have to be optimised for lower power consumption to run in vehicles, ships or even UAVs. While PCs can run the frameworks, GPUs are better suited to the calculations, and there are custom designs that are optimised for the workloads required by specific frameworks.

Another issue is unsupervised learning using the data acquired from the camera, Lidar and radar feeds in a vehicle. While this could be used to improve the quality of the framework by adding more data, it needs some kind of tagging. This could be performed locally, but that takes a lot of processing power. The data can also be sent to the cloud for processing in the supercomputer, but that can take a lot of bandwidth.
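The train-then-infer flow described above can be illustrated with a short, hypothetical PyTorch sketch. The tiny network, the random stand-in "tagged" frames and the class list are all assumptions for illustration, not any particular vendor's perception stack.

```python
# Minimal sketch of supervised training followed by inference (PyTorch).
# The tiny CNN and random stand-in "tagged" data are illustrative only.
import torch
import torch.nn as nn

CLASSES = ["pedestrian", "vehicle", "animal", "road_marking"]  # example tags

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, len(CLASSES)),
)
optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Training: adjust the weights until the output matches the tags.
images = torch.randn(8, 3, 64, 64)            # stand-in camera frames
tags = torch.randint(0, len(CLASSES), (8,))   # stand-in human-applied tags
for _ in range(10):
    optimiser.zero_grad()
    loss = loss_fn(model(images), tags)
    loss.backward()    # compute how each weight should change
    optimiser.step()   # nudge the weights towards the tagged answer

# Inference: the frozen framework of weights classifies a new frame.
model.eval()
with torch.no_grad():
    new_frame = torch.randn(1, 3, 64, 64)
    prediction = CLASSES[model(new_frame).argmax(dim=1).item()]
print("Detected:", prediction)
```

In a real pipeline the training loop runs in the data centre over millions of tagged samples, while only the final, frozen weights are deployed to the vehicle.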
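Quantisation itself is conceptually simple. The sketch below maps FP32 weights to 8-bit integers with a single per-tensor scale; the symmetric, per-tensor scheme is one common choice assumed here for illustration, and reduced-precision floating-point formats such as FP16 and FP8 follow the same idea.

```python
# Minimal sketch of post-training quantisation: FP32 weights -> int8.
# Symmetric, per-tensor scaling is an assumption for illustration.
import numpy as np

weights_fp32 = np.random.randn(256, 256).astype(np.float32)

scale = np.abs(weights_fp32).max() / 127.0   # map the largest weight to 127
weights_int8 = np.clip(np.round(weights_fp32 / scale), -127, 127).astype(np.int8)

# The 8-bit tensor needs a quarter of the memory of the FP32 original...
print("FP32 size:", weights_fp32.nbytes, "bytes")
print("INT8 size:", weights_int8.nbytes, "bytes")

# ...at the cost of a small reconstruction error, which is why the
# tooling must re-verify accuracy on safety-critical workloads.
error = np.abs(weights_fp32 - weights_int8.astype(np.float32) * scale).max()
print("Worst-case weight error:", error)
```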
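The appeal of synthetic data is that the tag comes for free: because the simulator places the object, it knows the exact label without any human effort. A real tool renders full photorealistic scenes; the flat background and single rectangle "vehicle" below are simplifying assumptions to show the principle.

```python
# Minimal sketch of synthetic, pre-tagged training data: the simulator
# places the object, so the bounding-box tag is known exactly.
import random
import numpy as np

def synthetic_frame(size=128, obj=24):
    frame = np.zeros((size, size, 3), dtype=np.uint8)
    frame[:] = (90, 90, 90)                       # plain road-grey background
    x = random.randint(0, size - obj)
    y = random.randint(0, size - obj)
    frame[y:y + obj, x:x + obj] = (200, 30, 30)   # stand-in vehicle
    tag = {"class": "vehicle", "bbox": (x, y, obj, obj)}  # free, exact tag
    return frame, tag

frame, tag = synthetic_frame()
print(tag)
```

Rare edge cases can be forced at will with the same mechanism, for example by placing two overlapping objects so that one vehicle obscures another.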
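The MAC operation that GPUs and custom training chips perform in bulk is shown below in its simplest form; real hardware fuses thousands of these into wide matrix units, but every neuron output is ultimately a long chain of multiply-accumulates.

```python
# A single neuron's core computation: repeated multiply-accumulate (MAC).
# Illustrative values; hardware runs these in massively parallel arrays.
weights = [0.2, -0.5, 0.8]
inputs = [1.0, 0.3, -0.7]

acc = 0.0
for w, x in zip(weights, inputs):
    acc += w * x   # one MAC operation
print("Neuron pre-activation:", acc)
```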
To avoid this problem, the data can be stored on the vehicle and downloaded at another time, although that is not a scalable solution and is only used during development. There are tools that select suitable frames from the feed and … (one plausible selection strategy is sketched below).

[Image: This custom processor has been developed for AI framework training in a supercomputer (Courtesy of Tesla)]
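The article does not specify how such tools choose frames, so the sketch below is an assumption: one common strategy is to keep only the frames the on-vehicle model is least confident about, so that limited storage and bandwidth are spent on data most likely to improve the framework. The confidence threshold and softmax scoring are illustrative choices, not a description of any commercial tool.

```python
# Hedged sketch of one plausible frame-selection strategy: keep frames
# where the on-vehicle model's top-class confidence is low, as these are
# the most valuable candidates for tagging and retraining.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 4))  # stand-in classifier
frames = [torch.randn(3, 64, 64) for _ in range(20)]            # stand-in camera feed

def select_frames(model, frames, threshold=0.6):
    """Return frames whose top-class confidence falls below the threshold."""
    keep = []
    model.eval()
    with torch.no_grad():
        for frame in frames:
            probs = torch.softmax(model(frame.unsqueeze(0)), dim=1)
            if probs.max().item() < threshold:
                keep.append(frame)   # uncertain: worth storing for tagging
    return keep

selected = select_frames(model, frames)
print(f"Kept {len(selected)} of {len(frames)} frames for tagging")
```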
