
…identified to help the central processor decide on the best – and safest – course of action. The perception has been that this requires ever-larger neural networks to handle more complex scenes and more sources of data. This can be accomplished in different ways, for example by overlaying radar, Lidar and thermal image data onto a visible light image to provide more layers of information, or by identifying ever-smaller objects that might be further away.

There is a growing acknowledgement, though, that there is no single, large 'monolithic' neural network workload. Instead there is a series of neural network workloads, some executing the same task in parallel and some pipelined. AI systems use multiple neural networks in various ways, breaking the task down into a series of modules.

As raw data such as the pixels received by a camera sensor or the points generated by a 3D radar is processed through the AI system, some work is often done for each sensor before its output is combined with data from other sensors. This initial workload, sometimes known as pre-processing, often dominates the total processing requirement, and that is driving the development of more parallel AI hardware.

This highlights the way neural networks are being used in other areas of unmanned systems, from signal conditioning of sensor data to the control of swarms of unmanned aircraft and space systems. These uses are driving different technology implementations, with smaller neural network processors that can be combined either in a large central chip or distributed around the unmanned system.

Signal conditioning networks run on smaller processors with fewer resources, although they can be combined in a single chip with multiple parallel channels. Developers are also adding the ability to update the framework and learn from ongoing operations, and to use machine learning to monitor the system for security breaches.

Neural network implementation

Most neural networks for machine learning are implemented using multiply-accumulate (MAC) units, which are the basic building blocks of digital signal processors (DSPs). This differs from traditional CPU designs, and uses an approach called SIMD, in which a Single Instruction operates on Multiple Data. This is also referred to as vector processing, where an instruction operates on a vector that can be 8, 16, 32 or even 256 bits long. The vector carries the data for the neural network calculation, often by multiplying two vectors together.

Some implementations of a configurable DSP allow the length of the vector to be arbitrary, and therefore optimised for a particular application such as a particular size of image. However, such a design may not be suitable for other applications such as data conditioning, so system designers are increasingly using dedicated machine learning accelerators that are optimised for different applications.

For example, chipsets for small UAVs (under 2 kg) are increasingly integrating a dedicated neural network capability. The accelerator designed into the latest UAV chips provides 8 tera operations per second (TOPS) for inferencing, which reduces the processing time for the sensor data. The accelerator uses additional instructions in the CPU, called vector extensions, to support SIMD processing of data up to three times faster than a mainstream CPU while consuming one-tenth of the power, as the vector calculations are performed more efficiently.
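To make the MAC and SIMD ideas concrete, the sketch below implements one dense neural network layer twice: first as an explicit multiply-accumulate loop, the way a scalar CPU without vector extensions would execute it, and then as a single vectorised operation standing in for what a SIMD engine or MAC array does in hardware. This is a minimal illustration only; the layer sizes, NumPy usage and function names are assumptions for the example, not details of any chipset mentioned above.

```python
# Minimal sketch: a dense layer is nothing but rows of multiply-accumulate
# (MAC) operations. The vectorised version stands in for what SIMD/MAC-array
# hardware does in a single instruction. Illustrative only.
import numpy as np

def dense_layer_scalar(weights, inputs, bias):
    """One MAC at a time, as a plain scalar CPU would do it."""
    out = np.zeros(weights.shape[0])
    for i in range(weights.shape[0]):        # one output neuron per row
        acc = bias[i]
        for j in range(weights.shape[1]):    # the multiply-accumulate loop
            acc += weights[i, j] * inputs[j]
        out[i] = acc
    return out

def dense_layer_vector(weights, inputs, bias):
    """The same layer as one vector operation: the SIMD/MAC-array view."""
    return weights @ inputs + bias

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 128))   # 64 outputs x 128 inputs = 8,192 MACs
x = rng.standard_normal(128)
b = rng.standard_normal(64)
assert np.allclose(dense_layer_scalar(w, x, b), dense_layer_vector(w, x, b))
```

On accelerator silicon the inner loop disappears entirely: the 8,192 MACs in this example would be issued as a few hundred wide vector instructions, which is the kind of efficiency the throughput and power figures quoted above refer to.

Another approach has been to offer different implementations depending on the machine learning requirement.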
These can range from a simple DSP SIMD engine with vector extensions up to multiple cores of MAC arrays, each with their own memory, vector processors and discrete memory engines that move data between the cores without having to spend instructions just on moving data around. Dedicated security processors can also be added to monitor the other cores and ensure there is no unexpected access.

This multicore approach requires a more sophisticated compiler. Different AI workloads may need different multiplier instructions for the different layers in the neural network framework, so the neural network compiler analyses the framework to work out what is needed in terms of cores and instruction extensions. Such a system can handle object detection with 512 MAC blocks and 4 TOPS of performance.
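As a rough sanity check on those numbers, a MAC is conventionally counted as two operations (one multiply plus one accumulate). The short sketch below works out what clock rate 512 MAC blocks would need to deliver 4 TOPS under that convention; the two-ops-per-MAC counting and the candidate packing factors are assumptions for illustration, not vendor specifications.

```python
# Back-of-the-envelope check on the "512 MAC blocks, 4 TOPS" figure above.
# Assumes the common convention of 2 operations per MAC (multiply + add).
OPS_PER_MAC = 2
mac_blocks = 512
target_tops = 4.0

implied_clock_ghz = target_tops * 1e12 / (mac_blocks * OPS_PER_MAC) / 1e9
print(f"Implied clock at 1 MAC/block/cycle: {implied_clock_ghz:.1f} GHz")

# ~3.9 GHz is unrealistic for a low-power embedded accelerator, which
# suggests each block retires several packed low-precision (e.g. INT8)
# MACs per cycle, bringing the clock down to a plausible range:
for macs_per_cycle in (2, 4, 8):
    clock = implied_clock_ghz / macs_per_cycle
    print(f"  {macs_per_cycle} packed MACs/block/cycle -> {clock:.2f} GHz")
```

Running the sketch shows that at eight packed MACs per block per cycle the required clock falls below 0.5 GHz, which is consistent with how low-precision embedded accelerators typically reach multi-TOPS figures.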
Having the computing resources optimised for …

A centralised AI engine for driverless cars (Courtesy of AImotive)