Mario Miscuglio is an Assistant Professor in the Department of Electrical and Computer Engineering at George Washington University. Mario is a subgroup leader of the OPEN Lab neuromorphic computing team led by Prof. Dr. Volker J. Sorger. Mario earned his MS in Electrical and Computer Engineering from the Polytechnic University of Turin while working as a researcher at Harvard / MIT. He defended his doctoral dissertation in optoelectronics at the University of Genoa and the Italian Institute of Technology, while working as a research assistant at the Molecular Foundry at Lawrence Berkeley National Laboratory. His interests span science and engineering, including nano-optics and light-matter interactions, metasurfaces, Fourier optics, and photonic neuromorphic computing.
The authors suggest that, with this approach, optical data streams can be processed 2-3 orders of magnitude faster than on a GPU. They also believe that photonic processors are exceptionally well suited to edge devices in 5G networks.
Volker J. Sorger leads the OPEN Lab at George Washington University, whose neuromorphic computing work targets photonic processors with peta-MAC-per-second (PMAC/s) throughput. His recognitions include the Presidential Early Career Award for Scientists and Engineers (PECASE) and AFOSR awards, and he is a member of IEEE, OSA, and SPIE.
In the approach under study, the photonic tensor core performs matrix multiplications in parallel, thereby improving the speed and efficiency of deep learning. Neural networks learn to make decisions without supervision and to classify data they have not seen before. Once a neural network has been trained on data, it can perform inference to recognize and classify objects and patterns and to find signatures within the data.
The photonic TPU stores and processes data in parallel, using an electro-optical interconnect that allows the optical memory to be read and written efficiently and lets the photonic TPU interface with other architectures.
"We found that photonic platforms with built-in optical memory can perform the same operations as tensor processors, while consuming less energy and delivering much higher throughput. They can be used to perform calculations at the speed of light," said Mario Miscuglio, one of the developers.
Most neural networks comprise multiple layers of interconnected neurons, mimicking the way the human brain works. An efficient way to represent these networks is as a composite function that multiplies matrices and vectors together. This representation allows the work to be parallelized on architectures specialized for vectorized operations such as matrix multiplication.
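As a rough illustration of that representation (not code from the article), the following Python sketch expresses a small two-layer network as a composite function of matrix-vector products; the layer sizes and random weights are arbitrary placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.standard_normal((64, 128))   # weights of layer 1 (placeholder values)
W2 = rng.standard_normal((10, 64))    # weights of layer 2 (placeholder values)
x = rng.standard_normal(128)          # one input vector

relu = lambda v: np.maximum(v, 0.0)   # elementwise nonlinearity between layers

h = relu(W1 @ x)   # layer 1: matrix-vector multiply + activation
y = W2 @ h         # layer 2: matrix-vector multiply -> output scores
print(y.shape)     # (10,)
```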
Source: article by Mario Miscuglio and Volker Sorger.
(a) The Photonic Tensor Core (PTC) consists of 16 dot-product units that inherently and independently perform row-by-column multiplication and pointwise accumulation.
(b) Detail of a single dot-product unit: WDM-multiplexed input signals are weighted by multilevel photonic memory cells, filtered by microring resonators (MRR), and summed at a photodetector, completing the MAC operation.
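A purely behavioral Python sketch of such a dot-product unit might look like the snippet below; the function name and values are hypothetical, and it mimics only the arithmetic (per-wavelength weighting followed by summation at the photodetector), not the optics.

```python
import numpy as np

def dot_product_unit(inputs, transmissions):
    # Each WDM wavelength channel carries one input amplitude; a photonic
    # memory cell sets that channel's transmission (the stored weight).
    weighted = inputs * transmissions
    # The photodetector sums all weighted channels: the accumulate step.
    return weighted.sum()

inputs = np.array([0.9, 0.1, 0.5, 0.7])          # amplitudes on 4 WDM channels
transmissions = np.array([0.25, 1.0, 0.5, 0.0])  # stored weights (0..1 transmission)
print(dot_product_unit(inputs, transmissions))   # one multiply-accumulate result
```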
The more difficult the task and the higher the required prediction accuracy, the more complex the network becomes. Such networks need large amounts of data for computation and more power to process that data. Modern digital processors suited to deep learning, such as graphics processing units (GPUs) or tensor processing units (TPUs), are limited in performing complex, high-precision operations both by the power those operations require and by the slow transfer of electronic data between the processor and memory.
The developers and authors of the article have shown that the performance of a photonic TPU can be 2-3 orders of magnitude higher than that of an electronic TPU. Photons are ideal for distributed-network computing and node-to-node operations that perform high-bandwidth intelligent tasks at the edge of networks such as 5G. Data signals from surveillance cameras, optical sensors, and other sources may already arrive in the form of photons.
"Photonic specialized processors can save an enormous amount of energy while reducing response and processing times," Miscuglio added. For the end user, this means data is processed much faster: most of it is preprocessed at the edge, so only a portion needs to be sent on to the cloud or a data center.
A new approach for optical and electrical data transmission
The article makes the case for taking an optical route to machine learning tasks. In most neural networks (NNs), which comprise multiple layers of interconnected neurons/nodes, every neuron, every layer, and the connections between them matter for the task the network was trained on. In their fully connected layers, neural networks depend heavily on matrix-vector operations, in which large matrices of input data and weights are multiplied according to the trained parameters. Complex multilayer deep neural networks therefore require high bandwidth and low latency to carry out these large matrix multiplications without sacrificing efficiency and speed.
How can these matrices be multiplied efficiently? In general-purpose processors, matrix operations are performed sequentially and require constant access to cache memory, which creates a bottleneck in the von Neumann architecture. Specialized architectures such as GPUs and TPUs mitigate these bottlenecks by parallelizing the matrix operations, which has enabled some of today's most powerful machine learning models.
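To make the contrast concrete, here is a small Python sketch (illustrative only, not tied to any particular hardware) that computes the same product once as a sequential chain of scalar multiply-accumulates and once as a single vectorized call.

```python
import numpy as np

def matmul_sequential(A, B):
    # The way a general-purpose core works through it: one scalar
    # multiply-accumulate at a time, constantly touching memory.
    m, k = A.shape
    _, n = B.shape
    C = np.zeros((m, n))
    for i in range(m):
        for j in range(n):
            for p in range(k):
                C[i, j] += A[i, p] * B[p, j]   # one MAC per innermost step
    return C

A = np.random.rand(4, 4)
B = np.random.rand(4, 4)
# Same result, but the vectorized call maps onto parallel hardware.
assert np.allclose(matmul_sequential(A, B), A @ B)
```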
GPUs and TPUs offer clear advantages over CPUs. But when they are used to train deep neural networks, or to run inference on large two-dimensional datasets such as images, they can consume a great deal of energy and need long computation times (upwards of tens of milliseconds). Even matrix multiplication for less complex inference tasks still suffers from latency, mainly because of restricted access across the memory hierarchies and the per-instruction latency of the GPU.
Given this context, the authors argue that the operating paradigms of today's logic-based computing platforms, in which matrix algebra relies on constant memory access, need to be re-examined and reinvented. Here the wave nature of light and its inherent operations, such as interference and diffraction, can play an important role in increasing computational throughput while reducing the power consumption of neuromorphic platforms.
The developers envision that future technologies will have to perform computational tasks in the domain of their time-varying inputs, using the native physical operations of that domain. From this point of view, photons are ideal for distributed-network computations performing intelligent tasks on big data at the network edge (for example, in 5G), where data signals may already exist in the form of photons (for example, from a video surveillance camera or an optical sensor), thereby pre-filtering and intelligently regulating how much data traffic is allowed to flow toward data centers and cloud systems.
This is where they lay out a new approach based on a Photonic Tensor Core (PTC) capable of multiplying and accumulating 4x4 matrices with a trained kernel in one step (i.e., not iteratively); in other words, after training, the neural network weights are stored in a 4-bit multilevel photonic memory implemented directly on the chip, with no need for additional electro-optical circuitry or dynamic random access memory (DRAM). The photonic memories are low-loss phase-change nanophotonic circuits based on Ge2Sb2Se5 deposited on a planarized waveguide; they can be updated by electrothermal switching and read out entirely optically. The electrothermal switching is carried out by tungsten heater electrodes in contact with the phase-change memory (PCM) cell.
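A simplified numerical sketch of that scheme, with hypothetical weights and an arbitrary uniform 4-bit quantization rule standing in for the multilevel photonic memory, could look like this:

```python
import numpy as np

def quantize_4bit(W):
    # Map trained weights onto 16 uniformly spaced levels (4 bits), as a
    # stand-in for what a multilevel photonic memory cell can represent.
    levels = 2 ** 4 - 1
    w_min, w_max = W.min(), W.max()
    q = np.round((W - w_min) / (w_max - w_min) * levels)
    return q / levels * (w_max - w_min) + w_min  # values actually applied

W = np.random.randn(4, 4)    # trained weights (hypothetical)
X = np.random.randn(4, 4)    # block of input data
W_q = quantize_4bit(W)       # weights as stored in 4-bit memory
Y = W_q @ X                  # one-shot 4x4 multiply-accumulate, not iterative
print(np.abs(W - W_q).max()) # quantization error introduced by 4-bit storage
```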
Table. Comparison of tensor core performance.
Source: article by Mario Miscuglio and Volker Sorger.
With electronically fed data, the Photonic Tensor Core (PTC, left column) delivers a 2-8x increase in throughput over Nvidia's T4 and A100; for optically fed data (such as from a camera), the gain is roughly 60x (with the chip area limited to a single die, ~800 mm²).
Tests have shown that the performance of the photonic chips is two to three times higher than that of chips on the market today. Their data processing rate can reach two petaflops while consuming about 80 watts of power, of which roughly 95% goes to keeping the chip running and only 5% to the calculations themselves.
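Taking those quoted figures at face value, a quick back-of-the-envelope estimate of the implied energy efficiency might look like this (illustrative arithmetic only):

```python
# Uses only the figures quoted above: ~2e15 operations per second at ~80 W,
# with ~5% of the power attributed to the calculations themselves.
throughput = 2e15          # operations per second
total_power = 80.0         # watts drawn by the chip
compute_share = 0.05       # fraction of power spent on computation

ops_per_joule_total = throughput / total_power
ops_per_joule_compute = throughput / (total_power * compute_share)
print(f"{ops_per_joule_total:.1e} ops/J overall")    # ~2.5e13 operations per joule
print(f"{ops_per_joule_compute:.1e} ops/J compute")  # ~5.0e14 for the compute share alone
```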
The authors emphasize that this work represents the first approach to a photonic tensor processor that stores data and processes it in parallel. Such a processor can scale the number of multiply-accumulate (MAC) operations by several orders of magnitude while significantly reducing power consumption and latency compared with existing hardware accelerators, and can provide real-time analytics.
Unlike digital electronics, which relies on logic gates, integrated photonics can perform multiply-accumulate and many other linear-algebra operations non-iteratively, taking advantage of the parallelism inherent in the electromagnetic nature of light and in light-matter interaction. In this respect, integrated photonics is an ideal platform for mapping specific complex operations directly into hardware.