Neural processing unit
A neural processing unit (NPU), also known as AI accelerator or deep learning processor, is a class of specialized hardware accelerator[1] or computer system[2][3] designed to accelerate artificial intelligence (AI) and machine learning applications, including artificial neural networks and computer vision.
They can be used either to train AI models or to efficiently execute already-trained models (inference). Typical applications include algorithms for robotics, the Internet of Things, and other data-intensive or sensor-driven tasks.[4] They are often manycore designs and generally focus on low-precision arithmetic, novel dataflow architectures, or in-memory computing capability. As of 2024, a typical AI integrated circuit chip contains tens of billions of MOSFETs.[5]
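As a loose illustration of the low-precision arithmetic mentioned above, the following sketch (not from any cited source; the symmetric quantization scheme and the 8-bit width are assumptions chosen for clarity) emulates the int8 multiply with 32-bit accumulation that many such accelerators implement in hardware:

```python
# Minimal sketch of low-precision arithmetic: an 8-bit quantized matrix
# multiply with a per-tensor scale, accumulated in 32-bit integers.
import numpy as np

def quantize(x: np.ndarray, n_bits: int = 8):
    """Symmetric per-tensor quantization to signed integers."""
    qmax = 2 ** (n_bits - 1) - 1              # 127 for int8
    scale = np.max(np.abs(x)) / qmax          # real value of one integer step
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def int8_matmul(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Emulate int8 x int8 -> int32 multiply-accumulate, then dequantize."""
    qa, sa = quantize(a)
    qb, sb = quantize(b)
    acc = qa.astype(np.int32) @ qb.astype(np.int32)   # wide accumulator
    return acc * (sa * sb)                            # back to floating point

rng = np.random.default_rng(0)
a, b = rng.standard_normal((64, 128)), rng.standard_normal((128, 32))
err = np.abs(int8_matmul(a, b) - a @ b).max()
print(f"max abs error vs. float matmul: {err:.4f}")
```

The wide accumulator is the key design choice: individual operands lose precision, but sums of many products do not overflow, which is why accuracy typically degrades only slightly.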
AI accelerators are used in mobile devices such as Apple iPhones and Huawei cellphones,[6] and personal computers such as Intel laptops,[7] AMD laptops[8] and Apple silicon Macs.[9] Accelerators are used in cloud computing servers, including tensor processing units (TPU) in Google Cloud Platform[10] and Trainium and Inferentia chips in Amazon Web Services.[11] Many vendor-specific terms exist for devices in this category, and it is an emerging technology without a dominant design.
Graphics processing units designed by companies such as Nvidia and AMD often include AI-specific hardware, and are commonly used as AI accelerators, both for training and inference.[12]
Dedicated NPUs, in contrast, are a more integrated approach: since 2017, several CPUs and systems-on-chip have included on-die NPUs, for example the Apple A11 and Intel's Meteor Lake and Lunar Lake.
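As a rough sketch of how application software can target such accelerators, the example below uses ONNX Runtime's execution-provider mechanism to prefer a hardware-backed provider and fall back to the CPU. The model file name `model.onnx`, the preference order, and the dummy input are assumptions for illustration only; which providers are actually available depends on the platform and installed packages.

```python
# Minimal sketch (assumes onnxruntime is installed and a local "model.onnx"
# exists): query available execution providers and prefer a hardware-backed
# one, falling back to the CPU otherwise.
import numpy as np
import onnxruntime as ort

available = ort.get_available_providers()
# Provider names differ by vendor; these are examples, not an exhaustive list.
preferred = [p for p in ("QNNExecutionProvider",    # e.g. Qualcomm NPUs
                         "CUDAExecutionProvider",   # e.g. Nvidia GPUs
                         "CPUExecutionProvider")    # always available
             if p in available]

session = ort.InferenceSession("model.onnx", providers=preferred)
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]  # fill dynamic dims
dummy = np.random.rand(*shape).astype(np.float32)
print(session.run(None, {inp.name: dummy}))
```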
Nomenclature
As of 2016, the field was still in flux and vendors were pushing their own marketing terms for what amounts to an "AI accelerator", in the hope that their designs and APIs would become the dominant design. There is no consensus on the boundary between these devices, nor on the exact form they will take; however, several examples clearly aim to fill this new space, with a fair amount of overlap in capabilities.
In the past, when consumer graphics accelerators emerged, the industry eventually adopted Nvidia's self-assigned term "GPU"[13] as the collective noun for graphics accelerators, which had taken many forms before settling on an overall pipeline implementing the model presented by Direct3D.
All models of Intel Meteor Lake processors have a built-in versatile processor unit (VPU) for accelerating inference in computer vision and deep learning workloads.[14]
Benchmarks
Benchmarks such as MLPerf may be used to evaluate the performance of AI accelerators.[15] The table below lists several typical benchmarks; a minimal latency-measurement sketch follows the table.
Year | NN Benchmark | Affiliations | # of microbenchmarks | # of component benchmarks | # of application benchmarks |
---|---|---|---|---|---|
2012 | BenchNN | ICT, CAS | N/A | 12 | N/A |
2016 | Fathom | Harvard | N/A | 8 | N/A |
2017 | BenchIP | ICT, CAS | 12 | 11 | N/A |
2017 | DAWNBench | Stanford | 8 | N/A | N/A |
2017 | DeepBench | Baidu | 4 | N/A | N/A |
2018 | AI Benchmark | ETH Zurich | N/A | 26 | N/A |
2018 | MLPerf | Harvard, Intel, and Google, etc. | N/A | 7 | N/A |
2019 | AIBench | ICT, CAS and Alibaba, etc. | 12 | 16 | 2 |
2019 | NNBench-X | UCSB | N/A | 10 | N/A |
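The following is a minimal sketch of the kind of latency and throughput microbenchmark such suites formalize; the matrix-multiply workload, warm-up count, and iteration count are arbitrary stand-ins rather than part of any listed benchmark.

```python
# Minimal sketch of an inference-style microbenchmark: time repeated runs of
# a workload after a warm-up phase and report percentile latency and throughput.
import time
import numpy as np

def benchmark(fn, warmup: int = 10, iters: int = 100):
    for _ in range(warmup):              # warm caches / clocks before measuring
        fn()
    times = []
    for _ in range(iters):
        t0 = time.perf_counter()
        fn()
        times.append(time.perf_counter() - t0)
    times.sort()
    return {
        "p50_ms": 1e3 * times[len(times) // 2],
        "p99_ms": 1e3 * times[int(len(times) * 0.99) - 1],
        "throughput_per_s": iters / sum(times),
    }

x = np.random.rand(512, 512).astype(np.float32)
w = np.random.rand(512, 512).astype(np.float32)
print(benchmark(lambda: x @ w))          # stand-in workload, not a real model
```

Real suites such as MLPerf additionally fix the model, dataset, accuracy target, and reporting rules so that results are comparable across accelerators.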
Potential applications
- Agricultural robots, for example, herbicide-free weed control.[16]
- Autonomous vehicles: Nvidia has targeted their Drive PX-series boards at this application.[17]
- Computer-aided diagnosis
- Industrial robots, increasing the range of tasks that can be automated, by adding adaptability to variable situations.
- Machine translation
- Military robots
- Natural language processing
- Search engines, improving the energy efficiency of data centers and enabling increasingly advanced queries.
- Unmanned aerial vehicles, e.g. navigation systems: the Movidius Myriad 2 has been demonstrated successfully guiding autonomous drones.[18]
- Voice user interface, e.g. in mobile phones, a target for Qualcomm Zeroth.[19]
References
- ^ "Intel unveils Movidius Compute Stick USB AI Accelerator". July 21, 2017. Archived from the original on August 11, 2017. Retrieved August 11, 2017.
- ^ "Inspurs unveils GX4 AI Accelerator". June 21, 2017.
- ^ Wiggers, Kyle (November 6, 2019) [2019], Neural Magic raises $15 million to boost AI inferencing speed on off-the-shelf processors, archived from the original on March 6, 2020, retrieved March 14, 2020
- ^ "Google Designing AI Processors". May 18, 2016. Google using its own AI accelerators.
- ^ Moss, Sebastian (March 23, 2022). "Nvidia reveals new Hopper H100 GPU, with 80 billion transistors". Data Center Dynamics. Retrieved January 30, 2024.
- ^ "HUAWEI Reveals the Future of Mobile AI at IFA".
- ^ "Intel's Lunar Lake Processors Arriving Q3 2024". Intel. May 20, 2024.
- ^ "AMD XDNA Architecture".
- ^ "Deploying Transformers on the Apple Neural Engine". Apple Machine Learning Research. Retrieved August 24, 2023.
- ^ Jouppi, Norman P.; et al. (June 24, 2017). "In-Datacenter Performance Analysis of a Tensor Processing Unit". ACM SIGARCH Computer Architecture News. 45 (2): 1–12. arXiv:1704.04760. doi:10.1145/3140659.3080246.
- ^ "How silicon innovation became the 'secret sauce' behind AWS's success". Amazon Science. July 27, 2022. Retrieved July 19, 2024.
- ^ Patel, Dylan; Nishball, Daniel; Xie, Myron (November 9, 2023). "Nvidia's New China AI Chips Circumvent US Restrictions". SemiAnalysis. Retrieved February 7, 2024.
- ^ "NVIDIA launches the World's First Graphics Processing Unit, the GeForce 256". Archived from the original on February 27, 2016.
- ^ "Intel to Bring a 'VPU' Processor Unit to 14th Gen Meteor Lake Chips". PCMAG. August 2022.
- ^ "Nvidia claims 'record performance' for Hopper MLPerf debut".
- ^ "Development of a machine vision system for weed control using precision chemical application" (PDF). University of Florida. CiteSeerX 10.1.1.7.342. Archived from the original (PDF) on June 23, 2010.
- ^ "Self-Driving Cars Technology & Solutions from NVIDIA Automotive". NVIDIA.
- ^ "movidius powers worlds most intelligent drone". March 16, 2016.
- ^ "Qualcomm Research brings server class machine learning to everyday devices–making them smarter [VIDEO]". October 2015.
External links
- Nvidia Puts The Accelerator To The Metal With Pascal, The Next Platform
- Eyeriss Project, MIT
- https://alphaics.ai/