Domain-specific architecture
A domain-specific architecture (DSA) is a programmable computer architecture tailored to operate efficiently within the confines of a specific application domain. The term is used in contrast to general-purpose architectures, such as CPUs, which are designed to run any computer program.
History
In conjunction with the semiconductor boom that started in the 1960s, computer architects were tasked with finding new ways to exploit the increasingly large number of available transistors. Moore's Law and Dennard scaling enabled architects to focus on improving the performance of general-purpose microprocessors running general-purpose programs.[1][2]
These efforts yielded several technological innovations, such as multi-level caches, out-of-order execution, deep instruction pipelines, multithreading and multiprocessing. The impact of these innovations was measured on general-purpose benchmark suites such as SPEC, and architects were not concerned with the internal structure or specific characteristics of the benchmark programs.[3]
The end of Dennard scaling pushed computer architects to switch from a single, very fast processor to several processor cores. Performance improvement could no longer be achieved by simply increasing the operating frequency of a single core.[4]
The end of Moore's Law shifted the focus away from general-purpose architectures towards more specialized hardware. Although general-purpose CPUs will likely retain a place in any computer system, heterogeneous systems that combine general-purpose and domain-specific components are the most recent trend for achieving high performance.[5]
While hardware accelerators and ASICs have been used in very specialized application domains since the inception of the semiconductor industry, they generally implement a specific function with very limited flexibility. In contrast, the shift towards domain-specific architectures aims to achieve a better balance between flexibility and specialization.
A notable early example of a domain-specific programmable architecture is the GPU. This specialized hardware was developed specifically to operate within the domain of image processing and computer graphics, and these programmable processing units found widespread adoption in both gaming consoles and personal computers. With the improvement of the hardware/software stack for both NVIDIA and AMD GPUs, these architectures are increasingly used to accelerate embarrassingly parallel tasks, even outside the domain of image processing.[6]
Guidelines for DSA Design
John Hennessy and David Patterson outlined five principles for DSA design that lead to better area efficiency and energy savings. The objective of these architectures is often also to reduce non-recurring engineering (NRE) costs, so that the investment in a specialized solution can be amortized more easily.[3]
Minimize Distance over which Data is Moved
In general-purpose memory hierarchies, a considerable amount of energy is spent moving data in an attempt to minimize access latency. In a domain-specific architecture, the hardware and compiler designers' understanding of the application domain allows for simpler, specialized memory hierarchies in which data movement is largely managed in software, using tailor-made memories for specific functions within the domain. GPUs offer a concrete illustration of this idea, as sketched below.
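The following CUDA C++ sketch shows software-managed data movement using a GPU's on-chip shared memory, which the program fills explicitly instead of relying on a hardware-managed cache. It is illustrative only: the kernel name, the tile size and the assumption that the thread-block size equals the tile size are choices made for this example, not features of any particular architecture.

#define TILE 256

// One thread block stages a TILE-sized slice of the input into the on-chip
// scratchpad, then serves all further reads from that local memory.
__global__ void moving_average(const float *in, float *out, int n)
{
    __shared__ float tile[TILE];          // software-managed scratchpad

    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        tile[threadIdx.x] = in[i];        // single explicit DRAM-to-scratchpad copy
    __syncthreads();

    // Each staged element is reused by neighbouring threads, so the repeated
    // reads are served on-chip rather than travelling through a general-purpose
    // cache hierarchy. Tile-boundary elements are skipped to keep the sketch short.
    if (i > 0 && i < n - 1 && threadIdx.x > 0 && threadIdx.x < blockDim.x - 1)
        out[i] = (tile[threadIdx.x - 1] + tile[threadIdx.x]
                  + tile[threadIdx.x + 1]) / 3.0f;
}

A launch such as moving_average<<<(n + TILE - 1) / TILE, TILE>>>(d_in, d_out, n) keeps the block size equal to the tile size, matching the assumption above.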
Invest Saved Resources into Arithmetic Units or Bigger Memories
Since a considerable amount of hardware resources can be saved by dropping general-purpose architectural optimizations such as out-of-order execution, prefetching, address coalescing and speculation, the saved resources should be reinvested to maximally exploit the available parallelism, for example by adding more arithmetic units, or to alleviate memory bandwidth limitations by adding larger memories. The sketch after this paragraph illustrates the kind of parallelism such arithmetic units can exploit.
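The CUDA C++ sketch below (with an illustrative kernel name and a square, row-major matrix layout assumed for this example) expresses a matrix multiplication as one independent multiply-accumulate chain per output element; a domain-specific design can dedicate its saved area to executing many such chains concurrently.

// Each thread acts as one multiply-accumulate unit for a single element of C.
__global__ void matmul(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= n || col >= n)
        return;

    float acc = 0.0f;
    for (int k = 0; k < n; ++k)
        acc += A[row * n + k] * B[k * n + col];   // one MAC per loop iteration
    C[row * n + col] = acc;                       // n*n such chains can proceed in parallel
}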
References
- ^ Moore, G.E. (January 1998). "Cramming More Components Onto Integrated Circuits". Proceedings of the IEEE. 86 (1): 82–85. doi:10.1109/jproc.1998.658762. ISSN 0018-9219.
- ^ Dennard, R.H.; Gaensslen, F.H.; Yu, Hwa-Nien; Rideout, V.L.; Bassous, E.; LeBlanc, A.R. (October 1974). "Design of ion-implanted MOSFET's with very small physical dimensions". IEEE Journal of Solid-State Circuits. 9 (5): 256–268. doi:10.1109/jssc.1974.1050511. ISSN 0018-9200.
- ^ a b Hennessy, John L.; Patterson, David A. (2019). Computer Architecture: A Quantitative Approach. With contributions by Krste Asanović (6th ed.). Cambridge, Mass.: Morgan Kaufmann Publishers, an imprint of Elsevier. p. 540. ISBN 978-0-12-811905-1.
- ^ Schauer, Bryan. "Multicore Processors – A Necessity" (PDF). Archived from the original (PDF) on 2011-11-25. Retrieved 2023-07-06.
- ^ Sharma, Gajendra; Poudel, Prashant (2022-11-24). "Current trends in heterogeneous systems: A review". Trends in Computer Science and Information Technology. 7 (3): 086–090. doi:10.17352/tcsit.000055. ISSN 2641-3086.
- ^ "NVIDIA Accelerated Applications". NVIDIA. Retrieved 2023-07-06.