Simultaneous multithreading
Simultaneous multithreading, often abbreviated SMT, is a relatively recent technique for improving the performance of superscalar processors.
Conventional multitasking operating systems let multiple processes and threads use the processor one at a time, giving a particular thread exclusive ownership of it for a time slice on the order of milliseconds. Quite often, a thread will stall for hundreds of cycles while waiting for some external resource (for example, a load from RAM), wasting processor time.
A further improvement is super-threading, in which the processor can execute instructions from a different thread each cycle. Cycles left unused by one thread can thus be used by another that is ready to run.
Still, a single thread almost never uses all of the execution units of a modern processor at the same time. Simultaneous multithreading allows multiple threads to execute different instructions in the same clock cycle, using the execution units that the first thread left idle. This is done without major changes to the basic processor architecture: the main additions are the ability to fetch instructions from multiple threads in a cycle and a larger register file to hold data from multiple threads. The number of concurrent threads is chosen by the chip designers, but practical limits on chip complexity usually restrict it to 2, 4 or sometimes 8.
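The difference between the three scheduling approaches described above can be illustrated with a toy model. The sketch below is only an illustration under simplifying assumptions, not a description of any real processor: it assumes a machine with four issue slots per cycle (ISSUE_WIDTH), represents each thread as a made-up list giving the number of instructions it has ready in each cycle (0 meaning the thread is stalled), does not carry unissued work over to later cycles, and shortens the time slice to a few cycles so that the example stays small. All names are hypothetical.

```python
ISSUE_WIDTH = 4  # assumed number of issue slots available per cycle


def time_sliced(ready, slice_len):
    """One thread owns the whole processor for slice_len cycles at a time."""
    issued = 0
    for cycle in range(len(ready[0])):
        owner = (cycle // slice_len) % len(ready)
        # Slots the owner cannot fill are wasted, even if another thread is ready.
        issued += min(ready[owner][cycle], ISSUE_WIDTH)
    return issued


def super_threading(ready):
    """Each cycle, a single thread issues: the first ready one in round-robin order."""
    n_threads = len(ready)
    issued = 0
    for cycle in range(len(ready[0])):
        for offset in range(n_threads):
            thread = (cycle + offset) % n_threads
            if ready[thread][cycle] > 0:
                issued += min(ready[thread][cycle], ISSUE_WIDTH)
                break  # only one thread may issue in a given cycle
    return issued


def smt(ready):
    """Each cycle, slots left free by one thread are filled from the others."""
    issued = 0
    for cycle in range(len(ready[0])):
        free = ISSUE_WIDTH
        for thread_ready in ready:
            take = min(thread_ready[cycle], free)
            issued += take
            free -= take
    return issued


# Made-up per-cycle ready counts for two threads over eight cycles.
ready = [
    [2, 0, 0, 1, 3, 0, 2, 1],  # thread A
    [1, 2, 0, 3, 0, 0, 2, 2],  # thread B
]
total_slots = ISSUE_WIDTH * len(ready[0])

for name, issued in [
    ("time-sliced", time_sliced(ready, slice_len=4)),
    ("super-threading", super_threading(ready)),
    ("SMT", smt(ready)),
]:
    print(f"{name}: {issued} of {total_slots} issue slots used")
```

With these particular made-up traces, time slicing fills 7 of the 32 slots, super-threading 14, and SMT 19. The numbers mean nothing in themselves; the point is that super-threading recovers whole cycles in which the running thread is stalled, while SMT additionally recovers the slots left empty within a cycle.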
The first commercial desktop processor using simultaneous multithreading was the Intel Pentium 4, starting with the 3.06 GHz model; the technology has since been introduced into a number of Intel's processors. Intel calls the technology hyper-threading (HT), a basic two-thread SMT engine. Intel claims a speed improvement of up to 30% over an otherwise identical, non-SMT Pentium 4. The improvement actually seen is very application dependent, however, and some programs slow down slightly when HT is turned on. This happens when a program offers only one thread to fetch from at any given time: it gains nothing from SMT, yet it still pays for the longer pipeline that results from the processor changes needed to support SMT.
The DEC Alpha EV8 was to be equipped with an even more powerful four-thread SMT engine, but Compaq, which had acquired DEC, terminated the project before it could be commercialized. The technology may eventually find its way into Tukwila, a planned Intel Itanium processor.
The latest MIPS architecture designs include a two-thread SMT system known as MIPS MT.
The IBM POWER5, announced in May 2004, is a dual-core processor in which each core includes a two-thread SMT engine. IBM's implementation is more sophisticated than the earlier ones: it can assign different priorities to the threads, and the SMT engine can be turned on and off dynamically to better handle workloads that an SMT processor would not speed up.
Sun Microsystems' forthcoming Niagara (~2005) and Rock (~2007) processors are implementations of SPARC focused almost entirely on exploiting SMT and chip-level multiprocessing (CMP) techniques. Sun refers to the combined approach as "CMT" and to the overall concept as "Throughput Computing."
See also:
- Chip-level multiprocessing, a complementary technique
- Thread (computer programming), what is executed