Streaming SIMD Extensions

In computing, Streaming SIMD Extensions (SSE) is a SIMD extension to the x86 instruction set architecture, designed by Intel and introduced in 1999 with its Pentium III series of processors in response to AMD's 3DNow! (which had debuted a year earlier). SSE contains 70 new instructions.

It was originally known as KNI, for Katmai New Instructions (Katmai was the code name for the first revision of the Pentium III core). During the Katmai project, Intel looked for a way to distinguish it from its earlier product line, in particular its flagship Pentium II. It was later renamed ISSE, for Internet Streaming SIMD Extensions, and then SSE. AMD eventually added support for SSE instructions, starting with its Athlon XP processor.

Intel was generally disappointed with MMX, its first attempt at implementing SIMD on IA-32. MMX had two main problems: it reused the existing floating-point registers, making the CPU unable to work on floating-point and SIMD data at the same time, and it worked only on integers.

Registers

SSE originally added eight new 128-bit registers known as XMM0 through XMM7. The x86-64 extensions from AMD (originally called AMD64), later adopted by Intel, add a further eight registers, XMM8 through XMM15. There is also a new 32-bit control/status register, MXCSR. The additional registers XMM8 through XMM15 are accessible only in 64-bit operating mode.

Each register packs together four 32-bit single-precision floating-point numbers, which is the only XMM data type used by the original SSE. With SSE2 (see below), the same registers can also hold two 64-bit double-precision floating-point numbers, four 32-bit integers, eight 16-bit short integers, or sixteen 8-bit bytes or characters. The integer operations come in signed and unsigned variants. Integer SIMD operations may still be performed with the eight 64-bit MMX registers.
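The packing of four single-precision values into one register can be sketched with compiler intrinsics. This is only an illustration, assuming a C compiler that ships `<xmmintrin.h>` (GCC, Clang and MSVC do); the helper name `pack_four_floats` is hypothetical and not part of any API.

```c
#include <xmmintrin.h>

/* Hypothetical helper: packs four floats into one 128-bit value and
   stores the four lanes back to memory, lowest lane first. */
void pack_four_floats(float x, float y, float z, float w, float out[4])
{
    /* _mm_set_ps takes its arguments from the highest lane down to the
       lowest, so x ends up in lane 0. */
    __m128 v = _mm_set_ps(w, z, y, x);

    /* Unaligned store: out[0] receives lane 0, out[3] receives lane 3. */
    _mm_storeu_ps(out, v);
}
```

Calling `pack_four_floats(1.0f, 2.0f, 3.0f, 4.0f, out)` leaves `out` holding the values in the order x, y, z, w, which matches the lane layout used in the example later in this article.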

Because these 128-bit registers are additional program state that the operating system must preserve across task switches, they are disabled by default until the operating system explicitly enables them. This means that the OS must know how to use the FXSAVE and FXRSTOR instructions, the extended pair of instructions that saves and restores all x87 and SSE register state at once. This support was quickly added to all major IA-32 operating systems.
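From user code, the closest thing to checking this is asking the processor what it supports. The sketch below assumes a GCC or Clang toolchain that provides `<cpuid.h>` and its `__get_cpuid` helper; the function name `cpu_reports_sse` is hypothetical. It only reports what the CPU advertises (FXSR in bit 24 and SSE in bit 25 of EDX for CPUID leaf 1). Whether the operating system has actually enabled the registers is a privileged setting that this code cannot read.

```c
#include <cpuid.h>   /* GCC/Clang helper header; an assumption about the toolchain */

/* Returns 1 if CPUID reports both FXSR (FXSAVE/FXRSTOR) and SSE, else 0. */
int cpu_reports_sse(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* CPUID leaf 1 is not available */

    int fxsr = (edx >> 24) & 1;        /* CPUID leaf 1, EDX bit 24: FXSR */
    int sse  = (edx >> 25) & 1;        /* CPUID leaf 1, EDX bit 25: SSE  */
    return fxsr && sse;
}
```

On any x86-64 processor the function is expected to return 1, since SSE is part of the 64-bit baseline.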

Because SSE adds floating-point support, it sees much more use than MMX. The addition of SSE2's integer support makes SSE even more flexible. While MMX is now redundant, MMX operations can be executed in parallel with SSE operations, offering further performance increases in some situations.

The first CPU to support SSE, the Pentium III, shared execution resources between SSE and the FPU. While a compiled application can interleave FPU and SSE instructions side by side, the Pentium III will not issue an FPU instruction and an SSE instruction in the same clock cycle. This limitation reduces the effectiveness of pipelining, but the separate XMM registers do allow SIMD and scalar floating-point operations to be mixed without the performance hit from explicit MMX/floating-point mode switching.

SSE instructions

SSE introduced both scalar and packed floating-point instructions.

Floating-point instructions

Data movement: memory-to-register / register-to-memory / register-to-register
- Scalar – MOVSS
- Packed – MOVAPS, MOVUPS, MOVLPS, MOVHPS, MOVLHPS, MOVHLPS
Arithmetic
- Scalar – ADDSS, SUBSS, MULSS, DIVSS, RCPSS, SQRTSS, MAXSS, MINSS, RSQRTSS
- Packed – ADDPS, SUBPS, MULPS, DIVPS, RCPPS, SQRTPS, MAXPS, MINPS, RSQRTPS
Comparison
- Scalar – CMPSS, COMISS, UCOMISS
- Packed – CMPPS
Data shuffle and unpacking
- Packed – SHUFPS, UNPCKHPS, UNPCKLPS
Data-type conversion
- Scalar – CVTSI2SS, CVTSS2SI, CVTTSS2SI
- Packed – CVTPI2PS, CVTPS2PI, CVTTPS2PI
Bitwise logical operations
- Packed – ANDPS, ORPS, XORPS, ANDNPS
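As an illustration of how the packed comparison and bitwise logical instructions listed above are typically combined, the following sketch selects the larger of two values in each lane without branching. It assumes the `<xmmintrin.h>` intrinsics; the function name `select_greater` is hypothetical, and the instruction names in the comments are the ones a compiler would usually emit, not a guarantee.

```c
#include <xmmintrin.h>

/* Per lane: out[i] = (a[i] > b[i]) ? a[i] : b[i], without branches. */
void select_greater(const float a[4], const float b[4], float out[4])
{
    __m128 va = _mm_loadu_ps(a);               /* MOVUPS: unaligned load  */
    __m128 vb = _mm_loadu_ps(b);

    /* Packed compare: each lane becomes all ones where a > b, else all zeros. */
    __m128 mask = _mm_cmpgt_ps(va, vb);        /* CMPPS                   */

    __m128 from_a = _mm_and_ps(mask, va);      /* ANDPS:  a where mask set */
    __m128 from_b = _mm_andnot_ps(mask, vb);   /* ANDNPS: b where mask clear */

    _mm_storeu_ps(out, _mm_or_ps(from_a, from_b));  /* ORPS merges both   */
}
```

For ordinary (non-NaN) inputs, MAXPS gives the same result in one instruction. The mask pattern is shown here because it generalizes to any per-lane condition.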

Integer instructions

Arithmetic
- PMULHUW, PSADBW, PAVGB, PAVGW, PMAXUB, PMINUB, PMAXSW, PMINSW
Data movement
- PEXTRW, PINSRW
Other
- PMOVMSKB, PSHUFW
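To make two of these mnemonics concrete, here is a plain scalar C model of what PAVGB and PSADBW compute on one 64-bit operand of eight unsigned bytes. It is a reference sketch of the arithmetic only, not SIMD code; the function names `pavgb_ref` and `psadbw_ref` are hypothetical.

```c
#include <stdint.h>

/* Reference model of PAVGB: rounded average of each pair of unsigned
   bytes, computed as (a + b + 1) >> 1 in a wider type. */
void pavgb_ref(const uint8_t a[8], const uint8_t b[8], uint8_t out[8])
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(((unsigned)a[i] + b[i] + 1u) >> 1);
}

/* Reference model of PSADBW: sum of the absolute differences of the
   eight unsigned byte pairs; the result fits in 16 bits (at most 8 * 255). */
uint16_t psadbw_ref(const uint8_t a[8], const uint8_t b[8])
{
    unsigned sum = 0;
    for (int i = 0; i < 8; i++)
        sum += (a[i] > b[i]) ? (unsigned)(a[i] - b[i])
                             : (unsigned)(b[i] - a[i]);
    return (uint16_t)sum;
}
```

The hardware instructions perform all eight byte operations at once. Operations of this kind are common in video encoding, for example when comparing blocks of pixels.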

Other instructions

MXCSR management
- LDMXCSR, STMXCSR
Cache and memory management
- MOVNTQ, MOVNTPS, MASKMOVQ, PREFETCHT0, PREFETCHT1, PREFETCHT2, PREFETCHNTA, SFENCE
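A typical use of the non-temporal store and fence instructions listed above is writing a large buffer that will not be read again soon, so that it does not displace useful data from the cache. The sketch below assumes the `<xmmintrin.h>` intrinsics `_mm_stream_ps` and `_mm_sfence`; the function name `fill_stream` and its preconditions are hypothetical choices for this example.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Fills dst[0..n) with value using non-temporal stores.
   Assumptions: dst is 16-byte aligned and n is a multiple of 4. */
void fill_stream(float *dst, size_t n, float value)
{
    __m128 v = _mm_set1_ps(value);     /* broadcast value to all four lanes */

    for (size_t i = 0; i < n; i += 4)
        _mm_stream_ps(dst + i, v);     /* MOVNTPS: store with a cache-bypass hint */

    /* SFENCE: make the streamed stores globally visible before any
       later stores issued by this thread. */
    _mm_sfence();
}
```

Whether this is faster than ordinary stores depends on the buffer size and the processor, so it is worth measuring before adopting it.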

Example

The following example demonstrates the advantage of using SSE. Consider an operation such as vector addition, which is used very often in computer graphics applications. Adding two single-precision, four-component vectors together using x87 requires four floating-point addition instructions:

vec_res.x = v1.x + v2.x;
vec_res.y = v1.y + v2.y;
vec_res.z = v1.z + v2.z;
vec_res.w = v1.w + v2.w;

This would correspond to four x87 FADD instructions in the object code. On the other hand, as the following pseudocode shows, a single 128-bit "packed-add" instruction can replace the four scalar addition instructions.

movaps xmm0, address-of-v1          ; xmm0 = v1.w | v1.z | v1.y | v1.x
addps  xmm0, address-of-v2          ; xmm0 = v1.w+v2.w | v1.z+v2.z | v1.y+v2.y | v1.x+v2.x
movaps address-of-vec_res, xmm0
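The same three steps can be written in C with compiler intrinsics rather than assembly. This is a minimal sketch, assuming `<xmmintrin.h>` and a C11 compiler for `_Alignas`; the type `vec4` and function `vec4_add` are hypothetical names, and the instruction names in the comments are what a compiler would usually emit for each call.

```c
#include <xmmintrin.h>

/* Hypothetical vector type. The 16-byte alignment is what allows the
   aligned (MOVAPS-style) loads and stores below. */
typedef struct {
    _Alignas(16) float c[4];   /* c[0] = x, c[1] = y, c[2] = z, c[3] = w */
} vec4;

/* vec_res = v1 + v2, all four components with one packed add. */
void vec4_add(const vec4 *v1, const vec4 *v2, vec4 *vec_res)
{
    __m128 a   = _mm_load_ps(v1->c);                 /* movaps xmm0, v1      */
    __m128 sum = _mm_add_ps(a, _mm_load_ps(v2->c));  /* addps  xmm0, v2      */
    _mm_store_ps(vec_res->c, sum);                   /* movaps vec_res, xmm0 */
}
```

If the data cannot be guaranteed to be 16-byte aligned, `_mm_loadu_ps` and `_mm_storeu_ps` can be used instead, at some cost on older processors.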

Later versions

SSE2, introduced with the Pentium 4, is a major enhancement to SSE (which some programmers renamed "SSE1"). SSE2 adds new math instructions for double-precision (64-bit) floating point and also extends MMX instructions to operate on 128-bit XMM registers. Until SSE4 (see below), integer instructions introduced with later SSE extensions would still operate on 64-bit MMX registers, because the new XMM registers require operating system support. SSE2 enables the programmer to perform SIMD math of virtually any type (from 8-bit integer to 64-bit float) entirely with the XMM vector-register file, without the need to touch the (legacy) MMX/FPU registers. Many programmers consider SSE2 to be "everything SSE should have been", as SSE2 offers an orthogonal set of instructions for dealing with common data types.
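The two SSE2 additions described above, double-precision math and integer operations on XMM registers, can be sketched as follows. The example assumes the SSE2 intrinsics header `<emmintrin.h>`; the function names `add_two_doubles` and `add_four_ints` are hypothetical.

```c
#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stdint.h>

/* Adds two pairs of doubles with one packed double-precision add (ADDPD). */
void add_two_doubles(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_add_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}

/* Adds four pairs of 32-bit integers with one packed integer add (PADDD)
   operating on XMM registers rather than the 64-bit MMX registers. */
void add_four_ints(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```

Both functions use unaligned loads and stores so that they work on ordinary arrays, and neither touches the MMX/x87 register file.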

SSE3, also called Prescott New Instructions, is an incremental upgrade to SSE2, adding a handful of DSP-oriented mathematics instructions and some process (thread) management instructions.

SSSE3 is an incremental upgrade to SSE3, adding 16 new opcodes which include permuting the bytes in a word, multiplying 16-bit fixed-point numbers with correct rounding, and within-word accumulate instructions. SSSE3 is often mistaken for SSE4 as this term was used during the development of the Core microarchitecture.

SSE4 is another major enhancement, adding a dot product instruction, many additional integer instructions, a POPCNT instruction, and more. SSE4 ends MMX register support. [1]

SSE5 is a new iteration announced by AMD in August 2007. [2][3]

AVX (Advanced Vector Extensions) is an advanced version of SSE announced by Intel featuring a widened data path from 128 bits to 256 bits and 3-operand instructions (up from 2). Products implementing AVX are slated for 2010. [1]

References

[1] Intel® Streaming SIMD Extensions 4 Instruction Set Innovation
[2] "AMD plots single thread boost with x86 extensions". The Register, 30 August 2007 [Retrieved 1 February 2008].
[3] http://developer.amd.com/sse5.jsp
