
x86 Assembly Series Part 11: Floating-Point & x87/SSE

February 6, 2026 Wasil Zafar 35 min read

Master floating-point programming in x86: legacy x87 FPU stack-based operations, modern SSE scalar instructions using XMM registers, IEEE 754 representation, and int/float conversion.

Table of Contents

  1. IEEE 754 Representation
  2. x87 FPU (Legacy)
  3. SSE Scalar Operations
  4. Conversion Instructions
  5. Floating-Point Comparison
  6. x87 vs SSE: When to Use

IEEE 754 Representation

IEEE 754: The standard for floating-point arithmetic. A float consists of a sign bit, a biased exponent, and a mantissa (the fraction bits). Knowing this layout is essential for debugging FP code, because debuggers and memory dumps show these values as raw bit patterns.

Single Precision (32-bit float)

Format

IEEE 754 Single Precision

| Sign (1 bit) | Exponent (8 bits) | Mantissa (23 bits) |
|     31       |      30-23        |        22-0        |

Value = (-1)^S × 1.M × 2^(E-127)    (normalized numbers, E from 1 to 254)

Examples:
1.0  = 0x3F800000 = 0 01111111 00000000000000000000000
-2.0 = 0xC0000000 = 1 10000000 00000000000000000000000
3.14 ≈ 0x4048F5C3
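The bit patterns above can be checked from C. The following is a minimal sketch, assuming a platform where `float` is an IEEE 754 single-precision value (true on x86); the names `f32_fields`, `f32_bits`, and `f32_decode` are hypothetical helpers introduced for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper type: the three IEEE 754 single-precision fields. */
typedef struct {
    uint32_t sign;      /* bit 31 */
    uint32_t exponent;  /* bits 30-23, stored with a bias of 127 */
    uint32_t mantissa;  /* bits 22-0, fraction without the implicit leading 1 */
} f32_fields;

/* Return the raw bit pattern of a float.
   memcpy is used instead of a pointer cast to avoid aliasing problems. */
uint32_t f32_bits(float value) {
    uint32_t bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}

/* Split the bit pattern into sign, biased exponent, and mantissa. */
f32_fields f32_decode(float value) {
    uint32_t bits = f32_bits(value);
    f32_fields fields;
    fields.sign     = bits >> 31;
    fields.exponent = (bits >> 23) & 0xFFu;
    fields.mantissa = bits & 0x7FFFFFu;
    return fields;
}
```

For example, `f32_decode(-2.0f)` yields sign 1, exponent 128 (that is, 2^(128-127) = 2), and mantissa 0, matching the second row of the examples.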

Double Precision (64-bit double)

Format

IEEE 754 Double Precision

| Sign (1 bit) | Exponent (11 bits) | Mantissa (52 bits) |
|     63       |       62-52        |        51-0        |

Value = (-1)^S × 1.M × 2^(E-1023)    (normalized numbers, E from 1 to 2046)

Examples:
1.0  = 0x3FF0000000000000
-2.0 = 0xC000000000000000
π ≈ 3.141592653589793 = 0x400921FB54442D18

Special Values:
+∞  = 0x7FF0000000000000 (exponent all 1s, mantissa 0)
-∞  = 0xFFF0000000000000
NaN = 0x7FF8000000000000 (quiet NaN; any pattern with exponent all 1s and a non-zero mantissa is a NaN)
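The special-value rules can be expressed directly on the bit fields. This is a small sketch, assuming `double` is an IEEE 754 double-precision value; `f64_from_bits`, `f64_bits_are_inf`, and `f64_bits_are_nan` are hypothetical names used only for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit pattern as a double. */
double f64_from_bits(uint64_t bits) {
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Infinity: exponent field all 1s (0x7FF) and mantissa zero. */
int f64_bits_are_inf(uint64_t bits) {
    uint64_t exponent = (bits >> 52) & 0x7FFu;
    uint64_t mantissa = bits & 0xFFFFFFFFFFFFFull;  /* low 52 bits */
    return exponent == 0x7FFu && mantissa == 0;
}

/* NaN: exponent field all 1s and mantissa non-zero. */
int f64_bits_are_nan(uint64_t bits) {
    uint64_t exponent = (bits >> 52) & 0x7FFu;
    uint64_t mantissa = bits & 0xFFFFFFFFFFFFFull;  /* low 52 bits */
    return exponent == 0x7FFu && mantissa != 0;
}
```

The sign bit plays no part in the classification, so both `0x7FF0000000000000` and `0xFFF0000000000000` count as infinities.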

Precision Comparison

| Type           | Bits | Significant Digits | Range (normalized magnitudes)  |
| Float (single) | 32   | ~7                 | ±1.18×10⁻³⁸ to ±3.4×10³⁸       |
| Double         | 64   | ~15-16             | ±2.23×10⁻³⁰⁸ to ±1.8×10³⁰⁸     |
| x87 Extended   | 80   | ~19                | ±3.36×10⁻⁴⁹³² to ±1.19×10⁴⁹³²  |

x87 FPU (Legacy)

FPU Stack Model

Stack-Based: x87 uses an 8-register stack, ST0 through ST7, where ST0 is always the top. Loads push a value onto the stack, most arithmetic uses ST0 as one operand, and the "P" variants (FADDP, FSTP) pop ST0 after they finish.

FPU Operations

section .data
    value1 dq 3.14159
    value2 dq 2.71828
    result dq 0.0

section .text
    finit               ; Initialize FPU
    fld qword [value1]  ; Push 3.14159: ST0 = 3.14159
    fld qword [value2]  ; Push 2.71828: ST0 = 2.71828, ST1 = 3.14159
    faddp st1, st0      ; ST1 = ST1 + ST0, then pop: the sum is the new ST0
    fstp qword [result] ; Store ST0 to result and pop (stack is empty again)
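The way ST0 and ST1 shift during this sequence can be modelled in a few lines of C. This is a toy sketch only: real x87 hardware also tracks tag bits, a status word, and 80-bit values, and the names `x87_model`, `x87_st`, `x87_fld`, `x87_faddp`, and `x87_fstp` are hypothetical. It models the add-and-pop step as FADDP ST(1), ST(0), which is also what the no-operand FADD form encodes.

```c
/* Toy model: eight physical registers plus a TOP index, as in the x87 FPU. */
typedef struct {
    double   reg[8];  /* physical registers R0-R7 */
    unsigned top;     /* index of the register currently named ST0 */
} x87_model;

/* ST(i) is addressed relative to TOP, wrapping modulo 8. */
double *x87_st(x87_model *fpu, unsigned i) {
    return &fpu->reg[(fpu->top + i) & 7u];
}

/* fld: decrement TOP, then write the new ST0 (a push). */
void x87_fld(x87_model *fpu, double value) {
    fpu->top = (fpu->top - 1u) & 7u;
    *x87_st(fpu, 0) = value;
}

/* faddp st1, st0: ST1 = ST1 + ST0, then pop so the sum becomes ST0. */
void x87_faddp(x87_model *fpu) {
    *x87_st(fpu, 1) += *x87_st(fpu, 0);
    fpu->top = (fpu->top + 1u) & 7u;
}

/* fstp: read ST0, then pop. */
double x87_fstp(x87_model *fpu) {
    double value = *x87_st(fpu, 0);
    fpu->top = (fpu->top + 1u) & 7u;
    return value;
}
```

Calling `x87_fld` twice, then `x87_faddp`, then `x87_fstp` on a zero-initialized model follows the same steps as the listing and leaves `top` back at its starting value, which corresponds to an empty stack.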

SSE Scalar Operations

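Modern Approach: SSE scalar instructions operate directly on the XMM registers (no stack), are generally faster than x87 on modern CPUs, and match the x86-64 calling conventions, which pass and return floating-point values in XMM registers. The single-precision forms (MOVSS, ADDSS) come from SSE; the double-precision forms (MOVSD, ADDSD) were added in SSE2, which every x86-64 CPU supports. Prefer SSE over x87 for new code.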

MOVSS & MOVSD

section .data
    fval dd 3.14        ; 32-bit float
    dval dq 3.14159     ; 64-bit double

section .text
    movss xmm0, [fval]  ; Load 32-bit float into XMM0
    movsd xmm1, [dval]  ; Load 64-bit double into XMM1
    
    movss [fval], xmm0  ; Store 32-bit float
    movsd [dval], xmm1  ; Store 64-bit double

Arithmetic: ADDSS, MULSS, etc.

; Scalar single-precision (32-bit)
addss xmm0, xmm1      ; XMM0 = XMM0 + XMM1 (low 32 bits)
subss xmm0, xmm1      ; XMM0 = XMM0 - XMM1
mulss xmm0, xmm1      ; XMM0 = XMM0 * XMM1
divss xmm0, xmm1      ; XMM0 = XMM0 / XMM1
sqrtss xmm0, xmm1     ; XMM0 = sqrt(XMM1)

; Scalar double-precision (64-bit)
addsd xmm0, xmm1      ; Double-precision add
mulsd xmm0, xmm1      ; Double-precision multiply

Conversion Instructions

; Integer to float
cvtsi2ss xmm0, eax    ; Convert int32 to float
cvtsi2sd xmm0, rax    ; Convert int64 to double

; Float to integer (truncate)
cvttss2si eax, xmm0   ; Convert float to int32 (truncate)
cvttsd2si rax, xmm0   ; Convert double to int64 (truncate)

; Float precision conversion
cvtss2sd xmm0, xmm1   ; Float to double
cvtsd2ss xmm0, xmm1   ; Double to float
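The rounding behaviour of these conversions is easy to observe from C, where the casts behave the same way for in-range values. A minimal sketch, assuming a target that evaluates `float` and `double` in their own precision (as x86-64 does); the function names are hypothetical.

```c
#include <stdint.h>

/* Float -> int32: a C cast truncates toward zero, like CVTTSS2SI. */
int32_t f32_to_i32_trunc(float value) {
    return (int32_t)value;
}

/* Int32 -> float: like CVTSI2SS. A float has a 24-bit significand,
   so integers above 2^24 may be rounded to a neighbouring value. */
float i32_to_f32(int32_t value) {
    return (float)value;
}

/* Float -> double is always exact (CVTSS2SD). */
double f32_to_f64(float value) {
    return (double)value;
}

/* Double -> float may round to the nearest float (CVTSD2SS). */
float f64_to_f32(double value) {
    return (float)value;
}
```

Truncation means `f32_to_i32_trunc(2.9f)` is 2 and `f32_to_i32_trunc(-2.9f)` is -2. One difference to keep in mind: for NaN or out-of-range inputs the C cast is undefined behaviour, whereas CVTTSS2SI returns the "integer indefinite" value 0x80000000 when the invalid-operation exception is masked.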

Floating-Point Comparison

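SSE and SSE2 provide scalar comparison instructions that write their result to EFLAGS, so ordinary conditional jumps can follow them (as with integer CMP). ZF, PF, and CF carry the result; OF, SF, and AF are cleared. With the default MXCSR settings the invalid-operation exception (#IA) is masked, so it only sets a status flag instead of trapping.

| Instruction | Operands     | Sets Flags | NaN Handling                        |
| COMISS      | xmm, xmm/m32 | ZF, PF, CF | Signals #IA for any NaN (QNaN/SNaN) |
| COMISD      | xmm, xmm/m64 | ZF, PF, CF | Signals #IA for any NaN (QNaN/SNaN) |
| UCOMISS     | xmm, xmm/m32 | ZF, PF, CF | Signals #IA only for SNaN           |
| UCOMISD     | xmm, xmm/m64 | ZF, PF, CF | Signals #IA only for SNaN           |

section .data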
    pi    dq 3.14159
    e     dq 2.71828

section .text
    movsd xmm0, [pi]
    movsd xmm1, [e]
    
    ucomisd xmm0, xmm1    ; Compare pi vs e (quiet NaN handling)
    
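    ; Test PF first: an unordered result sets ZF=PF=CF=1,
    ; so JB and JE would also be taken when an operand is NaN.
    jp  .unordered        ; Jump if either is NaN (Parity)
    ; Use UNSIGNED condition codes (not JG/JL!):
    ja  .pi_greater       ; Jump if pi > e (Above)
    jb  .pi_less          ; Jump if pi < e (Below)
    je  .equal            ; Jump if pi == e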

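; Checking for NaN:
check_nan:
    ucomisd xmm0, xmm0    ; Compare value with itself
    jp .is_nan            ; Only NaN is unordered with itself (PF=1)
    ; Not NaN
    ret                   ; Return so this path does not fall into .is_nan
.is_nan:
    ; Handle NaN case
    ret

Critical: After a floating-point compare, use the unsigned branch instructions (JA, JB, JAE, JBE), not the signed ones (JG, JL). COMISD and UCOMISD report the result in CF and ZF and clear SF and OF, the flags that signed jumps test, so JG/JL give wrong answers. Check JP before the other branches whenever a NaN is possible.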
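The four outcomes of a floating-point compare (greater, less, equal, unordered) can be mirrored in C, which makes the NaN rules easy to experiment with. This is a sketch under the assumption that the compiler follows IEEE 754 comparison semantics (no fast-math style options); `fp_order`, `quiet_nan`, and `compare_doubles` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical result codes, one per branch target. */
typedef enum { ORDER_LESS, ORDER_EQUAL, ORDER_GREATER, ORDER_UNORDERED } fp_order;

/* Build a quiet NaN from its bit pattern (exponent all 1s, top mantissa bit set). */
double quiet_nan(void) {
    uint64_t bits = 0x7FF8000000000000ull;
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Classify a pair of doubles. The unordered test comes first,
   in the same way JP is checked before JA/JB/JE. */
fp_order compare_doubles(double a, double b) {
    if (a != a || b != b) {
        return ORDER_UNORDERED;  /* only NaN is unequal to itself */
    }
    if (a > b) {
        return ORDER_GREATER;
    }
    if (a < b) {
        return ORDER_LESS;
    }
    return ORDER_EQUAL;
}
```

With a NaN operand, `a > b`, `a < b`, and `a == b` are all false, so a classifier that skips the first test would report NaN pairs as equal by falling through to the final return.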

Exercise: Max of Two Doubles

; max_double(xmm0, xmm1) -> xmm0
max_double:
    ucomisd xmm0, xmm1
    ja .done              ; If xmm0 > xmm1, already have max
    movsd xmm0, xmm1      ; Else xmm0 = xmm1
.done:
    ret

; Or use MAXSD instruction:
max_double_v2:
    maxsd xmm0, xmm1      ; xmm0 = max(xmm0, xmm1)
    ret
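Both versions of the exercise share one subtle property: they return the second operand unless the first is strictly greater. That matters for NaN and for signed zeros. The C sketch below models that rule, assuming IEEE 754 comparison semantics; `quiet_nan` and `max_like_maxsd` are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Build a quiet NaN from its bit pattern. */
double quiet_nan(void) {
    uint64_t bits = 0x7FF8000000000000ull;
    double value;
    memcpy(&value, &bits, sizeof value);
    return value;
}

/* Same selection rule as the JA-based routine and as MAXSD:
   keep a only when a > b is true, otherwise return b.
   Any comparison involving NaN is false, so b is returned in that case. */
double max_like_maxsd(double a, double b) {
    return (a > b) ? a : b;
}
```

The result is therefore not symmetric: `max_like_maxsd(NaN, 1.0)` gives 1.0, while `max_like_maxsd(1.0, NaN)` gives NaN, and `max_like_maxsd(0.0, -0.0)` gives -0.0. The C library function `fmax` behaves differently, returning the non-NaN operand when exactly one argument is NaN.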

x87 vs SSE: When to Use

| Aspect             | x87 FPU               | SSE/SSE2               |
| Register Model     | Stack (ST0-ST7)       | Flat (XMM0-XMM15)      |
| Precision          | 80-bit extended       | 32/64-bit only         |
| SIMD Support       | None                  | 4 floats or 2 doubles  |
| Calling Convention | Varies, complex       | Clean (XMM0 returns)   |
| Modern Use         | Legacy code only      | Preferred for new code |
| Transcendentals    | Built-in (FSIN, FCOS) | None (use libraries)   |

Decision Guide

  • Use SSE/SSE2: New code, performance-critical, ABI compliance, SIMD potential
  • Use x87: Need 80-bit precision, built-in transcendental functions, legacy code maintenance
  • Use AVX/AVX-512: When you need 256/512-bit vectors (see Part 12)

; Modern approach (SSE2) - Preferred!
add_doubles_sse:
    movsd xmm0, [value1]
    addsd xmm0, [value2]      ; xmm0 = value1 + value2
    movsd [result], xmm0
    ret

; Legacy approach (x87) - Avoid unless necessary
add_doubles_x87:
    fld qword [value1]        ; Push value1 to ST0
    fadd qword [value2]       ; ST0 = ST0 + value2
    fstp qword [result]       ; Pop ST0 to result
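    ret

Compiler Default: For x86-64 targets, GCC, Clang, and MSVC use SSE/SSE2 for float and double arithmetic. GCC emits x87 code only when explicitly requested (-mfpmath=387) or for the 80-bit long double type. On 32-bit x86 targets GCC still defaults to x87 unless SSE math is enabled (-msse2 -mfpmath=sse).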