# Spectre Attacks: Exploiting Speculative Execution Paul Kocher [1], Daniel Genkin [2], Daniel Gruss [3], Werner Haas [4], Mike Hamburg [5], Moritz Lipp [3], Stefan Mangard [3], Thomas Prescher [4], Michael Schwarz [3], Yuval Yarom [6] 1 Independent 2 University of Pennsylvania and University of Maryland 3 Graz University of Technology 4 Cyberus Technology 5 Rambus, Cryptography Research Division 6 University of Adelaide and Data61 40th IEEE Symposium on Security and Privacy #### Outline - Attack description - Background - Spectre attack - Spectre variations - Mitigation options - Strengths - Weaknesses - Discussion Spectre exploit gives the attacker the ability to read out the memory of a bug-free victim process - Spectre exploit gives the attacker the ability to read out the memory of a bug-free victim process - Works on Intel, AMD and ARM - Spectre exploit gives the attacker the ability to read out the memory of a bug-free victim process - Works on Intel, AMD and ARM - How? ## Background #### Background: Virtual Memory #### Background: Virtual Memory #### Background: Virtual Memory When two processes cooperate to communicate, not by architecturally defined means but by changing the microarchitectural state in a suitable way - When two processes cooperate to communicate, not by architecturally defined means but by changing the microarchitectural state in a suitable way - Example of state that can be used: - When two processes cooperate to communicate, not by architecturally defined means but by changing the microarchitectural state in a suitable way - Example of state that can be used: - Cache timing - When two processes cooperate to communicate, not by architecturally defined means but by changing the microarchitectural state in a suitable way - Example of state that can be used: - Cache timing - Instruction timing - When two processes cooperate to communicate, not by architecturally defined means but by changing the microarchitectural state in a suitable way - Example of state that can be used: - Cache timing - Instruction timing - ALU contention - When two processes cooperate to communicate, not by architecturally defined means but by changing the microarchitectural state in a suitable way - Example of state that can be used: - Cache timing - Instruction timing - ALU contention - Memory contention Cache Memory Example: cache timing as covert channel - Example: cache timing as covert channel - Sender process has a value it wants to transmit to the receiver process - Example: cache timing as covert channel - Sender process has a value it wants to transmit to the receiver process - Sender changes the cache (loading, evicting) in a value-dependent way - Example: cache timing as covert channel - Sender process has a value it wants to transmit to the receiver process - Sender changes the cache (loading, evicting) in a value-dependent way - Receiver can't see the value in the cache directly but can time the cache and thus infer the value Predicting/Speculating for example: - Predicting/Speculating for example: - Prefetcher (what will be needed in the future) - Predicting/Speculating for example: - Prefetcher (what will be needed in the future) - Branch Predictor (Speculate if direct branch taken or not) - Predicting/Speculating for example: - Prefetcher (what will be needed in the future) - Branch Predictor (Speculate if direct branch taken or not) - Branch Target Buffer/BTB (Speculate what a value will be) - Predicting/Speculating for example: - Prefetcher (what will be needed in the future) - Branch Predictor (Speculate if direct branch taken or not) - Branch Target Buffer/BTB (Speculate what a value will be) - Leads to improved Instruction Level Parallelism - Predicting/Speculating for example: - Prefetcher (what will be needed in the future) - Branch Predictor (Speculate if direct branch taken or not) - Branch Target Buffer/BTB (Speculate what a value will be) - Leads to improved Instruction Level Parallelism - Otherwise CPU would have to sit idle while waiting for results ``` if (slow condition){ //do something } else { //do something } ``` #### Branch predictor ``` if (slow condition){ //do something } else { //do something } ``` ``` if (slow condition){ //do something } else { //do something } ``` ``` if (slow condition) { //do something } else { //do something } Branch predictor ``` ``` if (slow condition) { //do something } else { //do something } Branch predictor ``` Up to 200 instruction ahead - Up to 200 instruction ahead - Revert the result of incorrect execution - Up to 200 instruction ahead - Revert the result of incorrect execution - => No correctness issues? - Up to 200 instruction ahead - Revert the result of incorrect execution - => No correctness issues? - But speculative execution has measurable side effects Locate sequence of instructions which act as covert channel transmitter - Locate sequence of instructions which act as covert channel transmitter - Setup phase: Mistrain CPU into speculatively executing these instructions Second phase: speculatively execute instruction that leak information - Second phase: speculatively execute instruction that leak information - Via syscall/socket/file - Second phase: speculatively execute instruction that leak information - Via syscall/socket/file - Misexecute own code (e.g. sandbox, interpreter, JIT) Final phase: Recover data by retrieving over covert channel - Final phase: Recover data by retrieving over covert channel - Cache - Final phase: Recover data by retrieving over covert channel - Cache - Execution time - Final phase: Recover data by retrieving over covert channel - Cache - Execution time - ALU contention - Final phase: Recover data by retrieving over covert channel - Cache - Execution time - ALU contention - Memory contention - Final phase: Recover data by retrieving over covert channel - Cache - Execution time - ALU contention - Memory contention - Other microarchitectural state # Spectre Variants # Variant 1: Exploiting Conditional Branch Misprediction | Address | Value | |---------|-------| | 000 | 4 | | 001 | 13 | | 010 | 9 | | 011 | 2 - | | 100 | 12 | | 101 | 75 | | 110 | 24 | | 111 | 87 | #### Misprediction We want to find out what a certain byte in the virtual address of the victim is | Address | Value | | |---------|-------|-----| | 000 | 4 | | | 001 | 13 | | | 010 | 9 | | | 011 | 2 | — k | | 100 | 12 | | | 101 | 75 | | | 110 | 24 | | | 111 | 87 | | #### Misprediction - We want to find out what a certain byte in the virtual address of the victim is - Let's call this secret byte k | Address | Value | | |---------|-------|-----| | 000 | 4 | | | 001 | 13 | | | 010 | 9 | | | 011 | 2 | — k | | 100 | 12 | | | 101 | 75 | | | 110 | 24 | | | 111 | 87 | | # Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` ## Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Locate a conditional which matches this pattern in the software you want to attack ## Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` - Locate a conditional which matches this pattern in the software you want to attack - Setup phase: ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` - Locate a conditional which matches this pattern in the software you want to attack - Setup phase: - call many times with some x < array1\_size to mistrain branch predictor ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` - Locate a conditional which matches this pattern in the software you want to attack - Setup phase: - call many times with some x < array1\_size to mistrain branch predictor - Evict array1\_size and array2, but leave secret byte k in cache ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` ## Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Second phase: Choose x out-of-bounds such that array1[x] resolves to secret byte ## Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` **Attacker** Victim | | Tag | Value | Tag | Value | |-------|-----|-------|-----|-------| | Set 0 | | | | | | Set 1 | 01 | 2 | | | | | Address | Value | |----------|---------|-------| | | 000 | 4 | | array1 | 001 | 13 | | | 010 | 9 | | | 011 | 2 k | | | 100 | 12 | | array2 | 101 | 75 | | | 110 | 24 | | <u> </u> | 111 | 87 | ## Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Attacker Victim | | Tag | Value | Tag | Value | |-------|-----|-------|-----|-------| | Set 0 | | | | | | Set 1 | 01 | 2 | | | | | Address | Value | |--------|------------|----------| | array1 | 000 | 4 | | array1 | 001 | 13 | | | 010 | 9 | | | 011 | 2 k | | | 100 | 12 | | 1 | 101 | 75 | | array2 | 110 | 24 | | • | 111 | 87 | | array2 | 101<br>110 | 75<br>24 | ## Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Attacker Victim | | Tag | Value | Tag | Value | |-------|-----|-------|-----|-------| | Set 0 | | | | | | Set 1 | 01 | 2 | | | | | Address | Value | | |----------|---------|-------|--| | | 000 | 4 | | | array1 | 001 | 13 | | | | 010 | 9 | | | | 011 | 2 k | | | | 100 | 12 | | | | 101 | 75 | | | array2 | 110 | 24 | | | <u> </u> | 111 | 87 | | | array2 | 110 | 24 | | ## Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Attacker Victim | | Tag | Value | Tag | Value | |-------|-----|-------|-----|-------| | Set 0 | | | | | | Set 1 | 01 | 2 | | | | | Address | Value | | |----------|---------|-------|--| | | 000 | 4 | | | array1 | 001 | 13 | | | | 010 | 9 | | | | 011 | 2 k | | | | 100 | 12 | | | | 101 | 75 | | | array2 | 110 | 24 | | | <u> </u> | 111 | 87 | | | array2 | 110 | 24 | | ## Misprediction #### Cache miss ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Attacker Victim | | Tag | Value | Tag | Value | |-------|-----|-------|-----|-------| | Set 0 | | | | | | Set 1 | 01 | 2 | | | | | Address | Value | |----------|---------|-------| | array1 | 000 | 4 | | array1 | 001 | 13 | | | 010 | 9 | | | 011 | 2 k | | | 100 | 12 | | array2 | 101 | 75 | | | 110 | 24 | | <u> </u> | 111 | 87 | | | | | ## Misprediction Cache miss ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Attacker Victim | | Tag | Value | Tag | Value | |-------|-----|-------|-----|-------| | Set 0 | | | | | | Set 1 | 01 | 2 | | | | | Address | Value | |--------|---------|-------| | rray1 | 000 | 4 | | | 001 | 13 | | | 010 | 9 | | | 011 | 2 k | | | 100 | 12 | | array2 | 101 | 75 | | | 110 | 24 | | | 111 | 87 | | | | | ## Misprediction Cache miss ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Attacker Victim | | Tag | Value | Tag | Value | |-------|-----|-------|-----|-------| | Set 0 | | | | | | Set 1 | 01 | 2 | | | | | | Address | Value | | | |--------|--------|---------|-------|--|--| | rray1 | rav1 | 000 | 4 | | | | | i ay i | 001 | 13 | | | | | | 010 | 9 | | | | | | 011 | 2 k | | | | | | 100 | 12 | | | | array2 | | 101 | 75 | | | | | rray2 | 110 | 24 | | | | | | 111 | 87 | | | | | | | | | | ## Misprediction Cache miss ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` Attacker Victim | x = 4 | | | | | | Address | Value | |-------|-----|-------|-----|-------|--------|---------|-------| | | | | | | crav1 | 000 | 4 | | | | | | | rray1 | 001 | 13 | | | Tag | Value | Tag | Value | | 010 | 9 | | Set 0 | | | | | | 011 | 2 k | | Set 1 | 01 | 2 | | | | 100 | 12 | | | | | | | | 101 | 75 | | | | | | | array2 | 110 | 24 | | | | | | | | 111 | 87 | ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` ## Misprediction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` #### Final phase: ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` - Final phase: - If array2 is readable by attacker, load array2[n] for all n, will be fast for n==k ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` - Final phase: - If array2 is readable by attacker, load array2[n] for all n, will be fast for n==k - Otherwise detect eviction ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` - Final phase: - If array2 is readable by attacker, load array2[n] for all n, will be fast for n==k - Otherwise detect eviction - Prime & Probe [1] ``` if (x < array1_size) y = array2[array1[x]];</pre> ``` - Final phase: - If array2 is readable by attacker, load array2[n] for all n, will be fast for n==k - Otherwise detect eviction - Prime & Probe [1] - Call method again with in bounds value of x, if array1[x'] == k, will be fast In C: - In C: - Wrote a victim function which has the pattern - In C: - Wrote a victim function which has the pattern - Attacker function - In C: - Wrote a victim function which has the pattern - Attacker function - Were able to read out address space at 10 kB/second - In C: - Wrote a victim function which has the pattern - Attacker function - Were able to read out address space at 10 kB/second - In JS: - In C: - Wrote a victim function which has the pattern - Attacker function - Were able to read out address space at 10 kB/second - In JS: - JS gets JITed and bounds checks inserted - In C: - Wrote a victim function which has the pattern - Attacker function - Were able to read out address space at 10 kB/second - In JS: - JS gets JITed and bounds checks inserted - Works even though no high res. timer available #### In C: - Wrote a victim function which has the pattern - Attacker function - Were able to read out address space at 10 kB/second - In JS: - JS gets JITed and bounds checks inserted - Works even though no high res. timer available - Able to read out browser's address space Locate gadgets whose execution will leak the chosen memory in the process you want to attack (either source code or in shared library) - Locate gadgets whose execution will leak the chosen memory in the process you want to attack (either source code or in shared library) - Example gadget: - Locate gadgets whose execution will leak the chosen memory in the process you want to attack (either source code or in shared library) - Example gadget: - add R2, [R1] - Locate gadgets whose execution will leak the chosen memory in the process you want to attack (either source code or in shared library) - Example gadget: - add R2, [R1] - mov R3, [R2] Branch Target Buffer jmp //slow computation Branch Target Buffer → jmp //slow computation Branch Target Buffer Start evaluating → jmp //slow computation # Background: Speculative Execution Branch Target Buffer → jmp //slow computation Ask BTB Predicted target address Branch Target Buffer # Background: Speculative Execution Branch Target Buffer Setup phase: - Setup phase: - Train branch target buffer in attacker thread to jump to the chosen gadget's virtual address - Setup phase: - Train branch target buffer in attacker thread to jump to the chosen gadget's virtual address - This works because the BTB is unaware/doesn't care about process ids - Setup phase: - Train branch target buffer in attacker thread to jump to the chosen gadget's virtual address - This works because the BTB is unaware/doesn't care about process ids - Example of indirect branch: - Setup phase: - Train branch target buffer in attacker thread to jump to the chosen gadget's virtual address - This works because the BTB is unaware/doesn't care about process ids - Example of indirect branch: - jmp eax Second phase: - Second phase: - Victim speculatively jumps to gadget, which then leaks information - Second phase: - Victim speculatively jumps to gadget, which then leaks information - Final phase: Recover over covert channel Creates random key, calls sleep, reads from file, calls crypto - Creates random key, calls sleep, reads from file, calls crypto - When compiled with optimization, Sleep() gets made with file data in registers ebx and edi - Creates random key, calls sleep, reads from file, calls crypto - When compiled with optimization, Sleep() gets made with file data in registers ebx and edi - Found in ntdll.dll - Creates random key, calls sleep, reads from file, calls crypto - When compiled with optimization, Sleep() gets made with file data in registers ebx and edi - Found in ntdll.dll - adc edi,dword ptr [ebx+edx+13BE13BDh] - Creates random key, calls sleep, reads from file, calls crypto - When compiled with optimization, Sleep() gets made with file data in registers ebx and edi - Found in ntdll.dll - adc edi,dword ptr [ebx+edx+13BE13BDh] - adc dl,byte ptr [edi] Branch to mistrain found in Sleep() - Branch to mistrain found in Sleep() - "jmp dword ptr ds:[76AE0078h]" - Branch to mistrain found in Sleep() - "jmp dword ptr ds:[76AE0078h]" - Speed: 41 B/s # Other Methods of achieving speculative execution # Other Methods of achieving speculative #### execution Mistraining return instructions [1] # Other Methods of achieving speculative #### execution - Mistraining return instructions [1] - Return from interrupts Evict+Time Evict+Time if (false but mispredicts as true) read array1[R1] read [R2] - Evict+Time - if (false but mispredicts as true) read array1[R1] read [R2] - Instruction Timing Evict+Time ``` if (false but mispredicts as true) read array1[R1] read [R2] ``` Instruction Timing if (false but misbredicts as true) multiply R1, R2 multiply R3, R4 Evict+Time ``` if (false but mispredicts as true) read array1[R1] read [R2] ``` Instruction Timing if (false but mispredicts as true) multiply R1, R2 multiply R3, R4 Contention on the Register File Turn off speculation - Turn off speculation - =>very large performance impact - Turn off speculation - =>very large performance impact - Itanium, Mill architectures not vulnerable - Turn off speculation - =>very large performance impact - Itanium, Mill architectures not vulnerable - Retpoline [1] - Turn off speculation - =>very large performance impact - Itanium, Mill architectures not vulnerable - Retpoline [1] - swaps indirect branches for returns, to avoid using predictions which come from the BTB - Turn off speculation - =>very large performance impact - Itanium, Mill architectures not vulnerable - Retpoline [1] - swaps indirect branches for returns, to avoid using predictions which come from the BTB - Masking, etc. - Turn off speculation - =>very large performance impact - Itanium, Mill architectures not vulnerable - Retpoline [1] - swaps indirect branches for returns, to avoid using predictions which come from the BTB - Masking, etc. - Addressing Spectre Variant 1 (CVE-2017-5753) in Software [2] Halt speculative execution on potentially sensitive execution paths - Halt speculative execution on potentially sensitive execution paths - Not enough only on security-critical code because non-security-critical code in same process. - Halt speculative execution on potentially sensitive execution paths - Not enough only on security-critical code because non-security-critical code in same process. - Compiler can't find automatically [1] - Halt speculative execution on potentially sensitive execution paths - Not enough only on security-critical code because non-security-critical code in same process. - Compiler can't find automatically [1] - Need to recompile (what about legacy software?) - Halt speculative execution on potentially sensitive execution paths - Not enough only on security-critical code because non-security-critical code in same process. - Compiler can't find automatically [1] - Need to recompile (what about legacy software?) - Flush branch prediction state on context switch Countermeasures limited to cache likely insufficient - Countermeasures limited to cache likely insufficient - = > different microarchitectural state can be used to leak information First to exploit speculative execution - First to exploit speculative execution - Developers now need to know about microarchitecture to code non-vulnerable software! # Strengths Hits at fundamentals of modern CPU - Hits at fundamentals of modern CPU - No good mitigations - Hits at fundamentals of modern CPU - No good mitigations - Non-buggy software affected - Hits at fundamentals of modern CPU - No good mitigations - Non-buggy software affected - Can be used by JS - Hits at fundamentals of modern CPU - No good mitigations - Non-buggy software affected - Can be used by JS - And even remotely: - Hits at fundamentals of modern CPU - No good mitigations - Non-buggy software affected - Can be used by JS - And even remotely: - NetSpectre: Read Arbitrary Memory over Network by Michael Schwarz, Martin Schwarzl, Moritz Lipp, Daniel Gruss - Hits at fundamentals of modern CPU - No good mitigations - Non-buggy software affected - Can be used by JS - And even remotely: - NetSpectre: Read Arbitrary Memory over Network by Michael Schwarz, Martin Schwarzl, Moritz Lipp, Daniel Gruss - Very well written paper - Hits at fundamentals of modern CPU - No good mitigations - Non-buggy software affected - Can be used by JS - And even remotely: - NetSpectre: Read Arbitrary Memory over Network by Michael Schwarz, Martin Schwarzl, Moritz Lipp, Daniel Gruss - Very well written paper - Gives a refresher on virtual memory, caches, CPU architecture ## Weaknesses Have to target specific application and find gadgets in those - Have to target specific application and find gadgets in those - => But one can search in shared libraries - Have to target specific application and find gadgets in those - = > But one can search in shared libraries - Doesn't tell us the speed of the Javascript implementation MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols by Caroline Trippel, Daniel Lustig, Margaret Martonosi - MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols by Caroline Trippel, Daniel Lustig, Margaret Martonosi - Trusted Browsers for Uncertain Times by David Kohlbrenner and Hovav Shacham - MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols by Caroline Trippel, Daniel Lustig, Margaret Martonosi - Trusted Browsers for Uncertain Times by David Kohlbrenner and Hovav Shacham - Foreshadow: Extracting the Keys to the {Intel SGX} Kingdom with Transient Out-of-Order Execution by Van Bulck, Jo and Minkin, Marina and Weisse, Ofir and Genkin, Daniel and Kasikci, Baris and Piessens, Frank and Silberstein, Mark and Wenisch, Thomas F. and Yarom, Yuval and Strackx, Raoul - MeltdownPrime and SpectrePrime: Automatically-Synthesized Attacks Exploiting Invalidation-Based Coherence Protocols by Caroline Trippel, Daniel Lustig, Margaret Martonosi - Trusted Browsers for Uncertain Times by David Kohlbrenner and Hovav Shacham - Foreshadow: Extracting the Keys to the {Intel SGX} Kingdom with Transient Out-of-Order Execution by Van Bulck, Jo and Minkin, Marina and Weisse, Ofir and Genkin, Daniel and Kasikci, Baris and Piessens, Frank and Silberstein, Mark and Wenisch, Thomas F. and Yarom, Yuval and Strackx, Raoul - A Systematic Evaluation of Transient Execution Attacks and Defenses by Claudio Canella, Jo Van Bulck, Michael Schwarz, Moritz Lipp, Benjamin von Berg, Philipp Ortner, Frank Piessens, Dmitry Evtyushkin, Daniel Gruss 100 #### Discussion Starters #### Discussion Starters Is there a fundamental tradeoff between security and speed? #### Discussion Starters - Is there a fundamental tradeoff between security and speed? - Can Spectre be fixed in hardware?