buzzword
  * Hybrid branch predictors
===== Lecture 19 (04.05 Fri.) =====
  * GPU-based RowHammer
  * Out-of-Order Execution
  * Single-cycle Microarchitectures
  * Intel Pentium M Predictors
  * Binary classifier
  * Global History Register (GHR)
  * Perceptron weights
  * Bias weight
  * Control Dependences
  * Branch delay slot
  * Predicated execution
  * Multipath execution
  * Static Instruction Scheduling
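The perceptron-predictor terms above (binary classifier, GHR, perceptron weights, bias weight) fit together in one formula: predict taken iff the bias plus the dot product of the weights with the history bits is non-negative. A minimal C sketch, with an assumed history length and training threshold (real designs index a table of perceptrons by branch PC):

```c
#include <assert.h>

/* Toy perceptron branch predictor: prediction = sign(bias + sum w[i]*h[i]),
 * where h[i] is +1 (taken) or -1 (not-taken) from the Global History
 * Register (GHR). HLEN and THETA are illustrative values. */
#define HLEN 8
#define THETA 12   /* training threshold: keep training while |y| is small */

typedef struct { int bias; int w[HLEN]; } Perceptron;

static int predict(const Perceptron *p, const int ghr[HLEN]) {
    int y = p->bias;
    for (int i = 0; i < HLEN; i++) y += p->w[i] * ghr[i];
    return y >= 0 ? 1 : -1;                 /* +1 means predict taken */
}

static void train(Perceptron *p, const int ghr[HLEN], int outcome) {
    int y = p->bias;
    for (int i = 0; i < HLEN; i++) y += p->w[i] * ghr[i];
    int pred = y >= 0 ? 1 : -1;
    if (pred != outcome || (y > -THETA && y < THETA)) { /* wrong or weak */
        p->bias += outcome;                 /* bias weight update */
        for (int i = 0; i < HLEN; i++)
            p->w[i] += outcome * ghr[i];    /* strengthen correlated bits */
    }
}
```

After a handful of `train` calls on a branch whose outcome tracks one history bit, the corresponding weight grows and `predict` follows that bit.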
===== Lecture 20 (11.05 Fri.) =====
  * Throwhammer
  * SIMD processing
  * GPU
  * Regular parallelism
  * Single Instruction Single Data (SISD)
  * Single Instruction Multiple Data (SIMD)
  * Multiple Instruction Single Data (MISD)
  * Systolic array
  * Streaming processor
  * Multiple Instruction Multiple Data (MIMD)
  * Multiprocessor
  * Multithreaded processor
  * Data parallelism
  * Array processor
  * Vector processor
  * Very Long Instruction Word (VLIW)
  * Vector register
  * Vector control register
  * Vector length register (VLEN)
  * Vector stride register (VSTR)
  * Prefetching
  * Vector mask register (VMASK)
  * Vector functional unit
  * CRAY-1
  * Seymour Cray
  * Memory interleaving
  * Memory banking
  * Vector memory system
  * Scalar code
  * Vectorizable loops
  * Vector chaining
  * Multi-ported memory
  * Vector stripmining
  * Gather/Scatter operations
  * Masked vector instructions

===== Lecture 21 (17.05 Thu.) =====
  * SIMD processing
  * GPU
  * Flynn’s taxonomy
  * Systolic arrays
  * Micron's Automata Processor
  * VLIW
  * Array processor
  * Vector processor
  * Row/Column major
  * Sparse vector
  * Gather/Scatter operations
  * Address indirection
  * Data parallelism
  * Vector register
  * Vector instruction
  * Vector functional units
  * Memory banks
  * Vectorizable loop
  * Vector Instruction Level Parallelism
  * Automatic code vectorization
  * SIMD ISA extensions
  * Intel Pentium MMX
  * Multimedia registers
  * Programming model
  * Sequential
  * Single-Instruction Multiple Data (SIMD)
  * Multi-threaded
  * Single-Program Multiple Data (SPMD)
  * Execution model
  * Single-Instruction Multiple Thread (SIMT)
  * Warp (wavefront)
  * Warp-level FGMT
  * Shader core
  * Scalar pipeline
  * Latency hiding
  * Interleave warp execution
  * Warp instruction level parallelism
  * Warp-based SIMD vs. Traditional SIMD
  * Control flow path
  * Branch divergence
  * SIMD utilization
  * Dynamic warp formation

===== Lecture 22 (18.05 Fri.) =====
  * GPGPU programming
  * NVIDIA Volta
  * Inherent parallelism
  * Data parallelism
  * GPU main bottlenecks
  * CPU-GPU data transfers
  * DRAM memory
  * Task offloading
  * Serial code (host)
  * Parallel code (device)
  * Bulk synchronization
  * Transparent scalability
  * Memory hierarchy
  * CUDA programming language
  * OpenCL
  * Indexing and memory access
  * Streaming multiprocessor (SM)
  * Streaming processor (SP)
  * Memory coalescing
  * Shared memory tiling
  * Bank conflict
  * Padding
  * GPU computing
  * GPU kernel
  * Massively parallel sections
  * Shared memory
  * Data transfers
  * Kernel launch
  * Latency hiding
  * Occupancy
  * Data reuse
  * SIMD utilization
  * Atomic operations
  * Histogram calculation
  * CUDA streams
  * Asynchronous transfers
  * Overlap of communication and computation

===== Lecture 23a (24.05 Thu.) =====
  * Systolic Arrays
  * High concurrency
  * Balanced computation and I/O memory bandwidth
  * Simple, regular design
  * Processing Elements
  * Decoupled Access Execute (DAE)
  * Image processing
  * Convolution
  * Convolutional layers
  * Convolutional Neural Network
  * AlexNet
  * ImageNet
  * GoogLeNet
  * Stream processing
  * Pipeline parallelism
  * Staged execution
  * WARP Computer
  * Tensor Processing Unit
  * Astronautics ZS-1
  * Loop unrolling

===== Lecture 23b (24.05 Thu.) =====
  * Memory
  * Virtual memory
  * Physical memory
  * Load/store data
  * Random Access Memory (RAM)
  * Static RAM (SRAM)
  * Dynamic RAM (DRAM)
  * Memory array
  * Decoder
  * Wordline
  * Memory bank
  * Sense amplifier

===== Lecture 24 (25.05 Fri.) =====
  * Destructive reads
  * Refresh
  * Capacitor and logic manufacturing technologies
  * DRAM vs SRAM
  * Mature and immature memory technologies
  * Flash
  * Phase Change Memory
  * Magnetic RAM
  * Resistive RAM
  * Memory hierarchy
  * Temporal locality
  * Spatial locality
  * Caching basics
  * Caching in a pipelined design
  * Hierarchical latency analysis
  * Access latency and miss penalty
  * Hit-rate, miss-rate
  * Prefetching
  * Cache line, cache block
  * Placement
  * Replacement
  * Granularity of management
  * Write policy
  * Separation of instruction and data
  * Tag store and data store
  * Cache bookkeeping
  * Tag - index - byte in block
  * Direct mapped cache
  * Conflict misses
  * Set associativity
  * Ways in cache
  * Fully associative cache
  * Degree of associativity
  * Insertion, promotion, and eviction (replacement)
  * Replacement policies
  * Random
  * FIFO
  * Least recently used
  * Not most recently used
  * Least frequently used
  * Implementing LRU
  * Set thrashing
  * conflict misses

===== Lecture 25a (31.05 Thu.) =====
  * Cache Tag
  * Tag Store Entry
  * Valid Bit
  * Dirty Bit
  * Replacement Policy Bit
  * Write-Back Cache
  * Write-Through Cache
  * Cache Coherence
  * Cache Consistency
  * Write Combining
  * (No-)Allocate on Write Miss
  * First-Level Cache
  * Second-Level Cache
  * Last-Level Cache
  * Sub-blocked (Sectored) Caches
  * Instruction Cache
  * Data Cache
  * Unified Instruction and Data Cache
  * Cache Management Policy
  * Cache Hit/Miss Rate
  * Cache Block Size
  * Critical-Word First
  * Working Set
  * Set Associativity
  * Compulsory Cache Miss
  * Capacity Cache Miss
  * Conflict Cache Miss
  * Loop Interchange
  * Loop Fusion
  * Array Merging
  * Shared vs. Private Caches
  * Cache Contention
  * Performance Isolation
  * Quality of Service
  * Starvation
  * Dynamic Cache Partitioning

===== Lecture 25b (31.05 Thu.) =====
  * Virtual Memory
  * Physical Memory
  * Virtual Memory Address
  * Physical Memory Address
  * Code/Data Relocation
  * Memory Isolation
  * Memory Protection
  * Code/Data Sharing
  * Address Indirection
  * Virtual Address Translation
  * x86 Linear Address
  * Virtual Memory Page
  * Physical Memory Frame
  * Page Size
  * Page Table
  * Demand Paging
  * Page Replacement
  * Page Granularity
  * Virtual Page Number
  * Physical Frame Number
  * Page Fault
  * Translation Lookaside Buffer (TLB)
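The translation terms above (virtual page number, physical frame number, page fault) compose into one mechanism: the VPN indexes the page table, a valid entry supplies the PFN, and the page offset passes through unchanged. A single-level C sketch with an assumed 4 KiB page size and tiny table (real systems use multi-level tables and cache entries in the TLB):

```c
#include <assert.h>
#include <stdint.h>

/* Single-level page table walk. PAGE_BITS and NPAGES are illustrative. */
#define PAGE_BITS 12   /* 4 KiB pages */
#define NPAGES    8

typedef struct { int valid; uint32_t pfn; } PTE;

/* Returns the physical address, or -1 to signal a page fault. */
static int64_t translate(const PTE pt[NPAGES], uint32_t vaddr) {
    uint32_t vpn    = vaddr >> PAGE_BITS;              /* virtual page no. */
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1); /* byte in page */
    if (vpn >= NPAGES || !pt[vpn].valid) return -1;    /* page fault */
    return ((int64_t)pt[vpn].pfn << PAGE_BITS) | offset;
}
```

A TLB simply caches recent (VPN, PFN) pairs so this table lookup is skipped on a hit; a miss on an invalid entry is the page fault that triggers demand paging.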
buzzword.1525433921.txt.gz · Last modified: 2019/02/12 16:34 (external edit)