buzzword
Revisions: buzzword [2021/11/05 18:47] nadigr → buzzword [2022/01/03 11:06] (current) loisor
  * Banks

===== Lecture 7 (21.10 Thu.) =====
  * Data Movement
  * Processing in memory (PIM)
===== Lecture 11 (4.11 Thu.) =====

  * Maslow’s Hierarchy
  * Low latency
  * Memory bottleneck
  * Data-centric (Memory-centric) architectures
  * DRAM
  * DDR3
  * 3D-Stacked DRAM
  * Runahead Execution
  * Sense Amplifier
  * DRAM cell
  * DRAM bank
  * DRAM chip
  * Tiered Latency DRAM
  * Variable Latency DRAM
  * CROW (The Copy Row Substrate)
  * CLR-DRAM
  * SALP
  * Global Row-buffer
  * Local Row-buffer
  * Timing margins
  * Process variation
  * Worst-case
  * Adaptive-latency
  * DRAM characterization
  * SoftMC
  * Restore time
  * AL-DRAM
  * Latency variation
  * Flexible-Latency DRAM
  * Solar DRAM
  * Physical Unclonable Function (PUF)
  * True Random Number Generator
  * Refresh Latency
  * ChargeCache
  * Vampire DRAM

===== Lecture 12a (5.11 Fri.) =====
  * Subarray-Level Parallelism
  * Main Memory Interference

===== Lecture 12b (5.11 Fri.) =====
  * Self-Optimizing DRAM Controller
  * FCFS (First Come First Served)
  * FR-FCFS (First Ready, First Come First Served)
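The FCFS vs. FR-FCFS distinction above can be sketched in a few lines (illustrative Python, not from the lecture slides; the request format and row names are made up):

```python
# Each request is (arrival_time, row); the scheduler picks the next request
# for a bank whose row buffer currently holds open_row.

def fcfs(queue, open_row):
    # FCFS: oldest request first, regardless of row-buffer state.
    return min(queue, key=lambda r: r[0])

def fr_fcfs(queue, open_row):
    # FR-FCFS: row-buffer hits ("first ready") beat misses; ties broken by age.
    return min(queue, key=lambda r: (r[1] != open_row, r[0]))

queue = [(0, "rowA"), (1, "rowB"), (2, "rowB")]
print(fcfs(queue, "rowB"))     # (0, 'rowA') — oldest request
print(fr_fcfs(queue, "rowB"))  # (1, 'rowB') — oldest row-buffer hit
```

FR-FCFS improves row-buffer locality but can starve threads with poor locality, which motivates the fairer schedulers listed below.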
+ | |||
+ | ===== Lecture 13 (11.11 Thu.) ===== | ||
+ | * DRAM Timing Constraints | ||
+ | * DRAM Refresh | ||
+ | * Memory Controller | ||
+ | * Resource Sharing | ||
+ | * Multi-Core Systems | ||
+ | * Partitioning | ||
+ | * Resource Contention | ||
+ | * Performance isolation | ||
+ | * Quality of service (QoS) | ||
+ | * Fairness | ||
+ | * Shared Cache | ||
+ | * Shared Resource Management | ||
+ | * Inter-Thread/ | ||
+ | * Unfair Slowdown | ||
+ | * Memory Performance Attack | ||
+ | * Memory Performance Hog | ||
+ | * Stream Access | ||
+ | * Random Access | ||
+ | * Memory Scheduling Policy | ||
+ | * Denial of Service (DoS) | ||
+ | * Service-Level Aggreement (SLA) | ||
+ | * Distributed DoS | ||
+ | * Networked Multi-Core Systems | ||
+ | * Interconnnect | ||
+ | * QoS-Aware Memory Systems | ||
+ | * Prioritization | ||
+ | * Data Mapping | ||
+ | * Core/Source Throttling | ||
+ | * Application/ | ||
+ | * QoS-Aware Memory Scheduling | ||
+ | * DRAM-Related Stall Time | ||
+ | * Memory Slowdown | ||
+ | * Stall-Time Fair Memory Scheduler (STFM) | ||
+ | * Parallelism-Aware Batch Scheduling (PAR-BS) | ||
+ | * Memory-Level Parallelism (MLP) | ||
+ | * Out-of-Order Execution | ||
+ | * Non-Blocking Cache | ||
+ | * Runahead Execution | ||
+ | * Bank-Level Parallelism | ||
+ | * Request Batching | ||
+ | * ATLAS (Adaptive per-Thread Least Attained Service Scheduling) | ||
+ | * Thread Cluster Memory Scheduling (TCM) | ||
+ | * Starvation | ||
+ | * MPKI (Misses per Kiloinstruction) | ||
+ | * Row-Buffer Locality | ||
+ | |||
+ | ===== Lecture 14a (12.11 Fri.) ===== | ||
+ | * Resource Sharing | ||
+ | * Multi-Core Systems | ||
+ | * Partitioning | ||
+ | * Resource Contention | ||
+ | * Performance isolation | ||
+ | * Quality of service (QoS) | ||
+ | * Fairness | ||
+ | * Shared Cache | ||
+ | * Shared Resource Management | ||
+ | * Inter-Thread/ | ||
+ | * Unfair Slowdown | ||
+ | * Memory Performance Attack | ||
+ | * Memory Performance Hog | ||
+ | * Stream Access | ||
+ | * Random Access | ||
+ | * Memory Scheduling Policy | ||
+ | * Denial of Service (DoS) | ||
+ | * Service-Level Aggreement (SLA) | ||
+ | * Distributed DoS | ||
+ | * Networked Multi-Core Systems | ||
+ | * Interconnnect | ||
+ | * QoS-Aware Memory Systems | ||
+ | * Prioritization | ||
+ | * Data Mapping | ||
+ | * Core/Source Throttling | ||
+ | * Application/ | ||
+ | * QoS-Aware Memory Scheduling | ||
+ | * DRAM-Related Stall Time | ||
+ | * Memory Slowdown | ||
+ | * Stall-Time Fair Memory Scheduler (STFM) | ||
+ | * Parallelism-Aware Batch Scheduling (PAR-BS) | ||
+ | * Memory-Level Parallelism (MLP) | ||
+ | * Out-of-Order Execution | ||
+ | * Non-Blocking Cache | ||
+ | * Runahead Execution | ||
+ | * Bank-Level Parallelism | ||
+ | * Request Batching | ||
+ | * ATLAS (Adaptive per-Thread Least Attained Service Scheduling) | ||
+ | * Thread Cluster Memory Scheduling (TCM) | ||
+ | * Starvation | ||
+ | * MPKI (Misses per Kiloinstruction) | ||
+ | * Row-Buffer Locality | ||
+ | |||
+ | ===== Lecture 14b (12.11 Fri.) ===== | ||
+ | * Emerging memory technologies | ||
+ | * Charge memory | ||
+ | * Resistive memory technologies | ||
+ | * Phase Change Memory (PCM) | ||
+ | * STT-MRAM | ||
+ | * Memristor | ||
+ | * RRAM/ReRAM | ||
+ | * Non-volatile | ||
+ | * Multi-Level Cell PCM (MLC-PCM) | ||
+ | * Endurance | ||
+ | * Reliability | ||
+ | * Intel Optane Memory | ||
+ | * 3D-XPoint Technology | ||
+ | * Read Asymmetry | ||
+ | * Magnetic Tunnel Junction (MTJ) device | ||
+ | * Hybrid main memory | ||
+ | * DRAM buffer/DRAM cache | ||
+ | * Data placement | ||
+ | * Row buffer | ||
+ | * Memory-Level Parallelism (MLP) | ||
+ | * Translation Lookaside Buffer (TLB) | ||
+ | * Page Table | ||
+ | * In-memory bulk bitwise operations | ||
+ | * In-memory crossbar array operations | ||
+ | * Analog computation | ||
+ | * Digital to Analog Converter (DAC) | ||
+ | * Analog to Digital Converter (ADC) | ||
+ | * NVM-based PIM system | ||
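The idea behind in-memory crossbar array operations above can be sketched numerically: with cell conductances G[i][j] and row input voltages V[i], each column current is the dot product I[j] = Σᵢ V[i]·G[i][j] (Ohm's and Kirchhoff's laws). All names and values here are illustrative, not from the lecture:

```python
def crossbar_column_currents(G, V):
    # Analog dot product: each column wire sums current from every row cell.
    cols = len(G[0])
    return [sum(V[i] * G[i][j] for i in range(len(G))) for j in range(cols)]

G = [[1.0, 0.5],
     [2.0, 1.5]]   # conductances programmed into resistive cells
V = [1.0, 2.0]     # row input voltages (applied e.g. via a DAC)
print(crossbar_column_currents(G, V))  # [5.0, 3.5], read out via ADCs
```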
+ | |||
+ | ===== Lecture 16a (19.11 Fri.) ===== | ||
+ | * RowHammer Vulnerability | ||
+ | * RowHammer Protection | ||
+ | * DRAM | ||
+ | * Target Row Buffer (TRR) | ||
+ | * Data retention failure | ||
+ | * DRAM Refresh | ||
+ | * Bit flip | ||
+ | * Aggressor row | ||
+ | * U-TTR | ||
+ | * Row Scout | ||
+ | * TRR Analyzer | ||
+ | * Retention time | ||
+ | * DRAM Access pattern | ||
+ | * SoftMC | ||
+ | * FPGA | ||
+ | * TREF | ||
+ | * Dummy row hammer | ||
+ | * ECC | ||
+ | * Memory Controller | ||
+ | |||
+ | ===== Lecture 16b (19.11 Fri.) ===== | ||
+ | |||
+ | * RowHammer | ||
+ | * DRAM | ||
+ | * Activate | ||
+ | * Precharge | ||
+ | * Temperature | ||
+ | * Aggressor Row Active Time | ||
+ | * Victim Cell | ||
+ | * Physical Location | ||
+ | * SoftMC | ||
+ | * FPGA | ||
+ | * DRAM Refresh | ||
+ | * Bit flips | ||
+ | * Variation | ||
+ | * Spatial variation across columns | ||
+ | * RowHammer attacks | ||
+ | * RowHammer defences | ||
+ | |||
+ | ===== Lecture 16c (19.11 Fri.) ===== | ||
+ | |||
+ | * RowHammer | ||
+ | * Preventing RowHammer | ||
+ | * DRAM | ||
+ | * Bit flip | ||
+ | * DRAM Refresh | ||
+ | * Activate | ||
+ | * Precharge | ||
+ | * Physical isolation | ||
+ | * Reactive refresh | ||
+ | * Proactive throttling | ||
+ | * Scalability | ||
+ | * Compatibility | ||
+ | * Victim row | ||
+ | * Aggressor row | ||
+ | * RowBlocker | ||
+ | * Bloom filter | ||
+ | * AttackThrottler | ||
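A minimal sketch of the Bloom filter listed above, the kind of structure a RowBlocker-style defense can use to track potential aggressor rows in bounded storage. Sizes and hash choices are illustrative assumptions, not the paper's parameters:

```python
class BloomFilter:
    def __init__(self, m=64, k=3):
        self.m, self.k = m, k
        self.bits = [False] * m  # m-bit array, k hash functions

    def _hashes(self, item):
        # k independent hash positions derived from Python's built-in hash.
        return [hash((i, item)) % self.m for i in range(self.k)]

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h] = True

    def maybe_contains(self, item):
        # May return a false positive; never a false negative.
        return all(self.bits[h] for h in self._hashes(item))

bf = BloomFilter()
bf.add("row_0x1A")
print(bf.maybe_contains("row_0x1A"))  # True
```

False positives only cause harmless extra refreshes or throttling, which is why the one-sided error of a Bloom filter fits this use case.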
+ | |||
+ | ===== Lecture 16d (19.11 Fri.) ===== | ||
+ | |||
+ | * DRAM | ||
+ | * On-die ECC | ||
+ | * Memory error | ||
+ | * Data retention | ||
+ | * Write recovery | ||
+ | * Variable retention time | ||
+ | * Single-bit errors | ||
+ | * ECC encoder | ||
+ | * ECC decoder | ||
+ | * Error profile | ||
+ | * Error-prone data store | ||
+ | * Uncorrectable error | ||
+ | * At-risk bits | ||
+ | * Data patterns | ||
+ | * Memory controller | ||
+ | * Monte-carlo simulation | ||
+ | * Data retention | ||
+ | |||
+ | |||
+ | ===== Lecture 17a (25.11 Thu.) ===== | ||
+ | * Phase-Change Memory (PCM) | ||
+ | * Spin-Transfer Torque Magnetic Random-Access Memory (SST-MRAM) | ||
+ | * Hybrid main memory | ||
+ | * Data placement | ||
+ | * Memory-Level Parallelism (MLP) | ||
+ | * Performance-critical data | ||
+ | * Dynamic data transfer granularity | ||
+ | * DRAM cache | ||
+ | * Processing tightly-coupled with memory | ||
+ | * Processing using memory | ||
+ | * In-DRAM bulk bitwise operations | ||
+ | * NVM-based PIM system | ||
+ | * In-memory crossbar array operations | ||
+ | * Two-level memory/ | ||
+ | * Unified memory/ | ||
+ | * Persistent memory | ||
+ | * Intel Optane Persistent Memory | ||
+ | * UPMEM Processing-in-DRAM Engine | ||
+ | * Crash consistency problem | ||
+ | * Flash memory | ||
+ | |||
+ | ===== Lecture 17b (25.11 Thu.) ===== | ||
+ | * Heterogeneity (Asymmetry) | ||
+ | * Specialization | ||
+ | * Customization | ||
+ | * CRAY-1 | ||
+ | * Throughput | ||
+ | * Fairness | ||
+ | * Heterogeneous retention times in DRAM | ||
+ | * Heterogeneous interconnects | ||
+ | * General-Purpose | ||
+ | * Special-Purpose | ||
+ | * Amdahl’s Law | ||
+ | * Serialized code sections | ||
+ | * Asymmetric Chip Multiprocessor (ACMP) | ||
+ | * Accelerating serial bottlenecks | ||
+ | * Accelerating parallel bottlenecks | ||

===== Lecture 18 (26.11 Fri.) =====
  * Amdahl’s Law
  * Parallelizable fraction of a program
  * Serial bottleneck
  * Synchronization overhead
  * Load imbalance overhead
  * Resource sharing overhead
  * Critical section
  * Asymmetric multi-core (ACMP)
  * Symmetric CMP (SCMP)
  * Accelerated Critical Sections (ACS)
  * Selective Acceleration of Critical Sections (SEL)
  * Critical Section Request Buffer (CSRB)
  * Cache misses for private data
  * Cache misses for shared data
  * Equal-area comparison
  * Bottleneck Identification and Scheduling (BIS)
  * Thread waiting cycles (TWC)
  * Bottleneck Table (BT)
  * Scheduling Buffers (SB)
  * Acceleration Index Tables (AIT)
  * The critical path
  * Feedback-Directed Pipelining (FDP)
  * Comprehensive fine-grained bottleneck acceleration
  * Lagging threads
  * Multiple applications
  * Criticality of code segments
  * Utility-Based Acceleration (UBA)
  * Global criticality of the segment
  * Fraction of execution time spent on segment
  * Local speedup of the segment
  * Data marshaling
  * Staged execution model
  * Segment spawning
  * Producer-Consumer Pipeline Parallelism
  * Locality of inter-segment data
  * Generator instruction
  * Marshal buffer
  * Pipeline parallelism
  * Aggressive stream prefetcher
  * Energy expended per instruction (EPI)
  * Dynamic voltage frequency scaling (DVFS)
  * Frequency Boosting
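Amdahl's Law and the serial bottleneck above can be made concrete with a worked instance (illustrative values): speedup on n cores when a fraction p of the program is parallelizable.

```python
def amdahl_speedup(p, n):
    # The serial fraction (1 - p) bounds overall speedup no matter how
    # large n grows — the serial bottleneck.
    return 1.0 / ((1.0 - p) + p / n)

print(round(amdahl_speedup(0.9, 10), 2))   # ≈ 5.26 with 10 cores
print(round(amdahl_speedup(0.9, 1e9)))     # ≈ 10: capped by the serial 10%
```

This cap on speedup is exactly what ACMP-style designs attack: spend area on one big core to accelerate the serial fraction.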
+ | |||
+ | ===== Lecture 19a (2.12 Thu.) ===== | ||
+ | |||
+ | * Multiprocessing | ||
+ | * Multiprocessor | ||
+ | * Multithreading | ||
+ | * Memory Consistency | ||
+ | * Cache Coherence | ||
+ | * Flynn’s Taxonomy | ||
+ | * SISD | ||
+ | * SIMD | ||
+ | * MISD | ||
+ | * MIMD | ||
+ | * Parallelism | ||
+ | * Instruction Level Parallelism | ||
+ | * Data Parallelism | ||
+ | * Task Level Parallelism | ||
+ | * Loosely Coupled Multiprocessors | ||
+ | * Tightly Coupled (Or Symmetric) Multiprocessors | ||
+ | * Hardware-Based Multithreading | ||
+ | * Parallel Speedup | ||
+ | * Superlinear Speedup | ||
+ | * Utilization | ||
+ | * Redundancy | ||
+ | * Efficiency | ||
+ | * Amdahl’s Law | ||
+ | * Sequential Bottleneck | ||
+ | * Synchronization | ||
+ | * Load Imbalance | ||
+ | * Resource Contention | ||
+ | * Critical Sections | ||
+ | * Barriers | ||
+ | * Stages Of Pipelined Programs | ||
+ | * Array Processor | ||
+ | * Vector Processor | ||
+ | * Systolic Array Processor | ||
+ | * Streaming Processor | ||
+ | * Parallel Computation | ||
+ | * Fault-Tolerance | ||
+ | * Power Equation (P = C*V^2*F) | ||
+ | * Dennard Scaling | ||
+ | * Instruction-Level Parallelism | ||
+ | * Task-Level Parallelism | ||
+ | * Pipelining | ||
+ | * Out-Of-Order Execution | ||
+ | * Speculative Execution | ||
+ | * VLIW: Very Long Instruction Word | ||
+ | * Dataflow | ||
+ | * Superscalar Execution | ||
+ | * Thread-Level Speculation | ||
+ | * Threads Vs. Processes | ||
+ | * Shared Global Memory Address Space | ||
+ | * Shared Resource Management | ||
+ | * Hardware-Based Multithreading | ||
+ | * Cache Coherence | ||
+ | * Memory Consistency | ||
+ | * Load Imbalance | ||
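The power equation listed above, P = C*V^2*F (dynamic power), rewards scaling voltage and frequency together — the basis of DVFS. A worked instance with illustrative values:

```python
def dynamic_power(C, V, f):
    # Dynamic switching power: capacitance * voltage^2 * frequency.
    return C * V ** 2 * f

p_high = dynamic_power(1e-9, 1.2, 3e9)    # 1 nF, 1.2 V, 3 GHz
p_low  = dynamic_power(1e-9, 0.9, 1.5e9)  # V and f scaled down together
print(round(p_high, 2), round(p_low, 3))  # 4.32 W vs. 1.215 W
```

Because V typically scales with f, power falls roughly cubically with frequency, which is why frequency boosting a single core is so power-hungry.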

===== Lecture 19b (2.12 Thu.) =====

  * Memory ordering
  * Memory consistency
  * Parallel computer architecture
  * Multiprocessor operation
  * MIMD (multiple instruction, multiple data)
  * Performance-correctness trade-off
  * Cache coherence
  * Ordering of operations
  * Local ordering
  * Global ordering
  * Memory fence instruction
  * Out-of-order execution
  * Mutual exclusion
  * Protecting shared data
  * Critical section
  * Sequential consistency
  * Weaker memory consistency
  * Dataflow processor
+ | |||
+ | |||
+ | ===== Lecture 20 (03.12 Fri.) ===== | ||
+ | |||
+ | * Cache coherence | ||
+ | * Memory consistency | ||
+ | * Shared memory model | ||
+ | * Software coherence | ||
+ | * Coarse-grained (page-level) | ||
+ | * Non-cacheable | ||
+ | * Fine-grained (cache flush) | ||
+ | * Hardware coherence | ||
+ | * Valid/ | ||
+ | * Write propagation | ||
+ | * Write serialization | ||
+ | * Update vs. Invalid | ||
+ | * Snoopy bus | ||
+ | * Directory | ||
+ | * Exclusive bit | ||
+ | * Directory optimizations (bypassing) | ||
+ | * Snoopy cache | ||
+ | * Shared bus | ||
+ | * VI protocol | ||
+ | * MSI (Modified, Shared, Invalid) | ||
+ | * Exclusive state | ||
+ | * MESI (Modified, Exclusive, Shared, Invalid) | ||
+ | * Illinois Protocol (MESI) | ||
+ | * Broadcast | ||
+ | * Bus request | ||
+ | * Downgrade/ | ||
+ | * Snoopy invalidation | ||
+ | * Cache-to-cache transfer | ||
+ | * Writeback | ||
+ | * MOESI (Modified, Owned, Exclusive, Shared, Invalid) | ||
+ | * Directory coherence | ||
+ | * Race conditions | ||
+ | * Totally-ordered interconnect | ||
+ | * Directory-based protocols | ||
+ | * Set inclusion test | ||
+ | * Linked list | ||
+ | * Bloom filters | ||
+ | * Contention resolution | ||
+ | * Ping-ponging | ||
+ | * Synchronization | ||
+ | * Shared-data-structure | ||
+ | * Token Coherence | ||
+ | * Coherence for NDAs | ||
+ | * Optimistic execution | ||
+ | * Signature | ||
+ | * Commit/ | ||
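The MESI states above can be summarized as a small transition table for one cache line, driven by local accesses and observed bus events. This is a simplified sketch of the Illinois protocol (data-transfer details and some transitions omitted), not a complete implementation:

```python
# (current_state, event) -> next_state for a single cache line.
MESI = {
    ("I", "read_miss_shared"): "S",  # another cache holds it -> Shared
    ("I", "read_miss_alone"):  "E",  # no other copy -> Exclusive
    ("I", "write_miss"):       "M",
    ("E", "local_write"):      "M",  # silent upgrade, no bus traffic
    ("S", "local_write"):      "M",  # after invalidating other copies
    ("M", "bus_read"):         "S",  # downgrade; supply data (writeback)
    ("E", "bus_read"):         "S",
    ("S", "bus_write"):        "I",  # another core writes -> invalidate
    ("M", "bus_write"):        "I",
}

state = "I"
for event in ["read_miss_alone", "local_write", "bus_read"]:
    state = MESI[(state, event)]
print(state)  # S: the modified line was downgraded by a remote read
```

The Exclusive state is the key addition over MSI: a private read-then-write needs no bus transaction for the upgrade.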
+ | |||
+ | ===== Lecture 21 (09.12 Thu.) ===== | ||
+ | * Interconnects | ||
+ | * Cache coherence | ||
+ | * Interconnect networks: | ||
+ | * Topology | ||
+ | * Routing | ||
+ | * Buffering and flow control | ||
+ | * Oversubscription of routers | ||
+ | * Terminology: | ||
+ | * Network Interface | ||
+ | * Link | ||
+ | * Switch/ | ||
+ | * Channel | ||
+ | * Node | ||
+ | * Message | ||
+ | * Packet | ||
+ | * Flit | ||
+ | * Direct/ | ||
+ | * Properties of a Network Topology: | ||
+ | * Regular/ | ||
+ | * Routing distance | ||
+ | * Diameter | ||
+ | * Average distance | ||
+ | * Bisection Bandwidth | ||
+ | * Blocking/ | ||
+ | * Topologies: | ||
+ | * Bus | ||
+ | * P2P | ||
+ | * Crossbar | ||
+ | * Ring | ||
+ | * Tree | ||
+ | * Omega | ||
+ | * Hypercube | ||
+ | * Mesh | ||
+ | * Torus | ||
+ | * Butterfly | ||
+ | * cost, latency, contention, energy, bandwidth, overall performance | ||
+ | * Circuit switching network | ||
+ | * Multistage network | ||
+ | * Fetch-and-add | ||
+ | * Unidirectional Ring | ||
+ | * Bidirectional rings | ||
+ | * Hierarchical rings | ||
+ | * Mesh: asymmetricity on the edge | ||
+ | * Torus | ||
+ | * H-tree | ||
+ | * Fat-tree | ||
+ | * Hyper-cube. Cosmic Cube | ||
+ | * Routing mechanism: Arithmetic, Source-based, | ||
+ | * Types of routing algorithm: deterministic, | ||
+ | * Deadlock | ||
+ | * Oblivious routing | ||
+ | * Adaptive Routing: minimal adaptive, non-minimal adaptive | ||
+ | * Flow Control | ||
+ | * Handling contention: buffer, drop or misroute | ||
+ | * Flow control methods: | ||
+ | * Circuit switching | ||
+ | * Bufferless | ||
+ | * Store and forward | ||
+ | * Virtual cut through | ||
+ | * Wormhole | ||
+ | * Performance and congestion at high loads | ||
+ | * Store-and-forward | ||
+ | * Cut-though flow control | ||
+ | * Wormhole flow control | ||
+ | * Head of Line Blocking | ||
+ | * Virtual Channel Flow Control | ||
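Two of the topology properties above — diameter and bisection bandwidth — have closed forms for a k×k 2D mesh; a small sketch (counting bisection in links, one illustrative convention):

```python
def mesh_diameter(k):
    # Longest shortest path: corner to opposite corner,
    # (k - 1) hops in each of the two dimensions.
    return 2 * (k - 1)

def mesh_bisection_links(k):
    # Cutting a k x k mesh in half severs one link per row: k links.
    return k

print(mesh_diameter(4), mesh_bisection_links(4))  # 6 4
```

A torus halves the diameter by adding wraparound links, at the cost of longer wires — one instance of the cost/latency/bandwidth trade-offs listed above.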
+ | |||
+ | ===== Lecture 22 (10.12 Fri.) ===== | ||
+ | * Load latency curve | ||
+ | * Ideal latency | ||
+ | * Manhattan distance | ||
+ | * Bit-Compliment traffic | ||
+ | * Uniform Random traffic | ||
+ | * Transpose traffic | ||
+ | * Adaptive topology | ||
+ | * Performance of interconnection networks | ||
+ | * On-chip networks | ||
+ | * Difference between off-chip and on-chip networks | ||
+ | * Network buffers | ||
+ | * Efficient routing | ||
+ | * Advantages of on-chip interconnects | ||
+ | * Saturation throughput | ||
+ | * Pin constraints | ||
+ | * Wiring resources | ||
+ | * Disadvantages of on-chip interconnects | ||
+ | * Energy/ | ||
+ | * Tradeoffs of interconnect design | ||
+ | * Buffers in NoC routers | ||
+ | * Bufferless routing | ||
+ | * Flit-level routing | ||
+ | * Deflection routing | ||
+ | * Age-Based prioritization | ||
+ | * Buffer and link energy consumption | ||
+ | * Self-throttling | ||
+ | * Livelock freedom problem | ||
+ | * Golden packet for livelock freedom | ||
+ | * Reassembly buffers | ||
+ | * Performance-Power Spectrum | ||
+ | * Packet retransmission | ||
+ | * Packet scheduling | ||
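Two terms above reduce to short formulas — Manhattan distance (the minimal hop count between two mesh coordinates) and bit-complement traffic (each node i sends to the bitwise complement of i). An illustrative sketch:

```python
def manhattan(src, dst):
    # Minimal hop count between two (x, y) coordinates in a mesh.
    (x1, y1), (x2, y2) = src, dst
    return abs(x1 - x2) + abs(y1 - y2)

def bit_complement_dest(node, bits):
    # Bit-complement traffic pattern: destination is ~source (masked to width).
    return ~node & ((1 << bits) - 1)

print(manhattan((0, 0), (3, 2)))       # 5 hops
print(bit_complement_dest(0b0110, 4))  # 9 (0b1001)
```

Bit-complement is a worst-case-style pattern because every packet must cross the bisection, which is why it appears alongside uniform-random and transpose in load-latency evaluations.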
+ | |||
+ | ===== Lecture 23 (16.12 Thu.) ===== | ||
+ | * SIMD | ||
+ | * SISD | ||
+ | * MISD | ||
+ | * Systolic arrays | ||
+ | * MIMD | ||
+ | * Instruction level parallelism (ILP) | ||
+ | * Array processor | ||
+ | * Vector processor | ||
+ | * VLIW: Very long instruction word | ||
+ | * Vector length register (VLEN) | ||
+ | * Vector stride register (VSTR) | ||
+ | * Vector load instruction (VLD) | ||
+ | * Intra-vector dependencies | ||
+ | * Regular parallelism | ||
+ | * Memory bandwidth | ||
+ | * Vector data register | ||
+ | * Vector control registers | ||
+ | * Vector mask register | ||
+ | * Vector functional units | ||
+ | * Vector registers | ||
+ | * VADD | ||
+ | * Scalar operations | ||
+ | * Memory data register | ||
+ | * Memory address register | ||
+ | * Interleaved memory | ||
+ | * Memory banking | ||
+ | * Address generator | ||
+ | * Monolithic memory | ||
+ | * Memory access latency | ||
+ | * Vectorizable loops | ||
+ | * Vector code performance | ||
+ | * Vector data forwarding (chaining) | ||
+ | * Vector chaining | ||
+ | * Vector stripmining | ||
+ | * Irregular memory access | ||
+ | * Gather/ | ||
+ | * Sparse vector | ||
+ | * Masked operations | ||
+ | * Predicated execution | ||
+ | * Row/Column major layouts | ||
+ | * Bank conflicts | ||
+ | * Randomized mapping | ||
+ | * Vector instruction level parallelism | ||
+ | * Automatic code vectorization | ||
+ | * Packed arithmetic | ||
+ | * GPUs | ||
+ | * Programming model vs execution model | ||
+ | * SPMD | ||
+ | * Warp (wavefront) | ||
+ | * SIMD vs. SIMT | ||
+ | * Warp-level FGMT | ||
+ | * Vector lanes | ||
+ | * Warp scheduler | ||
+ | * Fine-grained multithreading | ||
+ | * Warp instruction level parallelism | ||
+ | * Warp-based SIMD vs. traditional SIMD | ||
+ | * Multiple instruction streams | ||
+ | * Conditional control flow instructions | ||
+ | * Branch divergence | ||
+ | * Dynamic warp formation | ||
+ | * Functional unit |
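Vector stripmining from the list above can be sketched in pure Python: an N-element loop is processed in chunks of at most VLEN elements per vector iteration. The slice operations stand in for vector instructions; VLEN's value is illustrative:

```python
VLEN = 64  # maximum hardware vector length (illustrative)

def stripmined_add(a, b):
    # Process len(a) elements in strips of up to VLEN per iteration.
    out = []
    for start in range(0, len(a), VLEN):
        # One "vector iteration": VLD a, VLD b, VADD, VST
        # (modeled here with Python slices).
        out.extend(x + y for x, y in zip(a[start:start + VLEN],
                                         b[start:start + VLEN]))
    return out

print(stripmined_add(list(range(150)), [1] * 150)[:3])  # [1, 2, 3]
```

For 150 elements this runs three vector iterations (64, 64, and a final strip of 22), the last one using a reduced vector length — exactly what the VLEN register enables.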
buzzword.1636138033.txt.gz · Last modified: 2021/11/05 18:47 by nadigr