buzzword
Last revised: 2019/09/20 11:09 by juang (previous revision: 2018/11/14 14:15 by alserm)

...

  * Error Correcting Codes (ECC)
===== Lecture 2 (20.09 Thu.) =====
  * Rowhammer
  * Memory reliability

...

  * Uncorrectable Errors
  * Wearout Period

===== Lecture 15 (14.11 Wed.) =====
  * Program Interference

...
  * Hot/Cold Page
  * Write-hotness aware management

===== Lecture 16 (15.11 Thu.) =====
  * Resource sharing
  * Contention for resources
  * Multiple hardware contexts
  * Performance isolation
  * Revolving door analogy
  * Quality of Service (QoS)
  * Shared resource management
  * Utilization/
  * Uncontrolled (free-for-all) sharing
  * Unfair sharing
  * Unpredictable performance (or lack of QoS)
  * Service Level Agreement (SLA)
  * Overprovision
  * Partitioning (dedicated space)
  * Memory performance hog
  * FR-FCFS
  * Denial of Service (DoS)
  * Distributed DoS
  * Packet-switched routers
  * Inter-thread interference
  * QoS-aware memory systems
  * Smart resources
  * QoS-aware memory controller
  * QoS-aware interconnect
  * QoS-aware caches
  * Dumb resources
  * Injection control
  * Data mapping
  * Prioritization / request scheduling
  * Fair memory scheduling
  * Stall-time tracking / estimation
  * Bank parallelism interference
  * Parallelism-aware scheduler
  * Request batching
  * PAR-BS
  * Within-batch scheduling
  * ATLAS
  * Thread ranking
  * Shortest job first
  * Shortest stall-time first
  * Multiple memory controllers
  * Thread clusters
  * TCM
  * Niceness
  * Throughput biased
  * Fairness biased
  * Misses Per Kilo Instructions (MPKI)
  * Priority shuffle
  * Vulnerability to interference
  * Tunable knobs
  * Blacklisting
  * Performance vs fairness vs simplicity

===== Lecture 17 (21.11 Wed.) =====
  * Memory System
  * Memory Controller
  * Heterogeneous Agents
  * DASH
  * Staged memory scheduling (SMS)
  * First-ready first-come first-serve (FR-FCFS)
  * Memory Interference-Induced Slowdown Estimation (MISE)
  * Service Level Agreements (SLA)
  * Stall-time Fair Memory (STFM)
  * Quality of Service (QoS)
  * Soft QoS
  * Multithreaded applications
  * Row-Buffer Locality
  * Channel Partitioning
  * Page mapping
  * Request buffer

===== Lecture 18a (22.11 Thu.) =====
  * Fundamental interference control techniques
  * Core/Source throttling
  * Smart resources
  * Dynamic unfairness estimation
  * Throttling cores' memory access rates
  * FST: Fairness via Source Throttling
  * Runtime unfairness evaluation
  * Dynamic request throttling
  * Request injection rate
  * Application/
  * Many-core on-chip communication
  * Shared cache bank
  * Spatial task scheduling
  * Clustering, balancing, isolation, and radial mapping
  * Network power
  * Microarchitecture unawareness
  * Operating-system-level metrics and microarchitecture-level metrics
  * Architecture-aware distributed resource management (DRM)
  * Interference-aware thread scheduling
  * Memory quality of service (QoS) approaches and techniques
  * Smart vs. dumb components
  * Cache interference management
  * Interconnect interference management
  * DRAM designs to reduce interference
  * SoftMC
  * PIM accelerators
  * Decoupled direct memory access (DDMA)

===== Lecture 18b (22.11 Thu.) =====
  * Multi-core issues in caching
  * Cache coherence
  * Flush-local and flush-global
  * Snoopy cache coherence
  * Free-for-all sharing
  * Controlled cache sharing
  * Hardware-based cache partitioning
  * Marginal utility of a cache way
  * Dynamic set sampling
  * UCP
  * Optimal partitioning:
  * Dynamic fair caching
  * Software-based shared cache partitioning
  * Page coloring
  * Static cache partitioning
  * Dynamic cache partitioning via page re-coloring

===== Lecture 19a (28.11 Wed.) =====
  * Controlled Shared Caching
  * Cache spilling
  * Cooperative caching
  * DSR: Dynamic spill-receive
  * Set dueling
  * Cooperative caching
  * Handling shared data in private caches
  * Non-uniform cache access
  * Multi-core cache efficiency
  * Cache compression
  * Decompression latency
  * Compression ratio
  * Zero compression
  * Frequent value compression
  * Frequent pattern compression
  * Base-delta-immediate compression
  * Toggle-aware compression for GPU systems
  * Core-assisted bottleneck acceleration in GPUs
  * Cache placement
  * Cache insertion policies: MRU, LRU
  * LIP: LRU insertion position (Low-priority insertion policy)
  * BIP: Bimodal insertion policy
  * DIP: Dynamic insertion policy
  * Circular reference model
  * Cache pollution
  * Cache thrashing
  * Reuse prediction
  * EAF: Evicted-address filter
  * TA-DIP: Thread-aware dynamic insertion policy
  * Run-time bypassing
  * Single-usage block prediction
  * SHiP: Signature-based hit prediction
  * Miss classification table
  * s-curve
  * ASM: Application slowdown model
  * Cache access rate
  * Memory access rate
  * Auxiliary tag store

===== Lecture 19b (28.11 Wed.) =====
  * Heterogeneity and asymmetry
  * CRAY-1 design
  * Scalar machine and vector pipeline machine
  * RAIDR
  * DRAM + Phase change memory
  * Reliable, costly DRAM + Unreliable, cheap DRAM
  * Heterogeneous retention time
  * Tilera
  * Packet switching and circuit switching
  * TDN, MDN, IDN, UDN, and STN
  * General purpose vs special purpose
  * Heterogeneity of CPUs and GPUs
  * Predictability and robustness

===== Lecture 20 (29.11 Thu.) =====
  * DRAM
  * NVM
  * Flash
  * Processing in Memory
  * Hardware Security
  * Heterogeneous Multi-Core Systems
  * Bottleneck Acceleration
  * Heterogeneity (Asymmetry)
  * Symmetric design
  * One-size-fits-all
  * Quality of Service (QoS)
  * Hybrid Memory Controllers
  * Heterogeneous agents (e.g., CPUs, GPUs, and HWAs)
  * Heterogeneous memories: Fast vs. Slow DRAM
  * Heterogeneous interconnects:
  * Amdahl’s Law
  * Synchronization overhead
  * Load imbalance overhead
  * Resource sharing overhead
  * Sequential portions (Amdahl’s “serial part”)
  * Critical sections
  * Barriers
  * Asymmetric Chip Multiprocessor (ACMP)
  * Bottleneck Acceleration
  * Staged Execution
  * Data Marshaling
  * Phase Change Memory

===== Lecture 21 (05.12 Wed.) =====
  * GPU
  * Programming model
  * Sequential
  * SIMD
  * SPMD
  * SIMT
  * Warp (wavefront)
  * Multithreading of warps
  * Warp-level FGMT
  * Latency-hiding
  * Interleave warp execution
  * Registers of thread ID
  * Warp-based SIMD vs. Traditional SIMD
  * GPGPU programming
  * Inherent parallelism
  * Data parallelism
  * GPU main bottlenecks
  * CPU-GPU data transfers
  * DRAM memory
  * Task offloading
  * Serial code (host)
  * Parallel code (device)
  * Bulk synchronization
  * Transparent scalability
  * Memory hierarchy
  * Indexing and memory access
  * Streaming multiprocessor (SM)
  * Streaming processor (Vector lane)
  * Occupancy
  * Memory coalescing
  * Shared memory tiling
  * Bank conflict
  * Padding
  * SIMD utilization
  * Atomic operations
  * Histogram calculation
  * CUDA streams
  * Asynchronous transfers
  * Heterogeneous systems
  * Unified memory
  * System-wide atomic operations
  * Collaborative computing
  * CPU+GPU collaboration
  * Collaborative patterns
  * Data partitioning
  * Task partitioning
  * Coarse-grained
  * Fine-grained
  * Bézier surfaces
  * NVIDIA Pascal
  * NVIDIA Volta
  * Padding
  * Chai benchmark suite

===== Lecture 22 (6.12 Thu.) =====
  * Persistent memory
  * Crash consistency
  * Checkpointing
  * Flynn's taxonomy of computers
  * Parallelism
  * Performance
  * Power consumption
  * Cost efficiency
  * Dependability
  * Instruction level parallelism
  * Data parallelism
  * Task level parallelism
  * Multiprocessor
  * Loosely coupled
  * Tightly coupled
  * Shared global memory address space
  * Shared memory synchronization
  * Cache coherence
  * Memory consistency
  * Shared resource management
  * Interconnects
  * Programming issues in tightly coupled multiprocessors
  * Sublinear speedup
  * Linear speedup
  * Superlinear speedup
  * Unfair comparison
  * Cache/
  * Utilization
  * Redundancy
  * Efficiency
  * Amdahl's Law
  * Bottlenecks in parallel portion
  * Ordering of operations
  * Sequential consistency
  * Weaker memory consistency
  * Memory fence instructions
  * Higher performance
  * Burden on the programmer
  * Coherence scheme
  * Valid/
  * Write propagation
  * Write serialization
  * Update vs. Invalidate
  * Cache coherence
  * Snoopy bus
  * Directory
  * Directory optimizations
  * Directory bypassing
  * Snoopy cache
  * Shared bus
  * VI protocol
  * MSI (Modified, Shared, Invalid)
  * Exclusive state
  * MESI (Modified, Exclusive, Shared, Invalid)
  * Illinois Protocol (MESI)
  * Broadcast
  * Bus request
  * Downgrade
  * Upgrade
  * Snoopy invalidation
  * Cache-to-cache transfer
  * Writeback
  * MOESI (Modified, Owned, Exclusive, Shared, Invalid)
  * Directory coherence
  * Race conditions
  * Totally-ordered interconnect
  * Directory-based protocols
  * Set inclusion test
  * Linked list
  * Bloom filters
  * Contention resolution
  * Ping-ponging
  * Synchronization
  * Shared-data-structure
  * Token Coherence
  * Virtual bus

===== Lecture 23 (12.12 Wed.) =====
  * Interconnection Network, Interconnect
  * Topology
  * Routing
  * Buffering and Flow Control
  * Switch/Router
  * Channel
  * Wire
  * Packet
  * Path
  * Bus
  * Mesh, 2D Mesh
  * Throttling
  * Oversubscription
  * Network Interface
  * Link
  * Node
  * Message
  * Flit
  * Direct/Indirect networks
  * Radix
  * Regular/Irregular topologies
  * Routing Distance
  * Diameter
  * Bisection Bandwidth
  * Congestion
  * Blocking/Non-blocking
  * Crossbar
  * Ring
  * Tree
  * Omega
  * Hypercube
  * Torus
  * Butterfly
  * Arbitration
  * Point-to-Point
  * Multistage Network
  * Hop
  * Circuit Switching
  * Packet Switching
  * Tree saturation
  * Deadlock
  * Circular dependency
  * Oblivious Routing
  * Adaptive Routing
  * Packet Format
  * Header
  * Payload
  * Error Code
  * Virtual Channel Flow Control

===== Lecture 24 (13.12 Thu.) =====
  * Load latency curve
  * Performance of interconnection networks
  * On-chip networks
  * Difference between off-chip and on-chip networks
  * Network buffers
  * Efficient routing
  * Advantages of on-chip interconnects
  * Pin constraints
  * Wiring resources
  * Disadvantages of on-chip interconnects
  * Energy/
  * Tradeoffs of interconnect design
  * Buffers in NoC routers
  * Bufferless routing
  * Flit-level routing
  * Deflection routing
  * Buffer and link energy consumption
  * Self-throttling
  * Livelock freedom problem
  * Golden packet for livelock freedom
  * Reassembly buffers
  * Packet retransmission
  * Packet scheduling
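Deflection (bufferless) routing from the list above, as a toy single-router model: every incoming flit must leave every cycle, so arbitration losers are deflected to leftover output ports instead of being buffered. The priority order, port names, and data layout are assumptions for illustration:

```python
# Toy sketch of bufferless deflection routing at one router: flits
# that lose port arbitration are "deflected" to any free output,
# possibly away from their destination -- hence the need for a
# livelock-freedom mechanism such as the Golden Packet.

def route_cycle(flits, ports):
    """flits: list of (flit_id, preferred_port), highest priority first.
    ports: available output ports. Returns {flit_id: assigned_port}."""
    assert len(flits) <= len(ports), "a router has as many outputs as inputs"
    assignment, free = {}, list(ports)
    for fid, pref in flits:          # arbitration in priority order
        if pref in free:
            free.remove(pref)
            assignment[fid] = pref   # won its productive direction
    for fid, pref in flits:
        if fid not in assignment:
            assignment[fid] = free.pop()   # deflected to a leftover port
    return assignment

# Two flits contending for the north port: the higher-priority one
# proceeds north, the other is deflected.
```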
buzzword.1542204946.txt.gz · Last modified: 2019/02/12 16:33 (external edit)