buzzword
Differences
This shows you the differences between two versions of the page.
buzzword [2019/11/14 19:56] – [Lecture 15 (14.11 Thu.)] firtinac | buzzword [2019/12/14 20:25] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 253: | Line 253: | ||
* BitWeaving | * BitWeaving | ||
* Computing Architectures with Minimal Data Movement | * Computing Architectures with Minimal Data Movement | ||
- | * Mindset on reviewing manuscripts and scientific process | ||
- | * Suggestions on critical paper review | ||
- | * Mindset issues everywhere | ||
- | * Bandwidth bottleneck in Zurich Airport | ||
- | * Wrong methodology in design space exploration: | ||
* 3D-Stacked Logic+Memory | * 3D-Stacked Logic+Memory | ||
* Logic Layer | * Logic Layer | ||
Line 598: | Line 593: | ||
* Core/source throttling | * Core/source throttling | ||
* Fairness via Source Throttling | * Fairness via Source Throttling | ||
+ | ===== Lecture 16a (15.11 Fri.) ===== | ||
+ | * Shared resource contention | ||
+ | * Slowdown estimation | ||
+ | * Application/ | ||
+ | * Multi-core/ | ||
+ | * Application/ | ||
+ | * Application prioritization | ||
+ | * On-chip communication | ||
+ | * Communication distance | ||
+ | * Congestion in Network-on-Chip (NoC) | ||
+ | * Spatial task scheduling | ||
+ | * Clustering | ||
+ | * Load balancing | ||
+ | * Isolation | ||
+ | * Radial mapping | ||
+ | * Distributed Resource Management (DRM) | ||
+ | * Operating-system-level metric | ||
+ | * Microarchitecture-level metric | ||
+ | * Architecture-aware DRM | ||
+ | * Machine learning-based mapping/ | ||
+ | |||
+ | ===== Lecture 16b (15.11 Fri.) ===== | ||
+ | * Emerging memory technology | ||
+ | * Flash memory | ||
+ | * Memory-centric system design | ||
+ | * Phase change memory | ||
+ | * Charge memory | ||
+ | * Resistive memory | ||
+ | * Multi-level cell | ||
+ | * Spin-Transfer Torque Magnetic RAM (STT-MRAM) | ||
+ | * Memristors | ||
+ | * Resistive RAM (RRAM or ReRAM) | ||
+ | * Intel 3D XPoint | ||
+ | * Capacity-latency trade-off | ||
+ | * Capacity-reliability trade-off | ||
+ | * Endurance | ||
+ | * Magnetic Tunnel Junction (MTJ) | ||
+ | * Hybrid memory | ||
+ | * Write filtering | ||
+ | * Data placement | ||
+ | * Data access pattern | ||
+ | * Row-buffer locality | ||
+ | * Overall system performance impact | ||
+ | * Memory-Level Parallelism (MLP) | ||
+ | * Utility-based hybrid memory management | ||
+ | |||
+ | ===== Lecture 17 (21.11 Thu.) ===== | ||
+ | * SIMD | ||
+ | * Multiply-accumulate | ||
+ | * Thread block | ||
+ | * Stream processor | ||
+ | * Tensor core | ||
+ | * Neural network training | ||
+ | * Systolic arrays | ||
+ | * Fine-grain multithreading | ||
+ | * Warp | ||
+ | * GPU programming | ||
+ | * General purpose processing on GPU | ||
+ | * GPU kernels | ||
+ | * CUDA | ||
+ | * OpenCL | ||
+ | * SPMD | ||
+ | * Grid, Block, Threads | ||
+ | * Row major layout | ||
+ | * Warp scheduler | ||
+ | * Coalesced memory accesses | ||
+ | * AoS (Array of Structures) | ||
+ | * SoA (Structure of Arrays) | ||
+ | * Tiling | ||
+ | * Bank conflicts | ||
+ | * Padding | ||
+ | * Randomized mapping | ||
+ | * Hash functions | ||
+ | * Divergence | ||
+ | * Vector reduction | ||
+ | * Divergence-free mapping | ||
+ | * Atomic operations | ||
+ | * PTX | ||
+ | * SASS | ||
+ | * Synchronous and asynchronous transfers | ||
+ | * Stream | ||
+ | * Collaborative Computing | ||
+ | * Unified memory space | ||
+ | * Collaborative patterns | ||
+ | |||
+ | ===== Lecture 18 (22.11 Fri.) ===== | ||
+ | * Instruction prefetching | ||
+ | * Data prefetching | ||
+ | * Memory Hierarchy | ||
+ | * Memory Read/Write Latency | ||
+ | * Memory Bandwidth | ||
+ | * Memory Footprint | ||
+ | * Caches as Bandwidth Filters | ||
+ | * Little's Law | ||
+ | * Occupancy | ||
+ | * Latency | ||
+ | * Throughput | ||
+ | * Queueing Resources | ||
+ | * Compulsory Miss | ||
+ | * Demand Miss | ||
+ | * Spatial and Temporal Locality | ||
+ | * Fetch Granule | ||
+ | * Hardware Prefetching | ||
+ | * Software Prefetch Instruction | ||
+ | * Code Reordering | ||
+ | * Speculative Execution | ||
+ | * Loop Unrolling | ||
+ | * Load Hoisting | ||
+ | * Prefetch Degree | ||
+ | * Three Prefetch Metrics | ||
+ | * Accuracy | ||
+ | * Coverage | ||
+ | * Timeliness | ||
+ | * Heuristic-Based Next-N-Line Prefetching | ||
+ | * History-Based Target Line Prefetching | ||
+ | * Heuristic-Based Wrong-Path Prefetching | ||
+ | * Hybrid Prefetching | ||
+ | * Branch Predictor | ||
+ | * Branch Target Buffer (BTB) | ||
+ | * Next-Line Prefetchers | ||
+ | * Stride Prefetchers | ||
+ | * Cache-Block Address Based Stride Prefetching | ||
+ | * Correlation-Based Prefetchers | ||
+ | * Content-Directed Prefetchers | ||
+ | * Precomputation or Execution-Based Prefetchers | ||
+ | * Address Correlation Based Prefetching | ||
+ | * Markov Model and Markov Prefetchers | ||
+ | * Prefetch Confidence | ||
+ | * Hybrid Hardware Prefetchers | ||
+ | * Execution-based Prefetchers | ||
+ | * Speculative Thread | ||
+ | * Feedback-Directed Prefetcher | ||
+ | * Prefetcher Throttling | ||
+ | |||
+ | ===== Lecture 19 (28.11 Thu.) ===== | ||
+ | * Hybrid Memory Systems | ||
+ | * Large (DRAM) Cache | ||
+ | * TIMBER | ||
+ | * Two-Level Memory/ | ||
+ | * Volatile data | ||
+ | * Persistent data | ||
+ | * Single-level store | ||
+ | * Unified Memory and storage | ||
+ | * The Persistent Memory Manager (PMM) | ||
+ | * ThyNVM | ||
+ | * Heterogeneity | ||
+ | * Asymmetry in design | ||
+ | * Amdahl's Law | ||
+ | * Synchronization overhead | ||
+ | * Load imbalance overhead | ||
+ | * Resource sharing overhead | ||
+ | * IBM Power4 | ||
+ | * IBM Power5 | ||
+ | * Niagara Processor | ||
+ | * Performance vs. parallelism | ||
+ | * Asymmetric Chip Multiprocessor (ACMP) | ||
+ | * MorphCore | ||
+ | |||
+ | ===== Lecture 20 (29.11 Fri.) ===== | ||
+ | * Heterogeneity | ||
+ | * Pointer-chasing | ||
+ | * Critical section | ||
+ | * Asymmetry | ||
+ | * Accelerated | ||
+ | * Data Marshalling | ||
+ | * False Serialization | ||
+ | * Shared Data | ||
+ | * Private Data | ||
+ | * Amdahl’s Law | ||
+ | * Barriers | ||
+ | * Identification | ||
+ | * Migration | ||
+ | * Private Data | ||
+ | * Feedback Directed Pipelining | ||
+ | * Staged Execution | ||
+ | * Inter-segment Data | ||
+ | * Pipeline Parallelism | ||
+ | * Dynamic heterogeneity | ||
+ | |||
+ | ===== Lecture 21a (5.12 Thu.) ===== | ||
+ | * Persistent memory | ||
+ | * Crash consistency | ||
+ | * Checkpointing | ||
+ | * Flynn's taxonomy | ||
+ | * Parallelism | ||
+ | * Performance | ||
+ | * Power consumption | ||
+ | * Cost efficiency | ||
+ | * Dependability | ||
+ | * Instruction level parallelism | ||
+ | * Data parallelism | ||
+ | * Task level parallelism | ||
+ | * Utilization | ||
+ | * Redundancy | ||
+ | * Efficiency | ||
+ | * Amdahl's Law | ||
+ | * Bottlenecks in parallel portion | ||
+ | * Multiprocessor | ||
+ | * Loosely coupled multiprocessors | ||
+ | * Tightly coupled multiprocessors | ||
+ | * Shared global memory address space | ||
+ | * Shared memory synchronization | ||
+ | * Interconnects | ||
+ | * Programming issues in tightly coupled multiprocessor | ||
+ | * Sublinear speedup | ||
+ | * Linear speedup | ||
+ | * Superlinear speedup | ||
+ | * Shared resource management | ||
+ | * Unfair comparison | ||
+ | * Cache/ | ||
+ | |||
+ | | ||
+ | |||
+ | ===== Lecture 21b (5.12 Thu.) ===== | ||
+ | |||
+ | * Memory consistency / memory ordering | ||
+ | * Ordering of operations | ||
+ | * Local ordering | ||
+ | * Global ordering | ||
+ | * Sequential consistency | ||
+ | * Weaker memory consistency | ||
+ | * Memory fence instructions | ||
+ | * Consequences of Sequential Consistency | ||
+ | * Issues with Sequential Consistency | ||
+ | * Global order requirement | ||
+ | * Aggressiveness | ||
+ | * Out-of-order execution | ||
+ | * Higher performance | ||
+ | * Burden on the programmer | ||
+ | * Mutual exclusion | ||
+ | * Protecting shared data | ||
+ | * Ease of debugging | ||
+ | * Correctness | ||
+ | * MIMD processor | ||
+ | * Dataflow processor | ||
+ | | ||
+ | |||
+ | ===== Lecture 22 (6.12 Fri.) ===== | ||
+ | |||
+ | * Cache coherence | ||
+ | * Memory consistency | ||
+ | * Shared memory model | ||
+ | * Software coherence | ||
+ | * Coarse-grained (page-level) | ||
+ | * Non-cacheable | ||
+ | * Fine-grained (cache flush) | ||
+ | * Hardware coherence | ||
+ | * Valid/ | ||
+ | * Write propagation | ||
+ | * Write serialization | ||
+ | * Update vs. Invalidate | ||
+ | * Snoopy bus | ||
+ | * Directory | ||
+ | * Exclusive bit | ||
+ | * Directory optimizations (bypassing) | ||
+ | * Snoopy cache | ||
+ | * Shared bus | ||
+ | * VI protocol | ||
+ | * MSI (Modified, Shared, Invalid) | ||
+ | * Exclusive state | ||
+ | * MESI (Modified, Exclusive, Shared, Invalid) | ||
+ | * Illinois Protocol (MESI) | ||
+ | * Broadcast | ||
+ | * Bus request | ||
+ | * Downgrade/ | ||
+ | * Snoopy invalidation | ||
+ | * Cache-to-cache transfer | ||
+ | * Writeback | ||
+ | * MOESI (Modified, Owned, Exclusive, Shared, Invalid) | ||
+ | * Directory coherence | ||
+ | * Race conditions | ||
+ | * Totally-ordered interconnect | ||
+ | * Directory-based protocols | ||
+ | * Set inclusion test | ||
+ | * Linked list | ||
+ | * Bloom filters | ||
+ | * Contention resolution | ||
+ | * Ping-ponging | ||
+ | * Synchronization | ||
+ | * Shared-data-structure | ||
+ | * Token Coherence | ||
+ | * Coherence for NDAs | ||
+ | * Optimistic execution | ||
+ | * Signature | ||
+ | * Commit/ | ||
+ | |||
+ | ===== Lecture 23 (12.12 Thu.) ===== | ||
+ | * Interconnects | ||
+ | * Cache coherence | ||
+ | * Interconnect networks: | ||
+ | * Topology | ||
+ | * Routing | ||
+ | * Buffering and flow control | ||
+ | * Oversubscription of routers | ||
+ | * Terminology: | ||
+ | * Network Interface | ||
+ | * Link | ||
+ | * Switch/ | ||
+ | * Channel | ||
+ | * Node | ||
+ | * Message | ||
+ | * Packet | ||
+ | * Flit | ||
+ | * Direct/ | ||
+ | * Properties of a Network Topology: | ||
+ | * Regular/ | ||
+ | * Routing distance | ||
+ | * Diameter | ||
+ | * Average distance | ||
+ | * Bisection Bandwidth | ||
+ | * Blocking/ | ||
+ | * Topologies: | ||
+ | * Bus | ||
+ | * P2P | ||
+ | * Crossbar | ||
+ | * Ring | ||
+ | * Tree | ||
+ | * Omega | ||
+ | * Hypercube | ||
+ | * Mesh | ||
+ | * Torus | ||
+ | * Butterfly | ||
+ | * cost, latency, contention, energy, bandwidth, overall performance | ||
+ | * Circuit switching network | ||
+ | * Multistage network | ||
+ | * Fetch-and-add | ||
+ | * Unidirectional Ring | ||
+ | * Bidirectional rings | ||
+ | * Hierarchical rings | ||
+ | * Mesh: asymmetry at the edges | ||
+ | * Torus | ||
+ | * H-tree | ||
+ | * Fat-tree | ||
+ | * Hypercube, Cosmic Cube | ||
+ | * Routing mechanism: Arithmetic, Source-based, | ||
+ | * Types of routing algorithm: deterministic, | ||
+ | * Deadlock | ||
+ | * Oblivious routing | ||
+ | * Adaptive Routing: minimal adaptive, non-minimal adaptive | ||
+ | * Flow Control | ||
+ | * Handling contention: buffer, drop or misroute | ||
+ | * Flow control methods: | ||
+ | * Circuit switching | ||
+ | * Bufferless | ||
+ | * Store and forward | ||
+ | * Virtual cut through | ||
+ | * Wormhole | ||
+ | * Performance and congestion at high loads | ||
+ | * Store-and-forward | ||
+ | * Cut-through flow control | ||
+ | * Wormhole flow control | ||
+ | * Head of Line Blocking | ||
+ | |||
+ | ===== Lecture 24 (13.12 Fri.) ===== | ||
+ | * Load latency curve | ||
+ | * Performance of interconnection networks | ||
+ | * On-chip networks | ||
+ | * Difference between off-chip and on-chip networks | ||
+ | * Network buffers | ||
+ | * Efficient routing | ||
+ | * Advantages of on-chip interconnects | ||
+ | * Pin constraints | ||
+ | * Wiring resources | ||
+ | * Disadvantages of on-chip interconnects | ||
+ | * Energy/ | ||
+ | * Tradeoffs of interconnect design | ||
+ | * Buffers in NoC routers | ||
+ | * Bufferless routing | ||
+ | * Flit-level routing | ||
+ | * Deflection routing | ||
+ | * Buffer and link energy consumption | ||
+ | * Self-throttling | ||
+ | * Livelock freedom problem | ||
+ | * Golden packet for livelock freedom | ||
+ | * Reassembly buffers | ||
+ | * Packet retransmission | ||
+ | * Packet scheduling |
buzzword.1573761372.txt.gz · Last modified: 2019/11/14 19:56 by firtinac