Buzzwords

Buzzwords are terms mentioned during lecture that are particularly important to understand thoroughly. This page tracks the buzzwords for each lecture and can be used as a reference for finding gaps in your understanding of the course material.

Lecture 1 (19.09 Wed.)

  • Computer Architecture
  • Redundancy
  • Bahnhof Stadelhofen
  • Santiago Calatrava
  • Oculus
  • Design constraints
  • Falling Water
  • Frank Lloyd Wright
  • Sustainability
  • Evaluation criteria for designs
    • Functionality
    • Reliability
    • Space requirement
    • Expandability
  • Principled design
  • Role of the (Computer) Architect
  • Systems programming
  • Digital design
  • Levels of transformation
    • Algorithm
    • System software
    • Instruction Set Architecture (ISA)
    • Microarchitecture
    • Logic
  • Abstraction layers
  • Hamming code
  • Hamming distance
  • User-centric view
  • Productivity
  • Multi-core systems
  • Caches
  • DRAM memory controller
  • DRAM banks
  • Energy efficiency
  • Memory performance hog
  • Slowdown
  • Consolidation
  • QoS guarantees
  • Unfairness
  • Row decoder
  • Column address
  • Row buffer hit/miss
  • Row buffer locality
  • FR-FCFS
  • Stream/Random access patterns
  • Memory scheduling policies
  • Scheduling priority
  • DRAM cell
  • Access transistor
  • DRAM refresh
  • DRAM retention time
  • Variable retention time
  • Retention time profile
  • Manufacturing process variation
  • Bloom filter
  • Data pattern dependence
  • Error Correcting Codes (ECC)
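
Two buzzwords above, Hamming code and Hamming distance, lend themselves to a concrete illustration. A minimal sketch (illustrative, not code from the lecture):

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two words differ."""
    return bin(a ^ b).count("1")

# A code whose codewords are pairwise at Hamming distance >= d can
# detect up to d-1 bit flips and correct up to (d-1)//2 of them.
assert hamming_distance(0b1011, 0b1001) == 1
```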

Lecture 2 (20.09 Thu.)

  • Rowhammer
  • Memory reliability
  • DRAM, access transistor, capacitor
  • Disturbance errors
  • DRAM refresh
  • DRAM row activation/precharge
  • DRAM cell
  • DRAM scaling problems
  • DRAM vulnerabilities
  • Security implications
  • Page Table Entry
  • DRAM Row remapping
  • Aggressor/victim DRAM row
  • Error correcting codes (ECC)
  • Probabilistic Adjacent Row Activation (PARA)
  • Memory controller
  • Byzantine Failures
  • Variable Retention Time (VRT)
  • Non-volatile memories
  • 3D NAND Flash
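
The PARA idea above (probabilistically refresh a neighbor of every activated row, so a hammered row's neighbors cannot go unrefreshed for long) fits in a few lines. The probability value and the interface here are illustrative assumptions, not the published mechanism's exact parameters:

```python
import random

def para_on_activate(row: int, p: float = 0.005, rng=random.random):
    """On each row activation, with (low) probability p also refresh one
    adjacent row. A row activated N times leaves a neighbor unrefreshed
    with probability (1-p)**N, which vanishes for RowHammer-scale N.
    Returns the extra row refreshed, or None."""
    if rng() < p:
        return row + random.choice((-1, 1))
    return None
```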

Lecture 3a (26.09 Wed.)

  • Amdahl’s law
  • Application Space
  • Dataflow model
  • Decoupled Access Execute
  • Design point
  • Instruction pointer (program counter)
  • ISA vs. Microarchitecture (Interface vs. Implementation)
  • Memory wall/gap
  • Microarchitecture
  • Moore’s law
  • Out-of-order Execution
  • Paradigm shift
  • Pipelining
  • Power/energy constraints
  • Re-examined assumptions
  • Reliability
  • Revolutionary Science
  • SIMD (Single Instruction Multiple Data)
  • Software innovation
  • Specialized accelerators (e.g., TPU)
  • Superscalar Execution
  • Systolic Array
  • Technology scaling
  • Tradeoffs (ISA-level, uArch-level, System-level)
  • VLIW (Very Long Instruction Word)
  • Von Neumann model
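
Amdahl's law from the list above as a one-liner (a sketch; the symbol names are mine):

```python
def amdahl_speedup(p: float, n: float) -> float:
    """Overall speedup when a fraction p of execution time is sped up
    by a factor n and the remaining (1 - p) stays serial."""
    return 1.0 / ((1.0 - p) + p / n)

# 90% parallel work on ever more cores still saturates at 10x:
# the serial 10% becomes the bottleneck.
```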

Lecture 3b (26.09 Wed.)

  • A modern memory hierarchy: L1, L2, L3, main, swap disk (demand paging)
  • Average memory access time (AMAT)
  • Backing store
  • Best/Average/Worst case memory latency
  • Cache addressing
  • Cache hierarchy
  • Cache terminology: Block (line), hit or miss, design decisions (placement, replacement, granularity of management, write policy, instructions/data)
  • Cache/Pipeline
  • Caching basics: Temporal locality and spatial locality
  • Concurrent execution paradigms
  • Diminishing returns for higher associativity
  • Direct-mapped cache
  • DRAM and SRAM
  • Fast and small vs. Big and slow
  • Fully-associative cache
  • Hierarchical latency analysis
  • Hit rate, miss rate
  • Manual vs. automatic management
  • Memory bank organization and operation
  • Memory hierarchy
  • Memory-level Parallelism
  • Recursive latency equation
  • Row decoder and column decoder
  • Scratchpad memory
  • Set-associative cache
  • Software caches
  • Tag/data store
  • The bookshelf analogy
  • Virtual memory vs. Physical memory
  • Wordlines and bitlines
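
The AMAT and recursive-latency-equation buzzwords combine naturally; a minimal sketch with made-up example latencies:

```python
def amat(levels):
    """Recursive latency equation: levels is a list of
    (hit_time, miss_rate) pairs, outermost cache first; a miss at
    level i pays the AMAT of the levels below it. The last level
    (main memory here) has miss_rate 0."""
    if not levels:
        return 0.0
    hit_time, miss_rate = levels[0]
    return hit_time + miss_rate * amat(levels[1:])

# L1: 1 cycle, 10% miss; L2: 10 cycles, 20% miss; memory: 100 cycles
# -> 1 + 0.1 * (10 + 0.2 * 100) = 4 cycles on average
```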

Lecture 4a (27.09 Thu.)

  • Cache write policies
  • Sectored caches
  • Cache block / subblock
  • Instruction / data caches
  • Separate / unified instruction and data caches
  • Multilevel caches and pipelined design
  • Storing tag and data in caches
  • Cache miss types: compulsory, capacity, and conflict misses.
  • Cache tiling / blocking
  • Victim cache
  • Hashing and pseudo-associativity in Cache
  • Skewed associative caches
  • Restructuring data access patterns
  • Column-major and row-major patterns
  • Loop fusion
  • Array merging
  • Cache blocking
  • Non-blocking Caches
  • Memory Level Parallelism
  • Hybrid Cache Replacement
  • MSHR
  • True multiporting
  • Virtual multiporting
  • Multiple cache copies
  • Banking (Interleaving)
  • Bank conflicts

Lecture 4b (27.09 Thu.)

  • Main memory system
  • Open source tools: Rowhammer, Ramulator, MemSim, NOCulator, SoftMC, MQSim, Mosaic, IMPICA, SMLA, HWASim
  • Runahead execution
  • Energy and reliability perspectives
  • Rowhammer issue
  • Trends and challenges on memory systems
  • Memory issues related to capacity, bandwidth, QoS, energy, and scaling
  • Trends of memory capacity, bandwidth, and latency
  • Emerging memory technologies
  • Limits of charge memory
  • Software/Hardware device cooperation

Lecture 5 (03.10 Wed.)

  • Hybrid cache replacement
  • Heterogeneity
  • True multiporting
  • Semaphores
  • Virtual multiporting
  • Multiple cache copies
  • Bank (interleaving)
  • Bank conflict
  • Crossbar interconnect
  • DRAM
  • The memory problem
  • The energy perspective
  • The reliability perspective
  • Requirements
  • Memory controller
  • Genomics
  • Consolidation
  • The memory capacity gap
  • Memory bandwidth
  • Memory latency
  • International Technology Roadmap for Semiconductors (ITRS)
  • Emerging memory technologies
  • 3D-Stacked DRAM
  • Through-silicon vias
  • Reduced-latency DRAM
  • Low-power DRAM
  • Non-volatile memories
  • Limits of charge memory
  • DRAM scaling problems
  • DRAM vulnerabilities
  • RowHammer
  • Security implications
  • Probabilistic Adjacent Row Activation (PARA)
  • Heterogeneous memories / Hybrid memory systems
  • Software/hardware/device cooperation
  • New memory architectures
  • Phase Change Memory (PCM)
  • Memory error tolerance
  • Vulnerable vs. tolerant data
  • Error Correcting Codes (ECC)
  • Memory interference
  • Quality of Service (QoS)
  • Predictable performance
  • Memory bank organization
  • Physical addressability
  • Alignment
  • Interleaving schemes
  • Rank
  • DRAM row, aka DRAM page
  • Sense amplifiers, aka row buffer
  • DRAM bank structure
  • Sub-bank
  • Asynchronous DRAM
  • DRAM burst
  • DRAM module (e.g., DIMM)
  • DRAM chip
  • Multiple DIMMs
  • DRAM channel
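
Interleaving schemes and bank conflicts, made concrete with an assumed 8-bank, 64-byte-block layout (a sketch, not any specific DRAM's address mapping):

```python
def bank_of(addr: int, n_banks: int = 8, block_bytes: int = 64):
    """Cache-block interleaving: consecutive blocks map to consecutive
    banks, so a streaming access pattern spreads over all banks instead
    of serializing on one (a bank conflict).
    Returns (bank, block index within that bank)."""
    block = addr // block_bytes
    return block % n_banks, block // n_banks

# Sequential blocks hit banks 0, 1, 2, ... in turn:
assert [bank_of(64 * i)[0] for i in range(10)] == [0, 1, 2, 3, 4, 5, 6, 7, 0, 1]
```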

Lecture 6 (04.10 Thu.)

  • DRAM Latency
  • Row Level Temporal Locality (RLTL)
  • ChargeCache
  • Memory Controller
  • DRAM Operations
  • ACTIVATE
  • PRECHARGE
  • Sense-Amplifier
  • Sensing
  • Restore
  • Charge Sharing
  • tRCD
  • tRAS
  • Refresh Operations
  • SoftMC
  • FPGA
  • Tests
  • DRAM Characterization
  • Reliability
  • Flexibility
  • Ease of use
  • API
  • REAPER
  • DRAM Retention Failure Profiling
  • Retention Time
  • Data Pattern Dependence
  • Variable Retention Time
  • LPDDR4
  • ECC
  • Temperature
  • PUF
  • Physical Unclonable Functions

Lecture 7 (10.10 Wed.)

  • SIMD
  • SISD
  • MISD
  • Systolic arrays
  • MIMD
  • Instruction level parallelism (ILP)
  • Array processor
  • Vector processor
  • VLIW: Very long instruction word
  • Vector length register (VLEN)
  • Vector stride register (VSTR)
  • Vector load instruction (VLD)
  • Intra-vector dependencies
  • Regular parallelism
  • Memory bandwidth
  • Vector data register
  • Vector control registers
  • Vector mask register
  • Vector functional units
  • Vector registers
  • VADD
  • Scalar operations
  • Memory data register
  • Memory address register
  • Interleaved memory
  • Memory banking
  • Address generator
  • Monolithic memory
  • Memory access latency
  • Vectorizable loops
  • Vector code performance
  • Vector data forwarding (chaining)
  • Vector chaining
  • Vector stripmining
  • Irregular memory access
  • Gather/Scatter operations
  • Sparse vector
  • Masked operations
  • Predicated execution
  • Row/Column major layouts
  • Bank conflicts
  • Randomized mapping
  • Vector instruction level parallelism
  • Automatic code vectorization
  • Packed arithmetic
  • GPUs
  • Programming model vs. execution model
  • SPMD
  • Warp (wavefront)
  • SIMD vs. SIMT
  • Warp-level FGMT
  • Vector lanes
  • Warp scheduler
  • Fine-grained multithreading
  • Warp instruction level parallelism
  • Warp-based SIMD vs. traditional SIMD
  • Multiple instruction streams
  • Conditional control flow instructions
  • Branch divergence
  • Dynamic warp formation
  • Functional unit
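
Vector stripmining from the list above, sketched in scalar Python; VLEN and the per-strip slice stand in for the vector length register and one vector instruction (VADD):

```python
VLEN = 64  # assumed maximum vector length of the machine

def strip_mined_add(a, b):
    """Vector stripmining: a loop of arbitrary length n is split into
    ceil(n / VLEN) strips, setting the vector length register to
    min(VLEN, remaining) for each strip."""
    n = len(a)
    out = [0] * n
    i = 0
    while i < n:
        vl = min(VLEN, n - i)  # vector length register for this strip
        out[i:i + vl] = [x + y for x, y in zip(a[i:i + vl], b[i:i + vl])]
        i += vl
    return out
```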

Lecture 8 (11.10 Thu.)

  • Genome analysis
  • DNA
  • Cell information
  • Genetic content
  • Human genome
  • DNA genotypes
  • RNA
  • Protein / Phenotypes
  • Adenine (A), Thymine (T), Guanine (G), Cytosine (C)
  • Supercoiled
  • Chromosomes
  • HeLa cells (Henrietta Lacks)
  • Reference genome
  • Sequence alignment
  • High-throughput sequencing (HTS)
  • Read mapping
  • Hash based seed-and-extend
  • K-mers
  • Burrows-Wheeler Transform
  • Ferragina-Manzini Index
  • Edit distance
  • Match / Mismatch
  • Deletion / Insertion / Substitution
  • Dynamic programming
  • MrFAST
  • Verification
  • Seed filtering
  • Adjacency filtering
  • Cheap k-mer selection
  • FastHASH
  • Pre-alignment filtering
  • Hamming distance
  • Shifted Hamming distance
  • Needleman-Wunsch
  • Neighborhood map
  • GateKeeper
  • Magnet
  • Slider
  • GRIM-filter
  • 3D-stacked memory (HMC)
  • Nanopore genome assembly
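
Edit distance via dynamic programming is the scoring core behind several alignment buzzwords above (Needleman-Wunsch reduces to this with unit match/mismatch/gap costs); a minimal sketch:

```python
def edit_distance(s: str, t: str) -> int:
    """Dynamic-programming edit distance: minimum number of insertions,
    deletions, and substitutions turning s into t, keeping only one
    row of the DP table at a time."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        cur = [i]
        for j, ct in enumerate(t, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (cs != ct))) # match/substitution
        prev = cur
    return prev[-1]
```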

Lecture 9 (17.10 Wed.)

  • Control dependence handling
  • Control-flow instruction
  • Perceptron-based branch predictor
  • Taken path, not-taken path
  • Program counter, instruction pointer
  • Branch types: conditional, unconditional, call, return, indirect
  • Pipeline stall
  • Branch prediction
  • Predicated execution
  • Fine-grained multithreading
  • Multipath execution
  • Branch prediction accuracy, misprediction
  • Branch resolution latency
  • Wrong-path instruction
  • Branch target address
  • Pipeline flush
  • Branch misprediction penalty
  • Direction predictor
  • Branch Target Buffer (BTB)
  • Global branch history
  • Branch metadata
  • Compile time (static) branch prediction
  • Profile-based prediction, hint bits
  • Program-based branch prediction
  • Programmer-based prediction, likely-taken likely-not-taken pragmas
  • Dynamic compiler
  • Dynamic branch prediction
  • Last time predictor
  • Branch History Table (BHT)
  • Two-bit counter based prediction (adding hysteresis), a.k.a. bimodal prediction
  • Two-level prediction
  • Global branch correlation
  • Global History Register (GHR)
  • Pattern History Table (PHT)
  • (Global history) per set of branches (multiple PHTs, Intel Pentium Pro)
  • Gshare predictor
  • Branch filtering
  • Biased branches
  • Agree prediction
  • Bias bits
  • Gskew predictor
  • Bi-mode predictor, the YAGS predictor, Alpha EV8 branch predictor
  • Local branch correlation
  • Local History Registers (LHR)
  • Two-level local history branch predictor
  • Two-level predictor taxonomy
  • Hybrid (heterogeneous) branch predictors
  • Branch predictor warmup time
  • Tournament Predictor
  • Loop branch detector and predictor
  • Perceptron branch predictor
  • Hybrid history length branch predictor
  • Tagged and prediction by the longest history matching entry (TAGE)
  • Helper Threading (a.k.a. microthreading)
  • Branch Confidence Estimation
  • Pipeline Gating
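
The two-bit-counter (bimodal) predictor above is small enough to sketch; initializing to "weakly taken" is an assumption:

```python
class TwoBitPredictor:
    """Bimodal prediction: a saturating 2-bit counter per branch adds
    hysteresis, so a single odd outcome (e.g., a loop exit) does not
    flip the prediction."""
    def __init__(self):
        self.ctr = 2  # 0,1 = predict not-taken; 2,3 = predict taken

    def predict(self) -> bool:
        return self.ctr >= 2

    def update(self, taken: bool):
        self.ctr = min(3, self.ctr + 1) if taken else max(0, self.ctr - 1)
```

After a run of taken outcomes, one not-taken outcome only drops the counter to "weakly taken", so the next prediction is still taken; this is the hysteresis the buzzword refers to.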

Lecture 10a (18.10 Thu.)

  • Analytical modeling/estimation
  • Simulation
  • Prototyping (FPGA)
  • Real implementation
  • Workload dependent
  • Design choices
  • System parameters
  • Exploration of many designs
  • High-level simulation
  • Relative effects
  • Speed
  • Flexibility
  • Accuracy
  • DRAM types
  • Ramulator

Lecture 10b (18.10 Thu.)

  • DDR
  • Low power DRAMs (LPDDR)
  • High bandwidth DRAM (GDDR)
  • Low latency DRAM (EDRAM, ELDRAM)
  • 3D-stacked DRAM (WIO, HBM, HMC)
  • DRAM controller design
  • Ramulator
  • Sense amplifier
  • Shared internal bus
  • Subarray
  • DRAM long latency
  • Maximize capacity
  • Memory latency tolerance
  • One size fits all approach
  • Tiered-latency DRAM
  • Near-segment
  • Far-segment
  • Low-Cost Interlinked Subarrays (LISA)
  • Bulk data movement
  • Manufacturing variation (i.e., process variation)
  • DRAM timing parameters

Lecture 11a (24.10 Wed.)

  • Latency Variation
  • Solar-DRAM
  • Design-induced Variation
  • DIVA-DRAM
  • Reliability
  • Voltage-Latency-Reliability
  • Voltron
  • DRAM Latency PUF
  • ChargeCache
  • CAL
  • Low Latency
  • DRAM Power Consumption
  • Data Pattern Dependence
  • Structural Variation

Lecture 11b (24.10 Wed.)

  • Intelligent Memory Controllers
  • Processing-in-memory
  • 3D-stacked memory
  • Near-data processing
  • in-memory computation
  • imbalanced system
  • Communication Dominates Arithmetic
  • Page copy
  • Page initialization
  • Through-Silicon Vias
  • Micron Automata
  • Hybrid Memory Cube
  • High-bandwidth memory
  • Rowclone
  • LISA
  • Ambit
  • Majority Function

Lecture 12 (25.10 Thu.)

  • 3D-stacked memory
  • Processing-in-Memory (PIM)
  • 2.5D Integration
  • Graph Processing
  • Tesseract
  • Remote Function Call
  • Data movement bottleneck
  • Google Workloads
  • Function offloading
  • Transparent Offloading Mechanism (TOM)
  • Pointer Chasing
  • In-Memory Pointer-Chasing Accelerator (IMPICA)
  • PIM-Enabled Instructions (PEI)
  • LazyPIM
  • GRIM Filter

Lecture 13 (31.10 Wed.)

  • Charge memory
  • New memory architectures
  • Resistive memory technologies
  • Phase change memory (PCM)
  • STT-MRAM
  • Memristors (ReRAM)
  • Multi-level cell PCM (MLC-PCM)
  • Endurance
  • Hybrid memory systems
  • Multiple memory technologies
  • PCM-DRAM
  • Data placement
  • Memory-level parallelism (MLP)
  • Utility-Based Hybrid Memory Management
  • Tags in Memory
  • TIMBER
  • Data Granularity
  • Non-Volatile Memory (NVM)
  • Persistent Memory (PM)
  • Persistent Memory Manager (PMM)
  • Content Delivery Network (CDN)

Lecture 14a (1.11 Thu.)

  • Persistent Memory
  • Volatile vs. non-volatile data
  • Block Remapping
  • Crash Consistency
  • Emerging Memory Technologies
  • Journaling
  • Page Writeback
  • Data Checkpointing
  • Shadow Paging

Lecture 14b (1.11 Thu.)

  • 3D NAND Flash Memory
  • BCH ECC
  • Bit Error Rate
  • Capacitive Coupling
  • Charge Trap Based 3D Flash Cell
  • Control Gate
  • Data Retention in Flash Memory
  • Early Retention Loss
  • ECC Correction Capability
  • Error Correction Code
  • Fast-/slow-leaking Flash Memory Cells
  • Flash Correct-and-Refresh (FCR)
  • Flash Memory
  • Flash Memory Block
  • Flash Memory Cell Threshold Voltage
  • Flash Memory Endurance
  • Flash Memory Page
  • Flash Memory String
  • Flash Read Disturb Errors
  • Floating Gate Transistor
  • Incremental Step Pulse Programming (ISPP)
  • LI-RAID: Layer-Interleaved RAID
  • NAND Flash
  • NOR Flash
  • P/E Cycle
  • P/E cycle count (PEC)
  • Planar vs. 3D NAND Flash Memory
  • Process Variation
  • Raw Bit Error Rate (RBER)
  • Read Reference Voltage
  • Read Reference Voltage Prediction
  • Read-Disturbance Error
  • Read-Retry
  • Sense Amplifiers
  • Solid-State Drive
  • Threshold Voltage Distribution
  • Uncorrectable Bit Error Rate (UBER)
  • Uncorrectable Errors
  • Wearout Period

Lecture 15 (14.11 Wed.)

  • Program Interference
  • Voltage threshold shift learning
  • Read reference voltage prediction
  • Victim/Aggressor word line
  • Conditional Reading
  • Neighbor assisted correction
  • Read Disturb Errors in Flash memory
  • weak programming
  • pass-through voltage
  • Vpass Tuning
  • Unused ECC capabilities
  • Disturb-Prone cells
  • Disturb-resistant cells
  • Read Disturb Oriented Error Recovery (RDR)
  • Retention Error Handling
  • Retention Failure Recovery (RFR)
  • Retention Optimized Reading
  • bathtub curve
  • Read Disturbance
  • Write Amplification
  • Retention interference
  • Retention loss
  • process variation
  • Raw Bit Error Rate (RBER)
  • Hot/Cold Page
  • Write-hotness aware management

Lecture 16 (15.11 Thu.)

  • Resource sharing
  • Contention for resources
  • Multiple hardware contexts
  • Performance isolation
  • Revolving door analogy
  • Quality of Service (QoS)
  • Shared resource management
  • Utilization/efficiency
  • Uncontrolled (free-for-all) sharing
  • Unfair sharing
  • Unpredictable performance (or lack of QoS)
  • Service Level Agreement (SLA)
  • Overprovision
  • Partitioning (dedicated space)
  • Memory performance hog
  • FR-FCFS
  • Denial of Service (DoS)
  • Distributed DoS
  • Packet-switched routers
  • Inter-thread interference
  • QoS-aware memory systems
  • Smart resources
  • QoS-aware memory controller
  • QoS-aware interconnect
  • QoS-aware caches
  • Dumb resources
  • Injection control
  • Data mapping
  • Prioritization / requests scheduling
  • Fair memory scheduling
  • Stall-time tracking / estimation
  • Bank parallelism interference
  • Parallelism-aware scheduler
  • Request batching
  • PAR-BS
  • Within-batch scheduling
  • ATLAS
  • Thread ranking
  • Shortest job first
  • Shortest stall-time first
  • Multiple memory controllers
  • Thread clusters
  • TCM
  • Niceness
  • Throughput biased
  • Fairness biased
  • Misses Per Kilo Instructions (MPKI)
  • Priority shuffle
  • Vulnerability to interference
  • Tunable knobs
  • Blacklisting
  • Performance vs. fairness vs. simplicity
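
The FR-FCFS policy that several of these buzzwords revolve around (memory performance hog, unfairness) can be sketched in a few lines; the request representation is an assumption:

```python
def fr_fcfs_pick(queue, open_row):
    """FR-FCFS: first-ready (a row-buffer hit to the currently open
    row) beats first-come; age breaks ties. Because a streaming thread
    keeps generating row hits, it can starve a random-access thread --
    the 'memory performance hog' problem motivating QoS-aware
    scheduling. queue is in arrival order."""
    row_hits = [r for r in queue if r["row"] == open_row]
    return row_hits[0] if row_hits else queue[0]  # oldest hit, else oldest
```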

Lecture 17 (21.11 Wed.)

  • Memory System
  • Memory Controller
  • Heterogeneous Agents
  • DASH
  • Staged memory scheduling (SMS)
  • First-ready first-come first-serve (FR-FCFS)
  • Memory Interference-Induced Slowdown Estimation (MISE)
  • Service Level Agreements (SLA)
  • Stall-time Fair Memory (STFM)
  • Quality of Service (QoS)
  • Soft QoS
  • Multithreaded applications
  • Row-Buffer Locality
  • Channel Partitioning
  • Page mapping
  • request buffer

Lecture 18a (22.11 Thu.)

  • Fundamental interference control techniques
  • Core/Source throttling
  • Smart resources
  • Dynamic unfairness estimation
  • Throttling cores' memory access rates
  • FST: Fairness via Source Throttling
  • Runtime unfairness evaluation
  • Dynamic request throttling
  • Request injection rate
  • Application/Thread scheduling
  • Many-core on-chip communication
  • Shared cache bank
  • Spatial task scheduling
  • Clustering, balancing, isolation, and radial mapping
  • Network power
  • Microarchitecture unawareness
  • Operating-system-level metrics and microarchitecture-level metrics
  • Architecture-aware distributed resource management (DRM)
  • Interference-aware thread scheduling
  • Memory quality of service (QoS) approaches and techniques
  • Smart vs. dumb components
  • Cache interference management
  • Interconnect interference management
  • DRAM designs to reduce interference
  • SoftMC
  • PIM accelerators
  • Decoupled direct memory access (DDMA)

Lecture 18b (22.11 Thu.)

  • Multi-core issues in caching
  • Cache coherence
  • Flush-local and flush-global
  • Snoopy cache coherence
  • Free for all sharing
  • Controlled cache sharing
  • Hardware-based cache partitioning
  • Marginal utility of a cache way
  • Dynamic set sampling
  • UCP
  • Optimal partitioning: Greedy and look-ahead algorithms
  • Dynamic fair caching
  • Software-based shared cache partitioning
  • Page coloring
  • Static cache partitioning
  • Dynamic cache partitioning via page re-coloring

Lecture 19a (28.11 Thu.)

  • Controlled Shared Caching
  • Cache spilling
  • Cooperative caching
  • DSR: Dynamic spill-receive
  • Set dueling
  • Handling shared data in private caches
  • Non-uniform cache access
  • Multi-core cache efficiency
  • Cache compression
  • Decompression latency
  • Compression ratio
  • Zero compression
  • Frequent value compression
  • Frequent pattern compression
  • Base-delta immediate compression
  • Toggle-aware compression for GPU systems
  • Core-assisted bottleneck acceleration in GPUs
  • Cache placement
  • Cache insertion policies: MRU, LRU
  • LIP: LRU insertion position (Low-priority insertion policy)
  • BIP: Bimodal insertion policy
  • DIP: Dynamic insertion policy
  • Circular reference model
  • Cache pollution
  • Cache thrashing
  • Reuse prediction
  • EAF: Evicted-address filter
  • TA-DIP: Thread-aware dynamic insertion policy
  • Run-time bypassing
  • Single-usage block prediction
  • SHiP: Signature-based hit prediction
  • Miss classification table
  • s-curve
  • ASM: Application slowdown model
  • Cache access rate
  • Memory access rate
  • Auxiliary tag store
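
Base-delta immediate compression from the list above, reduced to its core compressibility check; the word and delta widths are chosen for illustration:

```python
def bdi_compress(words, delta_bits=8):
    """Base-delta idea: if every word in a cache line lies within a
    narrow delta of a common base, store base + narrow deltas instead
    of full-width words. Returns (base, deltas) when the line is
    compressible at this delta width, else None."""
    base = words[0]
    lo, hi = -(1 << (delta_bits - 1)), (1 << (delta_bits - 1)) - 1
    deltas = [w - base for w in words]
    if all(lo <= d <= hi for d in deltas):
        return base, deltas
    return None
```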

Lecture 19b (28.11 Thu.)

  • Heterogeneity and asymmetry
  • CRAY-1 design
  • Scalar machine and vector pipeline machine
  • RAIDR
  • DRAM + Phase change memory
  • Reliable, costly DRAM + Unreliable, cheap DRAM
  • Heterogeneous retention time
  • Tilera
  • Packet switching and circuit switching
  • TDN, MDN, IDN, UDN, and STN
  • General purpose vs special purpose
  • Heterogeneity of CPU and GPUs
  • Predictability and robustness

Lecture 20 (29.11 Thu.)

  • DRAM
  • NVM
  • Flash
  • Processing in Memory
  • Hardware Security
  • Heterogeneous Multi-Core Systems
  • Bottleneck Acceleration
  • Heterogeneity (Asymmetry)
  • Symmetric design
  • One-size-fits-all
  • Quality of Service (QoS)
  • Hybrid Memory Controllers
  • Heterogeneous agents (e.g., CPUs, GPUs, and HWAs)
  • Heterogeneous memories: Fast vs. Slow DRAM
  • Heterogeneous interconnects: Control, Data, Synchronization
  • Amdahl’s Law
  • Synchronization overhead
  • Load imbalance overhead
  • Resource sharing overhead
  • Sequential portions (Amdahl’s “serial part”)
  • Critical sections
  • Barriers
  • Asymmetric Chip Multiprocessor (ACMP)
  • Staged Execution
  • Data Marshaling
  • Phase Change Memory

Lecture 21 (05.12 Wed.)

  • GPU
  • Programming model
  • Sequential
  • SIMD
  • SPMD
  • SIMT
  • Warp (wavefront)
  • Multithreading of warps
  • Warp-level FGMT
  • Latency-hiding
  • Interleave warp execution
  • Registers of thread ID
  • Warp-based SIMD vs. Traditional SIMD
  • GPGPU programming
  • Inherent parallelism
  • Data parallelism
  • GPU main bottlenecks
  • CPU-GPU data transfers
  • DRAM memory
  • Task offloading
  • Serial code (host)
  • Parallel code (device)
  • Bulk synchronization
  • Transparent scalability
  • Memory hierarchy
  • Indexing and memory access
  • Streaming multiprocessor (SM)
  • Streaming processor (Vector lane)
  • Occupancy
  • Memory coalescing
  • Shared memory tiling
  • Bank conflict
  • Padding
  • SIMD utilization
  • Atomic operations
  • Histogram calculation
  • CUDA streams
  • Asynchronous transfers
  • Heterogeneous systems
  • Unified memory
  • System-wide atomic operations
  • Collaborative computing
  • CPU+GPU collaboration
  • Collaborative patterns
    • Data partitioning
    • Task partitioning
      • Coarse-grained
      • Fine-grained
  • Bézier surfaces
  • NVIDIA Pascal
  • NVIDIA Volta
  • Chai benchmark suite

Lecture 22 (6.12 Thu.)

  • Persistent memory
  • Crash consistency
  • Checkpointing
  • Flynn's taxonomy of computers
  • Parallelism
  • Performance
  • Power consumption
  • Cost efficiency
  • Dependability
  • Instruction level parallelism
  • Data parallelism
  • Task level parallelism
  • Multiprocessor
  • Loosely coupled
  • Tightly coupled
  • Shared global memory address space
  • Shared memory synchronization
  • Cache coherence
  • Memory consistency
  • Shared resource management
  • Interconnects
  • Programming issues in tightly coupled multiprocessor
  • Sublinear speedup
  • Linear speedup
  • Superlinear speedup
  • Unfair comparison
  • Cache/memory effect
  • Utilization
  • Redundancy
  • Efficiency
  • Amdahl's law
  • Bottlenecks in parallel portion
  • Ordering of operations
  • Sequential consistency
  • Weaker memory consistency
  • Memory fence instructions
  • Higher performance
  • Burden on the programmer
  • Coherence scheme
  • Valid/invalid
  • Write propagation
  • Write serialization
  • Update vs. invalidate
  • Cache coherence
  • Snoopy bus
  • Directory
  • Directory optimizations
  • Directory bypassing
  • Snoopy cache
  • Shared bus
  • VI protocol
  • MSI (Modified, Shared, Invalid)
  • Exclusive state
  • MESI (Modified, Exclusive, Shared, Invalid)
  • Illinois Protocol (MESI)
  • Broadcast
  • Bus request
  • Downgrade
  • Upgrade
  • Snoopy invalidation
  • Cache-to-cache transfer
  • Writeback
  • MOESI (Modified, Owned, Exclusive, Shared, Invalid)
  • Directory coherence
  • Race conditions
  • Totally-ordered interconnect
  • Directory-based protocols
  • Set inclusion test
  • Linked list
  • Bloom filters
  • Contention resolution
  • Ping-ponging
  • Synchronization
  • Shared-data-structure
  • Token Coherence
  • Virtual bus

Lecture 23 (12.12 Wed.)

  • Interconnection Network, Interconnect
  • Topology
  • Routing
  • Buffering and Flow Control
  • Switch/Router
  • Channel
  • Wire
  • Packet
  • Path
  • Bus
  • Mesh, 2D Mesh
  • Throttling
  • Oversubscription
  • Network Interface
  • Link
  • Node
  • Message
  • Flit
  • Direct/Indirect Network
  • Radix
  • Regular/Irregular Topology
  • Routing Distance
  • Diameter
  • Bisection Bandwidth
  • Congestion
  • Blocking/non-blocking Interconnect
  • Crossbar
  • Ring
  • Tree
  • Omega
  • Hypercube
  • Torus
  • Butterfly
  • Arbitration
  • Point-to-Point
  • Multistage Network
  • Hop
  • Circuit Switching
  • Packet Switching
  • Tree saturation
  • Deadlock
  • Circular dependency
  • Oblivious Routing
  • Adaptive Routing
  • Packet Format
  • Header
  • Payload
  • Error Code
  • Virtual Channel Flow Control
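
Dimension-order (XY) routing, a standard instance of the oblivious-routing buzzword above, on a 2D mesh; a minimal sketch:

```python
def xy_route(src, dst):
    """Route fully in X, then in Y: deterministic, oblivious to network
    state, and deadlock-free on a mesh (the X-then-Y turn restriction
    breaks any circular channel dependency). Hop count equals the
    Manhattan distance between src and dst."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path
```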

Lecture 24 (13.12 Thu.)

  • Load latency curve
  • Performance of interconnection networks
  • On-chip networks
  • Difference between off-chip and on-chip networks
  • Network buffers
  • Efficient routing
  • Advantages of on-chip interconnects
  • Pin constraints
  • Wiring resources
  • Disadvantages of on-chip interconnects
  • Energy/power constraint
  • Tradeoffs of interconnect design
  • Buffers in NoC routers
  • Bufferless routing
  • Flit-level routing
  • Deflection routing
  • Buffer and link energy consumption
  • Self-throttling
  • Livelock freedom problem
  • Golden packet for livelock freedom
  • Reassembly buffers
  • Packet retransmission
  • Packet scheduling
buzzword.txt · Last modified: 2019/09/20 11:09 by juang