Buzzwords

Buzzwords are terms mentioned during lecture that are particularly important to understand thoroughly. This page tracks the buzzwords for each lecture and can be used as a reference for finding gaps in your understanding of the course material.

Lecture 1 (20.09 Wed.)

  • Hardware/Software codesign
  • Principles, not precedent
  • Level of transformation
    • Algorithm
    • System software
    • Compiler
  • Cross abstraction layers
  • Tradeoffs
  • Caches
  • DRAM/memory controller
  • DRAM banks
  • Row buffer hit/miss
  • Row buffer locality
  • Unfairness
  • Memory performance hog
  • Shared DRAM memory system
  • Streaming access vs. random access
  • Memory scheduling policies
  • Scheduling priority
  • Retention time of DRAM
  • Manufacturing process variation
  • Variable retention time
  • Retention time profile
  • Power consumption
  • Bloom filter
  • Hamming code
  • Hamming distance
  • DRAM row hammer
  • Byzantine failures

Lecture 2 (21.09 Thu.)

  • Pasteur: “Chance favors the prepared mind”
  • Amdahl’s law
  • Moore’s law
  • Specialized accelerators (e.g., TPU)
  • Software innovation
  • Paradigm shift
  • Technology scaling
  • Power/energy constraints
  • Memory wall/gap
  • Reliability
  • Revolutionary Science
  • Re-examined assumptions
  • Von Neumann model
  • Dataflow model
  • Instruction pointer (program counter)
  • Microarchitecture
  • ISA vs. Microarchitecture (Interface vs. Implementation)
  • Design point
  • Application Space
  • Tradeoffs (ISA-level, uArch-level, System-level)
  • Concurrent execution paradigms
  • Virtual memory vs. Physical memory
  • Backing store
  • DRAM and SRAM
  • Wordlines and bitlines
  • Memory bank organization and operation
  • Row decoder and column decoder
  • Memory hierarchy
  • Fast and small vs. Big and slow
  • Cache/Pipeline
  • Caching basics: Temporal locality and spatial locality
  • The bookshelf analogy
  • Cache hierarchy
  • Manual vs. automatic management
  • A modern memory hierarchy: L1, L2, L3, main, swap disk (demand paging)
  • Scratchpad memory
  • Hierarchical latency analysis
  • Hit rate, miss rate
  • Recursive latency equation
  • Software caches
  • Cache terminology: Block (line), hit or miss, design decisions (placement, replacement, granularity of management, write policy, instructions/data)
  • Best/Average/Worst case memory latency
  • Average memory access time (AMAT)
  • Cache addressing
  • Tag/data store
  • Direct-mapped cache
  • Set-associative cache
  • Fully-associative cache
  • Diminishing returns for higher associativity
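
The recursive latency equation and AMAT listed above can be sketched as follows. This is a minimal illustration, not the lecture's own formulation: it assumes levels are probed one after another, so a miss at level i pays that level's access time plus the average time of the next level, T_i = t_i + (1 - h_i) * T_{i+1}. The function name `amat` and the example latencies and hit rates are hypothetical.

```python
def amat(levels, memory_latency):
    """Average memory access time via the recursive latency equation.

    levels: list of (access_time, hit_rate) pairs, ordered from L1 outward.
    memory_latency: access time of the last level, assumed to always hit.
    """
    total = memory_latency
    # Fold from the outermost cache level back toward L1:
    # T_i = t_i + (1 - h_i) * T_{i+1}
    for access_time, hit_rate in reversed(levels):
        total = access_time + (1.0 - hit_rate) * total
    return total


# Hypothetical numbers: L1 = 1 cycle at 90% hits, L2 = 10 cycles at 80% hits,
# main memory = 100 cycles.
example = amat([(1, 0.9), (10, 0.8)], memory_latency=100)
print(f"AMAT is about {example:.2f} cycles")
```

With these made-up numbers, the L2 level averages about 30 cycles and the overall AMAT comes to about 4 cycles, which shows how a high L1 hit rate hides most of the main memory latency.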

Lecture 3 (27.09 Wed.)

  • ISCA - International Symposium on Computer Architecture
  • Locality
  • Access pattern
  • Random access
  • Priorities
  • Insertion
  • Promotion
  • Eviction/replacement
  • Replacement policy
  • LRU
  • FIFO
  • Approximations of LRU
  • Not MRU
  • Hierarchical LRU
  • Victim/Next-Victim replacement
  • Random replacement policy
  • Dirty bit
  • Set thrashing
  • Optimal replacement policy
  • Page replacement
  • Write-back vs. write through
  • Allocation vs. no-allocation on write miss
  • Subblocked or Sectored caches
  • Instruction vs. Data caches
  • Separate vs Unified
  • Multi-level caching
  • Serial vs. parallel access of levels
  • Cache size
  • Working set
  • Block size
  • Critical word first
  • Associativity
  • Classification of cache misses
    • Compulsory miss
    • Capacity miss
    • Conflict miss
  • Prefetching
  • Victim cache
  • Hashing
  • Pseudo-associativity (Poor man’s associative cache)
  • Skewed associative cache
  • Restructuring data access patterns
  • Column-major vs. row-major
  • Loop interchange
  • Blocking
  • Restructuring data layout
  • Miss latency/cost
  • Memory Level Parallelism (MLP)
  • Isolated vs. parallel misses
  • Belady’s vs. MLP-aware replacement
  • Hybrid cache replacement
  • Tag store/directory
  • Auxiliary tag store
  • Tournament selection
  • Policy adaptation (SBAR)
  • Multiple outstanding accesses
  • Non-blocking or Lockup-free caches
  • Miss Status Handling Registers (MSHR)
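
Several of the terms above (insertion, promotion, eviction, LRU, set thrashing) can be seen together in a small model of one cache set. The sketch below is only an illustration under simple assumptions: a single set, tags as plain Python values, insertion at the MRU position. The class name `LRUSet` is hypothetical.

```python
from collections import OrderedDict


class LRUSet:
    """One cache set with true LRU replacement (oldest entry first)."""

    def __init__(self, ways):
        self.ways = ways
        self.blocks = OrderedDict()  # tag -> None, ordered LRU to MRU

    def access(self, tag):
        """Return True on a hit, False on a miss."""
        if tag in self.blocks:
            self.blocks.move_to_end(tag)      # promotion to MRU
            return True
        if len(self.blocks) >= self.ways:
            self.blocks.popitem(last=False)   # eviction of the LRU block
        self.blocks[tag] = None               # insertion at MRU
        return False


# Set thrashing: cycling over 3 blocks in a 2-way set never hits under LRU.
cache_set = LRUSet(ways=2)
hits = [cache_set.access(tag) for tag in "ABCABC"]
print(hits)
```

The cyclic pattern over three blocks misses on every access because LRU always evicts the block that will be needed next. This is the kind of behavior that motivates the alternative insertion and replacement policies covered in the lectures.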

Lecture 4 (28.09 Thu.)

  • Hybrid cache replacement
  • Heterogeneity
  • True multiporting
  • Semaphores
  • Virtual multiporting
  • Multiple cache copies
  • Bank (interleaving)
  • Bank conflict
  • Crossbar interconnect
  • DRAM
  • The memory problem
  • The energy perspective
  • The reliability perspective
  • Requirements
  • Memory controller
  • Genomics
  • Consolidation
  • The memory capacity gap
  • Memory bandwidth
  • Memory latency
  • International Technology Roadmap for Semiconductors (ITRS)
  • Emerging memory technologies
  • 3D-Stacked DRAM
  • Through-silicon vias
  • Reduced-latency DRAM
  • Low-power DRAM
  • Non-volatile memories
  • Limits of charge memory
  • DRAM scaling problems
  • DRAM vulnerabilities
  • RowHammer
  • Security implications
  • Probabilistic Adjacent Row Activation (PARA)
  • Heterogeneous memories / Hybrid memory systems
  • Software/hardware/device cooperation
  • New memory architectures
  • Phase Change Memory (PCM)
  • Memory error tolerance
  • Vulnerable vs. tolerant data
  • Error Correcting Codes (ECC)
  • Memory interference
  • Quality of Service (QoS)
  • Predictable performance
  • Memory bank organization
  • Physical addressability
  • Alignment
  • Interleaving schemes
  • Rank
  • DRAM row, aka DRAM page
  • Sense amplifiers, aka row buffer
  • DRAM bank structure
  • Sub-bank
  • Asynchronous DRAM
  • DRAM burst
  • DRAM module (e.g., DIMM)
  • DRAM chip
  • Multiple DIMMs
  • DRAM channel

Lecture 5 (04.10 Wed.)

  • Main memory
  • Memory controller
  • Serial Presence Detect (SPD)
  • Cache block access
  • Controller transfer time
  • Queuing/Scheduling delay
  • DRAM transfer time
  • DRAM bank latency
  • Worst case (row conflict)
  • Multiple banks
  • Multiple channels
  • Lower order bits
  • Hash function
  • Address mapping
  • Row interleaving
  • Operating system
  • Virtual-address/Physical-address
  • Page coloring
  • DRAM refresh
  • Burst refresh
  • Distributed refresh
  • QoS
  • SSD controller
  • DRAM types
  • DDR
  • Low power DRAMs (LPDDR)
  • High bandwidth DRAM (GDDR)
  • Low latency DRAM (eDRAM, RLDRAM)
  • 3D-stacked DRAM (WIO, HBM, HMC)
  • Controller functions
  • Correct operation
  • Refresh/timing
  • Buffer/scheduling
  • Power/thermal
  • Command scheduling
  • FCFS
  • FR-FCFS
  • Maximize row buffer hit rate
  • Request age
  • Request type
  • Request criticality
  • Interference caused to other cores
  • Row buffer management
  • Open/Close
  • Adaptive policies
  • DRAM power management
  • Active (highest power)
  • All banks idle
  • Power down
  • Self-refresh (lowest power)
  • State transition
  • DRAM controller design
  • Heterogeneous agents (CPUs, GPUs, and HWAs)
  • Self-optimizing DRAM controllers
  • Maximize the long-term bus utilization
  • How to evaluate?
  • Theoretical proof
  • Analytical modeling/estimation
  • Simulation
  • Prototyping (FPGA)
  • Real implementation
  • Workload dependent
  • Design choices
  • System parameters
  • Exploration of many dreams
  • High-level simulation
  • Relative effects
  • Speed
  • Flexibility
  • Accuracy
  • Ramulator
  • Sense amplifier
  • Shared internal bus
  • Subarray
  • DRAM long latency
  • Maximize capacity
  • One size fits all approach
  • Tiered-latency DRAM
  • Near-segment
  • Far-segment

Lecture 6 (05.10 Thu.)

  • Memory latency
  • Memory latency-voltage-reliability relationship
  • Processing in memory (PIM)
  • Heterogeneous (imperfect) manufacturing
  • Manufacturing variation
  • Standard latency
  • Yield
  • Conservative timing margins
  • Charge leakage - Temperature
  • Restoration time
  • Reliable timing parameters
  • DRAM temperature
  • AL-DRAM
  • Multi-programmed and multi-threaded workloads
  • Latency reduction and energy reduction
  • Activation errors
  • Flexible-Latency (FLY) DRAM
  • Spatial latency variation
  • Design-induced variation
  • Systematic variation
  • Design-Induced-Variation-Aware (DIVA) DRAM
  • Error Correcting Codes (ECC)
  • Memory voltage
  • Low-voltage memory
  • Spatial locality of errors
  • Voltage-induced errors
  • Energy savings
  • Bounded performance
  • Dynamic voltage and frequency scaling
  • Data movement
  • In-memory computation/processing
  • Near-data processing (NDP)
  • Processor-centric design
  • Grossly-imbalanced systems
  • Paradigm shift
  • Data-centric architectures
  • Hybrid Memory Cube (HMC)
  • Automata processor
  • Bulk data copy
  • RowClone
  • Bulk initialization
  • Cache coherence
  • Bulk bitwise operations
  • Dual Contact Cell (DCC)
  • Ambit
  • BitWeaving
  • 3D-stacked logic and memory
  • Graph processing
  • Tesseract in-memory accelerator

Lecture 7 (11.10 Wed.)

  • PIM
  • Emerging memory technologies
  • Hybrid memory systems
  • Perform computation where the data resides
  • Hybrid Memory Cube (HMC)
  • 3D-stacked: HBM, WIO, HMC
  • Vaults
  • In-order cores in the logic layer
  • Heat problem
  • Communications across chips
  • Remote function calls
  • Message passing
  • Tesseract
  • Non-blocking remote function call
  • Message queue
  • Prefetching
  • High Bandwidth
  • OoO (Out-of-Order) cores
  • In-order cores
  • Accelerator
  • Graph processing algorithms
  • Memory bandwidth consumption
  • Simulation
  • Model of execution
  • Programming model
  • Memory layers
  • Logic layers
  • Specialized cores
  • Off-chip link
  • Scalability
  • PIM-enabled instructions
  • Cache coherence
  • Virtual memory
  • Virtual address
  • Cache block
  • Locality predictor: execute in the processor or in memory?
  • PageRank
  • PIM fence (pfence)
  • Stream processing
  • Hash table
  • Single-cache-block restriction
  • Virtual memory pages
  • Microarchitecture
  • PIM computation unit
  • Locality monitor
  • Computation units
  • Functional units
  • Data-intensive workloads
  • Off-chip transfer
  • Memory side, processor side
  • Offload instructions
  • Data mapping
  • IMPICA (In-memory Pointer chasing accelerator)
  • Address-access decoupling
  • Linked list
  • Hash table
  • B-tree
  • Chasing pointers: serial code
  • Irregular access pattern
  • Memory access latency
  • Memory level parallelism (MLP)
  • Address translation challenge
  • Page table walk
  • TLB
  • MMU
  • Microbenchmarks
  • Reconfigurable logic
  • LazyPIM
  • DRAM scaling problem
  • Memory-centric system design
  • PCM: Phase change memory
  • STT-MRAM
  • Flash memory
  • Charge memory
  • Resistive memory
  • Memristors
  • Polarity
  • PCM states: amorphous, crystalline
  • Multi-level Cell PCM (MLC-PCM)
  • Write endurance
  • Persistent
  • Reliability issues
  • Resistance drift
  • Hybrid DRAM-PCM
  • Buffering
  • Security problem
  • Magnetic Tunnel Junction (MTJ)
  • Heterogeneity
  • Wear out problem
  • HW/SW management
  • Data migration
  • Granularity of data movement
  • Write filtering techniques: Lazy write, partial writes, page bypass
  • System level simulation
  • Streaming accesses
  • Row miss latency
  • Row hit latency
  • Data placement
  • Performance model
  • Tag store
  • Tag cache
  • Dynamic data transfer granularity
  • TIMBER
  • Banshee
  • DRAM cache
  • TLB coherence problem
  • Merging of memory and storage
  • File system
  • Load/store interface
  • Operating system
  • Persistent memory management
  • System calls
  • Unified interface to all data

Lecture 8 (18.10 Wed.)

  • SIMD
  • GPU
  • Flynn’s taxonomy
  • Systolic arrays
  • Google’s TPU
  • Array processor
  • Vector processor
  • Data parallelism
  • Vector register
  • VLIW
  • Vector instruction
  • Vector length
  • Vector stride
  • Mask register
  • Vector functional units
  • CRAY-1
  • Memory banks
  • Vectorizable loop
  • Chaining (data forwarding)
  • Conditional operations
  • Multipart memory
  • Vector stripmining
  • Scatter/gather operations
  • Address indirection
  • Sparse vector
  • Row/column major
  • Vector Instruction Level Parallelism
  • Automatic code vectorization
  • SIMD ISA extensions
  • Intel Pentium MMX
  • Multimedia registers
  • Image overlapping

Lecture 9 (19.10 Thu.)

  • GPU
  • Programming model
  • Sequential
  • SIMD
  • Multi-threaded
  • SPMD
  • Execution model
  • SIMT
  • Warp (wavefront)
  • Multithreading of warps
  • Warp-level FGMT
  • Shader core
  • Scalar pipeline
  • Latency-hiding
  • Interleave warp execution
  • Registers of thread ID
  • Warp instruction level parallelism
  • Warp-based SIMD vs. Traditional SIMD
  • Control flow path
  • Branch divergence
  • SIMD utilization
  • Dynamic warp formation
  • GPGPU programming
  • Inherent parallelism
  • Data parallelism
  • GPU main bottlenecks
  • CPU-GPU data transfers
  • DRAM memory
  • Task offloading
  • Serial code (host)
  • Parallel code (device)
  • Bulk synchronization
  • Transparent scalability
  • Memory hierarchy
  • CUDA programming language
  • Indexing and memory access
  • Streaming multiprocessor (SM)
  • Streaming processor (CUDA core)
  • Occupancy
  • Memory coalescing
  • Shared memory tiling
  • Bank conflict
  • Padding

Lecture 10 (25.10 Wed.)

  • Branch prediction, branch
  • Branch prediction accuracy, misprediction
  • Perceptron-based branch predictor
  • Taken path, not-taken path
  • Branch divergence
  • Predicated execution
  • Control dependence handling
  • Program counter, instruction pointer
  • Variable-size instruction
  • Control-flow instruction
  • (Un)Conditional, call, return, indirect branches
  • Branch target
  • Branch delay slot
  • Fine-grained multithreading
  • Multipath execution
  • Branch resolution latency
  • Wrong-path instruction
  • Branch Target Buffer (BTB)
  • Branch metadata
  • Profile-based prediction, hint bits
  • Static branch prediction
  • Program-based branch prediction
  • Programmer-based prediction, likely-taken likely-not-taken pragmas
  • Hot-path of the code, basic block
  • Last time predictor
  • Branch History Table, Pattern History Table
  • Two-bit counter based prediction, adding hysteresis to the prediction, bimodal prediction
  • Global/local branch correlation
  • Global history register
  • Gshare predictor
  • Branch filtering
  • Gskew predictor
  • Agree prediction
  • Bias bits
  • Bi-mode predictor, the YAGS predictor, Alpha EV8 branch predictor
  • Hybrid branch predictors
  • Branch predictor warmup time
  • Tournament Predictor
  • Loop branch detector and predictor
  • Perceptron branch predictor
  • Hybrid history length branch predictor
  • Helper Threading
  • Branch Confidence Estimation
  • Pipeline Gating
  • Line & Way Prediction
  • Wide Fetch Engines, Superscalar, VLIW, SIMT
  • I-Cache instruction alignment
  • Fetch break
  • Split-line fetch
  • Code reordering, basic block reordering
  • Superblock, superblock formation
  • Trace cache
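
The two-bit counter based prediction and hysteresis listed above can be illustrated with a small model. This is a sketch under stated assumptions, not the lecture's design: a table of saturating counters indexed by the low bits of the branch address, counters starting at 1 (weakly not-taken), and no tags or history. The class name `TwoBitPredictor` and the example branch address are hypothetical.

```python
class TwoBitPredictor:
    """Table of 2-bit saturating counters indexed by branch address."""

    def __init__(self, entries=1024):
        self.entries = entries
        self.counters = [1] * entries  # 0-1 predict not-taken, 2-3 predict taken

    def _index(self, pc):
        return pc % self.entries

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_correct(predictor, pc, outcomes):
    """Predict, then train, on each outcome; return the number of correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict(pc) == taken:
            correct += 1
        predictor.update(pc, taken)
    return correct


# A loop branch: taken 9 times, then not taken once, executed twice.
loop_outcomes = ([True] * 9 + [False]) * 2
print(count_correct(TwoBitPredictor(), pc=0x400, outcomes=loop_outcomes))
```

The hysteresis shows up on the second pass through the loop: the single not-taken outcome at the loop exit only moves the counter from 3 to 2, so the first iteration of the next pass is still predicted taken. A last-time predictor would mispredict both the loop exit and the following loop entry.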

Lecture 11 (26.10 Thu.)

  • Control dependence handling
  • Superblock
  • Basic block reordering
  • Trace cache
  • Instruction cache
  • Multiple branch predictor
  • Partial segments
  • Branch promotion
  • Highly-biased branches
  • Fill unit optimization
  • Intel Pentium 4 Trace Cache
  • Delayed branching
  • Branch delay slot
  • Delayed branch with squashing
  • Fine-grained multi-threading
  • Multi-threaded pipeline
  • GPU warps
  • Predicate combining
  • Predication (Predicated execution)
  • Conditional move operations
  • Hard-to-predict branches
  • Easy-to-predict branches
  • Predicated Execution in Intel Itanium
  • Misprediction cost
  • Conditional execution
  • Wish branches
  • Dynamic predicated execution
  • Multi-path execution
  • Call and return prediction
  • Return address stack
  • Indirect branch prediction
  • Virtual conditional branches
  • Virtual program counter (VPC) prediction
  • Branch target buffer (BTB)
  • Prediction latency

Lecture 12 (01.11 Wed.)

  • Revolving door analogy
  • Resource sharing
  • Quality of Service (QoS)
  • Multiple hardware contexts
  • Utilization/efficiency
  • Contention for resources
  • Performance isolation
  • Uncontrolled (free-for-all) sharing
  • Unfair sharing
  • Shared resource management
  • Unpredictable performance (or lack of QoS)
  • Service Level Agreement (SLA)
  • Overprovision
  • Partitioning (dedicated space)
  • Memory performance hog
  • FR-FCFS
  • Denial of Service (DoS)
  • Distributed DoS
  • Packet-switched routers
  • Inter-thread interference
  • QoS-aware memory systems
  • Smart resources
    • QoS-aware memory controller
    • QoS-aware interconnect
    • QoS-aware caches
  • Dumb resources
    • Injection control
    • Data mapping
  • Prioritization / requests scheduling
  • Fair memory scheduling
  • Stall-time tracking / estimation
  • Bank parallelism interference
  • Parallelism-aware scheduler
  • Request batching
  • PAR-BS
  • Within-batch scheduling
  • Thread ranking
  • Shortest job first
  • Shortest stall-time first
  • Multiple memory controllers
  • Throughput biased
  • Fairness biased
  • Misses Per Kilo Instructions (MPKI)
  • Priority shuffle
  • Vulnerability to interference
  • Tunable knobs
  • Blacklisting
  • Performance vs fairness vs simplicity
  • Heterogeneous CPU-GPU systems
  • Staged memory scheduling
  • Heterogeneous agents
  • Hardware accelerator
  • Accelerator deadlines
  • Memory-intensive vs memory-non-intensive cores
  • DASH scheduling policy

Lecture 13 (02.11 Thu.)

  • Memory scheduling
  • Predictable performance
  • Satisfy performance / SLA requirements
  • Estimate performance loss
  • Resource partitioning / Prioritization
  • Unpredictable application slowdowns
  • Memory bound applications
  • Memory request service rate
  • Interval-based operation
  • Inaccuracy in estimating slowdown
  • Interference cycles
  • Soft slowdown
  • Parallel applications
  • Serialization
  • Critical sections
  • Barriers
  • Pipeline stages
  • Prioritize requests from limiter threads
  • Data mapping
  • Partitioning memory channels
  • Source throttling
  • Throttle down/up
  • Run-time unfairness evaluation
  • MSHR quota / Request injection rate
  • Latency-load curve
  • Saturation throughput
  • Many-core on-chip communication
  • Memory controller placement
  • Shared cache bank placement
  • Application-to-core mapping
  • Clustering
  • Balancing
  • Isolation
  • Radial mapping
  • Interference-aware thread scheduling
  • Microarchitecture-aware
  • Decoupled DMA

Lecture 14 (08.11 Wed.)

  • GPU computing
  • GPU kernel
  • Massively parallel sections
  • Shared memory
  • Data transfers
  • Kernel launch
  • Latency hiding
  • Occupancy
  • Memory coalescing
  • Data reuse
  • Shared memory tiling
  • SIMD utilization
  • Atomic operations
  • Histogram calculation
  • CUDA streams
  • Asynchronous transfers
  • Heterogeneous systems
  • Unified memory
  • System-wide atomic operations
  • Collaborative computing
  • CPU+GPU collaboration
  • Collaborative patterns
    • Data partitioning
    • Task partitioning
      • Coarse-grained
      • Fine-grained
  • Bézier surfaces
  • NVIDIA Pascal
  • NVIDIA Volta
  • HSA
  • AMD Kaveri
  • Padding
  • Stream compaction
  • Breadth-First Search
  • Atomic-based global synchronization
  • RANSAC
  • Chai benchmark suite
  • CPU+FPGA collaboration

Lecture 15 (15.11 Wed.)

  • Caches
  • Multi-core
  • Multithreading
  • Pressure in the memory/cache hierarchy
  • Private/shared caches
  • Fairness
  • Quality of Service - QoS
  • Shared data
  • Interference
  • Cache Coherence
  • Consistency problem
  • Software Cache Coherence
  • Hardware Cache Coherence
  • Flush-local
  • Flush-global
  • Snooping
  • Shared bus interconnect
  • Write-through
  • Hit rate
  • Cache capacity
  • LRU
  • Cache-friendly/unfriendly
  • Controlled cache sharing
  • Cache utilization
  • Hardware Cache partitioning
  • Utility-based shared cache partitioning
  • Cache way
  • Cache set
  • Utility
  • Marginal utility of a cache way
  • Utility monitors (UMON)
  • Partitioning Algorithm (PA)
  • ATD (auxiliary tag directory/store)
  • Hit counters
  • Dynamic Set sampling
  • Way partitioning
  • Weighted Speedup
  • Throughput
  • Hmean-fairness
  • Greedy Algorithm (GA)
  • Lookahead algorithm
  • Fair shared Cache partitioning
  • Slowdown
  • Repartition
  • Block granularity partitioning
  • Thread scheduling
  • Page coloring
  • Static cache partitioning
  • Dynamic cache partitioning
  • Cache quota
  • Page re-coloring
  • Performance isolation
  • Cooperative caching
  • Spill-Receive Architecture
  • Distributed caches
  • Locks ping-pong between processors
  • Critical section
  • Non-Uniform cache access
  • Cache efficiency
  • Memory bandwidth requirement
  • Bandwidth filter
  • Cache placement
  • Streaming accesses
  • Non-streaming accesses
  • DoA Blocks
  • Cache insertion policy
  • Circular reference model
  • Dynamic insertion policy
  • High-reuse blocks
  • Low-reuse blocks
  • Cache pollution
  • Cache thrashing
  • Cache Pollution
  • Bimodal insertion policy
  • Evicted-Address filter (EAF)
  • Bloom filter
  • Compression
  • Decompression
  • Frequent value
  • Frequent pattern
  • Narrow values
  • Base+Delta encoding
  • Base Delta Immediate compression
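
The idea behind Base+Delta encoding listed above can be sketched in a few lines. This is a simplified illustration, not the full Base-Delta-Immediate scheme: it assumes the first value in the block serves as the base, uses a single signed delta width, and omits the immediate (zero-base) case. The function names and the example values are hypothetical.

```python
def base_delta_encode(values, delta_bytes):
    """Try to encode a block as one base plus narrow signed deltas.

    Returns (base, deltas) if every delta fits in delta_bytes bytes,
    otherwise None (the block is left uncompressed).
    """
    base = values[0]
    deltas = [v - base for v in values]
    limit = 1 << (8 * delta_bytes - 1)  # signed range is [-limit, limit)
    if all(-limit <= d < limit for d in deltas):
        return base, deltas
    return None


def base_delta_decode(base, deltas):
    """Rebuild the original values from the base and the deltas."""
    return [base + d for d in deltas]


# Pointer-like values that differ only in their low bits compress well.
block = [0x1000, 0x1008, 0x1010, 0x1018]
encoded = base_delta_encode(block, delta_bytes=1)
print(encoded)
print(base_delta_decode(*encoded) == block)
```

Values with a narrow dynamic range, such as nearby pointers or small counters, fit this pattern, which is why one wide base plus several narrow deltas can take much less space than the original block.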

Lecture 16 (16.11 Thu.)

  • Heterogeneity/specialization/customization
  • Asymmetry in design
  • Hybrid memory systems
  • Retention time, refresh rate
  • Packet switched network
  • Circuit switched network
  • Mesh interconnect
  • Serial/parallel code sections
  • Serial bottleneck
  • Synchronization overhead
  • Load imbalance
  • Resource contention
  • Critical section, barrier
  • Tile small, tile large cores
  • Asymmetric Chip Multiprocessor
  • Scalability
  • False serialization
  • Private/shared data
  • Staged execution model
  • Segment spawning
  • Producer-consumer pipeline parallelism
  • Data Marshaling
  • Frequency/Voltage scaling

Lecture 17 (22.11 Wed.)

  • Asymmetry
  • Boosting of frequency
  • Dynamic Voltage and Frequency Scaling (DVFS)
  • Energy expended per instruction (EPI)
  • EPI throttling
  • Memory latency tolerance
  • Out-of-order execution
  • Long-latency instructions
  • Instruction window
  • Full-window stall
  • Caching
  • Prefetching
  • Multithreading
  • Runahead execution
  • Memory Level Parallelism (MLP)
  • Pseudo-retirement
  • Runahead cache
  • IBM POWER6
  • Sun Rock
  • Pre-execution based prefetching
  • Dependent cache misses
  • Value prediction
  • Address-value delta prediction (AVD)
  • Regularity in data structures
  • Traversal address loads
  • Leaf address loads
  • Wrong path events

Lecture 18 (23.11 Thu.)

  • Prefetching
  • Speculation
  • Compulsory cache misses
  • Slipstreaming processing
  • Misprediction penalty
  • Reduce cache miss rate
  • Reduce cache miss latency
  • Prefetch accuracy
  • Challenges of prefetching
  • What to prefetch?
  • When to prefetch?
  • Timeliness of the prefetcher
  • Aggressive prefetching
  • Where to place the prefetched data?
  • Prefetch buffer
  • Which level of cache to prefetch into?
  • Where to place the hardware prefetcher?
  • Software prefetching
  • Hardware prefetching
  • Execution-based prefetchers
  • Skip list
  • Next-line prefetchers
  • Access stride
  • Stride prefetchers
  • Instruction based stride prefetching
  • Stream buffer
  • Tradeoffs in stride prefetching
  • Prefetcher performance: accuracy, coverage, timeliness
  • Bandwidth consumption of prefetcher
  • Cache pollution of prefetcher
  • Prefetch distance
  • Prefetch degree
  • Aggressive vs. conservative prefetcher
  • Regular vs. irregular access patterns
  • Address correlation based prefetching
  • Correlation table
  • Markov model
  • Content directed prefetching
  • Speculative thread in execution-based prefetchers
  • Hybrid hardware prefetcher
  • Speculative thread
  • Thread-based pre-execution
  • Runahead execution
  • Prefetching for multicore
  • Utility-based prioritization
  • Hierarchical prefetcher throttling

Lecture 19 (29.11 Wed.)

  • Persistent memory
  • Crash consistency
  • Checkpointing
  • Flynn's taxonomy of computers
  • Parallelism
  • Performance
  • Power consumption
  • Cost efficiency
  • Dependability
  • Instruction level parallelism
  • Data parallelism
  • Task level parallelism
  • Multiprocessor
  • Loosely coupled
  • Tightly coupled
  • Shared global memory address space
  • Shared memory synchronization
  • Cache coherence
  • Memory consistency
  • Shared resource management
  • Interconnects
  • Programming issues in tightly coupled multiprocessor
  • Sublinear speedup
  • Linear speedup
  • Superlinear speedup
  • Unfair comparison
  • Cache/memory effect
  • Utilization
  • Redundancy
  • Efficiency
  • Amdahl's law
  • Bottlenecks in parallel portion
  • Ordering of operations
  • Sequential consistency
  • Weaker memory consistency
  • Memory fence instructions
  • Higher performance
  • Burden on the programmer
  • Coherence scheme
  • Valid/invalid
  • Write propagation
  • Write serialization
  • Update vs. Invalidate
  • Snoopy bus
  • Directory
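
Amdahl's law and the speedup terms listed above can be made concrete with a short calculation. This is a minimal sketch of the standard formula, speedup = 1 / ((1 - f) + f / N), where f is the parallelizable fraction and N the number of processors. It assumes the parallel portion scales perfectly, so it ignores the bottlenecks in the parallel portion that the list also mentions. The function name `amdahl_speedup` and the example fractions are hypothetical.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Speedup predicted by Amdahl's law for a given parallelizable fraction."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


# With 90% of the work parallelizable, speedup stays below 10 for any N.
for n in (2, 10, 100, 1000):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction sets the ceiling: with 10% serial work the speedup approaches but never reaches 10, however many processors are added.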

Lecture 20 (30.11 Thu.)

  • Cache coherence
  • Snoopy bus
  • Directory
  • Directory optimizations
  • Directory bypassing
  • Snoopy cache
  • Shared bus
  • VI protocol
  • MSI (Modified, Shared, Invalid)
  • Exclusive state
  • MESI (Modified, Exclusive, Shared, Invalid)
  • Illinois Protocol (MESI)
  • Broadcast
  • Bus request
  • Downgrade
  • Upgrade
  • Snoopy invalidation
  • Cache-to-cache transfer
  • Writeback
  • MOESI (Modified, Owned, Exclusive, Shared, Invalid)
  • Directory coherence
  • Race conditions
  • Totally-ordered interconnect
  • Directory-based protocols
  • Set inclusion test
  • Linked list
  • Bloom filters
  • Contention resolution
  • Ping-ponging
  • Synchronization
  • Shared-data-structure
  • Token Coherence
  • Virtual bus

Lecture 21 (14.12 Thu.)

  • Interconnection Network, Interconnect
  • Topology
  • Routing
  • Buffering and Flow Control
  • Switch/Router
  • Channel
  • Wire
  • Packet
  • Path
  • Bus
  • Mesh, 2D Mesh
  • Throttling
  • Oversubscription
  • Network Interface
  • Link
  • Node
  • Message
  • Flit
  • Direct/Indirect Network
  • Radix
  • Regular/Irregular Topology
  • Routing Distance
  • Diameter
  • Bisection Bandwidth
  • Congestion
  • Blocking/non-blocking Interconnect
  • Crossbar
  • Ring
  • Tree
  • Omega
  • Hypercube
  • Torus
  • Butterfly
  • Arbitration
  • Point-to-Point
  • Multistage Network
  • Hop
  • Circuit Switching
  • Packet Switching
  • Tree saturation
  • Deadlock
  • Circular dependency
  • Oblivious Routing
  • Adaptive Routing
  • Packet Format
  • Header
  • Payload
  • Error Code
  • Virtual Channel Flow Control

Lecture 22 (20.12 Wed.)

  • Load latency curve
  • Performance of interconnection networks
  • On-chip networks
  • Difference between off-chip and on-chip networks
  • Network buffers
  • Efficient routing
  • Advantages of on-chip interconnects
  • Pin constraints
  • Wiring resources
  • Disadvantages of on-chip interconnects
  • Energy/power constraint
  • Tradeoffs of interconnect design
  • Buffers in NoC routers
  • Bufferless routing
  • Flit-level routing
  • Deflection routing
  • Buffer and link energy consumption
  • Self-throttling
  • Livelock freedom problem
  • Golden packet for livelock freedom
  • Reassembly buffers
  • Packet retransmission
  • Packet scheduling
buzzword.txt · Last modified: 2019/02/12 17:33 (external edit)