
Buzzwords

Buzzwords are terms mentioned during lectures that are particularly important to understand thoroughly. This page tracks the buzzwords for each lecture and can be used as a reference for finding gaps in your understanding of the course material.

Lecture 1 (19.09 Thu.)

Computer Architecture
Redundancy
Bahnhof Stadelhofen
Santiago Calatrava
Oculus
Design constraints
Falling Water
Frank Lloyd Wright
Sustainability
Evaluation criteria for designs
- Functionality
- Reliability
- Space requirement
- Expandability
Principled design
Role of the (Computer) Architect
Systems programming
Digital design
Levels of transformation
- Algorithm
- System software
- Instruction Set Architecture (ISA)
- Microarchitecture
- Logic
Abstraction layers
Hamming code
Hamming distance
User-centric view
Productivity
Multi-core systems
Caches
DRAM memory controller
DRAM banks
Energy efficiency
Memory performance hog
Slowdown
Consolidation
QoS guarantees
Unfairness
Row decoder
Column address
Row buffer hit/miss
Row buffer locality
FR-FCFS
Stream/Random access patterns
Memory scheduling policies
Scheduling priority
DRAM cell
Access transistor
DRAM refresh
DRAM retention time
Variable retention time
Retention time profile
Manufacturing process variation
Bloom filter
Data pattern dependence
Error Correcting Codes (ECC)
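
Hamming code, Hamming distance and ECC fit together as follows: a code whose valid codewords are at least 3 bit-flips apart can correct any single flipped bit. Below is a minimal Python sketch of the classic Hamming(7,4) code under that assumption; the function names are hypothetical and not taken from the lecture, and real DRAM ECC (e.g. SECDED over 64-bit words) uses wider codes built on the same idea.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword.

    Positions 1..7 hold [p1, p2, d1, p4, d2, d3, d4]; parity bit p_k covers
    every position whose index has bit k set.
    """
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4          # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4          # covers positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4          # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(c):
    """Correct up to one flipped bit; return (data_bits, error_position or 0)."""
    c = list(c)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos - 1]:
            syndrome ^= pos    # XOR of the indices of all 1-bits
    if syndrome:               # a non-zero syndrome is the index of the flipped bit
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome


def hamming_distance(a, b):
    """Number of positions in which two equal-length bit vectors differ."""
    return sum(x != y for x, y in zip(a, b))
```

Flipping any single bit of a codeword produces a non-zero syndrome equal to that bit's position, so the decoder can locate and undo the flip.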

Lecture 2a (20.09 Fri.)

DRAM refresh
DRAM cell
Wordline
Bitline
Refresh overhead
Retention time
Manufacturing process variation
Data Pattern Dependence (DPD)
Variable Retention Time (VRT)
DRAM retention failures
Bloom filter
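
In the refresh context, a Bloom filter can record which rows fall into a short-retention-time bin: lookups never miss an inserted row, and a false positive only causes a strong row to be refreshed more often than needed, which is safe. A minimal Python sketch follows; the class name, the SHA-256-based hashing, the profiled row numbers and the 64/256 ms intervals are all illustrative assumptions.

```python
import hashlib


class BloomFilter:
    """Set-membership filter: no false negatives, possible false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)       # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k bit positions from k salted SHA-256 digests of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def insert(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        return all(self.bits[pos] for pos in self._positions(item))


# Usage: rows profiled as having short retention time go into the filter.
weak_rows = BloomFilter()
for row in (17, 4242, 90001):                 # hypothetical profiled row addresses
    weak_rows.insert(row)


def refresh_interval_ms(row):
    # Assumed bins: weak rows every 64 ms, all other rows every 256 ms.
    return 64 if weak_rows.might_contain(row) else 256
```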

Lecture 3 (26.09 Thu.)

Fundamentally Secure/Reliable/Safe Architectures
Fundamentally Energy-Efficient Architectures
Memory-centric (Data-centric) Architectures
Fundamentally Low-Latency Architectures
Architectures for Genomics, Medicine, Health
Genome Sequence Analysis
Reference Genome
Read Mapping
Read Alignment/Verification
Edit Distance
In-Memory DNA Sequence Analysis
Memory Bottleneck
Main Memory
Storage (SSD/HDD)
The Memory Capacity Gap
DRAM Capacity, Bandwidth & Latency
Flash Memory
RowHammer
Non-Volatile Memory (NVM) (e.g., PCM, STTRAM, ReRAM, 3D XPoint)
Emerging Memory Technologies
3D-Stacked DRAM
Hybrid Main Memory
System-Memory Co-Design
Microarchitecture
Memory-Centric System Design
Memory Interference
Memory Controllers

Lecture 4a (27.09 Fri.)

Memory problem
System-memory co-design
Heterogeneous memories
Memory scaling
Memory-centric system design
Waste management
Reliability
Intelligent memory controllers
Computations close to data
Emerging memory technologies
Resistive memory technologies
Non-volatile
Phase Change Memory (PCM)
3D XPoint
Hybrid Memories
Error Tolerance
Tolerant data
Vulnerable data
ECC
Heterogeneous-Reliability Memory
Memory Interference
QoS-aware memory
Fairness
SLA (Service Level Agreement)
Performance loss
Resource partitioning/prioritization
DRAM controllers
Machine learning
DRAM scaling

Lecture 4b (27.09 Fri.)

Rowhammer
Security
Safety
Bit flip
Maslow's hierarchy of needs
Charge-based memory
Data retention
Flash memory
Disturbance errors
Hammered row
Victim row
Electrical interference
Cell-to-cell coupling
Security attack
Kernel privileges
Page Table Entry (PTE)
Electromagnetic coupling
Conductive bridges
Hot-carrier injection
Aggressor row
Refresh rate
Data pattern
Victim cells
Weak cells
ECC
SECDED
Variable retention time
Rowhammer solutions
PARA (Probabilistic Adjacent Row Activation)
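
PARA in a nutshell: whenever a row is activated, the memory controller flips a biased coin and, with a small probability p, also activates one of the two adjacent rows, which refreshes the potential victim cells. A hammered row is activated so often that its neighbors get refreshed long before enough disturbance accumulates. A minimal Python sketch, assuming that physical adjacency equals adjacent row indices (real chips may remap rows internally) and a hypothetical default p = 0.001:

```python
import random


def para_on_activate(row, num_rows, p=0.001, rng=random):
    """Return the adjacent row to refresh after activating `row`, or None.

    With probability p, one of the (up to two) neighbors is chosen uniformly
    and activated too. Assumes num_rows >= 2.
    """
    if rng.random() >= p:
        return None
    neighbors = [r for r in (row - 1, row + 1) if 0 <= r < num_rows]
    return rng.choice(neighbors)
```

Hammering row 100 one hundred thousand times with p = 0.001 refreshes rows 99 and 101 about a hundred times in total, with no per-row counters needed.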

Lecture 5 (03.10 Thu.)

Genome analysis
DNA
Cell information
Genetic content
Human genome
DNA genotypes
RNA
Protein / Phenotypes
Adenine (A), Thymine (T), Guanine (G), Cytosine (C)
Supercoiled
Chromosomes
HeLa cells (Henrietta Lacks)
Reference genome
Sequence alignment
High-throughput sequencing (HTS)
Read mapping
Hash-based seed-and-extend
k-mers
Burrows-Wheeler Transform
Ferragina-Manzini Index
Edit distance
Match / Mismatch
Deletion / Insertion / Substitution
Dynamic programming
mrFAST
Verification
Seed filtering
Adjacency filtering
Cheap k-mer selection
FastHASH
Pre-alignment filtering
Hamming distance
Shifted Hamming distance
Needleman-Wunsch
Neighborhood map
GateKeeper
MAGNET
Slider
GRIM-Filter
Apollo
Hercules
3D-stacked memory (HMC)
Nanopore genome assembly
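
Edit distance with dynamic programming: the minimum number of insertions, deletions and substitutions that turn a read into a reference segment. The O(m·n) table this fills is the reason pre-alignment filters try to reject most candidate locations before running it. A minimal Python sketch of the unit-cost (Levenshtein) variant; the function name is hypothetical, and production read mappers typically use banded or bit-parallel versions.

```python
def edit_distance(read, ref):
    """Minimum insertions + deletions + substitutions turning `read` into `ref`."""
    m, n = len(read), len(ref)
    prev = list(range(n + 1))                  # row 0: distance from "" to ref[:j]
    for i in range(1, m + 1):
        curr = [i] + [0] * n                   # column 0: delete i characters
        for j in range(1, n + 1):
            cost = 0 if read[i - 1] == ref[j - 1] else 1   # match / mismatch
            curr[j] = min(prev[j] + 1,         # deletion from the read
                          curr[j - 1] + 1,     # insertion into the read
                          prev[j - 1] + cost)  # match or substitution
        prev = curr                            # keep only two rows of the table
    return prev[n]
```

For example, "ACGT" versus "AGT" needs one deletion, so the distance is 1.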

Lecture 6 (04.10 Fri.)

RowHammer
Security implications
Probabilistic Adjacent Row Activation (PARA)
Intelligent Controller
NAND Flash
Retention Time
Data Pattern Dependence (DPD)
Variable Retention Time (VRT)
Architecting for Security
Byzantine Failures
Computation in Memory
In-Memory Computation
Data Movement Bottlenecks
Hybrid Memory Cube (HMC)
Bulk Data Copy
Bulk Data Initialization
RowClone
Inter-Subarray Copy
Inter-Bank Copy
Memory as an Accelerator

Lecture 7 (10.10 Thu.)

Computation in Memory
Processing in Memory
Minimally Changing Memory Chips
3D-Stacked Memory
RowClone
Memory as an Accelerator
In-memory bulk bitwise operations
- Ambit
- Destructive reads
- Triple row activation
- Majority Function
- Dual contact cell
- Concurrent addition in space and in time
- Bit-serial operations
  - Connection Machine
- Bitmap Index
- BitWeaving
Computing Architectures with Minimal Data Movement
Mindset on reviewing manuscripts and scientific process
Suggestions on critical paper review
Mindset issues everywhere
- Bandwidth bottleneck in Zurich Airport
- Wrong methodology in design space exploration: Building bridges across Manhattan and Brooklyn
3D-Stacked Logic+Memory
Logic Layer
Hybrid Memory Cube
High-Bandwidth Memory, Wide-IO
In-Memory Graph Processing
Key Bottlenecks in Graph Processing
Tesseract System for Graph Processing
Crossbar network

Lecture 8 (11.10 Fri.)

Processing-in-memory
3D-stacked memory
Processing-in-Memory (PIM)
2.5D Integration
Graph Processing
Tesseract
Accelerating GPU Execution
Remote Function Call
Data movement bottleneck
Google Workloads
Chrome Tab Switching
Function offloading
Transparent Offloading Mechanism (TOM)
Pointer Chasing
CoNDA
In-Memory Pointer-Chasing Accelerator (IMPICA)
PIM-Enabled Instructions (PEI)
LazyPIM
GRIM-Filter

Lecture 9a (17.10 Thu.)

Target metric
Theoretical proof
Analytical modeling/estimation
Abstraction
Accuracy
Workload
RTL simulations
Design choices
Cycle-level accuracy
Design space exploration
Flexibility
High-level simulations
Low-level models
Ramulator
Modular
Extensible
IPC (instructions per cycle)
3D-stacked DRAM
DDR3
GDDR5
HBM
HMC
Wide I/O
LPDDR
Spatial locality
Bank-level parallelism

Lecture 9b (17.10 Thu.)

Data-centric architecture
Low latency memory
Low energy memory
Memory contention
QoS problem
Caching
Prefetching
Multithreading
Out-of-order execution
Runahead execution
Instruction Window
Speculative execution
DRAM Module
DRAM Chip
DIMM
Bank
Subarray
Sense Amplifier
Row buffer
I/O logic
Cross-coupled inverters
SRAM
Access transistor
Enable signal
DRAM cell
Bitline
Memory channel
Activate
Precharge
Isolation transistor
Near/far segments
Profile-based page mapping
Hardware-managed cache
LRU
Inter-segment migration
Page Fault

Lecture 10 (18.10 Fri.)

Long memory latency
Tiered-latency DRAM
Bulk data movement
Inter-subarray copy
Isolation transistors
Row buffer movement (RBM)
Variable latency DRAM (VILLA)
Linked precharge (LIP)
Copy row substrate (CROW)
Multiple row activation
Subarray-level parallelism (SALP)
Bank conflicts
Row decoder
Global structures (global decoder, global row buffer, global bitlines)
Per-subarray latches
Designated latches
DRAM timing parameters
“Fixed latency mindset”
Process variation
Sensing, restore, precharge
Activation errors
Spatial latency variation
Spatial distribution of failures
DRAM aging
Systematic variation in DRAM cells
Dynamic profiling
Voltage reduction

Lecture 11 (24.10 Thu.)

PUF: Physical Unclonable Function
Challenge-response protocol
Trusted and untrusted devices
Device authentication
Runtime-accessible PUFs
Repeatability
Diffuseness
Uniform randomness
DRAM Latency PUF
DRAM Retention PUF
TRNG: True Random Number Generator
Sense Amplification
tRCD: Activation latency
D-RaNGe
RNG Cell
NIST statistical test suite
DRAM Command Scheduling RNG
Retention-based TRNGs
Start-up Values as Random Numbers
VOLTRON
Voltage reduction
DDR3L, LPDDR4
Dynamic Power
Activation latency
Spatial locality of voltage-reduction-induced errors
Memory intensity
Memory stall time
Memory DVFS (Dynamic Voltage and Frequency Scaling)
EDEN
Approximate computing
Approximate DRAM
Deep neural networks (DNN)
DNN training
DNN inference
DNN Weights
Input Feature Maps (IFM)
Output Feature Maps (OFM)
Layer
Convolutional Layer
DNN Error Tolerance
Bit Error Rate (BER)
Retraining
DNN Accuracy
Accuracy collapse

Lecture 12 (25.10 Fri.)

EIN: Error INference
ECC: error correction code
Unstandardised, invisible ECC
Post-correction, pre-correction
Recover pre-correction information
Deliberately induce bit-flips
Error distribution
Error characteristics comparison
Obfuscation of error distribution
Predictable and intrinsic DRAM characteristics
Uniform-random spatial distribution
Maximum-a-posteriori (MAP)
Monte Carlo simulation
SoftMC
Characterise, analyse and understand DRAM cells
Flexible and easy-to-use API
Violating latency
Reliability
Custom timing
Simple, minimal, accessible
Retention time study
Highly-charged cell, low latency
Non-volatile memory
Flexible, easy-to-use
CROW: Copy DRAM Row
High latency
Refresh overhead
Vulnerabilities
CROW-cache, CROW-ref
Duplication, remapping
Row-copy, and two-row activation
Weak regular row, strong copy row
Eliminate refresh
Remap row hammer victim
SMASH: Sparse Matrix Acceleration using Software and Hardware Cooperation
PageRank, Sparse DNN
Expensive discovery
Compressed Sparse Row
High compression ratio
Special compression formats
Hierarchy of Bitmaps
Bitmap Management Unit
Bitmap Buffers
Cross-Layer Interface

Lecture 13a (31.10 Thu.)

Memory controller
DRAM latency
DRAM throughput
Phase Change Memory, Spin-Transfer Torque Magnetic Memory
Flash memory
SSD controller
DRAM types: DDR, LPDDR, GDDR, WideIO, HBM, HMC
DRAM request
Request buffer
DRAM scheduling policy
FCFS (first come first served)
FR-FCFS (first ready, first come first served)
Row buffer management policy
DRAM timing constraints
Memory contention
Self-optimizing DRAM controller
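
FCFS versus FR-FCFS: FCFS serves the oldest request, while FR-FCFS first serves requests that hit in the currently open row ("first ready"), and only then falls back to age. This maximizes row buffer hits but lets a thread with high row buffer locality delay others, which is the unfairness problem of Lecture 13b. A minimal Python sketch that ignores DRAM timing constraints and bank readiness; the `Request` fields and function name are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Request:
    arrival: int   # arrival time (smaller = older)
    bank: int
    row: int


def fr_fcfs_pick(queue, open_rows):
    """Pick the next request: row-buffer hits first, then oldest first.

    open_rows maps bank -> currently open row (absent if the bank is precharged).
    """
    if not queue:
        return None
    # False sorts before True, so row hits (miss == False) win; ties go to the oldest.
    return min(queue, key=lambda r: (open_rows.get(r.bank) != r.row, r.arrival))
```

With row 7 open in bank 0, the younger request to (bank 0, row 7) is served before the older request to row 5; with no open rows, the choice degenerates to plain FCFS.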

Lecture 13b (31.10 Thu.)

Resource sharing
Partitioning
Performance isolation
Quality of service (QoS)
Fairness
Inter-thread/application interference
Unfair slowdown
Memory performance attack
Request scheduling
Bank parallelism interference
Request batching

Lecture 14 (08.11 Fri.)

SIMD
SISD
MISD
Systolic arrays
MIMD
Instruction level parallelism (ILP)
Array processor
Vector processor
VLIW: Very long instruction word
Vector length register (VLEN)
Vector stride register (VSTR)
Vector load instruction (VLD)
Intra-vector dependencies
Regular parallelism
Memory bandwidth
Vector data register
Vector control registers
Vector mask register
Vector functional units
Vector registers
VADD
Scalar operations
Memory data register
Memory address register
Interleaved memory
Memory banking
Address generator
Monolithic memory
Memory access latency
Vectorizable loops
Vector code performance
Vector data forwarding (chaining)
Vector chaining
Vector stripmining
Irregular memory access
Gather/Scatter operations
Sparse vector
Masked operations
Predicated execution
Row/Column major layouts
Bank conflicts
Randomized mapping
Vector instruction level parallelism
Automatic code vectorization
Packed arithmetic
GPUs
Programming model vs execution model
SPMD
Warp (wavefront)
SIMD vs. SIMT
Warp-level FGMT
Vector lanes
Warp scheduler
Fine-grained multithreading
Warp instruction level parallelism
Warp-based SIMD vs. traditional SIMD
Multiple instruction streams
Conditional control flow instructions
Branch divergence
Dynamic warp formation
Functional unit
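
Vector stripmining: when a loop has more iterations than a vector register holds, it is split into strips of at most the maximum vector length, and the vector length register is set for each strip. A minimal Python sketch that simulates this for c[i] = a[i] + b[i]; `MVL = 64` and the function names are hypothetical, and each strip stands in for one VLD/VLD/VADD/VST sequence.

```python
MVL = 64  # hypothetical maximum vector length (elements per vector register)


def strips(n, mvl=MVL):
    """Yield (start, vlen) pairs covering n elements; the first strip takes n % mvl."""
    start, vlen = 0, (n % mvl) or mvl
    while start < n:
        yield start, vlen
        start += vlen
        vlen = mvl        # every remaining strip uses the full vector length


def vadd_stripmined(a, b, mvl=MVL):
    """Compute c[i] = a[i] + b[i] one strip at a time."""
    c = [0] * len(a)
    for start, vlen in strips(len(a), mvl):
        end = start + vlen    # "set VLEN = vlen", then operate on the whole strip
        c[start:end] = [x + y for x, y in zip(a[start:end], b[start:end])]
    return c
```

Handling the odd-sized remainder strip first is one common convention; doing it last works equally well.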

Lecture 15 (14.11 Thu.)

Memory Interference
Quality of Service
QoS-Aware Memory Systems
Stall-Time Fair Memory Scheduling
Parallelism-Aware Batch Scheduling
PAR-BS
ATLAS Memory Scheduler
Thread Cluster Memory Scheduling
TCM
Throughput vs. Fairness
Clustering Threads
STFM
FR-FCFS
The Blacklisting Memory Scheduler
BLISS
Staged Memory Scheduling
SMS
DASH
Current SoC Architectures
Strong Memory Service Guarantees
Predictable Performance
Handling Memory Interference In Multithreaded Applications
Barriers
Critical Sections
Data mapping
Memory Channel Partitioning
Core/source throttling
Fairness via Source Throttling
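
The Blacklisting Memory Scheduler (BLISS) idea: instead of ranking all applications, the controller only tracks whether one application has had a long run of consecutively served requests; if so, that application is blacklisted (deprioritized), and the blacklist is cleared periodically. A minimal Python sketch, loosely following the published description with priority order non-blacklisted, then row hit, then oldest; the threshold of 4, the 10,000-cycle clearing interval and all names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Request:
    app: int
    arrival: int
    bank: int
    row: int


class BlacklistingScheduler:
    def __init__(self, threshold=4, clear_interval=10_000):
        self.threshold = threshold            # assumed streak length before blacklisting
        self.clear_interval = clear_interval  # assumed cycles between blacklist resets
        self.epoch = 0
        self.blacklist = set()
        self.last_app = None
        self.streak = 0

    def pick(self, queue, open_rows, cycle):
        epoch = cycle // self.clear_interval
        if epoch != self.epoch:               # periodic reset: no app stays blacklisted forever
            self.epoch = epoch
            self.blacklist.clear()
        if not queue:
            return None
        req = min(queue, key=lambda r: (r.app in self.blacklist,
                                         open_rows.get(r.bank) != r.row,
                                         r.arrival))
        # Count how many requests in a row were served for the same application.
        self.streak = self.streak + 1 if req.app == self.last_app else 1
        self.last_app = req.app
        if self.streak > self.threshold:
            self.blacklist.add(req.app)
        return req
```

A streaming application with many row hits would starve an application with a single row miss under FR-FCFS; here it is blacklisted after five consecutive services and the other application gets a turn.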

Lecture 16a (15.11 Fri.)

Shared resource contention
Slowdown estimation
Application/thread scheduling
Multi-core/many-core systems
Application/data mapping
Application prioritization
On-chip communication
Communication distance
Congestion in Network-on-Chip (NoC)
Spatial task scheduling
Clustering
Load balancing
Isolation
Radial mapping
Distributed Resource Management (DRM)
Operating-system-level metric
Microarchitecture-level metric
Architecture-aware DRM
Machine learning-based mapping/scheduling

Lecture 16b (15.11 Fri.)

Emerging memory technology
Flash memory
Memory-centric system design
Phase change memory
Charge memory
Resistive memory
Multi-level cell
Spin-Transfer Torque Magnetic RAM (STT-MRAM)
Memristors
Resistive RAM (RRAM or ReRAM)
Intel 3D XPoint
Capacity-latency trade-off
Capacity-reliability trade-off
Endurance
Magnetic Tunnel Junction (MTJ)
Hybrid memory
Write filtering
Data placement
Data access pattern
Row-buffer locality
Overall system performance impact
Memory-Level Parallelism (MLP)
Utility-based hybrid memory management

Lecture 17 (21.11 Thu.)

SIMD
Multiply-accumulate
Thread block
Stream processor
Tensor core
Neural network training
Systolic arrays
Fine-grain multithreading
Warp
GPU programming
General purpose processing on GPU
GPU kernels
CUDA
OpenCL
SPMD
Grid, Block, Threads
Row major layout
Warp scheduler
Coalesced memory accesses
AoS (Array of Structures)
SoA (Structure of Arrays)
Tiling
Bank conflicts
Padding
Randomized mapping
Hash functions
Divergence
Vector reduction
Divergence-free mapping
Atomic operations
PTX
SASS
Synchronous and asynchronous transfers
Stream
Collaborative Computing
Unified memory space
Collaborative patterns

Lecture 19 (28.11 Thu.)

Hybrid Memory Systems
Large (DRAM) Cache
TIMBER
Two-Level Memory/Storage model
Volatile data
Persistent data
Single-level store
Unified Memory and storage
The Persistent Memory Manager (PMM)
ThyNVM
Heterogeneity
Asymmetry in design
Amdahl's Law
Synchronization overhead
Load imbalance overhead
Resource sharing overhead
IBM Power4
IBM Power5
Niagara Processor
Performance vs. parallelism
Asymmetric Chip Multiprocessor (ACMP)
MorphCore
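
Amdahl's Law: if a fraction f of a fixed-size program parallelizes perfectly over N processors, the speedup is 1 / ((1 - f) + f / N), so the serial part bounds the speedup at 1 / (1 - f) no matter how many cores are added. This is the motivation for accelerating the serial portion with a large core in an asymmetric CMP. The formula ignores synchronization, load imbalance and resource sharing overheads, which make real speedups lower still. A minimal Python sketch (function name hypothetical):

```python
def amdahl_speedup(parallel_fraction, n_processors):
    """Speedup = 1 / ((1 - f) + f / N) for a fixed-size workload."""
    f = parallel_fraction
    return 1.0 / ((1.0 - f) + f / n_processors)
```

For example, with 90% parallel code, 10 cores give only about 5.3x, and even a billion cores stay below 10x.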

Lecture 20 (29.11 Fri.)

Heterogeneity
Pointer-chasing
Critical section
Asymmetry
Accelerated critical sections
Data Marshalling
False Serialization
Shared Data
Private Data
Amdahl’s Law
Barriers
Identification
Migration
Feedback Directed Pipelining
Staged Execution
Inter-segment Data
Pipeline Parallelism
Dynamic heterogeneity

Lecture 21a (05.12 Thu.)

Persistent memory
Crash consistency
Checkpointing
Flynn's taxonomy of computers
Parallelism
Performance
Power consumption
Cost efficiency
Dependability
Instruction level parallelism
Data parallelism
Task level parallelism
Utilization
Redundancy
Efficiency
Amdahl's law
Bottlenecks in parallel portion
Multiprocessor
Loosely coupled multiprocessors
Tightly coupled multiprocessors
Shared global memory address space
Shared memory synchronization
Interconnects
Programming issues in tightly coupled multiprocessor
Sublinear speedup
Linear speedup
Superlinear speedup
Shared resource management
Unfair comparison
Cache/memory effect

Lecture 21b (05.12 Thu.)

Memory consistency / memory ordering
Ordering of operations
Local ordering
Global ordering
Sequential consistency
Weaker memory consistency
Memory fence instructions
Consequences of Sequential Consistency
Issues with Sequential Consistency
Global order requirement
Aggressiveness
Out-of-order execution
Higher performance
Burden on the programmer
Mutual exclusion
Protecting shared data
Ease of debugging
Correctness
MIMD processor
Dataflow processor

Lecture 22 (06.12 Fri.)

Cache coherence
Memory consistency
Shared memory model
Software coherence
- Coarse-grained (page-level)
- Non-cacheable
- Fine-grained (cache flush)
Hardware coherence
Valid/invalid
Write propagation
Write serialization
Update vs. Invalidate
Snoopy bus
Directory
- Exclusive bit
Directory optimizations (bypassing)
Snoopy cache
Shared bus
VI protocol
MSI (Modified, Shared, Invalid)
Exclusive state
MESI (Modified, Exclusive, Shared, Invalid)
Illinois Protocol (MESI)
Broadcast
Bus request
Downgrade/upgrade
Snoopy invalidation
Cache-to-cache transfer
Writeback
MOESI (Modified, Owned, Exclusive, Shared, Invalid)
Directory coherence
Race conditions
Totally-ordered interconnect
Directory-based protocols
Set inclusion test
Linked list
Bloom filters
Contention resolution
Ping-ponging
Synchronization
Shared-data-structure
Token Coherence
Coherence for NDAs
Optimistic execution
Signature
Commit/re-execute
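
MESI on a snoopy bus: each cache line is Modified, Exclusive, Shared or Invalid, and changes state on local processor reads/writes and on requests snooped from other caches. The Exclusive state (Illinois protocol) lets a line read by only one cache be written without any bus transaction. A minimal, simplified Python sketch of the per-line transition function; the event names (`PrRd`, `PrWr`, `BusRd`, `BusRdX`, `Flush`) follow common textbook notation, and using BusRdX for the S-to-M upgrade is a simplification (some protocols use a cheaper upgrade request).

```python
def mesi_next(state, event, others_have_copy=False):
    """Return (next_state, bus_action) for one cache line in one cache.

    Local events: 'PrRd', 'PrWr'. Snooped events: 'BusRd' (another cache
    reads the line), 'BusRdX' (another cache wants to write it).
    """
    if event == "PrRd":
        if state == "I":                              # read miss
            return ("S" if others_have_copy else "E"), "BusRd"
        return state, None                            # read hit in M, E or S
    if event == "PrWr":
        if state in ("M", "E"):
            return "M", None                          # E -> M is silent: no bus traffic
        return "M", "BusRdX"                          # S or I: invalidate other copies
    if event == "BusRd":
        if state == "M":
            return "S", "Flush"                       # supply dirty data and write it back
        if state == "E":
            return "S", None
        return state, None
    if event == "BusRdX":
        if state == "M":
            return "I", "Flush"
        return "I", None
    raise ValueError(f"unknown event {event!r}")
```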

Lecture 23 (12.12 Thu.)

Interconnects
Cache coherence
Interconnect networks:
- Topology
- Routing
- Buffering and flow control
  - Oversubscription of routers
Terminology:
- Network Interface
- Link
- Switch/router
- Channel
- Node
- Message
- Packet
- Flit
- Direct/Indirect network
Properties of a Network Topology:
- Regular/Irregular
- Routing distance
- Diameter
- Average distance
- Bisection Bandwidth
- Blocking/non-blocking, rearrangeable non-blocking
Topologies:
- Bus
- Point-to-point (P2P)
- Crossbar
- Ring
- Tree
- Omega
- Hypercube
- Mesh
- Torus
- Butterfly
Cost, latency, contention, energy, bandwidth, overall performance
Circuit switching network
Multistage network
Fetch-and-add
Unidirectional Ring
Bidirectional rings
Hierarchical rings
Mesh: asymmetry at the edges
Torus
H-tree
Fat-tree
Hypercube: Caltech's “The Cosmic Cube”
Routing mechanism: Arithmetic, Source-based, Table-based lookup
Types of routing algorithm: deterministic, oblivious, adaptive
Deadlock
Oblivious routing: Valiant’s algorithm
Adaptive Routing: minimal adaptive, non-minimal adaptive
Flow Control
Handling contention: buffer, drop or misroute
Flow control methods:
- Circuit switching
- Bufferless
- Store-and-forward
- Virtual cut-through
- Wormhole
Performance and congestion at high loads
Store-and-forward
Cut-through flow control
Wormhole flow control
Head of Line Blocking
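
Dimension-order (XY) routing is a standard example of deterministic routing on a 2D mesh: a packet first travels along X until its column matches, then along Y. Because no packet ever turns from the Y dimension back into X, cyclic channel dependencies cannot form, so the scheme is deadlock-free on a mesh. A minimal Python sketch (function name hypothetical) that returns the routers visited:

```python
def xy_route(src, dst):
    """Dimension-order (XY) routing on a 2D mesh: move along X first, then Y.

    src and dst are (x, y) router coordinates; returns the list of routers visited.
    """
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path
```

The hop count equals the Manhattan distance, so the longest route in a k x k mesh (its diameter) is 2(k - 1) hops, e.g. 6 hops for a 4 x 4 mesh.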

Computer Architecture - Fall 2019
