readings

Differences

This shows you the differences between two versions of the page.

readings [2017/12/01 13:02] – [Lecture 19 (29.11 Wed.)] mohammad
readings [2019/02/12 16:35] (current) – external edit 127.0.0.1
Line 2:
 ====== Readings ======
  
-===== Guides on how to review papers critically ===== 
-  * Lecture slides: {{onur-CompArch-f17-how-to-do-the-paper-reviews.pdf | pdf}} {{onur-CompArch-f17-how-to-do-the-paper-reviews.ppt | Slides ppt}} 
-  * Example reviews on "Main Memory Scaling: Challenges and Solution Directions" [[https://people.inf.ethz.ch/omutlu/pub/main-memory-scaling_springer15.pdf|(link to the paper)]] 
-      * {{review-chapter.pdf | Review 1}} 
-      * {{review-chapter-2.pdf | Review 2}} 
-  * Example review on "Staged memory scheduling: Achieving high performance and scalability in heterogeneous systems" [[https://people.inf.ethz.ch/omutlu/pub/staged-memory-scheduling_isca12.pdf|(link to the paper)]] 
-      * {{review-sms.pdf | Review 1}} 
  
-===== Lecture 1 (20.09 Wed.) ===== +==== Papers for Review ==== 
-=== Described in detail during lecture 1: === +{{ :paper_review_guidelines.pdf |Paper Review Guidelines}}
-  * {{https://people.inf.ethz.ch/omutlu/pub/mph_usenix_security07.pdf|T. Moscibroda and O. Mutlu. "Memory performance attacks: denial of memory service in multi-core systems," USENIX Security Symposium 2007}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/raidr-dram-refresh_isca12.pdf|J. Liu, B. Jaiyen, R. Veras, O. Mutlu, "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/dram-retention-time-characterization_isca13.pdf|J. Liu, B. Jaiyen, Y. Kim, C. Wilkerson, O. Mutlu, "An Experimental Study of Data Retention Behavior in Modern DRAM Devices: Implications for Retention Time Profiling Mechanisms," ISCA 2013}} +
-   * {{p422-bloom.pdf|B.H. Bloom, "Space/Time Trade-offs in Hash Coding with Allowable Errors," CACM, 1970}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/dram-row-hammer_isca14.pdf|Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," ISCA 2014}}+
  
-=== Suggested (lecture 1): === +  * {{kim-isca14.pdf| Kim et al., "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors", ISCA 2014}}  
-  * {{youandyourresearch.pdf|R.W. Hamming, "You and Your Research," Transcription of the Bell Communications Research Colloquium Seminar, 1986}} +  * {{p105-ahn.pdf |Ahn et al., "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing", ISCA 2015}} 
-    * [[http://www.youtube.com/watch?v=a1zDuOPkMSw|youtube]] +  * {{parbs_isca08-old.pdf |Mutlu et al., "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems", ISCA 2008}}
-  * {{p128-rixner.pdf|S. Rixner, W.J. Dally, U.J. Kapasi, P. Mattson, J.D. Owens, "Memory access scheduling," ISCA 2000}} +
-  * {{US5630096.pdf|Zuravleff and Robinson, "Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order," US Patent 5,630,096, 1997}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/stfm_micro07.pdf|O. Mutlu and T. Moscibroda, "Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors," MICRO 2007}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/parbs_isca08.pdf|O. Mutlu and T. Moscibroda, "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems," ISCA 2008}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/memory-channel-partitioning-micro11.pdf|S.P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, T. Moscibroda, "Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning," MICRO 2011}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/memory-scaling_memcon13.pdf|O. Mutlu, "Memory Scaling: A Systems Architecture Perspective," Technical talk at MEMCON 2013}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/dram-access-refresh-parallelization_hpca14.pdf|K. Chang, D. Lee, Z. Chishti, A. Alameldeen, C. Wilkerson, Y. Kim, O. Mutlu, "Improving DRAM Performance by Parallelizing Refreshes with Accesses," HPCA 2014}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/eaf-cache_pact12.pdf|V. Seshadri, O. Mutlu, M.A. Kozuch, T.C. Mowry, "The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing," PACT 2012}} +
-   * {{https://googleprojectzero.blogspot.ch/2015/03/exploiting-dram-rowhammer-bug-to-gain.html | M. Seaborn and T. Dullien, "Exploiting the DRAM rowhammer bug to gain kernel privileges," Google Project Zero, 2015}} +
-   * {{10.1007-978-3-319-40667-1_15.pdf| D. Gruss, C. Maurice, S. Mangard, "Rowhammer.js: A Remote Software-Induced Fault Attack in JavaScript," DIMVA 2016}} +
-   * {{p1675-van-der-veen.pdf| V. van der Veen, Y. Fratantonio, M. Lindorfer, D. Gruss, C. Maurice, G. Vigna, H. Bos, K. Razavi, C. Giuffrida, "Drammer: Deterministic Rowhammer Attacks on Mobile Platforms," CCS 2016}} +
-   * {{p382-lamport.pdf| L. Lamport, R. Shostak, M. Pease, "The Byzantine Generals Problem," ACM TOPLAS, 1982}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/rowhammer-and-other-memory-issues_date17.pdf|O. Mutlu, "The RowHammer Problem and Other Issues We May Face as Memory Becomes Denser," DATE 2017}} +
-   * {{bstj29-2-147.pdf|R.W. Hamming. "Error Detecting and Error Correcting Codes". Bell System Technical Journal, 1950}}+
  
-===== Lecture 2 (21.09 Thu.) ===== 
-=== Required (lecture 2): === 
-   * {{patt_ieee2001.pdf|Y.N. Patt. "Requirements, bottlenecks, and good fortune: agents for microprocessor evolution". Proceedings of the IEEE, 2001}} 
  
-=== Required for review as part of HW1: === +==== Other Referenced Readings ==== 
-   * {{https://people.inf.ethz.ch/omutlu/pub/mph_usenix_security07.pdf|T. Moscibroda and O. Mutlu. "Memory performance attacks: denial of memory service in multi-core systems," USENIX Security Symposium 2007}} +For many other readings covered in lectures, please visit [[https://people.inf.ethz.ch/omutlu/projects.htm|https://people.inf.ethz.ch/omutlu/projects.htm]]
-   * {{https://people.inf.ethz.ch/omutlu/pub/raidr-dram-refresh_isca12.pdf|J. Liu, B. Jaiyen, R. Veras, O. Mutlu, “RAIDR: Retention-Aware Intelligent DRAM Refresh,” ISCA 2012}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/dram-row-hammer_isca14.pdf|Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," ISCA 2014}} +
- +
-=== Described in detail during lecture 2: === +
-   * {{gordon_moore_1965_article.pdf| G.E. Moore. "Cramming more components onto integrated circuits," Electronics magazine, 1965}} +
-   * {{https://en.wikipedia.org/wiki/The_Structure_of_Scientific_Revolutions| T.S. Kuhn, "The Structure of Scientific Revolutions," 1962}} +
-   * {{Burks_vonNeumann.pdf| A.W. Burks, H.H. Goldstine, J. von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," 1946}} +
-   * {{04_chapter_4.pdf| Y.N. Patt and S.J. Patel, "Introduction to Computing Systems: Chapter 4, The von Neumann Model", 2004}} +
-   * {{p126-dennis.pdf | J.B. Dennis, D. Misunas, "A preliminary architecture for a basic data-flow processor," ISCA 1974}} +
-   * {{Wilkes_1965.pdf| M.V. Wilkes, "Slave Memories and Dynamic Storage Allocation," IEEE Trans. On Electronic Computers, 1965}} +
- +
-=== Suggested  (lecture 2): === +
-   * {{Amdahl_1964.pdf| G.M. Amdahl, G.A. Blaauw, F.P. Brooks. "Architecture of the IBM System/360," IBM Journal of Research and Development, 1964}} +
-   * {{p34-gurd-2.pdf| J.R. Gurd, C.C. Kirkham, I. Watson, "Manchester data flow computer," CACM, 1985}} +
-   * {{P&H_CH5.pdf| D.A. Patterson and J.L. Hennessy, "Computer Organization and Design: Chapter 5, Large and fast: exploiting memory hierarchy", 2012}} +
-   * {{Hamacher_Ch8_2012.pdf| C. Hamacher, Z. Vranesic, S. Zaky, N. Manjikian, "Computer Organization and Embedded Systems: Chapter 8, The memory system", 2012}} +
-   * {{liptay68.pdf| J.S. Liptay, "Structural aspects of the System/360 Model 85 II: the cache," IBM Systems Journal, 1968}} +
-   * {{p435-fotheringham.pdf| J. Fotheringham, "Dynamic Storage Allocation in the Atlas Computer, Including an Automatic Use of a Backing Store," CACM, 1961}} +
-   * {{Bloom62.pdf| L. Bloom, M. Cohen, S. Porter, "Considerations in the Design of a Computer with High Logic-to-Memory Speed Ratio," AIEE Gigacycle Computing Systems Winter Meeting 1962}} +
- +
-===== Lecture 3 (27.09 Wed.) ===== +
-=== Required  (lecture 3): === +
-   * {{https://people.inf.ethz.ch/omutlu/pub/qureshi_isca06.pdf|M.K. Qureshi, D.N. Lynch, O. Mutlu, Y.N. Patt. "A Case for MLP-Aware Cache Replacement". ISCA 2006}} +
- +
-=== Described in detail during lecture 3: === +
-   * {{Belady_IBM1966.pdf| L.A. Belady, “A study of replacement algorithms for a virtual- storage computer,” IBM Systems Journal, 1966}} +
-   * {{npjouppi_ISCA1990.pdf| N.P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990}} +
-   * {{p169-seznec.pdf| A. Seznec, “A Case for Two-Way Skewed-Associative Caches,” ISCA 1993}} +
-   * {{p81-kroft.pdf| D. Kroft, “Lockup-Free Instruction Fetch/Prefetch Cache Organization,” ISCA 1981}} +
- +
-=== Suggested (lecture 3): === +
-   * {{andrew_glew.pdf| A. Glew, “MLP Yes! ILP No!,” ASPLOS Wild and Crazy Ideas Session 1998}} +
-   * {{p381-qureshi.pdf| M.K. Qureshi, A. Jaleel, Y.N. Patt, S.C. Steely, J. Emer, “Adaptive Insertion Policies for High Performance Caching”, ISCA 2007}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/eaf-cache_pact12.pdf| V. Seshadri, O. Mutlu, M.A. Kozuch, T.C. Mowry, “The Evicted-Address Filter: A Unified Mechanism to Address Both Cache Pollution and Thrashing,” PACT 2012}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/bdi-compression_pact12.pdf| G. Pekhimenko, V. Seshadri, O. Mutlu, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, “Base-Delta-Immediate Compression: Practical Data Compression for On-Chip Caches,” PACT 2012}} +
-   * {{Kharbutli_HPCA2004.pdf| M. Kharbutli, K. Irwin, Y. Solihin, J. Lee, "Using prime numbers for cache indexing to eliminate conflict misses," HPCA 2004}} +
- +
-===== Lecture 4 (28.09 Thu.) ===== +
-=== Described in detail during lecture 4: === +
-   * {{https://people.inf.ethz.ch/omutlu/pub/dram-row-hammer_isca14.pdf|Y. Kim, R. Daly, J. Kim, C. Fallin, J.H. Lee, D. Lee, C. Wilkerson, K. Lai, O. Mutlu, "Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors," ISCA 2014}} +
-   * {{isca09-disaggregate.pdf|K. Lim, J. Chang, T. Mudge, P. Ranganathan, S.K. Reinhardt, T.F. Wenisch, "Disaggregated Memory for Expansion and Sharing in Blade Servers," ISCA 2009}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/softMC_hpca17.pdf|H. Hassan, N. Vijaykumar, S. Khan, S. Ghose, K. Chang, G. Pekhimenko, D. Lee, O. Ergin, O. Mutlu, "SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies," HPCA 2017}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/pcm_isca09.pdf|B.C. Lee, E. Ipek, O. Mutlu, D. Burger, "Architecting Phase Change Memory as a Scalable DRAM Alternative," ISCA 2009}} +
- +
-=== Suggested (lecture 4): === +
-   * {{andrew_glew.pdf| A. Glew, “MLP Yes! ILP No!,” ASPLOS Wild and Crazy Ideas Session 1998}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_ieee_micro03.pdf| O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, “Runahead Execution: An Effective Alternative to Large Instruction Windows,” IEEE Micro 2003}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/utility-based-hybrid-memory-management_cluster17.pdf| Y. Li, S. Ghose, J. Choi, J. Sun, H. Wang, O. Mutlu, “Utility-Based Hybrid Memory Management,” CLUSTER 2017}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/parbs_isca08.pdf| O. Mutlu and T. Moscibroda, “Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems,” ISCA 2008}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/memory-systems-research_superfri14.pdf| O. Mutlu and L. Subramanian, “Research Problems and Opportunities in Memory Systems,” SUPERFRI 2014}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/tldram_hpca13.pdf|D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, O. Mutlu, "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," HPCA 2013}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/salp-dram_isca12.pdf|Y. Kim, V. Seshadri, D. Lee, J. Liu, O. Mutlu, "A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM," ISCA 2012}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/raidr-dram-refresh_isca12.pdf|J. Liu, B. Jaiyen, R. Veras, O. Mutlu, "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/ramulator_dram_simulator-ieee-cal15.pdf|Y. Kim, W. Yang, O. Mutlu, "Ramulator: A Fast and Extensible DRAM Simulator," IEEE CAL 2015}} +
-   * {{stupid_architects_look_to_future.pdf|R. Sites, "It’s the Memory, Stupid!," Microprocessor report 1996}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_hpca03.pdf|O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," HPCA 2003}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/memory-errors-at-facebook_dsn15.pdf|J. Meza, Q. Wu, S. Kumar, O. Mutlu, "Revisiting Memory Errors in Large-Scale Production Data Centers: Analysis and Modeling of New Trends from the Field," DSN 2015}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/adaptive-latency-dram_hpca15.pdf|D. Lee, Y. Kim, G. Pekhimenko, S. Khan, V. Seshadri, K. Chang, O. Mutlu, "Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case," HPCA 2015}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/understanding-latency-variation-in-DRAM-chips_sigmetrics16.pdf|K. Chang, A. Kashyap, H. Hassan, S. Khan, K. Hsieh, D. Lee, S. Ghose, G. Pekhimenko, T. Li, O. Mutlu, "Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization," SIGMETRICS 2016}} +
-   * {{kang-memoryforum14.pdf|U. Kang, H.-S. Yu, C. Park, H. Zheng, J. Halbert, K. Bains, S. Jang, J.S. Choi, "Co-Architecting Controllers and DRAM to Enhance DRAM Process Scaling," The Memory Forum 2014}} +
-   * {{https://arxiv.org/pdf/1706.08642.pdf|Y. Cai, S. Ghose, E.F. Haratsch, Y. Luo, O. Mutlu, "Error Characterization, Mitigation, and Recovery in Flash Memory Based Solid State Drives," Proceedings of the IEEE 2017}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/flash-memory-programming-vulnerabilities_hpca17.pdf|Y. Cai, S. Ghose, Y. Luo, K. Mai, O. Mutlu, E.F. Haratsch, "Vulnerabilities in MLC NAND Flash Memory Programming: Experimental Analysis, Exploits, and Mitigation Techniques," HPCA 2017}} +
-   * {{ekman-ISCA05.pdf|M. Ekman and P. Stenstrom, "A Robust Main-Memory Compression Scheme," ISCA 2005}} +
-   * {{PCM_IBMJRD.pdf|S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rentner, Y.-C. Chen, R. M. Shelby, M. Salinga, D. Krebs, S.-H. Chen, H.-L. Lung, C. H. Lam, "Phase-change random access memory: A scalable technology," IBM JRD 2008}} +
-   * {{https://people.inf.ethz.ch/omutlu/pub/heterogeneous-reliability-memory-for-data-centers_dsn14.pdf|Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal, J. Liu, B. Khessib, K. Vaid, O. Mutlu, "Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory," DSN 2014}} +
-  * {{chandra.pdf|T. Chandra, "Sibyl: A system for large scale machine learning at Google," Keynote at DSN 2014}} +
-    * [[https://www.youtube.com/watch?v=3SaZ5UAQrQM|youtube]] +
- +
-===== Lecture 5 (04.10 Wed.) ===== +
-=== Required (lecture 5): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/tldram_hpca13.pdf|D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, and O. Mutlu, "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," HPCA 2013}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/salp-dram_isca12.pdf|Y. Kim, V. Seshadri, D. Lee, J. Liu, and O. Mutlu, "A Case for Exploiting Subarray-Level Parallelism (SALP) in DRAM," ISCA 2012}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/raidr-dram-refresh_isca12.pdf|J. Liu, B. Jaiyen, R. Veras, O. Mutlu, "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012}} +
-=== Described in detail during lecture 5: === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/ramulator_dram_simulator-ieee-cal15.pdf|Y. Kim, W. Yang, and O. Mutlu, "Ramulator: A Fast and Extensible DRAM Simulator," IEEE CAL 2015}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/rlmc_isca08.pdf|E. Ipek, O. Mutlu, J. F. Martínez, and R. Caruana, "Self-Optimizing Memory Controllers: A Reinforcement Learning Approach," ISCA 2008}} +
-=== Suggested (lecture 5): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/flash-correct-and-refresh_iccd12.pdf|Y. Cai, G. Yalcin, O. Mutlu, E. F. Haratsch, A. Cristal, O. Unsal, and K. Mai, "Flash Correct-and-Refresh: Retention-Aware Error Management for Increased Flash Memory Lifetime," ICCD 2012}} +
-  * {{https://arxiv.org/pdf/1706.08642.pdf|Y. Cai, S. Ghose, E. F. Haratsch, Y. Luo, and O. Mutlu, "Error Characterization, Mitigation, and Recovery in Flash-Memory-Based Solid-State Drives," Proc. IEEE 2017}} +
-  * {{pseudo-randomly_interleaved_memory.pdf|B. R. Rau, "Pseudo-Randomly Interleaved Memory," ISCA 1991}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/dram-aware-caches-TR-HPS-2010-002.pdf|C. J. Lee, V. Narasiman, E. Ebrahimi, O. Mutlu, and Y. N. Patt, "DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems," HPS Technical Report, April 2010}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/memory-channel-partitioning-micro11.pdf|S. P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, and T. Moscibroda, "Reducing memory interference in multicore systems via application-aware memory channel partitioning," MICRO 2011}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/application-to-core-mapping_hpca13.pdf|R. Das, R. Ausavarungnirun, O. Mutlu, A. Kumar, and M. Azimi, "Application-to-core mapping policies to reduce memory system interference in multi-core systems," HPCA 2013}} +
-  * {{https://pdos.csail.mit.edu/papers/masstree:eurosys12.pdf|Y. Mao, E. Kohler, and R. Morris, "Cache Craftiness for Fast Multicore Key-Value Storage," Eurosys 2012}} +
-  * {{quantifying_the_performance_impact_of_memory_latency_and_bandwidth_for_big_data_workloads.pdf|R. Clapp, M. Dimitrov, K. Kumar, V. Viswanathan, and T. Willhalm, "Quantifying the Performance Impact of Memory Latency and Bandwidth for Big Data Workloads," IISWC 2015}} +
-  * {{graph_processing_on_gpus_where_are_the_bottlenecks.pdf|Q. Xu, H. Jeon, and M. Annavaram, "Graph Processing on GPUs: Where are the Bottlenecks?," IISWC 2014}} +
-  * {{preprint-hybridbfs-fpl15.pdf|Y. Umuroglu, D. Morrison, and M. Jahre, "Hybrid breadth-first search on a single-chip FPGA-CPU heterogeneous platform," FPL 2015}} +
-  * {{identifying_the_potential_of_near_data_computing_for_apache_spark.pdf|A. J. Awan, V. Vlassov, E. Ayguade, and M. Brorsson, "Identifying the potential of Near Data Computing for Apache Spark," BDCloud 2015}} +
-  * {{profiling_a_warehouse-scale_computer.pdf|S. Kanev, J. P. Darago, K. M. Hazelwood, P. Ranganathan, T. Moseley, G. Wei, D. M. Brooks, "Profiling a warehouse-scale computer," ISCA 2015}} +
- +
-===== Lecture 6 (05.10 Thu.) ===== +
-=== Required (lecture 6): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/ambit-bulk-bitwise-dram_micro17.pdf|V. Seshadri, D. Lee, T. Mullins, H. Hassan, A. Boroumand, J. Kim, M.A. Kozuch, O. Mutlu, P.B. Gibbons, T.C. Mowry, “Ambit: In-Memory Accelerator for Bulk Bitwise Operations Using Commodity DRAM Technology,” MICRO 2017}} +
- +
-=== Described in detail during lecture 6: === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/adaptive-latency-dram_hpca15.pdf|D. Lee, Y. Kim, G. Pekhimenko, S. Khan, V. Seshadri, K. Chang, O. Mutlu, "Adaptive-Latency DRAM: Optimizing DRAM Timing for the Common-Case," HPCA 2015}}     +
-  *{{https://people.inf.ethz.ch/omutlu/pub/understanding-latency-variation-in-DRAM-chips_sigmetrics16.pdf|K. Chang, A. Kashyap, H. Hassan, S. Khan, K. Hsieh, D. Lee, S. Ghose, G. Pekhimenko, T. Li, O. Mutlu, "Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization," SIGMETRICS 2016}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/DIVA-low-latency-DRAM_sigmetrics17-paper.pdf|D. Lee, S. Khan, L. Subramanian, S. Ghose, R. Ausavarungnirun, G. Pekhimenko, V. Seshadri, O. Mutlu, "Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms," SIGMETRICS 2017}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/Voltron-reduced-voltage-DRAM-sigmetrics17-paper.pdf|K. Chang, A.G. Yaglikci, S. Ghose, A. Agrawal, N. Chatterjee, A. Kashyap, D. Lee, M. O'Connor, H. Hassan, O. Mutlu, "Understanding Reduced-Voltage Operation in Modern DRAM Devices: Experimental Characterization, Analysis, and Mechanisms," SIGMETRICS 2017}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/rowclone_micro13.pdf|V. Seshadri, Y. Kim, C. Fallin, D. Lee, R. Ausavarungnirun, G. Pekhimenko, Y. Luo, O. Mutlu, M.A. Kozuch, P.B. Gibbons, T.C. Mowry, "RowClone: Fast and Energy-Efficient In-DRAM Bulk Data Copy and Initialization," MICRO 2013}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/in-DRAM-bulk-AND-OR-ieee_cal15.pdf|V. Seshadri, K. Hsieh, A. Boroumand, D. Lee, M.A. Kozuch, O.Mutlu, P.B. Gibbons, T.C. Mowry, "Fast Bulk Bitwise AND and OR in DRAM," IEEE CAL 2015}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/tesseract-pim-architecture-for-graph-processing_isca15.pdf|J. Ahn, S. Hong, S. Yoo, O. Mutlu, K. Choi, "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing," ISCA 2015}} +
- +
-=== Suggested (lecture 6): === +
-  *{{p163-elsayed.pdf|N. El-Sayed, I. Stefanovici, G. Amvrosiadis. A.A. Hwang, B. Schroeder, "Temperature Management in Data Centers: Why Some (Might) Like It Hot," SIGMETRICS 2012}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/memory-dvfs_icac11.pdf|H. David, C. Fallin, E. Gorbatov, U.R. Hanebutte, O. Mutlu, "Memory Power Management via Dynamic Voltage/Frequency Scaling," ICAC 2011}} +
-  *{{lin_ISCA2007.pdf|J. Lin, H. Zheng, Z. Zhu, H. David, Z. Zhang, "Thermal Modeling and Management of DRAM Memory Systems," ISCA 2007}} +
-  *{{Zhu_ITHERM2008.pdf|Q. Zhu, X. Li, Y. Wu, "Thermal management of high power memory module for server platforms," ITHERM 2008}} +
-  *{{Ware_HPCA2010.pdf|M.S. Ware, K. Rajamani, M.S. Floyd, B.Brock, J.C. Rubio, F.L. Rawson III, J.B. Carter, "Architecting for power management: The IBM POWER7 approach," HPCA 2010}} +
-  *{{Paul_ISCA2015.pdf|I. Paul, W. Huang, M. Arora, S. Yalamanchili, "Harmonia: Balancing Compute and Memory Power in High-Performance GPUs," ISCA 2015}} +
-   * {{Burks_vonNeumann.pdf| A.W. Burks, H.H. Goldstine, J. von Neumann, "Preliminary discussion of the logical design of an electronic computing instrument," 1946}} +
-   * {{04_chapter_4.pdf| Y.N. Patt and S.J. Patel, "Introduction to Computing Systems: Chapter 4, The von Neumann Model", 2004}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/GSDRAM-gather-scatter-dram_micro15.pdf|V. Seshadri, T. Mullins, A. Boroumand, O. Mutlu, P.B. Gibbons, M.A. Kozuch, T.C. Mowry, "Gather-Scatter DRAM: In-DRAM Address Translation to Improve the Spatial Locality of Non-unit Strided Accesses," MICRO 2015}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/pim-enabled-instructons-for-low-overhead-pim_isca15.pdf|J. Ahn, S. Yoo, O. Mutlu, K. Choi, "PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture," ISCA 2015}} +
-  *{{https://people.inf.ethz.ch/omutlu/pub/in-memory-pointer-chasing-accelerator_iccd16.pdf|K. Hsieh, S. Khan, N. Vijaykumar, K.K. Chang, A. Boroumand, S. Ghose, O. Mutlu, "Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation," ICCD 2016}} +
-  * {{profiling_a_warehouse-scale_computer.pdf|S. Kanev, J. P. Darago, K. M. Hazelwood, P. Ranganathan, T. Moseley, G. Wei, D. M. Brooks, "Profiling a warehouse-scale computer," ISCA 2015}} +
-  *{{PR_1999-66.pdf|L. Page, S. Brin, R. Motwani, T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford Digital Library Technologies Project 1998}} +
- +
-===== Lecture 7 (11.10 Wed.) ===== +
-=== Required (lecture 7): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/pcm_isca09.pdf|B. C. Lee, E. Ipek, O. Mutlu and D. Burger. "Architecting phase change memory as a scalable dram alternative." ISCA 2009.}} +
-=== Described in detail during lecture 7: === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/tesseract-pim-architecture-for-graph-processing_isca15.pdf| J. Ahn, S. Hong, S. Yoo, O. Mutlu, and Kiyoung Choi. "A Scalable Processing-in-Memory Accelerator for Parallel Graph Processing" ISCA 2015}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/pim-enabled-instructons-for-low-overhead-pim_isca15.pdf|J. Ahn, S. Yoo, O. Mutlu, and K. Choi, "PIM-Enabled Instructions: A Low-Overhead, Locality-Aware Processing-in-Memory Architecture" ISCA 2015}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/TOM-programmer-transparent-GPU-near-data-processing_isca16.pdf|K. Hsieh, E. Ebrahimi, G. Kim, N. Chatterjee, M. +
-O'Connor, N. Vijaykumar, O. Mutlu, and S. W. Keckler, "Transparent Offloading and Mapping (TOM): Enabling Programmer-Transparent Near-Data Processing in GPU Systems" ISCA 2016}}  +
-  * {{https://people.inf.ethz.ch/omutlu/pub/in-memory-pointer-chasing-accelerator_iccd16.pdf|K. Hsieh, S. Khan, N. Vijaykumar, K. K. Chang, A. Boroumand, S. Ghose, and O. Mutlu, "Accelerating Pointer Chasing in 3D-Stacked Memory: Challenges, Mechanisms, Evaluation" ICCD 2016}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/sttram_ispass13.pdf |E. Kultursay, M. Kandemir, A. Sivasubramaniam, and O. Mutlu, "Evaluating STT-RAM as an Energy-Efficient Main Memory Alternative" ISPASS 2013}} +
-  * {{scalablehigh-performancemainmemorysystemusingphase-changememorytechnology.pdf | Moinuddin K. Qureshi, Viji Srinivasan, and Jude A. Rivers "Scalable High-Performance Main Memory System Using Phase-Change Memory Technology" ISCA 2009}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/rowbuffer-aware-caching_iccd12.pdf|H. Yoon, J. Meza, R. Ausavarungnirun, R. Harding, and O. Mutlu, "Row Buffer Locality Aware Caching Policies for Hybrid Memories" ICCD 2012}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/utility-based-hybrid-memory-management_cluster17.pdf|Y. Li, S. Ghose, J. Choi, J. Sun, H. Wang, and O. Mutlu, "Utility-Based Hybrid Memory Management" CLUSTER 2017}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/timber-fine-grained-dram-cache_ieee-cal12.pdf|J. Meza, J. Chang, H. Yoon, O. Mutlu, and P. Ranganathan, "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management" CAL 2012}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/persistent-memory-management_weed13.pdf|J. Meza, Y. Luo, S. Khan, J. Zhao, Y. Xie, and O. Mutlu, "A Case for Efficient Hardware-Software Cooperative Management of Storage and Memory" WEED 2013}} +
- === Suggested (lecture 7): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/scheduling-for-GPU-processing-in-memory_pact16.pdf|A. Pattnaik, X. Tang, A. Jog, O. Kayiran, A. K. Mishra, M. T. Kandemir, O. Mutlu, and Chita R. Das, "Scheduling Techniques for GPU Architectures with Processing-In-Memory Capabilities" PACT 2016}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/enhanced-memory-controller-for-dependent-loads_isca16.pdf|M. Hashemi, Khubaib, E. Ebrahimi, O. Mutlu, and Y. N. Patt, "Accelerating Dependent Cache Misses with an Enhanced Memory Controller" ISCA 2016}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/LazyPIM-coherence-for-processing-in-memory_ieee-cal16.pdf|A. Boroumand, S. Ghose, M. Patel, H. Hassan, B. Lucia, K. Hsieh, K. T. Malladi, H. Zheng, and O. Mutlu, "LazyPIM: An Efficient Cache Coherence Mechanism for Processing-in-Memory" CAL 2016}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/concurrent-data-structures-for-PIM_spaa17.pdf|Z. Liu, I. Calciu, M. Herlihy, and O. Mutlu, "Concurrent Data Structures for Near-Memory Computing" SPAA 2017}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/ramulator_dram_simulator-ieee-cal15.pdf|Y. Kim, W. Yang, and O. Mutlu, "Ramulator: A Fast and Extensible DRAM Simulator," IEEE CAL 2015}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/softMC_hpca17.pdf|H. Hassan, N. Vijaykumar, S. Khan, S. Ghose, K. Chang, G. Pekhimenko, D. Lee, O. Ergin, O. Mutlu, "SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies," HPCA 2017}} +
-  * {{phase-changetechnologyandthefutureofmainmemory.pdf|B. C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, and D. Burger "Phase-Change Technology and the Future of Main Memory" IEEE Micro Top Picks 2010}} +
-  * {{pdram_ahybridpramanddrammainmemorysystem.pdf | G. Dhiman, R. Ayoub, T. Rosing "PDRAM: A hybrid PRAM and DRAM main memory system" DAC 2009}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/banshee-bandwidth-efficient-DRAM-cache_micro17.pdf|X. Yu, C. J. Hughes, N. Satish, O. Mutlu, and S. Devadas,"Banshee: Bandwidth-Efficient DRAM Caching via Software/Hardware Cooperation" MICRO 2017}} +
- +
-===== Lecture 8 (18.10 Wed.) ===== +
- === Suggested (lecture 8): === +
-  * {{Flynn_1966.pdf|M.J. Flynn, “Very high-speed computing systems,” Proc. of IEEE 1966}} +
-  * {{p140-fisher.pdf|J.A. Fisher, "Very Long Instruction Word architectures and the ELI-512," ISCA 1983}} +
-  * {{p63-russell.pdf|R.M. Russell, "The CRAY-1 computer system," CACM 1978}} +
-  * {{mmx_technology_1996.pdf|A. Peleg and U. Weiser, "MMX technology extension to the Intel architecture," IEEE Micro 1996}} +
- +
-===== Lecture 10 (25.10 Wed.) ===== +
- === Required (lecture 10): === +
-  * {{ :mcfarling_combining.pdf | S. McFarling, "Combining Branch Predictors" DEC WRL Technical Report 1993}} +
-  * {{ :two-level-branch-pred.pdf | T. Yeh and Y. Patt, "Two-Level Adaptive Training Branch Prediction" MICRO 1991}} +
- === Suggested (lecture 10): === +
-  * {{ :the_microarchitecture_of_superscalar_processors.pdf | J. Smith and G. Sohi, "The Microarchitecture of Superscalar Processors" IEEE 1995}} +
-  * {{ :alpha_21264.pdf | R. E. Kessler, "The Alpha 21264 Microprocessor" IEEE Micro 1999}} +
-  * {{ :jsmith.pdf | J. Smith, "A study of Branch Prediction Strategies" ISCA 1981}} +
-  * {{ :r2_evers_2lbranch_isca98.pdf | M. Evers et al., "An Analysis of Correlation and Predictability: What Makes Two-level Branch Predictors Work" ISCA 1998}} +
-  * {{ :bf03356745.pdf |P. Chang et al., "Branch Classification: A New Mechanism for Improving Branch Predictor Performance" MICRO 1994}} +
-  * {{ :agree_isca24.pdf | E. Sprangle et al., "The Agree Predictor: A Mechanism for Reducing Negative Branch History Interference" ISCA 1997}} +
-  * {{ :optim2bcgskew.pdf | A. Seznec, "An Optimized 2bcgskew Branch Predictor" IRISA Tech. Report 1993}} +
-  * {{ :michaud97trading.pdf | P. Michaud et al., "Trading Conflict and Capacity Aliasing in Conditional Branch Predictors" ISCA 1997}} +
-  * {{ :p4-lee.pdf | C. Lee et al., "The Bi-Mode Branch Predictor" MICRO 1997}} +
-  * {{ :p69-eden.pdf | A. N. Eden and T. Mudge, "The YAGS Branch Prediction Scheme" MICRO 1998}} +
-  * {{ :seznec02.pdf | A. Seznec et al., "Design Tradeoffs for the Alpha EV8 Conditional Branch Predictor" ISCA 2002}} +
-  * {{ :ssmt.pdf | R. Chappell et al., "Simultaneous Subordinate Microthreading (SSMT)" ISCA 1999}} +
-  * {{ :aaaa09d97139a5076ad0a24bd5bb69bea1e1.pdf | D. Jimenez and C. Lin, "Dynamic Branch Prediction with Perceptrons" HPCA 2001}} +
-  * {{ :ad48737158334a46763c8e0b29fd53975e10.pdf | A. Seznec, "Analysis of the O-GEometric History Length Branch Predictor" ISCA 2005}} +
-  * {{ :centrino_microarchitecture_and_performance.pdf | S. Gochman et al., "The Intel Pentium M Processor: Microarchitecture and Performance" Intel Technology Journal 2003}} +
-  * {{ :v8paper1.pdf | A. Seznec and P. Michaud, "A Case for (Partially) TAgged GEometric History Length Branch Prediction" JILP 2006}} +
-  * {{ :andresezneclimited.pdf | A. Seznec, "TAGE-SC-L Branch Predictors Again" CBP 2016}} +
-  * {{ :micro.confidence.pdf | Jacobsen et al., "Assigning Confidence to Conditional Branch Predictions" MICRO 1996}} +
-  * {{ :10.1.1.33.9918.pdf | Manne et al., "Pipeline Gating: Speculation Control for Energy Reduction" ISCA 1998}} +
-  * {{ :p16-pettis.pdf | Pettis and Hansen, "Profile Guided Code Positioning" PLDI 1990}} +
-  * {{ :10.1007_2fbf01205185.pdf | Hwu et al., "The Superblock: An effective technique for VLIW and superscalar compilation" Journal of Supercomputing 1993}} +
-      +
-===== Lecture 11 (26.10 Thu.) ===== +
- === Suggested (lecture 11): === +
-  * {{ :the_microarchitecture_of_superscalar_processors.pdf | J. Smith and G. Sohi, "The Microarchitecture of Superscalar Processors" IEEE 1995}} +
-  * {{ :alpha_21264.pdf | R. E. Kessler, "The Alpha 21264 Microprocessor" IEEE Micro 1999}} +
-  * {{ :rotenberg_trace_cache.pdf | E. Rotenberg, S. Bennett, J. E. Smith, "Trace Cache: a Low Latency Approach to High Bandwidth Instruction Fetching", MICRO 1996}} +
-  * {{ :critical_issues_regarding_the_trace_cache_fetch_mechanism.pdf | S. J. Patel, D. H. Friendly, Y. N. Patt, "Critical Issues Regarding the Trace Cache Fetch Mechanism", Umich TR, 1997}} +
-  * {{ :trace_cache_design_for_wide_issue_superscalar_processors.pdf | S. J. Patel, "Trace Cache Design for Wide Issue Superscalar Processors", PhD Thesis, University of Michigan, 1999}} +
-  * {{ :putting_the_fill_unit_to_work_dynamic_optimizations_for_trace_cache_microprocessors.pdf | D. H. Friendly,  S. J. Patel, Y. N. Patt, "Putting the Fill Unit to Work: Dynamic Optimizations for Trace Cache Microprocessors", MICRO 1998}} +
-  * {{ ::us5381533-dynamic_flow_instruction_cache_memory_organized_around_trace_segments_independent_of_virtual_address_line.pdf | A. Peleg and U. Weiser, "Dynamic Flow Instruction Cache Memory Organized Around Trace Segments Independent of Virtual Address Line", US Patent, 1995}} +
-  * {{ :parallel_operation_in_the_control_data_6600.pdf | J. E. Thornton, "Parallel Operation in the Control Data 6600", AFIPS 1964}} +
-  * {{ :the_impact_of_if-conversion_and_branch_prediction_on_program_execution_on_the_intel_itanium_processor.pdf | Y. Choi, A. Knies, L. Gerke, T.-F. N, "The Impact of If-Conversion and Branch Prediction on Program Execution on the Intel Itanium Processor", MICRO 2001}} +
-  * {{ :vpc_prediction_reducing_the_cost_of_indirect_branches_via_hardware-based_dynamic_devirtualization.pdf | H. Kim, J. A. Joao, O. Mutlu, C. J. Lee, Y. N. Patt, R. Cohn, "VPC prediction: Reducing the Cost of Indirect Branches via Hardware-based Dynamic Devirtualization", ISCA 2007}} +
-  * {{ :conversion_of_control_dependence_to_data_dependence.pdf | J. R. Allen, K. Kennedy, C. Porterfield, J. Warren, "Conversion of Control Dependence to Data Dependence", POPL 1983}} +
-  * {{ :wish_branches_combining_conditional_branching_and_predication_for_adaptive_predicated_execution.pdf | H. Kim, O. Mutlu, J. Stark, Y. N. Patt, "Wish Branches: Combining Conditional Branching and Predication for Adaptive Predicated Execution", MICRO 2005}} +
-  * {{ :the_inhibition_of_potential_parallelism_by_conditional_jumps.pdf | E. M. Riseman and C.C. Foster, "The Inhibition of Potential Parallelism by Conditional Jumps", IEEE TC 1972}} +
-  * {{ :niagara_a_32-way_multithreaded_sparc_processor.pdf | P. Kongetira, K. Aingaran, K. Olukotun, "Niagara: A 32-Way Multithreaded Sparc Processor", IEEE Micro 2005}} +
-  * {{ :a_pipelined_shared_resource_mimd_computer.pdf | B. J. Smith, "A Pipelined, Shared Resource MIMD Computer", ICPP 1978}} +
- +
-===== Lecture 12 (01.11 Wed.) ===== +
-=== Described in detail during lecture 12: === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/stfm_micro07.pdf|O. Mutlu and T. Moscibroda, "Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors," MICRO 2007}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/parbs_isca08.pdf|O. Mutlu and T. Moscibroda, "Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems," ISCA 2008}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/atlas_hpca10.pdf|Y. Kim, D. Han, O. Mutlu, M. Harchol-Balter, “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/tcm_micro10.pdf|Y. Kim, M. Papamichael, O. Mutlu, M. Harchol-Balter, “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO 2010}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/bliss-memory-scheduler_iccd14.pdf|L. Subramanian, D. Lee, V. Seshadri, H. Rastogi, O. Mutlu, “The Blacklisting Memory Scheduler: Achieving High Performance and Fairness at Low Cost,” ICCD 2014}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/staged-memory-scheduling_isca12.pdf|R. Ausavarungnirun, K. Chang, L. Subramanian, G. Loh, O. Mutlu, “Staged Memory Scheduling: Achieving High Performance and Scalability in Heterogeneous Systems,” ISCA 2012}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/dash_deadline-aware-heterogeneous-memory-scheduler_taco16.pdf|H. Usui, L. Subramanian, K. Chang, O. Mutlu, “DASH: Deadline-Aware High-Performance Memory Scheduler for Heterogeneous Systems with Hardware Accelerators,” ACM TACO 2016}} +
- +
- === Suggested (lecture 12): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/parallel-memory-scheduling_micro11.pdf|E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C.J. Lee, O. Mutlu, Y.N. Patt, “Parallel Application Memory Scheduling,” MICRO 2011}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/memory-channel-partitioning-micro11.pdf|S.P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, T. Moscibroda, “Reducing Memory Interference in Multicore Systems via Application-aware Memory Channel Partitioning,” MICRO 2011}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/pvc-qos_micro09.pdf|B. Grot, S.W. Keckler, O. Mutlu, “Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QoS Scheme for Networks-on-Chip,” MICRO 2009}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/bliss-memory-scheduler_ieee-tpds16.pdf|L. Subramanian, D. Lee, V. Seshadri, H. Rastogi, O. Mutlu, “BLISS: Balancing Performance, Fairness, and Complexity in Memory Access Scheduling,” IEEE TPDS 2016}} +
- +
-===== Lecture 13 (02.11 Thu.) ===== +
-=== Described in detail during lecture 13: === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/mise-predictable_memory_performance-hpca13.pdf|L. Subramanian, V. Seshadri, Y. Kim, B. Jaiyen, and O. Mutlu, "MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems," HPCA 2013}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/parallel-memory-scheduling_micro11.pdf|E. Ebrahimi, R. Miftakhutdinov, C. Fallin, C.J. Lee, O. Mutlu, Y.N. Patt, “Parallel Application Memory Scheduling,” MICRO 2011}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/memory-channel-partitioning-micro11.pdf|S. P. Muralidhara, L. Subramanian, O. Mutlu, M. Kandemir, and T. Moscibroda, "Reducing Memory Interference in Multicore Systems via Application-Aware Memory Channel Partitioning," MICRO 2011}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/fst_asplos10.pdf|E. Ebrahimi, C. J. Lee, O. Mutlu, and Y. N. Patt, "Fairness via Source Throttling: A Configurable and High-Performance Fairness Substrate for Multi-Core Memory Systems," ASPLOS 2010}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/application-to-core-mapping_hpca13.pdf|R. Das, R. Ausavarungnirun, O. Mutlu, A. Kumar, and M. Azimi, "Application-to-Core Mapping Policies to Reduce Memory System Interference in Multi-Core Systems," HPCA 2013}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/architecture-aware-distributed-resource-management_vee15.pdf|H. Wang, C. Isci, L. Subramanian, J. Choi, D. Qian, and O. Mutlu, "A-DRM: Architecture-aware Distributed Resource Management of Virtualized Clusters," VEE 2015}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/decoupled-dma_pact15.pdf|D. Lee, L. Subramanian, R. Ausavarungnirun, J. Choi, and O. Mutlu, "Decoupled Direct Memory Access: Isolating CPU and IO Traffic by Leveraging a Dual-Data-Port DRAM," PACT 2015}} +
-=== Suggested (lecture 13): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/stfm_micro07.pdf|O. Mutlu and T. Moscibroda, "Stall-Time Fair Memory Access Scheduling for Chip Multiprocessors," MICRO 2007}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/application-slowdown-model_micro15.pdf|L. Subramanian, V. Seshadri, A. Ghosh, S. Khan, and O. Mutlu, "The Application Slowdown Model: Quantifying and Controlling the Impact of Inter-Application Interference at Shared Caches and Main Memory," MICRO 2015}} +
-  * {{per-thread_cycle_accounting_in_multicore_processors.pdf|K. Du Bois, S. Eyerman, L. Eeckhout, "Per-thread cycle accounting in multicore processors," TACO 2013}} +
-  * {{qos_policies_and_architecture_for_cache_memory_in_cmp_platforms.pdf|R. Iyer, L. Zhao, F. Guo, R. Illikkal, S. Makineni, D. Newell, Y. Solihin, L. Hsu, S. Reinhardt, "QoS policies and architecture for cache/memory in CMP platforms," SIGMETRICS 2007}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/hetero-adaptive-source-throttling_sbacpad12.pdf|K. Chang, R. Ausavarungnirun, C. Fallin, and O. Mutlu, "HAT: Heterogeneous Adaptive Throttling for On-Chip Networks," SBAC-PAD 2012}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/onchip-network-congestion-scalability_sigcomm2012.pdf|G. Nychis, C. Fallin, T. Moscibroda, O. Mutlu, and S. Seshan, "On-Chip Networks from a Networking Perspective: Congestion and Scalability in Many-core Interconnects," SIGCOMM 2012}} +
-  * {{memory_resource_management_in_vmware_esx_server.pdf|C. A. Waldspurger, "Memory Resource Management in VMware ESX Server," OSDI 2002}} +
-  * {{lottery_scheduling_flexible_proportional-share_resource_management.pdf|C. A. Waldspurger and W. E. Weihl, "Lottery Scheduling: Flexible Proportional-Share Resource Management," OSDI 1994}} +
-  * {{stride_scheduling_deterministic_proportional-share_resource_management.pdf|C. A. Waldspurger and W. E. Weihl, "Stride Scheduling: Deterministic Proportional-Share Resource Management," Technical Memorandum MIT/LCS/TM-528, MIT Laboratory for Computer Science, June 1995}} +
-  * {{lottery_and_stride_scheduling_flexible_proportional-share_resource_management.pdf|C. A. Waldspurger, "Lottery and Stride Scheduling: Flexible Proportional-Share Resource Management," Ph.D. dissertation, Massachusetts Institute of Technology, September 1995}} +
-===== Lecture 15 (15.11 Wed.) ===== +
-=== Required (lecture 15): === +
-  * {{p381-qureshi.pdf|Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely, and Joel Emer. Adaptive insertion policies for high performance caching. ISCA '07}} +
-=== Described in detail during lecture 15: === +
-  * {{hpca02.pdf|G. Edward Suh, Srinivas Devadas, and Larry Rudolph. A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning. HPCA '02}} +
-  * {{utility-based-partitioning.pdf|Moinuddin K. Qureshi and Yale N. Patt. 2006. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches. MICRO 2006}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/qureshi_isca06.pdf|Moinuddin K. Qureshi, Daniel N. Lynch, Onur Mutlu, and Yale N. Patt. A Case for MLP-Aware Cache Replacement. ISCA '06}} +
-  * {{optimal_partitioning.pdf|Harold S. Stone, John Turek, and Joel L. Wolf. 1992. Optimal Partitioning of Cache Memory. IEEE TC 1992}} +
-  * {{faircache.pdf|Seongbeom Kim, Dhruba Chandra, and Yan Solihin. Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture. PACT '04}} +
-  * {{gaininginsights.pdf|Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang and P. Sadayappan, "Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems," HPCA'08}} +
-  * {{managingdistributed.pdf|Sangyeun Cho and Lei Jin. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation. MICRO'06}} +
-  * {{cooperativecaching.pdf|Jichuan Chang and Gurindar S. Sohi. 2006. Cooperative Caching for Chip Multiprocessors. ISCA '06}} +
-  * {{adaptive.pdf|M. K. Qureshi, "Adaptive Spill-Receive for robust high-performance caching in CMPs," HPCA 2009}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/eaf-cache_pact12.pdf|V. Seshadri, O. Mutlu, M. A. Kozuch and T. C. Mowry, "The evicted-address filter: A unified mechanism to address both cache pollution and thrashing," PACT'12}} +
-=== Suggested (lecture 15): === +
-  * {{reactivenuca.pdf|Nikos Hardavellas, Michael Ferdman, Babak Falsafi, and Anastasia Ailamaki. Reactive NUCA: near-optimal block placement and replication in distributed caches. ISCA '09}} +
-  * {{cqos.pdf|Ravi Iyer. CQoS: a framework for enabling QoS in shared caches of CMP platforms. ICS '04}} +
-  * {{improvingperformanceisolation.pdf|Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler. PACT '07}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/acs_asplos09.pdf|M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi, and Yale N. Patt. Accelerating critical section execution with asymmetric multi-core architectures. ASPLOS'09}} +
-  * {{p211-kim.pdf|Changkyu Kim, Doug Burger, and Stephen W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS '02}} +
-  * {{p93-tyson.pdf|Gary Tyson, Matthew Farrens, John Matthews, and Andrew R. Pleszkun. A modified approach to data cache management. MICRO'95}} +
-  * {{deadblock.pdf|An-Chow Lai, C. Fide and B. Falsafi, "Dead-block prediction & dead-block correlating prefetchers," ISCA'01}} +
-  * {{p422-bloom.pdf|Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. CACM 1970}} +
-  * {{p315-johnson.pdf|Teresa L. Johnson and Wen-mei W. Hwu. Run-time adaptive cache hierarchy management via reference analysis. ISCA '97}} +
-  * {{piquet.pdf|Thomas Piquet, Olivier Rochecouste, and André Seznec. Exploiting Single-Usage for Effective Memory Management. ACSAC '07}} +
-  * {{p430-wu.pdf|Carole-Jean Wu, Aamer Jaleel, Will Hasenplaugh, Margaret Martonosi, Simon C. Steely, Jr., and Joel Emer. SHiP: signature-based hit predictor for high performance caching. MICRO'11}} +
-  * {{p126-collins.pdf|Jamison D. Collins and Dean M. Tullsen. Hardware identification of cache conflict misses. MICRO'99}} +
-  * {{p208-jaleel.pdf|Aamer Jaleel, William Hasenplaugh, Moinuddin Qureshi, Julien Sebot, Simon Steely, Jr., and Joel Emer. Adaptive insertion policies for managing shared caches. PACT '08}} +
-  * {{p60-jaleel.pdf|Aamer Jaleel, Kevin B. Theobald, Simon C. Steely, Jr., and Joel Emer. High performance cache replacement using re-reference interval prediction (RRIP). ISCA '10}} +
-  * {{p46-dusser.pdf|Julien Dusser, Thomas Piquet, and André Seznec. Zero-content augmented caches. ICS '09}} +
-  * {{zero.pdf|M. M. Islam and P. Stenstrom, "Zero-Value Caches: Cancelling Loads that Return Zero," PACT'09}} +
-  * {{p258-yang.pdf|Jun Yang, Youtao Zhang, and Rajiv Gupta. Frequent value compression in data caches. MICRO'00}} +
-  * {{21430212.pdf|Alaa R. Alameldeen and David A. Wood. Adaptive Cache Compression for High-Performance Processors. ISCA '04}} +
-  * {{c-pack.pdf|X. Chen, L. Yang, R. P. Dick, L. Shang and H. Lekatsas, "C-Pack: A High-Performance Microprocessor Cache Compression Algorithm," T-VLSI'09}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/linearly-compressed-pages_micro13.pdf|Gennady Pekhimenko, Vivek Seshadri, Yoongu Kim, Hongyi Xin, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry. Linearly compressed pages: a low-complexity, low-latency main memory compression framework. MICRO'13}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/compression-aware-cache-management_hpca15.pdf|Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, and Todd C. Mowry, "Exploiting Compressed Block Size as an Indicator of Future Reuse". HPCA'15}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/toggle-aware-compression-for-GPUs_hpca16.pdf|Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, and Stephen W. Keckler, "A Case for Toggle-Aware Compression for GPU Systems". HPCA'16}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/caba-gpu-assist-warps_isca15.pdf|Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita Das, Mahmut Kandemir, Todd C. Mowry, and Onur Mutlu, "A Case for Core-Assisted Bottleneck Acceleration in GPUs: Enabling Flexible Data Compression with Assist Warps". ISCA'15}} +
- +
-===== Lecture 16 (16.11 Thu.) ===== +
-=== Described in detail during lecture 16: === +
-  * {{Amdahl.pdf|Gene M. Amdahl, "Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities". AFIPS 1967}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/acs_asplos09.pdf|M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi, and Yale N. Patt. "Accelerating critical section execution with asymmetric multi-core architectures". ASPLOS'09}} +
-  * {{bottleneck-identification-and-scheduling_asplos12.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Bottleneck Identification and Scheduling in Multithreaded Applications". ASPLOS'12}} +
-  * {{p441-suleman.pdf|M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib, Yale N. Patt, "Data Marshaling for Multi-Core Architectures". ISCA'10, IEEE Micro Top Picks 2011}} +
-  * {{dk52.pdf|Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, and Dean M. Tullsen, "Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction". MICRO 2003}} +
-=== Suggested (lecture 16): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/acs_asplos09.pdf|M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi, and Yale N. Patt. "Accelerating critical section execution with asymmetric multi-core architectures". ASPLOS'09}} +
-  * {{bottleneck-identification-and-scheduling_asplos12.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Bottleneck Identification and Scheduling in Multithreaded Applications". ASPLOS'12}} +
-  * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs". ISCA'13}} +
-  * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput". ICCD 2004}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/timber-fine-grained-dram-cache_ieee-cal12.pdf|J. Meza, J. Chang, H. Yoon, O. Mutlu, and P. Ranganathan, "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management" CAL 2012}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/rowbuffer-aware-caching_iccd12.pdf|H. Yoon, J. Meza, R. Ausavarungnirun, R. Harding, and O. Mutlu, "Row Buffer Locality Aware Caching Policies for Hybrid Memories" ICCD 2012}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/tcm_micro10.pdf|Y. Kim, M. Papamichael, O. Mutlu, M. Harchol-Balter, “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO 2010}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/raidr-dram-refresh_isca12.pdf|J. Liu, B. Jaiyen, R. Veras, O. Mutlu, "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/tldram_hpca13.pdf|D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, O. Mutlu, "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," HPCA 2013}} +
-  * {{2007.TileInterconnection.IEEEMicro.pdf|David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal, "On-Chip Interconnection Architecture of the Tile Processor". IEEE Micro 2007}} +
-  * {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, "POWER4 System Microarchitecture". IBM J R&D 2002}} +
-  * {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor". IEEE Micro 2004}} +
-  * {{ :niagara_a_32-way_multithreaded_sparc_processor.pdf | P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-Way Multithreaded Sparc Processor", IEEE Micro 2005}} +
- +
-===== Lecture 17 (22.11 Wed.) ===== +
-=== Required (lecture 17): === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_hpca03.pdf|O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, "Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-order Processors," HPCA 2003}} +
-=== Described in detail during lecture 17: === +
-  * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_ieee_micro03.pdf| O. Mutlu, J. Stark, C. Wilkerson, Y.N. Patt, “Runahead Execution: An Effective Alternative to Large Instruction Windows,” IEEE Micro Top Picks 2003}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_ieee_micro06.pdf| O. Mutlu, H. Kim, and Y.N. Patt, “Efficient Runahead Execution: Power-Efficient Memory Latency Tolerance,” ISCA 2005, IEEE Micro Top Picks 2006}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_ieee_tc06.pdf| O. Mutlu, H. Kim, and Y.N. Patt, “Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses,” IEEE TC 2006}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/armstrong_micro04.pdf| D.N. Armstrong, H. Kim, O. Mutlu, and Y.N. Patt, “Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery,” MICRO 2004}} +
-=== Suggested (lecture 17): === +
-  * {{01431565.pdf| M. Annavaram, E. Grochowski, J. Shen, “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005}} +
-  * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput". ICCD 2004}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_isca05.pdf| O. Mutlu, H. Kim, and Y.N. Patt, “Techniques for Efficient Processing in Runahead Execution Engines,” ISCA 2005}} +
-  * {{p332-chen.pdf| W.-k. Chen, S. Bhansali, T.M. Chilimbi, X. Gao, and W. Chuang, “Profile-guided proactive garbage collection for locality optimization,” PLDI 2006}} +
-  * {{p226-lipasti.pdf| M.K. Lipasti and J.P. Shen, “Exceeding the dataflow limit via value prediction,” MICRO 1996}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_wmpi04.pdf| Onur Mutlu, Hyesoon Kim, David N. Armstrong, and Yale N. Patt, “Understanding The Effects of Wrong-Path Memory References on Processor Performance,” WMPI 2004}} +
-  * {{p267-adl-tabatabai.pdf| A.-R. Adl-Tabatabai, R.L. Hudson, M.J. Serrano, S. Subramoney, “Prefetch injection based on hardware monitoring and object metadata,” PLDI 2004}} +
-=== Recommended lectures (lecture 17): === +
-  * {{https://youtu.be/R5G05HstI3A?list=PL5Q2soXY2Zi-IXWTT7xoNYpst5-zdZQ6y| Onur Mutlu, “Lecture 18: Out-of-Order Execution,” Design of Digital Circuits (Spring 2017)}} +
-  * {{https://youtu.be/XE9ogMPEMLw| Onur Mutlu, “Lecture 19: Approaches to Concurrency (until ~35:00),” Design of Digital Circuits (Spring 2017)}} +
- +
-===== Lecture 18 (23.11 Thu.) ===== +
-=== Required (lecture 18): === +
-  * {{1-jouppi.pdf|N. P. Jouppi, "Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers," ISCA '90}} +
-  * {{18-2-joseph-prefetching.pdf| D. Joseph and D. Grunwald, "Prefetching using Markov predictors," ISCA '97}} +
-=== Described in detail during lecture 18: === +
-  * {{18-3-mowry.pdf|T. C. Mowry, M.S. Lam, and A. Gupta, "Design and evaluation of a compiler algorithm for prefetching," ASPLOS 1992}} +
-  * {{18-4-baer.pdf|J. L. Baer and T. F. Chen, "Effective on-chip preloading scheme to reduce data access penalty," SC 1991}} +
-  * {{18-5-Srinath.pdf|S. Srinath, O. Mutlu, H. Kim, and Y. N. Patt, "Feedback directed prefetching: Improving the performance and bandwidth-efficiency of hardware prefetchers," HPCA 2007}} +
-  * {{18-6-cooksey.pdf|R. Cooksey, S. Jourdan, D. Grunwald, "A stateless, content-directed data prefetching mechanism," ASPLOS 2002}} +
-  * {{18-7-ebrahimi.pdf|E. Ebrahimi, O. Mutlu, and Y. N. Patt, "Techniques for bandwidth-efficient prefetching of linked data structures in hybrid prefetching systems," HPCA 2009}} +
-  * {{18-8-luk.pdf|C.-K. Luk, "Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors," ISCA 2001}} +
-  * {{18-9-zilles.pdf| C. Zilles and G. Sohi, “Understanding the backward slices of performance degrading instructions,” ISCA 2000}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/mutlu_hpca03.pdf|O. Mutlu, J. Stark, C. Wilkerson, and Y.N. Patt, "Runahead execution: An alternative to very large instruction windows for out-of-order processors," HPCA 2003}} +
-  * {{18-10-ebrahimi.pdf|E. Ebrahimi, O. Mutlu, C. J. Lee, and Y. N. Patt, "Coordinated control of multiple prefetchers in multi-core systems," MICRO 2009}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/prefetch-dram_micro08.pdf|C. J. Lee, O. Mutlu, V. Narasiman, and Y. N. Patt, "Prefetch-aware DRAM controllers," MICRO 2008}} +
-=== Suggested (lecture 18): === +
-  * {{18-suggested-ibrahim.pdf|K. Z. Ibrahim, G. T. Byrd, and E. Rotenberg, "Slipstream execution mode for CMP-based multiprocessors," HPCA 2003}} +
-  * {{18-suggested-purser.pdf|Z. Purser, K. Sundaramoorthy, and E. Rotenberg, "A study of slipstream processors," MICRO 2000}} +
-  * {{19-suggested-dubois.pdf| M. Dubois and Y. Song, “Assisted execution,” USC Tech Report 1998}} +
-  * {{18-suggested-chappell.pdf| R. S. Chappell, J. Stark, S. P. Kim, S. K. Reinhardt, Y. N. Patt, “Simultaneous subordinate microthreading (SSMT),” ISCA 1999}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/dram-blp_micro09.pdf|C. J. Lee, V. Narasiman, O. Mutlu, and Y. N. Patt, "Improving memory bank-level parallelism in the presence of prefetching," MICRO 2009}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/prefetchaware-shared-resources_isca11.pdf|E. Ebrahimi, C. J. Lee, O. Mutlu, and Y. N. Patt, "Prefetch-aware shared resource management for multi-core systems," ISCA 2011}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/informed-caching-for-prefetching_taco15.pdf|V. Seshadri, S. Yedkar, H. Xin, O. Mutlu, P. B. Gibbons, M. A. Kozuch, and T. C. Mowry, "Mitigating prefetcher-caused pollution using informed caching policies for prefetched blocks," TACO 2015}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/orchestrated-gpgpu-scheduling-prefetching_isca13.pdf|A. Jog, O. Kayiran, A. K. Mishra, M. T. Kandemir, O. Mutlu, R. Iyer, and C. R. Das, "Orchestrated scheduling and prefetching for GPGPUs," ISCA 2013}} +
- +
-===== Lecture 19 (29.11 Wed.) ===== +
-=== Required (lecture 19): === +
-  * {{amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}} +
-  * {{lamport.pdf|L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 1979}} +
-  * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}} +
-=== Suggested (lecture 19): === +
-  * {{flynn.pdf|M. J. Flynn, "Very High-Speed Computing Systems," Proc. of IEEE, 1966}} +
-  * {{multiprocessors-multicomputers.pdf|M. D. Hill, N. P. Jouppi, G. S. Sohi, "Multiprocessors and Multicomputers," pp. 551-560 in Readings in Computer Architecture.}} +
-  * {{memory_consistency_and_event_ordering_in_scalable_shared-memory_multiprocessors.pdf|K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," ISCA 1990}} +
-  * {{two_techniques_to_enhance_the_performanc_of_memory_consistency_models.pdf|K. Gharachorloo, A. Gupta, and J. Hennessy, "Two Techniques to Enhance the Performance of Memory Consistency Models," ICPP 1991}} +
-  * {{bulksc_bulk_enforcement_of_sequential_consistency.pdf|L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas, "BulkSC: bulk enforcement of sequential consistency," ISCA 2007}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/ThyNVM-transparent-crash-consistency-for-persistent-memory_micro15.pdf|J. Ren, J. Zhao, S. Khan, J., Y. Wu, and O. Mutlu, "ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems," MICRO 2015}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/NVMove-byte-based-persistence-tool_inflow16.pdf|H. Chauhan, I. Calciu, V. Chidambaram, E. Schkufza, O. Mutlu, and P. Subrahmanyam, "NVMove: Helping Programmers Move to Byte-Based Persistence," INFLOW 2016}} +
-  * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}} +
-  * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," ISCA 1983}} +
-  * {{the_sgi_origin_a_ccnuma_highly_scalable_server.pdf|J. Laudon and D. Lenoski, "The SGI Origin: A ccNUMA Highly Scalable Server," ISCA 1997}} +
-  * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}} +
-  * {{on_the_inclusion_properties_for_multi-level_cache_hierarchies.pdf|J. Baer and W. Wang, "On the inclusion properties for multi-level cache hierarchies," ISCA 1988}} +
-  * {{designofacomputer_cdc6600.pdf|J. E. Thornton, "CDC 6600: Design of a Computer," 1970}} +
-  * {{a_pipelined_shared_resource_mimd_computer.pdf | B. J. Smith, "A Pipelined, Shared Resource MIMD Computer", ICPP 1978}} +
-  * {{a_new_method_of_solving_numerical_equations_of_all_orders_by_continuous_.pdf|W. G. Horner, "A new method of solving numerical equations of all orders, by continuous approximation," Philosophical Transactions of the Royal Society, 1819}} +
-  * {{https://people.inf.ethz.ch/omutlu/pub/acs_asplos09.pdf|M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt, "Accelerating critical section execution with asymmetric multi-core architectures," ASPLOS'09}} +
-  * {{co-operating_sequential_processes.pdf|E. W. Dijkstra, "Cooperating Sequential Processes," 1965}} +
-  * {{culler_parcomparch_5.1.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.1 (pp 269–283)}} +
-  * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.3 (pp 291-305)}} +
-  * {{ph_computerorganizationanddesignthehardwaresoftwareinterface5th_5.10.pdf|P&H, Computer Organization and Design, Chapter 5.10 (pp 466-470)}} +
- +
-===== Lecture 20 (30.11 Thu.) ===== +
-=== Described in detail during lecture 20: === +
-  * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," ISCA 1983}} +
-  * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}} +
-  * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}} +
-  * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}}+
readings.1512133372.txt.gz · Last modified: 2019/02/12 16:34 (external edit)