Differences
This shows you the differences between two versions of the page.
readings [2017/12/01 13:02] – [Lecture 19 (29.11 Wed.)] mohammad | readings [2019/02/12 16:35] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 2: | Line 2: | ||
====== Readings ====== | ====== Readings ====== | ||
- | ===== Guides on how to review papers critically ===== | ||
- | * Lecture slides: {{onur-CompArch-f17-how-to-do-the-paper-reviews.pdf | pdf}} {{onur-CompArch-f17-how-to-do-the-paper-reviews.ppt | Slides ppt}} | ||
- | * Example reviews on "Main Memory Scaling: Challenges and Solution Directions" | ||
- | * {{review-chapter.pdf | Review 1}} | ||
- | * {{review-chapter-2.pdf | Review 2}} | ||
- | * Example review on " | ||
- | * {{review-sms.pdf | Review 1}} | ||
- | ===== Lecture 1 (20.09 Wed.) ===== | + | ==== Papers for Review |
- | === Described in detail during lecture 1: === | + | {{ :paper_review_guidelines.pdf |Paper Review Guidelines}} |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{p422-bloom.pdf|B.H. Bloom, " | + | |
- | * {{https:// | + | |
- | === Suggested (lecture 1): === | + | |
- | | + | * {{p105-ahn.pdf |Ahn et al., "A Scalable Processing-in-Memory |
- | * [[http:// | + | * {{parbs_isca08-old.pdf |Mutlu |
- | * {{p128-rixner.pdf|S. Rixner, W.J. Dally, U.J. Kapasi, P. Mattson, J.D. Owens, | + | |
- | * {{US5630096.pdf|Zuravleff and Robinson. " | + | |
- | * {{https:// | + | |
- | | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{10.1007-978-3-319-40667-1_15.pdf| D. Gruss, C. Maurice, S. Mangard, " | + | |
- | * {{p1675-van-der-veen.pdf| V. van der Veen, Y. Fratantonio, | + | |
- | * {{p382-lamport.pdf| L. Lamport, R. Shostak, M. Pease, "The Byzantine Generals Problem," | + |
- | * {{https:// | + | |
- | * {{bstj29-2-147.pdf|R.W. Hamming. "Error Detecting and Error Correcting Codes" | + | |
- | ===== Lecture 2 (21.09 Thu.) ===== | ||
- | === Required (lecture 2): === | ||
- | * {{patt_ieee2001.pdf|Y.N. Patt. " | ||
- | === Required for review as part of HW1: === | + | ==== Other Referenced Readings |
- | * {{https:// | + | For many other readings covered |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | + | ||
- | === Described in detail during lecture 2: === | + | |
- | * {{gordon_moore_1965_article.pdf| G.E. Moore. " | + | |
- | * {{https:// | + | |
- | * {{Burks_vonNeumann.pdf| A.W. Burks, H.H. Goldstine, J. von Neumann, " | + |
- | * {{04_chapter_4.pdf| Y.N. Patt and S.J. Patel, " | + | |
- | * {{p126-dennis.pdf | J.B. Dennis, D. Misunas, "A preliminary architecture for a basic data-flow processor," | + | |
- | * {{Wilkes_1965.pdf| M.V. Wilkes, "Slave Memories and Dynamic Storage Allocation," | + | |
- | + | ||
- | === Suggested | + | |
- | * {{Amdahl_1964.pdf| G.M. Amdahl, G.A. Blaauw, F.P. Brooks. " | + | |
- | * {{p34-gurd-2.pdf| J.R. Gurd, C.C. Kirkham, I. Watson, " | + | |
- | * {{P& | + | |
- | * {{Hamacher_Ch8_2012.pdf| C. Hamacher, Z. Vranesic, S. Zaky, N. Manjikian, " | + | |
- | * {{liptay68.pdf| J.S. Liptay, " | + | |
- | * {{p435-fotheringham.pdf| J. Fotheringham, | + | |
- | * {{Bloom62.pdf| L. Bloom, M. Cohen, S. Porter, " | + | |
- | + | ||
- | ===== Lecture 3 (27.09 Wed.) ===== | + | |
- | === Required | + | |
- | * {{https:// | + | |
- | + | ||
- | === Described in detail during lecture 3: === | + | |
- | * {{Belady_IBM1966.pdf| L.A. Belady, “A study of replacement algorithms for a virtual-storage computer, | + |
- | * {{npjouppi_ISCA1990.pdf| N.P. Jouppi, “Improving Direct-Mapped Cache Performance by the Addition of a Small Fully-Associative Cache and Prefetch Buffers,” ISCA 1990}} | + | |
- | * {{p169-seznec.pdf| A. Seznec, “A Case for Two-Way Skewed-Associative Caches,” ISCA 1993}} | + | |
- | * {{p81-kroft.pdf| D. Kroft, “Lockup-Free Instruction Fetch/ | + | |
- | + | ||
- | === Suggested (lecture 3): === | + | |
- | * {{andrew_glew.pdf| A. Glew, “MLP Yes! ILP No!,” ASPLOS Wild and Crazy Ideas Session 1998}} | + | |
- | * {{p381-qureshi.pdf| M.K. Qureshi, A. Jaleel, Y.N. Patt, S.C. Steely, J. Emer, “Adaptive Insertion Policies for High Performance Caching”, ISCA 2007}} | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{Kharbutli_HPCA2004.pdf| M. Kharbutli, K. Irwin, Y. Solihin, J. Lee, "Using prime numbers for cache indexing to eliminate conflict misses," | + | |
- | + | ||
- | ===== Lecture 4 (28.09 Thu.) ===== | + | |
- | === Described in detail during lecture 4: === | + | |
- | * {{https:// | + | |
- | * {{isca09-disaggregate.pdf|K. Lim, J. Chang, T. Mudge, P. Ranganathan, | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | + | ||
- | === Suggested (lecture 4): === | + | |
- | * {{andrew_glew.pdf| A. Glew, “MLP Yes! ILP No!,” ASPLOS Wild and Crazy Ideas Session 1998}} | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{stupid_architects_look_to_future.pdf|R. Sites, " | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{kang-memoryforum14.pdf|U. Kang, H.-S. Yu, C. Park, H. Zheng, J. Halbert, K. Bains, S. Jang, J.S. Choi, " | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{ekman-ISCA05.pdf|M. Ekman and P. Stenstrom, "A Robust Main-Memory Compression Scheme," | + | |
- | * {{PCM_IBMJRD.pdf|S. Raoux, G. W. Burr, M. J. Breitwisch, C. T. Rentner, Y.-C. Chen, R. M. Shelby, M. Salinga, D. Krebs, S.-H. Chen, H.-L. Lung, C. H. Lam, " | + | |
- | * {{https:// | + | |
- | * {{chandra.pdf|T. Chandra, " | + | |
- | * [[https:// | + | |
- | + | ||
- | ===== Lecture 5 (04.10 Wed.) ===== | + | |
- | === Required (lecture 5): === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | === Described in detail during lecture 5: === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | === Suggested (lecture 5): === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{pseudo-randomly_interleaved_memory.pdf|B. R. Rau, " | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{quantifying_the_performance_impact_of_memory_latency_and_bandwidth_for_big_data_workloads.pdf|R. Clapp, M. Dimitrov, K. Kumar, V. Viswanathan, | + | |
- | * {{graph_processing_on_gpus_where_are_the_bottlenecks.pdf|Q. Xu, H. Jeon, and M. Annavaram, "Graph Processing on GPUs: Where are the Bottlenecks?," | + | |
- | * {{preprint-hybridbfs-fpl15.pdf|Y. Umuroglu, D. Morrison, and M. Jahre, " | + | |
- | * {{identifying_the_potential_of_near_data_computing_for_apache_spark.pdf|A. J. Awan, V. Vlassov, E. Ayguade, and M. Brorsson, " | + | |
- | * {{profiling_a_warehouse-scale_computer.pdf|S. Kanev, J. P. Darago, K. M. Hazelwood, P. Ranganathan, | + | |
- | + | ||
- | ===== Lecture 6 (05.10 Thu.) ===== | + | |
- | === Required (lecture 6): === | + | |
- | * {{https:// | + | |
- | + | ||
- | === Described in detail during lecture 6: === | + | |
- | * {{https:// | + | |
- | *{{https:// | + | |
- | *{{https:// | + | |
- | *{{https:// | + | |
- | *{{https:// | + | |
- | *{{https:// | + | |
- | *{{https:// | + | |
- | + | ||
- | === Suggested (lecture 6): === | + | |
- | *{{p163-elsayed.pdf|N. El-Sayed, I. Stefanovici, | + | |
- | *{{https:// | + | |
- | *{{lin_ISCA2007.pdf|J. Lin, H. Zheng, Z. Zhu, H. David, Z. Zhang, " | + | |
- | *{{Zhu_ITHERM2008.pdf|Q. Zhu, X. Li, Y. Wu, " | + | |
- | *{{Ware_HPCA2010.pdf|M.S. Ware, K. Rajamani, M.S. Floyd, B. Brock, J.C. Rubio, F.L. Rawson III, J.B. Carter, " | + |
- | *{{Paul_ISCA2015.pdf|I. Paul, W. Huang, M. Arora, S. Yalamanchili, | + | |
- | * {{Burks_vonNeumann.pdf| A.W. Burks, H.H. Goldstine, J. von Neumann, " | + |
- | * {{04_chapter_4.pdf| Y.N. Patt and S.J. Patel, " | + | |
- | *{{https:// | + | |
- | *{{https:// | + | |
- | *{{https:// | + | |
- | * {{profiling_a_warehouse-scale_computer.pdf|S. Kanev, J. P. Darago, K. M. Hazelwood, P. Ranganathan, | + | |
- | *{{PR_1999-66.pdf|L. Page, S. Brin, R. Motwani, T. Winograd, "The PageRank citation ranking: Bringing order to the web," Stanford Digital Library Technologies Project 1998}} | + | |
- | + | ||
- | ===== Lecture 7 (11.10 Wed.) ===== | + | |
- | === Required (lecture 7): === | + | |
- | * {{https:// | + | |
- | === Described in detail during lecture 7: === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | O' | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{scalablehigh-performancemainmemorysystemusingphase-changememorytechnology.pdf | Moinuddin K. Qureshi, Viji Srinivasan, and Jude A. Rivers " | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | === Suggested (lecture 7): === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{phase-changetechnologyandthefutureofmainmemory.pdf|B. C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, and D. Burger " | + | |
- | * {{pdram_ahybridpramanddrammainmemorysystem.pdf | G. Dhiman, R. Ayoub, T. Rosing " | + | |
- | * {{https:// | + | |
- | + | ||
- | ===== Lecture 8 (18.10 Wed.) ===== | + | |
- | === Suggested (lecture 8): === | + | |
- | * {{Flynn_1966.pdf|M.J. Flynn, “Very high-speed computing systems,” Proc. of IEEE 1966}} | + | |
- | * {{p140-fisher.pdf|J.A. Fisher, | + |
- | * {{p63-russell.pdf|R.M. Russell, "The CRAY-1 computer system,” CACM 1978}} | + | |
- | * {{mmx_technology_1996.pdf|A. Peleg and U. Weiser, "MMX technology extension to the Intel architecture, | + | |
- | + | ||
- | ===== Lecture 10 (25.10 Wed.) ===== | + | |
- | === Required (lecture 10): === | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | === Suggested (lecture 10): === | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ :jsmith.pdf | J. Smith, "A study of Branch Prediction Strategies" | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ :p4-lee.pdf | C. Lee et al., "The Bi-Mode Branch Predictor" | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ :ssmt.pdf | R. Chappell et al., " | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | and superscalar compilation" | + | |
- | + | ||
- | ===== Lecture 11 (26.10 Thu.) ===== | + | |
- | === Suggested (lecture 11): === | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ :: | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | * {{ : | + | |
- | + | ||
- | ===== Lecture 12 (01.11 Wed.) ===== | + | |
- | === Described in detail during lecture 12: === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | + | ||
- | === Suggested (lecture 12): === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | + | ||
- | ===== Lecture 13 (02.11 Thu.) ===== | + | |
- | === Described in detail during lecture 13: === | + | |
- | * {{https:// | + | |
- | in Shared Main Memory Systems," | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | === Suggested (lecture 13): === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{per-thread_cycle_accounting_in_multicore_processors.pdf|K. D. Bois, S. Eyerman, L. Eeckhout, " | + | |
- | * {{qos_policies_and_architecture_for_cache_memory_in_cmp_platforms.pdf|R. Iyer, L. Zhao, F. Guo, R. Illikkal, S. Makineni, D. Newell, Y. Solihin, L. Hsu, S. Reinhardt, "QoS policies and architecture for cache/ | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{memory_resource_management_in_vmware_esx_server.pdf|C. A. Waldspurger, | + | |
- | * {{lottery_scheduling_flexible_proportional-share_resource_management.pdf|C. A. Waldspurger and W. E. Weihl, " | + | |
- | * {{stride_scheduling_deterministic_proportional-share_resource_management.pdf|C. A. Waldspurger and W. E. Weihl, " | + | |
- | * {{lottery_and_stride_scheduling_flexible_proportional-share_resource_management.pdf|C. A. Waldspurger, | + | |
- | ===== Lecture 15 (15.11 Wed.) ===== | + | |
- | === Required (lecture 15): === | + | |
- | * {{p381-qureshi.pdf|Moinuddin K. Qureshi, Aamer Jaleel, Yale N. Patt, Simon C. Steely, and Joel Emer. Adaptive insertion policies for high performance caching. ISCA ' | + | |
- | === Described in detail during lecture 15: === | + | |
- | * {{hpca02.pdf|G. Edward Suh, Srinivas Devadas, and Larry Rudolph. A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning. HPCA ' | + | |
- | * {{utility-based-partitioning.pdf|Moinuddin K. Qureshi and Yale N. Patt. 2006. Utility-Based Cache Partitioning: | + | |
- | * {{https:// | + | |
- | * {{optimal_partitioning.pdf|Harold S. Stone, John Turek, and Joel L. Wolf. 1992. Optimal Partitioning of Cache Memory. IEEE TC 1992}} | + | |
- | * {{faircache.pdf|Seongbeom Kim, Dhruba Chandra, and Yan Solihin. Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture. PACT ' | + | |
- | * {{gaininginsights.pdf|Jiang Lin, Qingda Lu, Xiaoning Ding, Zhao Zhang, Xiaodong Zhang and P. Sadayappan, " | + | |
- | * {{managingdistributed.pdf|Sangyeun Cho and Lei Jin. Managing Distributed, | + | |
- | * {{cooperativecaching.pdf|Jichuan Chang and Gurindar S. Sohi. 2006. Cooperative Caching for Chip Multiprocessors. ISCA ' | + | |
- | * {{adaptive.pdf|M. K. Qureshi, Adaptive Spill-Receive for robust high-performance caching in CMPs, HPCA.2009}} | + | |
- | * {{https:// | + | |
- | === Suggested (lecture 15): === | + | |
- | * {{reactivenuca.pdf|Nikos Hardavellas, | + | |
- | * {{cqos.pdf|Ravi Iyer. CQoS: a framework for enabling QoS in shared caches of CMP platforms. ICS ' | + | |
- | * {{improvingperformanceisolation.pdf|Alexandra Fedorova, Margo Seltzer, and Michael D. Smith. Improving Performance Isolation on Chip Multiprocessors via an Operating System Scheduler. PACT ' | + | |
- | * {{https:// | + | |
- | * {{p211-kim.pdf|Changkyu Kim, Doug Burger, and Stephen W. Keckler. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches. ASPLOS ' | + | |
- | * {{p93-tyson.pdf|Gary Tyson, Matthew Farrens, John Matthews, and Andrew R. Pleszkun. A modified approach to data cache management. MICRO' | + | |
- | * {{deadblock.pdf|An-Chow Lai, C. Fide and B. Falsafi, " | + | |
- | * {{p422-bloom.pdf|Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. CACM 1970}} | + | |
- | * {{p315-johnson.pdf|Teresa L. Johnson and Wen-mei W. Hwu. Run-time adaptive cache hierarchy management via reference analysis. ISCA ' | + | |
- | * {{piquet.pdf|Thomas Piquet, Olivier Rochecouste, | + | |
- | * {{p430-wu.pdf|Carole-Jean Wu, Aamer Jaleel, Will Hasenplaugh, | + | |
- | * {{p126-collins.pdf|Jamison D. Collins and Dean M. Tullsen. Hardware identification of cache conflict misses. MICRO' | + | |
- | * {{p208-jaleel.pdf|Aamer Jaleel, William Hasenplaugh, | + | |
- | * {{p60-jaleel.pdf|Aamer Jaleel, Kevin B. Theobald, Simon C. Steely, Jr., and Joel Emer. High performance cache replacement using re-reference interval prediction (RRIP). ISCA ' | + | |
- | * {{p46-dusser.pdf|Julien Dusser, Thomas Piquet, and André Seznec. Zero-content augmented caches. ICS ' | + | |
- | * {{zero.pdf|M. M. Islam and P. Stenstrom, " | + | |
- | * {{p258-yang.pdf|Jun Yang, Youtao Zhang, and Rajiv Gupta. Frequent value compression in data caches. MICRO' | + | |
- | * {{21430212.pdf|Alaa R. Alameldeen and David A. Wood. Adaptive Cache Compression for High-Performance Processors. ISCA ' | + | |
- | * {{c-pack.pdf|X. Chen, L. Yang, R. P. Dick, L. Shang and H. Lekatsas, " | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | " | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | + | ||
- | ===== Lecture 16 (16.11 Thu.) ===== | + | |
- | === Described in detail during lecture 16: === | + | |
- | * {{Amdahl.pdf|Gene M. Amdahl, " | + | |
- | * {{https:// | + | |
- | * {{bottleneck-identification-and-scheduling_asplos12.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, " | + | |
- | * {{p441-suleman.pdf|M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib, Yale N. Patt, "Data Marshaling for Multi-Core Architectures" | + | |
- | * {{dk52.pdf|Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, | + | |
- | === Suggested (lecture 16): === | + | |
- | * {{https:// | + | |
- | * {{bottleneck-identification-and-scheduling_asplos12.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, " | + | |
- | * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, " | + | |
- | * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput" | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{2007.TileInterconnection.IEEEMicro.pdf|David Wentzlaff, Patrick Griffin, Henry Hoffmann, Liewei Bao, Bruce Edwards, Carl Ramey, Matthew Mattina, Chyi-Chang Miao, John F. Brown III, and Anant Agarwal, " | + | |
- | * {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, " | + | |
- | * {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor" | + | |
- | * {{ : | + | |
- | + | ||
- | ===== Lecture 17 (22.11 Wed.) ===== | + | |
- | === Required (lecture 17): === | + | |
- | * {{https:// | + | |
- | === Described in detail during lecture 17: === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | === Suggested (lecture 17): === | + |
- | * {{01431565.pdf| M. Annavaram, E. Grochowski, J. Shen, “Mitigating Amdahl’s Law Through EPI Throttling, | + | |
- | * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput" | + | |
- | * {{https:// | + | |
- | * {{p332-chen.pdf| W.-k. Chen, S. Bhansali, T.M. Chilimbi, X. Gao, and W. Chuang, “Profile-guided proactive garbage collection for locality optimization, | + | |
- | * {{p226-lipasti.pdf| M.K. Lipasti and J.P. Shen, “Exceeding the dataflow limit via value prediction, | + | |
- | * {{https:// | + | |
- | * {{p267-adl-tabatabai.pdf| A.-R. Adl-Tabatabai, | + | |
- | === Recommended lectures (lecture 17): === | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | + | ||
- | ===== Lecture 18 (23.11 Thu.) ===== | + | |
- | === Required (lecture 18): === | + | |
- | * {{1-jouppi.pdf|N. P. Jouppi, " | + | |
- | * {{18-2-joseph-prefetching.pdf| D. Joseph and D. Grunwald, " | + |
- | === Described in detail during lecture 18: === | + | |
- | * {{18-3-mowry.pdf|T. C. Mowry, M.S. Lam, and A. Gupta, " | + | |
- | * {{18-4-baer.pdf|J. L. Baer and T. F. Chen, " | + | |
- | * {{18-5-Srinath.pdf|S. Srinath, O. Mutlu, H. Kim, and Y. N. Patt, " | + | |
- | * {{18-6-cooksey.pdf|R. Cooksey, S. Jourdan, D. Grunwald, "A stateless, content-directed data prefetching mechanism," | + | |
- | * {{18-7-ebrahimi.pdf|E. Ebrahimi, O. Mutlu, and Y. N. Patt, " | + | |
- | * {{18-8-luk.pdf|C. K Luk, " | + | |
- | * {{18-9-zilles.pdf| C. Zilles and G. Sohi, “Understanding the backward slices of performance degrading instructions, | + | |
- | * {{https:// | + | |
- | * {{18-10-ebrahimi.pdf|E. Ebrahimi, O. Mutlu, C. J. Lee, and Y. N. Patt, " | + |
- | * {{https:// | + | |
- | === Suggested (lecture 18): === | + | |
- | * {{18-suggested-ibrahim.pdf|K. Z. Ibrahim, G. T. Byrd, and E. Rotenberg, " | + | |
- | * {{18-suggested-purser.pdf|Z. Purser, K. Sundaramoorthy, | + | |
- | * {{19-suggested-dubois.pdf| M. Dubois and Y. Song, “Assisted execution, | + | |
- | * {{18-suggested-chappell.pdf| R. S. Chappell, S. Stark, S. P. Kim, S. K. Reinhardt, Y. N. Patt, “Simultaneous subordinate microthreading (SSMT),” ISCA 1999}} | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | + | ||
- | ===== Lecture 19 (29.11 Wed.) ===== | + | |
- | === Required (lecture 19): === | + | |
- | * {{amdahl.pdf|G. M. Amdahl, " | + | |
- | * {{lamport.pdf|L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," | + | |
- | * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," | + | |
- | === Suggested (lecture 19): === | + | |
- | * {{flynn.pdf|M. J. Flynn, "Very High-Speed Computing Systems," | + | |
- | * {{multiprocessors-multicomputers.pdf|M. D. Hill, N. P. Jouppi, G. S. Sohi, " | + | |
- | * {{memory_consistency_and_event_ordering_in_scalable_shared-memory_multiprocessors.pdf|K. Gharachorloo, | + | |
- | Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, " | + | |
- | * {{two_techniques_to_enhance_the_performanc_of_memory_consistency_models.pdf|K. Gharachorloo, | + | |
- | * {{bulksc_bulk_enforcement_of_sequential_consistency.pdf|L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas, " | + | |
- | * {{https:// | + | |
- | * {{https:// | + | |
- | * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," | + | |
- | * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," | + | |
- | * {{the_sgi_origin_a_ccnuma_highly_scalable_server.pdf|J. Laudon and D. Lenoski, "The SGI Origin: A ccNUMA Highly Scalable Server," | + | |
- | * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," | + | |
- | * {{on_the_inclusion_properties_for_multi-level_cache_hierarchies.pdf|J. Baer and W. Wang, "On the inclusion properties for multi-level cache hierarchies," | + | |
- | * {{designofacomputer_cdc6600.pdf|J. E. Thornton, "CDC 6600: Design of a Computer, | + | |
- | * {{a_pipelined_shared_resource_mimd_computer.pdf | B. J. Smith, "A Pipelined, Shared Resource MIMD Computer", | + | |
- | * {{a_new_method_of_solving_numerical_equations_of_all_orders_by_continuous_.pdf|W. G. Horner, "A new method of solving numerical equations of all orders, by continuous approximation," | + | |
- | * {{https:// | + | |
- | * {{co-operating_sequential_processes.pdf|E. W. Dijkstra, " | + | |
- | * {{culler_parcomparch_5.1.pdf|Culler and Singh, Parallel Computer Architecture, | + | |
- | * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, | + | |
- | * {{ph_computerorganizationanddesignthehardwaresoftwareinterface5th_5.10.pdf|P& | + | |
- | + | ||
- | ===== Lecture 20 (30.11 Thu.) ===== | + | |
- | === Described in detail during lecture 20: === | + | |
- | * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," | + | |
- | * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," | + | |
- | * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," | + | |
- | * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," | + |
readings.1512133372.txt.gz · Last modified: 2019/02/12 16:34 (external edit)