readings
readings [2018/11/28 13:41] – [Lecture 18b (22.11 Thu.)] yaglikca | readings [2019/12/12 09:02] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 441: | Line 441: | ||
===== Lecture 19a (28.11 Thu.) ===== | ===== Lecture 19a (28.11 Thu.) ===== | ||
+ | === Described in detail during lecture 19a: === | ||
+ | * {{adaptive.pdf|M. K. Qureshi, Adaptive Spill-Receive for robust high-performance caching in CMPs, HPCA.2009}} | ||
+ | * {{p211-kim.pdf|C. Kim, D. Burger, and S. W. Keckler. "An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches," | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{p208-jaleel.pdf|A. Jaleel, W. Hasenplaugh, | ||
+ | * {{https:// | ||
=== Recommended (lecture 19a): === | === Recommended (lecture 19a): === | ||
+ | * {{cooperativecaching.pdf|J. Chang and G. S. Sohi. 2006, " | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{zero.pdf|M. M. Islam and P. Stenstrom, " | ||
+ | * {{p258-yang.pdf|J. Yang, Y. Zhang, and R. Gupta. Frequent value compression in data caches. MICRO' | ||
+ | * {{http:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{p93-tyson.pdf|G. Tyson, M. Farrens, J. Matthews, and A. R. Pleszkun, "A modified approach to data cache management," | ||
+ | * {{deadblock.pdf|A.C. Lai, C. Fide and B. Falsafi, " | ||
+ | * {{p422-bloom.pdf|B.H. Bloom, " | ||
+ | * {{https:// | ||
+ | |||
+ | |||
+ | ===== Lecture 19b (28.11 Thu.) ===== | ||
+ | === Recommended (lecture 19b): === | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{2007.TileInterconnection.IEEEMicro.pdf|D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C. C. Miao, J. F. Brown III, and A. Agarwal, " | ||
+ | |||
+ | |||
+ | ===== Lecture 20 (29.11 Thu.) ===== | ||
+ | === Recommended (lecture 20): === | ||
+ | * {{https:// | ||
+ | * {{bottleneck-identification-and-scheduling_asplos12.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, " | ||
+ | * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, " | ||
+ | * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput" | ||
+ | * {{lecture1-amdahl.pdf|G. M. Amdahl, " | ||
+ | * {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, " | ||
+ | * {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor" | ||
+ | * {{ : | ||
+ | * {{p441-suleman.pdf|M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib, Yale N. Patt, "Data Marshaling for Multi-Core Architectures" | ||
+ | * {{dk52.pdf|Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, | ||
+ | * {{01431565.pdf| M. Annavaram, E. Grochowski, J. Shen, “Mitigating Amdahl’s Law Through EPI Throttling, | ||
+ | * {{http:// | ||
+ | * {{http:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{http:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | |||
+ | ===== Lecture 21 (05.12 Wed.) ===== | ||
+ | === Suggested (lecture 21): === | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{p140-fisher.pdf|J.A. Fisher, “Very Long Instruction Word architectures and the ELI-512,” ISCA 1983}} | ||
+ | * {{Sung_2012.pdf|I.J. Sung, G.D. Liu, W.M. Hwu, "DL: A Data Layout Transformation System for Heterogeneous Computing," | ||
+ | * {{pseudo-randomly_interleaved_memory.pdf|B. R. Rau, " | ||
+ | * {{Braak_2016.pdf|G.J.v.d. Braak, J. Gomez-Luna, | ||
+ | * {{GomezLuna_2013.pdf|J. Gomez-Luna, | ||
+ | * {{GomezLuna_2012.pdf|J. Gomez-Luna, | ||
+ | * {{GomezLuna_2017.pdf|J. Gomez-Luna, I. E. Hajj, L. Chang, V. Garcia-Flores, | ||
+ | |||
+ | ===== Lecture 22 (06.12 Thu.) ===== | ||
+ | === Required (lecture 22): === | ||
+ | * {{lecture1-amdahl.pdf|G. M. Amdahl, " | ||
+ | * {{lamport.pdf|L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," | ||
+ | * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," | ||
+ | === Described in detail during lecture 22: === | ||
+ | * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," | ||
+ | * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," | ||
+ | * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," | ||
+ | * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," | ||
+ | === Recommended (lecture 22): === | ||
+ | * {{flynn.pdf|M. J. Flynn, "Very High-Speed Computing Systems," | ||
+ | * {{multiprocessors-multicomputers.pdf|M. D. Hill, N. P. Jouppi, G. S. Sohi, " | ||
+ | * {{memory_consistency_and_event_ordering_in_scalable_shared-memory_multiprocessors.pdf|K. Gharachorloo, | ||
+ | Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, " | ||
+ | * {{two_techniques_to_enhance_the_performanc_of_memory_consistency_models.pdf|K. Gharachorloo, | ||
+ | * {{bulksc_bulk_enforcement_of_sequential_consistency.pdf|L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas, " | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," | ||
+ | * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," | ||
+ | * {{the_sgi_origin_a_ccnuma_highly_scalable_server.pdf|J. Laudon and D. Lenoski, "The SGI Origin: A ccNUMA Highly Scalable Server," | ||
+ | * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," | ||
+ | * {{on_the_inclusion_properties_for_multi-level_cache_hierarchies.pdf|J. Baer and W. Wang, "On the inclusion properties for multi-level cache hierarchies," | ||
+ | * {{designofacomputer_cdc6600.pdf|J. E. Thornton, "CDC 6600: Design of a Computer, | ||
+ | * {{a_pipelined_shared_resource_mimd_computer.pdf|B. J. Smith, "A Pipelined, Shared Resource MIMD Computer", | ||
+ | * {{a_new_method_of_solving_numerical_equations_of_all_orders_by_continuous_.pdf|W. G. Horner, "A new method of solving numerical equations of all orders, by continuous approximation," | ||
+ | * {{https:// | ||
+ | * {{co-operating_sequential_processes.pdf|E. W. Dijkstra, " | ||
+ | * {{culler_parcomparch_5.1.pdf|Culler and Singh, Parallel Computer Architecture, | ||
+ | * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, | ||
+ | * {{ph_computerorganizationanddesignthehardwaresoftwareinterface5th_5.10.pdf|P& | ||
+ | |||
+ | ===== Lecture 23 (12.12 Wed.) ===== | ||
+ | === Described in detail during lecture 23: === | ||
+ | * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", | ||
+ | |||
+ | === Suggested (lecture 23): === | ||
+ | * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, " | ||
+ | * {{ultrasparc.pdf|M. Shah, J. Barreh, J. Brooks, R. Golla, G. Grohoski, N. Gura, R. Hetherington, | ||
+ | * {{7d2822e9b7fcd60f147823478b59fcf7569e.pdf|J. H. Patel, " | ||
+ | * {{Ultracomputer.pdf|A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer", | ||
+ | * {{hierarchical-rings-with-deflection_sbacpad14.pdf|R. Ausavarungnirun, | ||
+ | * {{p272-leiserson.pdf|C.E. Leiserson, Z.S. Abuhamdeh, D.C. Douglas, C.R. Feynman, M.N. Ganmukhi, J.V. Hill, D. Hillis, B.C. Kuszmaul, M.A. St. Pierre, D.S. Wells, M.C. Wong, S.-W. Yang, R. Zak, "The Network Architecture of the Connection Machine CM-5", SPAA 1992}} | ||
+ | * {{seitz_cacm_1985.pdf|C. L. Seitz, "The Cosmic Cube", CACM 1985}} | ||
+ | * {{L8-TurnModel-ISCA92.pdf|C. J. Glass and L. M. Ni, "The Turn Model for Adaptive Routing", | ||
+ | * {{maze-routing_nocs15.pdf|M. Fattah, A. Airola, R. Ausavarungnirun, | ||
+ | * {{Baran64.pdf|P. Baran, "On Distributed Communications Networks", | ||
+ | * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, | ||
+ | * {{virtual+channel.pdf|W. J. Dally, " | ||
+ | |||
+ | |||
+ | |||
+ | ===== Lecture 24 (13.12 Thu.) ===== | ||
+ | === Described in detail during lecture 24: === | ||
+ | * {{05749724.pdf|C. Fallin, C. Craik, and O. Mutlu, " | ||
+ | * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, | ||
+ | * {{06209256.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, | ||
+ | |||
+ | === Suggested (lecture 24): === | ||
+ | * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, " | ||
+ | * {{https:// | ||
+ | SBAC-PAD, 2012}} | ||
+ | * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", | ||
+ | * {{06970669.pdf|R. Ausavarungnirun, | ||
+ | * {{1-s2.0-s0167819116000399-main.pdf|R. Ausavarungnirun, | ||
+ | * {{p106-das.pdf|R. Das, O. Mutlu, T. Moscibroda, and C.R. Das, " | ||
+ | * {{https:// | ||
+ | * {{p401-grot.pdf|B. Grot, J. Hestness, S. W. Keckler, and O. Mutlu, " | ||
+ | * {{https:// | ||
+ | * {{http:// |
readings.1543412495.txt.gz · Last modified: 2019/02/12 16:33 (external edit)