This shows you the differences between two versions of the page.
Both sides previous revision Previous revision Next revision | Previous revision | ||
readings [2018/12/05 15:44] juang [Lecture 21 (05.12 Wed.)] |
readings [2019/12/12 10:02] (current) |
||
---|---|---|---|
Line 479: | Line 479: | ||
* {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs". ISCA'13}} | * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs". ISCA'13}} | ||
* {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput". ICCD 2004}} | * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput". ICCD 2004}} | ||
- | * {{amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}} | + | * {{lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}} |
* {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, "POWER4 System Microarchitecture". IBM J R&D 2002}} | * {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, "POWER4 System Microarchitecture". IBM J R&D 2002}} | ||
* {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor". IEEE Micro 2004}} | * {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor". IEEE Micro 2004}} | ||
Line 508: | Line 508: | ||
* {{GomezLuna_2012.pdf|J. Gomez-Luna, J.M. Gonzalez-Linares, J.I. Benavides, N. Guil, "Performance Models for Asynchronous Data Transfers on Consumer Graphics Processing Units," JPDC, 2012}} | * {{GomezLuna_2012.pdf|J. Gomez-Luna, J.M. Gonzalez-Linares, J.I. Benavides, N. Guil, "Performance Models for Asynchronous Data Transfers on Consumer Graphics Processing Units," JPDC, 2012}} | ||
* {{GomezLuna_2017.pdf|J. Gomez-Luna, I. E. Hajj, L. Chang, V. Garcia-Flores, S. G. de Gonzalo, T. B. Jablin, A. J. Peña, W. Hwu, "Chai: Collaborative heterogeneous applications for integrated-architectures," ISPASS 2017}} | * {{GomezLuna_2017.pdf|J. Gomez-Luna, I. E. Hajj, L. Chang, V. Garcia-Flores, S. G. de Gonzalo, T. B. Jablin, A. J. Peña, W. Hwu, "Chai: Collaborative heterogeneous applications for integrated-architectures," ISPASS 2017}} | ||
+ | |||
+ | ===== Lecture 22 (6.12 Thu.) ===== | ||
+ | === Required (lecture 22): === | ||
+ | * {{lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}} | ||
+ | * {{lamport.pdf|L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 1979}} | ||
+ | * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}} | ||
+ | === Described in detail during lecture 22: === | ||
+ | * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," ISCA 1983}} | ||
+ | * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}} | ||
+ | * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}} | ||
+ | * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}} | ||
+ | === Recommended (lecture 22): === | ||
+ | * {{flynn.pdf|M. J. Flynn, "Very High-Speed Computing Systems," Proc. of IEEE, 1966}} | ||
+ | * {{multiprocessors-multicomputers.pdf|M. D. Hill, N. P. Jouppi, and G. S. Sohi, "Multiprocessors and Multicomputers," pp. 551-560 in Readings in Computer Architecture.}} | ||
+ | * {{memory_consistency_and_event_ordering_in_scalable_shared-memory_multiprocessors.pdf|K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," ISCA 1990}} | ||
+ | * {{two_techniques_to_enhance_the_performanc_of_memory_consistency_models.pdf|K. Gharachorloo, A. Gupta, and J. Hennessy, "Two Techniques to Enhance the Performance of Memory Consistency Models," ICPP 1991}} | ||
+ | * {{bulksc_bulk_enforcement_of_sequential_consistency.pdf|L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas, "BulkSC: bulk enforcement of sequential consistency," ISCA 2007}} | ||
+ | * {{https://people.inf.ethz.ch/omutlu/pub/ThyNVM-transparent-crash-consistency-for-persistent-memory_micro15.pdf|J. Ren, J. Zhao, S. Khan, J. Choi, Y. Wu, and O. Mutlu, "ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems," MICRO 2015}} | ||
+ | * {{https://people.inf.ethz.ch/omutlu/pub/NVMove-byte-based-persistence-tool_inflow16.pdf|H. Chauhan, I. Calciu, V. Chidambaram, E. Schkufza, O. Mutlu, and P. Subrahmanyam, "NVMove: Helping Programmers Move to Byte-Based Persistence," INFLOW 2016}} | ||
+ | * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}} | ||
+ | * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," ISCA 1983}} | ||
+ | * {{the_sgi_origin_a_ccnuma_highly_scalable_server.pdf|J. Laudon and D. Lenoski, "The SGI Origin: A ccNUMA Highly Scalable Server," ISCA 1997}} | ||
+ | * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}} | ||
+ | * {{on_the_inclusion_properties_for_multi-level_cache_hierarchies.pdf|J. Baer and W. Wang, "On the inclusion properties for multi-level cache hierarchies," ISCA 1988}} | ||
+ | * {{designofacomputer_cdc6600.pdf|J. E. Thornton, "CDC 6600: Design of a Computer," 1970}} | ||
+ | * {{a_pipelined_shared_resource_mimd_computer.pdf | B. J. Smith, "A Pipelined, Shared Resource MIMD Computer", ICPP 1978}} | ||
+ | * {{a_new_method_of_solving_numerical_equations_of_all_orders_by_continuous_.pdf|W. G. Horner, "A new method of solving numerical equations of all orders, by continuous approximation," Philosophical Transactions of the Royal Society, 1819}} | ||
+ | * {{https://people.inf.ethz.ch/omutlu/pub/acs_asplos09.pdf|M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt, "Accelerating critical section execution with asymmetric multi-core architectures," ASPLOS'09}} | ||
+ | * {{co-operating_sequential_processes.pdf|E. W. Dijkstra, "Cooperating Sequential Processes," 1965}} | ||
+ | * {{culler_parcomparch_5.1.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.1 (pp 269–283)}} | ||
+ | * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.3 (pp 291-305)}} | ||
+ | * {{ph_computerorganizationanddesignthehardwaresoftwareinterface5th_5.10.pdf|P&H, Computer Organization and Design, Chapter 5.10 (pp 466-470)}} | ||
+ | |||
+ | ===== Lecture 23 (12.12 Wed.) ===== | ||
+ | === Described in detail during lecture 23: === | ||
+ | * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}} | ||
+ | |||
+ | === Suggested (lecture 23): === | ||
+ | * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, "Application-Aware Prioritization Mechanisms for On-Chip Networks", MICRO 2009}} | ||
+ | * {{ultrasparc.pdf|M. Shah, J. Barreh, J. Brooks, R. Golla, G. Grohoski, N. Gura, R. Hetherington, P. Jordan, M. Luttrell, C. Olson, B. Saha, D. Sheahan, L. Spracklen, and A. Wynn, "UltraSPARC T2: A Highly-Threaded, Power-Efficient, SPARC SOC", ASSCC 2007}} | ||
+ | * {{7d2822e9b7fcd60f147823478b59fcf7569e.pdf|J. H. Patel, "Processor-memory interconnections for multiprocessors", ISCA 1979}} | ||
+ | * {{Ultracomputer.pdf|A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer", IEEE Trans. on Comp. 1983}} | ||
+ | * {{hierarchical-rings-with-deflection_sbacpad14.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. Loh, and O. Mutlu, "Design and Evaluation of Hierarchical Rings with Deflection Routing", SBAC-PAD 2014}} | ||
+ | * {{p272-leiserson.pdf|C.E. Leiserson, Z.S. Abuhamdeh, D.C. Douglas, C.R. Feynman, M.N. Ganmukhi, J.V. Hill, D. Hillis, B.C. Kuszmaul, M.A. St. Pierre, D.S. Wells, M.C. Wong, S.-W. Yang, R. Zak, "The Network Architecture of the Connection Machine CM-5", SPAA 1992}} | ||
+ | * {{seitz_cacm_1985.pdf|C. L. Seitz, "The Cosmic Cube", CACM 1985}} | ||
+ | * {{L8-TurnModel-ISCA92.pdf|C. J. Glass and L. M. Ni, "The Turn Model for Adaptive Routing", ISCA 1992}} | ||
+ | * {{maze-routing_nocs15.pdf|M. Fattah, A. Airola, R. Ausavarungnirun, N. Mirzaei, P. Liljeberg, J. Plosila, S. Mohammadi, T. Pahikkala, O. Mutlu, and H. Tenhunen, "A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips", NOCS 2015}} | ||
+ | * {{Baran64.pdf|P. Baran, "On Distributed Communications Networks", IEEE Trans. Comm., 1964}} | ||
+ | * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "Bufferless and Minimally-Buffered Deflection Routing", Routing Algorithms in Networks-on-Chip (invited book chapter), 2014}} | ||
+ | * {{virtual+channel.pdf|W. J. Dally, "Virtual Channel Flow Control", ISCA 1990}} | ||
+ | |||
+ | |||
+ | |||
+ | ===== Lecture 24 (13.12 Thu.) ===== | ||
+ | === Described in detail during lecture 24: === | ||
+ | * {{05749724.pdf|C. Fallin, C. Craik, and O. Mutlu, "CHIPPER: A Low-Complexity Bufferless Deflection Router", HPCA 2011}} | ||
+ | * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "Bufferless and Minimally-Buffered Deflection Routing", Routing Algorithms in Networks-on-Chip (invited book chapter), 2014}} | ||
+ | * {{06209256.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect", NOCS 2012}} | ||
+ | |||
+ | === Suggested (lecture 24): === | ||
+ | * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, "Application-Aware Prioritization Mechanisms for On-Chip Networks", MICRO 2009}} | ||
+ | * {{https://people.inf.ethz.ch/omutlu/pub/hetero-adaptive-source-throttling_sbacpad12.pdf|K. Chang, R. Ausavarungnirun, C. Fallin, and O. Mutlu, "HAT: Heterogeneous Adaptive Throttling for On-Chip Networks," SBAC-PAD, 2012}} | ||
+ | * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}} | ||
+ | * {{06970669.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H. Loh, and O. Mutlu, "Design and Evaluation of Hierarchical Rings with Deflection Routing", SBAC-PAD 2014}} | ||
+ | * {{1-s2.0-s0167819116000399-main.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H. Loh, and O. Mutlu, "A Case for Hierarchical Rings with Deflection Routing: An Energy-Efficient On-Chip Communication Substrate", PARCO 2016}} | ||
+ | * {{p106-das.pdf|R. Das, O. Mutlu, T. Moscibroda, and C.R. Das, "Aergia: Exploiting Packet Latency Slack in On-Chip Networks", ISCA 2010}} | ||
+ | * {{https://people.inf.ethz.ch/omutlu/pub/pvc-qos_micro09.pdf|B. Grot, S.W. Keckler, O. Mutlu, "Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QoS Scheme for Networks-on-Chip", MICRO 2009}} | ||
+ | * {{p401-grot.pdf|B. Grot, J. Hestness, S. W. Keckler, and O. Mutlu, "Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees", ISCA 2011}} | ||
+ | * {{https://people.inf.ethz.ch/omutlu/pub/onchip-network-congestion-scalability_sigcomm2012.pdf|G. Nychis, C. Fallin, T. Moscibroda, O. Mutlu, and S. Seshan, "On-Chip Networks from a Networking Perspective: Congestion and Scalability in Many-core Interconnects," SIGCOMM, 2012}} | ||
+ | * {{http://users.ece.cmu.edu/~omutlu/pub/noc-congestion_hotnets10.pdf|G. Nychis, C. Fallin, T. Moscibroda, and O. Mutlu, "Next Generation On-chip Networks: What Kind of Congestion Control Do We Need?" HotNets 2010}} |