readings

Differences

This shows you the differences between two versions of the page.

readings [2018/12/06 08:26] – [Lecture 21 (05.12 Wed.)] alserm
readings [2019/12/12 09:02] (current) – external edit 127.0.0.1
Line 479:
   * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs". ISCA'13}}
   * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput". ICCD 2004}}
-  * {{amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}
+  * {{lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}
   * {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, "POWER4 System Microarchitecture". IBM J R&D 2002}}
   * {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor". IEEE Micro 2004}}
Line 511:
 ===== Lecture 22 (6.12 Thu.) =====
 === Required (lecture 22): ===
-  * {{amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}
+  * {{lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}
   * {{lamport.pdf|L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 1979}}
   * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}}
Line 519:
   * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}}
   * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}}
-=== Suggested (lecture 22): ===
+=== Recommended (lecture 22): ===
   * {{flynn.pdf|M. J. Flynn, "Very High-Speed Computing Systems," Proc. of IEEE, 1966}}
   * {{multiprocessors-multicomputers.pdf|M. D. Hill, N. P. Jouppi, G. S. Sohi, "Multiprocessors and Multicomputers," pp. 551-560 in Readings in Computer Architecture.}}
Line 541:
   * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.3 (pp 291-305)}}
   * {{ph_computerorganizationanddesignthehardwaresoftwareinterface5th_5.10.pdf|P&H, Computer Organization and Design, Chapter 5.10 (pp 466-470)}}
 +
 +===== Lecture 23 (12.12 Wed.) =====
 +=== Described in detail during lecture 23: ===
 +  * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}}
 +
 +=== Suggested (lecture 23): ===
 +  * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, "Application-Aware Prioritization Mechanisms for On-Chip Networks", MICRO 2009}}
 +  * {{ultrasparc.pdf|M. Shah, J. Barreh, J. Brooks, R. Golla, G. Grohoski, N. Gura, R. Hetherington, P. Jordan, M. Luttrell, C. Olson, B. Saha, D. Sheahan, L. Spracklen, and A. Wynn, "UltraSPARC T2: A Highly-Threaded, Power-Efficient, SPARC SOC", ASSCC 2007}}
 +  * {{7d2822e9b7fcd60f147823478b59fcf7569e.pdf|J. H. Patel, "Processor-memory interconnections for multiprocessors", ISCA 1979}}
 +  * {{Ultracomputer.pdf|A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer", IEEE Trans. on Comp. 1983}}
 +  * {{hierarchical-rings-with-deflection_sbacpad14.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H. Loh, and O. Mutlu, "Design and Evaluation of Hierarchical Rings with Deflection Routing", SBAC-PAD 2014}}
 +  * {{p272-leiserson.pdf|C.E. Leiserson, Z.S. Abuhamdeh, D.C. Douglas, C.R. Feynman, M.N. Ganmukhi, J.V. Hill, D. Hillis, B.C. Kuszmaul, M.A. St. Pierre, D.S. Wells, M.C. Wong, S.-W. Yang, R. Zak, "The Network Architecture of the Connection Machine CM-5", SPAA 1992}}
 +  * {{seitz_cacm_1985.pdf|C. L. Seitz, "The Cosmic Cube", CACM 1985}}
 +  * {{L8-TurnModel-ISCA92.pdf|C. J. Glass and L. M. Ni, "The Turn Model for Adaptive Routing", ISCA 1992}}
 +  * {{maze-routing_nocs15.pdf|M. Fattah, A. Airola, R. Ausavarungnirun, N. Mirzaei, P. Liljeberg, J. Plosila, S. Mohammadi, T. Pahikkala, O. Mutlu, and H. Tenhunen, "A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips", NOCS 2015}}
 +  * {{Baran64.pdf|P. Baran, "On Distributed Communications Networks", IEEE Trans. Comm., 1964}}
 +  * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "Bufferless and Minimally-Buffered Deflection Routing", Routing Algorithms in Networks-on-Chip (invited book chapter), 2014}}
 +  * {{virtual+channel.pdf|W. J. Dally, "Virtual Channel Flow Control", ISCA 1990}}
 +
 +
 +
 +===== Lecture 24 (13.12 Thu.) =====
 +=== Described in detail during lecture 24: ===
 +  * {{05749724.pdf|C. Fallin, C. Craik, and O. Mutlu, "CHIPPER: A Low-Complexity Bufferless Deflection Router", HPCA 2011}}
 +  * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "Bufferless and Minimally-Buffered Deflection Routing", Routing Algorithms in Networks-on-Chip (invited book chapter), 2014}}
 +  * {{06209256.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect", NOCS 2012}}
 +
 +=== Suggested (lecture 24): ===
 +  * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, "Application-Aware Prioritization Mechanisms for On-Chip Networks", MICRO 2009}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/hetero-adaptive-source-throttling_sbacpad12.pdf|K. Chang, R. Ausavarungnirun, C. Fallin, and O. Mutlu, "HAT: Heterogeneous Adaptive Throttling for On-Chip Networks," SBAC-PAD, 2012}}
 +  * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}}
 +  * {{06970669.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H. Loh, and O. Mutlu, "Design and Evaluation of Hierarchical Rings with Deflection Routing", SBAC-PAD 2014}}
 +  * {{1-s2.0-s0167819116000399-main.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H. Loh, and O. Mutlu, "A Case for Hierarchical Rings with Deflection Routing: An Energy-Efficient On-Chip Communication Substrate", PARCO 2016}}
 +  * {{p106-das.pdf|R. Das, O. Mutlu, T. Moscibroda, and C.R. Das, "Aergia: Exploiting Packet Latency Slack in On-Chip Networks", ISCA 2010}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/pvc-qos_micro09.pdf|B. Grot, S.W. Keckler, O. Mutlu, "Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QoS Scheme for Networks-on-Chip", MICRO 2009}}
 +  * {{p401-grot.pdf|B. Grot, J. Hestness, S. W. Keckler, and O. Mutlu, "Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees", ISCA 2011}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/onchip-network-congestion-scalability_sigcomm2012.pdf|G. Nychis, C. Fallin, T. Moscibroda, O. Mutlu, and S. Seshan, "On-Chip Networks from a Networking Perspective: Congestion and Scalability in Many-core Interconnects," SIGCOMM, 2012}}
 +  * {{http://users.ece.cmu.edu/~omutlu/pub/noc-congestion_hotnets10.pdf|G. Nychis, C. Fallin, T. Moscibroda, O. Mutlu, "Next Generation On-chip Networks: What Kind of Congestion Control Do We Need?" HotNets 2010}}
readings.1544084792.txt.gz · Last modified: 2019/02/12 16:33 (external edit)