User Tools

Site Tools


readings

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revisionPrevious revision
Next revision
Previous revision
readings [2018/12/06 08:26] – [Lecture 21 (05.12 Wed.)] alsermreadings [2019/12/12 09:02] geraldod
Line 479: Line 479:
   * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs". ISCA'13}}   * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs". ISCA'13}}
   * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput". ICCD 2004}}   * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput". ICCD 2004}}
-  * {{amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}+  * {{lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}
   * {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, "POWER4 System Microarchitecture". IBM J R&D 2002}}   * {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, "POWER4 System Microarchitecture". IBM J R&D 2002}}
   * {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor". IEEE Micro 2004}}   * {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor". IEEE Micro 2004}}
Line 511: Line 511:
 ===== Lecture 22 (6.12 Thu.) ===== ===== Lecture 22 (6.12 Thu.) =====
 === Required (lecture 22): === === Required (lecture 22): ===
-  * {{amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}+  * {{lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}
   * {{lamport.pdf|L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 1979}}   * {{lamport.pdf|L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 1979}}
   * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}}   * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}}
Line 519: Line 519:
   * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}}   * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}}
   * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}}   * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}}
-=== Suggested (lecture 22): ===+=== Recommended (lecture 22): ===
   * {{flynn.pdf|M. J. Flynn, "Very High-Speed Computing Systems," Proc. of IEEE, 1966}}   * {{flynn.pdf|M. J. Flynn, "Very High-Speed Computing Systems," Proc. of IEEE, 1966}}
   * {{multiprocessors-multicomputers.pdf|M. D. Hill, N. P. Jouppi, G. S. Sohi, "Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture.}}   * {{multiprocessors-multicomputers.pdf|M. D. Hill, N. P. Jouppi, G. S. Sohi, "Multiprocessors and Multicomputers,” pp. 551-560 in Readings in Computer Architecture.}}
Line 541: Line 541:
   * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.3 (pp 291-305)}}   * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.3 (pp 291-305)}}
   * {{ph_computerorganizationanddesignthehardwaresoftwareinterface5th_5.10.pdf|P&H, Computer Organization and Design, Chapter 5.10 (pp 466-470)}}   * {{ph_computerorganizationanddesignthehardwaresoftwareinterface5th_5.10.pdf|P&H, Computer Organization and Design, Chapter 5.10 (pp 466-470)}}
 +
 +===== Lecture 23 (12.12 Wed.) =====
 +=== Described in detail during lecture 23): ===
 +  * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}}
 +
 +=== Suggested (lecture 23): ===
 +  * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, "Application-Aware Prioritization Mechanisms for On-Chip Networks", MICRO 2009}}
 +  * {{ultrasparc.pdf|M. Shah, J. Barreh, J. Brooks, R. Golla, G. Grohoski, N. Gura, R. Hetherington, P. Jordan, M. Luttrell, C. Olson, B. Saha, D. Sheahan, L. Spracklen, and A. Wynn, "UltraSPARC T2: A Highly-Threaded, Power-Efficient, SPARC SOC", ASSCC 2007}}
 +  * {{7d2822e9b7fcd60f147823478b59fcf7569e.pdf|J. H. Patel, "Processor-memory interconnections for multiprocessors", ISCA 1979}}
 +  * {{Ultracomputer.pdf|A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer", IEEE Trans. on Comp. 1983}}
 +  * {{hierarchical-rings-with-deflection_sbacpad14.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. Loh, and O. Mutlu, "Design and Evaluation of Hierarchical Rings with Deflection Routing", SBAC-PAD 2014}}
 +  * {{p272-leiserson.pdf|C.E. Leiserson, Z.S. Abuhamdeh, D.C. Douglas, C.R. Feynman, M.N. Ganmukhi, J.V. Hill, D. Hillis, B.C. Kuszmaul, M.A. St. Pierre, D.S. Wells, M.C. Wong, S.-W. Yang, R. Zak, "The Network Architecture of the Connection Machine CM-5", SPAA 1992}}
 +  * {{seitz_cacm_1985.pdf|C. L. Seitz, "The Cosmic Cube", CACM 1985}}
 +  * {{L8-TurnModel-ISCA92.pdf|C. J. Glass and L. M. Ni, "The Turn Model for Adaptive Routing", ISCA 1992}}
 +  * {{maze-routing_nocs15.pdf|M. Fattah, A. Airola, R. Ausavarungnirun, N. Mirzaei, P. Liljeberg, J. Plosila, S. Mohammadi, T. Pahikkala, O. Mutlu, and H. Tenhunen, "A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips", NOCS 2015}}
 +  * {{Baran64.pdf|P. Baran, "On Distributed Communications Networks", IEEE Trans. Comm., 1964}}
 +  * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "Bufferless and Minimally-Buffered Deflection Routing", Routing Algorithms in Networks-on-Chip (invited) 2014}}
 +  * {{virtual+channel.pdf|W. J. Dally, "Virtual Channel Flow Control", ISCA 1990}}
 +
 +
 +
 +===== Lecture 24 (13.12 Thu.) =====
 +=== Described in detail during lecture 24: ===
 +  * {{05749724.pdf|C. Fallin, C. Craik, and O. Mutlu, "CHIPPER: A Low-Complexity Bufferless Deflection Router", HPCA 2011}}
 +  * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "Bufferless and Minimally-Buffered Deflection Routing", Routing Algorithms in Networks-on-Chip (invited book chapter), 2014}}
 +  * {{06209256.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect", NOCS 2012}}
 +
 +=== Suggested (lecture 24): ===
 +  * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, "Application-Aware Prioritization Mechanisms for On-Chip Networks", MICRO 2009}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/hetero-adaptive-source-throttling_sbacpad12.pdf|K. Chang, R. Ausavarungnirun, C. Fallin, and O. Mutlu, "HAT: Heterogeneous Adaptive Throttling for On-Chip Networks,"
 +SBAC-PAD, 2012}}
 +  * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}}
 +  * {{06970669.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H. Loh, and O. Mutlu, "Design and Evaluation of Hierarchical Rings with Deflection Routing", SBAC-PAD 2014}}
 +  * {{1-s2.0-s0167819116000399-main.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H.Loh, and O. Mutlu, "A Case for Hierarchical Rings with Deflection Routing: An Energy-Efficient On-Chip Communication Substrate", PARCO 2016}}
 +  * {{p106-das.pdf|R. Das, O. Mutlu, T. Moscibroda, and C.R. Das, "Aergia: Exploiting Packet Latency Slack in On-Chip Networks", ISCA 2010}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/pvc-qos_micro09.pdf|B. Grot, S.W. Keckler, O. Mutlu, "Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QoS Scheme for Networks-on-Chip", MICRO 2009}}
 +  * {{p401-grot.pdf|B. Grot, J. Hestness, S. W. Keckler, and O. Mutlu, "Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees", ISCA 2011}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/onchip-network-congestion-scalability_sigcomm2012.pdf|G. Nychis, C. Fallin, T. Moscibroda, O. Mutlu, and S. Seshan, "On-Chip Networks from a Networking Perspective: Congestion and Scalability in Many-core Interconnects," SIGCOMM, 2012}}
 +  * {{http://users.ece.cmu.edu/~omutlu/pub/noc-congestion_hotnets10.pdf|G. Nychis, C. Falling, T. Moscibroda, O. Mutlu, "Next Generation On-chip Networks: What Kind of Congestion Control Do We Need?" HotNets 2010}}
readings.txt · Last modified: 2019/12/12 09:02 by 127.0.0.1