readings

Differences

This shows you the differences between two versions of the page.

readings [2018/11/28 13:59] – [Lecture 19a (28.11 Thu.)] yaglikca
readings [2019/12/12 09:02] geraldod
Line 465: Line 465:
    * {{https://people.inf.ethz.ch/omutlu/pub/mise-predictable_memory_performance-hpca13.pdf|L. Subramanian, V. Seshadri, Y. Kim, B. Jaiyen, and O. Mutlu, "MISE: Providing Performance Predictability and Improving Fairness in Shared Main Memory Systems," HPCA 2013}}
  
 +
 +===== Lecture 19b (28.11 Wed.) =====
 +=== Recommended (lecture 19b): ===
 +   * {{https://people.inf.ethz.ch/omutlu/pub/raidr-dram-refresh_isca12.pdf|J. Liu, B. Jaiyen, R. Veras, O. Mutlu, "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012}}
 +   * {{https://people.inf.ethz.ch/omutlu/pub/tldram_hpca13.pdf|D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, O. Mutlu, "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," HPCA 2013}}
 +   * {{2007.TileInterconnection.IEEEMicro.pdf|D. Wentzlaff, P. Griffin, H. Hoffmann, L. Bao, B. Edwards, C. Ramey, M. Mattina, C. C. Miao, J. F. Brown III, and A. Agarwal, "On-Chip Interconnection Architecture of the Tile Processor". IEEE Micro 2007}}
 +
 +
 +===== Lecture 20 (29.11 Thu.) =====
 +=== Recommended (lecture 20): ===
 +  * {{https://people.inf.ethz.ch/omutlu/pub/acs_asplos09.pdf|M. Aater Suleman, Onur Mutlu, Moinuddin K. Qureshi, and Yale N. Patt. Accelerating critical section execution with asymmetric multi-core architectures. ASPLOS'09}}
 +  * {{bottleneck-identification-and-scheduling_asplos12.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Bottleneck Identification and Scheduling in Multithreaded Applications". ASPLOS'12}}
 +  * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, "Utility-Based Acceleration of Multithreaded Applications on Asymmetric CMPs". ISCA'13}}
 +  * {{22310236.pdf| Ed Grochowski, Ronny Ronen, John Shen, and Hong Wang, "Best of Both Latency and Throughput". ICCD 2004}}
 +  * {{lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}
 +  * {{05389044.pdf|J. M. Tendler, J. S. Dodson, J. S. Fields, Jr., H. Le, and B. Sinharoy, "POWER4 System Microarchitecture". IBM J R&D 2002}}
 +  * {{719990eaab63a6bfa2988b5fd57a03b13229.pdf| Ron Kalla, Balaram Sinharoy, and Joel M. Tendler, "IBM Power5 Chip: A Dual-Core Multithreaded Processor". IEEE Micro 2004}}
 +  * {{ :niagara_a_32-way_multithreaded_sparc_processor.pdf | P. Kongetira, K. Aingaran, and K. Olukotun, "Niagara: A 32-Way Multithreaded Sparc Processor", IEEE Micro 2005}}
 +  * {{p441-suleman.pdf|M. Aater Suleman, Onur Mutlu, Jose A. Joao, Khubaib, Yale N. Patt, "Data Marshaling for Multi-Core Architectures". ISCA'10, IEEE Micro Top Picks 2011}}
 +  * {{dk52.pdf|Rakesh Kumar, Keith I. Farkas, Norman P. Jouppi, Parthasarathy Ranganathan, and Dean M. Tullsen, "Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction". MICRO 2003}}
 +  * {{01431565.pdf| M. Annavaram, E. Grochowski, J. Shen, “Mitigating Amdahl’s Law Through EPI Throttling,” ISCA 2005}}
 +  * {{http://people.inf.ethz.ch/omutlu/pub/onur-Asymmetry-Everywhere-talk.pdf|O. Mutlu, "Asymmetry Everywhere (with Automatic Resource Management)," CRA Workshop on Advanced Computer Architecture Research 2010}}
 +  * {{http://users.ece.cmu.edu/~omutlu/pub/heterogeneous-block-architecture_iccd14.pdf|C. Fallin, C. Wilkerson, O. Mutlu, "The Heterogeneous Block Architecture," ICCD'14}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/atlas_hpca10.pdf|Y. Kim, D. Han, O. Mutlu, M. Harchol-Balter, “ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers,” HPCA 2010}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/tcm_micro10.pdf|Y. Kim, M. Papamichael, O. Mutlu, M. Harchol-Balter, “Thread Cluster Memory Scheduling: Exploiting Differences in Memory Access Behavior,” MICRO 2010}}
 +  * {{http://users.ece.cmu.edu/~omutlu/pub/noc-congestion_hotnets10.pdf|G. Nychis, C. Fallin, T. Moscibroda, O. Mutlu, "Next Generation On-chip Networks: What Kind of Congestion Control Do We Need?" HotNets 2010}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/timber-fine-grained-dram-cache_ieee-cal12.pdf|J. Meza, J. Chang, H. Yoon, O. Mutlu, and P. Ranganathan, "Enabling Efficient and Scalable Hybrid Memories Using Fine-Granularity DRAM Cache Management" CAL 2012}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/rowbuffer-aware-caching_iccd12.pdf|H. Yoon, J. Meza, R. Ausavarungnirun, R. Harding, and O. Mutlu, "Row Buffer Locality Aware Caching Policies for Hybrid Memories" ICCD 2012}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/heterogeneous-reliability-memory-for-data-centers_dsn14.pdf|Y. Luo, S. Govindan, B. Sharma, M. Santaniello, J. Meza, A. Kansal, J. Liu, B. Khessib, K. Vaid, O. Mutlu, "Characterizing Application Memory Error Vulnerability to Optimize Datacenter Cost via Heterogeneous-Reliability Memory," DSN 2014}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/tldram_hpca13.pdf|D. Lee, Y. Kim, V. Seshadri, J. Liu, L. Subramanian, O. Mutlu, "Tiered-Latency DRAM: A Low Latency and Low Cost DRAM Architecture," HPCA 2013}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/raidr-dram-refresh_isca12.pdf|J. Liu, B. Jaiyen, R. Veras, O. Mutlu, "RAIDR: Retention-Aware Intelligent DRAM Refresh," ISCA 2012}}
 +
 +===== Lecture 21 (05.12 Wed.) =====
 +=== Suggested (lecture 21): ===
 +  * {{https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html|NVIDIA, "CUDA C Programming Guide," Version 9.0, 2018}}
 +  * {{https://www.sciencedirect.com/science/book/9780128119860|D.B. Kirk and W.M. Hwu, "Programming Massively Parallel Processors. A Hands-on Approach," Third Edition, 2017}}
 +  * {{p140-fisher.pdf|J.A. Fisher, “Very Long Instruction Word architectures and the ELI-512,” ISCA 1983}}
 +  * {{Sung_2012.pdf|I.J. Sung, G.D. Liu, W.M. Hwu, "DL: A Data Layout Transformation System for Heterogeneous Computing," INPAR 2012}}
 +  * {{pseudo-randomly_interleaved_memory.pdf|B. R. Rau, "Pseudo-Randomly Interleaved Memory," ISCA 1991}}
 +  * {{Braak_2016.pdf|G.J.v.d. Braak, J. Gomez-Luna, J.M. Gonzalez-Linares, H. Corporaal, N. Guil, "Configurable XOR Hash Functions for Banked Scratchpad Memories in GPUs," IEEE TC, 2016}}
 +  * {{GomezLuna_2013.pdf|J. Gomez-Luna, J.M. Gonzalez-Linares, J.I. Benavides, N. Guil, "Performance Modeling of Atomic Additions on GPU Scratchpad Memory," IEEE TPDS, 2013}}
 +  * {{GomezLuna_2012.pdf|J. Gomez-Luna, J.M. Gonzalez-Linares, J.I. Benavides, N. Guil, "Performance Models for Asynchronous Data Transfers on Consumer Graphics Processing Units," JPDC, 2012}}
 +  * {{GomezLuna_2017.pdf|J. Gomez-Luna, I. El Hajj, L. Chang, V. Garcia-Flores, S. G. de Gonzalo, T. B. Jablin, A. J. Peña, W. Hwu, "Chai: Collaborative heterogeneous applications for integrated-architectures," ISPASS 2017}}
 +
 +===== Lecture 22 (06.12 Thu.) =====
 +=== Required (lecture 22): ===
 +  * {{lecture1-amdahl.pdf|G. M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS 1967}}
 +  * {{lamport.pdf|L. Lamport, "How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs," IEEE Transactions on Computers, 1979}}
 +  * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}}
 +=== Described in detail during lecture 22: ===
 +  * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," ISCA 1983}}
 +  * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," ISCA 1984}}
 +  * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}}
 +  * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}}
 +=== Recommended (lecture 22): ===
 +  * {{flynn.pdf|M. J. Flynn, "Very High-Speed Computing Systems," Proc. of IEEE, 1966}}
 +  * {{multiprocessors-multicomputers.pdf|M. D. Hill, N. P. Jouppi, G. S. Sohi, "Multiprocessors and Multicomputers," pp. 551-560 in Readings in Computer Architecture.}}
 +  * {{memory_consistency_and_event_ordering_in_scalable_shared-memory_multiprocessors.pdf|K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, "Memory Consistency and Event Ordering in Scalable Shared-Memory Multiprocessors," ISCA 1990}}
 +  * {{two_techniques_to_enhance_the_performanc_of_memory_consistency_models.pdf|K. Gharachorloo, A. Gupta, and J. Hennessy, "Two Techniques to Enhance the Performance of Memory Consistency Models," ICPP 1991}}
 +  * {{bulksc_bulk_enforcement_of_sequential_consistency.pdf|L. Ceze, J. Tuck, P. Montesinos, and J. Torrellas, "BulkSC: bulk enforcement of sequential consistency," ISCA 2007}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/ThyNVM-transparent-crash-consistency-for-persistent-memory_micro15.pdf|J. Ren, J. Zhao, S. Khan, J. Choi, Y. Wu, and O. Mutlu, "ThyNVM: Enabling Software-Transparent Crash Consistency in Persistent Memory Systems," MICRO 2015}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/NVMove-byte-based-persistence-tool_inflow16.pdf|H. Chauhan, I. Calciu, V. Chidambaram, E. Schkufza, O. Mutlu, and P. Subrahmanyam, "NVMove: Helping Programmers Move to Byte-Based Persistence," INFLOW 2016}}
 +  * {{a_new_solution_to_coherence_problems_in_multicache_systems.pdf|L. M. Censier and P. Feautrier, "A new solution to coherence problems in multicache systems," IEEE Trans. Computers, 1978}}
 +  * {{using_cache_memory_to_reduce_processor-memory_traffic.pdf|J. R. Goodman, "Using cache memory to reduce processor-memory traffic," ISCA 1983}}
 +  * {{the_sgi_origin_a_ccnuma_highly_scalable_server.pdf|J. Laudon and D. Lenoski, "The SGI Origin: A ccNUMA Highly Scalable Server," ISCA 1997}}
 +  * {{token_coherence_decoupling_performance_and_correctness.pdf|M. Martin, M. D. Hill, and D. A. Wood, "Token coherence: decoupling performance and correctness," ISCA 2003}}
 +  * {{on_the_inclusion_properties_for_multi-level_cache_hierarchies.pdf|J. Baer and W. Wang, "On the inclusion properties for multi-level cache hierarchies," ISCA 1988}}
 +  * {{designofacomputer_cdc6600.pdf|J. E. Thornton, "CDC 6600: Design of a Computer," 1970}}
 +  * {{a_pipelined_shared_resource_mimd_computer.pdf | B. J. Smith, "A Pipelined, Shared Resource MIMD Computer", ICPP 1978}}
 +  * {{a_new_method_of_solving_numerical_equations_of_all_orders_by_continuous_.pdf|W. G. Horner, "A new method of solving numerical equations of all orders, by continuous approximation," Philosophical Transactions of the Royal Society, 1819}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/acs_asplos09.pdf|M. A. Suleman, O. Mutlu, M. K. Qureshi, and Y. N. Patt, "Accelerating critical section execution with asymmetric multi-core architectures," ASPLOS'09}}
 +  * {{co-operating_sequential_processes.pdf|E. W. Dijkstra, "Cooperating Sequential Processes," 1965}}
 +  * {{culler_parcomparch_5.1.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.1 (pp 269–283)}}
 +  * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, Chapter 5.3 (pp 291-305)}}
 +  * {{ph_computerorganizationanddesignthehardwaresoftwareinterface5th_5.10.pdf|P&H, Computer Organization and Design, Chapter 5.10 (pp 466-470)}}
 +
 +===== Lecture 23 (12.12 Wed.) =====
 +=== Described in detail during lecture 23: ===
 +  * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}}
 +
 +=== Suggested (lecture 23): ===
 +  * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, "Application-Aware Prioritization Mechanisms for On-Chip Networks", MICRO 2009}}
 +  * {{ultrasparc.pdf|M. Shah, J. Barreh, J. Brooks, R. Golla, G. Grohoski, N. Gura, R. Hetherington, P. Jordan, M. Luttrell, C. Olson, B. Saha, D. Sheahan, L. Spracklen, and A. Wynn, "UltraSPARC T2: A Highly-Threaded, Power-Efficient, SPARC SOC", ASSCC 2007}}
 +  * {{7d2822e9b7fcd60f147823478b59fcf7569e.pdf|J. H. Patel, "Processor-memory interconnections for multiprocessors", ISCA 1979}}
 +  * {{Ultracomputer.pdf|A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, and M. Snir, "The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer", IEEE Trans. on Comp. 1983}}
 +  * {{hierarchical-rings-with-deflection_sbacpad14.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. Loh, and O. Mutlu, "Design and Evaluation of Hierarchical Rings with Deflection Routing", SBAC-PAD 2014}}
 +  * {{p272-leiserson.pdf|C.E. Leiserson, Z.S. Abuhamdeh, D.C. Douglas, C.R. Feynman, M.N. Ganmukhi, J.V. Hill, D. Hillis, B.C. Kuszmaul, M.A. St. Pierre, D.S. Wells, M.C. Wong, S.-W. Yang, R. Zak, "The Network Architecture of the Connection Machine CM-5", SPAA 1992}}
 +  * {{seitz_cacm_1985.pdf|C. L. Seitz, "The Cosmic Cube", CACM 1985}}
 +  * {{L8-TurnModel-ISCA92.pdf|C. J. Glass and L. M. Ni, "The Turn Model for Adaptive Routing", ISCA 1992}}
 +  * {{maze-routing_nocs15.pdf|M. Fattah, A. Airola, R. Ausavarungnirun, N. Mirzaei, P. Liljeberg, J. Plosila, S. Mohammadi, T. Pahikkala, O. Mutlu, and H. Tenhunen, "A Low-Overhead, Fully-Distributed, Guaranteed-Delivery Routing Algorithm for Faulty Network-on-Chips", NOCS 2015}}
 +  * {{Baran64.pdf|P. Baran, "On Distributed Communications Networks", IEEE Trans. Comm., 1964}}
 +  * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "Bufferless and Minimally-Buffered Deflection Routing", Routing Algorithms in Networks-on-Chip (invited book chapter), 2014}}
 +  * {{virtual+channel.pdf|W. J. Dally, "Virtual Channel Flow Control", ISCA 1990}}
 +
 +
 +
 +===== Lecture 24 (13.12 Thu.) =====
 +=== Described in detail during lecture 24: ===
 +  * {{05749724.pdf|C. Fallin, C. Craik, and O. Mutlu, "CHIPPER: A Low-Complexity Bufferless Deflection Router", HPCA 2011}}
 +  * {{bufferless_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "Bufferless and Minimally-Buffered Deflection Routing", Routing Algorithms in Networks-on-Chip (invited book chapter), 2014}}
 +  * {{06209256.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, and O. Mutlu, "MinBD: Minimally-Buffered Deflection Routing for Energy-Efficient Interconnect", NOCS 2012}}
 +
 +=== Suggested (lecture 24): ===
 +  * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C. R. Das, "Application-Aware Prioritization Mechanisms for On-Chip Networks", MICRO 2009}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/hetero-adaptive-source-throttling_sbacpad12.pdf|K. Chang, R. Ausavarungnirun, C. Fallin, and O. Mutlu, "HAT: Heterogeneous Adaptive Throttling for On-Chip Networks," SBAC-PAD, 2012}}
 +  * {{bless_isca09.pdf|T. Moscibroda and O. Mutlu, "A Case for Bufferless Routing in On-Chip Networks", ISCA 2009}}
 +  * {{06970669.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H. Loh, and O. Mutlu, "Design and Evaluation of Hierarchical Rings with Deflection Routing", SBAC-PAD 2014}}
 +  * {{1-s2.0-s0167819116000399-main.pdf|R. Ausavarungnirun, C. Fallin, X. Yu, K. Chang, G. Nazario, R. Das, G. H. Loh, and O. Mutlu, "A Case for Hierarchical Rings with Deflection Routing: An Energy-Efficient On-Chip Communication Substrate", PARCO 2016}}
 +  * {{p106-das.pdf|R. Das, O. Mutlu, T. Moscibroda, and C.R. Das, "Aergia: Exploiting Packet Latency Slack in On-Chip Networks", ISCA 2010}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/pvc-qos_micro09.pdf|B. Grot, S.W. Keckler, O. Mutlu, "Preemptive Virtual Clock: A Flexible, Efficient, and Cost-effective QoS Scheme for Networks-on-Chip", MICRO 2009}}
 +  * {{p401-grot.pdf|B. Grot, J. Hestness, S. W. Keckler, and O. Mutlu, "Kilo-NOC: A Heterogeneous Network-on-Chip Architecture for Scalability and Service Guarantees", ISCA 2011}}
 +  * {{https://people.inf.ethz.ch/omutlu/pub/onchip-network-congestion-scalability_sigcomm2012.pdf|G. Nychis, C. Fallin, T. Moscibroda, O. Mutlu, and S. Seshan, "On-Chip Networks from a Networking Perspective: Congestion and Scalability in Many-core Interconnects," SIGCOMM, 2012}}
 +  * {{http://users.ece.cmu.edu/~omutlu/pub/noc-congestion_hotnets10.pdf|G. Nychis, C. Fallin, T. Moscibroda, O. Mutlu, "Next Generation On-chip Networks: What Kind of Congestion Control Do We Need?" HotNets 2010}}
readings.txt · Last modified: 2019/12/12 09:02 by 127.0.0.1