readings
Differences
This shows you the differences between two versions of the page.
readings [2020/12/05 10:23] – [Lecture 18 (26.11 Thu.)] loisor | readings [2021/01/04 08:45] (current) – [Lecture 26 (31.12 Thu.)] firtinac | ||
---|---|---|---|
Line 640: | Line 640: | ||
* {{https:// | * {{https:// | ||
in the IBM POWER6 Microprocessor, | in the IBM POWER6 Microprocessor, | ||
- | * {{https:// | ||
* {{https:// | * {{https:// | ||
* {{https:// | * {{https:// | ||
Line 649: | Line 648: | ||
* {{https:// | * {{https:// | ||
* {{https:// | * {{https:// | ||
- | * {{https:// | + | |
+ | | ||
* {{https:// | * {{https:// | ||
* {{https:// | * {{https:// | ||
Line 656: | Line 656: | ||
=== Described in detail during lecture 19b: === | === Described in detail during lecture 19b: === | ||
- | * {{https:// | + | * {{lecture1-amdahl.pdf| G. M. Amdahl, " |
| | ||
=== Suggested (lecture 19b): === | === Suggested (lecture 19b): === | ||
- | | + | * {{flynn_1966.pdf|M.J. Flynn, “Very high-speed computing systems,” Proc. of IEEE 1966}} |
- | * {{https:// | + | * {{multiprocessors-multicomputers.pdf| M. D. Hill, N. P. Jouppi, G. S. Sohi, " |
- | * {{https:// | + | |
- | * {{https:// | + | |
* {{|M. D. Hill, N. P. Jouppi, G. S. Sohi, " | * {{|M. D. Hill, N. P. Jouppi, G. S. Sohi, " | ||
- | * {{https:// | + | |
- | * {{https:// | + | * {{a_low-overhead_coherence_solution_for_multiprocessors_with_private_cache_memories.pdf|M. S. Papamarcos and J. H. Patel, "A low-overhead coherence solution for multiprocessors with private cache memories," |
- | * {{https:// | + | * {{culler_parcomparch_5.1.pdf| Culler and Singh, Parallel Computer Architecture, |
+ | * {{culler_parcomparch_5.3.pdf|Culler and Singh, Parallel Computer Architecture, | ||
+ | * {{p_h_ch5.pdf |P&H, Computer Organization and Design, | ||
=== Mentioned (lecture 19b): === | === Mentioned (lecture 19b): === | ||
* {{https:// | * {{https:// | ||
- | * {{https:// | + | * {{pipelined1978smith.pdf| B. Smith, “A pipelined, shared resource MIMD computer, |
* {{https:// | * {{https:// | ||
- | * {{https:// | + | * {{bottleneck-identification-and-scheduling_asplos12.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, |
- | * {{https:// | + | * {{d7ce51c62671d5ffc1506786b0b7861ce00a.pdf| Jose A. Joao, M. Aater Suleman, Onur Mutlu, and Yale N. Patt, " |
* {{https:// | * {{https:// | ||
Line 704: | Line 704: | ||
* {{on_the_inclusion_properties_for_multi-level_cache_hierarchies.pdf|J. Baer and W. Wang, "On the inclusion properties for multi-level cache hierarchies," | * {{on_the_inclusion_properties_for_multi-level_cache_hierarchies.pdf|J. Baer and W. Wang, "On the inclusion properties for multi-level cache hierarchies," | ||
* {{https:// | * {{https:// | ||
+ | ===== Lecture 22 (27.12 Sun.) ===== | ||
+ | === Described in detail during lecture 22 === | ||
+ | * {{p239-gottlieb.pdf| A. Gottlieb, R. Grishman, C. P. Kruskal, K. P. McAuliffe, L. Rudolph, M. Snir, "The NYU Ultracomputer - Designing an MIMD Shared Memory Parallel Computer", | ||
+ | * {{0211027.pdf| L. G. Valiant, "A Scheme for Fast Parallel Communication", | ||
+ | * {{https:// | ||
+ | * {{chipper_hpca11.pdf|C. Fallin, C. Craik, and O. Mutlu, " | ||
+ | * {{bufferless-and-minimally-buffered-deflection-routing_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, | ||
+ | * {{minimally-buffered-deflection-router_nocs12.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, | ||
+ | === Suggested (lecture 22): === | ||
+ | * {{p168-patel.pdf| J. Patel, " | ||
+ | | ||
+ | * {{https:// | ||
+ | * {{p272-leiserson.pdf| C. E. Leiserson, Z. S. Abuhamdeh, D. C. Douglas, C. R. Feynman, M. N. Ganmukhi, J. V. Hill, D. Hillis, B. C. Kuszmaul, M. A. St. Pierre, D. S. Wells, M. C. Wong, Shaw-Wen Yang, R. Zak, "The network architecture of the Connection Machine CM-5", SPAA 1992}} | ||
+ | * {{p22-seitz.pdf| C. L. Seitz, "The cosmic cube", CACM 1985}} | ||
+ | * {{p278-glass.pdf| C. J. Glass, L. M. Ni, "The turn model for adaptive routing", | ||
+ | * {{p263-valiant.pdf| L.G. Valiant, G.J. Brebner, " | ||
+ | * {{https:// | ||
+ | * {{P2626.pdf| P. Baran, "On Distributed Communication Networks", | ||
+ | * {{app-aware-noc_micro09.pdf|R. Das, O. Mutlu, T. Moscibroda, and C.R. Das, " | ||
+ | ===== Lecture 23 (28.12 Mon.) ===== | ||
+ | === Described in detail during lecture 23 === | ||
+ | * {{chipper_hpca11.pdf|C. Fallin, C. Craik, and O. Mutlu, " | ||
+ | * {{bufferless-and-minimally-buffered-deflection-routing_springer14.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, | ||
+ | * {{minimally-buffered-deflection-router_nocs12.pdf|C. Fallin, G. Nazario, X. Yu, K. Chang, R. Ausavarungnirun, | ||
+ | === Suggested (lecture 23): === | ||
+ | | ||
+ | * {{https:// | ||
+ | * {{P2626.pdf| P. Baran, "On Distributed Communication Networks", | ||
+ | ===== Lecture 24 (29.12 Tue.) ===== | ||
+ | === Suggested (lecture 24): === | ||
+ | * {{Flynn_1966.pdf|M.J. Flynn, “Very high-speed computing systems,” Proc. of IEEE 1966}} | ||
+ | * {{p140-fisher.pdf|J.A. Fisher, | ||
+ | * {{p63-russell.pdf|R.M. Russell, “The CRAY-1 computer system,” CACM 1978}} | ||
+ | * {{p74-rau.pdf|B.R. Rau, " | ||
+ | * {{mmx_technology_1996.pdf|A. Peleg and U. Weiser, "MMX technology extension to the Intel architecture, | ||
+ | * {{04523358.pdf|E. Lindholm, J. Nickolls, S. Oberman, and J. Montrym, " | ||
+ | * {{30470407.pdf| W.W.L. Fung, I. Sham, G. Yuan, and T.M. Aamodt, " | ||
+ | ===== Lecture 25 (30.12 Wed.) ===== | ||
+ | === Suggested (lecture 25): === | ||
+ | * {{cuda_c_programming_guide.pdf|NVIDIA, | ||
+ | * {{2013_programming_massively_parallel_processors_a_hands-on_approach_2nd.pdf| Hwu and Kirk, “Programming Massively Parallel Processors,” 2017}} | ||
+ | * {{p140-fisher.pdf|Fisher, “Very Long Instruction Word Architectures and the ELI-512,” ISCA 1983}} | ||
+ | * {{sung_2012.pdf|I. Sung, G. D. Liu, and W. W. Hwu, “DL: A data layout transformation system for heterogeneous computing,” INPAR 2012}} | ||
+ | * {{10.1.1.12.7149.pdf|B. R. Rau, “Pseudo-randomly interleaved memory,” ISCA 1991}} | ||
+ | * {{configurable_xor_hash_functions_for_banked_scratchpad_memories_in_gpus.pdf|G. Braak, J. Gomez-Luna, J.M. Gonzalez-Linares, | ||
+ | * {{gomezluna_2013.pdf|J. Gomez-Luna, J.M. Gonzalez-Linares, | ||
+ | * {{gomezluna_2012.pdf|J. Gomez-Luna, J.M. Gonzalez-Linares, | ||
+ | * {{ransac-publication.pdf|M.A. Fischler and R.C. Bolles, “Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography”, | ||
+ | |||
+ | ===== Lecture 26 (31.12 Thu.) ===== | ||
+ | === Suggested (lecture 26): === | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | * {{https:// | ||
+ | |||
+ | ===== Lecture 27 (4.01 Mon.) ===== | ||
+ | === Suggested (lecture 27): === | ||
+ | * {{1982-kung-why-systolic-architecture.pdf | H.T. Kung, “Why Systolic Architectures?, | ||
+ | * {{p1-Jouppi.pdf | N. Jouppi, C. Young, N. Patil, D. Patterson, G. Agrawal, R. Bajwa, S. Bates, S. Bhatia, N. Boden, A. Borchers and R. Boyle, “In-datacenter Performance Analysis of a Tensor Processing Unit,” ISCA 2017}} | ||
+ | * {{4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf | A. Krizhevsky, I. Sutskever, G.E. Hinton, " | ||
+ | * {{GoogLeNet.pdf | C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich, "Going Deeper with Convolutions," | ||
+ | * {{resnet.pdf | K. He, X. Zhang, S. Ren, J. Sun, “Deep Residual Learning for Image Recognition, | ||
+ | * {{p346-annaratone.pdf | M. Annaratone, E. Arnould, T. Gross, H.T. Kung, and M.S. Lam, “Warp Architecture and Implementation, | ||
+ | * {{ADA184329.pdf | M. Annaratone, E. Arnould, T. Gross, H.T. Kung, M. Lam, O. Menzilcioglu, | ||
+ | * {{Smith-1982-Decoupled-Access-Execute-Computer-Architectures.pdf | J.E. Smith, “Decoupled Access/ | ||
+ | * {{p199-smith.pdf | J.E. Smith, G. E. Dermer, B. D. Vanderwarn, S. D. Klinger, and C. M. Rozewski, "The ZS-1 Central Processor, | ||
+ | * {{DynamicScheduling.pdf | J.E. Smith, “Dynamic Instruction Scheduling and the Astronautics ZS-1,” IEEE Computer, 1989}} | ||
+ | * {{microarchitecture_pentium4_2001.pdf | G. Hinton, D. Sager, M. Upton, and D. Boggs, "The Microarchitecture of the Pentium® 4 Processor," | ||
+ | * {{mutlu_hpca_2003.pdf | O. Mutlu, J. Stark, C. Wilkerson, and Y.N. Patt, " | ||
+ | |||
+ | ===== Lecture 28 (4.01 Mon.) ===== | ||
+ | === Suggested (lecture 28): === | ||
+ | * {{parallel1964thornton.pdf | J. Thornton, “Parallel Operation in the Control Data 6600,” AFIPS 1964}} | ||
+ | * {{pipelined1978smith.pdf | B.J. Smith, “A Pipelined, Shared Resource MIMD Computer, | ||
+ | * {{kongetira05_niagara.pdf | P. Kongetira, A. Kathirgamar, | ||
+ | * {{hep_burton.pdf | B.J. Smith, " | ||
+ | * {{tera_alverson.pdf | R. Alverson, D. Callahan, D. Cummings, B. Koblenz, A. Porterfield, |
readings.1607163787.txt.gz · Last modified: 2020/12/05 10:23 by loisor