Join us for our upcoming SAFARI Seminar:
Date: Thursday, September 14, 2023, 14:00 Zurich time (CEST)
Where: ETZ E9
Title: An Introduction to Multimodal Understanding: Building Models to See, Hear, and Read the World
Modern language models like ChatGPT, Claude, LLAMA, and Bard demonstrate broadly useful capabilities, largely powered by improved techniques for scaling data and models. This current generation of models is language-only. What might be required for the next frontier of large-model intelligence: general understanding of any multimodal content? I will talk about the basics of models that handle rich multimodal content, and about prior work that attempts to weave together image, text, audio, and video understanding. Finally, I will touch on possible opportunities for computer architecture researchers, and on the strong synergy between these large models and the underlying hardware, which enables these models to be served at scale.
Skanda Koppula is a research engineer and technical team lead at Google DeepMind. He is broadly interested in multimodal learning (focusing on video, image, and language understanding), and has also worked on topics in computer architecture and security. Previously, he worked with SAFARI and Professor Onur Mutlu at ETH Zürich, and studied at MIT, where he obtained his BS/MEng.