Networks of Machine Learning, for Machine Learning, by Machine Learning

September 22-24, 2021

Attend: Zoom Link (You may need to authenticate through your institution’s Zoom account before joining)
Watch Workshop Videos: YouTube Channel (From our YouTube Channel select the “Live Now” stream)


The main driver of new traffic in current backbones has been data shared for Machine Learning. More data has been created in the past five years than in the previous 5,000 years of humanity, and the trend is accelerating. Computation, storage, and communication are no longer merely interdependent; they are entirely merged. This highly interactive workshop explores this new reality and its technical underpinnings. Speakers and participants from industry and academia, who are leading these momentous changes, will provide perspectives on the present and prospective changes for the future.


Manya Ghobadi (Massachusetts Institute of Technology), Muriel Médard (Massachusetts Institute of Technology)


Aditya Akella (University of Texas at Austin), Ganesh Ananthanarayanan (Microsoft Research), Pavan Balaji (Facebook), Paolo Costa (Microsoft Research), Yonina Eldar (Weizmann Institute of Science), Nadia Fawaz (Pinterest), Brighten Godfrey (University of Illinois at Urbana-Champaign), H. Vincent Poor (Princeton University), Ariela Zeira (Intel Labs)

(All times are Pacific Time)

Day 1 (Wednesday, September 22)

8 am – 8:05 am: Welcome and Opening Remarks, R. Srikant (University of Illinois at Urbana-Champaign)

8:05 am – 8:10 am: Workshop Overview, Muriel Médard (Massachusetts Institute of Technology), Manya Ghobadi (Massachusetts Institute of Technology)

Session I: Networks of Machine Learning
Moderators: Muriel Médard (Massachusetts Institute of Technology) and Manya Ghobadi (Massachusetts Institute of Technology)

8:10 am – 8:30 am: Learning at the Wireless Edge, H. Vincent Poor (Princeton University)

Abstract: Wireless networks can be used as platforms for machine learning, taking advantage of the fact that data is often collected at the edges of networks, and also mitigating the latency and privacy concerns that backhauling data to the cloud can entail. Focusing primarily on federated learning, this talk will discuss several issues arising in this context including the effects of wireless transmission on learning performance, the allocation of wireless resources to learning, and privacy leakage.

Speaker: H. Vincent Poor is the Michael Henry Strater University Professor at Princeton University, where he is engaged in research in information theory, machine learning and network science, and their applications in wireless networks, energy systems, and related fields, including recently modeling the spread of the COVID-19 epidemic. He is a member of the National Academy of Engineering and the National Academy of Sciences, and a foreign member of the Chinese Academy of Sciences and the Royal Society. Recognition of his work includes the 2017 IEEE Alexander Graham Bell Medal and honorary doctorates from universities in Asia, Europe, and North America.

8:30 am – 8:40 am: Discussion

8:40 am – 9 am: Challenges in Machine Learning and Way forward, Ariela Zeira (Intel Labs)

Abstract: This presentation will discuss key challenges in current Machine Learning methods and explore Hyper Dimensional Computing as a promising new paradigm that offers an energy-efficient, robust, and fast alternative to standard machine learning. We will discuss the potential benefits of Hyper Dimensional Computing and the technology gaps and challenges that we need to address to reach its full potential.

Speaker: Ariela Zeira joined Intel in 2016 and in early 2020 moved to Intel Labs where she drives forward-looking research on Edge Networking and Compute. Prior to joining Intel, Ariela served as a Vice President of Engineering at InterDigital. She has a proven track record of leading technology and product teams in multiple areas including wireless communication and AI, and has on numerous occasions been the first to introduce new technologies to the market. Ariela obtained her B.Sc. and M.Sc. from the Technion, Israel and her PhD at Yale University, all in Electrical Engineering. She holds over 200 issued US patents.

9 am – 10 am: Discussion and Lunch

10 am – 10:20 am: Inclusive Search and Recommendations, Nadia Fawaz (Pinterest)

Abstract: Machine learning powers many advanced search and recommendation systems, and user experience strongly depends on how well ML systems perform across all data segments. This performance can be impacted by biases, which can lead to a subpar experience for subsets of users, content providers, or applications. Biases may arise at different stages in ML systems, from existing societal biases in the data, to biases introduced by the data collection or modeling processes. These biases may impact the performance of various components of ML systems, from offline training, to evaluation, to online serving in production systems. We will describe sources of bias in ML technology, why addressing bias is important, and techniques to mitigate bias, with examples from our Inclusive AI work at Pinterest for search and recommendations. Mitigating bias in machine learning systems is crucial to successfully achieve our mission to “bring everyone the inspiration to create a life they love.”

Speaker: Nadia Fawaz is an Applied Research Scientist and the technical lead for Inclusive AI at Pinterest. Her research and engineering interests include machine learning for personalization, AI fairness, and data privacy, and her work aims at bridging theory and practice. She was named one of the 100 Brilliant Women in AI Ethics 2021 and her work on inclusive AI has been featured in news outlets such as CBS, The Wall Street Journal, Fast Company, and Vogue Business. Prior to Pinterest, she was a Staff Software Engineer in machine learning at LinkedIn, a Principal Research Scientist at Technicolor Research Lab, and a postdoctoral researcher at the Massachusetts Institute of Technology. She received her Ph.D. and her Diplome d’ingenieur (M.Sc.), both in EECS, from Telecom ParisTech and EURECOM, France.

10:20 am – 10:30 am: Discussion

Day 2 (Thursday, September 23)

Session II: Networks for Machine Learning
Moderators: Muriel Médard (Massachusetts Institute of Technology) and Manya Ghobadi (Massachusetts Institute of Technology)

8:10 am – 8:30 am: Model-Based Deep Learning: Applications to Imaging and Communications, Yonina Eldar (Weizmann Institute of Science)

Abstract: Deep neural networks provide unprecedented performance gains in many real-world problems in signal and image processing. Despite these gains, the future development and practical deployment of deep networks are hindered by their black-box nature (e.g., a lack of interpretability and the need for very large training sets). On the other hand, signal processing and communications have traditionally relied on classical statistical modeling techniques that utilize mathematical formulations representing the underlying physics, prior information, and additional domain knowledge. Simple classical models are useful but sensitive to inaccuracies and may lead to poor performance when real systems display complex or dynamic behavior. Here we introduce various approaches to model-based learning that merge parametric models with optimization tools, leading to efficient, interpretable networks from reasonably sized training sets. We will consider examples of such model-based deep networks applied to image deblurring, image separation, super resolution in ultrasound and microscopy, and efficient communications systems. We will conclude by showing how model-based methods can be used for efficient diagnosis of COVID-19 using X-ray and ultrasound.

Speaker: Yonina Eldar is a Professor in the Department of Math and Computer Science at the Weizmann Institute of Science, Rehovot, Israel, where she heads the Center for Biomedical Engineering and Signal Processing. She is also a Visiting Professor at MIT and at the Broad Institute and an Adjunct Professor at Duke University and was a Visiting Professor at Stanford University. She is a member of the Israel Academy of Sciences and Humanities, an IEEE Fellow, and a EURASIP Fellow. She has received many awards for excellence in research and teaching, including the IEEE Signal Processing Society Technical Achievement Award, the IEEE/AESS Fred Nathanson Memorial Radar Award, the IEEE Kiyo Tomiyasu Award, the Michael Bruno Memorial Award from the Rothschild Foundation, the Weizmann Prize for Exact Sciences, and the Wolf Foundation Krill Prize for Excellence in Scientific Research. She is the Editor in Chief of Foundations and Trends in Signal Processing, and serves the IEEE on several technical and award committees. She heads the Committee for Promoting Gender Fairness in Higher Education Institutions in Israel.

8:30 am – 8:40 am: Discussion

8:40 am – 9 am: Optical Networking for Machine Learning: The Light at the End of the Tunnel?, Paolo Costa (Microsoft Research)

Abstract: Emerging workloads driven by large-scale machine learning training and inference pose challenging requirements in terms of bandwidth and latency, which are hard to satisfy with current network infrastructure. Optical interconnects have the potential to address this issue by drastically reducing overall power consumption and providing ultra-low latency. Fully unleashing these benefits in practice, however, requires a deep rethinking of the current stack and a cross-layer design of the network, hardware, and software infrastructure. In this talk, I will review some of the opportunities and challenges as we embark on this journey.

Speaker: Paolo Costa is a Principal Researcher in the Cloud Infrastructure Group in Microsoft Research and an Honorary Lecturer with the Department of Computing of Imperial College London. His current research brings together hardware, optics, networking, and application-level expertise to take a cross-stack view towards developing optical technologies for next-generation data-center networks.

9 am – 10 am: Discussion and Lunch

10 am – 10:20 am: Emerging Directions in Network Support for Distributed Machine Learning, Aditya Akella (University of Texas at Austin)

Abstract: Training machine learning (ML) models is a common-case workload at any data-driven enterprise. To keep up with evolving data and maintain a competitive edge, enterprises are employing more sophisticated features and more complex model architectures, and attempting to train faster at ever larger scales and deploy high-quality models frequently. Recent advances in hardware accelerators and model architectures have shifted the performance bottleneck of training from computation to communication. In this talk, I will present an overview of two recent contributions from my group on speeding up ML jobs’ communication. First, I will describe ATP, a service for in-network aggregation aimed at modern multi-rack, multi-job ML settings. ATP uses emerging programmable switch hardware to support decentralized, dynamic, best-effort gradient aggregation. It enables efficient and equitable sharing of limited switch resources across simultaneously running ML jobs, and gracefully accommodates heavy contention for switch resources. Second, I will describe Syndicate, a framework to minimize communication bottlenecks and speed up training for large-scale hybrid-parallel models and heterogeneous network interconnects. Syndicate proposes a novel “motif” abstraction, to break large communication work into smaller pieces, and re-tools interfaces in distributed ML networking stacks to allow for jointly optimizing scheduling and execution planning. Motifs allow Syndicate to pack and order communication work so as to maximize both network utilization and overlap with compute.

Speaker: Aditya Akella is a Regents Chair Professor of Computer Science at the University of Texas at Austin and a Software Engineer at Google. He received his B. Tech. from IIT Madras (2000), and Ph.D. from Carnegie Mellon University (2005). His research spans computer systems and networking, with a focus on programmable networks, formal methods in systems, and systems for big data and machine learning. His work has influenced the infrastructure of some of the world’s largest online service providers. He has received many awards for his contributions, including being selected as a finalist for the U.S. Blavatnik National Award for Young Scientists (2020 and 2021), UW-Madison “Professor of the Year” Award (2019 and 2017), IRTF Applied Networking Research Prize (2015), SIGCOMM Rising Star Award (2014), NSF CAREER Award (2008), and several best paper awards.

10:20 am – 10:30 am: Discussion

Day 3 (Friday, September 24)

Session III: Networks by Machine Learning
Moderators: Muriel Médard (Massachusetts Institute of Technology) and Manya Ghobadi (Massachusetts Institute of Technology)

8:10 am – 8:30 am: An Overview of the Network Requirements for Some Key Facebook Workloads, Pavan Balaji (Facebook)

Abstract: Like many BigTech companies, Facebook heavily relies on AI for several aspects of its business ecosystem including Ads services, Facebook and Instagram feeds, recommendations on Facebook marketplace, hate speech recognition, and language translation. Training and running inference with these models, however, requires large parallel computing systems with cutting-edge HPC technologies including fast compute units, high-bandwidth memory, and high-speed networks. In this talk, I will focus on the network requirements of some of our key workloads. I will discuss network constraints including injection and bisection bandwidth requirements that, if not carefully designed, can limit the scalability of our models; network congestion issues and topology requirements for our workload characteristics; and why traditional HPC network communication libraries, such as MPI, might not be ideal for our workloads.

Speaker: Pavan Balaji is an Applied Research Scientist at Facebook AI. Before joining Facebook, he was a Senior Computer Scientist and Group Lead at the Argonne National Laboratory, where he led two groups: Programming Models and Runtime Systems and Future Architectures for AI. His research interests include HPC hardware/software codesign for AI workloads, communication runtime systems, networks, and scale-out techniques, parallel programming models and runtime systems for communication and I/O on extreme-scale supercomputing systems, modern system architecture, cloud computing systems, and data-intensive computing. He has more than 200 publications in these areas and has delivered more than 200 talks and tutorials at various conferences and research institutes. He is a recipient of several awards including the IEEE TCSC Award for Excellence in Scalable Computing (Middle Career) in 2015; TEDxMidwest Emerging Leader award in 2013; U.S. Department of Energy Early Career award in 2012; Crain's Chicago 40 under 40 award in 2012; Los Alamos National Laboratory Director's Technical Achievement award in 2005; Ohio State University Outstanding Researcher award in 2005; best paper awards at PACT 2019, ACM HPDC 2018, IEEE ScalCom 2013, Euro PVM/MPI 2009, ISC 2009, IEEE Cluster 2008, Euro PVM/MPI 2008, ISC 2008; best paper finalist at IEEE/ACM SC 2014; best poster award at IEEE ICPADS 2018; best poster finalist at IEEE/ACM SC 2014; and best student poster award at ICPP 2018. He has served as a chair or editor for nearly 100 journals, conferences and workshops, and as a technical program committee member in numerous conferences and workshops. He is a senior member of the IEEE and a distinguished member of the ACM.

8:30 am – 8:40 am: Discussion

8:40 am – 9 am: Adventures in Learning-Based Rate Control, Brighten Godfrey (University of Illinois at Urbana-Champaign)

Abstract: Rate control algorithms are central to the Internet, both for transport-layer congestion control and for application-layer video delivery. Rate control is also one of the most subtly difficult and persistent challenges in networking, having to infer good decisions at millisecond timescales in diverse, opaque environments using only a trickle of information. In this talk, we'll discuss how this problem can be approached with an online learning-based architecture, where an explicit performance objective function guides a rate controller learning from real-time observations. After introducing the basic Performance-oriented Congestion Control framework, we'll discuss how one can plug in new objectives (for example, to build a "scavenger" transport) and new control algorithms (for example, deep reinforcement learning). Finally, we'll discuss challenges in the approach and future directions including using ML agents as "adversaries" to produce challenging environments for rate controllers. These projects are joint work with UIUC and Hebrew University of Jerusalem.

Speaker: Brighten Godfrey is a professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign, and a technical director at VMware. He co-founded and served as CTO of network verification pioneer Veriflow, through its 2019 acquisition by VMware. He received his Ph.D. at the University of California, Berkeley in 2009, and his B.S. at Carnegie Mellon University in 2002. His research interests lie in the design of networked systems and algorithms. He is a winner of the ACM SIGCOMM Rising Star Award, the Sloan Research Fellowship, the National Science Foundation CAREER Award, and several best paper awards.

9 am – 10 am: Discussion and Lunch

10 am – 10:20 am: Edge Video Services on 5G Infrastructure, Ganesh Ananthanarayanan (Microsoft Research)

Abstract: Creating a programmable software infrastructure for telecommunication operations promises to reduce both the capital expenditure and operational expenses of 5G telecommunications operators. The convergence of telecommunications, the cloud, and edge infrastructures will open up opportunities for new innovations and revenue for both the telecommunications industry and the cloud ecosystem. This talk will focus on video, the dominant traffic type on the Internet since the introduction of 4G networks. With 5G, not only will the volume of video traffic increase, but there will also be many new solutions for industries, from retail to manufacturing to healthcare and forest monitoring, infusing deep learning and AI for video analytics scenarios. The talk will touch on various advances in edge video analytics systems, including real-time inference over edge hierarchies, continuous learning of models, and privacy-preserving video analytics.

Speaker: Ganesh Ananthanarayanan is a Principal Researcher at Microsoft. His research interests are broadly in systems and networking, with a recent focus on live video analytics, cloud computing and large scale data analytics systems, and Internet performance. He has published over 30 papers in systems and networking conferences such as USENIX OSDI, ACM SIGCOMM and USENIX NSDI, which have been recognized with the Best Paper Award at ACM Symposium on Edge Computing (SEC) 2020, CSAW 2020 Applied Research Competition Award (runner-up), ACM MobiSys 2019 Best Demo Award (runner-up), and highest-rated paper at ACM Symposium on Edge Computing (SEC) 2018. His work on “Video Analytics for Vision Zero” on analyzing traffic camera feeds won the Institute of Transportation Engineers 2017 Achievement Award as well as the “Safer Cities, Safer People” U.S. Department of Transportation Award. He has collaborated with and shipped technology to Microsoft’s cloud and online products like Azure Cloud, Cosmos, Azure Live Video Analytics, and Skype. He was a founding member of the ACM Future of Computing Academy. Prior to joining Microsoft Research, he completed his Ph.D. at the University of California, Berkeley in 2013, where he was also a recipient of the Regents Fellowship. Prior to his Ph.D., he was a Research Fellow at Microsoft Research India.

10:20 am – 10:30 am: Discussion and Closing Remarks