Mosharaf Chowdhury

Alma mater: University of California, Berkeley; Bangladesh University of Engineering and Technology
Institutions: University of Michigan, Ann Arbor
Thesis: Coflow: A Networking Abstraction for Distributed Data-Parallel Applications (2015)
Doctoral advisor: Ion Stoica
Website: www.mosharaf.com

Mosharaf Chowdhury is a Bangladeshi-American computer scientist known for his contributions to computer networking and large-scale systems for emerging machine learning and big data workloads. He is an Associate Professor of Computer Science and Engineering at the University of Michigan, Ann Arbor, where he leads the SymbioticLab. He is the creator of coflow[2] and a co-creator of Apache Spark.[3]
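A minimal, hypothetical sketch of the coflow idea from the cited paper: a coflow is a collection of parallel flows (e.g., the shuffle flows of a data-parallel job) that share a common completion objective, so the metric of interest is when the *last* flow finishes, not any individual flow. The function names and example numbers below are illustrative, not from the paper.

```python
# Sketch of the coflow abstraction: a coflow completes only when its
# slowest constituent flow completes.

def flow_completion_time(size_bytes, rate_bps):
    """Time for one flow to finish at a fixed allocated rate."""
    return size_bytes * 8 / rate_bps

def coflow_completion_time(flows):
    """flows: iterable of (size_bytes, rate_bps) pairs.
    The coflow's completion time is that of its last-finishing flow."""
    return max(flow_completion_time(s, r) for s, r in flows)

# Example: three shuffle flows sharing one objective, each allocated 8 Mbit/s.
flows = [(1_000_000, 8_000_000), (2_000_000, 8_000_000), (500_000, 8_000_000)]
print(coflow_completion_time(flows))  # dominated by the 2 MB flow: 2.0 s
```

This framing is what lets a scheduler rate-allocate across flows jointly (slowing non-critical flows frees bandwidth without hurting the coflow's completion time), rather than optimizing each flow in isolation.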

Research

Chowdhury specializes in computer networking and large-scale systems for emerging machine learning and big data workloads. In particular, his research aims at the symbiosis of AI/ML applications and software/hardware infrastructure across wide-area, datacenter-scale, and rack-scale computing.

Chowdhury has pioneered several research directions at the intersection of emerging workloads and computer systems. He created Infiniswap, the first practical memory disaggregation system;[4] Salus, the first software-only GPU sharing system for deep learning;[5] FedScale, the largest federated learning benchmark and platform;[6] and Zeus, the first GPU time and energy optimization framework for deep learning.[7]
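The core trade-off Zeus navigates can be sketched briefly: lowering a GPU's power limit reduces energy per training step but lengthens training time, so one can pick the power limit that minimizes a weighted energy/time cost. The code below is an illustrative sketch of that idea, not Zeus's actual implementation; the cost weighting, function names, and measurement numbers are assumptions for the example.

```python
# Illustrative sketch (not the Zeus codebase): choose a GPU power limit
# that minimizes a weighted combination of measured energy and time.

def weighted_cost(energy_j, time_s, eta, max_power_w):
    # eta in [0, 1] trades energy (eta -> 1) against power-normalized
    # training time (eta -> 0); max_power_w keeps the units comparable.
    return eta * energy_j + (1 - eta) * max_power_w * time_s

def pick_power_limit(profiles, eta, max_power_w):
    """profiles: {power_limit_w: (measured_energy_j, measured_time_s)}.
    Returns the power limit with the lowest weighted cost."""
    return min(profiles, key=lambda p: weighted_cost(*profiles[p], eta, max_power_w))

# Hypothetical measurements of one training job at three power limits:
# capping power saves energy but stretches the run.
profiles = {300: (9000.0, 100.0), 250: (8200.0, 110.0), 200: (7800.0, 130.0)}
print(pick_power_limit(profiles, eta=0.9, max_power_w=300))  # -> 250
```

With an energy-heavy weighting (eta = 0.9), the middle power limit wins: it saves most of the energy of the lowest cap without paying its full time penalty.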

References

  1. "SIGCOMM Doctoral Dissertation Award | acm sigcomm". www.sigcomm.org. Retrieved 2023-05-07.
  2. Chowdhury, Mosharaf; Stoica, Ion (2012-10-29). "Coflow: A networking abstraction for cluster applications". Proceedings of the 11th ACM Workshop on Hot Topics in Networks. HotNets-XI. New York, NY, USA: Association for Computing Machinery. pp. 31–36. doi:10.1145/2390231.2390237. ISBN 978-1-4503-1776-4. S2CID 6956491.
  3. Zaharia, Matei; Chowdhury, Mosharaf; Franklin, Michael J.; Shenker, Scott; Stoica, Ion (2010-06-22). "Spark: cluster computing with working sets". Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. HotCloud'10. USA: USENIX Association.
  4. Gu, Juncheng; Lee, Youngmoon; Zhang, Yiwen; Chowdhury, Mosharaf; Shin, Kang G. (2017). "Efficient Memory Disaggregation with Infiniswap". Proceedings of the 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI '17). pp. 649–667. ISBN 978-1-931971-37-9.
  5. Yu, Peifeng; Chowdhury, Mosharaf (2020). "Fine-Grained GPU Sharing Primitives for Deep Learning Applications" (PDF). Proceedings of the 3rd MLSys Conference.
  6. Lai, Fan; Dai, Yinwei; Singapuram, Sanjay; Liu, Jiachen; Zhu, Xiangfeng; Madhyastha, Harsha; Chowdhury, Mosharaf (2022-06-28). "FedScale: Benchmarking Model and System Performance of Federated Learning at Scale". Proceedings of the 39th International Conference on Machine Learning. PMLR: 11814–11827. arXiv:2105.11367.
  7. You, Jie; Chung, Jae-Won; Chowdhury, Mosharaf (2023). "Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training". Proceedings of the 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI '23). pp. 119–139. ISBN 978-1-939133-33-5.