Building ML systems that scale

Machine learning engineer crafting intelligent systems at Microsoft, focused on turning complex problems into elegant solutions. I work at the intersection of infrastructure, research, and real-world impact.

Siddharth Agrawal

About

I use ML to solve real business problems — not just to build interesting models. At Microsoft, I work on the health and quality of Azure, helping the platform stay reliable for its customers at global scale.

What drives me is understanding a business’s most important challenges, then figuring out how ML can help — quickly, cleanly, and with impact. I work iteratively: scoping tightly, shipping early, and improving based on feedback from the system, the data, and the people using it.

Translate the problem

I break down complex problems into smaller, solvable chunks.

Scalable and Secure by Design

A good system is designed deliberately to be scalable and secure. I build systems that are easy to reason about, debug, and extend.

Keep it simple

I believe complexity is an outcome of the interactions between simpler components.

Selected Projects

Distributed Model Training Platform

Built a fault-tolerant distributed training system that reduced model training time by 60% while handling dynamic resource allocation across thousands of GPUs.

PyTorch, Kubernetes, Ray, Azure
Used by 200+ ML engineers

Real-time Recommendation Engine

Designed and deployed a low-latency recommendation system serving 10M+ requests per day with sub-100ms response times.

TensorFlow, Redis, gRPC, Docker
15% increase in user engagement

AutoML Pipeline Framework

Created an end-to-end AutoML platform that democratized ML model development, enabling non-experts to build production-ready models.

Python, MLflow, Apache Airflow, PostgreSQL
Adopted across 5 product teams

Recent Writing

Let's Connect

Always interested in discussing ML systems, sharing ideas, or exploring opportunities to build something meaningful together.