We Switched to PyTorch in 2020. Was It the Right Call?
In June 2020, I wrote about switching our NLP projects from TensorFlow 1.x to PyTorch. I was stuck in Pune during the pandemic, working remotely with the Halialabs team in Singapore, and we used the forced slowdown to make an infrastructure decision we had been deferring.
Four years later, I have left that startup, moved to a global bank, and the ML framework landscape looks nothing like what any of us expected. This is a look back at whether the decision was right — and a look forward at where tooling is headed.
The Short Answer
Yes. Switching to PyTorch was unambiguously the right call.
Not because PyTorch turned out to be the perfect framework — it has its own problems. But because the ecosystem consolidated around it in ways that made every alternative more expensive. In 2020, it was a bet on momentum. By 2024, it is the default.
What Happened to TensorFlow
This is the part I did not predict. In 2020, I expected TensorFlow 2.x to remain a strong contender — it had Google's backing, a mature serving infrastructure, and a massive installed base. I wrote that TF's advantages in production deployment were real and that we kept TF for some projects.
What actually happened:
Research abandoned TensorFlow almost entirely. By 2022-2023, the shift was decisive. At major conferences (NeurIPS, ICML, ACL), PyTorch papers now outnumber TensorFlow papers by roughly 5:1 or more. New model releases — LLaMA, Mistral, Stable Diffusion, Whisper — are PyTorch-only. The research ecosystem that used to be split is now overwhelmingly PyTorch.
Google moved to JAX internally. This was the signal that mattered most. Google's own research teams — DeepMind and Google Brain, since merged into Google DeepMind — increasingly use JAX rather than TensorFlow for new work. Gemini was trained with JAX. PaLM was trained with JAX. When the creator of a framework stops using it for new flagship projects, that tells you something.
Keras decoupled from TensorFlow. In late 2023, Keras 3 launched as a multi-backend framework — supporting TensorFlow, PyTorch, and JAX. This was a tacit acknowledgment that Keras needed to hedge beyond TF. It is a well-engineered library, but the multi-backend approach means it is no longer the on-ramp to TensorFlow. It is a layer that can sit on top of whichever framework wins.
TF Serving is still good. I will give credit where it is due. TensorFlow Serving, TF Lite, and the TensorFlow.js ecosystem remain strong for deployment use cases, especially edge and mobile. If your primary concern is inference on constrained hardware, TF still has advantages. But for the full workflow — research, training, fine-tuning, experimentation — PyTorch won.
What I Got Right
HuggingFace became the center of gravity. In 2020 I wrote that not using HuggingFace was like not using scikit-learn. That turned out to be an understatement. HuggingFace is now the GitHub of ML models — the Hub hosts over 500,000 models, the transformers library is the standard way to load and fine-tune models, and the ecosystem (datasets, tokenizers, evaluate, PEFT) has become the default toolkit for applied ML. It is still PyTorch-first.
PyTorch's ecosystem compounded. Lightning matured into a serious framework. TorchServe improved substantially. ONNX export became reliable. The tooling that was thin in 2020 — serving, profiling, distributed training — filled in. Meta's continued investment kept the project healthy.
"Don't rewrite working code" was correct. The TF models we left running in 2020 continued running fine at Halialabs. The right approach was always: adopt the new framework for new work, let old code age gracefully.
What I Got Wrong
I underestimated JAX. In 2020 I barely mentioned JAX. By 2024, it is the framework of choice at Google and DeepMind for training the largest models. JAX's functional paradigm — pure functions, explicit randomness, composable transformations (jit, grad, vmap, pmap) — turns out to be a better fit for distributed training across thousands of TPUs. It will not replace PyTorch for most practitioners, but for frontier model training, JAX is serious.
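To make the functional paradigm concrete, here is a minimal sketch of the composable transformations mentioned above (assuming `jax` is installed; the loss function is a toy example, not anything from a real model):

```python
import jax
import jax.numpy as jnp

# A pure function: no hidden state, so transformations compose freely.
def loss(w, x):
    return jnp.sum((w * x) ** 2)

grad_loss = jax.grad(loss)                        # d(loss)/dw, itself a pure function
fast_grad = jax.jit(grad_loss)                    # JIT-compile it with XLA
batched_grad = jax.vmap(grad_loss, in_axes=(None, 0))  # vectorize over a batch of x

w = jnp.array(2.0)
x = jnp.array(3.0)
print(grad_loss(w, x))  # d/dw (w*x)^2 = 2*w*x^2 = 36.0
```

Because `grad`, `jit`, and `vmap` each take a function and return a function, they nest in any order — which is exactly what makes sharding a training step across thousands of TPU cores tractable.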
I overestimated how much the training framework matters for production LLMs. In 2020, I was thinking about training models. In 2024, most production LLM work is not training — it is inference, fine-tuning with adapters (LoRA, QLoRA), and orchestration (RAG, agents, chains). The training framework matters less when you are downloading a pre-trained model and serving it. The tools that matter now — vLLM, TGI, llama.cpp, Ollama — are inference engines, not training frameworks. They operate at a layer below (or beside) PyTorch.
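The adapter idea itself is small enough to sketch in NumPy. This is a toy illustration of the low-rank update at the heart of LoRA — not the PEFT library's API: the pre-trained weight stays frozen, and only a rank-r product B @ A is trained and added on top.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2                       # model dimension 8, adapter rank 2

W = rng.normal(size=(d, d))       # frozen pre-trained weight
A = rng.normal(size=(r, d))       # trainable, r x d
B = np.zeros((d, r))              # trainable, d x r (zero-init: adapter starts as a no-op)

def forward(x, scale=1.0):
    # Effective weight is W + scale * (B @ A); only A and B receive gradient updates.
    return x @ (W + scale * (B @ A)).T

x = rng.normal(size=(1, d))
assert np.allclose(forward(x), x @ W.T)  # zero-init adapter changes nothing
# Trainable parameters: 2*d*r = 32, versus d*d = 64 for full fine-tuning —
# and the gap widens dramatically at transformer scale.
```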
I did not foresee the convergence of hardware and framework. Apple's MLX for Apple Silicon, NVIDIA's TensorRT-LLM, Intel's OpenVINO — hardware vendors are building their own inference stacks optimized for their chips. The framework layer is becoming thinner. You train in PyTorch, export to ONNX or a vendor format, and serve with a hardware-optimized engine. The training framework is an authoring tool, not a runtime.
The Current State (Late 2024)
If someone asked me today what to use, the answer depends on what they are doing:
Fine-tuning or training custom models: PyTorch + HuggingFace. This is not even a discussion anymore. The ecosystem, community, model availability, and tooling are overwhelmingly concentrated here. Use Lightning or the HuggingFace Trainer for structure.
Serving LLMs in production: vLLM or TGI for GPU inference. llama.cpp or Ollama for local/edge inference. These are specialized tools that handle batching, quantization, KV caching, and GPU memory management in ways that a raw PyTorch serving loop cannot match. TorchServe works for non-LLM models.
Training at Google scale: JAX. If you have access to TPU pods and are training billion-parameter models from scratch, JAX's functional model and TPU integration are superior. For everyone else, this is academic.
On-device / edge: Depends on hardware. TF Lite (Android), CoreML/MLX (Apple), ONNX Runtime (cross-platform). PyTorch Mobile exists but is not the leader here.
Quick prototyping: Keras 3 is actually excellent for this now. It runs on PyTorch, TF, or JAX backends, and the API is clean. For rapid experimentation where you do not care about the backend, it is a good choice.
Where This Is Going
Looking forward, three trends seem clear to me:
The framework becomes invisible. Most ML practitioners in 2025 will interact with models through high-level APIs — HuggingFace, LangChain, inference engines, cloud APIs. The training framework will matter less and less as pre-trained models dominate. Writing raw PyTorch training loops will become a specialist skill, like writing raw SQL is for most application developers — important to understand, rarely done directly.
Inference optimization is the new battleground. Training a frontier model costs hundreds of millions of dollars and is done by a handful of labs. Serving that model efficiently to millions of users is the problem that most engineers face. Quantization (GPTQ, AWQ, GGUF), speculative decoding, continuous batching, paged attention — these inference optimizations are where the most impactful work is happening. The tools are new, fast-moving, and often independent of the training framework.
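To see why quantization is such a lever, here is a toy symmetric int8 round-trip in NumPy — the same basic idea GPTQ, AWQ, and GGUF build on, stripped of the per-group scaling and calibration that make them work at scale:

```python
import numpy as np

def quantize_int8(w):
    # Symmetric quantization: map [-max|w|, +max|w|] onto the int8 range [-127, 127].
    scale = np.abs(w).max() / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
print(np.abs(w - w_hat).max())  # small; each weight now takes 1 byte instead of 4
```

A 4x memory reduction per weight is the difference between a model fitting on one GPU or needing four — which is why so much inference tooling starts here.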
Compilation will unify the fragmentation. PyTorch 2.0's torch.compile (backed by TorchDynamo and TorchInductor) is an attempt to get the best of both worlds — eager execution for development, compiled execution for performance. If this matures, it could make the eager-vs-graph debate obsolete. Write eager PyTorch, compile it for production, serve it on any backend. We are not there yet, but the direction is clear.
Related
- Switching from TensorFlow to PyTorch — A Practical Assessment
- Attention Is All You Need — A Practitioner's Guide to the Transformer
- RAG in Production: What Breaks When You Move Past the Tutorial
Four years ago I switched frameworks during a pandemic, stuck in Pune, working remotely with a team in Singapore. It was the right decision for reasons I anticipated (HuggingFace, debugging, community) and reasons I did not (TensorFlow's decline, JAX's rise, the inference engine revolution).
The meta-lesson: framework decisions matter less than they feel like they matter. The choice between TF and PyTorch in 2020 felt consequential. In hindsight, what mattered was that we were building NLP systems, understanding the data, and shipping to production. The framework was the medium, not the message. That is even more true in 2024, when the framework is increasingly just an authoring tool for models that will be served by something else entirely.
Pick the framework with the best ecosystem for your work today. Do not overthink it. And do not rewrite working code.