
Switching from TensorFlow to PyTorch — A Practical Assessment

It is June 2020 and I am writing this from Pune, India — where I have been stuck since March. I came for a family visit and then the borders closed. Travel restrictions between India and Singapore mean I cannot get back, so I have been working remotely with the Halialabs team for three months now. The lockdown here has been intense — curfews, everything shut, uncertainty about when flights will resume.

The pandemic has forced a strange kind of focus. With client meetings cancelled and travel impossible, we have had time to address technical debt that kept getting deferred. One decision we finally made: switching our primary deep learning framework from TensorFlow to PyTorch.

This is not a hot take or a framework war post. It is a practical accounting of why we switched, what the trade-offs are, and what the framework landscape actually looks like in mid-2020 for a small team building production NLP.

Where We Started

Our codebase was mostly TensorFlow 1.x. We had built our HS code classification system, several NER pipelines, and a text classification service on TF 1.x with Keras as a frontend. The code worked. It was in production. Some of it had been running for over a year without issues.

But TensorFlow 1.x had problems that compounded over time:

Session-based execution was painful to debug. In TF 1.x, you build a computation graph, then execute it in a session. This means you cannot just print a tensor's value — you have to run it in a session, feed in the inputs, and inspect the output. Debugging a shape mismatch or a NaN in the loss meant scattering tf.print statements through the graph and re-running. Compared to writing regular Python, it felt like programming through a wall.

The API surface was enormous and unstable. TF 1.x had tf.layers, tf.keras.layers, tf.contrib.layers, and raw tf.nn. All did similar things. Different tutorials used different APIs. Every major release broke something. We had code using tf.contrib functions that were deprecated, moved, or removed across versions. Upgrading TensorFlow was a project in itself.

Keras integration was half-finished. Keras was supposed to be the simple API on top of TF. In practice, mixing Keras layers with raw TF operations created subtle bugs. The tf.keras and standalone keras packages diverged. Model serialization was inconsistent — some models saved with model.save(), others needed tf.saved_model, others needed tf.train.Saver. We had three different serialization formats in production.

Why PyTorch

PyTorch had been gaining traction since 2017, and by 2020 it had crossed a threshold for us. Three things tipped the decision:

TENSORFLOW 1.x vs PYTORCH 1.5 — OUR DECISION FACTORS

TensorFlow 1.x:
- Graph-then-execute
- 3+ API layers for the same op
- Session-based debugging
- tf.contrib deprecation churn
- 3 serialization formats
- TF Serving (production-grade)
- Mobile/edge via TF Lite
- TensorBoard

PyTorch 1.5:
- Eager by default
- One way to do things
- Normal Python debugging
- Stable API since 1.0
- torch.save / TorchScript
- HuggingFace ecosystem
- Serving less mature
- Mobile support early

TF wins on deployment infrastructure. PyTorch wins on everything a 4-person team touches daily.

Eager execution changes everything. PyTorch is eager by default — you write y = model(x) and it executes immediately. You can print intermediate values, set breakpoints, use pdb, inspect tensor shapes — all the normal Python debugging tools work. TensorFlow 2.0 added eager execution too, but it was bolted onto an existing graph-based system. In PyTorch, eager is the foundation. The difference in development speed is real — I estimate we debug models 2-3x faster in PyTorch than we did in TF 1.x.
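To make the contrast concrete, here is the kind of inspection that eager execution allows. This is a minimal sketch with a toy model, not our production code; the layer sizes are illustrative.

```python
import torch
import torch.nn as nn

# A toy classifier -- purely illustrative, not a real production model.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 4),
)

x = torch.randn(8, 16)                 # a batch of 8 examples
hidden = model[0](x)                   # run just the first layer
print(hidden.shape)                    # inspect shapes immediately
assert not torch.isnan(hidden).any()   # catch NaNs at the layer where they appear

y = model(x)                           # full forward pass, executed eagerly
print(y.shape)
```

Every line executes immediately, so a shape mismatch fails right where it happens — no session, no feed dict, no graph rebuild.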

HuggingFace Transformers is PyTorch-first. This was the most practical factor. HuggingFace's transformers library — which provides pre-trained BERT, GPT-2, RoBERTa, DistilBERT, and dozens of other models — is primarily a PyTorch library. TensorFlow support exists but lags behind. For an NLP team in 2020, not using HuggingFace is like not using scikit-learn in 2015 — you are reinventing work that has already been done. And HuggingFace means PyTorch.

Research papers ship PyTorch code. In 2018, most papers released TensorFlow code. By 2020, the majority release PyTorch code — or PyTorch only. If we want to reproduce results, try new architectures, or adapt published models, PyTorch is where the code is. This is a network effect that has tipped decisively.

What We Gave Up

The switch was not free. TensorFlow had real advantages we had to replace or accept losing:

TF Serving is excellent. Deploying a TensorFlow model behind a gRPC or REST endpoint with TF Serving is battle-tested, performant, and handles batching, versioning, and GPU scheduling out of the box. PyTorch's serving story in mid-2020 is fragmented — TorchServe just launched (experimental), and most teams roll their own Flask/FastAPI wrapper. We moved to a custom FastAPI service with TorchScript for serialization. It works, but it is more code to maintain.

TF Lite and mobile deployment. We had one project exploring on-device inference. TF Lite is mature for this. PyTorch Mobile exists but is early. For this specific use case, we kept TensorFlow.

Existing production models. We did not rewrite our running production models. The NER pipeline and classification service stayed on TF 1.x. The switch applies to new projects going forward. Rewriting working production code for a framework change is almost never worth it.

What About TensorFlow 2.0?

The obvious question: TF 2.0 added eager execution, cleaned up the API, and made Keras the official high-level interface. Why not just upgrade to TF 2.0 instead of switching frameworks?

We considered it. Three things pushed us away:

Migration effort is similar. TF 1.x to TF 2.0 is not a simple upgrade. Session-based code has to be rewritten. tf.contrib is gone. The migration script (tf_upgrade_v2) handles syntax changes but not architectural changes. If we were going to rewrite, we might as well rewrite in the framework with better long-term momentum.

The community has moved. NLP research in 2020 is happening in PyTorch. When a new model is published — BART, ELECTRA, T5 — the reference implementation is PyTorch. The HuggingFace ecosystem, which has become the de facto standard for NLP, is PyTorch-native. TF 2.0 is catching up, but "catching up" is not where you want your core framework to be.

Google's own mixed signals. Google uses TensorFlow internally, but also uses JAX for new research (the T5 paper used JAX). TF 2.0, Keras, JAX, Trax — the number of Google-backed ML frameworks is confusing. PyTorch has one framework, one way of doing things, and a clear roadmap. For a small team, simplicity wins.

The Pandemic Factor

I would not have written this post without COVID. Not because the pandemic changed our technical reasoning — the framework decision was overdue regardless. But because the pandemic created the conditions for making it.

Pre-COVID, we were busy with client delivery, travel, on-site work. Technical debt decisions kept getting deferred — "we'll switch after this project," "we'll refactor after this deployment." The pandemic removed the excuses. With client timelines extended, travel impossible, and me working from a different timezone than the rest of the team, we finally had space to make infrastructure decisions that would pay off over months rather than days. The timezone gap between India and Singapore is small enough to make remote collaboration workable, and the forced async style actually helped — more written communication, fewer interruptions.

The pandemic has been devastating — the human cost in India and globally is real, and I do not want to minimize it. Being separated from my normal life in Singapore, watching the situation in India worsen, seeing friends lose jobs and businesses we worked with close — it has been hard. The uncertainty about when I can return, when things normalize, is constant.

But for the specific, narrow question of software decisions: the forced slowdown gave us time to think about foundations rather than features. I suspect many engineering teams are having similar moments — using the disruption to address choices they should have made a year ago.

Practical Notes on the Migration

For anyone considering a similar switch, a few things we learned:

Start with the training loop, not the model. The biggest difference between TF and PyTorch is not the layer APIs (they are similar) — it is the training loop. TF/Keras hides the training loop behind model.fit(). PyTorch makes you write it explicitly: forward pass, loss computation, loss.backward(), optimizer.step(). This is more code, but it is also more control. Start by understanding the PyTorch training loop, and the rest follows.
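The explicit loop looks like this — a minimal sketch on synthetic data, where the structure (zero grad, forward, loss, backward, step) is the point and the toy linear model is a stand-in.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(10, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Synthetic data, purely for illustration.
X = torch.randn(64, 10)
y = torch.randint(0, 2, (64,))

losses = []
for epoch in range(20):
    optimizer.zero_grad()       # clear gradients from the previous step
    logits = model(X)           # forward pass
    loss = loss_fn(logits, y)   # loss computation
    loss.backward()             # backpropagate
    optimizer.step()            # update weights
    losses.append(loss.item())

print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Everything model.fit() hides — gradient accumulation, custom loss terms, per-step logging — is just a line you add inside this loop.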

Data loading is different and better. PyTorch's DataLoader with Dataset classes and multiprocessing workers is cleaner than tf.data. It is regular Python — no special API, no graph-mode data pipeline. You can debug your data pipeline with print statements. This sounds trivial; it is not.
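A minimal Dataset/DataLoader sketch, with in-memory tensors standing in for the tokenized documents a real __getitem__ would produce:

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyTextDataset(Dataset):
    """Illustrative dataset; a real one would read and tokenize documents."""

    def __init__(self, n_examples: int, n_features: int):
        self.X = torch.randn(n_examples, n_features)
        self.y = torch.randint(0, 2, (n_examples,))

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        # Plain Python: you can drop a print() or breakpoint() right here.
        return self.X[idx], self.y[idx]

loader = DataLoader(ToyTextDataset(100, 16), batch_size=32, shuffle=True)
xb, yb = next(iter(loader))
print(xb.shape, yb.shape)
```

Batching, shuffling, and (via num_workers) multiprocessing come from DataLoader; the Dataset itself stays ordinary, debuggable Python.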

Model serialization: use TorchScript for production. torch.save pickles the model and works for checkpoints. For production serving, TorchScript (torch.jit.trace or torch.jit.script) produces a serialized model that can run without Python. Not every model traces cleanly — dynamic control flow requires torch.jit.script, which has its own quirks. But for standard architectures (BERT, LSTM, CNN), tracing works.
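The tracing path, sketched with a toy feed-forward model (no data-dependent control flow, so torch.jit.trace is sufficient; the filename is illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 3))
model.eval()

example = torch.randn(1, 8)
traced = torch.jit.trace(model, example)  # records the ops run for this input
traced.save("classifier.pt")              # serialized, self-contained artifact

# The saved model loads without the original Python class definition.
loaded = torch.jit.load("classifier.pt")
with torch.no_grad():
    assert torch.allclose(model(example), loaded(example))
```

If the model branches on tensor values, tracing silently bakes in one branch; that is when you reach for torch.jit.script instead.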

Use pytorch-lightning to reduce boilerplate. We discovered pytorch-lightning partway through the migration. It provides the structure of Keras (organized training loop, callbacks, logging) without hiding the PyTorch underneath. For a team that was used to Keras's model.fit(), Lightning was a good bridge.

OUR NEW STACK (MID-2020): HuggingFace Transformers, PyTorch 1.5, PyTorch Lightning, TorchScript, FastAPI, Docker, on Ubuntu 18.04 / CUDA 10.2 / AWS EC2.

The new stack: HuggingFace + PyTorch + Lightning for training, TorchScript + FastAPI for serving.

Three Months In

We are about three months into active development on PyTorch (we made the decision in March, started the migration in April). Early observations:

The framework decision is less important than it feels. The real work — understanding the data, designing the pipeline, evaluating the model, getting it into production — is framework-agnostic. PyTorch makes the daily mechanics of that work faster and more pleasant, which compounds over weeks and months. But it is the work that matters, not the tool.

If you are on TF 1.x and considering a switch: the answer is probably yes, but do not rewrite working production code. Adopt PyTorch for new projects. Let old code age gracefully. And if you are starting fresh in 2020 — start with PyTorch. The ecosystem has tipped.