What Is TensorFlow?
TensorFlow is Google's open-source platform for machine learning and deep learning. Originally released in 2015 by the Google Brain team, it has evolved into one of the two dominant deep learning frameworks — alongside PyTorch. TensorFlow 2 (released in 2019) was a major rewrite that brought eager execution by default, tight Keras integration, and a dramatically simpler API.
At its core, TensorFlow provides three fundamental capabilities:
- Tensor computation — multi-dimensional arrays with GPU/TPU acceleration
- Automatic differentiation — tf.GradientTape records operations and computes gradients automatically
- Production deployment — a complete ecosystem for serving, mobile, web, and edge deployment
By default, TF 2 runs your code eagerly; when you need maximum performance, you can opt into graph compilation with the @tf.function decorator, giving you the best of both worlds.
TensorFlow 1 vs TensorFlow 2
If you've heard that TensorFlow is "hard to learn," that reputation comes from TF 1's graph-first design. TF 2 is a completely different experience:
| Feature | TensorFlow 1.x | TensorFlow 2.x |
|---|---|---|
| Execution | Graph mode (build then run) | Eager by default (immediate) |
| Sessions | tf.Session().run() required | No sessions needed |
| API | Low-level, verbose, multiple APIs | Clean Keras-based high-level API |
| Debugging | Difficult (deferred execution) | Standard Python debugger works |
| Variables | Global collections, tf.get_variable() | Simple tf.Variable() |
| Graph optimization | Automatic | Opt-in via @tf.function |
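The execution-model rows of the table are easiest to see side by side. The TF 1 lines below are shown only as comments, since the tf.Session API was removed and they no longer run under TF 2:

```python
import tensorflow as tf

# TF 1.x style (comments only — requires the removed tf.Session API):
#   a = tf.placeholder(tf.float32)
#   b = tf.placeholder(tf.float32)
#   c = a + b                        # builds a graph node, no value yet
#   with tf.Session() as sess:
#       print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))  # 5.0

# TF 2.x: the same computation executes immediately
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a + b  # c already holds a concrete value
print(c.numpy())  # 5.0
```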
The TensorFlow Ecosystem
TensorFlow's greatest strength is its comprehensive production ecosystem. No other framework offers this breadth of deployment options:
flowchart TD
A["TensorFlow Core
Tensors + GradientTape"] --> B["Keras
High-Level Model API"]
A --> C["tf.data
Data Pipelines"]
A --> D["TF Hub
Pretrained Models"]
A --> E["TFX
ML Pipelines (Prod)"]
A --> F["TF Lite
Mobile & Edge"]
A --> G["TF.js
Browser ML"]
A --> H["TF Serving
Model Serving"]
A --> I["TensorBoard
Visualization"]
style A fill:#132440,stroke:#3B9797,color:#ffffff
style B fill:#16476A,stroke:#3B9797,color:#ffffff
style C fill:#16476A,stroke:#3B9797,color:#ffffff
style D fill:#16476A,stroke:#3B9797,color:#ffffff
style E fill:#3B9797,stroke:#132440,color:#ffffff
style F fill:#3B9797,stroke:#132440,color:#ffffff
style G fill:#3B9797,stroke:#132440,color:#ffffff
style H fill:#3B9797,stroke:#132440,color:#ffffff
style I fill:#3B9797,stroke:#132440,color:#ffffff
When to Choose TensorFlow vs PyTorch
Both are excellent frameworks. Your choice depends on your goals:
| Consideration | Choose TensorFlow | Choose PyTorch |
|---|---|---|
| Production deployment | TF Serving, TFLite, TF.js — unmatched breadth | TorchServe, ONNX — growing rapidly |
| Mobile/edge/browser | TFLite + TF.js are mature and battle-tested | ExecuTorch is newer |
| Research | Used in some Google research | ~75% of papers at NeurIPS/ICML |
| Google Cloud / TPU | First-class TPU support | TPU support via PyTorch/XLA |
| Enterprise ML pipelines | TFX is production-grade | Requires custom tooling |
| Learning curve | Easy with Keras, deeper for low-level | Pythonic and intuitive |
| Community | Large, especially in industry | Large, especially in academia |
Installation & Setup
TensorFlow installs via pip like any other Python package. A single command gives you CPU support. For GPU acceleration, you'll need a compatible NVIDIA driver and CUDA; on Linux, the tensorflow[and-cuda] pip extra can install the CUDA libraries for you.
CPU-Only Installation
For learning and development, CPU is sufficient. Install TensorFlow with a single pip command:
# Install TensorFlow (one package covers CPU and GPU builds)
pip install tensorflow
# For a specific version
pip install tensorflow==2.16.1
GPU Installation
On Linux, installing the [and-cuda] extra pulls in the CUDA and cuDNN libraries via pip, so only a recent NVIDIA driver is required. Native Windows GPU builds are no longer published since TensorFlow 2.11; Windows users should run TensorFlow under WSL2:
# Linux: bundle CUDA/cuDNN libraries via pip
pip install "tensorflow[and-cuda]"
# Linux: if CUDA and cuDNN are already installed system-wide
pip install tensorflow
# Verify GPU visibility after install
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Verifying Your Installation
After installing TensorFlow, run this quick sanity check to confirm everything works. This snippet prints your TF version, checks for GPU availability, and performs a simple tensor operation to verify eager execution:
import tensorflow as tf
# Print version and build info
print("TensorFlow version:", tf.__version__)
print("Eager execution:", tf.executing_eagerly())
# Check for GPU
gpus = tf.config.list_physical_devices('GPU')
print(f"GPUs available: {len(gpus)}")
for gpu in gpus:
    print(f" - {gpu.name}: {gpu.device_type}")
# Quick tensor operation to verify everything works
x = tf.constant([1.0, 2.0, 3.0])
print("Tensor:", x)
print("Sum:", tf.reduce_sum(x).numpy())
# Expected: TensorFlow version: 2.x.x, Eager execution: True, Sum: 6.0
If you see Eager execution: True and a sum of 6.0, your installation is working correctly. The .numpy() call converts the TensorFlow tensor to a regular Python number — this only works in eager mode.
The universal convention is import tensorflow as tf. Every TensorFlow tutorial, paper, and codebase uses this alias. You'll also frequently see from tensorflow import keras for the high-level API.
Tensors: The Core Data Structure
A tensor is a multi-dimensional array — the fundamental data structure in TensorFlow. If you know NumPy, tensors will feel familiar. The key differences: TensorFlow tensors are immutable (you can't change values in-place), can run on GPUs and TPUs, and are tracked by the automatic differentiation engine.
Tensors are classified by their rank (number of dimensions):
| Rank | Name | Example | Shape |
|---|---|---|---|
| 0 | Scalar | A single number: 42 | () |
| 1 | Vector | A list: [1, 2, 3] | (3,) |
| 2 | Matrix | A 2D grid: [[1, 2], [3, 4]] | (2, 2) |
| 3+ | N-D Tensor | Images, video, batches | (batch, height, width, channels) |
Creating Tensors
TensorFlow provides several factory functions for creating tensors. The most common is tf.constant(), which creates an immutable tensor from a Python list or NumPy array. Here are the essential creation methods:
import tensorflow as tf
# Scalar (rank 0) — a single number
scalar = tf.constant(42)
print(f"Scalar: {scalar}, shape: {scalar.shape}, rank: {scalar.ndim}")
# Vector (rank 1) — a list of numbers
vector = tf.constant([1.0, 2.0, 3.0, 4.0])
print(f"Vector: {vector}, shape: {vector.shape}")
# Matrix (rank 2) — a 2D grid
matrix = tf.constant([[1, 2, 3],
[4, 5, 6]])
print(f"Matrix shape: {matrix.shape}") # (2, 3)
# 3D Tensor (rank 3) — e.g., a batch of sequences
tensor_3d = tf.constant([[[1, 2], [3, 4]],
[[5, 6], [7, 8]]])
print(f"3D Tensor shape: {tensor_3d.shape}") # (2, 2, 2)
# Zeros and ones
zeros = tf.zeros([3, 4]) # 3x4 matrix of zeros
ones = tf.ones([2, 3], dtype=tf.int32) # 2x3 matrix of ones
print(f"Zeros shape: {zeros.shape}, Ones dtype: {ones.dtype}")
# Random tensors
normal = tf.random.normal([3, 3], mean=0.0, stddev=1.0)
uniform = tf.random.uniform([2, 4], minval=0, maxval=10)
print(f"Normal:\n{normal.numpy()}")
print(f"Uniform:\n{uniform.numpy()}")
Notice how each tensor has a shape (dimensions), dtype (data type), and ndim (rank). The .numpy() method converts a tensor to a NumPy array for easy inspection.
Data Types & Casting
TensorFlow supports a wide range of data types. The default for floating-point numbers is tf.float32, which offers a good balance of precision and speed. You'll need to explicitly cast between types — TensorFlow won't do it automatically to prevent silent precision loss:
import tensorflow as tf
# Default float type is float32
a = tf.constant([1.0, 2.0, 3.0])
print(f"Default dtype: {a.dtype}") # tf.float32
# Explicit dtype specification
b = tf.constant([1, 2, 3], dtype=tf.float64)
c = tf.constant([True, False, True], dtype=tf.bool)
d = tf.constant(["hello", "tensorflow"], dtype=tf.string)
print(f"float64: {b.dtype}, bool: {c.dtype}, string: {d.dtype}")
# Casting between types
int_tensor = tf.constant([1, 2, 3])
float_tensor = tf.cast(int_tensor, dtype=tf.float32)
print(f"Cast int to float: {float_tensor}")
# Common dtypes: tf.float16, tf.float32, tf.float64,
# tf.int8, tf.int16, tf.int32, tf.int64,
# tf.bool, tf.string, tf.complex64
Unlike NumPy, TensorFlow will not implicitly combine int and float tensors. You'll get an error (a TypeError or InvalidArgumentError, depending on the operation and TF version). Always use tf.cast() to convert explicitly.
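The failure mode is easy to reproduce. The exact exception type has varied across TF versions, so this sketch catches both of the ones you are likely to see:

```python
import tensorflow as tf

ints = tf.constant([1, 2, 3])          # dtype: int32
floats = tf.constant([1.0, 2.0, 3.0])  # dtype: float32

# Mixing dtypes raises instead of silently promoting
try:
    ints + floats
except (TypeError, tf.errors.InvalidArgumentError):
    print("Mixed-dtype addition failed, as expected")

# An explicit cast fixes it
ok = tf.cast(ints, tf.float32) + floats
print(ok.numpy())  # [2. 4. 6.]
```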
Shapes & Ranks
Understanding tensor shapes is critical for building neural networks. Every layer expects inputs of a specific shape, and shape mismatches are the most common source of bugs. Here's how to inspect and manipulate shapes:
import tensorflow as tf
# Create a tensor and inspect its properties
t = tf.random.normal([2, 3, 4])
print(f"Shape: {t.shape}") # (2, 3, 4)
print(f"Rank (ndim): {t.ndim}") # 3
print(f"Dtype: {t.dtype}") # tf.float32
print(f"Total elements: {tf.size(t).numpy()}") # 24
# Shape as a tensor (useful inside tf.function)
print(f"tf.shape(): {tf.shape(t)}") # [2 3 4]
# Individual dimensions
print(f"Dim 0: {t.shape[0]}") # 2
print(f"Dim 1: {t.shape[1]}") # 3
print(f"Dim 2: {t.shape[2]}") # 4
Eager Execution
Eager execution means that TensorFlow operations execute immediately when called, returning concrete values instead of building a computational graph for later execution. This was a fundamental shift in TF 2 — making TensorFlow behave like regular Python code.
Eager Mode vs Graph Mode
In TF 1, you had to build a static graph first, then create a Session to run it. In TF 2, operations just work — like NumPy:
import tensorflow as tf
# TF 2: Eager execution (default) — results are immediate
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 6], [7, 8]])
# Operations execute immediately and return values
c = tf.add(a, b)
print(f"a + b = \n{c.numpy()}")
# [[6 8]
# [10 12]]
# You can use Python control flow naturally
x = tf.constant(10.0)
if x > 5:
    print("x is greater than 5 — this works in eager mode!")
# Verify eager mode is active
print(f"Eager execution enabled: {tf.executing_eagerly()}") # True
The beauty of eager execution is that you can use standard Python tools for debugging — print(), pdb, assert — and they all work as expected. When you need maximum performance, you can opt into graph compilation with @tf.function (covered later in this article).
Think of it this way: Eager mode is like a calculator — you type 2 + 3 and immediately see 5. Graph mode is like writing a recipe first (define all the steps), then cooking it all at once (execute the graph). TF 2 lets you work in "calculator mode" by default, and optionally compile into "recipe mode" for speed.
Tensor Operations
TensorFlow provides a rich library of operations for manipulating tensors. Most operations mirror NumPy's API, making the transition straightforward. Operations are executed on the device (CPU or GPU) where the tensor resides.
Arithmetic & Matrix Operations
Basic arithmetic uses Python operators or explicit TF functions. For matrix multiplication, use tf.matmul() or the @ operator. These are the building blocks of every neural network:
import tensorflow as tf
# Element-wise arithmetic
a = tf.constant([1.0, 2.0, 3.0, 4.0])
b = tf.constant([10.0, 20.0, 30.0, 40.0])
print("Add:", (a + b).numpy()) # [11. 22. 33. 44.]
print("Subtract:", (b - a).numpy()) # [ 9. 18. 27. 36.]
print("Multiply:", (a * b).numpy()) # [10. 40. 90. 160.]
print("Divide:", (b / a).numpy()) # [10. 10. 10. 10.]
print("Power:", (a ** 2).numpy()) # [1. 4. 9. 16.]
# Matrix multiplication
m1 = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
m2 = tf.constant([[5, 6], [7, 8]], dtype=tf.float32)
# Two equivalent ways to multiply matrices
result1 = tf.matmul(m1, m2)
result2 = m1 @ m2 # Python @ operator (preferred)
print(f"Matrix multiplication:\n{result1.numpy()}")
# [[19 22]
# [43 50]]
# Reduction operations
x = tf.constant([[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0]])
print("Sum (all):", tf.reduce_sum(x).numpy()) # 21.0
print("Mean (axis=0):", tf.reduce_mean(x, axis=0).numpy()) # [2.5 3.5 4.5]
print("Max (axis=1):", tf.reduce_max(x, axis=1).numpy()) # [3. 6.]
Reshaping, Slicing & Manipulation
Reshaping tensors is essential for connecting different layers in a neural network. The key functions are tf.reshape(), tf.expand_dims(), tf.squeeze(), and standard Python slicing. Remember: TensorFlow tensors are immutable, so these operations always return new tensors:
import tensorflow as tf
# Reshaping
t = tf.constant([[1, 2, 3, 4],
[5, 6, 7, 8]])
print(f"Original shape: {t.shape}") # (2, 4)
reshaped = tf.reshape(t, [4, 2])
print(f"Reshaped to (4, 2):\n{reshaped.numpy()}")
flattened = tf.reshape(t, [-1]) # -1 infers the dimension
print(f"Flattened: {flattened.numpy()}") # [1 2 3 4 5 6 7 8]
# Expand and squeeze dimensions
v = tf.constant([1.0, 2.0, 3.0]) # shape: (3,)
expanded = tf.expand_dims(v, axis=0) # shape: (1, 3)
expanded2 = tf.expand_dims(v, axis=1) # shape: (3, 1)
print(f"Expanded axis=0: {expanded.shape}") # (1, 3)
print(f"Expanded axis=1: {expanded2.shape}") # (3, 1)
squeezed = tf.squeeze(expanded) # removes size-1 dims
print(f"Squeezed back: {squeezed.shape}") # (3,)
# Slicing and indexing (same as NumPy)
m = tf.constant([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(f"Row 0: {m[0].numpy()}") # [1 2 3]
print(f"Element [1,2]: {m[1, 2].numpy()}") # 6
print(f"Column 1: {m[:, 1].numpy()}") # [2 5 8]
print(f"Sub-matrix: \n{m[0:2, 1:3].numpy()}") # [[2 3] [5 6]]
# Concatenation and stacking
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 6], [7, 8]])
concat = tf.concat([a, b], axis=0) # stack vertically
print(f"Concat axis=0:\n{concat.numpy()}") # [[1,2],[3,4],[5,6],[7,8]]
stacked = tf.stack([a, b], axis=0) # creates new dimension
print(f"Stack shape: {stacked.shape}") # (2, 2, 2)
Broadcasting
Broadcasting allows TensorFlow to automatically expand smaller tensors to match larger ones during arithmetic, without copying data. The rules are identical to NumPy: dimensions are compared from right to left, and a dimension of size 1 is stretched to match:
import tensorflow as tf
# Scalar broadcasts to all elements
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])
result = matrix + 10 # scalar 10 broadcasts to [[10,10],[10,10]]
print(f"Matrix + 10:\n{result.numpy()}")
# Vector broadcasts across rows
row_vector = tf.constant([100.0, 200.0]) # shape: (2,)
result = matrix + row_vector # broadcasts to each row
print(f"Matrix + row_vector:\n{result.numpy()}")
# [[101. 202.]
# [103. 204.]]
# Column vector broadcasts across columns
col_vector = tf.constant([[10.0], [20.0]]) # shape: (2, 1)
result = matrix + col_vector
print(f"Matrix + col_vector:\n{result.numpy()}")
# [[11. 12.]
# [23. 24.]]
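The right-to-left rule itself can be written down as a small pure-Python helper (no TensorFlow required), which is a handy way to predict a result shape before running an op. The helper name is our own, not a TF API:

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Compare dimensions from the rightmost one leftward
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a == b or a == 1 or b == 1:
            result.append(max(a, b))  # a size-1 dim stretches to match
        else:
            raise ValueError(f"Incompatible dims: {a} vs {b}")
    # The longer shape's extra leading dims carry over unchanged
    longer = shape_a if len(shape_a) > len(shape_b) else shape_b
    result.extend(reversed(longer[:abs(len(shape_a) - len(shape_b))]))
    return tuple(reversed(result))

print(broadcast_shape((2, 2), (2,)))       # (2, 2) — matrix + row vector
print(broadcast_shape((2, 2), (2, 1)))     # (2, 2) — matrix + column vector
print(broadcast_shape((3, 1, 4), (2, 1)))  # (3, 2, 4)
```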
NumPy Interoperability
TensorFlow and NumPy work together seamlessly. You can convert between the two freely, and TensorFlow operations will automatically accept NumPy arrays as inputs. This makes it easy to integrate TensorFlow into existing NumPy-based workflows:
import tensorflow as tf
import numpy as np
# NumPy array → TensorFlow tensor
np_array = np.array([1.0, 2.0, 3.0])
tf_tensor = tf.constant(np_array)
print(f"From NumPy: {tf_tensor}")
print(f"Dtype: {tf_tensor.dtype}") # tf.float64 (NumPy default)
# TensorFlow tensor → NumPy array
back_to_numpy = tf_tensor.numpy()
print(f"Back to NumPy: {back_to_numpy}")
print(f"Type: {type(back_to_numpy)}") # <class 'numpy.ndarray'>
# TF operations accept NumPy arrays directly
np_a = np.array([[1, 2], [3, 4]], dtype=np.float32)
np_b = np.array([[5, 6], [7, 8]], dtype=np.float32)
result = tf.matmul(np_a, np_b) # NumPy arrays accepted directly
print(f"TF matmul on NumPy arrays:\n{result.numpy()}")
# NumPy operations accept TF tensors (auto-converted)
tf_vals = tf.constant([1.0, 4.0, 9.0])
np_result = np.sqrt(tf_vals) # TF tensor auto-converts
print(f"NumPy sqrt on TF tensor: {np_result}")
Device Awareness
One important difference: when a tensor is on a GPU, calling .numpy() copies it back to CPU memory. For large tensors, this transfer can be slow. Be mindful of unnecessary conversions in training loops.
Variables (tf.Variable)
While tf.constant creates immutable tensors, tf.Variable creates mutable tensors — these are used to store model parameters (weights and biases) that change during training. Variables are automatically tracked by GradientTape for computing gradients:
import tensorflow as tf
# Create variables — mutable tensors for model parameters
weight = tf.Variable(tf.random.normal([3, 2]), name="weight")
bias = tf.Variable(tf.zeros([2]), name="bias")
print(f"Weight:\n{weight.numpy()}")
print(f"Bias: {bias.numpy()}")
print(f"Weight name: {weight.name}")
# Variables are mutable — use .assign() to change values
bias.assign([1.0, 2.0])
print(f"After assign: {bias.numpy()}") # [1. 2.]
# In-place addition and subtraction
bias.assign_add([0.1, 0.1])
print(f"After assign_add: {bias.numpy()}") # [1.1 2.1]
bias.assign_sub([0.05, 0.05])
print(f"After assign_sub: {bias.numpy()}") # [1.05 2.05]
# Key difference: tf.constant is immutable
const = tf.constant([1.0, 2.0])
# const.assign([3.0, 4.0]) # ERROR: AttributeError
print(f"Constants cannot be modified: {const.numpy()}")
When to Use Variables vs Constants
The rule is simple: use tf.Variable for anything that needs to change (model weights, optimizer state), and tf.constant for fixed data (input features, hyperparameters). GradientTape automatically watches variables but requires explicit tape.watch() for constants.
Automatic Differentiation (GradientTape)
tf.GradientTape is TensorFlow's engine for automatic differentiation — the algorithm that makes neural network training possible. It records all operations performed on watched tensors inside a context manager, then computes gradients of a target (loss) with respect to sources (parameters) via the chain rule.
flowchart LR
A["1. Enter tape context
with tf.GradientTape()"] --> B["2. Forward pass
Record operations"]
B --> C["3. Compute loss
Scalar output"]
C --> D["4. tape.gradient()
Backpropagation"]
D --> E["5. Update weights
w -= lr × grad"]
style A fill:#132440,stroke:#3B9797,color:#ffffff
style B fill:#16476A,stroke:#3B9797,color:#ffffff
style C fill:#3B9797,stroke:#132440,color:#ffffff
style D fill:#BF092F,stroke:#132440,color:#ffffff
style E fill:#132440,stroke:#3B9797,color:#ffffff
GradientTape Basics
The simplest use: compute the derivative of a function. GradientTape automatically watches tf.Variable objects. To watch a tf.constant, you must call tape.watch() explicitly:
import tensorflow as tf
# Example 1: Gradient of y = x^2 at x = 3.0
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2  # y = x²
# dy/dx = 2x = 2 * 3 = 6
grad = tape.gradient(y, x)
print(f"y = x², dy/dx at x=3: {grad.numpy()}") # 6.0
# Example 2: Watching a constant (must be explicit)
x_const = tf.constant(4.0)
with tf.GradientTape() as tape:
    tape.watch(x_const)  # Required for constants
    y = x_const ** 3  # y = x³
# dy/dx = 3x² = 3 * 16 = 48
grad = tape.gradient(y, x_const)
print(f"y = x³, dy/dx at x=4: {grad.numpy()}") # 48.0
The mathematical gradient for $y = x^2$ is $\frac{dy}{dx} = 2x$. At $x = 3$, this gives $2 \times 3 = 6$. TensorFlow computes this automatically — no manual calculus needed.
Persistent Tapes
By default, a GradientTape releases its resources after a single gradient() call. If you need to compute multiple gradients from the same recorded operations (e.g., gradients for both weights and biases), use persistent=True:
import tensorflow as tf
x = tf.Variable(2.0)
# Persistent tape allows multiple gradient calls
with tf.GradientTape(persistent=True) as tape:
    y = x ** 2  # y = x²
    z = x ** 3  # z = x³
# Compute gradients for both y and z
dy_dx = tape.gradient(y, x)
dz_dx = tape.gradient(z, x)
print(f"dy/dx = 2x at x=2: {dy_dx.numpy()}") # 4.0
print(f"dz/dx = 3x² at x=2: {dz_dx.numpy()}") # 12.0
# IMPORTANT: delete persistent tape to free resources
del tape
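Persistence isn't required just to get gradients for several variables: a single gradient() call accepts a list of sources and returns one gradient per source. This is the pattern real training loops use:

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(1.0)
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    y = w * x + b   # y = 2*3 + 1 = 7
    loss = y ** 2   # loss = 49

# One call, one gradient per source — no persistent tape needed
grads = tape.gradient(loss, [w, b])
# dL/dw = 2y * x = 2*7*3 = 42
# dL/db = 2y     = 14
print(f"dL/dw: {grads[0].numpy()}, dL/db: {grads[1].numpy()}")
```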
Higher-Order Gradients
You can nest GradientTape contexts to compute higher-order derivatives — second derivatives, third derivatives, and beyond. This is useful for advanced optimization methods and physics-informed neural networks:
import tensorflow as tf
x = tf.Variable(3.0)
# Nested tapes for second derivative
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x ** 3  # y = x³
    # First derivative: dy/dx = 3x²
    dy_dx = inner_tape.gradient(y, x)
# Second derivative: d²y/dx² = 6x
d2y_dx2 = outer_tape.gradient(dy_dx, x)
print(f"y = x³ at x=3")
print(f" dy/dx = 3x² = {dy_dx.numpy()}") # 27.0
print(f" d²y/dx² = 6x = {d2y_dx2.numpy()}") # 18.0
Practical Example: Linear Regression from Scratch
Let's put everything together and build a simple linear regression model using raw TensorFlow operations — no Keras, no pre-built layers. We want to learn the parameters $w$ and $b$ in the equation $\hat{y} = wx + b$ by minimizing the mean squared error loss:
$$L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
The gradients we need are:
$$\frac{\partial L}{\partial w} = \frac{1}{n}\sum_{i=1}^{n} 2(wx_i + b - y_i) \cdot x_i \qquad \frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n} 2(wx_i + b - y_i)$$
import tensorflow as tf
import numpy as np
# Generate synthetic data: y = 3x + 1 + noise
np.random.seed(42)
X_data = np.random.randn(100).astype(np.float32)
Y_data = 3.0 * X_data + 1.0 + np.random.randn(100).astype(np.float32) * 0.3
# Convert to tensors
X = tf.constant(X_data)
Y = tf.constant(Y_data)
# Learnable parameters (start with random values)
w = tf.Variable(0.0, name="weight")
b = tf.Variable(0.0, name="bias")
# Hyperparameters
learning_rate = 0.1
epochs = 100
# Training loop
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        # Forward pass: y_hat = wx + b
        Y_pred = w * X + b
        # Compute MSE loss
        loss = tf.reduce_mean((Y - Y_pred) ** 2)
    # Compute gradients
    gradients = tape.gradient(loss, [w, b])
    # Manual gradient descent update
    w.assign_sub(learning_rate * gradients[0])
    b.assign_sub(learning_rate * gradients[1])
    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1:3d} | Loss: {loss.numpy():.4f} | w: {w.numpy():.4f} | b: {b.numpy():.4f}")
print(f"\nLearned: y = {w.numpy():.2f}x + {b.numpy():.2f}")
print(f"True: y = 3.00x + 1.00")
This is the essence of deep learning training: compute a forward pass, calculate the loss, use GradientTape to find gradients, and update the parameters. In Part 2, Keras will handle all of this for you with a single model.fit() call — but understanding the mechanics here is crucial.
What just happened? We started with random parameters ($w = 0$, $b = 0$) and iteratively adjusted them to minimize the loss. After 100 epochs, $w$ converges to ≈3.0 and $b$ to ≈1.0 — exactly the true values from our synthetic data. This is gradient descent in action: follow the slope of the loss surface downhill until you reach the minimum.
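Because the gradients were derived analytically above, you can sanity-check the whole loop without TensorFlow, in plain NumPy. This is a sketch mirroring the TF version, with the same synthetic data and hyperparameters:

```python
import numpy as np

# Same synthetic data as the TF example: y = 3x + 1 + noise
np.random.seed(42)
X = np.random.randn(100).astype(np.float32)
Y = 3.0 * X + 1.0 + np.random.randn(100).astype(np.float32) * 0.3

w, b, lr = 0.0, 0.0, 0.1
for _ in range(100):
    err = w * X + b - Y            # (y_hat - y)
    grad_w = np.mean(2 * err * X)  # dL/dw, from the analytic formula
    grad_b = np.mean(2 * err)      # dL/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"NumPy version learned: y = {w:.2f}x + {b:.2f}")  # close to 3.00x + 1.00
```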
GPU Acceleration
TensorFlow automatically places tensors and operations on available GPUs. A single training step that takes 10 seconds on CPU might complete in 100 milliseconds on a modern GPU — a 100× speedup. Here's how to detect, configure, and use GPU devices:
Device Detection
Always check what hardware TensorFlow can see. This is the first thing you should run in any notebook or training script:
import tensorflow as tf
# List all available devices
print("All physical devices:")
for device in tf.config.list_physical_devices():
    print(f" {device.device_type}: {device.name}")
# Specifically check for GPUs
gpus = tf.config.list_physical_devices('GPU')
print(f"\nGPUs found: {len(gpus)}")
# Check where a tensor is placed
x = tf.constant([1.0, 2.0, 3.0])
print(f"Tensor device: {x.device}")
# Check CUDA/cuDNN versions (if GPU available)
if gpus:
    print(f"Built with CUDA: {tf.test.is_built_with_cuda()}")
    # tf.test.is_gpu_available() is deprecated; use the device list instead
    print(f"GPU available: {bool(tf.config.list_physical_devices('GPU'))}")
Device Placement
TensorFlow places operations on the GPU automatically when available, but you can explicitly control placement with tf.device(). This is useful when you want to keep certain operations on the CPU (e.g., data preprocessing):
import tensorflow as tf
# Explicit device placement
with tf.device('/CPU:0'):
    cpu_tensor = tf.random.normal([1000, 1000])
print(f"CPU tensor device: {cpu_tensor.device}")
# If GPU is available, place on GPU
if tf.config.list_physical_devices('GPU'):
    with tf.device('/GPU:0'):
        gpu_tensor = tf.random.normal([1000, 1000])
    print(f"GPU tensor device: {gpu_tensor.device}")
    # Operations between devices cause automatic transfer
    # TF handles this, but it adds overhead
    result = gpu_tensor + 1
    print(f"Result device: {result.device}")
else:
    print("No GPU available — running on CPU")
    # All code still works, just slower
Memory Configuration
By default, TensorFlow allocates all available GPU memory upfront — even if your model only needs a fraction. This prevents other programs from using the GPU. Enable memory growth to allocate memory incrementally:
import tensorflow as tf
# IMPORTANT: Configure memory growth BEFORE any TF operations
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Option 1: Enable memory growth (allocate as needed)
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("Memory growth enabled for all GPUs")
        # Option 2: Set a hard memory limit (e.g., 4 GB)
        # tf.config.set_logical_device_configuration(
        #     gpus[0],
        #     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]
        # )
    except RuntimeError as e:
        # Memory growth must be set before GPUs are initialized
        print(f"Error: {e}")
else:
    print("No GPUs — memory configuration not needed")
Best practice: call tf.config.experimental.set_memory_growth(gpu, True) at the very start of your script — before creating any tensors or operations. This prevents TensorFlow from hogging all GPU memory and allows you to run multiple notebooks or experiments simultaneously.
tf.function & Graph Compilation
While eager execution is great for development and debugging, graph execution is faster for production. The @tf.function decorator compiles a Python function into an optimized TensorFlow graph — giving you the performance of TF 1's graph mode with the simplicity of TF 2's eager API:
import tensorflow as tf
import time
# Regular Python function (runs eagerly)
def eager_square(x):
    return x ** 2

# Compiled graph function (traced and optimized)
@tf.function
def graph_square(x):
    return x ** 2
# Both produce the same result
x = tf.constant(4.0)
print(f"Eager: {eager_square(x).numpy()}") # 16.0
print(f"Graph: {graph_square(x).numpy()}") # 16.0
# Performance comparison on a larger operation
@tf.function
def graph_matmul(a, b):
    return tf.matmul(a, b)
a = tf.random.normal([500, 500])
b = tf.random.normal([500, 500])
# Warm-up (first call triggers tracing)
_ = graph_matmul(a, b)
# Time eager vs graph
start = time.time()
for _ in range(1000):
    tf.matmul(a, b)
eager_time = time.time() - start

start = time.time()
for _ in range(1000):
    graph_matmul(a, b)
graph_time = time.time() - start
print(f"\nEager: {eager_time:.3f}s")
print(f"Graph: {graph_time:.3f}s")
print(f"Speedup: {eager_time/graph_time:.1f}x")
Tracing & Retracing
When you call a @tf.function for the first time, TensorFlow traces it — running the Python code once to build a graph, then using that graph for subsequent calls. Be aware that TF will retrace if you call the function with different input shapes or types:
import tensorflow as tf
@tf.function
def my_func(x):
    print("Tracing!")  # Only prints during tracing, not execution
    return x + 1
# First call with float32 — triggers tracing
result1 = my_func(tf.constant(1.0))
print(f"Result 1: {result1.numpy()}") # Prints "Tracing!" then "Result 1: 2.0"
# Second call with same type — reuses graph (no retracing)
result2 = my_func(tf.constant(2.0))
print(f"Result 2: {result2.numpy()}") # Only prints "Result 2: 3.0"
# Call with different type (int32) — triggers NEW trace
result3 = my_func(tf.constant(3))
print(f"Result 3: {result3.numpy()}") # Prints "Tracing!" then "Result 3: 4"
# Use input_signature to prevent retracing
@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def fixed_func(x):
    print("Tracing fixed_func!")
    return x * 2
print(fixed_func(tf.constant(5.0)).numpy()) # 10.0 — traces once
print(fixed_func(tf.constant(10.0)).numpy()) # 20.0 — reuses graph
When to Use @tf.function
Use @tf.function on your training step function and any compute-intensive function that gets called repeatedly. Don't use it everywhere — the tracing overhead can actually slow down simple operations:
- Use @tf.function for: training steps, inference functions, any function called 100+ times with the same input shapes
- Don't use @tf.function for: one-time setup code, data preprocessing, debugging (use eager mode instead)
- Avoid Python side effects inside @tf.function: print(), list mutations, and global variable changes only happen during tracing, not during execution
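Putting that advice together, here is how the linear-regression training step from earlier looks when wrapped for graph compilation. This is a sketch; the variable names follow the earlier example:

```python
import tensorflow as tf

w = tf.Variable(0.0)
b = tf.Variable(0.0)

@tf.function  # traced once per input signature, reused on every call
def train_step(x, y, lr):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((y - (w * x + b)) ** 2)
    grad_w, grad_b = tape.gradient(loss, [w, b])
    w.assign_sub(lr * grad_w)  # in-place variable updates work inside graphs
    b.assign_sub(lr * grad_b)
    return loss

x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([4.0, 7.0, 10.0])  # y = 3x + 1 exactly
for _ in range(200):
    loss = train_step(x, y, tf.constant(0.1))
print(f"w = {w.numpy():.2f}, b = {b.numpy():.2f}")  # approaches 3.00 and 1.00
```

Note that the variables are created outside the decorated function: creating new tf.Variable objects inside a @tf.function is an error after the first trace.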
Conclusion & Next Steps
You now have a solid foundation in TensorFlow's core primitives. Let's recap what we covered:
- Tensors are the fundamental data structure — immutable, GPU-ready multi-dimensional arrays
- Eager execution lets you write and debug TensorFlow code like regular Python
- Tensor operations include arithmetic, matrix multiplication, reshaping, slicing, and broadcasting
- NumPy interoperability makes it easy to integrate TF into existing workflows
- tf.Variable stores mutable state for model parameters
- GradientTape computes gradients automatically — the engine behind all neural network training
- GPU acceleration and memory configuration for faster computation
- @tf.function compiles Python functions into optimized graphs for production speed
Next in the Series
In Part 2: Building Models with Keras, we'll use TensorFlow's high-level Keras API to build neural networks with just a few lines of code — Sequential models, the Functional API, custom layers, and model subclassing.