What Is TensorFlow?
TensorFlow is Google's open-source platform for machine learning and deep learning. Originally released in 2015 by the Google Brain team, it has evolved into one of the two dominant deep learning frameworks — alongside PyTorch. TensorFlow 2 (released in 2019) was a major rewrite that brought eager execution by default, tight Keras integration, and a dramatically simpler API.
At its core, TensorFlow provides three fundamental capabilities:
- Tensor computation — multi-dimensional arrays with GPU/TPU acceleration
- Automatic differentiation — tf.GradientTape records operations and computes gradients automatically
- Production deployment — a complete ecosystem for serving, mobile, web, and edge deployment
By default, TF 2 runs your code eagerly; when you need maximum performance, you can opt into graph compilation with the @tf.function decorator, giving you the best of both worlds.
TensorFlow 1 vs TensorFlow 2
If you've heard that TensorFlow is "hard to learn," that reputation comes from TF 1's graph-first design. TF 2 is a completely different experience:
| Feature | TensorFlow 1.x | TensorFlow 2.x |
|---|---|---|
| Execution | Graph mode (build then run) | Eager by default (immediate) |
| Sessions | tf.Session().run() required | No sessions needed |
| API | Low-level, verbose, multiple APIs | Clean Keras-based high-level API |
| Debugging | Difficult (deferred execution) | Standard Python debugger works |
| Variables | Global collections, tf.get_variable() | Simple tf.Variable() |
| Graph optimization | Automatic | Opt-in via @tf.function |
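The execution-model rows of the table are easiest to see side by side. The TF 1 lines below are shown only as comments, since the tf.Session API was removed and they no longer run under TF 2:

```python
import tensorflow as tf

# TF 1.x style (comments only — requires the removed tf.Session API):
#   a = tf.placeholder(tf.float32)
#   b = tf.placeholder(tf.float32)
#   c = a + b                        # builds a graph node, no value yet
#   with tf.Session() as sess:
#       print(sess.run(c, feed_dict={a: 2.0, b: 3.0}))  # 5.0

# TF 2.x: the same computation executes immediately
a = tf.constant(2.0)
b = tf.constant(3.0)
c = a + b  # c already holds a concrete value
print(c.numpy())  # 5.0
```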
The TensorFlow Ecosystem
TensorFlow's greatest strength is its comprehensive production ecosystem. No other framework offers this breadth of deployment options:
flowchart TD
A["TensorFlow Core
Tensors + GradientTape"] --> B["Keras
High-Level Model API"]
A --> C["tf.data
Data Pipelines"]
A --> D["TF Hub
Pretrained Models"]
A --> E["TFX
ML Pipelines (Prod)"]
A --> F["TF Lite
Mobile & Edge"]
A --> G["TF.js
Browser ML"]
A --> H["TF Serving
Model Serving"]
A --> I["TensorBoard
Visualization"]
style A fill:#132440,stroke:#3B9797,color:#ffffff
style B fill:#16476A,stroke:#3B9797,color:#ffffff
style C fill:#16476A,stroke:#3B9797,color:#ffffff
style D fill:#16476A,stroke:#3B9797,color:#ffffff
style E fill:#3B9797,stroke:#132440,color:#ffffff
style F fill:#3B9797,stroke:#132440,color:#ffffff
style G fill:#3B9797,stroke:#132440,color:#ffffff
style H fill:#3B9797,stroke:#132440,color:#ffffff
style I fill:#3B9797,stroke:#132440,color:#ffffff
When to Choose TensorFlow vs PyTorch
Both are excellent frameworks. Your choice depends on your goals:
| Consideration | Choose TensorFlow | Choose PyTorch |
|---|---|---|
| Production deployment | TF Serving, TFLite, TF.js — unmatched breadth | TorchServe, ONNX — growing rapidly |
| Mobile/edge/browser | TFLite + TF.js are mature and battle-tested | ExecuTorch is newer |
| Research | Used in some Google research | ~75% of papers at NeurIPS/ICML |
| Google Cloud / TPU | First-class TPU support | TPU support via PyTorch/XLA |
| Enterprise ML pipelines | TFX is production-grade | Requires custom tooling |
| Learning curve | Easy with Keras, deeper for low-level | Pythonic and intuitive |
| Community | Large, especially in industry | Large, especially in academia |
Installation & Setup
TensorFlow installs via pip like any other Python package. A single command gives you CPU support. For GPU acceleration, you'll need a compatible NVIDIA driver and CUDA; on Linux, the tensorflow[and-cuda] pip extra can install the CUDA libraries for you.
CPU-Only Installation
For learning and development, CPU is sufficient. Install TensorFlow with a single pip command:
# Install TensorFlow (one package covers CPU and GPU builds)
pip install tensorflow
# For a specific version
pip install tensorflow==2.16.1
GPU Installation
On Linux, installing the [and-cuda] extra pulls in the CUDA and cuDNN libraries via pip, so only a recent NVIDIA driver is required. Native Windows GPU builds are no longer published since TensorFlow 2.11; Windows users should run TensorFlow under WSL2:
# Linux: bundle CUDA/cuDNN libraries via pip
pip install "tensorflow[and-cuda]"
# Linux: if CUDA and cuDNN are already installed system-wide
pip install tensorflow
# Verify GPU visibility after install
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
Verifying Your Installation
After installing TensorFlow, run this quick sanity check to confirm everything works. This snippet prints your TF version, checks for GPU availability, and performs a simple tensor operation to verify eager execution:
import tensorflow as tf
# Print version and build info
print("TensorFlow version:", tf.__version__)
print("Eager execution:", tf.executing_eagerly())
# Check for GPU
gpus = tf.config.list_physical_devices('GPU')
print(f"GPUs available: {len(gpus)}")
for gpu in gpus:
    print(f" - {gpu.name}: {gpu.device_type}")
# Quick tensor operation to verify everything works
x = tf.constant([1.0, 2.0, 3.0])
print("Tensor:", x)
print("Sum:", tf.reduce_sum(x).numpy())
# Expected: TensorFlow version: 2.x.x, Eager execution: True, Sum: 6.0
If you see Eager execution: True and a sum of 6.0, your installation is working correctly. The .numpy() call converts the TensorFlow tensor to a regular Python number — this only works in eager mode.
The universal convention is import tensorflow as tf. Every TensorFlow tutorial, paper, and codebase uses this alias. You'll also frequently see from tensorflow import keras for the high-level API.
Tensors: The Core Data Structure
A tensor is a multi-dimensional array — the fundamental data structure in TensorFlow. If you know NumPy, tensors will feel familiar. The key differences: TensorFlow tensors are immutable (you can't change values in-place), can run on GPUs and TPUs, and are tracked by the automatic differentiation engine.
Tensors are classified by their rank (number of dimensions):
| Rank | Name | Example | Shape |
|---|---|---|---|
| 0 | Scalar | A single number: 42 | () |
| 1 | Vector | A list: [1, 2, 3] | (3,) |
| 2 | Matrix | A 2D grid: [[1, 2], [3, 4]] | (2, 2) |
| 3+ | N-D Tensor | Images, video, batches | (batch, height, width, channels) |
Creating Tensors
TensorFlow provides several factory functions for creating tensors. The most common is tf.constant(), which creates an immutable tensor from a Python list or NumPy array. Here are the essential creation methods:
import tensorflow as tf
# Scalar (rank 0) — a single number
scalar = tf.constant(42)
print(f"Scalar: {scalar}, shape: {scalar.shape}, rank: {scalar.ndim}")
# Vector (rank 1) — a list of numbers
vector = tf.constant([1.0, 2.0, 3.0, 4.0])
print(f"Vector: {vector}, shape: {vector.shape}")
# Matrix (rank 2) — a 2D grid
matrix = tf.constant([[1, 2, 3],
[4, 5, 6]])
print(f"Matrix shape: {matrix.shape}") # (2, 3)
# 3D Tensor (rank 3) — e.g., a batch of sequences
tensor_3d = tf.constant([[[1, 2], [3, 4]],
[[5, 6], [7, 8]]])
print(f"3D Tensor shape: {tensor_3d.shape}") # (2, 2, 2)
# Zeros and ones
zeros = tf.zeros([3, 4]) # 3x4 matrix of zeros
ones = tf.ones([2, 3], dtype=tf.int32) # 2x3 matrix of ones
print(f"Zeros shape: {zeros.shape}, Ones dtype: {ones.dtype}")
# Random tensors
normal = tf.random.normal([3, 3], mean=0.0, stddev=1.0)
uniform = tf.random.uniform([2, 4], minval=0, maxval=10)
print(f"Normal:\n{normal.numpy()}")
print(f"Uniform:\n{uniform.numpy()}")
Notice how each tensor has a shape (dimensions), dtype (data type), and ndim (rank). The .numpy() method converts a tensor to a NumPy array for easy inspection.
Data Types & Casting
TensorFlow supports a wide range of data types. The default for floating-point numbers is tf.float32, which offers a good balance of precision and speed. You'll need to explicitly cast between types — TensorFlow won't do it automatically to prevent silent precision loss:
import tensorflow as tf
# Default float type is float32
a = tf.constant([1.0, 2.0, 3.0])
print(f"Default dtype: {a.dtype}") # tf.float32
# Explicit dtype specification
b = tf.constant([1, 2, 3], dtype=tf.float64)
c = tf.constant([True, False, True], dtype=tf.bool)
d = tf.constant(["hello", "tensorflow"], dtype=tf.string)
print(f"float64: {b.dtype}, bool: {c.dtype}, string: {d.dtype}")
# Casting between types
int_tensor = tf.constant([1, 2, 3])
float_tensor = tf.cast(int_tensor, dtype=tf.float32)
print(f"Cast int to float: {float_tensor}")
# Common dtypes: tf.float16, tf.float32, tf.float64,
# tf.int8, tf.int16, tf.int32, tf.int64,
# tf.bool, tf.string, tf.complex64
Unlike NumPy, TensorFlow will not implicitly combine int and float tensors. You'll get an error (a TypeError or InvalidArgumentError, depending on the operation and TF version). Always use tf.cast() to convert explicitly.
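The failure mode is easy to reproduce. The exact exception type has varied across TF versions, so this sketch catches both of the ones you are likely to see:

```python
import tensorflow as tf

ints = tf.constant([1, 2, 3])          # dtype: int32
floats = tf.constant([1.0, 2.0, 3.0])  # dtype: float32

# Mixing dtypes raises instead of silently promoting
try:
    ints + floats
except (TypeError, tf.errors.InvalidArgumentError):
    print("Mixed-dtype addition failed, as expected")

# An explicit cast fixes it
ok = tf.cast(ints, tf.float32) + floats
print(ok.numpy())  # [2. 4. 6.]
```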
Shapes & Ranks
Understanding tensor shapes is critical for building neural networks. Every layer expects inputs of a specific shape, and shape mismatches are the most common source of bugs. Here's how to inspect and manipulate shapes:
import tensorflow as tf
# Create a tensor and inspect its properties
t = tf.random.normal([2, 3, 4])
print(f"Shape: {t.shape}") # (2, 3, 4)
print(f"Rank (ndim): {t.ndim}") # 3
print(f"Dtype: {t.dtype}") # tf.float32
print(f"Total elements: {tf.size(t).numpy()}") # 24
# Shape as a tensor (useful inside tf.function)
print(f"tf.shape(): {tf.shape(t)}") # [2 3 4]
# Individual dimensions
print(f"Dim 0: {t.shape[0]}") # 2
print(f"Dim 1: {t.shape[1]}") # 3
print(f"Dim 2: {t.shape[2]}") # 4
Eager Execution
Eager execution means that TensorFlow operations execute immediately when called, returning concrete values instead of building a computational graph for later execution. This was a fundamental shift in TF 2 — making TensorFlow behave like regular Python code.
Eager Mode vs Graph Mode
In TF 1, you had to build a static graph first, then create a Session to run it. In TF 2, operations just work — like NumPy:
import tensorflow as tf
# TF 2: Eager execution (default) — results are immediate
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 6], [7, 8]])
# Operations execute immediately and return values
c = tf.add(a, b)
print(f"a + b = \n{c.numpy()}")
# [[6 8]
# [10 12]]
# You can use Python control flow naturally
x = tf.constant(10.0)
if x > 5:
    print("x is greater than 5 — this works in eager mode!")
# Verify eager mode is active
print(f"Eager execution enabled: {tf.executing_eagerly()}") # True
The beauty of eager execution is that you can use standard Python tools for debugging — print(), pdb, assert — and they all work as expected. When you need maximum performance, you can opt into graph compilation with @tf.function (covered later in this article).
Think of it this way: Eager mode is like a calculator — you type 2 + 3 and immediately see 5. Graph mode is like writing a recipe first (define all the steps), then cooking it all at once (execute the graph). TF 2 lets you work in "calculator mode" by default, and optionally compile into "recipe mode" for speed.
Tensor Operations
TensorFlow provides a rich library of operations for manipulating tensors. Most operations mirror NumPy's API, making the transition straightforward. Operations are executed on the device (CPU or GPU) where the tensor resides.
Arithmetic & Matrix Operations
Basic arithmetic uses Python operators or explicit TF functions. For matrix multiplication, use tf.matmul() or the @ operator. These are the building blocks of every neural network:
import tensorflow as tf
# Element-wise arithmetic
a = tf.constant([1.0, 2.0, 3.0, 4.0])
b = tf.constant([10.0, 20.0, 30.0, 40.0])
print("Add:", (a + b).numpy()) # [11. 22. 33. 44.]
print("Subtract:", (b - a).numpy()) # [ 9. 18. 27. 36.]
print("Multiply:", (a * b).numpy()) # [10. 40. 90. 160.]
print("Divide:", (b / a).numpy()) # [10. 10. 10. 10.]
print("Power:", (a ** 2).numpy()) # [1. 4. 9. 16.]
# Matrix multiplication
m1 = tf.constant([[1, 2], [3, 4]], dtype=tf.float32)
m2 = tf.constant([[5, 6], [7, 8]], dtype=tf.float32)
# Two equivalent ways to multiply matrices
result1 = tf.matmul(m1, m2)
result2 = m1 @ m2 # Python @ operator (preferred)
print(f"Matrix multiplication:\n{result1.numpy()}")
# [[19 22]
# [43 50]]
# Reduction operations
x = tf.constant([[1.0, 2.0, 3.0],
[4.0, 5.0, 6.0]])
print("Sum (all):", tf.reduce_sum(x).numpy()) # 21.0
print("Mean (axis=0):", tf.reduce_mean(x, axis=0).numpy()) # [2.5 3.5 4.5]
print("Max (axis=1):", tf.reduce_max(x, axis=1).numpy()) # [3. 6.]
Reshaping, Slicing & Manipulation
Reshaping tensors is essential for connecting different layers in a neural network. The key functions are tf.reshape(), tf.expand_dims(), tf.squeeze(), and standard Python slicing. Remember: TensorFlow tensors are immutable, so these operations always return new tensors:
import tensorflow as tf
# Reshaping
t = tf.constant([[1, 2, 3, 4],
[5, 6, 7, 8]])
print(f"Original shape: {t.shape}") # (2, 4)
reshaped = tf.reshape(t, [4, 2])
print(f"Reshaped to (4, 2):\n{reshaped.numpy()}")
flattened = tf.reshape(t, [-1]) # -1 infers the dimension
print(f"Flattened: {flattened.numpy()}") # [1 2 3 4 5 6 7 8]
# Expand and squeeze dimensions
v = tf.constant([1.0, 2.0, 3.0]) # shape: (3,)
expanded = tf.expand_dims(v, axis=0) # shape: (1, 3)
expanded2 = tf.expand_dims(v, axis=1) # shape: (3, 1)
print(f"Expanded axis=0: {expanded.shape}") # (1, 3)
print(f"Expanded axis=1: {expanded2.shape}") # (3, 1)
squeezed = tf.squeeze(expanded) # removes size-1 dims
print(f"Squeezed back: {squeezed.shape}") # (3,)
# Slicing and indexing (same as NumPy)
m = tf.constant([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
print(f"Row 0: {m[0].numpy()}") # [1 2 3]
print(f"Element [1,2]: {m[1, 2].numpy()}") # 6
print(f"Column 1: {m[:, 1].numpy()}") # [2 5 8]
print(f"Sub-matrix: \n{m[0:2, 1:3].numpy()}") # [[2 3] [5 6]]
# Concatenation and stacking
a = tf.constant([[1, 2], [3, 4]])
b = tf.constant([[5, 6], [7, 8]])
concat = tf.concat([a, b], axis=0) # stack vertically
print(f"Concat axis=0:\n{concat.numpy()}") # [[1,2],[3,4],[5,6],[7,8]]
stacked = tf.stack([a, b], axis=0) # creates new dimension
print(f"Stack shape: {stacked.shape}") # (2, 2, 2)
Broadcasting
Broadcasting allows TensorFlow to automatically expand smaller tensors to match larger ones during arithmetic, without copying data. The rules are identical to NumPy: dimensions are compared from right to left, and a dimension of size 1 is stretched to match:
import tensorflow as tf
# Scalar broadcasts to all elements
matrix = tf.constant([[1.0, 2.0], [3.0, 4.0]])
result = matrix + 10 # scalar 10 broadcasts to [[10,10],[10,10]]
print(f"Matrix + 10:\n{result.numpy()}")
# Vector broadcasts across rows
row_vector = tf.constant([100.0, 200.0]) # shape: (2,)
result = matrix + row_vector # broadcasts to each row
print(f"Matrix + row_vector:\n{result.numpy()}")
# [[101. 202.]
# [103. 204.]]
# Column vector broadcasts across columns
col_vector = tf.constant([[10.0], [20.0]]) # shape: (2, 1)
result = matrix + col_vector
print(f"Matrix + col_vector:\n{result.numpy()}")
# [[11. 12.]
# [23. 24.]]
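The right-to-left rule itself can be written down as a small pure-Python helper (no TensorFlow required), which is a handy way to predict a result shape before running an op. The helper name is our own, not a TF API:

```python
def broadcast_shape(shape_a, shape_b):
    """Return the broadcast result shape, or raise ValueError if incompatible."""
    result = []
    # Compare dimensions from the rightmost one leftward
    for a, b in zip(reversed(shape_a), reversed(shape_b)):
        if a == b or a == 1 or b == 1:
            result.append(max(a, b))  # a size-1 dim stretches to match
        else:
            raise ValueError(f"Incompatible dims: {a} vs {b}")
    # The longer shape's extra leading dims carry over unchanged
    longer = shape_a if len(shape_a) > len(shape_b) else shape_b
    result.extend(reversed(longer[:abs(len(shape_a) - len(shape_b))]))
    return tuple(reversed(result))

print(broadcast_shape((2, 2), (2,)))       # (2, 2) — matrix + row vector
print(broadcast_shape((2, 2), (2, 1)))     # (2, 2) — matrix + column vector
print(broadcast_shape((3, 1, 4), (2, 1)))  # (3, 2, 4)
```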
NumPy Interoperability
TensorFlow and NumPy work together seamlessly. You can convert between the two freely, and TensorFlow operations will automatically accept NumPy arrays as inputs. This makes it easy to integrate TensorFlow into existing NumPy-based workflows:
import tensorflow as tf
import numpy as np
# NumPy array → TensorFlow tensor
np_array = np.array([1.0, 2.0, 3.0])
tf_tensor = tf.constant(np_array)
print(f"From NumPy: {tf_tensor}")
print(f"Dtype: {tf_tensor.dtype}") # tf.float64 (NumPy default)
# TensorFlow tensor → NumPy array
back_to_numpy = tf_tensor.numpy()
print(f"Back to NumPy: {back_to_numpy}")
print(f"Type: {type(back_to_numpy)}") # <class 'numpy.ndarray'>
# TF operations accept NumPy arrays directly
np_a = np.array([[1, 2], [3, 4]], dtype=np.float32)
np_b = np.array([[5, 6], [7, 8]], dtype=np.float32)
result = tf.matmul(np_a, np_b) # NumPy arrays accepted directly
print(f"TF matmul on NumPy arrays:\n{result.numpy()}")
# NumPy operations accept TF tensors (auto-converted)
tf_vals = tf.constant([1.0, 4.0, 9.0])
np_result = np.sqrt(tf_vals) # TF tensor auto-converts
print(f"NumPy sqrt on TF tensor: {np_result}")
Device Awareness
One important difference: when a tensor is on a GPU, calling .numpy() copies it back to CPU memory. For large tensors, this transfer can be slow. Be mindful of unnecessary conversions in training loops.
Variables (tf.Variable)
While tf.constant creates immutable tensors, tf.Variable creates mutable tensors — these are used to store model parameters (weights and biases) that change during training. Variables are automatically tracked by GradientTape for computing gradients:
import tensorflow as tf
# Create variables — mutable tensors for model parameters
weight = tf.Variable(tf.random.normal([3, 2]), name="weight")
bias = tf.Variable(tf.zeros([2]), name="bias")
print(f"Weight:\n{weight.numpy()}")
print(f"Bias: {bias.numpy()}")
print(f"Weight name: {weight.name}")
# Variables are mutable — use .assign() to change values
bias.assign([1.0, 2.0])
print(f"After assign: {bias.numpy()}") # [1. 2.]
# In-place addition and subtraction
bias.assign_add([0.1, 0.1])
print(f"After assign_add: {bias.numpy()}") # [1.1 2.1]
bias.assign_sub([0.05, 0.05])
print(f"After assign_sub: {bias.numpy()}") # [1.05 2.05]
# Key difference: tf.constant is immutable
const = tf.constant([1.0, 2.0])
# const.assign([3.0, 4.0]) # ERROR: AttributeError
print(f"Constants cannot be modified: {const.numpy()}")
When to Use Variables vs Constants
The rule is simple: use tf.Variable for anything that needs to change (model weights, optimizer state), and tf.constant for fixed data (input features, hyperparameters). GradientTape automatically watches variables but requires explicit tape.watch() for constants.
Automatic Differentiation (GradientTape)
tf.GradientTape is TensorFlow's engine for automatic differentiation — the algorithm that makes neural network training possible. It records all operations performed on watched tensors inside a context manager, then computes gradients of a target (loss) with respect to sources (parameters) via the chain rule.
flowchart LR
A["1. Enter tape context
with tf.GradientTape()"] --> B["2. Forward pass
Record operations"]
B --> C["3. Compute loss
Scalar output"]
C --> D["4. tape.gradient()
Backpropagation"]
D --> E["5. Update weights
w -= lr × grad"]
style A fill:#132440,stroke:#3B9797,color:#ffffff
style B fill:#16476A,stroke:#3B9797,color:#ffffff
style C fill:#3B9797,stroke:#132440,color:#ffffff
style D fill:#BF092F,stroke:#132440,color:#ffffff
style E fill:#132440,stroke:#3B9797,color:#ffffff
GradientTape Basics
The simplest use: compute the derivative of a function. GradientTape automatically watches tf.Variable objects. To watch a tf.constant, you must call tape.watch() explicitly:
import tensorflow as tf
# Example 1: Gradient of y = x^2 at x = 3.0
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x ** 2  # y = x²
# dy/dx = 2x = 2 * 3 = 6
grad = tape.gradient(y, x)
print(f"y = x², dy/dx at x=3: {grad.numpy()}") # 6.0
# Example 2: Watching a constant (must be explicit)
x_const = tf.constant(4.0)
with tf.GradientTape() as tape:
    tape.watch(x_const)  # Required for constants
    y = x_const ** 3  # y = x³
# dy/dx = 3x² = 3 * 16 = 48
grad = tape.gradient(y, x_const)
print(f"y = x³, dy/dx at x=4: {grad.numpy()}") # 48.0
The mathematical gradient for $y = x^2$ is $\frac{dy}{dx} = 2x$. At $x = 3$, this gives $2 \times 3 = 6$. TensorFlow computes this automatically — no manual calculus needed.
Persistent Tapes
By default, a GradientTape releases its resources after a single gradient() call. If you need to compute multiple gradients from the same recorded operations (e.g., gradients for both weights and biases), use persistent=True:
import tensorflow as tf
x = tf.Variable(2.0)
# Persistent tape allows multiple gradient calls
with tf.GradientTape(persistent=True) as tape:
    y = x ** 2  # y = x²
    z = x ** 3  # z = x³
# Compute gradients for both y and z
dy_dx = tape.gradient(y, x)
dz_dx = tape.gradient(z, x)
print(f"dy/dx = 2x at x=2: {dy_dx.numpy()}") # 4.0
print(f"dz/dx = 3x² at x=2: {dz_dx.numpy()}") # 12.0
# IMPORTANT: delete persistent tape to free resources
del tape
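Persistence isn't required just to get gradients for several variables: a single gradient() call accepts a list of sources and returns one gradient per source. This is the pattern real training loops use:

```python
import tensorflow as tf

w = tf.Variable(2.0)
b = tf.Variable(1.0)
x = tf.constant(3.0)

with tf.GradientTape() as tape:
    y = w * x + b   # y = 2*3 + 1 = 7
    loss = y ** 2   # loss = 49

# One call, one gradient per source — no persistent tape needed
grads = tape.gradient(loss, [w, b])
# dL/dw = 2y * x = 2*7*3 = 42
# dL/db = 2y     = 14
print(f"dL/dw: {grads[0].numpy()}, dL/db: {grads[1].numpy()}")
```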
Higher-Order Gradients
You can nest GradientTape contexts to compute higher-order derivatives — second derivatives, third derivatives, and beyond. This is useful for advanced optimization methods and physics-informed neural networks:
import tensorflow as tf
x = tf.Variable(3.0)
# Nested tapes for second derivative
with tf.GradientTape() as outer_tape:
    with tf.GradientTape() as inner_tape:
        y = x ** 3  # y = x³
    # First derivative: dy/dx = 3x²
    dy_dx = inner_tape.gradient(y, x)
# Second derivative: d²y/dx² = 6x
d2y_dx2 = outer_tape.gradient(dy_dx, x)
print(f"y = x³ at x=3")
print(f" dy/dx = 3x² = {dy_dx.numpy()}") # 27.0
print(f" d²y/dx² = 6x = {d2y_dx2.numpy()}") # 18.0
Practical Example: Linear Regression from Scratch
Let's put everything together and build a simple linear regression model using raw TensorFlow operations — no Keras, no pre-built layers. We want to learn the parameters $w$ and $b$ in the equation $\hat{y} = wx + b$ by minimizing the mean squared error loss:
$$L = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$$
The gradients we need are:
$$\frac{\partial L}{\partial w} = \frac{1}{n}\sum_{i=1}^{n} 2(wx_i + b - y_i) \cdot x_i \qquad \frac{\partial L}{\partial b} = \frac{1}{n}\sum_{i=1}^{n} 2(wx_i + b - y_i)$$
import tensorflow as tf
import numpy as np
# Generate synthetic data: y = 3x + 1 + noise
np.random.seed(42)
X_data = np.random.randn(100).astype(np.float32)
Y_data = 3.0 * X_data + 1.0 + np.random.randn(100).astype(np.float32) * 0.3
# Convert to tensors
X = tf.constant(X_data)
Y = tf.constant(Y_data)
# Learnable parameters (start with random values)
w = tf.Variable(0.0, name="weight")
b = tf.Variable(0.0, name="bias")
# Hyperparameters
learning_rate = 0.1
epochs = 100
# Training loop
for epoch in range(epochs):
    with tf.GradientTape() as tape:
        # Forward pass: y_hat = wx + b
        Y_pred = w * X + b
        # Compute MSE loss
        loss = tf.reduce_mean((Y - Y_pred) ** 2)
    # Compute gradients
    gradients = tape.gradient(loss, [w, b])
    # Manual gradient descent update
    w.assign_sub(learning_rate * gradients[0])
    b.assign_sub(learning_rate * gradients[1])
    if (epoch + 1) % 20 == 0:
        print(f"Epoch {epoch+1:3d} | Loss: {loss.numpy():.4f} | w: {w.numpy():.4f} | b: {b.numpy():.4f}")
print(f"\nLearned: y = {w.numpy():.2f}x + {b.numpy():.2f}")
print(f"True: y = 3.00x + 1.00")
This is the essence of deep learning training: compute a forward pass, calculate the loss, use GradientTape to find gradients, and update the parameters. In Part 2, Keras will handle all of this for you with a single model.fit() call — but understanding the mechanics here is crucial.
What just happened? We started with random parameters ($w = 0$, $b = 0$) and iteratively adjusted them to minimize the loss. After 100 epochs, $w$ converges to ≈3.0 and $b$ to ≈1.0 — exactly the true values from our synthetic data. This is gradient descent in action: follow the slope of the loss surface downhill until you reach the minimum.
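Because the gradients were derived analytically above, you can sanity-check the whole loop without TensorFlow, in plain NumPy. This is a sketch mirroring the TF version, with the same synthetic data and hyperparameters:

```python
import numpy as np

# Same synthetic data as the TF example: y = 3x + 1 + noise
np.random.seed(42)
X = np.random.randn(100).astype(np.float32)
Y = 3.0 * X + 1.0 + np.random.randn(100).astype(np.float32) * 0.3

w, b, lr = 0.0, 0.0, 0.1
for _ in range(100):
    err = w * X + b - Y            # (y_hat - y)
    grad_w = np.mean(2 * err * X)  # dL/dw, from the analytic formula
    grad_b = np.mean(2 * err)      # dL/db
    w -= lr * grad_w
    b -= lr * grad_b

print(f"NumPy version learned: y = {w:.2f}x + {b:.2f}")  # close to 3.00x + 1.00
```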
GPU Acceleration
TensorFlow automatically places tensors and operations on available GPUs. A single training step that takes 10 seconds on CPU might complete in 100 milliseconds on a modern GPU — a 100× speedup. Here's how to detect, configure, and use GPU devices:
Device Detection
Always check what hardware TensorFlow can see. This is the first thing you should run in any notebook or training script:
import tensorflow as tf
# List all available devices
print("All physical devices:")
for device in tf.config.list_physical_devices():
    print(f" {device.device_type}: {device.name}")
# Specifically check for GPUs
gpus = tf.config.list_physical_devices('GPU')
print(f"\nGPUs found: {len(gpus)}")
# Check where a tensor is placed
x = tf.constant([1.0, 2.0, 3.0])
print(f"Tensor device: {x.device}")
# Check CUDA/cuDNN versions (if GPU available)
if gpus:
    print(f"Built with CUDA: {tf.test.is_built_with_cuda()}")
    # tf.test.is_gpu_available() is deprecated; use the device list instead
    print(f"GPU available: {bool(tf.config.list_physical_devices('GPU'))}")
Device Placement
TensorFlow places operations on the GPU automatically when available, but you can explicitly control placement with tf.device(). This is useful when you want to keep certain operations on the CPU (e.g., data preprocessing):
import tensorflow as tf
# Explicit device placement
with tf.device('/CPU:0'):
    cpu_tensor = tf.random.normal([1000, 1000])
print(f"CPU tensor device: {cpu_tensor.device}")
# If GPU is available, place on GPU
if tf.config.list_physical_devices('GPU'):
    with tf.device('/GPU:0'):
        gpu_tensor = tf.random.normal([1000, 1000])
    print(f"GPU tensor device: {gpu_tensor.device}")
    # Operations between devices cause automatic transfer
    # TF handles this, but it adds overhead
    result = gpu_tensor + 1
    print(f"Result device: {result.device}")
else:
    print("No GPU available — running on CPU")
    # All code still works, just slower
Memory Configuration
By default, TensorFlow allocates all available GPU memory upfront — even if your model only needs a fraction. This prevents other programs from using the GPU. Enable memory growth to allocate memory incrementally:
import tensorflow as tf
# IMPORTANT: Configure memory growth BEFORE any TF operations
gpus = tf.config.list_physical_devices('GPU')
if gpus:
    try:
        # Option 1: Enable memory growth (allocate as needed)
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("Memory growth enabled for all GPUs")
        # Option 2: Set a hard memory limit (e.g., 4 GB)
        # tf.config.set_logical_device_configuration(
        #     gpus[0],
        #     [tf.config.LogicalDeviceConfiguration(memory_limit=4096)]
        # )
    except RuntimeError as e:
        # Memory growth must be set before GPUs are initialized
        print(f"Error: {e}")
else:
    print("No GPUs — memory configuration not needed")
Best practice: call tf.config.experimental.set_memory_growth(gpu, True) at the very start of your script — before creating any tensors or operations. This prevents TensorFlow from hogging all GPU memory and allows you to run multiple notebooks or experiments simultaneously.
tf.function & Graph Compilation
While eager execution is great for development and debugging, graph execution is faster for production. The @tf.function decorator compiles a Python function into an optimized TensorFlow graph — giving you the performance of TF 1's graph mode with the simplicity of TF 2's eager API:
import tensorflow as tf
import time
# Regular Python function (runs eagerly)
def eager_square(x):
    return x ** 2

# Compiled graph function (traced and optimized)
@tf.function
def graph_square(x):
    return x ** 2
# Both produce the same result
x = tf.constant(4.0)
print(f"Eager: {eager_square(x).numpy()}") # 16.0
print(f"Graph: {graph_square(x).numpy()}") # 16.0
# Performance comparison on a larger operation
@tf.function
def graph_matmul(a, b):
    return tf.matmul(a, b)
a = tf.random.normal([500, 500])
b = tf.random.normal([500, 500])
# Warm-up (first call triggers tracing)
_ = graph_matmul(a, b)
# Time eager vs graph
start = time.time()
for _ in range(1000):
    tf.matmul(a, b)
eager_time = time.time() - start

start = time.time()
for _ in range(1000):
    graph_matmul(a, b)
graph_time = time.time() - start
print(f"\nEager: {eager_time:.3f}s")
print(f"Graph: {graph_time:.3f}s")
print(f"Speedup: {eager_time/graph_time:.1f}x")
Tracing & Retracing
When you call a @tf.function for the first time, TensorFlow traces it — running the Python code once to build a graph, then using that graph for subsequent calls. Be aware that TF will retrace if you call the function with different input shapes or types:
import tensorflow as tf
@tf.function
def my_func(x):
    print("Tracing!")  # Only prints during tracing, not execution
    return x + 1
# First call with float32 — triggers tracing
result1 = my_func(tf.constant(1.0))
print(f"Result 1: {result1.numpy()}") # Prints "Tracing!" then "Result 1: 2.0"
# Second call with same type — reuses graph (no retracing)
result2 = my_func(tf.constant(2.0))
print(f"Result 2: {result2.numpy()}") # Only prints "Result 2: 3.0"
# Call with different type (int32) — triggers NEW trace
result3 = my_func(tf.constant(3))
print(f"Result 3: {result3.numpy()}") # Prints "Tracing!" then "Result 3: 4"
# Use input_signature to prevent retracing
@tf.function(input_signature=[tf.TensorSpec(shape=None, dtype=tf.float32)])
def fixed_func(x):
    print("Tracing fixed_func!")
    return x * 2
print(fixed_func(tf.constant(5.0)).numpy()) # 10.0 — traces once
print(fixed_func(tf.constant(10.0)).numpy()) # 20.0 — reuses graph
When to Use @tf.function
Use @tf.function on your training step function and any compute-intensive function that gets called repeatedly. Don't use it everywhere — the tracing overhead can actually slow down simple operations:
- Use @tf.function for: training steps, inference functions, any function called 100+ times with the same input shapes
- Don't use @tf.function for: one-time setup code, data preprocessing, debugging (use eager mode instead)
- Avoid Python side effects inside @tf.function: print(), list mutations, and global variable changes only happen during tracing, not during execution
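Putting that advice together, here is how the linear-regression training step from earlier looks when wrapped for graph compilation. This is a sketch; the variable names follow the earlier example:

```python
import tensorflow as tf

w = tf.Variable(0.0)
b = tf.Variable(0.0)

@tf.function  # traced once per input signature, reused on every call
def train_step(x, y, lr):
    with tf.GradientTape() as tape:
        loss = tf.reduce_mean((y - (w * x + b)) ** 2)
    grad_w, grad_b = tape.gradient(loss, [w, b])
    w.assign_sub(lr * grad_w)  # in-place variable updates work inside graphs
    b.assign_sub(lr * grad_b)
    return loss

x = tf.constant([1.0, 2.0, 3.0])
y = tf.constant([4.0, 7.0, 10.0])  # y = 3x + 1 exactly
for _ in range(200):
    loss = train_step(x, y, tf.constant(0.1))
print(f"w = {w.numpy():.2f}, b = {b.numpy():.2f}")  # approaches 3.00 and 1.00
```

Note that the variables are created outside the decorated function: creating new tf.Variable objects inside a @tf.function is an error after the first trace.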
Conclusion & Next Steps
You now have a solid foundation in TensorFlow's core primitives. Let's recap what we covered:
- Tensors are the fundamental data structure — immutable, GPU-ready multi-dimensional arrays
- Eager execution lets you write and debug TensorFlow code like regular Python
- Tensor operations include arithmetic, matrix multiplication, reshaping, slicing, and broadcasting
- NumPy interoperability makes it easy to integrate TF into existing workflows
- tf.Variable stores mutable state for model parameters
- GradientTape computes gradients automatically — the engine behind all neural network training
- GPU acceleration and memory configuration for faster computation
- @tf.function compiles Python functions into optimized graphs for production speed
Next in the Series
In Part 2: Building Models with Keras, we'll use TensorFlow's high-level Keras API to build neural networks with just a few lines of code — Sequential models, the Functional API, custom layers, and model subclassing.