
Python Data Science Series Part 3: Data Visualization

December 27, 2025 Wasil Zafar 20 min read

Master data visualization with Matplotlib and Seaborn. Learn to create compelling charts, customize plots, and tell data stories effectively.

Table of Contents

  1. Introduction to Data Visualization
  2. Matplotlib Fundamentals
    • Basic Plots
    • Understanding plt.plot() Parameters
    • Format Strings & Multiple Datasets
    • Understanding plt.subplots() Parameters
    • Shared Axes & Custom Ratios
  3. Plot Customization
    • Colors, Styles & Markers
    • Themes and Styles
    • Figure-Level Customization
    • Axis Control & Annotations
    • GridSpec for Complex Layouts
  4. Seaborn: Statistical Graphics
  5. Visualizing Distributions
  6. Relationships & Correlations
  7. Categorical Data Plots
  8. Best Practices & Next Steps

Introduction to Data Visualization

Prerequisites: Before running the code examples in this tutorial, make sure you have Python and Jupyter notebooks properly set up. If you haven't configured your development environment yet, check out our complete setup guide for VS Code, PyCharm, Jupyter, and Colab.

After loading and analyzing data with NumPy and Pandas, the next step is visualization—transforming numbers into visual insights that communicate patterns, trends, and anomalies.

Why Visualization Matters: "A picture is worth a thousand rows." Trends, clusters, and outliers that are easy to miss when scanning a table of numbers often stand out immediately in a well-chosen chart, so a good visualization can surface in seconds what might take hours to find in raw data.

The Visualization Ecosystem

  • Matplotlib: Low-level, highly customizable plotting library (the foundation)
  • Seaborn: High-level interface built on Matplotlib for statistical graphics
  • Other tools: Plotly (interactive), Bokeh (web), Altair (declarative)
# Installation (run in a terminal; in a Jupyter cell, use %pip instead)
pip install matplotlib seaborn

# Import convention
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

Matplotlib Fundamentals

Matplotlib provides complete control over every element of a plot. It has two interfaces:

  • pyplot (MATLAB-style): Quick plotting with plt.plot()
  • Object-oriented (OO): Explicit figure and axes objects (recommended)
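To see the difference, here is a minimal sketch that draws the same sine curve both ways. pyplot functions act on an implicit "current" Axes, while the OO style calls methods on Axes objects you hold explicitly. The snippet inspects the resulting objects instead of displaying them (call plt.show() to see the figures).

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# pyplot style: functions act on the implicit "current" figure and axes
plt.figure()
plt.plot(x, np.sin(x))
plt.title('pyplot style')
implicit_ax = plt.gca()            # the Axes pyplot has been drawing on

# Object-oriented style: keep explicit references to Figure and Axes
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_title('OO style')

print(implicit_ax.get_title())     # pyplot style
print(ax.get_title())              # OO style
print(len(implicit_ax.lines), len(ax.lines))  # 1 1
```

Both approaches produce the same kind of objects; the OO style simply makes it explicit which Axes each call affects, which matters once a figure has several subplots.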

Basic Plots

import matplotlib.pyplot as plt
import numpy as np

# Line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)

plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.grid(True)
plt.show()

# Scatter plot
plt.scatter(x, y, alpha=0.5)
plt.show()

# Bar chart
categories = ['A', 'B', 'C', 'D']
values = [25, 40, 30, 55]
plt.bar(categories, values, color='teal')
plt.show()

Understanding plt.plot() Parameters

The plot() function is highly flexible with many parameters for customization:

# Complete signature:
# plot([x], y, [fmt], *, data=None, **kwargs)

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Basic usage - x and y explicitly
plt.plot(x, y)
plt.title('Basic: x and y explicitly')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True, alpha=0.3)
plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Format strings '[marker][line][color]' (order is flexible: 'ro-' = red, circles, solid)
plt.plot(x, y, 'ro-', label='red circles, solid')
plt.plot(x, y - 0.5, 'g^--', label='green triangles, dashed')
plt.plot(x, y - 1, 'bs:', label='blue squares, dotted')
plt.title('Format Strings: [marker][line][color]')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Keyword arguments (full customization)
plt.plot(x, y, 
         color='#3B9797',           # Hex color
         linestyle='--',            # Dashed line
         linewidth=2.5,             # Line thickness
         marker='o',                # Circle markers
         markersize=8,              # Marker size
         markerfacecolor='red',     # Marker fill color
         markeredgecolor='black',   # Marker edge color
         markeredgewidth=1.5,       # Marker edge thickness
         alpha=0.7,                 # Transparency (0-1)
         label='Custom styled')     # Legend label
plt.title('Keyword Arguments: Full Customization')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Format String Quick Reference:
  • Colors: 'b' (blue), 'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), 'y' (yellow), 'k' (black), 'w' (white)
  • Markers: 'o' (circle), 's' (square), '^' (triangle up), 'v' (triangle down), '*' (star), '+' (plus), 'x' (x), 'D' (diamond)
  • Lines: '-' (solid), '--' (dashed), '-.' (dash-dot), ':' (dotted)
  • Example: 'ro--' = red circles with dashed line
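A format string is only shorthand for the color, marker, and linestyle keyword arguments. A quick check that 'ro--' and the spelled-out keywords produce identically styled lines (the snippet inspects the Line2D objects rather than displaying them):

```python
import matplotlib.pyplot as plt
import matplotlib.colors as mcolors
import numpy as np

x = np.linspace(0, 10, 20)
fig, ax = plt.subplots()

# Shorthand format string: red, circle markers, dashed line
(short_line,) = ax.plot(x, np.sin(x), 'ro--')
# The same styling spelled out with keyword arguments
(long_line,) = ax.plot(x, np.cos(x), color='red', marker='o', linestyle='--')

for line in (short_line, long_line):
    print(mcolors.to_hex(line.get_color()), line.get_marker(), line.get_linestyle())
# #ff0000 o --
# #ff0000 o --
```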

Plotting NumPy Arrays

import matplotlib.pyplot as plt
import numpy as np

# Direct plotting from NumPy arrays
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

plt.plot(x, y, 'bo-', linewidth=2, label='y = 2x')
plt.title('Plotting NumPy Arrays')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()

Plotting Pandas Series

import matplotlib.pyplot as plt
import pandas as pd

# Create a Pandas Series
s = pd.Series([10, 15, 13, 17, 20], 
              index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
              name='Temperature')

# Plot Series directly
plt.plot(s, marker='o', linestyle='--', color='red')
plt.title('Plotting Pandas Series')
plt.xlabel('Day')
plt.ylabel('Temperature (°C)')
plt.grid(True, alpha=0.3)
plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Multiple Series from DataFrame columns
df = pd.DataFrame({
    'sales': [100, 150, 120, 200, 180],
    'profit': [20, 35, 25, 50, 40],
}, index=['Q1', 'Q2', 'Q3', 'Q4', 'Q5'])

# Plot multiple columns
plt.plot(df.index, df['sales'], 'o-', label='Sales', linewidth=2)
plt.plot(df.index, df['profit'], 's--', label='Profit', linewidth=2)
plt.title('Sales vs Profit by Quarter')
plt.xlabel('Quarter')
plt.ylabel('Amount ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()

Plotting Pandas DataFrames

import matplotlib.pyplot as plt
import pandas as pd

# DataFrame with numeric columns
df = pd.DataFrame({
    'Product_A': [10, 12, 15, 20, 18],
    'Product_B': [8, 14, 16, 19, 22],
    'Product_C': [5, 9, 11, 15, 20]
}, index=['Week 1', 'Week 2', 'Week 3', 'Week 4', 'Week 5'])

# Plot all columns (each column becomes a line)
plt.figure(figsize=(10, 6))
for column in df.columns:
    plt.plot(df.index, df[column], marker='o', label=column, linewidth=2)

plt.title('Product Sales Over 5 Weeks')
plt.xlabel('Week')
plt.ylabel('Sales')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Using DataFrame.plot() method (convenient for DataFrame visualization)
df = pd.DataFrame({
    'Open': [100, 102, 101, 105, 107],
    'Close': [102, 101, 104, 106, 108],
}, index=['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5'])

# Built-in DataFrame plotting
df.plot(kind='line', marker='o', figsize=(10, 6))
plt.title('Stock Prices: Open vs Close')
plt.ylabel('Price ($)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
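DataFrame.plot() returns the Matplotlib Axes it drew on and accepts an ax= argument, so it combines well with the object-oriented interface. A minimal sketch reusing the Open/Close table above to draw a line chart and a bar chart side by side:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Open': [100, 102, 101, 105, 107],
    'Close': [102, 101, 104, 106, 108],
}, index=['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5'])

fig, (ax_line, ax_bar) = plt.subplots(1, 2, figsize=(12, 4))

# Draw into existing axes instead of letting pandas create a new figure
returned_ax = df.plot(kind='line', marker='o', ax=ax_line, title='Line')
df.plot(kind='bar', ax=ax_bar, title='Bar', rot=0)

print(returned_ax is ax_line)   # True
print(len(ax_line.lines))       # 2 (one line per column)
```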

Plotting Multiple Datasets

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Method 1: Multiple calls
plt.plot(x, np.sin(x), 'b-', label='sin(x)')
plt.plot(x, np.cos(x), 'r--', label='cos(x)')
plt.plot(x, np.tan(x), 'g:', label='tan(x)')
plt.legend()
plt.ylim(-2, 2)  # Limit y-axis for better visualization
plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Method 2: Single call with multiple datasets
plt.plot(x, np.sin(x), 'b-', 
         x, np.cos(x), 'r--',
         x, np.tan(x), 'g:')
plt.ylim(-2, 2)
plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Method 3: 2D arrays (each column is a dataset)
y_multi = np.column_stack([np.sin(x), np.cos(x), np.tan(x)])
plt.plot(x, y_multi)
plt.ylim(-2, 2)
plt.show()
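Methods 2 and 3 above draw without a legend. plot() returns a list with one Line2D per dataset (one per column for a 2D array), so labels can be attached afterwards. A minimal sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y_multi = np.column_stack([np.sin(x), np.cos(x), np.tan(x)])

fig, ax = plt.subplots()
lines = ax.plot(x, y_multi)            # one Line2D per column
for line, name in zip(lines, ['sin(x)', 'cos(x)', 'tan(x)']):
    line.set_label(name)               # label each column's line
ax.set_ylim(-2, 2)
ax.legend()

print(len(lines))  # 3
```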

Object-Oriented Interface (Preferred)

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create figure and axes
fig, ax = plt.subplots(figsize=(10, 6))

# Plot on axes
ax.plot(x, y, label='sin(x)', color='blue', linewidth=2)
ax.plot(x, np.cos(x), label='cos(x)', color='red', linestyle='--')

# Customize
ax.set_title('Trigonometric Functions', fontsize=16)
ax.set_xlabel('X', fontsize=12)
ax.set_ylabel('Y', fontsize=12)
ax.legend()
ax.grid(True, alpha=0.3)

plt.show()

Understanding plt.subplots() Parameters

The subplots() function creates a figure and grid of axes with extensive customization options:

import matplotlib.pyplot as plt

# Complete signature:
# subplots(nrows=1, ncols=1, *, sharex=False, sharey=False, 
#          squeeze=True, width_ratios=None, height_ratios=None,
#          subplot_kw=None, gridspec_kw=None, **fig_kw)

# Basic: single subplot
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
plt.show()
import matplotlib.pyplot as plt

# Grid of subplots: 2 rows, 3 columns
fig, axes = plt.subplots(2, 3, figsize=(12, 8))
print(axes.shape)  # (2, 3)

# Access individual axes
axes[0, 0].plot([1, 2, 3], [1, 4, 9])
axes[0, 1].scatter([1, 2, 3], [1, 2, 3])
axes[1, 2].bar(['A', 'B', 'C'], [1, 2, 3])

plt.tight_layout()
plt.show()

Shared Axes

import matplotlib.pyplot as plt
import numpy as np

# Share both axes across all subplots (zooming or panning one panel moves all)
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(10, 8))

# When shared, x tick labels appear only on the bottom row and
# y tick labels only on the left column
for ax in axes.flat:
    ax.plot(np.random.rand(10))
plt.tight_layout()
plt.show()
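Under the hood, shared axes use one set of limits: setting the range on any panel updates every panel that shares it. A quick check (the snippet inspects limits instead of displaying the figure):

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, sharex=True, sharey=True)
for ax in axes.flat:
    ax.plot(np.arange(10), np.arange(10) ** 2)

axes[0, 0].set_xlim(2, 5)                          # zoom one panel...
print([float(v) for v in axes[1, 1].get_xlim()])   # [2.0, 5.0]: ...the others follow
```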
import matplotlib.pyplot as plt
import numpy as np

# Share by column (x-axis shared within each column)
fig, axes = plt.subplots(2, 2, sharex='col', figsize=(10, 8))
for ax in axes.flat:
    ax.plot(np.random.rand(10))
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Share by row (y-axis shared within each row)
fig, axes = plt.subplots(2, 2, sharey='row', figsize=(10, 8))
for ax in axes.flat:
    ax.plot(np.random.rand(10))
plt.tight_layout()
plt.show()

Custom Subplot Ratios

import matplotlib.pyplot as plt

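# Different width ratios (width_ratios requires Matplotlib 3.6+;
# on older versions pass gridspec_kw={'width_ratios': [1, 2, 1]})
fig, axes = plt.subplots(1, 3, figsize=(12, 4),
                         width_ratios=[1, 2, 1])  # Middle subplot 2x wider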
axes[0].plot([1, 2, 3])
axes[0].set_title('Narrow')
axes[1].plot([1, 2, 3])
axes[1].set_title('Wide (2x)')
axes[2].plot([1, 2, 3])
axes[2].set_title('Narrow')
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt

# Different height ratios
fig, axes = plt.subplots(3, 1, figsize=(8, 10),
                         height_ratios=[1, 3, 1])  # Middle subplot 3x taller
axes[0].plot([1, 2, 3])
axes[0].set_title('Short')
axes[1].plot([1, 2, 3])
axes[1].set_title('Tall (3x)')
axes[2].plot([1, 2, 3])
axes[2].set_title('Short')
plt.tight_layout()
plt.show()
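The same ratios can be passed through gridspec_kw, which also works on Matplotlib versions before 3.6 (where width_ratios and height_ratios were added to subplots() directly). A minimal check that the ratios really apply to the Axes widths, reading back each Axes' position in figure fractions:

```python
import matplotlib.pyplot as plt

# gridspec_kw form of width_ratios=[1, 2, 1]
fig, axes = plt.subplots(1, 3, figsize=(12, 4),
                         gridspec_kw={'width_ratios': [1, 2, 1]})

widths = [ax.get_position().width for ax in axes]
print(round(widths[1] / widths[0], 6))  # 2.0
```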

Advanced Subplot Configuration

import matplotlib.pyplot as plt
import numpy as np

# subplot_kw: parameters for each subplot (polar projection)
fig, axes = plt.subplots(2, 2, 
                         subplot_kw={'projection': 'polar'},
                         figsize=(10, 10))

# Add some polar plots
theta = np.linspace(0, 2*np.pi, 100)
for ax in axes.flat:
    ax.plot(theta, np.abs(np.sin(theta * np.random.randint(1, 5))))
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import numpy as np

# gridspec_kw: control spacing between subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 8),
                         gridspec_kw={'hspace': 0.4,    # Vertical spacing
                                     'wspace': 0.3,     # Horizontal spacing
                                     'left': 0.1,       # Left margin
                                     'right': 0.95,     # Right margin
                                     'top': 0.95,       # Top margin
                                     'bottom': 0.1})    # Bottom margin

for i, ax in enumerate(axes.flat):
    ax.plot(np.random.rand(10))
    ax.set_title(f'Plot {i+1}')
plt.show()
import matplotlib.pyplot as plt

# squeeze parameter: control the shape of the returned axes
fig, axes = plt.subplots(1, 3)                  # squeeze=True (default): 1D array
print(axes.shape)  # (3,)

fig, axes = plt.subplots(1, 3, squeeze=False)   # Always a 2D array
print(axes.shape)  # (1, 3)

fig, ax = plt.subplots(1, 1, squeeze=True)      # Returns a single Axes object
print(type(ax))    # <class 'matplotlib.axes._axes.Axes'>
Pro Tip

Subplots for Multiple Charts

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

axes[0, 0].plot(x, y)
axes[0, 0].set_title('Line Plot')

axes[0, 1].scatter(x, y)
axes[0, 1].set_title('Scatter Plot')

axes[1, 0].hist(y, bins=20)
axes[1, 0].set_title('Histogram')

axes[1, 1].bar(['A', 'B', 'C'], [1, 2, 3])
axes[1, 1].set_title('Bar Chart')

plt.tight_layout()  # Prevent overlap
plt.show()

Plot Customization

Colors, Styles & Markers

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Color specifications
plt.figure(figsize=(10, 6))
plt.plot(x, y, color='#3B9797', linewidth=2, label='Hex color')
plt.plot(x, y - 0.5, color='teal', linewidth=2, label='Color name')
plt.plot(x, y - 1, color=(0.2, 0.4, 0.6), linewidth=2, label='RGB tuple')
plt.legend()
plt.title('Color Specifications')
plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Line styles
plt.figure(figsize=(10, 6))
plt.plot(x, y, linestyle='-', linewidth=2, label='Solid')
plt.plot(x, y - 0.5, linestyle='--', linewidth=2, label='Dashed')
plt.plot(x, y - 1, linestyle=':', linewidth=2, label='Dotted')
plt.plot(x, y - 1.5, linestyle='-.', linewidth=2, label='Dash-dot')
plt.legend()
plt.title('Line Styles')
plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 50)
y = np.sin(x)

# Markers
plt.figure(figsize=(10, 6))
plt.plot(x, y, marker='o', linewidth=2, markersize=6, label='Circles')
plt.plot(x, y - 0.5, marker='s', linewidth=2, markersize=6, label='Squares')
plt.plot(x, y - 1, marker='^', linewidth=2, markersize=6, label='Triangles')
plt.legend()
plt.title('Marker Styles')
plt.show()

Themes and Styles

import matplotlib.pyplot as plt

# Available styles
print(plt.style.available)
# Output varies by Matplotlib version, for example:
# ['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'petroff10', 'seaborn-v0_8', 'seaborn-v0_8-bright', 'seaborn-v0_8-colorblind', 'seaborn-v0_8-dark', 'seaborn-v0_8-dark-palette', 'seaborn-v0_8-darkgrid', 'seaborn-v0_8-deep', 'seaborn-v0_8-muted', 'seaborn-v0_8-notebook', 'seaborn-v0_8-paper', 'seaborn-v0_8-pastel', 'seaborn-v0_8-poster', 'seaborn-v0_8-talk', 'seaborn-v0_8-ticks', 'seaborn-v0_8-white', 'seaborn-v0_8-whitegrid', 'tableau-colorblind10']
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

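# Apply style globally (it stays active for later plots in the session;
# call plt.style.use('default') to restore the defaults)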
plt.style.use('seaborn-v0_8-whitegrid')
plt.plot(x, y)
plt.title('Seaborn Style')
plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Temporarily use style (context manager)
with plt.style.context('ggplot'):
    plt.plot(x, y)
    plt.title('ggplot Style (temporary)')
    plt.show()

Figure-Level Customization

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

# Create figure with custom size and DPI
fig = plt.figure(figsize=(12, 6), dpi=100, facecolor='white')

# Add axes at explicit positions [left, bottom, width, height] in figure fractions (0-1)
ax1 = fig.add_axes([0.1, 0.1, 0.8, 0.8])  # Main axes
ax2 = fig.add_axes([0.65, 0.65, 0.2, 0.2])  # Inset axes

ax1.plot(x, y, 'b-', linewidth=2)
ax1.set_title('Main Plot')
ax1.set_xlabel('X')
ax1.set_ylabel('Y')

ax2.plot(x, y**2, 'r-', linewidth=1.5)
ax2.set_title('Inset: y²', fontsize=10)

plt.show()
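Positions given to fig.add_axes() are fractions of the whole figure. For an inset that should stay attached to its parent plot, Axes.inset_axes() (available since Matplotlib 3.0) takes [left, bottom, width, height] as fractions of the parent Axes instead. A minimal sketch of the same inset built that way:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, 'b-', linewidth=2)
ax.set_title('Main Plot')

# Inset positioned in *axes* fractions, so it stays in the same corner
# of ax even if the surrounding figure layout changes
inset = ax.inset_axes([0.65, 0.65, 0.3, 0.3])
inset.plot(x, y**2, 'r-', linewidth=1.5)
inset.set_title('Inset: y²', fontsize=10)
```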

Axis Control

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.1, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, linewidth=2)

# Set axis limits
ax.set_xlim(0, 10)
ax.set_ylim(-1.5, 1.5)

# Grid customization
ax.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.7)
ax.minorticks_on()  # Enable minor ticks

ax.set_title('Axis Limits and Grid')
ax.set_xlabel('X')
ax.set_ylabel('Y')
plt.show()
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.1, 10, 100)
y = x**2

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, linewidth=2)

# Set axis scales
ax.set_xscale('log')    # Logarithmic x-axis
ax.set_yscale('log')    # Logarithmic y-axis

ax.set_title('Logarithmic Scales')
ax.set_xlabel('X (log scale)')
ax.set_ylabel('Y (log scale)')
ax.grid(True, alpha=0.3)
plt.show()
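Why does y = x² appear as a straight line on these axes? Taking logarithms gives log10(y) = 2 · log10(x), a line with slope 2 and intercept 0 in log-log coordinates. A quick numerical check with np.polyfit:

```python
import numpy as np

x = np.linspace(0.1, 10, 100)
y = x**2

# Fit a straight line to the log-transformed data:
# log10(y) = slope * log10(x) + intercept
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
print(f"slope = {slope:.3f}")  # slope = 2.000 (intercept is ~0)
```

More generally, any power law y = a · x^k plots as a straight line of slope k on log-log axes, which is a common way to spot such relationships in data.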
import matplotlib.pyplot as plt
import numpy as np

theta = np.linspace(0, 2*np.pi, 100)
x = np.cos(theta)
y = np.sin(theta)

fig, ax = plt.subplots(figsize=(8, 8))
ax.plot(x, y, linewidth=2)

# Equal aspect ratio for circle
ax.set_aspect('equal')
ax.set_title('Circle (equal aspect ratio)')
ax.grid(True, alpha=0.3)
plt.show()

Annotations and Text

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, linewidth=2)

# Add text at specific data coordinates (x=5, y=0.5)
ax.text(5, 0.5, 'Text at (5, 0.5)', fontsize=12, color='red', fontweight='bold')

# Add annotation with arrow
ax.annotate('Maximum', 
            xy=(np.pi/2, 1),           # Point to annotate
            xytext=(np.pi/2 + 1, 0.5), # Text location
            arrowprops=dict(arrowstyle='->', 
                           connectionstyle='arc3,rad=0.3',
                           color='red', lw=2),
            fontsize=12, 
            bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))

# Add horizontal/vertical lines
ax.axhline(y=0, color='k', linestyle='--', linewidth=0.8, label='y=0')
ax.axvline(x=np.pi, color='r', linestyle='--', linewidth=0.8, label='x=π')

# Shade regions
ax.axhspan(-0.5, 0.5, alpha=0.2, color='green', label='Middle range')
ax.axvspan(2, 4, alpha=0.2, color='blue', label='Region 2-4')

ax.set_title('Annotations, Lines, and Shaded Regions')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
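Positions passed to ax.text() and ax.annotate() are in data coordinates by default. To pin a label to a fixed place in the panel regardless of the data range, pass transform=ax.transAxes to text(), or textcoords='axes fraction' to annotate(); in these coordinates (0, 0) is the bottom-left and (1, 1) the top-right of the Axes. A minimal sketch:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, np.sin(x), linewidth=2)

# Axes-fraction coordinates: top-left corner, independent of the data limits
label = ax.text(0.02, 0.95, 'sin(x) on [0, 10]', transform=ax.transAxes,
                va='top', fontsize=12)

# annotate(): arrow tip in data coordinates, text in axes-fraction coordinates
note = ax.annotate('Minimum', xy=(3 * np.pi / 2, -1), xycoords='data',
                   xytext=(0.6, 0.15), textcoords='axes fraction',
                   arrowprops=dict(arrowstyle='->'))
```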

Multiple Y-Axes (Twinx)

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

fig, ax1 = plt.subplots(figsize=(10, 6))

# First y-axis (left)
ax1.plot(x, np.sin(x), 'b-', linewidth=2, label='sin(x)')
ax1.set_xlabel('X', fontsize=12)
ax1.set_ylabel('sin(x)', color='b', fontsize=12)
ax1.tick_params(axis='y', labelcolor='b')
ax1.grid(True, alpha=0.3)

# Second y-axis (right) - shares x-axis
ax2 = ax1.twinx()
ax2.plot(x, np.exp(x/5), 'r-', linewidth=2, label='exp(x/5)')
ax2.set_ylabel('exp(x/5)', color='r', fontsize=12)
ax2.tick_params(axis='y', labelcolor='r')

ax1.set_title('Dual Y-Axes Example', fontsize=14)
fig.tight_layout()
plt.show()
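The label= arguments above only show up with a legend, and each twinned Axes keeps its own legend entries. A minimal sketch that merges them into a single legend, drawn on ax2 (the upper Axes) so neither line covers it:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(x, np.sin(x), 'b-', label='sin(x)')
ax2 = ax1.twinx()
ax2.plot(x, np.exp(x / 5), 'r-', label='exp(x/5)')

# Collect handles/labels from both Axes and build one combined legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper left')

print(labels1 + labels2)  # ['sin(x)', 'exp(x/5)']
```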
Advanced Layout

GridSpec for Complex Layouts

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec

x = np.linspace(0, 10, 100)

fig = plt.figure(figsize=(12, 8))
gs = GridSpec(3, 3, figure=fig, hspace=0.3, wspace=0.3)

# Subplot spanning multiple cells
ax1 = fig.add_subplot(gs[0, :])    # Top row, all columns
ax2 = fig.add_subplot(gs[1, :-1])  # Middle row, first 2 columns
ax3 = fig.add_subplot(gs[1:, -1])  # Right column, last 2 rows
ax4 = fig.add_subplot(gs[-1, 0])   # Bottom left
ax5 = fig.add_subplot(gs[-1, 1])   # Bottom middle

ax1.plot(x, np.sin(x), linewidth=2)
ax1.set_title('Wide Plot (spans 3 columns)')
ax1.grid(True, alpha=0.3)

ax2.plot(x, np.cos(x), linewidth=2)
ax2.set_title('Main Plot (2 columns)')
ax2.grid(True, alpha=0.3)

ax3.hist(np.random.randn(1000), orientation='horizontal', bins=30)
ax3.set_title('Tall Plot (2 rows)')

ax4.scatter(np.random.rand(50), np.random.rand(50), alpha=0.6)
ax4.set_title('Scatter')
ax4.grid(True, alpha=0.3)

ax5.bar(['A', 'B', 'C'], [1, 2, 3])
ax5.set_title('Bar Chart')
ax5.grid(True, alpha=0.3, axis='y')

plt.show()
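The same layout can also be described as a picture with plt.subplot_mosaic() (added in Matplotlib 3.3), where each letter names one Axes and repeating a letter makes that Axes span several cells. A sketch reproducing the GridSpec layout above; the returned dict maps each letter to its Axes:

```python
import matplotlib.pyplot as plt
import numpy as np

# Rows of the 3x3 grid; 'A' spans the top row, 'C' spans the lower right column
mosaic = [['A', 'A', 'A'],
          ['B', 'B', 'C'],
          ['D', 'E', 'C']]
fig, axd = plt.subplot_mosaic(mosaic, figsize=(12, 8))

x = np.linspace(0, 10, 100)
axd['A'].plot(x, np.sin(x))
axd['A'].set_title('Wide Plot (spans 3 columns)')
axd['B'].plot(x, np.cos(x))
axd['C'].hist(np.random.randn(1000), orientation='horizontal', bins=30)
axd['D'].scatter(np.random.rand(50), np.random.rand(50), alpha=0.6)
axd['E'].bar(['A', 'B', 'C'], [1, 2, 3])

print(sorted(axd))  # ['A', 'B', 'C', 'D', 'E']
```

GridSpec slicing is more flexible for irregular spacing; the mosaic form is often easier to read when the layout is a simple grid of spans.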
Practice Exercises

Subplots & Customization Exercises

Exercise 1 (Beginner): Create a 2x2 grid of subplots. Plot sine, cosine, exponential, and logarithm functions. Add titles and labels to each subplot.

Exercise 2 (Beginner): Create subplots with different sizes: one large plot and three smaller plots. Use width_ratios or height_ratios. Apply a style.

Exercise 3 (Intermediate): Create 4 subplots with sharex and sharey. Explain how shared axes simplify zooming/panning. Create both row-wise and column-wise sharing.

Exercise 4 (Intermediate): Create a figure with custom colors, line styles, markers. Apply alpha transparency. Create custom colormaps and legends.

Challenge (Advanced): Create an inset plot (subplot within subplot). Use GridSpec for complex layouts. Customize every element (spines, ticks, labels).

Seaborn: Statistical Graphics

Seaborn builds on Matplotlib, providing beautiful defaults and high-level functions for statistical visualizations. It integrates seamlessly with Pandas DataFrames.

import seaborn as sns

# Set seaborn theme
sns.set_theme(style='whitegrid')

# Load sample dataset (downloaded on first use, then cached locally)
tips = sns.load_dataset('tips')
print(tips.head())
#    total_bill   tip     sex smoker  day    time  size
# 0       16.99  1.01  Female     No  Sun  Dinner     2
# 1       10.34  1.66    Male     No  Sun  Dinner     3
# ...

Why Seaborn?

  • ✅ Beautiful defaults (colors, fonts, spacing)
  • ✅ Built for Pandas DataFrames (column names as labels)
  • ✅ Statistical visualizations in one line
  • ✅ Automatic legends and color schemes
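"Built for Pandas DataFrames" works best with long-form (tidy) data: one row per observation and one column per variable, so each column can be mapped to a plot role such as x, y, or hue. A minimal sketch converting a made-up wide table with DataFrame.melt(); the seaborn call is left as a comment so the snippet runs with pandas alone:

```python
import pandas as pd

# Wide form: one column per product (made-up numbers for illustration)
wide = pd.DataFrame({
    'week': [1, 2, 3],
    'Product_A': [10, 12, 15],
    'Product_B': [8, 14, 16],
})

# Long form: one row per (week, product) observation
tidy = wide.melt(id_vars='week', var_name='product', value_name='sales')
print(tidy.shape)  # (6, 3)

# With seaborn, each column now maps directly onto a plot role, e.g.:
# sns.lineplot(data=tidy, x='week', y='sales', hue='product')
```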

Seaborn Datasets: Load and Explore

Seaborn provides built-in sample datasets perfect for learning and testing visualizations:

import seaborn as sns

# Get list of all available datasets (requires an internet connection)
available_datasets = sns.get_dataset_names()
print("Available datasets:")
print(available_datasets)
# ['anagrams', 'anagrams_long', 'answer_keys', 'attention', 'brain_networks', 
#  'car_crashes', 'diamonds', 'dots', 'dowjones', 'exercise', 'flights', 
#  'fmri', 'gammas', 'geyser', 'glue', 'healthexp', 'iris', 'penguins', 'planets', 
#  'taxis', 'titanic', 'tips', ...]
import seaborn as sns
import pandas as pd

# Load a specific dataset
iris = sns.load_dataset('iris')
iris.info()  # info() prints its summary itself (and returns None), so no print()
print("\nDataset shape:", iris.shape)
print(iris.describe())

# Another popular dataset
titanic = sns.load_dataset('titanic')
print("Titanic columns:", titanic.columns.tolist())
print("Titanic shape:", titanic.shape)
import seaborn as sns
import pandas as pd

# Load and explore diamonds dataset
diamonds = sns.load_dataset('diamonds')
print("Diamond columns:", diamonds.columns.tolist())
#  ['carat', 'cut', 'color', 'clarity', 'depth', 'table', 'price', 'x', 'y', 'z']
print("Data types:")
print(diamonds.dtypes)
print("\nSample rows:")
print(diamonds.head())

Visualizing Distributions

Histograms and KDE Plots

import matplotlib.pyplot as plt
import seaborn as sns

# Load sample dataset
tips = sns.load_dataset('tips')

# Histogram with KDE overlay
sns.histplot(tips['total_bill'], bins=20, kde=True)
plt.title('Distribution of Total Bill')
plt.xlabel('Total Bill ($)')
plt.ylabel('Frequency')
plt.show()

KDE (Kernel Density Estimate) Plots - Detailed Parameters

kdeplot() creates smooth probability density curves. Key parameters for customization:

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Basic KDE plot
sns.kdeplot(data=tips, x='total_bill')
plt.title('KDE Plot: Total Bill Distribution')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# KDE with filled area and custom color
sns.kdeplot(data=tips, x='total_bill', 
            fill=True,              # Fill under curve (shade=True in older versions)
            color='teal',           # Line/fill color
            linewidth=2.5,          # Line thickness
            alpha=0.6)              # Transparency
plt.title('KDE with Custom Styling')
plt.xlabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Bandwidth adjustment (controls smoothness)
fig, axes = plt.subplots(1, 3, figsize=(14, 4))

for ax, bw in zip(axes, [0.2, 0.5, 1.0]):
    sns.kdeplot(data=tips, x='total_bill', bw_adjust=bw, 
                fill=True, color='teal', ax=ax)
    ax.set_title(f'Bandwidth Adjust: {bw}')
    ax.set_xlabel('Total Bill ($)')

plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Multiple distributions by category (hue parameter)
sns.kdeplot(data=tips, x='total_bill', hue='sex',
            fill=True,              # Fill curves
            common_norm=False)      # Each hue normalized separately
plt.title('Total Bill Distribution by Gender')
plt.xlabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# 2D KDE (bivariate density)
sns.kdeplot(data=tips, x='total_bill', y='tip',
            fill=True,              # Fill contours
            cmap='viridis',         # Color map for contour levels
            levels=10)              # Number of contour levels
plt.title('2D Density: Bill vs Tip')
plt.show()
KDE Parameters Summary:
  • data: DataFrame containing the data
  • x, y: Column names for axes (y optional for 2D KDE)
  • hue: Column for grouping by color
  • fill: Boolean to fill area under curve
  • bw_adjust: Multiplier on the default bandwidth (1.0 = default; below 1 shows more detail, above 1 smooths more)
  • color/palette: Line/fill color or color palette
  • linewidth: Thickness of KDE line
  • alpha: Transparency (0=transparent, 1=opaque)
  • cmap: Colormap for 2D density (for bivariate plots)
  • levels: Number of contour lines (for 2D)
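bw_adjust is easiest to understand by computing a KDE by hand. The sketch below places a Gaussian bump on every observation and averages them; the bump width starts from Scott's rule of thumb (sample standard deviation × n^(-1/5)) and is multiplied by bw_adjust. This is believed to mirror how kdeplot scales its default bandwidth, but it is an illustration rather than seaborn's code, and the sample is synthetic.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bw_adjust=1.0):
    """Minimal 1D Gaussian KDE (illustrative sketch, not seaborn's implementation)."""
    data = np.asarray(data, dtype=float)
    # Scott's rule of thumb for 1D data, scaled by bw_adjust
    bandwidth = data.std(ddof=1) * data.size ** (-1 / 5) * bw_adjust
    # Standardized distance from every grid point to every observation
    z = (grid[:, None] - data[None, :]) / bandwidth
    # Average of Gaussian kernels centred on the observations
    density = np.exp(-0.5 * z**2).mean(axis=1) / (bandwidth * np.sqrt(2 * np.pi))
    return density, bandwidth

rng = np.random.default_rng(0)
sample = rng.normal(loc=20, scale=8, size=200)   # synthetic "total bill" values
grid = np.linspace(-20, 60, 2001)
step = grid[1] - grid[0]

results = {}
for bw in (0.2, 0.5, 1.0):
    density, h = gaussian_kde_1d(sample, grid, bw_adjust=bw)
    area = density.sum() * step      # Riemann sum; should be close to 1
    results[bw] = (h, area)
    print(f"bw_adjust={bw}: bandwidth={h:.2f}, area={area:.3f}")
```

Every setting still integrates to about 1 (it is a probability density); only the width of the bumps changes, which is why small bw_adjust values look spiky and large ones look smooth.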

Box Plots - Detailed Parameters & Customization

boxplot() shows quartiles, median, and outliers. Essential for statistical comparison.

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Basic box plot (shows median, Q1, Q3, whiskers, outliers)
sns.boxplot(data=tips, x='day', y='total_bill')
plt.title('Bill Distribution by Day of Week')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Box plot with hue (grouping variable)
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex',
            palette='Set2')          # Color palette
plt.title('Bill Distribution by Day and Gender')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Detailed customization
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex',
            palette='husl',         # Color palette
            width=0.6,              # Box width (0-1)
            linewidth=2,            # Line thickness
            fliersize=8,            # Outlier marker size
            dodge=True,             # Separate boxes by hue
            showmeans=True,         # Show mean point
            meanprops=dict(marker='D', markerfacecolor='red', 
                          markersize=8, markeredgecolor='black'))
plt.title('Bill Distribution (Customized)')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Violin plot (box + distribution shape)
sns.violinplot(data=tips, x='day', y='total_bill', hue='sex', 
               split=False,         # False=separate violins per hue; True=one split violin per category (2 hue levels)
               palette='muted',     # Soft color palette
               inner='quartile',    # Shows quartiles ('box', 'point', 'stick', None)
               cut=0,               # Clip density to the observed data range
               linewidth=2)
plt.title('Bill Distribution by Day and Gender (Violin)')
plt.ylabel('Total Bill ($)')
plt.show()
Box Plot Parameters Summary:
  • data: DataFrame
  • x, y: Column names (y is numeric for box plot)
  • hue: Column for grouping/coloring
  • palette: Color palette ('Set2', 'husl', 'pastel', etc.)
  • width: Box width (0-1, default 0.8)
  • linewidth: Border line thickness
  • fliersize: Outlier marker size
  • showmeans: Boolean to show mean point
  • meanprops: Dict customizing mean marker appearance
  • dodge: Separate boxes by hue or overlap
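The box spans the first to third quartile (Q1 to Q3) with a line at the median. With the default whis=1.5, the whiskers reach the most extreme data points within 1.5 × IQR of the box (IQR = Q3 - Q1), and anything beyond is drawn as an outlier. Worked by hand with NumPy on a small made-up sample:

```python
import numpy as np

# Made-up sample with one obvious outlier
data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 40])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Whiskers reach the most extreme points still inside the fences
whisker_low = data[data >= lower_fence].min()
whisker_high = data[data <= upper_fence].max()
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, iqr)          # 3.25 5.5 7.75 4.5
print(whisker_low, whisker_high)    # 1 9
print(outliers)                     # [40]
```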

Additional Distribution Plots: Strip and Swarm

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Strip plot (scatter plot with jitter for categorical x)
sns.stripplot(data=tips, x='day', y='total_bill', hue='sex',
              size=8,               # Point size
              jitter=True,          # Add random jitter to avoid overlap
              palette='Set1')
plt.title('Individual Points by Day (Strip Plot)')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Swarm plot (strip plot with smart separation to avoid overlap)
sns.swarmplot(data=tips, x='day', y='total_bill', hue='sex',
              size=7,               # Point size
              palette='husl',
              dodge=True)           # Separate by hue
plt.title('Non-overlapping Points by Day (Swarm Plot)')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Combine violin plot with strip plot (show distribution + raw data)
fig, ax = plt.subplots(figsize=(10, 6))

sns.violinplot(data=tips, x='day', y='total_bill', hue='sex',
               palette='muted', ax=ax)
sns.stripplot(data=tips, x='day', y='total_bill', hue='sex',
              size=5, alpha=0.4, palette='dark', dodge=True,
              legend=False, ax=ax)  # legend=False avoids duplicate legend entries

plt.title('Violin Plot with Raw Data Points')
plt.ylabel('Total Bill ($)')
plt.show()

Relationships & Correlations

Scatter Plots

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Basic scatter
sns.scatterplot(data=tips, x='total_bill', y='tip')
plt.title('Total Bill vs Tip')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# With categories (color, style, size by different variables)
sns.scatterplot(data=tips, x='total_bill', y='tip', 
                hue='sex', style='smoker', size='size')
plt.title('Bill vs Tip Analysis (multi-dimensional)')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()

Regression Plots

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Scatter with regression line
sns.regplot(data=tips, x='total_bill', y='tip')
plt.title('Linear Regression: Bill vs Tip')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Linear model with confidence interval (by category)
sns.lmplot(data=tips, x='total_bill', y='tip', hue='sex', height=6)
plt.show()
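regplot() fits an ordinary least-squares straight line (order=1 by default) and shades a bootstrapped confidence band around it, but it returns only the Axes, not the fitted coefficients. One way to get them is np.polyfit on the same columns. A sketch on synthetic data generated from an assumed line with slope 0.15 and intercept 1.0:

```python
import numpy as np

rng = np.random.default_rng(42)
bill = rng.uniform(5, 50, size=100)                 # synthetic bills
tip = 1.0 + 0.15 * bill + rng.normal(0, 0.5, 100)   # assumed line plus noise

# Ordinary least squares fit of a degree-1 polynomial: tip ~ slope * bill + intercept
slope, intercept = np.polyfit(bill, tip, 1)
print(f"tip = {slope:.3f} * bill + {intercept:.3f}")
```

With the real tips data, np.polyfit(tips['total_bill'], tips['tip'], 1) gives the coefficients of the line drawn above.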

Correlation Heatmaps

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Compute correlation matrix
corr = tips[['total_bill', 'tip', 'size']].corr()

# Heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0, vmin=-1, vmax=1, 
            square=True, linewidths=1)
plt.title('Correlation Matrix')
plt.show()
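DataFrame.corr() computes Pearson correlation coefficients by default, and these always lie between -1 and 1; that is why the heatmap above fixes vmin=-1, vmax=1, and center=0, so a given color means the same strength of correlation in every plot. A small check on synthetic data that pandas agrees with np.corrcoef:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
df = pd.DataFrame({'a': rng.normal(size=50)})
df['b'] = 2 * df['a'] + rng.normal(scale=0.5, size=50)   # strongly related to a
df['c'] = rng.normal(size=50)                            # unrelated noise

corr = df.corr()                         # Pearson correlation by default
manual = np.corrcoef(df.to_numpy().T)    # same computation in NumPy (rows = variables)
print(np.allclose(corr.to_numpy(), manual))  # True
print(corr.round(2))
```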

Pair Plots

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# All pairwise relationships
sns.pairplot(tips, hue='sex', diag_kind='kde')
plt.show()

Categorical Data Plots

Count Plots

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Count by category
sns.countplot(data=tips, x='day', hue='sex')
plt.title('Customer Count by Day')
plt.xlabel('Day of Week')
plt.ylabel('Count')
plt.show()

Bar Plots (with aggregation)

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

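# Mean with 95% confidence interval (errorbar=('ci', 95) is the default in
# seaborn 0.12+; the older ci=95 argument is deprecated)
sns.barplot(data=tips, x='day', y='total_bill', hue='sex', errorbar=('ci', 95))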
plt.title('Average Bill by Day and Gender')
plt.xlabel('Day of Week')
plt.ylabel('Average Total Bill ($)')
plt.show()
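With `errorbar=('ci', 95)`, Seaborn estimates the interval around each mean by bootstrapping. It resamples the rows with replacement many times, recomputes the mean each time, and takes the 2.5th and 97.5th percentiles of those means. The sketch below reproduces that idea with NumPy for a single bar. It is a simplified illustration rather than Seaborn's exact code, and the bill amounts are made up.

```python
import numpy as np

# Made-up bill amounts for one bar (one day/sex combination)
bills = np.array([16.99, 10.34, 21.01, 23.68, 24.59, 25.29, 8.77, 26.88, 15.04, 14.78])

rng = np.random.default_rng(42)
n_boot = 1000  # number of bootstrap resamples

# Resample with replacement and record the mean of each resample
boot_means = np.array([
    rng.choice(bills, size=bills.size, replace=True).mean()
    for _ in range(n_boot)
])

mean = bills.mean()
low, high = np.percentile(boot_means, [2.5, 97.5])
print(f'mean = {mean:.2f}, 95% CI ≈ [{low:.2f}, {high:.2f}]')
```

Because the interval comes from random resampling, the error bars can shift slightly between runs. Passing `seed=` to `sns.barplot` makes them reproducible.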

Point Plots

import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset('tips')

# Show point estimates and confidence intervals
sns.pointplot(data=tips, x='day', y='tip', hue='sex')
plt.title('Average Tip by Day and Gender')
plt.xlabel('Day of Week')
plt.ylabel('Average Tip ($)')
plt.show()
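Each point in a `pointplot` is the mean of `y` for one `x` category and `hue` level. The connecting lines join points that share the same `hue` level. A `groupby` followed by `unstack` produces the same grid of means, with one column per plotted line. The rows below are made up.

```python
import pandas as pd

# Made-up example rows
df = pd.DataFrame({
    'day': ['Fri', 'Fri', 'Sat', 'Sat', 'Sat', 'Sun', 'Sun', 'Sun'],
    'sex': ['Male', 'Female', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
    'tip': [3.0, 2.5, 4.0, 2.0, 3.5, 3.0, 5.0, 4.0],
})

# Rows: x categories; columns: hue levels (one line each); values: mean tip
means = df.groupby(['day', 'sex'])['tip'].mean().unstack('sex')
print(means)
```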
Real-World Example

Complete Analysis Workflow

import matplotlib.pyplot as plt
import seaborn as sns

# Load data
df = sns.load_dataset('iris')

# 1. Distribution of features
fig, axes = plt.subplots(2, 2, figsize=(12, 10))

sns.histplot(df['sepal_length'], kde=True, ax=axes[0,0])
axes[0,0].set_title('Sepal Length Distribution')

sns.histplot(df['sepal_width'], kde=True, ax=axes[0,1])
axes[0,1].set_title('Sepal Width Distribution')

sns.histplot(df['petal_length'], kde=True, ax=axes[1,0])
axes[1,0].set_title('Petal Length Distribution')

sns.histplot(df['petal_width'], kde=True, ax=axes[1,1])
axes[1,1].set_title('Petal Width Distribution')

plt.suptitle('Iris Dataset: Feature Distributions', fontsize=16, y=1.00)
plt.tight_layout()
plt.show()

import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('iris')

# 2. Pairwise relationships
sns.pairplot(df, hue='species', height=2.5)
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns

df = sns.load_dataset('iris')

# 3. Correlation heatmap
corr = df.drop('species', axis=1).corr()
sns.heatmap(corr, annot=True, cmap='viridis', square=True, linewidths=1)
plt.title('Iris Dataset: Feature Correlations')
plt.show()
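Dropping `species` matters because pandas 2.0 and later raise an error when `.corr()` meets a non-numeric column. An alternative is `numeric_only=True`, which skips such columns automatically. The sketch below uses a few sample rows in the iris layout and shows that both routes give the same matrix.

```python
import numpy as np
import pandas as pd

# A few sample rows in the iris column layout
df = pd.DataFrame({
    'sepal_length': [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    'sepal_width':  [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    'petal_length': [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    'petal_width':  [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    'species': ['setosa', 'setosa', 'versicolor', 'versicolor', 'virginica', 'virginica'],
})

corr_drop = df.drop('species', axis=1).corr()
corr_numeric = df.corr(numeric_only=True)  # skips the non-numeric 'species' column

print(np.allclose(corr_drop, corr_numeric))
```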

Best Practices & Next Steps

Visualization Best Practices

  • Choose the right chart: Bar for comparisons, line for trends, scatter for relationships
  • Label everything: Title, axis labels, legends, units
  • Use color purposefully: Color should encode meaning (a category or a value), not just decorate
  • Avoid clutter: Remove unnecessary gridlines, borders, and decoration that carries no information
  • Consider accessibility: Colorblind-safe palettes, sufficient contrast
  • Start bar charts at zero: Bar length encodes the value, so a truncated axis exaggerates differences
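Several of these guidelines translate directly into code. Seaborn ships a `'colorblind'` palette, a bar chart's value axis can be pinned to start at zero, and non-essential borders can be hidden. The sketch below combines these using illustrative averages, not values computed from a dataset.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Illustrative averages (not computed from a dataset)
days = ['Thur', 'Fri', 'Sat', 'Sun']
avg_bill = [17.7, 17.2, 20.4, 21.4]

colors = sns.color_palette('colorblind', n_colors=len(days))  # colorblind-safe colors

fig, ax = plt.subplots()
ax.bar(days, avg_bill, color=colors)
ax.set_ylim(bottom=0)  # bar length encodes value, so keep the baseline at zero
ax.set_title('Average Total Bill by Day')
ax.set_xlabel('Day of Week')
ax.set_ylabel('Average Total Bill ($)')
for side in ('top', 'right'):
    ax.spines[side].set_visible(False)  # drop borders that carry no information
plt.show()
```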

When to Use What

Quick Reference
Goal | Use
Compare categories | Bar chart
Show trend over time | Line chart
Distribution shape | Histogram, KDE, violin
Outlier detection | Box plot
Relationship between variables | Scatter plot
Correlation matrix | Heatmap
Part-of-whole | Pie chart (use sparingly!)

Saving Figures

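import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 3])  # any plot; placeholder data

# Save before plt.show(): once shown, the figure may be closed and the saved file blank
# Save as PNG (raster, for web/presentations)
plt.savefig('figure.png', dpi=300, bbox_inches='tight')

# Save as PDF (vector, for publications)
plt.savefig('figure.pdf', bbox_inches='tight')

# Save as SVG (vector, scalable)
plt.savefig('figure.svg', bbox_inches='tight')
plt.show()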
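The sketch below checks that exports actually reach the disk. A hypothetical helper, `export_figure`, saves one figure in several formats into a temporary folder. Reading the PNG back shows that its pixel size equals the figure size in inches times `dpi`. The example deliberately omits `bbox_inches='tight'`, which would crop the image and change its size.

```python
import os
import tempfile

import matplotlib.image as mpimg
import matplotlib.pyplot as plt

# Hypothetical helper: save fig once per format, return the paths written
def export_figure(fig, basename, formats=('png', 'pdf', 'svg'), dpi=300):
    paths = []
    for ext in formats:
        path = f'{basename}.{ext}'
        fig.savefig(path, dpi=dpi)
        paths.append(path)
    return paths

fig, ax = plt.subplots(figsize=(4, 3))  # 4 x 3 inches
ax.plot([0, 1, 2], [1, 3, 2])

out_dir = tempfile.mkdtemp()
paths = export_figure(fig, os.path.join(out_dir, 'figure'))

png = mpimg.imread(paths[0])
print(png.shape[:2])  # (900, 1200): 3 in x 300 dpi tall, 4 in x 300 dpi wide
```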
What's Next: In Part 4: Machine Learning with Scikit-learn, you'll apply everything learned—NumPy for computation, Pandas for data prep, and visualizations to understand results—to build predictive models.
Practice Exercises

Seaborn & Advanced Visualization Exercises

Exercise 1 (Beginner): Load a Seaborn dataset (tips, iris, or flights). Create box plots, violin plots, and bar plots, using the hue parameter for grouping.

Exercise 2 (Beginner): Create a correlation heatmap from a DataFrame. Customize colormap, annotations, and layout. Experiment with different cmaps (coolwarm, viridis, RdBu).

Exercise 3 (Intermediate): Create a pairplot from a dataset. Customize the diagonal plots (histogram vs KDE). Add a hue variable to show group differences, and explain what patterns emerge.

Exercise 4 (Intermediate): Create categorical plots (count, bar, point, strip). Combine Matplotlib and Seaborn styling. Use FacetGrid for multi-plot grids by category.

Challenge (Advanced): Create a complex multi-plot dashboard combining multiple Seaborn plots. Use GridSpec or figure-level functions. Save as high-quality PNG/PDF for publication.

Matplotlib & Seaborn API Cheat Sheet

Quick reference for creating compelling data visualizations in Python.

Matplotlib Basics
plt.plot(x, y) | Line plot
plt.scatter(x, y) | Scatter plot
plt.bar(x, y) | Bar chart
plt.hist(data, bins=20) | Histogram
plt.pie(sizes, labels=labels) | Pie chart
plt.boxplot(data) | Box plot
plt.imshow(img) | Display image
plt.show() | Display figure
Customization
plt.title('Title') | Set title
plt.xlabel('X') | X-axis label
plt.ylabel('Y') | Y-axis label
plt.legend() | Show legend
plt.grid(True) | Add grid
plt.xlim(0, 10) | Set x limits
plt.ylim(0, 10) | Set y limits
color='red' | Set color
Subplots
fig, ax = plt.subplots() | Single subplot
fig, axes = plt.subplots(2, 3) | 2×3 grid
ax.plot(x, y) | Plot on axes
ax.set_title('Title') | Axes title
ax.set_xlabel('X') | Axes x-label
plt.tight_layout() | Auto-adjust spacing
sharex=True | Share x-axis
figsize=(10, 6) | Figure size
Seaborn Plots (seaborn 0.12+ requires x= and y= as keywords)
sns.scatterplot(data=df, x='x', y='y') | Scatter plot
sns.lineplot(data=df, x='x', y='y') | Line plot
sns.barplot(data=df, x='x', y='y') | Bar plot
sns.boxplot(data=df, x='x', y='y') | Box plot
sns.violinplot(data=df, x='x', y='y') | Violin plot
sns.heatmap(data, annot=True) | Heatmap
sns.pairplot(df) | Pairwise plots
sns.regplot(data=df, x='x', y='y') | Regression plot
Styling
plt.style.use('ggplot') | Apply style
sns.set_theme() | Seaborn theme
sns.set_palette('husl') | Color palette
linestyle='--' | Dashed line
marker='o' | Circle markers
linewidth=2 | Line thickness
alpha=0.5 | Transparency
label='Data' | Legend label
Saving Figures
plt.savefig('plot.png') | Save as PNG
plt.savefig('plot.pdf') | Save as PDF
plt.savefig('plot.svg') | Save as SVG
dpi=300 | High resolution
bbox_inches='tight' | Trim whitespace
transparent=True | Transparent background
facecolor='white' | Background color
Pro Tips:
  • Object-oriented API: Use fig, ax = plt.subplots() for better control
  • Seaborn integration: Axes-level Seaborn functions return a Matplotlib Axes, so ax.set_title() and other Matplotlib customization still apply
  • Format strings: 'ro-' = red circles with solid line
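  • Notebook display: %matplotlib inline renders static images in Jupyter (the default in recent versions); for interactive, zoomable figures use %matplotlib widget, which requires the ipympl package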
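A format string is shorthand for three keyword arguments. The sketch below draws the same line both ways and compares the resulting `Line2D` properties. The shorthand stores the color as `'r'` rather than `'red'`, so the comparison converts both to RGBA with `matplotlib.colors.to_rgba` first.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgba

x = [1, 2, 3, 4]
y = [1, 4, 2, 3]

fig, ax = plt.subplots()
# Shorthand: 'r' = red, 'o' = circle markers, '-' = solid line
(short_line,) = ax.plot(x, y, 'ro-')
# Equivalent keyword arguments
(kw_line,) = ax.plot(x, y, color='red', marker='o', linestyle='-')

for line in (short_line, kw_line):
    print(to_rgba(line.get_color()), line.get_marker(), line.get_linestyle())
plt.show()
```

Keyword arguments are more verbose but self-documenting, and they support values a format string cannot express, such as hex colors or custom dash patterns.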