Introduction to Data Visualization
After loading and analyzing data with NumPy and Pandas, the next step is visualization—transforming numbers into visual insights that communicate patterns, trends, and anomalies.
Why Visualization Matters: "A picture is worth a thousand rows." A well-chosen chart lets you spot patterns, trends, and outliers at a glance that could take far longer to find by scanning raw tables of numbers.
The Visualization Ecosystem
- Matplotlib: Low-level, highly customizable plotting library (the foundation)
- Seaborn: High-level interface built on Matplotlib for statistical graphics
- Other tools: Plotly (interactive), Bokeh (web), Altair (declarative)
# Installation (run this in a terminal, not inside Python)
pip install matplotlib seaborn
# Import convention
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
Matplotlib Fundamentals
Matplotlib provides complete control over every element of a plot. It has two interfaces:
- pyplot (MATLAB-style): Quick plotting with plt.plot()
- Object-oriented (OO): Explicit figure and axes objects (recommended)
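To make the difference between the two interfaces concrete, here is a minimal sketch (not from the original text) that draws the same curve both ways. Instead of displaying the figures, it inspects the resulting Axes objects; add plt.show() to see them.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 50)

# pyplot style: each call acts on the "current" Axes that pyplot tracks implicitly
plt.figure()
plt.plot(x, np.sin(x))
plt.title("pyplot interface")
implicit_ax = plt.gca()  # retrieve the Axes pyplot drew on

# Object-oriented style: Figure and Axes are explicit variables you call methods on
fig, ax = plt.subplots()
ax.plot(x, np.sin(x))
ax.set_title("OO interface")

print(implicit_ax.get_title(), len(implicit_ax.lines))  # pyplot interface 1
print(ax.get_title(), len(ax.lines))                    # OO interface 1
```

Both routes end in the same kind of Axes object; the OO style just keeps the reference in your hands, which matters once a figure has several subplots.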
Basic Plots
import matplotlib.pyplot as plt
import numpy as np
# Line plot
x = np.linspace(0, 10, 100)
y = np.sin(x)
plt.plot(x, y)
plt.title('Sine Wave')
plt.xlabel('X axis')
plt.ylabel('Y axis')
plt.grid(True)
plt.show()
# Scatter plot
plt.scatter(x, y, alpha=0.5)
plt.show()
# Bar chart
categories = ['A', 'B', 'C', 'D']
values = [25, 40, 30, 55]
plt.bar(categories, values, color='teal')
plt.show()
Understanding plt.plot() Parameters
The plot() function is highly flexible with many parameters for customization:
# Complete signature:
# plot([x], y, [fmt], *, data=None, **kwargs)
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Basic usage - x and y explicitly
plt.plot(x, y)
plt.title('Basic: x and y explicitly')
plt.xlabel('X')
plt.ylabel('Y')
plt.grid(True, alpha=0.3)
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Format strings: documented as '[marker][line][color]'; other orders such as 'ro-' ([color][marker][line]) also work
plt.plot(x, y, 'ro-', label='red circles, solid')
plt.plot(x, y - 0.5, 'g^--', label='green triangles, dashed')
plt.plot(x, y - 1, 'bs:', label='blue squares, dotted')
plt.title('Format Strings: [marker][line][color]')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Keyword arguments (full customization)
plt.plot(x, y,
color='#3B9797', # Hex color
linestyle='--', # Dashed line
linewidth=2.5, # Line thickness
marker='o', # Circle markers
markersize=8, # Marker size
markerfacecolor='red', # Marker fill color
markeredgecolor='black', # Marker edge color
markeredgewidth=1.5, # Marker edge thickness
alpha=0.7, # Transparency (0-1)
label='Custom styled') # Legend label
plt.title('Keyword Arguments: Full Customization')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Format String Quick Reference:
- Colors: 'b' (blue), 'g' (green), 'r' (red), 'c' (cyan), 'm' (magenta), 'y' (yellow), 'k' (black), 'w' (white)
- Markers: 'o' (circle), 's' (square), '^' (triangle up), 'v' (triangle down), '*' (star), '+' (plus), 'x' (x), 'D' (diamond)
- Lines: '-' (solid), '--' (dashed), '-.' (dash-dot), ':' (dotted)
- Example: 'ro--' = red circles with dashed line
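A format string is shorthand for the color, marker, and linestyle keyword arguments. This illustrative sketch draws one line each way and compares the resulting Line2D properties; the vertical offset of the second line is arbitrary.

```python
import matplotlib.pyplot as plt
from matplotlib.colors import same_color
import numpy as np

x = np.arange(5)
fig, ax = plt.subplots()

# 'ro--' packs color, marker, and linestyle into one string
(fmt_line,) = ax.plot(x, x, "ro--")
# The same styling spelled out with keyword arguments
(kw_line,) = ax.plot(x, x + 1, color="r", marker="o", linestyle="--")

for line in (fmt_line, kw_line):
    print(line.get_marker(), line.get_linestyle())  # o --

# same_color() compares colors regardless of how they were written ('r', 'red', RGB tuple)
print(same_color(fmt_line.get_color(), kw_line.get_color()))  # True
```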
Plotting NumPy Arrays
import matplotlib.pyplot as plt
import numpy as np
# Direct plotting from NumPy arrays
x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])
plt.plot(x, y, 'bo-', linewidth=2, label='y = 2x')
plt.title('Plotting NumPy Arrays')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()
plt.grid(True)
plt.show()
Plotting Pandas Series
import matplotlib.pyplot as plt
import pandas as pd
# Create a Pandas Series
s = pd.Series([10, 15, 13, 17, 20],
index=['Mon', 'Tue', 'Wed', 'Thu', 'Fri'],
name='Temperature')
# Plot Series directly
plt.plot(s, marker='o', linestyle='--', color='red')
plt.title('Plotting Pandas Series')
plt.xlabel('Day')
plt.ylabel('Temperature (°C)')
plt.grid(True, alpha=0.3)
plt.show()
import matplotlib.pyplot as plt
import pandas as pd
# Multiple Series from DataFrame columns
df = pd.DataFrame({
'sales': [100, 150, 120, 200, 180],
'profit': [20, 35, 25, 50, 40],
}, index=['Q1 2023', 'Q2 2023', 'Q3 2023', 'Q4 2023', 'Q1 2024'])
# Plot multiple columns
plt.plot(df.index, df['sales'], 'o-', label='Sales', linewidth=2)
plt.plot(df.index, df['profit'], 's--', label='Profit', linewidth=2)
plt.title('Sales vs Profit by Quarter')
plt.xlabel('Quarter')
plt.ylabel('Amount ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
Plotting Pandas DataFrames
import matplotlib.pyplot as plt
import pandas as pd
# DataFrame with numeric columns
df = pd.DataFrame({
'Product_A': [10, 12, 15, 20, 18],
'Product_B': [8, 14, 16, 19, 22],
'Product_C': [5, 9, 11, 15, 20]
}, index=['Week 1', 'Week 2', 'Week 3', 'Week 4', 'Week 5'])
# Plot all columns (each column becomes a line)
plt.figure(figsize=(10, 6))
for column in df.columns:
    plt.plot(df.index, df[column], marker='o', label=column, linewidth=2)
plt.title('Product Sales Over 5 Weeks')
plt.xlabel('Week')
plt.ylabel('Sales')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import pandas as pd
# Using DataFrame.plot() method (convenient for DataFrame visualization)
df = pd.DataFrame({
'Open': [100, 102, 101, 105, 107],
'Close': [102, 101, 104, 106, 108],
}, index=['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5'])
# Built-in DataFrame plotting
df.plot(kind='line', marker='o', figsize=(10, 6))
plt.title('Stock Prices: Open vs Close')
plt.ylabel('Price ($)')
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
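DataFrame.plot() returns the Matplotlib Axes it draws on, and its ax= parameter lets you target an existing Axes, which is how pandas plots fit into subplot grids. A small sketch reusing the Open/Close data above; the second panel (day-over-day change in Close) is an illustrative addition.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Open': [100, 102, 101, 105, 107],
    'Close': [102, 101, 104, 106, 108],
}, index=['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5'])

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 4))

# ax= tells pandas to draw into an existing Axes instead of creating a new figure
returned_ax = df.plot(ax=ax_left, marker='o', title='Open vs Close')
df['Close'].diff().dropna().plot(ax=ax_right, kind='bar', title='Daily change in Close')

print(returned_ax is ax_left)  # True: pandas returns the Axes it drew on
print([t.get_text() for t in ax_left.get_legend().get_texts()])  # ['Open', 'Close']
```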
Plotting Multiple Datasets
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
# Method 1: Multiple calls
plt.plot(x, np.sin(x), 'b-', label='sin(x)')
plt.plot(x, np.cos(x), 'r--', label='cos(x)')
plt.plot(x, np.tan(x), 'g:', label='tan(x)')
plt.legend()
plt.ylim(-2, 2) # Limit y-axis for better visualization
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
# Method 2: Single call with multiple datasets
plt.plot(x, np.sin(x), 'b-',
x, np.cos(x), 'r--',
x, np.tan(x), 'g:')
plt.ylim(-2, 2)
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
# Method 3: 2D arrays (each column is a dataset)
y_multi = np.column_stack([np.sin(x), np.cos(x), np.tan(x)])
plt.plot(x, y_multi)
plt.ylim(-2, 2)
plt.show()
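With Method 3, plot() returns one Line2D per column, and each line takes the next color from the property cycle. Because a plain 2D array carries no names, this sketch attaches legend labels afterwards (the label strings are illustrative).

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y_multi = np.column_stack([np.sin(x), np.cos(x)])

fig, ax = plt.subplots()
# plot() returns a list with one Line2D per column of the 2D array
lines = ax.plot(x, y_multi)
print(len(lines))  # 2

# Attach a label to each line so the legend can identify the columns
for line, name in zip(lines, ['sin(x)', 'cos(x)']):
    line.set_label(name)
ax.legend()

# Lines without an explicit color get successive colors from the active style's cycle
print([line.get_color() for line in lines])
```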
Object-Oriented Interface (Preferred)
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create figure and axes
fig, ax = plt.subplots(figsize=(10, 6))
# Plot on axes
ax.plot(x, y, label='sin(x)', color='blue', linewidth=2)
ax.plot(x, np.cos(x), label='cos(x)', color='red', linestyle='--')
# Customize
ax.set_title('Trigonometric Functions', fontsize=16)
ax.set_xlabel('X', fontsize=12)
ax.set_ylabel('Y', fontsize=12)
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
Understanding plt.subplots() Parameters
The subplots() function creates a figure and grid of axes with extensive customization options:
import matplotlib.pyplot as plt
# Complete signature:
# subplots(nrows=1, ncols=1, *, sharex=False, sharey=False,
# squeeze=True, width_ratios=None, height_ratios=None,
# subplot_kw=None, gridspec_kw=None, **fig_kw)
# Basic: single subplot
fig, ax = plt.subplots()
ax.plot([1, 2, 3], [1, 4, 9])
plt.show()
import matplotlib.pyplot as plt
# Grid of subplots: 2 rows, 3 columns
fig, axes = plt.subplots(2, 3, figsize=(12, 8))
print(axes.shape) # (2, 3)
# Access individual axes
axes[0, 0].plot([1, 2, 3], [1, 4, 9])
axes[0, 1].scatter([1, 2, 3], [1, 2, 3])
axes[1, 2].bar(['A', 'B', 'C'], [1, 2, 3])
plt.tight_layout()
plt.show()
Shared Axes
import matplotlib.pyplot as plt
import numpy as np
# Share both x- and y-axes across all subplots (zooming or panning one updates all)
fig, axes = plt.subplots(2, 2, sharex=True, sharey=True, figsize=(10, 8))
# With shared axes, x tick labels appear only on the bottom row and y tick labels only on the left column
for ax in axes.flat:
    ax.plot(np.random.rand(10))
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import numpy as np
# Share by column (x-axis shared within each column)
fig, axes = plt.subplots(2, 2, sharex='col', figsize=(10, 8))
for ax in axes.flat:
    ax.plot(np.random.rand(10))
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import numpy as np
# Share by row (y-axis shared within each row)
fig, axes = plt.subplots(2, 2, sharey='row', figsize=(10, 8))
for ax in axes.flat:
    ax.plot(np.random.rand(10))
plt.tight_layout()
plt.show()
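Shared axes are linked, so autoscaling uses the data of every Axes in the group. The sketch below (illustrative data) checks this with sharex='col': the two panels in the first column end up with identical x-limits, while the second column keeps its own.

```python
import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(2, 2, sharex='col')
axes[0, 0].plot(np.arange(10))  # x data spans 0..9
axes[1, 0].plot(np.arange(50))  # same column, x data spans 0..49
axes[0, 1].plot(np.arange(5))   # other column, x data spans 0..4
axes[1, 1].plot(np.arange(5))

# Axes in one column share x-limits, autoscaled to cover both datasets
print(axes[0, 0].get_xlim() == axes[1, 0].get_xlim())  # True
# Different columns are not linked
print(axes[0, 0].get_xlim() == axes[0, 1].get_xlim())  # False
```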
Custom Subplot Ratios
import matplotlib.pyplot as plt
# Different width ratios (width_ratios in subplots() requires Matplotlib 3.6+)
fig, axes = plt.subplots(1, 3, figsize=(12, 4),
width_ratios=[1, 2, 1]) # Middle subplot 2x wider
axes[0].plot([1, 2, 3])
axes[0].set_title('Narrow')
axes[1].plot([1, 2, 3])
axes[1].set_title('Wide (2x)')
axes[2].plot([1, 2, 3])
axes[2].set_title('Narrow')
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
# Different height ratios (height_ratios in subplots() requires Matplotlib 3.6+)
fig, axes = plt.subplots(3, 1, figsize=(8, 10),
height_ratios=[1, 3, 1]) # Middle subplot 3x taller
axes[0].plot([1, 2, 3])
axes[0].set_title('Short')
axes[1].plot([1, 2, 3])
axes[1].set_title('Tall (3x)')
axes[2].plot([1, 2, 3])
axes[2].set_title('Short')
plt.tight_layout()
plt.show()
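On Matplotlib versions before 3.6, the same layout can be requested through gridspec_kw, as sketched here. The check reads each subplot's position (in figure fractions) and confirms the middle panel is twice as wide as its neighbors.

```python
import matplotlib.pyplot as plt
import numpy as np

# Equivalent to width_ratios=[1, 2, 1], but also accepted by older Matplotlib versions
fig, axes = plt.subplots(1, 3, figsize=(12, 4),
                         gridspec_kw={'width_ratios': [1, 2, 1]})

widths = [ax.get_position().width for ax in axes]
print(np.round(np.array(widths) / widths[0], 2))  # [1. 2. 1.]
```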
Advanced Subplot Configuration
import matplotlib.pyplot as plt
import numpy as np
# subplot_kw: parameters for each subplot (polar projection)
fig, axes = plt.subplots(2, 2,
subplot_kw={'projection': 'polar'},
figsize=(10, 10))
# Add some polar plots
theta = np.linspace(0, 2*np.pi, 100)
for ax in axes.flat:
    ax.plot(theta, np.abs(np.sin(theta * np.random.randint(1, 5))))
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import numpy as np
# gridspec_kw: control spacing between subplots
fig, axes = plt.subplots(2, 2, figsize=(10, 8),
gridspec_kw={'hspace': 0.4, # Vertical spacing
'wspace': 0.3, # Horizontal spacing
'left': 0.1, # Left margin
'right': 0.95, # Right margin
'top': 0.95, # Top margin
'bottom': 0.1}) # Bottom margin
for i, ax in enumerate(axes.flat):
    ax.plot(np.random.rand(10))
    ax.set_title(f'Plot {i+1}')
plt.show()
import matplotlib.pyplot as plt
# squeeze parameter: control the return type
fig, axes = plt.subplots(1, 3)  # squeeze=True (default) drops length-1 dimensions
print(axes.shape)  # (3,)
fig, axes = plt.subplots(1, 3, squeeze=False)  # Always returns a 2D array
print(axes.shape)  # (1, 3)
fig, ax = plt.subplots(1, 1)  # With squeeze=True, a 1x1 grid returns a single Axes object
print(type(ax))  # matplotlib.axes._axes.Axes (older versions report AxesSubplot)
Pro Tip
Subplots for Multiple Charts
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create 2x2 grid of subplots
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
axes[0, 0].plot(x, y)
axes[0, 0].set_title('Line Plot')
axes[0, 1].scatter(x, y)
axes[0, 1].set_title('Scatter Plot')
axes[1, 0].hist(y, bins=20)
axes[1, 0].set_title('Histogram')
axes[1, 1].bar(['A', 'B', 'C'], [1, 2, 3])
axes[1, 1].set_title('Bar Chart')
plt.tight_layout() # Prevent overlap
plt.show()
Plot Customization
Colors, Styles & Markers
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Color specifications
plt.figure(figsize=(10, 6))
plt.plot(x, y, color='#3B9797', linewidth=2, label='Hex color')
plt.plot(x, y - 0.5, color='teal', linewidth=2, label='Color name')
plt.plot(x, y - 1, color=(0.2, 0.4, 0.6), linewidth=2, label='RGB tuple')
plt.legend()
plt.title('Color Specifications')
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Line styles
plt.figure(figsize=(10, 6))
plt.plot(x, y, linestyle='-', linewidth=2, label='Solid')
plt.plot(x, y - 0.5, linestyle='--', linewidth=2, label='Dashed')
plt.plot(x, y - 1, linestyle=':', linewidth=2, label='Dotted')
plt.plot(x, y - 1.5, linestyle='-.', linewidth=2, label='Dash-dot')
plt.legend()
plt.title('Line Styles')
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 50)
y = np.sin(x)
# Markers
plt.figure(figsize=(10, 6))
plt.plot(x, y, marker='o', linewidth=2, markersize=6, label='Circles')
plt.plot(x, y - 0.5, marker='s', linewidth=2, markersize=6, label='Squares')
plt.plot(x, y - 1, marker='^', linewidth=2, markersize=6, label='Triangles')
plt.legend()
plt.title('Marker Styles')
plt.show()
Themes and Styles
import matplotlib.pyplot as plt
# Available styles
print(plt.style.available)
# Example output (the exact list varies by Matplotlib version):
# ['Solarize_Light2', '_classic_test_patch', '_mpl-gallery', '_mpl-gallery-nogrid', 'bmh', 'classic', 'dark_background', 'fast', 'fivethirtyeight', 'ggplot', 'grayscale', 'petroff10', 'seaborn-v0_8', 'seaborn-v0_8-bright', 'seaborn-v0_8-colorblind', 'seaborn-v0_8-dark', 'seaborn-v0_8-dark-palette', 'seaborn-v0_8-darkgrid', 'seaborn-v0_8-deep', 'seaborn-v0_8-muted', 'seaborn-v0_8-notebook', 'seaborn-v0_8-paper', 'seaborn-v0_8-pastel', 'seaborn-v0_8-poster', 'seaborn-v0_8-talk', 'seaborn-v0_8-ticks', 'seaborn-v0_8-white', 'seaborn-v0_8-whitegrid', 'tableau-colorblind10']
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Apply style globally
plt.style.use('seaborn-v0_8-whitegrid')
plt.plot(x, y)
plt.title('Seaborn Style')
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Temporarily use style (context manager)
with plt.style.context('ggplot'):
    plt.plot(x, y)
    plt.title('ggplot Style (temporary)')
    plt.show()
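The difference between plt.style.use() and plt.style.context() is whether the change to rcParams (Matplotlib's global settings) persists. This sketch reads one setting, axes.facecolor, before, inside, and after the with block to show that the context manager restores it.

```python
import matplotlib.pyplot as plt

before = plt.rcParams['axes.facecolor']
with plt.style.context('ggplot'):
    # Inside the block, the ggplot style's settings are active
    inside = plt.rcParams['axes.facecolor']
# Leaving the block restores whatever was set before
after = plt.rcParams['axes.facecolor']

print(before, inside, after)
```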
Figure-Level Customization
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
# Create figure with custom size and DPI
fig = plt.figure(figsize=(12, 6), dpi=100, facecolor='white')
# Add axes at an explicit position [left, bottom, width, height], given as fractions of the figure (0-1)
ax1 = fig.add_axes([0.1, 0.1, 0.8, 0.8]) # Main axes
ax2 = fig.add_axes([0.65, 0.65, 0.2, 0.2]) # Inset axes
ax1.plot(x, y, 'b-', linewidth=2)
ax1.set_title('Main Plot')
ax1.set_xlabel('X')
ax1.set_ylabel('Y')
ax2.plot(x, y**2, 'r-', linewidth=1.5)
ax2.set_title('Inset: y²', fontsize=10)
plt.show()
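fig.add_axes() positions axes in figure-fraction coordinates. An alternative for insets is Axes.inset_axes() (Matplotlib 3.0+), whose bounds are fractions of the parent Axes, so the inset stays in the same corner of the main plot if the layout changes. A sketch of the same inset built that way:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
y = np.sin(x)

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, 'b-', linewidth=2)
ax.set_title('Main Plot')

# Bounds are [x0, y0, width, height] as fractions of the parent Axes (not the figure)
inset = ax.inset_axes([0.65, 0.65, 0.3, 0.3])
inset.plot(x, y**2, 'r-', linewidth=1.5)
inset.set_title('Inset: y²', fontsize=10)

# The inset is registered as a child of the main Axes
print(inset in ax.child_axes)  # True
```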
Axis Control
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0.1, 10, 100)
y = np.sin(x)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, linewidth=2)
# Set axis limits
ax.set_xlim(0, 10)
ax.set_ylim(-1.5, 1.5)
# Grid customization
ax.grid(True, which='both', linestyle='--', linewidth=0.5, alpha=0.7)
ax.minorticks_on() # Enable minor ticks
ax.set_title('Axis Limits and Grid')
ax.set_xlabel('X')
ax.set_ylabel('Y')
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0.1, 10, 100)
y = x**2
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, linewidth=2)
# Set axis scales
ax.set_xscale('log') # Logarithmic x-axis
ax.set_yscale('log') # Logarithmic y-axis
ax.set_title('Logarithmic Scales')
ax.set_xlabel('X (log scale)')
ax.set_ylabel('Y (log scale)')
ax.grid(True, alpha=0.3)
plt.show()
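Why does y = x**2 appear as a straight line on log-log axes? Taking logarithms gives log10(y) = 2 * log10(x), a line whose slope is the exponent. This short NumPy check fits a line to the log-transformed data from the example above and recovers that slope.

```python
import numpy as np

x = np.linspace(0.1, 10, 100)
y = x**2

# A power law y = c * x**k becomes log10(y) = k * log10(x) + log10(c),
# so a degree-1 fit in log space returns the exponent k as the slope
slope, intercept = np.polyfit(np.log10(x), np.log10(y), 1)
print(round(slope, 6))  # 2.0
```

The same trick is a quick way to estimate the exponent of real data that looks roughly straight on log-log axes.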
import matplotlib.pyplot as plt
import numpy as np
theta = np.linspace(0, 2*np.pi, 100)
x = np.cos(theta)
y = np.sin(theta)
fig, ax = plt.subplots(figsize=(8, 8))
ax.plot(x, y, linewidth=2)
# Equal aspect ratio for circle
ax.set_aspect('equal')
ax.set_title('Circle (equal aspect ratio)')
ax.grid(True, alpha=0.3)
plt.show()
Annotations and Text
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
y = np.sin(x)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, linewidth=2)
# Add text at specific data coordinates (centered just below the second peak, x = 5π/2)
ax.text(5*np.pi/2, 0.6, 'Peak Region', ha='center', fontsize=12, color='red', fontweight='bold')
# Add annotation with arrow
ax.annotate('Maximum',
xy=(np.pi/2, 1), # Point to annotate
xytext=(np.pi/2 + 1, 0.5), # Text location
arrowprops=dict(arrowstyle='->',
connectionstyle='arc3,rad=0.3',
color='red', lw=2),
fontsize=12,
bbox=dict(boxstyle='round', facecolor='wheat', alpha=0.5))
# Add horizontal/vertical lines
ax.axhline(y=0, color='k', linestyle='--', linewidth=0.8, label='y=0')
ax.axvline(x=np.pi, color='r', linestyle='--', linewidth=0.8, label='x=π')
# Shade regions
ax.axhspan(-0.5, 0.5, alpha=0.2, color='green', label='Middle range')
ax.axvspan(2, 4, alpha=0.2, color='blue', label='Region 2-4')
ax.set_title('Annotations, Lines, and Shaded Regions')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.legend()
ax.grid(True, alpha=0.3)
plt.show()
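Annotations often need to point at a computed location rather than a hard-coded one. In this sketch the arrow target xy comes from np.argmax over the sampled data (in data coordinates, the default), while textcoords='axes fraction' pins the label to the upper left of the Axes regardless of the data range. The label wording is illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 500)
y = np.sin(x)
i = np.argmax(y)  # index of the largest sampled value

fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(x, y, linewidth=2)

# xy: point being annotated, in data coordinates
# xytext + textcoords='axes fraction': label position as a fraction of the Axes box
ann = ax.annotate(f'max ≈ {y[i]:.3f} at x = {x[i]:.2f}',
                  xy=(x[i], y[i]),
                  xytext=(0.05, 0.9), textcoords='axes fraction',
                  arrowprops=dict(arrowstyle='->', color='red'))
```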
Multiple Y-Axes (Twinx)
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
fig, ax1 = plt.subplots(figsize=(10, 6))
# First y-axis (left)
ax1.plot(x, np.sin(x), 'b-', linewidth=2, label='sin(x)')
ax1.set_xlabel('X', fontsize=12)
ax1.set_ylabel('sin(x)', color='b', fontsize=12)
ax1.tick_params(axis='y', labelcolor='b')
ax1.grid(True, alpha=0.3)
# Second y-axis (right) - shares x-axis
ax2 = ax1.twinx()
ax2.plot(x, np.exp(x/5), 'r-', linewidth=2, label='exp(x/5)')
ax2.set_ylabel('exp(x/5)', color='r', fontsize=12)
ax2.tick_params(axis='y', labelcolor='r')
plt.title('Dual Y-Axes Example', fontsize=14)
fig.tight_layout()
plt.show()
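The dual-axis example above assigns labels but never calls legend(). Each Axes only tracks its own artists, so a single legend covering both lines needs handles from both; one common approach, sketched here, is to merge them with get_legend_handles_labels(). Placing the legend on ax2 keeps it drawn above ax2's lines.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax1 = plt.subplots(figsize=(10, 6))
ax1.plot(x, np.sin(x), 'b-', label='sin(x)')
ax2 = ax1.twinx()
ax2.plot(x, np.exp(x / 5), 'r-', label='exp(x/5)')

# Collect (handle, label) pairs from both Axes and build one combined legend
handles1, labels1 = ax1.get_legend_handles_labels()
handles2, labels2 = ax2.get_legend_handles_labels()
legend = ax2.legend(handles1 + handles2, labels1 + labels2, loc='upper left')

print([t.get_text() for t in legend.get_texts()])  # ['sin(x)', 'exp(x/5)']
```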
Advanced Layout
GridSpec for Complex Layouts
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.gridspec import GridSpec
x = np.linspace(0, 10, 100)
fig = plt.figure(figsize=(12, 8))
gs = GridSpec(3, 3, figure=fig, hspace=0.3, wspace=0.3)
# Subplot spanning multiple cells
ax1 = fig.add_subplot(gs[0, :]) # Top row, all columns
ax2 = fig.add_subplot(gs[1, :-1]) # Middle row, first 2 columns
ax3 = fig.add_subplot(gs[1:, -1]) # Right column, last 2 rows
ax4 = fig.add_subplot(gs[-1, 0]) # Bottom left
ax5 = fig.add_subplot(gs[-1, 1]) # Bottom middle
ax1.plot(x, np.sin(x), linewidth=2)
ax1.set_title('Wide Plot (spans 3 columns)')
ax1.grid(True, alpha=0.3)
ax2.plot(x, np.cos(x), linewidth=2)
ax2.set_title('Main Plot (2 columns)')
ax2.grid(True, alpha=0.3)
ax3.hist(np.random.randn(1000), orientation='horizontal', bins=30)
ax3.set_title('Tall Plot (2 rows)')
ax4.scatter(np.random.rand(50), np.random.rand(50), alpha=0.6)
ax4.set_title('Scatter')
ax4.grid(True, alpha=0.3)
ax5.bar(['A', 'B', 'C'], [1, 2, 3])
ax5.set_title('Bar Chart')
ax5.grid(True, alpha=0.3, axis='y')
plt.show()
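For layouts like this one, plt.subplot_mosaic() (Matplotlib 3.3+) is an alternative to GridSpec indexing: you draw the grid as nested lists of names, and a name repeated across cells makes that panel span them. A sketch reproducing the layout above; the panel names are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Each string names a panel; repeating a name makes that panel span those cells
layout = [['wide', 'wide', 'wide'],
          ['main', 'main', 'tall'],
          ['scatter', 'bar', 'tall']]
fig, axd = plt.subplot_mosaic(layout, figsize=(12, 8))  # axd: dict of name -> Axes

axd['wide'].plot(x, np.sin(x))
axd['main'].plot(x, np.cos(x))
axd['tall'].hist(np.random.randn(1000), orientation='horizontal', bins=30)
axd['scatter'].scatter(np.random.rand(50), np.random.rand(50), alpha=0.6)
axd['bar'].bar(['A', 'B', 'C'], [1, 2, 3])

print(sorted(axd))  # ['bar', 'main', 'scatter', 'tall', 'wide']
```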
Practice Exercises
Subplots & Customization Exercises
Exercise 1 (Beginner): Create a 2x2 grid of subplots. Plot sine, cosine, exponential, and logarithm functions. Add titles and labels to each subplot.
Exercise 2 (Beginner): Create subplots with different sizes: one large plot and three smaller plots. Use width_ratios or height_ratios. Apply a style.
Exercise 3 (Intermediate): Create 4 subplots with sharex and sharey. Explain how shared axes simplify zooming/panning. Create both row-wise and column-wise sharing.
Exercise 4 (Intermediate): Create a figure with custom colors, line styles, markers. Apply alpha transparency. Create custom colormaps and legends.
Challenge (Advanced): Create an inset plot (subplot within subplot). Use GridSpec for complex layouts. Customize every element (spines, ticks, labels).
Seaborn: Statistical Graphics
Seaborn builds on Matplotlib, providing beautiful defaults and high-level functions for statistical visualizations. It integrates seamlessly with Pandas DataFrames.
import seaborn as sns
# Set seaborn theme
sns.set_theme(style='whitegrid')
# Load sample dataset (downloaded from the online seaborn-data repository on first use, then cached)
tips = sns.load_dataset('tips')
print(tips.head())
# total_bill tip sex smoker day time size
# 0 16.99 1.01 Female No Sun Dinner 2
# 1 10.34 1.66 Male No Sun Dinner 3
# ...
Why Seaborn?
- ✅ Beautiful defaults (colors, fonts, spacing)
- ✅ Built for Pandas DataFrames (column names as labels)
- ✅ Statistical visualizations in one line
- ✅ Automatic legends and color schemes
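To see the column-names-as-labels behavior without downloading a dataset, this sketch builds a tiny DataFrame with hypothetical values and lets seaborn label the axes and build the hue legend on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical values, small enough to type by hand
df = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 15.0, 25.0],
    'tip': [1.5, 3.0, 4.5, 2.0, 4.0],
    'time': ['Dinner', 'Dinner', 'Lunch', 'Lunch', 'Dinner'],
})

# Axes-level seaborn functions return the Matplotlib Axes they drew on
ax = sns.scatterplot(data=df, x='total_bill', y='tip', hue='time')

# Axis labels come from the column names; legend entries come from the 'time' values
print(ax.get_xlabel(), ax.get_ylabel())  # total_bill tip
print([t.get_text() for t in ax.get_legend().get_texts()])
```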
Seaborn Datasets: Load and Explore
Seaborn provides built-in sample datasets perfect for learning and testing visualizations:
import seaborn as sns
# Get list of all available datasets
available_datasets = sns.get_dataset_names()
print("Available datasets:")
print(available_datasets)
# Example output (the list is read from the online seaborn-data repository, so it can change):
# ['anagrams', 'anagrams_long', 'answer_keys', 'attention', 'brain_networks',
# 'car_crashes', 'diamonds', 'dots', 'dowjones', 'exercise', 'flights',
# 'fmri', 'gammas', 'geyser', 'glue', 'healthexp', 'iris', 'penguins', 'planets',
# 'taxis', 'titanic', 'tips', ...]
import seaborn as sns
import pandas as pd
# Load a specific dataset
iris = sns.load_dataset('iris')
iris.info()  # info() prints its summary itself and returns None, so don't wrap it in print()
print("\nDataset shape:", iris.shape)
print(iris.describe())
# Another popular dataset
titanic = sns.load_dataset('titanic')
print("Titanic columns:", titanic.columns.tolist())
print("Titanic shape:", titanic.shape)
import seaborn as sns
import pandas as pd
# Load and explore diamonds dataset
diamonds = sns.load_dataset('diamonds')
print("Diamond columns:", diamonds.columns.tolist())
# ['carat', 'cut', 'color', 'clarity', 'depth', 'table', 'price', 'x', 'y', 'z']
print("Data types:")
print(diamonds.dtypes)
print("\nSample rows:")
print(diamonds.head())
Visualizing Distributions
Histograms and KDE Plots
import matplotlib.pyplot as plt
import seaborn as sns
# Load sample dataset
tips = sns.load_dataset('tips')
# Histogram with KDE overlay
sns.histplot(tips['total_bill'], bins=20, kde=True)
plt.title('Distribution of Total Bill')
plt.xlabel('Total Bill ($)')
plt.ylabel('Frequency')
plt.show()
KDE (Kernel Density Estimate) Plots - Detailed Parameters
kdeplot() creates smooth probability density curves. Key parameters for customization:
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Basic KDE plot
sns.kdeplot(data=tips, x='total_bill')
plt.title('KDE Plot: Total Bill Distribution')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# KDE with filled area and custom color
sns.kdeplot(data=tips, x='total_bill',
fill=True, # Fill under curve (shade=True in older versions)
color='teal', # Line/fill color
linewidth=2.5, # Line thickness
alpha=0.6) # Transparency
plt.title('KDE with Custom Styling')
plt.xlabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Bandwidth adjustment (controls smoothness)
fig, axes = plt.subplots(1, 3, figsize=(14, 4))
for ax, bw in zip(axes, [0.2, 0.5, 1.0]):
    sns.kdeplot(data=tips, x='total_bill', bw_adjust=bw,
                fill=True, color='teal', ax=ax)
    ax.set_title(f'Bandwidth Adjust: {bw}')
    ax.set_xlabel('Total Bill ($)')
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Multiple distributions by category (hue parameter)
sns.kdeplot(data=tips, x='total_bill', hue='sex',
fill=True, # Fill curves
common_norm=False) # Each hue normalized separately
plt.title('Total Bill Distribution by Gender')
plt.xlabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# 2D KDE (bivariate density)
sns.kdeplot(data=tips, x='total_bill', y='tip',
fill=True, # Fill contours
cmap='viridis', # Color map for contour levels
levels=10) # Number of contour levels
plt.title('2D Density: Bill vs Tip')
plt.show()
KDE Parameters Summary:
- data: DataFrame containing the data
- x, y: Column names for the axes (y is optional; supply it for a 2D KDE)
- hue: Column for grouping by color
- fill: Boolean; fill the area under the curve
- bw_adjust: Bandwidth multiplier (default 1); values below 1 (e.g. 0.2) follow the data more closely, values above 1 smooth more
- color/palette: Line/fill color or color palette
- linewidth: Thickness of the KDE line
- alpha: Transparency (0=transparent, 1=opaque)
- cmap: Colormap for 2D density (bivariate plots)
- levels: Number of contour levels, or a list of levels (2D only)
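What a KDE computes can be sketched in a few lines of NumPy: place a Gaussian bump on every data point and average them, so the curve integrates to 1. This is an illustration of the idea, not seaborn's code. Seaborn fits its KDE with SciPy's gaussian_kde, using Scott's rule by default, and multiplies that bandwidth by bw_adjust; the sketch mimics this on synthetic data. Smaller multipliers give a taller, spikier curve.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth):
    """Average of Gaussian bumps of width `bandwidth`, one centered on each data point."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    bumps = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return bumps.mean(axis=1) / bandwidth

rng = np.random.default_rng(0)
data = rng.normal(loc=20, scale=8, size=200)  # synthetic stand-in for a column like 'total_bill'

# Scott's rule for 1-D data: sample standard deviation times n**(-1/5)
base_bw = data.std(ddof=1) * len(data) ** (-1 / 5)

grid = np.linspace(data.min() - 20, data.max() + 20, 2000)
dx = grid[1] - grid[0]

areas, peaks = {}, {}
for adjust in (0.2, 0.5, 1.0):  # same multipliers as the bw_adjust example above
    density = gaussian_kde_1d(data, grid, adjust * base_bw)
    areas[adjust] = density.sum() * dx  # Riemann-sum approximation of the area under the curve
    peaks[adjust] = density.max()
    print(f'bw_adjust={adjust}: area≈{areas[adjust]:.3f}, peak={peaks[adjust]:.4f}')
```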
Box Plots - Detailed Parameters & Customization
boxplot() shows quartiles, median, and outliers. Essential for statistical comparison.
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Basic box plot (shows median, Q1, Q3, whiskers, outliers)
sns.boxplot(data=tips, x='day', y='total_bill')
plt.title('Bill Distribution by Day of Week')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Box plot with hue (grouping variable)
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex',
palette='Set2') # Color palette
plt.title('Bill Distribution by Day and Gender')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Detailed customization
sns.boxplot(data=tips, x='day', y='total_bill', hue='sex',
palette='husl', # Color palette
width=0.6, # Box width (0-1)
linewidth=2, # Line thickness
fliersize=8, # Outlier marker size
dodge=True, # Separate boxes by hue
showmeans=True, # Show mean point
meanprops=dict(marker='D', markerfacecolor='red',
markersize=8, markeredgecolor='black'))
plt.title('Bill Distribution (Customized)')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Violin plot (box + distribution shape)
sns.violinplot(data=tips, x='day', y='total_bill', hue='sex',
split=False, # False: a separate violin per hue level; True: each violin split into halves by hue
palette='muted', # Soft color palette
inner='quartile', # Shows quartiles ('box', 'point', 'stick', None)
cut=0, # Clip the density to the observed data range (no extension past min/max)
linewidth=2)
plt.title('Bill Distribution by Day and Gender (Violin)')
plt.ylabel('Total Bill ($)')
plt.show()
Box Plot Parameters Summary:
- data: DataFrame
- x, y: Column names (y is numeric for a vertical box plot)
- hue: Column for grouping/coloring
- palette: Color palette ('Set2', 'husl', 'pastel', etc.)
- width: Box width (0-1, default 0.8)
- linewidth: Border line thickness
- fliersize: Outlier marker size
- showmeans: Boolean to show the mean point
- meanprops: Dict customizing the mean marker's appearance
- dodge: Separate boxes by hue or overlap them
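The box, whiskers, and outlier points follow Matplotlib's default rule (seaborn's boxplot draws through Matplotlib, with whis=1.5 by default): the box spans Q1 to Q3, the whiskers reach the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an outlier. A worked example on ten hand-picked values, cross-checked against matplotlib.cbook.boxplot_stats:

```python
import numpy as np
from matplotlib import cbook

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])

q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr  # -3.5 and 14.5

inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(q1, median, q3, iqr)        # 3.25 5.5 7.75 4.5
print(whisker_low, whisker_high)  # 1 9
print(outliers)                   # [30]

# The statistics Matplotlib computes for a box plot of the same data
stats = cbook.boxplot_stats(data)[0]
print(stats['q1'], stats['med'], stats['q3'], stats['whislo'], stats['whishi'])
```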
Additional Distribution Plots: Strip and Swarm
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Strip plot (scatter plot with jitter for categorical x)
sns.stripplot(data=tips, x='day', y='total_bill', hue='sex',
size=8, # Point size
jitter=True, # Add random jitter to avoid overlap
palette='Set1')
plt.title('Individual Points by Day (Strip Plot)')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Swarm plot (strip plot with smart separation to avoid overlap)
sns.swarmplot(data=tips, x='day', y='total_bill', hue='sex',
size=7, # Point size
palette='husl',
dodge=True) # Separate by hue
plt.title('Non-overlapping Points by Day (Swarm Plot)')
plt.ylabel('Total Bill ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Combine violin plot with strip plot (show distribution + raw data)
fig, ax = plt.subplots(figsize=(10, 6))
sns.violinplot(data=tips, x='day', y='total_bill', hue='sex',
palette='muted', ax=ax)
sns.stripplot(data=tips, x='day', y='total_bill', hue='sex',
size=5, alpha=0.4, palette='dark', dodge=True, ax=ax)
plt.title('Violin Plot with Raw Data Points')
plt.ylabel('Total Bill ($)')
plt.show()
Relationships & Correlations
Scatter Plots
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Basic scatter
sns.scatterplot(data=tips, x='total_bill', y='tip')
plt.title('Total Bill vs Tip')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# With categories (color, style, size by different variables)
sns.scatterplot(data=tips, x='total_bill', y='tip',
hue='sex', style='smoker', size='size')
plt.title('Bill vs Tip Analysis (multi-dimensional)')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()
Regression Plots
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Scatter with regression line
sns.regplot(data=tips, x='total_bill', y='tip')
plt.title('Linear Regression: Bill vs Tip')
plt.xlabel('Total Bill ($)')
plt.ylabel('Tip ($)')
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Linear model with confidence interval (by category)
sns.lmplot(data=tips, x='total_bill', y='tip', hue='sex', height=6)
plt.show()
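With the default order=1, the line drawn by regplot is an ordinary least-squares fit, the same line np.polyfit(x, y, 1) returns. This sketch uses synthetic data (the slope, intercept, and noise level are made up) and compares the two; ci=None turns off the bootstrapped confidence band so the Axes holds only the fitted line.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(42)
df = pd.DataFrame({'x': np.linspace(0, 50, 40)})
df['y'] = 0.15 * df['x'] + 1 + rng.normal(scale=1.0, size=len(df))  # synthetic data

# Least-squares fit of a degree-1 polynomial: returns [slope, intercept]
slope, intercept = np.polyfit(df['x'], df['y'], 1)

ax = sns.regplot(data=df, x='x', y='y', ci=None)
line = ax.lines[0]  # the regression line (the scatter points are a separate collection)
print(np.allclose(line.get_ydata(), slope * line.get_xdata() + intercept))  # True
```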
Correlation Heatmaps
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Compute correlation matrix
corr = tips[['total_bill', 'tip', 'size']].corr()
# Heatmap
sns.heatmap(corr, annot=True, cmap='coolwarm', center=0, vmin=-1, vmax=1,
square=True, linewidths=1)
plt.title('Correlation Matrix')
plt.show()
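The values in the heatmap are Pearson correlation coefficients, which is what DataFrame.corr() computes by default: the covariance of two columns divided by the product of their standard deviations, giving a number between -1 and 1. A from-scratch check on synthetic data:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
bill = rng.uniform(5, 50, size=100)
# Synthetic columns: tip roughly proportional to bill, plus noise
df = pd.DataFrame({'bill': bill, 'tip': 0.15 * bill + rng.normal(0, 1, size=100)})

def pearson(a, b):
    """Sum of co-deviations divided by the root of the product of squared deviations."""
    a_c, b_c = a - a.mean(), b - b.mean()
    return (a_c * b_c).sum() / np.sqrt((a_c**2).sum() * (b_c**2).sum())

r_manual = pearson(df['bill'].to_numpy(), df['tip'].to_numpy())
r_pandas = df.corr().loc['bill', 'tip']  # method='pearson' is the default
print(np.isclose(r_manual, r_pandas))  # True
```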
Pair Plots
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# All pairwise relationships
sns.pairplot(tips, hue='sex', diag_kind='kde')
plt.show()
Categorical Data Plots
Count Plots
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Count by category
sns.countplot(data=tips, x='day', hue='sex')
plt.title('Customer Count by Day')
plt.xlabel('Day of Week')
plt.ylabel('Count')
plt.show()
Bar Plots (with aggregation)
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Mean with 95% confidence interval (errorbar replaces the ci parameter, deprecated in seaborn 0.12)
sns.barplot(data=tips, x='day', y='total_bill', hue='sex', errorbar=('ci', 95))
plt.title('Average Bill by Day and Gender')
plt.xlabel('Day of Week')
plt.ylabel('Average Total Bill ($)')
plt.show()
Point Plots
import matplotlib.pyplot as plt
import seaborn as sns
tips = sns.load_dataset('tips')
# Show point estimates and confidence intervals
sns.pointplot(data=tips, x='day', y='tip', hue='sex')
plt.title('Average Tip by Day and Gender')
plt.xlabel('Day of Week')
plt.ylabel('Average Tip ($)')
plt.show()
Real-World Example
Complete Analysis Workflow
import matplotlib.pyplot as plt
import seaborn as sns
# Load data
df = sns.load_dataset('iris')
# 1. Distribution of features
fig, axes = plt.subplots(2, 2, figsize=(12, 10))
sns.histplot(df['sepal_length'], kde=True, ax=axes[0,0])
axes[0,0].set_title('Sepal Length Distribution')
sns.histplot(df['sepal_width'], kde=True, ax=axes[0,1])
axes[0,1].set_title('Sepal Width Distribution')
sns.histplot(df['petal_length'], kde=True, ax=axes[1,0])
axes[1,0].set_title('Petal Length Distribution')
sns.histplot(df['petal_width'], kde=True, ax=axes[1,1])
axes[1,1].set_title('Petal Width Distribution')
plt.suptitle('Iris Dataset: Feature Distributions', fontsize=16, y=1.00)
plt.tight_layout()
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('iris')
# 2. Pairwise relationships
sns.pairplot(df, hue='species', height=2.5)
plt.show()
import matplotlib.pyplot as plt
import seaborn as sns
df = sns.load_dataset('iris')
# 3. Correlation heatmap
corr = df.drop('species', axis=1).corr()
sns.heatmap(corr, annot=True, cmap='viridis', square=True, linewidths=1)
plt.title('Iris Dataset: Feature Correlations')
plt.show()
Best Practices & Next Steps
Visualization Best Practices
- ✅ Choose the right chart: Bar for comparisons, line for trends, scatter for relationships
- ✅ Label everything: Title, axis labels, legends, units
- ✅ Use color purposefully: Not just decoration; convey meaning
- ✅ Avoid clutter: Remove unnecessary gridlines, borders, and decoration that don't help the reader
- ✅ Consider accessibility: Colorblind-safe palettes, sufficient contrast
- ✅ Start axes at zero: For bar charts (avoid misleading comparisons)
When to Use What
Quick Reference
| Goal | Use |
| --- | --- |
| Compare categories | Bar chart |
| Show trend over time | Line chart |
| Distribution shape | Histogram, KDE, violin |
| Outlier detection | Box plot |
| Relationship between variables | Scatter plot |
| Correlation matrix | Heatmap |
| Part-of-whole | Pie chart (use sparingly!) |
Saving Figures
# Save as PNG (for web/presentations)
plt.savefig('figure.png', dpi=300, bbox_inches='tight')
# Save as PDF (for publications)
plt.savefig('figure.pdf', bbox_inches='tight')
# Save as SVG (vector, scalable)
plt.savefig('figure.svg', bbox_inches='tight')
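The snippets above assume a figure already exists. A fuller sketch: build the figure, then save it through the Figure object before any plt.show() call (after plt.show() returns in a script, the figure has typically been closed, so a later plt.savefig() can write a blank image). Saving to in-memory buffers keeps the example free of files, and the PNG header check shows that the pixel size is figsize × dpi.

```python
import io

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots(figsize=(4, 3))  # size in inches
ax.plot(x, np.sin(x))

png = io.BytesIO()
fig.savefig(png, format='png', dpi=100)
pdf = io.BytesIO()
fig.savefig(pdf, format='pdf', bbox_inches='tight')

png_bytes = png.getvalue()
# In a PNG file, the IHDR chunk stores width and height as big-endian integers at bytes 16-24
width = int.from_bytes(png_bytes[16:20], 'big')
height = int.from_bytes(png_bytes[20:24], 'big')
print(width, height)       # 400 300  (4 in x 100 dpi, 3 in x 100 dpi)
print(pdf.getvalue()[:4])  # b'%PDF'
```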
What's Next: In Part 4: Machine Learning with Scikit-learn, you'll apply everything learned—NumPy for computation, Pandas for data prep, and visualizations to understand results—to build predictive models.
Practice Exercises
Seaborn & Advanced Visualization Exercises
Exercise 1 (Beginner): Load a Seaborn dataset (tips, iris, or flights). Create box plots, violin plots, and bar plots using hue parameter for grouping.
Exercise 2 (Beginner): Create a correlation heatmap from a DataFrame. Customize colormap, annotations, and layout. Experiment with different cmaps (coolwarm, viridis, RdBu).
Exercise 3 (Intermediate): Create a pairplot with a dataset. Customize diagonal plot (histogram vs KDE). Add a hue variable to show group differences. Explain what patterns emerge.
Exercise 4 (Intermediate): Create categorical plots (count, bar, point, strip). Combine Matplotlib and Seaborn styling. Use FacetGrid for multi-plot grids by category.
Challenge (Advanced): Create a complex multi-plot dashboard combining multiple Seaborn plots. Use GridSpec or figure-level functions. Save as high-quality PNG/PDF for publication.
Matplotlib & Seaborn API Cheat Sheet
Quick reference for creating compelling data visualizations in Python.
| Code | Purpose |
| --- | --- |
| plt.plot(x, y) | Line plot |
| plt.scatter(x, y) | Scatter plot |
| plt.bar(x, y) | Bar chart |
| plt.hist(data, bins=20) | Histogram |
| plt.pie(sizes, labels=labels) | Pie chart |
| plt.boxplot(data) | Box plot |
| plt.imshow(img) | Display image |
| plt.show() | Display figure |
| plt.title('Title') | Set title |
| plt.xlabel('X') | X-axis label |
| plt.ylabel('Y') | Y-axis label |
| plt.legend() | Show legend |
| plt.grid(True) | Add grid |
| plt.xlim(0, 10) | Set x limits |
| plt.ylim(0, 10) | Set y limits |
| color='red' | Set color |
| fig, ax = plt.subplots() | Single subplot |
| fig, axes = plt.subplots(2, 3) | 2×3 grid |
| ax.plot(x, y) | Plot on axes |
| ax.set_title('Title') | Axes title |
| ax.set_xlabel('X') | Axes x-label |
| plt.tight_layout() | Auto-adjust spacing |
| sharex=True | Share x-axis |
| figsize=(10, 6) | Figure size (inches) |
| sns.scatterplot(data=df, x='col_x', y='col_y') | Scatter plot |
| sns.lineplot(data=df, x='col_x', y='col_y') | Line plot |
| sns.barplot(data=df, x='col_x', y='col_y') | Bar plot |
| sns.boxplot(data=df, x='col_x', y='col_y') | Box plot |
| sns.violinplot(data=df, x='col_x', y='col_y') | Violin plot |
| sns.heatmap(data, annot=True) | Heatmap |
| sns.pairplot(df) | Pairwise plots |
| sns.regplot(data=df, x='col_x', y='col_y') | Regression plot |
| plt.style.use('ggplot') | Apply style |
| sns.set_theme() | Seaborn theme |
| sns.set_palette('husl') | Color palette |
| linestyle='--' | Dashed line |
| marker='o' | Circle markers |
| linewidth=2 | Line thickness |
| alpha=0.5 | Transparency |
| label='Data' | Legend label |
| plt.savefig('plot.png') | Save as PNG |
| plt.savefig('plot.pdf') | Save as PDF |
| plt.savefig('plot.svg') | Save as SVG |
| dpi=300 | High resolution |
| bbox_inches='tight' | Trim whitespace |
| transparent=True | Transparent background |
| facecolor='white' | Background color |
Pro Tips:
- Object-oriented API: Use fig, ax = plt.subplots() for better control
- Seaborn integration: Seaborn plots work with Matplotlib customization
- Format strings: 'ro-' = red circles with solid line
- Jupyter notebooks: Use %matplotlib inline to render figures inside the notebook
Related Articles in This Series
Part 1: NumPy Foundations for Data Science
Master NumPy arrays, vectorization, broadcasting, and linear algebra operations—the foundation of Python data science.
Part 2: Pandas for Data Analysis
Master Pandas DataFrames, Series, data cleaning, transformation, groupby operations, and merge techniques for real-world data analysis.
Part 4: Machine Learning with Scikit-learn
Build predictive models with classification, regression, clustering algorithms, and complete ML pipelines using Scikit-learn.