Part 4: Essential Linux Command Line Mastery

Navigation & File System

Before you can be productive on the command line, you need to move around the filesystem fluently. A handful of commands, combined with the right flags, cover most of what you'll do every day.

# === Navigation ===
pwd                  # Print current directory (Print Working Directory)
cd /var/log          # Change to absolute path
cd ~                 # Go to home directory
cd -                 # Go to previous directory (very useful!)
cd ../..             # Go up two levels

# === Listing files ===
ls                   # Basic listing
ls -la               # Long format, show hidden files (starting with .)
ls -lh               # Human-readable sizes (KB, MB, GB)
ls -lt               # Sort by modification time (newest first)
ls -lS               # Sort by size (largest first)
ls --color=auto      # Colorised output (usually default)

# Tree view (install: apt install tree / brew install tree)
tree /etc/nginx/ -L 2  # Show 2 levels deep
tree -d /var/lib/      # Directories only

# === Paths ===
# Absolute path: starts from / (root)
# Relative path: relative to current directory
# ~ expands to $HOME (/home/username or /root)
# . is current directory, .. is parent directory
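
The difference between absolute and relative paths is easiest to see by moving around a scratch directory. The following is a minimal sketch; the directory layout (project/src) and the variable names are made up for illustration.

```shell
# Hypothetical scratch layout; nothing outside the temp directory is touched.
demo_root=$(cd "$(mktemp -d)" && pwd -P)
mkdir -p "$demo_root/project/src"

cd "$demo_root/project/src"   # absolute path: same result from any starting point
cd ..                         # relative path: parent of the current directory
after_up=$(pwd)               # now in $demo_root/project

cd "$demo_root"               # jump somewhere else
cd - > /dev/null              # back to the previous directory (cd - prints it)
after_back=$(pwd)             # $demo_root/project again

echo "after 'cd ..': $after_up"
echo "after 'cd -':  $after_back"
```

Both echo lines should show the same project directory, because cd - returns to wherever the shell was before the last cd.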

File Operations: cp, mv, rm, find

# === Copying ===
cp file.txt backup.txt           # Copy file
cp -r dir/ backup-dir/           # Copy directory recursively
cp -p file.txt dest/             # Preserve timestamps and permissions
cp -ru src/ dest/                # Copy only files newer than (or missing from) the destination

# === Moving & Renaming ===
mv old-name.txt new-name.txt     # Rename (same as moving within same FS)
mv *.log /var/archive/logs/      # Move multiple files with glob

# === Deleting ===
rm file.txt                      # Delete file
rm -rf directory/                # Delete directory recursively (DANGEROUS — no undo!)
# Safer alternative: move to /tmp first
mv risky-dir/ /tmp/risky-dir-backup/

# === Finding files ===
find /var/log -name "*.log"                    # Find by filename pattern
find /home -type f -size +100M                 # Files larger than 100MB
find . -type f -mtime -7                       # Modified in last 7 days
find /etc -name "*.conf" -user root            # .conf files owned by root
find . -name "*.py" -exec grep -l "import os" {} \;  # Files containing "import os"

# Find and delete (find has no confirmation prompt here, so preview with -print, then swap in -delete)
find /tmp -name "*.tmp" -mtime +30 -print      # Preview first
find /tmp -name "*.tmp" -mtime +30 -delete     # Then delete
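
The preview-then-delete pattern can be tried safely in a throwaway directory. This sketch assumes a find that supports -delete (GNU and BSD find both do); the file names are invented, and touch -t is used to fake an old modification time.

```shell
scratch=$(mktemp -d)
touch -t 202001010000 "$scratch/old-cache.tmp"   # timestamp far in the past
touch "$scratch/fresh-cache.tmp"                 # modified just now
touch -t 202001010000 "$scratch/old-notes.txt"   # old, but wrong extension

# Step 1: preview. Same tests as the real run, harmless action.
preview=$(find "$scratch" -name "*.tmp" -mtime +30 -print)
echo "Would delete: $preview"

# Step 2: delete, only after the preview looks right.
find "$scratch" -name "*.tmp" -mtime +30 -delete

ls "$scratch"
```

Only the file that matches both tests (name and age) is removed; the fresh .tmp file and the old .txt file stay.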

Text Processing Powerhouse

The real power of the Linux command line is text processing. Logs, configs, data files — they're all text, and Linux provides a suite of composable tools to extract, transform, and summarise text at any scale.

# === grep — find lines matching a pattern ===
grep "ERROR" /var/log/syslog            # Basic pattern search
grep -i "error" app.log                 # Case-insensitive
grep -n "WARN" app.log                  # Show line numbers
grep -c "Exception" app.log             # Count matching lines
grep -v "DEBUG" app.log                 # Invert: show lines NOT matching
grep -r "TODO" ./src/                   # Recursive search in directory
grep -E "error|warn|crit" app.log       # Extended regex (alternation)
grep -A 3 "FATAL" app.log              # Show 3 lines AFTER each match
grep -B 2 -A 5 "OutOfMemory" app.log   # 2 before, 5 after (context)
grep -o "[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}" access.log | sort -u  # Extract unique IPs
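
To see how a few of these flags behave on real input, here is a small sketch against an invented six-line log. The log contents and variable names are assumptions for illustration only.

```shell
log=$(mktemp)
cat > "$log" <<'EOF'
10:00:01 INFO  service started
10:00:05 WARN  disk usage at 91%
10:00:09 ERROR cannot reach db at 10.0.0.12
10:00:10 INFO  retrying in 5s
10:00:15 error cannot reach db at 10.0.0.12
10:00:20 INFO  connected to 10.0.0.13
EOF

error_count=$(grep -ci "error" "$log")           # matching lines, any case
problem_lines=$(grep -nE "ERROR|WARN" "$log")    # numbered, case-sensitive
unique_ips=$(grep -oE "([0-9]{1,3}\.){3}[0-9]{1,3}" "$log" | sort -u)

echo "errors: $error_count"
echo "$problem_lines"
echo "$unique_ips"
```

Note that -c counts matching lines, not matches, and that the case-sensitive search misses the lowercase "error" line that -i picks up.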

awk — Field Processing

awk processes text line by line, splitting each line into fields. It's indispensable for working with structured text like log files, CSV data, and command output.

# awk 'pattern { action }' file
# $0=whole line, $1=field1, $2=field2, NF=number of fields, NR=line number

# Print the 2nd field (space-separated by default)
echo "John 42 London" | awk '{print $2}'   # 42

# Print specific fields from ls -la output
ls -la /etc | awk '{print $1, $9}'         # permissions + filename

# Custom field separator (CSV example)
echo "alice,25,engineer" | awk -F',' '{print $1, "is", $2, "years old"}'

# Sum a column (e.g., total memory from ps output)
ps aux | awk 'NR>1 {sum += $6} END {print "Total RSS:", sum/1024, "MB"}'

# Filter: print lines where field 3 > 100
awk '$3 > 100 {print $0}' data.txt

# Real-world: top 10 slowest requests from nginx access log
# Format: IP - - [date] "METHOD /path HTTP/1.1" status bytes response_time
awk '{print $NF, $7}' access.log | sort -rn | head -10

# BEGIN/END blocks run before/after processing
awk 'BEGIN{print "=== Results ==="} /ERROR/{count++} END{print count+0, "errors found"}' app.log   # count+0 prints 0 instead of an empty string when nothing matches
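
The same pattern/action and END building blocks extend naturally to group-by style summaries using awk's associative arrays. The sketch below is one way to do it; the three-column request log (client, path, response time in ms) is hypothetical.

```shell
# Hypothetical request log: client, path, response time in ms.
requests=$(mktemp)
cat > "$requests" <<'EOF'
alice /home 120
bob /login 340
alice /search 95
carol /home 210
bob /home 60
EOF

# Sum field 3 per client (field 1) and count requests per client.
# Array iteration order in awk is unspecified, so sort the result.
totals=$(awk '{ ms[$1] += $3; hits[$1]++ }
              END { for (c in ms) print c, hits[c], ms[c] }' "$requests" | sort)
echo "$totals"
```

Each output line holds the client, its request count, and its total response time.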

sed — Stream Editing

# sed — edit text streams with substitution, deletion, insertion

# Basic substitution: s/pattern/replacement/flags
sed 's/foo/bar/' file.txt            # Replace first occurrence per line
sed 's/foo/bar/g' file.txt           # Replace ALL occurrences per line (global)
sed 's/foo/bar/gI' file.txt          # Global, case-insensitive (the I flag is a GNU sed extension)
sed -i 's/localhost/db.prod/g' config.yaml  # Edit file IN-PLACE (careful!); BSD/macOS sed needs -i ''
sed -i.bak 's/old/new/g' file.conf   # Edit in-place, keep .bak backup

# Delete lines
sed '/^#/d' config.conf              # Delete comment lines (starting with #)
sed '/^$/d' file.txt                 # Delete blank lines
sed '5,10d' file.txt                 # Delete lines 5-10

# Extract lines
sed -n '10,20p' large-file.txt       # Print only lines 10-20
sed -n '/START/,/END/p' file.txt     # Print between START and END markers

# Multi-command
sed -e 's/foo/bar/g' -e '/^#/d' config.txt  # Multiple operations
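
A cautious workflow for in-place edits is to dry-run without -i first, then run with a backup suffix and diff the result. This is a sketch on an invented config file; the keys in it are assumptions.

```shell
conf=$(mktemp)
cat > "$conf" <<'EOF'
# database settings
host=localhost
port=5432

# cache settings
cache_host=localhost
EOF

# Dry run: no -i, so the result only goes to stdout.
sed 's/localhost/db.prod/g' "$conf"

# Real run: edit in place and keep the original as "$conf.bak".
sed -i.bak -e 's/localhost/db.prod/g' -e '/^#/d' -e '/^$/d' "$conf"

diff "$conf.bak" "$conf" || true   # diff exits 1 when the files differ
result=$(cat "$conf")
```

The attached-suffix form -i.bak is accepted by both GNU and BSD sed, which makes it a convenient portable choice.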

sort, uniq, cut, tr

# === sort ===
sort file.txt                  # Alphabetical sort
sort -n numbers.txt            # Numeric sort
sort -rn numbers.txt           # Reverse numeric sort
sort -t',' -k3,3n data.csv     # Sort CSV by 3rd column numerically (-k3 alone would use field 3 to end of line)
sort -u file.txt               # Sort and remove duplicates

# === uniq (must sort first — works on adjacent duplicates) ===
sort access.log | uniq -c      # Count occurrences of each line
sort ips.txt | uniq -d         # Show only duplicate lines
sort ips.txt | uniq -u         # Show only lines that appear exactly once
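
Why the sort matters: uniq only compares each line with the one directly before it. A short sketch with an invented list of addresses shows the difference.

```shell
ips=$(mktemp)
printf '%s\n' 10.0.0.1 10.0.0.2 10.0.0.1 10.0.0.1 10.0.0.3 > "$ips"

# Without sort, the first 10.0.0.1 is not adjacent to the other two,
# so it survives as a separate line.
unsorted_count=$(uniq "$ips" | wc -l | tr -d ' ')
sorted_count=$(sort "$ips" | uniq | wc -l | tr -d ' ')

repeated=$(sort "$ips" | uniq -d)   # lines that occur more than once
singles=$(sort "$ips" | uniq -u)    # lines that occur exactly once

echo "unsorted: $unsorted_count lines, sorted: $sorted_count lines"
```

The unsorted run reports one line more than the sorted run, which is exactly the stray duplicate.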

# === cut — extract fields/columns ===
cut -d',' -f1,3 data.csv       # Fields 1 and 3 from CSV
cut -c1-10 file.txt            # Characters 1-10 from each line
ls -la | cut -c1-10            # Just the permissions column

# === tr — translate characters ===
echo "Hello World" | tr 'a-z' 'A-Z'   # Uppercase
echo "hello:world:foo" | tr ':' '\n'   # Replace : with newline
echo "line1" | tr -d '\n'             # Delete newlines
tr -s ' ' < file.txt                 # Squeeze multiple spaces to one (tr reads stdin only, so redirect the file)

Pipes, Redirection & Streams

Every process in Linux has three standard streams: stdin (0), stdout (1), and stderr (2). Redirection and pipes connect these streams, enabling powerful data pipelines.

# === Redirection ===
ls -la > output.txt            # Redirect stdout to file (overwrite)
ls -la >> output.txt           # Redirect stdout to file (append)
ls /nonexistent 2> errors.txt  # Redirect stderr to file
ls /real /fake > out.txt 2>&1  # Both stdout AND stderr to same file
ls /real /fake &> all.txt      # Shorter: both streams to file (bash extension, not POSIX sh)
command < input.txt            # Feed file as stdin to command

# /dev/null — the black hole (discard output)
noisy_command > /dev/null      # Discard stdout
noisy_command 2>/dev/null      # Discard stderr
noisy_command &>/dev/null      # Discard everything
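
The two output streams really are independent, and the order of redirections matters because they are applied left to right. The sketch below uses a hypothetical helper, emit, that writes one line to each stream.

```shell
# Hypothetical helper that writes one line to each stream.
emit() {
  echo "to stdout"
  echo "to stderr" >&2
}

out_only=$(emit 2>/dev/null)        # capture stdout, discard stderr
err_only=$(emit 2>&1 >/dev/null)    # stderr goes to the capture, then stdout is discarded
both=$(emit 2>&1)                   # capture both streams

tmp=$(mktemp)
emit > "$tmp" 2>&1                  # stdout to file first, then stderr follows it
lines_in_file=$(wc -l < "$tmp" | tr -d ' ')
```

In the err_only line, 2>&1 points stderr at wherever stdout currently goes (the capture); the later >/dev/null then moves only stdout. Reversing the two would discard both.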

# === Pipes ===
# | connects stdout of left command to stdin of right command
cat /var/log/syslog | grep "error" | tail -20

# tee — split output to file AND stdout simultaneously
./build.sh | tee build.log             # See output AND save to file
./build.sh 2>&1 | tee full-output.log  # Include stderr too

# xargs — convert stdin to command arguments
find . -name "*.log" -print0 | xargs -0 rm   # Delete all found files (NUL-delimited, safe for names with spaces)
xargs -n 1 -P 4 curl -O < urls.txt     # Parallel downloads (4 workers, one URL per curl)
echo "file1.txt file2.txt" | xargs wc -l  # Line count of each file

# Process substitution — treat command output as a file
diff <(sort file1.txt) <(sort file2.txt)  # Compare sorted versions
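
File names with spaces are the classic way a find-to-xargs pipeline goes wrong, because xargs splits its input on whitespace by default. A minimal sketch of the NUL-delimited form, using invented file names in a scratch directory:

```shell
scratch=$(mktemp -d)
touch "$scratch/app.log" "$scratch/old report.log" "$scratch/keep.txt"

# -print0 ends each name with a NUL byte; -0 tells xargs to split on NUL,
# so "old report.log" is passed to rm as a single argument.
find "$scratch" -name "*.log" -print0 | xargs -0 rm

left=$(ls "$scratch")
echo "left: $left"
```

With a plain find | xargs rm, the name "old report.log" would be split into two arguments and rm would fail on both.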

                            
Pipeline Mental Model: A pipeline like cat log | grep ERROR | awk '{print $5}' | sort | uniq -c | sort -rn | head -10 is not sequential batch processing: all processes run simultaneously. The OS connects their stdin/stdout via kernel pipe buffers, and each process blocks only when its buffer is full (producer) or empty (consumer). This is why pipelines are efficient even for very large files.
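
Two small experiments make the concurrency visible. This is a sketch, not a benchmark: the second test relies on whole-second timestamps, so it only distinguishes "about 2 seconds" from "about 4".

```shell
# yes prints "y" forever. If stages ran one after another this would never
# finish; because they run concurrently, head exits after 3 lines and yes
# is stopped when it next writes to the closed pipe.
first_three=$(yes | head -n 3 | tr '\n' ' ')
echo "$first_three"

# Both stages sleep for 2 seconds. Started together, the pipeline takes
# about 2 seconds in total rather than 4.
start=$(date +%s)
sleep 2 | sleep 2
elapsed=$(( $(date +%s) - start ))
echo "elapsed: ${elapsed}s"
```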
                        

Process Management

# === Viewing processes ===
ps aux                          # All processes (BSD syntax)
ps -ef                          # All processes (POSIX syntax)
ps aux | grep python            # Find python processes
pgrep nginx                     # Get PIDs of nginx processes
pstree -p                       # Show process hierarchy with PIDs

# htop / top — interactive process viewers
# top: press 'M' for memory sort, 'P' for CPU sort, 'k' to kill, 'q' to quit
# htop: more user-friendly, mouse support

# === Killing processes ===
kill PID                        # Send SIGTERM (graceful shutdown request)
kill -9 PID                     # Send SIGKILL (immediate, uncatchable)
kill -HUP PID                   # Send SIGHUP (often means "reload config")
killall nginx                   # Kill all processes named nginx
pkill -f "python app.py"        # Kill by matching full command line

# === Background & foreground jobs ===
long-running-command &          # Start in background
# Press Ctrl+Z                  # Suspend foreground process (a keystroke, not a command)
bg                              # Resume suspended process in background
fg                              # Bring background job to foreground
jobs                            # List background jobs
wait                            # Wait for all background jobs to finish

# === nohup — survive terminal close ===
nohup python3 server.py > server.log 2>&1 &
# Process keeps running even if SSH session disconnects
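
Background jobs, wait, and kill fit together through PIDs and exit statuses. The following sketch uses a hypothetical worker function that sleeps and then exits with a chosen status; $! holds the PID of the most recent background job.

```shell
# Hypothetical worker: sleeps briefly, then exits with the given status.
worker() { sleep 1; return "$1"; }

worker 0 & pid_ok=$!
worker 3 & pid_bad=$!

# wait PID returns that job's exit status.
status_ok=0;  wait "$pid_ok"  || status_ok=$?
status_bad=0; wait "$pid_bad" || status_bad=$?

# kill sends SIGTERM (signal 15) by default; for a job killed by a signal,
# the shell reports 128 + the signal number.
sleep 30 & pid_sleep=$!
kill "$pid_sleep"
status_killed=0; wait "$pid_sleep" || status_killed=$?

echo "ok=$status_ok bad=$status_bad killed=$status_killed"
```

The two workers run in parallel, so waiting for both takes about one second rather than two.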

Environment Variables & Shell Config

# === View and set variables ===
env                              # Show all environment variables
echo $HOME                      # Access a variable
echo $PATH                      # Command search path (colon-separated)
echo $USER                      # Current username

# Set for current session
export MY_VAR="hello"
export PATH="$PATH:/usr/local/myapp/bin"  # Add to PATH

# Unset a variable
unset MY_VAR

# === Shell config files (bash) ===
# ~/.bashrc     — run for interactive non-login shells (new terminal tabs)
# ~/.bash_profile or ~/.profile — run for login shells (SSH sessions)
# /etc/profile  — system-wide login shell config
# /etc/bash.bashrc — system-wide interactive shell config

# Apply changes to current session without reopening
source ~/.bashrc   # or: . ~/.bashrc

# === Useful variable tricks ===
# Default value if unset or empty (quote the expansion to avoid word splitting)
echo "${MY_VAR:-default_value}"

# Required variable (error if unset or empty)
: "${DB_PASSWORD:?DB_PASSWORD must be set}"

# Per-command variable (set only in that command's environment, not in the current shell)
MY_ENV=production ./deploy.sh   # MY_ENV only exists for deploy.sh
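
These three expansions can be checked without touching real configuration. In this sketch the required-variable test runs inside a subshell so that its failure does not end the script, and sh -c stands in for deploy.sh, which is not available here.

```shell
unset MY_VAR MY_ENV DB_PASSWORD

# Default value: used because MY_VAR is unset; MY_VAR itself stays unset.
mode="${MY_VAR:-default_value}"

# Required variable: :? makes a non-interactive shell exit with an error,
# so test it in a subshell and look at the exit status.
if ( : "${DB_PASSWORD:?DB_PASSWORD must be set}" ) 2>/dev/null; then
  required_check="ok"
else
  required_check="missing"
fi

# Per-command assignment: visible to the child process only.
child_sees=$(MY_ENV=production sh -c 'echo "$MY_ENV"')
parent_sees="${MY_ENV:-<unset>}"

echo "$mode $required_check $child_sees $parent_sees"
```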

# === which, type, command ===
which python3          # Full path of the python3 binary
type ls                # Whether ls is alias, builtin, or binary
command -v docker      # Reliable way to check if command exists

Disk Space & File Size Tools

# === df — disk free space (per filesystem) ===
df -h                    # Human-readable sizes
df -h /                  # Just the root filesystem
df -hT                   # Include filesystem type

# === du — disk usage (per directory) ===
du -sh /var/log          # Total size of /var/log
du -sh /var/log/*        # Size of each item in /var/log
du -h --max-depth=1 /    # Size of each top-level directory
du -sh * | sort -h       # Sort by size (human-readable)

# Find the 10 largest files on the root filesystem (-xdev does not cross mount points; -printf needs GNU find)
find / -xdev -type f -printf '%s %p\n' 2>/dev/null | sort -rn | head -10

# Find the 10 largest directories
du -hx --max-depth=3 / 2>/dev/null | sort -rh | head -10

# === ncdu — interactive disk usage explorer ===
# sudo apt install ncdu
ncdu /var  # Interactive tree view with sizes

Real-World Pipelines

Five Pipelines You'll Use in Production

# 1. Top 10 IP addresses in nginx access log
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# 2. Count HTTP status codes
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# 3. Find all files modified in the last 24h and show their sizes
find /var -mtime -1 -type f -print0 2>/dev/null | xargs -0 -r ls -lh | sort -k5,5 -rh | head -20

# 4. Monitor a log file in real-time, highlighting errors
tail -f /var/log/app.log | grep --color=always -E "ERROR|WARN|$"

# 5. Check which ports are listening and which process owns them
ss -tlnp | grep LISTEN
# or: sudo netstat -tlnp (if ss not available)
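
Pipelines 1 and 2 can be tried without a live server. The sketch below runs them against a five-line sample in nginx's default access-log layout; the addresses and requests are invented, and the final awk stage is added only to tidy the output.

```shell
access_log=$(mktemp)
cat > "$access_log" <<'EOF'
203.0.113.5 - - [10/Oct/2024:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
198.51.100.7 - - [10/Oct/2024:13:55:37 +0000] "GET /missing HTTP/1.1" 404 153
203.0.113.5 - - [10/Oct/2024:13:55:38 +0000] "POST /login HTTP/1.1" 302 0
203.0.113.5 - - [10/Oct/2024:13:55:39 +0000] "GET /dashboard HTTP/1.1" 200 5120
198.51.100.7 - - [10/Oct/2024:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 2326
EOF

# Busiest client: field 1 is the client address.
top_ip=$(awk '{print $1}' "$access_log" | sort | uniq -c | sort -rn \
         | head -1 | awk '{print $2, $1}')

# Most common status: field 9 when split on whitespace.
top_status=$(awk '{print $9}' "$access_log" | sort | uniq -c | sort -rn \
             | head -1 | awk '{print $2, $1}')

echo "top client: $top_ip"
echo "top status: $top_status"
```

The field numbers depend on the log format; a custom log_format directive would shift them.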

Exercises

# Exercise 1: Build a pipeline to analyse /var/log/syslog (or /var/log/messages)
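# Goal: find the top 5 most common services logging messages (field 5 in the traditional syslog format)
# The braces group the fallback so that whichever file exists feeds the pipeline;
# without them, | binds tighter than || and only /var/log/messages would be piped.
{ cat /var/log/syslog 2>/dev/null || cat /var/log/messages 2>/dev/null; } | \
    awk '{print $5}' | sort | uniq -c | sort -rn | head -5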

# Exercise 2: Find large files eating disk space
find /var -type f -size +50M -exec ls -lh {} + 2>/dev/null | awk '{print $5, $9}' | sort -rh

# Exercise 3: Extract all unique email addresses from a file
# (Create a test file first)
echo -e "Contact support@example.com or admin@test.org\nAlso: noreply@company.com and support@example.com" > /tmp/emails.txt
grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' /tmp/emails.txt | sort -u

# Exercise 4: Monitor CPU usage of a specific process every 2 seconds
# (Replace 'bash' with any running process name)
watch -n 2 "ps aux | grep bash | grep -v grep | awk '{print \$1, \$3\"%\", \$11}'"

# Exercise 5: Process environment inspection
env | sort | head -20     # See sorted environment variables
echo "PATH contains $(echo "$PATH" | tr ':' '\n' | wc -l) directories"

Conclusion & Next Steps

The Linux command line is a composable toolkit, not a collection of memorised commands. The key skills are: knowing which tool to reach for (grep for filtering, awk for fields, sed for substitution, find for locating), understanding stdin/stdout/stderr and how pipes connect them, and building pipelines by composing simple tools. These skills compound — every new command you learn multiplies with everything you already know.
