Understanding Build Bottlenecks
Before optimizing, you must understand where your build spends time. A typical C++ build pipeline has three major phases — compilation, linking, and dependency resolution — each with distinct characteristics and optimization strategies. The CMake build system documentation describes how targets flow through these phases.
flowchart LR
A[Source Files] --> B[Preprocessing]
B --> C[Compilation]
C --> D[Assembly]
D --> E[Object Files]
E --> F[Linking]
F --> G[Binary]
style C fill:#BF092F,color:#fff
style F fill:#16476A,color:#fff
style B fill:#3B9797,color:#fff
In most projects, compilation (parsing headers, template instantiation, code generation) consumes 70–90% of total build time. Linking dominates for large monolithic binaries with heavy template use or LTO enabled. Dependency resolution (CMake configure step, package downloads) matters most on cold CI environments.
Measuring Where Time Goes
Before optimizing, measure your baseline. CMake 3.18+ supports profiling output:
# Generate profiling data during configure
cmake -B build -S . --profiling-output=cmake-profile.json --profiling-format=google-trace
# Build with Ninja and print its internal timing statistics (-d stats)
cmake --build build --parallel -- -d stats
# Time a full build with Make
time cmake --build build -- -j$(nproc)
# Export the compilation database for tooling (per-step timings are recorded in build/.ninja_log)
ninja -C build -t compdb > compile_commands.json
Precompiled Headers (PCH)
Precompiled headers serialize the compiler's internal representation of frequently-included headers into a binary file, eliminating redundant parsing across translation units. CMake 3.16+ provides native PCH support via target_precompile_headers().
flowchart TD
subgraph Without PCH
A1[main.cpp] --> H1[vector]
A1 --> H2[string]
A1 --> H3[map]
B1[utils.cpp] --> H1
B1 --> H2
B1 --> H4[algorithm]
C1[engine.cpp] --> H1
C1 --> H2
C1 --> H3
end
subgraph With PCH
PCH[pch.h.gch] --> H5[vector]
PCH --> H6[string]
PCH --> H7[map]
PCH --> H8[algorithm]
A2[main.cpp] --> PCH
B2[utils.cpp] --> PCH
C2[engine.cpp] --> PCH
end
cmake_minimum_required(VERSION 3.16)
project(MyApp LANGUAGES CXX)
add_library(core
src/engine.cpp
src/utils.cpp
src/renderer.cpp
src/physics.cpp
)
# Precompile commonly-used standard library headers
target_precompile_headers(core PRIVATE
<vector>
<string>
<unordered_map>
<memory>
<algorithm>
<functional>
<optional>
<filesystem>
)
# PUBLIC headers propagate to dependents
target_precompile_headers(core PUBLIC
<nlohmann/json.hpp>
)
Reusing PCH Across Targets
Multiple targets with similar header sets can share a single PCH using REUSE_FROM, avoiding redundant PCH compilation:
cmake_minimum_required(VERSION 3.16)
project(GameEngine LANGUAGES CXX)
# Primary library builds the PCH
add_library(engine src/engine.cpp src/renderer.cpp)
target_precompile_headers(engine PRIVATE
<vector>
<string>
<memory>
<unordered_map>
<glm/glm.hpp>
)
# Secondary targets reuse the same PCH binary
add_library(physics src/physics.cpp src/collision.cpp)
target_precompile_headers(physics REUSE_FROM engine)
add_library(audio src/audio.cpp src/mixer.cpp)
target_precompile_headers(audio REUSE_FROM engine)
# Executable also reuses
add_executable(game src/main.cpp)
target_precompile_headers(game REUSE_FROM engine)
target_link_libraries(game PRIVATE engine physics audio)
REUSE_FROM requires the reusing target to be compiled with the same compiler options, flags, and definitions as the target that owns the PCH. When they differ, the compiler may reject the PCH or warn that it cannot be used (GCC reports this with -Winvalid-pch). Also avoid putting project-specific headers in a PCH: include only stable, rarely changed headers (STL, third-party libraries), because any change to a header in the PCH forces every source file that uses it to rebuild.
cmake_minimum_required(VERSION 3.16)
project(Selective LANGUAGES CXX)
add_library(mylib src/a.cpp src/b.cpp src/c.cpp)
target_precompile_headers(mylib PRIVATE <vector> <string>)
# Exclude specific files from using PCH
set_source_files_properties(src/c.cpp PROPERTIES
SKIP_PRECOMPILE_HEADERS ON
)
Unity/Jumbo Builds
Unity builds concatenate multiple source files into a single translation unit, reducing repeated header parsing and enabling better cross-file optimization. CMake 3.16+ supports this natively via the UNITY_BUILD target property.
cmake_minimum_required(VERSION 3.16)
project(LargeProject LANGUAGES CXX)
add_library(core
src/module_a.cpp
src/module_b.cpp
src/module_c.cpp
src/module_d.cpp
src/module_e.cpp
src/module_f.cpp
src/module_g.cpp
src/module_h.cpp
)
# Enable unity build for this target
set_target_properties(core PROPERTIES
UNITY_BUILD ON
UNITY_BUILD_BATCH_SIZE 8 # Files per unity source (default: 8)
)
# Or enable globally for all targets
set(CMAKE_UNITY_BUILD ON)
set(CMAKE_UNITY_BUILD_BATCH_SIZE 6)
Unity Build Tradeoffs and Exclusions
Unity builds can cause issues with static variables, anonymous namespaces, and identically-named symbols across files. Exclude problematic sources:
cmake_minimum_required(VERSION 3.16)
project(UnityExample LANGUAGES CXX)
add_library(renderer
src/opengl_backend.cpp
src/vulkan_backend.cpp
src/shader_compiler.cpp
src/mesh_loader.cpp
src/texture_manager.cpp
)
set_target_properties(renderer PROPERTIES UNITY_BUILD ON)
# Exclude files with conflicting static symbols
set_source_files_properties(
src/opengl_backend.cpp
src/vulkan_backend.cpp
PROPERTIES SKIP_UNITY_BUILD_INCLUSION ON
)
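# Group related files into the same unity source.
# UNITY_GROUP is honored only when UNITY_BUILD_MODE is GROUP (CMake 3.18+);
# in that mode, sources without a group are compiled individually.
set_target_properties(renderer PROPERTIES UNITY_BUILD_MODE GROUP)
set_source_files_properties(
src/shader_compiler.cpp
src/mesh_loader.cpp
PROPERTIES UNITY_GROUP "assets"
)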
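For targets that do not set UNITY_BUILD explicitly, unity builds can be switched off for a local build tree at configure time with cmake -B build -DCMAKE_UNITY_BUILD=OFF.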
Unity Builds in Chromium
The Chromium project adopted unity builds for its 35,000+ source files and reported 30–40% faster full builds on CI. They use a batch size of 8 and maintain an exclusion list of ~200 files with naming conflicts. Incremental builds remain non-unity for developer productivity, controlled via a GN variable mapped to CMake's UNITY_BUILD in ports.
Compiler Cache Integration
Compiler caches store compilation results keyed by preprocessed source content plus compiler flags. Cache hits skip compilation entirely, turning minutes into milliseconds. CMake integrates with caches via CMAKE_<LANG>_COMPILER_LAUNCHER.
flowchart TD
A[Compile Request] --> B{Hash preprocessed source + flags}
B --> C{Cache lookup}
C -->|Hit| D[Return cached .o]
C -->|Miss| E[Run compiler]
E --> F[Store result in cache]
F --> G[Return .o]
D --> H[Done — milliseconds]
G --> I[Done — full compile time]
style D fill:#3B9797,color:#fff
style H fill:#3B9797,color:#fff
style E fill:#BF092F,color:#fff
style I fill:#BF092F,color:#fff
cmake_minimum_required(VERSION 3.16)
project(CachedBuild LANGUAGES C CXX)
# Auto-detect ccache or sccache
find_program(CCACHE_PROGRAM ccache)
find_program(SCCACHE_PROGRAM sccache)
if(SCCACHE_PROGRAM)
set(CMAKE_C_COMPILER_LAUNCHER "${SCCACHE_PROGRAM}")
set(CMAKE_CXX_COMPILER_LAUNCHER "${SCCACHE_PROGRAM}")
message(STATUS "Using sccache: ${SCCACHE_PROGRAM}")
elseif(CCACHE_PROGRAM)
set(CMAKE_C_COMPILER_LAUNCHER "${CCACHE_PROGRAM}")
set(CMAKE_CXX_COMPILER_LAUNCHER "${CCACHE_PROGRAM}")
message(STATUS "Using ccache: ${CCACHE_PROGRAM}")
endif()
add_executable(app src/main.cpp src/engine.cpp)
sccache — Shared Compilation Cache
Mozilla's sccache supports cloud-backed storage (S3, GCS, Azure Blob), making it ideal for distributed teams sharing cache hits across CI and developer machines:
# Install sccache
cargo install sccache
# Or via package managers
brew install sccache # macOS
choco install sccache # Windows
# Configure S3 backend for team-wide sharing
export SCCACHE_BUCKET="my-team-build-cache"
export SCCACHE_REGION="us-east-1"
export SCCACHE_S3_USE_SSL=true
# Start the sccache server
sccache --start-server
# Configure CMake to use sccache
cmake -B build -S . \
-DCMAKE_C_COMPILER_LAUNCHER=sccache \
-DCMAKE_CXX_COMPILER_LAUNCHER=sccache
# Check cache statistics after a build
sccache --show-stats
# Optimize ccache hit rates
# Set ccache to ignore __DATE__ and __TIME__ macros
export CCACHE_SLOPPINESS="time_macros,include_file_mtime,file_stat_matches"
# Increase cache size for large projects
ccache --max-size=20G
# Enable compression to fit more entries
export CCACHE_COMPRESS=1
export CCACHE_COMPRESSLEVEL=6
# Share cache hits across checkouts in different directories (absolute paths under this prefix are rewritten to relative ones)
export CCACHE_BASEDIR="${HOME}/projects"
# View hit/miss statistics
ccache --show-stats
ccache --zero-stats # Reset counters
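The CCACHE_SLOPPINESS setting has a large effect on cache hit rates. Without time_macros, files that use __DATE__ or __TIME__ rarely or never hit the cache, because the expanded values change between builds. Without include_file_mtime, ccache declines to cache a compilation when an included header's modification time is too recent, which is common right after a checkout or a code generation step. With these settings, hit rates of 85–95% on incremental developer builds are a reasonable expectation.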
Parallel Compilation
Modern hardware has many cores — using them all dramatically reduces wall-clock build time. CMake provides multiple mechanisms for parallel compilation depending on the generator and platform.
cmake_minimum_required(VERSION 3.12)
project(ParallelBuild LANGUAGES CXX)
# Detect available processors at configure time
include(ProcessorCount)
ProcessorCount(NPROC)
if(NOT NPROC EQUAL 0)
message(STATUS "Detected ${NPROC} processors")
endif()
add_executable(app src/main.cpp src/module_a.cpp src/module_b.cpp)
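# MSVC parallel compilation within a single target
# (plain /MP uses all logical processors; fall back to it if detection failed)
if(MSVC)
if(NPROC EQUAL 0)
target_compile_options(app PRIVATE /MP)
else()
target_compile_options(app PRIVATE /MP${NPROC})
endif()
endif()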
# Method 1: CMAKE_BUILD_PARALLEL_LEVEL environment variable
export CMAKE_BUILD_PARALLEL_LEVEL=16
cmake --build build
# Method 2: --parallel flag (CMake 3.12+)
cmake --build build --parallel 16
# Method 3: Pass directly to underlying build tool
cmake --build build -- -j16 # Make/Ninja
cmake --build build -- /maxcpucount:16 # MSBuild
# Method 4: Ninja automatic detection (uses all cores by default)
cmake -G Ninja -B build -S .
cmake --build build # Ninja auto-detects core count
# Keep one core free for system responsiveness
cmake --build build --parallel $(($(nproc) - 1))
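Parallel jobs also multiply memory use. Running ninja -j$(nproc) -l$(nproc) caps the job count and stops Ninja from starting new jobs while the load average is above the limit. On machines with 16 GB of RAM, -j4 to -j8 is often optimal even with 16 cores, because memory rather than core count becomes the constraint.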
Link-Time Optimization (LTO)
LTO allows the compiler to optimize across translation unit boundaries during linking, producing smaller and faster binaries. However, it significantly increases link time and memory usage. CMake provides native LTO support via the INTERPROCEDURAL_OPTIMIZATION property.
CMake LTO Integration
cmake_minimum_required(VERSION 3.9)
project(LTOProject LANGUAGES CXX)
# Check if LTO is supported by the compiler
include(CheckIPOSupported)
check_ipo_supported(RESULT ipo_supported OUTPUT ipo_error)
add_executable(app src/main.cpp src/engine.cpp src/utils.cpp)
if(ipo_supported)
# Enable LTO for Release builds only
set_target_properties(app PROPERTIES
INTERPROCEDURAL_OPTIMIZATION_RELEASE ON
)
message(STATUS "LTO enabled for Release builds")
else()
message(WARNING "LTO not supported: ${ipo_error}")
endif()
# Or enable globally for all targets in Release
set(CMAKE_INTERPROCEDURAL_OPTIMIZATION_RELEASE ON)
cmake_minimum_required(VERSION 3.9)
project(ThinLTO LANGUAGES CXX)
add_executable(app src/main.cpp src/module_a.cpp src/module_b.cpp)
# Thin LTO — parallel, lower memory, nearly same optimization
# GCC/Clang: -flto=thin (Clang) or -flto=auto (GCC for parallel)
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
target_compile_options(app PRIVATE -flto=thin)
target_link_options(app PRIVATE -flto=thin)
elseif(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
target_compile_options(app PRIVATE -flto=auto -ffat-lto-objects)
target_link_options(app PRIVATE -flto=auto)
endif()
# Parallel ThinLTO backend jobs at link time (Clang)
# --thinlto-jobs is an LLD option, so this assumes linking with -fuse-ld=lld
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
include(ProcessorCount)
ProcessorCount(NPROC)
target_link_options(app PRIVATE
"LINKER:--thinlto-jobs=${NPROC}"
)
endif()
Thin LTO vs Full LTO — Firefox Build
Mozilla's Firefox project benchmarks show: Full LTO produces 10–15% faster runtime code but increases link time from 2 minutes to 45 minutes with 32 GB peak memory. Thin LTO achieves 8–12% improvement with only 8 minutes link time and 12 GB peak memory. The compromise: use Thin LTO in CI, Full LTO for release builds only.
Reducing Header Dependencies
The single most impactful long-term strategy for build performance is reducing the transitive header inclusion graph. Every unnecessary #include multiplies parsing work across all translation units that include that header.
// BAD: widget.h pulls in entire engine dependency tree
// widget.h
#include "engine.h" // 50,000 lines of transitive includes
#include "renderer.h" // 30,000 lines
#include <vector>
#include <string>
class Widget {
Engine* engine_;
Renderer* renderer_;
std::vector<std::string> labels_;
public:
void render();
};
// GOOD: Forward declarations minimize header dependencies
// widget.h
#include <vector>
#include <string>
// Forward declarations — no #include needed for pointer/reference types
class Engine;
class Renderer;
class Widget {
Engine* engine_;
Renderer* renderer_;
std::vector<std::string> labels_;
public:
void render();
};
// widget.cpp — includes only here, in the translation unit
#include "widget.h"
#include "engine.h"
#include "renderer.h"
void Widget::render() {
engine_->beginFrame();
renderer_->draw(labels_);
}
// PIMPL (Pointer to Implementation) — complete header isolation
// database.h — stable ABI, minimal includes
#include <memory>
#include <string>
class Database {
public:
Database(const std::string& connection_string);
~Database();
Database(Database&&) noexcept;
Database& operator=(Database&&) noexcept;
bool execute(const std::string& query);
int rowCount() const;
private:
struct Impl;
std::unique_ptr<Impl> impl_;
};
// database.cpp — heavy includes only here
#include "database.h"
#include <pqxx/pqxx> // PostgreSQL — only compiled once
#include <spdlog/spdlog.h> // Logging library
#include <nlohmann/json.hpp> // JSON parsing
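struct Database::Impl {
// spdlog::logger has no default constructor, so hold the shared default
// logger; the connection is opened directly from the connection string.
explicit Impl(const std::string& cs)
: conn(cs.c_str()), logger(spdlog::default_logger()) {}
pqxx::connection conn;
std::shared_ptr<spdlog::logger> logger;
int last_row_count = 0;
};
Database::Database(const std::string& cs) : impl_(std::make_unique<Impl>(cs)) {}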
Database::~Database() = default;
Database::Database(Database&&) noexcept = default;
Database& Database::operator=(Database&&) noexcept = default;
cmake_minimum_required(VERSION 3.16)
project(IWYU_Integration LANGUAGES CXX)
# Include-What-You-Use integration
find_program(IWYU_PROGRAM NAMES include-what-you-use iwyu)
add_executable(app src/main.cpp src/widget.cpp src/database.cpp)
if(IWYU_PROGRAM)
set_target_properties(app PROPERTIES
CXX_INCLUDE_WHAT_YOU_USE "${IWYU_PROGRAM};-Xiwyu;--mapping_file=${CMAKE_SOURCE_DIR}/iwyu.imp"
)
message(STATUS "IWYU enabled: ${IWYU_PROGRAM}")
endif()
Changing a single #include "heavy.h" to a forward declaration in a widely included header can save minutes on full rebuilds. Run include-what-you-use periodically to identify unnecessary includes.
Object Libraries for Faster Iteration
CMake OBJECT libraries produce compiled object files without creating an archive or shared library. This skips the archive or link step for intermediate build products, speeding up incremental development cycles.
cmake_minimum_required(VERSION 3.12)
project(ObjectLibs LANGUAGES CXX)
# Object library — compiles but doesn't link
add_library(core_objects OBJECT
src/engine.cpp
src/renderer.cpp
src/physics.cpp
src/audio.cpp
)
target_include_directories(core_objects PUBLIC include)
target_compile_features(core_objects PUBLIC cxx_std_17)
# Multiple final targets share the same object files
# No re-compilation, just different linking
add_executable(game
src/main.cpp
$<TARGET_OBJECTS:core_objects>
)
add_executable(editor
src/editor_main.cpp
$<TARGET_OBJECTS:core_objects>
)
# Test executable reuses objects without rebuilding
add_executable(tests
tests/test_engine.cpp
tests/test_physics.cpp
$<TARGET_OBJECTS:core_objects>
)
# Modern CMake: link against OBJECT library directly (3.12+)
add_executable(benchmark src/benchmark.cpp)
target_link_libraries(benchmark PRIVATE core_objects)
Ninja as the Preferred Generator
Ninja consistently outperforms Make for C++ projects due to its minimal overhead, superior dependency tracking, and optimized scheduling. It was specifically designed for large codebases where Make's startup time and serialized recipe evaluation become bottlenecks.
# Generate Ninja build files (single-config)
cmake -G Ninja -B build -S . -DCMAKE_BUILD_TYPE=Release
# Generate Ninja Multi-Config (build all configurations from one tree)
cmake -G "Ninja Multi-Config" -B build -S .
cmake --build build --config Release
cmake --build build --config Debug
# Install Ninja
pip install ninja # Python/pip (cross-platform)
brew install ninja # macOS
apt install ninja-build # Ubuntu/Debian
choco install ninja # Windows
cmake_minimum_required(VERSION 3.17)
project(NinjaMultiConfig LANGUAGES CXX)
# Ninja Multi-Config: specify all desired configurations
set(CMAKE_CONFIGURATION_TYPES "Debug;Release;RelWithDebInfo" CACHE STRING "" FORCE)
# Per-config output directories (avoid collisions)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_DEBUG ${CMAKE_BINARY_DIR}/Debug)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY_RELEASE ${CMAKE_BINARY_DIR}/Release)
add_executable(app src/main.cpp src/engine.cpp)
# Cross-config dependencies work automatically
add_custom_target(check
COMMAND $<TARGET_FILE:app> --self-test
DEPENDS app
)
LLVM Build — Ninja vs Unix Makefiles
Building LLVM (~3,500 targets) on a 32-core machine: Ninja completes no-op builds in 0.4 seconds vs Make's 12 seconds. Full parallel builds complete 5–8% faster with Ninja due to better job scheduling and reduced fork overhead. The difference is more dramatic on Windows where process creation is expensive — Ninja avoids spawning shells for each recipe.
Distributed Compilation
For very large projects, distributing compilation across multiple machines can provide near-linear speedup for the compilation phase; linking still happens on the local machine. Tools like distcc, icecream, and Incredibuild integrate transparently with CMake via the compiler launcher mechanism.
cmake_minimum_required(VERSION 3.16)
project(Distributed LANGUAGES CXX)
# Only one compiler launcher can be active at a time.
# Prefer icecream (icecc, automatic load balancing) and fall back to
# distcc (static list of network hosts).
find_program(ICECC_PROGRAM icecc)
find_program(DISTCC_PROGRAM distcc)
if(ICECC_PROGRAM)
set(CMAKE_C_COMPILER_LAUNCHER "${ICECC_PROGRAM}")
set(CMAKE_CXX_COMPILER_LAUNCHER "${ICECC_PROGRAM}")
message(STATUS "Using icecc: ${ICECC_PROGRAM}")
elseif(DISTCC_PROGRAM)
set(CMAKE_C_COMPILER_LAUNCHER "${DISTCC_PROGRAM}")
set(CMAKE_CXX_COMPILER_LAUNCHER "${DISTCC_PROGRAM}")
message(STATUS "Using distcc: ${DISTCC_PROGRAM}")
endif()
add_executable(app src/main.cpp src/engine.cpp src/renderer.cpp)
# distcc setup — configure available build hosts
export DISTCC_HOSTS="localhost/8 build-server-1/16 build-server-2/16"
# Verify host connectivity
distcc --show-hosts
# Set parallelism to match total distributed core count
# Local(8) + Server1(16) + Server2(16) = 40 jobs
cmake --build build --parallel 40
# icecream setup — automatic scheduler-based distribution
# Start the scheduler on one machine
icecc-scheduler &
# Start the daemon on each build node
iceccd --nice 5 --max-processes 16 &
# Monitor distributed builds
icemon # GUI monitor showing job distribution
# Combine with ccache: ccache wraps distcc
export CCACHE_PREFIX="distcc"
cmake -B build -DCMAKE_CXX_COMPILER_LAUNCHER="ccache"
Build Profiling
To identify the specific files and operations consuming the most build time, use compiler-specific profiling features and CMake's own profiling output.
cmake_minimum_required(VERSION 3.18)
project(BuildProfiling LANGUAGES CXX)
add_executable(app src/main.cpp src/engine.cpp src/heavy_templates.cpp)
# Clang: -ftime-trace generates per-file Chrome trace JSON
if(CMAKE_CXX_COMPILER_ID MATCHES "Clang")
target_compile_options(app PRIVATE -ftime-trace)
endif()
# MSVC: /d1reportTime shows per-header parsing time
if(MSVC)
target_compile_options(app PRIVATE /d1reportTime)
endif()
# GCC: -ftime-report prints per-pass timing
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
target_compile_options(app PRIVATE -ftime-report)
endif()
# CMake configure-step profiling (3.18+)
cmake -B build -S . \
--profiling-output=cmake-profile.json \
--profiling-format=google-trace
# Open cmake-profile.json in Chrome's trace viewer:
# chrome://tracing or https://ui.perfetto.dev
# Clang -ftime-trace: generates .json next to each .o file
# View in Chrome tracing or ClangBuildAnalyzer
cmake --build build --parallel
# Analyze all trace files with ClangBuildAnalyzer
ClangBuildAnalyzer --all build/ capture.bin
ClangBuildAnalyzer --analyze capture.bin
# Ninja build graph visualization
ninja -C build -t graph | dot -Tsvg > build-graph.svg
# Ninja: list targets up to three levels deep in the dependency tree
ninja -C build -t targets depth 3
-ftime-trace is the most actionable profiling tool — it shows exactly which headers, template instantiations, and code generation phases consume time per file. Run ClangBuildAnalyzer to aggregate across all translation units and identify the "most expensive headers" and "slowest template instantiations" project-wide.
Identifying a 40-Second Header
Using -ftime-trace on a game engine project revealed that boost/spirit.hpp was transitively included in 180 translation units, adding 40 seconds per file of template instantiation. Moving the Spirit-dependent parser into a single .cpp file and exposing only a simple interface reduced total build time from 22 minutes to 8 minutes — a 63% improvement from one header refactoring.
C++20 Modules Impact on Build Times
C++20 modules fundamentally change the compilation model by replacing textual inclusion with pre-compiled binary module interfaces (BMIs). This eliminates redundant parsing — each module is compiled once into a BMI that dependents consume directly, similar to PCH but with proper encapsulation and dependency tracking.
// math_utils.cppm — Module interface unit
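// Standard headers go in the global module fragment; header units such as
// `import <vector>;` are not built by CMake's module support as of 3.28.
module;
#include <algorithm>
#include <cmath>
#include <vector>
export module math_utils;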
export namespace math {
double magnitude(const std::vector<double>& vec) {
double sum = 0.0;
for (auto v : vec) sum += v * v;
return std::sqrt(sum);
}
std::vector<double> normalize(std::vector<double> vec) {
double mag = magnitude(vec);
std::for_each(vec.begin(), vec.end(), [mag](double& v){ v /= mag; });
return vec;
}
}
// main.cpp — Consumer only imports the module interface
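// The module does not re-export std::vector, so the consumer includes what it
// uses itself; placing #includes before imports avoids ordering problems.
#include <iostream>
#include <vector>
import math_utils; // Binary import: the module's own code is not re-parsed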
int main() {
std::vector<double> v = {3.0, 4.0};
auto n = math::normalize(v);
std::cout << "Magnitude: " << math::magnitude(v) << "\n";
return 0;
}
cmake_minimum_required(VERSION 3.28)
project(ModulesExample LANGUAGES CXX)
# C++20 named modules are supported without experimental flags since CMake 3.28
# (requires Ninja 1.11+ and a module-capable compiler such as Clang 16+, GCC 14+, or MSVC 19.34+)
set(CMAKE_CXX_STANDARD 20)
set(CMAKE_CXX_STANDARD_REQUIRED ON)
add_library(math_module)
target_sources(math_module
PUBLIC
FILE_SET CXX_MODULES FILES
src/math_utils.cppm
)
add_executable(app src/main.cpp)
target_link_libraries(app PRIVATE math_module)
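Ninja (1.11+) handles module build ordering through dynamic dependencies (dyndep), but Makefile generators do not support C++20 modules. Use Ninja, or a recent Visual Studio generator, when building with modules.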
Conclusion & Next Steps
Build performance optimization is not a single switch — it's a layered strategy combining multiple techniques. Here's a recommended adoption order based on effort-to-impact ratio:
- Switch to Ninja — Zero-effort, immediate 5–15% improvement on incremental builds
- Enable ccache/sccache — One-line CMake change, massive improvement on repeated builds
- Add precompiled headers — Moderate effort, 20–50% faster compilation
- Enable unity builds for CI — Low effort, 30–70% faster clean builds
- Reduce header dependencies — High effort, highest long-term payoff (architectural change)
- Profile with -ftime-trace — Identifies specific bottlenecks unique to your project
- Consider C++20 modules — Future-facing, best benefit for new code
- Distributed compilation — Infrastructure investment, beneficial for 100k+ LOC projects
Next in the Series
In Part 31: Apple Platform Development, we'll explore building for macOS, iOS, tvOS, and watchOS with CMake — including Universal Binaries, code signing, framework bundles, and Xcode project generation.