
Part 13: Artifact Management & Build Provenance

May 13, 2026 Wasil Zafar 38 min read

Your build system produces artifacts. But where do they go? How do you know they haven't been tampered with? This article covers the complete artifact lifecycle — from container registries and image tagging to SBOMs, SLSA provenance, and the supply chain integrity practices that protect software from build to deployment.

Table of Contents

  1. Introduction
  2. Types of Artifacts
  3. Artifact Repositories
  4. Container Registries
  5. Image Tagging Strategies
  6. SBOM (Software Bill of Materials)
  7. Build Provenance & SLSA
  8. Artifact Promotion
  9. Cleanup & Retention
  10. Exercises
  11. Conclusion & Next Steps

Introduction — What Are Artifacts?

In Part 12, we covered how build systems transform source code into deployable software. But the build is only half the story. Once you have a compiled binary, a Docker image, or a packaged library — where does it go? How do you store it, version it, promote it through environments, and prove it hasn't been tampered with?

An artifact is any output of a build process that is intended for deployment or consumption by other systems. This includes Docker images, JAR files, npm packages, Go binaries, Helm charts, machine learning models, and documentation bundles.

Why Artifact Management Matters

Key Principle: Build once, deploy everywhere. An artifact should be built exactly once and then promoted through environments (dev → staging → production) without being rebuilt. Rebuilding introduces non-determinism — you might get a different binary from the same source code if dependencies or toolchains have changed.

Without proper artifact management, organisations face:

  • Traceability gaps — "Which version is running in production?" becomes unanswerable
  • Supply chain attacks — Unverified artifacts could contain malicious code
  • Wasted compute — Rebuilding the same artifact for every environment
  • Rollback failures — Can't revert to a known-good version if old artifacts are deleted
  • Compliance violations — Auditors require proof of what was deployed and when

Types of Artifacts

Artifact Type | Format | Registry/Repository | Example
--- | --- | --- | ---
Container Image | OCI image | Docker Hub, ECR, Google Artifact Registry, ACR | myapp:v2.1.0
Java Package | .jar / .war | Maven Central, Nexus | mylib-1.3.0.jar
npm Package | .tgz | npmjs.com, GitHub Packages | org-utils-2.0.0.tgz (@org/utils@2.0.0)
Python Package | .whl / .tar.gz | PyPI, Artifactory | mylib-1.0.0-py3-none-any.whl
Go Binary | Static binary | GitHub Releases, GCS | server-linux-amd64
Helm Chart | .tgz | ChartMuseum, OCI registries | myapp-chart-1.2.0.tgz

The Immutability Principle

Once an artifact is published with a version tag, it must never be modified. If you need to fix a bug, you publish a new version — you never overwrite an existing one. This principle ensures that:

  • Rollbacks are always possible (the previous version still exists)
  • Audits are meaningful (version 2.1.0 always means the same thing)
  • Caching works correctly (same tag = same content, always)
  • Reproducibility can be checked (a rebuild from the same source can be compared byte for byte against the stored artifact)
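Many registries can enforce this for you (for example, ECR's immutable tag setting or the "Disable redeploy" deployment policy in Nexus). The rule itself is simple, as this minimal Bash sketch shows. It assumes a local directory stands in for a repository; the `publish_artifact` function and the `./repo` layout are hypothetical, chosen only to illustrate write-once publishing with a recorded SHA-256 digest.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: a write-once repository backed by a local directory.
# Once a version exists, it can never be overwritten.
set -euo pipefail

REPO_DIR="${REPO_DIR:-./repo}"

publish_artifact() {
  local file="$1" name="$2" version="$3"
  local dest="${REPO_DIR}/${name}/${version}"

  # Immutability rule: an existing version is never replaced
  if [[ -e "$dest" ]]; then
    echo "refusing to overwrite ${name}:${version} (versions are immutable)" >&2
    return 1
  fi

  mkdir -p "$dest"
  cp "$file" "$dest/"
  # Record the content digest so consumers can verify what they download
  sha256sum "$file" | cut -d' ' -f1 > "$dest/SHA256"
  echo "published ${name}:${version}"
}
```

For example, `publish_artifact app.bin myapp 2.1.0` succeeds once; a second call with the same version exits non-zero, which is the signal that should fail a CI job attempting to republish a release.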

Artifact Repositories

An artifact repository is a server that stores, indexes, and serves build artifacts. It acts as the single source of truth for all deployable software in your organisation.

Repository Comparison

Repository | Type | Formats Supported | Best For
--- | --- | --- | ---
JFrog Artifactory | Universal | All (Docker, Maven, npm, PyPI, Go, Helm, etc.) | Enterprise, multi-format
Sonatype Nexus | Universal | Docker, Maven, npm, PyPI, NuGet, Go | Self-hosted, OSS option
GitHub Packages | Cloud | Docker, npm, Maven, NuGet, RubyGems | GitHub-native workflows
AWS ECR | Cloud | OCI images and OCI artifacts (e.g. Helm charts) | AWS-native deployments
Google Artifact Registry | Cloud | Docker, Maven, npm, Python, Go, Apt | GCP-native, multi-format
Azure Container Registry | Cloud | OCI images, Helm charts | Azure-native deployments

Container Registries

Container registries are specialised artifact repositories for OCI (Docker) images. They handle layer deduplication, manifest management, and multi-architecture image support.

# Building and pushing a Docker image to a registry
# Build the image with proper tagging
docker build -t mycompany/api-server:2.1.0 .

# Tag for a specific registry
docker tag mycompany/api-server:2.1.0 \
    ghcr.io/mycompany/api-server:2.1.0

# Authenticate to the registry
echo "$GITHUB_TOKEN" | docker login ghcr.io -u USERNAME --password-stdin

# Push to the registry
docker push ghcr.io/mycompany/api-server:2.1.0

# Pull from the registry (on another machine)
docker pull ghcr.io/mycompany/api-server:2.1.0

# Inspect image without pulling (check manifest)
docker manifest inspect ghcr.io/mycompany/api-server:2.1.0

# Multi-architecture builds (ARM64 + AMD64)
# Create a buildx builder
docker buildx create --name multiarch --use

# Build for multiple platforms simultaneously
docker buildx build \
    --platform linux/amd64,linux/arm64 \
    --tag ghcr.io/mycompany/api-server:2.1.0 \
    --push \
    .

# Verify the manifest includes both architectures
docker manifest inspect ghcr.io/mycompany/api-server:2.1.0
# Shows: linux/amd64 and linux/arm64 digests

Image Tagging Strategies

Why "latest" is Dangerous

Anti-Pattern: Never use :latest in production deployments. The "latest" tag is mutable: it points to whatever was pushed last. This means: (1) you can't tell which version is running, (2) different pods might pull different versions, and (3) you can't roll back by tag, because "latest" has already moved to the new image.
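The same weakness applies to any tag, SemVer included: a tag is only a pointer, and it can be moved unless the registry enforces tag immutability (ECR, for example, offers an IMMUTABLE tag mutability setting). Only a digest (`image@sha256:...`) is guaranteed to identify the same bytes. A deploy-time guard can enforce this; below is a minimal Bash sketch of a hypothetical `check_image_ref` helper, under the assumed policy that only digest-pinned references are accepted.

```shell
#!/usr/bin/env bash
# Hypothetical deploy-time guard. Assumed policy: only references pinned by
# digest ("name@sha256:<64 hex chars>") are accepted.
set -euo pipefail

check_image_ref() {
  local ref="$1"
  local digest_re='@sha256:[0-9a-f]{64}$'

  if [[ "$ref" =~ $digest_re ]]; then
    echo "ok: ${ref} is pinned by digest"
    return 0
  fi

  # Look only at the last path segment so a registry port ("host:5000/")
  # is not mistaken for a tag separator
  local last="${ref##*/}"
  if [[ "$last" == *:latest ]]; then
    echo "reject: ${ref} uses the mutable :latest tag" >&2
  elif [[ "$last" != *:* ]]; then
    echo "reject: ${ref} has no tag, so it defaults to :latest" >&2
  else
    echo "reject: ${ref} uses a tag; pin it by digest instead" >&2
  fi
  return 1
}
```

Running it over every image reference in a deployment manifest before applying it turns the anti-pattern into a failing check rather than a code-review comment.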

Best Practices for Image Tags

Image Tagging Strategy Flow
flowchart TD
    A[Git Commit] --> B[CI Pipeline]
    B --> C[Build Image]
    C --> D[Tag: git SHA]
    C --> E[Tag: SemVer]
    C --> F[Tag: branch-buildnum]
    D --> G[Push to Registry]
    E --> G
    F --> G
    G --> H{Environment}
    H --> I["Dev: branch-buildnum"]
    H --> J["Staging: git SHA"]
    H --> K["Production: SemVer"]

    style K fill:#BF092F,color:#fff
    style J fill:#3B9797,color:#fff
    style I fill:#16476A,color:#fff

Strategy | Example | Pros | Cons
--- | --- | --- | ---
Git SHA | myapp:a1b2c3d | Unique, traceable to exact commit | Not human-readable
SemVer | myapp:2.1.0 | Clear version communication | Requires manual bump
Build number | myapp:build-4521 | Auto-incrementing, unique | No semantic meaning
Combined | myapp:2.1.0-a1b2c3d | Best of both worlds | Longer tag names; SemVer tools read the -a1b2c3d suffix as a pre-release that sorts before 2.1.0

# Recommended: Tag with both SemVer AND git SHA
VERSION="2.1.0"
GIT_SHA=$(git rev-parse --short HEAD)
BUILD_DATE=$(date -u +"%Y-%m-%dT%H:%M:%SZ")

docker build \
    --label "org.opencontainers.image.version=${VERSION}" \
    --label "org.opencontainers.image.revision=${GIT_SHA}" \
    --label "org.opencontainers.image.created=${BUILD_DATE}" \
    -t "mycompany/api:${VERSION}" \
    -t "mycompany/api:${VERSION}-${GIT_SHA}" \
    -t "mycompany/api:${GIT_SHA}" \
    .

# All three tags point to the same image digest
# Use SemVer for human reference, SHA for exact traceability
echo "Tagged: ${VERSION}, ${VERSION}-${GIT_SHA}, ${GIT_SHA}"
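Because every tag above is derived from `VERSION`, a typo such as `2.1` propagates straight into the registry, where (under the immutability principle) it can't simply be overwritten. A small guard run before `docker build` can catch this. The sketch below is a hypothetical `validate_version` helper; it checks a deliberately simplified subset of SemVer 2.0.0, excluding `+build` metadata because `+` is not allowed in Docker tags.

```shell
#!/usr/bin/env bash
# Hypothetical pre-build guard for the VERSION used in the tags above.
# Accepts MAJOR.MINOR.PATCH plus an optional pre-release suffix (2.1.0-rc.1).
# Simplified SemVer 2.0.0 subset: no leading zeros, and no "+build" metadata,
# because "+" is not a valid character in Docker tags.
set -euo pipefail

validate_version() {
  local v="$1"
  local re='^(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)(-[0-9A-Za-z.-]+)?$'
  if [[ "$v" =~ $re ]]; then
    return 0
  fi
  echo "invalid version '${v}': expected MAJOR.MINOR.PATCH[-prerelease]" >&2
  return 1
}

VERSION="${VERSION:-2.1.0}"
validate_version "$VERSION"
echo "version ${VERSION} is valid"
```

Git tags such as `v2.1.0` need the leading `v` stripped first, for example `VERSION="${GITHUB_REF_NAME#v}"` in GitHub Actions.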

SBOM — Software Bill of Materials

A Software Bill of Materials (SBOM) is a complete, machine-readable inventory of all components in a software artifact — every library, every dependency, every version. Think of it as a "nutrition label" for software.
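To make "machine-readable inventory" concrete, here is roughly what a minimal CycloneDX SBOM looks like, written out with a heredoc in the same style as the other examples in this article. It lists a single component; the component name, version, and package URL (purl) are illustrative placeholders, not output from a real scan. SBOMs generated by tools such as Syft or Trivy, covered below, contain many more fields (hashes, licenses, dependency relationships).

```shell
#!/usr/bin/env bash
# Illustrative minimal CycloneDX 1.5 SBOM with one component.
# The component below is a placeholder, not output from a real scan.
set -euo pipefail

cat > sbom-minimal.json <<'EOF'
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.5",
  "version": 1,
  "metadata": {
    "component": { "type": "container", "name": "mycompany/api", "version": "2.1.0" }
  },
  "components": [
    {
      "type": "library",
      "name": "express",
      "version": "4.19.2",
      "purl": "pkg:npm/express@4.19.2",
      "licenses": [ { "license": { "id": "MIT" } } ]
    }
  ]
}
EOF

# Each component entry is one line item on the "nutrition label"
grep -c '"purl"' sbom-minimal.json
```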

Why SBOMs Are Becoming Mandatory

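US Executive Order 14028 (May 2021) directed federal agencies to set secure-development requirements for the software they buy, including providing purchasers with an SBOM for each product, and follow-on guidance lets agencies request SBOMs from vendors. The EU Cyber Resilience Act requires manufacturers of products with digital elements sold in the EU to draw up an SBOM as part of their technical documentation, with most obligations applying from December 2027. For many software vendors, SBOMs are becoming a compliance requirement rather than an optional best practice.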

SBOM Formats

Format | Organisation | Strengths | Use Case
--- | --- | --- | ---
SPDX | Linux Foundation | ISO/IEC 5962 standard, license focus | License compliance, legal
CycloneDX | OWASP | Security focus, VEX support | Vulnerability management

# Generate SBOM using Syft (by Anchore)
# Install Syft
curl -sSfL https://raw.githubusercontent.com/anchore/syft/main/install.sh | sh -s -- -b /usr/local/bin

# Generate SBOM for a Docker image (CycloneDX format)
syft ghcr.io/mycompany/api:2.1.0 -o cyclonedx-json > sbom.json

# Generate SBOM for a local directory (SPDX format)
syft dir:./my-project -o spdx-json > sbom-spdx.json

# Generate SBOM using Trivy (add --scanners vuln to embed vulnerability findings)
trivy image --format cyclonedx \
    --output sbom-trivy.json \
    ghcr.io/mycompany/api:2.1.0

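# Attach the SBOM to the image as a signed attestation
# (recent Cosign releases deprecate `cosign attach sbom` in favour of this)
cosign attest --type cyclonedx --predicate sbom.json \
    ghcr.io/mycompany/api:2.1.0

echo "SBOM generated and attested to image"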

Build Provenance & SLSA Framework

SLSA (Supply-chain Levels for Software Artifacts, pronounced "salsa") is a security framework that defines levels of supply chain integrity. Each level adds stronger guarantees about how an artifact was produced.

SLSA Levels — Progressive Supply Chain Security
flowchart TD
    A["SLSA Level 0<br/>No guarantees"] --> B["SLSA Level 1<br/>Build provenance exists"]
    B --> C["SLSA Level 2<br/>Hosted build service"]
    C --> D["SLSA Level 3<br/>Hardened build platform"]
    A --- E["Anyone could have built this"]
    B --- F["We know WHO built it and HOW"]
    C --- G["Build ran on a trusted platform"]
    D --- H["Tamper-proof, isolated builds"]
    style A fill:#666,color:#fff
    style B fill:#16476A,color:#fff
    style C fill:#3B9797,color:#fff
    style D fill:#BF092F,color:#fff

SLSA Level | Requirements | What It Proves
--- | --- | ---
Level 0 | None | Nothing (no provenance)
Level 1 | Provenance generated, any build system | Package has provenance showing how it was built
Level 2 | Hosted build, signed provenance | Provenance was generated and signed by a trusted build service
Level 3 | Hardened platform, isolated builds | Provenance is unforgeable and builds cannot influence each other, making tampering during the build much harder
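Because the levels are cumulative, a rough self-assessment (useful for Exercise 3 at the end of this article) comes down to finding the first requirement your pipeline misses. The sketch below encodes the table as a Bash function; `slsa_level` and its three yes/no inputs are hypothetical simplifications, and a real assessment should follow the full SLSA specification rather than three booleans.

```shell
#!/usr/bin/env bash
# Hypothetical, simplified SLSA build-level estimate based on the table above.
# Inputs are "yes"/"no":
#   $1 provenance - does the build produce provenance at all?
#   $2 hosted     - is it a hosted build service that signs the provenance?
#   $3 hardened   - is the platform hardened, with isolated builds?
set -euo pipefail

slsa_level() {
  local provenance="$1" hosted="$2" hardened="$3"
  local level=0
  # Levels are cumulative: stop at the first unmet requirement
  if [[ "$provenance" == yes ]]; then
    level=1
    if [[ "$hosted" == yes ]]; then
      level=2
      if [[ "$hardened" == yes ]]; then
        level=3
      fi
    fi
  fi
  echo "$level"
}

# Example: hosted CI with signed provenance, but shared, non-isolated runners
echo "Estimated SLSA level: $(slsa_level yes yes no)"
```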

Signing Artifacts with Sigstore/Cosign

# Cosign: Keyless signing for container images (uses Sigstore)
# Install cosign
go install github.com/sigstore/cosign/v2/cmd/cosign@latest

# Resolve the tag to a digest and sign the digest (a tag can be moved after signing)
IMAGE=ghcr.io/mycompany/api
DIGEST=$(crane digest "${IMAGE}:2.1.0")

# Sign the image (keyless: uses an OIDC identity from GitHub/Google)
cosign sign "${IMAGE}@${DIGEST}"
# Fulcio issues a short-lived certificate for that identity; the signature is
# stored in the registry next to the image and recorded in the Rekor transparency log

# Verify a signed image
cosign verify "${IMAGE}:2.1.0" \
    --certificate-identity="https://github.com/mycompany/api/.github/workflows/build.yml@refs/tags/v2.1.0" \
    --certificate-oidc-issuer="https://token.actions.githubusercontent.com"

# Attach provenance attestation (SLSA); provenance.json holds the provenance predicate
cosign attest --predicate provenance.json \
    --type slsaprovenance \
    "${IMAGE}@${DIGEST}"

echo "Image signed and attested with SLSA provenance"
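The `cosign attest` command above expects `provenance.json` to exist already; with `--type slsaprovenance`, Cosign treats the file as a SLSA v0.2 provenance predicate. A minimal predicate looks roughly like the following sketch. All values are hypothetical placeholders. In practice the file should be produced by the build platform itself: provenance written by hand, or by a step the developer controls, cannot satisfy Level 2, which requires the hosted build service to generate and sign it.

```shell
#!/usr/bin/env bash
# Illustrative SLSA v0.2 provenance predicate: the file passed to
# `cosign attest --type slsaprovenance --predicate provenance.json`.
# Every value below is a hypothetical placeholder.
set -euo pipefail

GIT_SHA="${GIT_SHA:-0123456789abcdef0123456789abcdef01234567}"
BUILD_STARTED="$(date -u +%Y-%m-%dT%H:%M:%SZ)"

cat > provenance.json <<EOF
{
  "builder": { "id": "https://example.com/hypothetical-builder@v1" },
  "buildType": "https://example.com/hypothetical-build-type@v1",
  "invocation": {
    "configSource": {
      "uri": "git+https://github.com/mycompany/api@refs/tags/v2.1.0",
      "digest": { "sha1": "${GIT_SHA}" },
      "entryPoint": ".github/workflows/build.yml"
    }
  },
  "metadata": {
    "buildStartedOn": "${BUILD_STARTED}",
    "completeness": { "parameters": false, "environment": false, "materials": false },
    "reproducible": false
  },
  "materials": [
    { "uri": "git+https://github.com/mycompany/api", "digest": { "sha1": "${GIT_SHA}" } }
  ]
}
EOF

echo "wrote provenance.json"
```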

# GitHub Actions: build, sign, and attest an image with an SBOM and SLSA provenance
name: Build and Attest
on:
  push:
    tags: ['v*']

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write
      id-token: write      # Required for keyless signing (OIDC token)
      attestations: write  # Required by actions/attest-build-provenance
    env:
      IMAGE: ghcr.io/${{ github.repository }}

    steps:
      - uses: actions/checkout@v4

      - name: Log in to GHCR
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}

      - name: Build and push image
        id: build
        uses: docker/build-push-action@v6
        with:
          context: .
          push: true
          tags: ${{ env.IMAGE }}:${{ github.ref_name }}

      - name: Generate SBOM
        uses: anchore/sbom-action@v0
        with:
          image: ${{ env.IMAGE }}@${{ steps.build.outputs.digest }}
          format: cyclonedx-json
          output-file: sbom.json

      - name: Install Cosign
        uses: sigstore/cosign-installer@v3

      - name: Sign image and attest SBOM (by digest)
        run: |
          cosign sign --yes "${IMAGE}@${{ steps.build.outputs.digest }}"
          cosign attest --yes --type cyclonedx --predicate sbom.json \
            "${IMAGE}@${{ steps.build.outputs.digest }}"

      - name: Generate and push SLSA build provenance
        uses: actions/attest-build-provenance@v1
        with:
          subject-name: ${{ env.IMAGE }}
          subject-digest: ${{ steps.build.outputs.digest }}
          push-to-registry: true

Supply Chain Attack SolarWinds, 2020

SolarWinds — Why Build Provenance Matters

In December 2020, it was revealed that Russian state-sponsored hackers had compromised the SolarWinds build system. They injected malicious code into the Orion software update, which was then signed with SolarWinds' legitimate certificate and distributed to up to 18,000 organisations, including US government agencies.

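What SLSA Level 3 controls could have mitigated:

  • Isolated, ephemeral builds make it far harder for malware on a build server to persist and tamper with builds (the SUNBURST backdoor was injected at build time by an implant known as SUNSPOT; it never appeared in source control)
  • Build provenance ties an artifact to the exact source revision and build steps, so independent verification, such as a reproducible rebuild, could reveal an artifact that doesn't match its declared source
  • Provenance generated and signed by a hardened build platform, rather than by steps an attacker on the build host could control, is much harder to forge
  • Verifiable provenance lets consumers validate the build chain before installing an update

This attack was a major motivation for SLSA (introduced by Google in 2021) and for the software supply chain requirements in Executive Order 14028.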


Artifact Promotion

Artifact promotion is the practice of moving a single immutable artifact through environments without rebuilding. The same Docker image that passed tests in staging is the same image that runs in production — byte for byte.

Artifact Promotion Pipeline
flowchart LR
    A[Build] --> B[Dev Registry]
    B --> C{Tests Pass?}
    C -->|Yes| D[Staging Registry]
    D --> E{QA Approval?}
    E -->|Yes| F[Production Registry]
    C -->|No| G[Rejected]
    E -->|No| G

    style A fill:#132440,color:#fff
    style B fill:#16476A,color:#fff
    style D fill:#3B9797,color:#fff
    style F fill:#BF092F,color:#fff
    style G fill:#666,color:#fff

# Artifact promotion: Copy image between registries (never rebuild!)
# Promote from dev to staging
crane copy \
    dev-registry.company.com/api:2.1.0-a1b2c3d \
    staging-registry.company.com/api:2.1.0-a1b2c3d

# Promote from staging to production (after QA approval)
crane copy \
    staging-registry.company.com/api:2.1.0-a1b2c3d \
    prod-registry.company.com/api:2.1.0

# Verify the digests match (proof of immutability)
crane digest dev-registry.company.com/api:2.1.0-a1b2c3d
crane digest prod-registry.company.com/api:2.1.0
# Both should output: sha256:abc123...

echo "Same artifact, promoted through environments without rebuild"

Key Insight: The critical difference between "promotion" and "rebuilding per environment" is this: if you rebuild, you might get a different artifact (non-deterministic builds, updated dependencies, different toolchain version). Promotion guarantees that what was tested is what gets deployed.
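That guarantee rests on comparing content digests, which is what the `crane digest` check above does against real registries. The same idea can be sketched locally in Bash, with directories standing in for the dev and prod registries: confirm the artifact is the one that was tested, copy it, recompute its SHA-256, and refuse to complete the promotion if anything changed. The `promote` and `digest_of` functions and the directory layout are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical local sketch of promotion-by-copy with digest verification.
# Directories stand in for registries; real pipelines would use crane or skopeo.
set -euo pipefail

digest_of() {
  echo "sha256:$(sha256sum "$1" | cut -d' ' -f1)"
}

# promote <source> <destination> [digest-that-passed-tests]
promote() {
  local src="$1" dst="$2" tested="${3:-}"
  local before after
  before="$(digest_of "$src")"

  # Refuse to promote anything other than the exact bytes that were tested
  if [[ -n "$tested" && "$before" != "$tested" ]]; then
    echo "promotion aborted: ${src} is ${before}, but tests ran on ${tested}" >&2
    return 1
  fi

  mkdir -p "$(dirname "$dst")"
  cp "$src" "$dst"

  # Copying must not change the content; verify rather than assume
  after="$(digest_of "$dst")"
  if [[ "$before" != "$after" ]]; then
    rm -f "$dst"
    echo "promotion aborted: digest changed during copy" >&2
    return 1
  fi
  echo "promoted ${src} -> ${dst} (${after})"
}
```

The optional third argument models a promotion gate: record the digest when tests pass, and a later rebuild (new bytes, new digest) can no longer be promoted under that approval.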

Best Practice Netflix Engineering

Netflix's Artifact Promotion Model

Netflix builds each service artifact once, tags it with a unique identifier, and promotes it through their pipeline: Build → Test → Canary → Regional → Global. The same AMI (Amazon Machine Image) that passes integration tests is the same one deployed to production regions worldwide.

Their "Spinnaker" deployment platform (now open source) manages promotion gates: automated test results, canary analysis scores, and manual approvals. An artifact can only advance if all gates pass — and it's the same binary at every stage.


Cleanup & Retention Policies

Container registries and artifact repositories accumulate storage rapidly. Without cleanup policies, costs grow unbounded and registries become difficult to navigate.

Retention Policy Guidelines

Artifact Category | Retention Period | Rationale
--- | --- | ---
Production releases | Indefinite (or 2+ years) | Rollback capability, audit compliance
Staging artifacts | 90 days | Debugging recent issues
Dev/feature branch builds | 14-30 days | Temporary development builds
Untagged images | 7 days | Superseded or intermediate images no longer referenced by a tag

# ECR lifecycle policy (AWS): automatically expire old images
# Note: rule 1 keeps only the 10 most recent v-prefixed release images. Drop it
# if production releases must be kept indefinitely, as the table above suggests.
cat > lifecycle-policy.json <<'EOF'
{
  "rules": [
    {
      "rulePriority": 1,
      "description": "Keep last 10 production images",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["v"],
        "countType": "imageCountMoreThan",
        "countNumber": 10
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 2,
      "description": "Delete untagged images after 7 days",
      "selection": {
        "tagStatus": "untagged",
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 7
      },
      "action": { "type": "expire" }
    },
    {
      "rulePriority": 3,
      "description": "Delete dev images after 30 days",
      "selection": {
        "tagStatus": "tagged",
        "tagPrefixList": ["dev-", "feature-"],
        "countType": "sinceImagePushed",
        "countUnit": "days",
        "countNumber": 30
      },
      "action": { "type": "expire" }
    }
  ]
}
EOF

# Apply the policy to a repository (the repository name "api" is an example)
aws ecr put-lifecycle-policy \
    --repository-name api \
    --lifecycle-policy-text file://lifecycle-policy.json

echo "ECR lifecycle policy configured"
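Registries without built-in lifecycle rules can apply the retention table above through a scheduled cleanup job. The decision logic might look like this Bash sketch. The tag conventions (`v` followed by a digit for releases, as in the ECR rule above, plus `staging-`, `dev-`, and `feature-`) and the `should_expire` helper are assumptions, and the actual deletion call (registry API or CLI) is deliberately left out.

```shell
#!/usr/bin/env bash
# Hypothetical retention check mirroring the retention table above.
# should_expire <tag, or "" if untagged> <age in days>; status 0 means "delete".
# Assumed tag conventions: v<digit>... for releases, staging-, dev-, feature-.
set -euo pipefail

should_expire() {
  local tag="$1" age_days="$2"
  local max_age

  case "$tag" in
    "")              max_age=7 ;;   # untagged: superseded or intermediate images
    v[0-9]*)         return 1 ;;    # production release: keep indefinitely
    staging-*)       max_age=90 ;;  # staging: debugging recent issues
    dev-*|feature-*) max_age=30 ;;  # dev/feature branch builds
    *)               return 1 ;;    # unknown convention: keep, to be safe
  esac

  (( age_days > max_age ))
}

# Example run over sample "tag age-in-days" pairs
while read -r tag age; do
  if should_expire "$tag" "$age"; then
    echo "expire ${tag} (${age}d old)"
  fi
done <<'EOF'
v2.1.0 400
staging-4521 120
dev-4600 10
feature-login 45
EOF
```

With the sample data it reports `staging-4521` and `feature-login` for deletion, while the 400-day-old `v2.1.0` release is kept. Unknown tag patterns are kept by default, so a naming mistake leads to extra storage rather than a deleted release.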

Exercises

Exercise 1 — Tagging Strategy Design: Your team deploys a microservice 5 times per day from a trunk-based workflow. Design an image tagging strategy that supports: (1) identifying exactly which commit is in production, (2) rolling back to any previous version, (3) human-readable release names for stakeholder communication. Document your tag format and explain your reasoning.
Exercise 2 — SBOM Generation: Pick any Docker image you use (e.g., node:20-alpine). Generate an SBOM using Syft or Trivy. How many packages does it contain? Are there any known vulnerabilities? Would you be comfortable deploying this image to a system processing financial data? Why or why not?
Exercise 3 — SLSA Level Assessment: Evaluate your current build pipeline against the SLSA framework. What level does it achieve today? What specific changes would you need to make to reach Level 2? Level 3? Estimate the effort for each level increase.
Exercise 4 — Retention Policy: Your organisation has accumulated 15 TB of container images over 3 years. Storage costs $0.10/GB/month ($1,500/month). Design a retention policy that: (1) keeps all production releases available for rollback, (2) allows debugging recent staging failures, (3) reduces storage by at least 60%. Calculate the expected cost savings.

Conclusion & Next Steps

Artifact management is the bridge between "code that builds" and "software that runs in production." The key principles are deceptively simple: build once, tag immutably, promote through environments, sign everything, and know exactly what's inside your artifacts (SBOMs). But implementing them properly requires discipline and tooling.

The supply chain security landscape is evolving rapidly. SLSA, Sigstore, and mandatory SBOMs are moving from "nice to have" to "required for business." Organisations that invest in build provenance now will have a significant compliance advantage as regulations tighten.

Next in the Series

In Part 14: Continuous Integration — Pipelines & Automation, we'll explore how to design CI pipelines that catch bugs in minutes — covering GitHub Actions, GitLab CI, Jenkins, pipeline-as-code, parallelisation, caching, and the practices that make CI fast and reliable.