The Modern AI Ecosystem
The AI landscape of the early 2010s was defined by task-specific models: a sentiment classifier for product reviews, a separate named entity recogniser for document parsing, a different computer vision model for face detection. Each required its own training data, its own architecture search, its own deployment pipeline. Organisations maintained dozens of narrowly scoped ML systems, each a bespoke engineering project. The economics and logistics of this approach constrained who could afford to do AI at all.
The 2020s have brought a fundamentally different paradigm: foundation models — large neural networks pre-trained on vast, broad datasets that can be adapted to many downstream tasks with comparatively little effort. A single model trained on internet-scale text can write code, summarise documents, extract structured data, answer questions, and translate between languages — tasks that would previously have required entirely separate systems. This consolidation is reshaping the economics of AI and compressing the time-to-deployment for new applications from months to days.
Foundation Models & LLMs
The term "foundation model" was coined by researchers at Stanford in 2021 to describe large models trained on broad data at scale, capable of being adapted (via fine-tuning or prompting) to a wide range of downstream tasks. The architectural backbone of virtually all modern foundation models is the transformer — introduced in the 2017 paper "Attention Is All You Need." Transformers replace the sequential processing of RNNs with a self-attention mechanism that allows every token in a sequence to directly attend to every other token, enabling massively parallel training and superior modelling of long-range dependencies.
The landscape of large language models (LLMs) is now dominated by a handful of families: OpenAI's GPT-4o (the model powering ChatGPT and Copilot), Anthropic's Claude (optimised for safety and long-context reasoning), Google's Gemini (natively multimodal, integrated across Google's product suite), and Meta's Llama (open-weights, enabling community fine-tuning and on-premise deployment). Each represents a different point on the capability-openness-cost tradeoff curve. Beyond text, multimodal foundation models like GPT-4V, Gemini Ultra, and DALL-E 3 handle images, audio, and video within the same architectural family.
Two primary adaptation strategies define how practitioners use foundation models. In-context learning (or prompting) requires no parameter updates — you simply provide examples or instructions in the model's input context and let the model generalise. This is fast and flexible but constrained by context length and sensitive to prompt phrasing. Fine-tuning updates the model's weights on a task-specific dataset, yielding more reliable specialised behaviour at the cost of compute and the risk of catastrophic forgetting. Parameter-efficient fine-tuning techniques like LoRA (Low-Rank Adaptation) have made targeted specialisation practical even without access to the full model's compute budget.
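The arithmetic behind LoRA is simple enough to sketch directly: freeze the pre-trained weight matrix and learn a low-rank update that is added to it. A toy illustration (shapes and rank chosen arbitrarily; the second factor starts at zero so the adapted model initially behaves exactly like the base model):
import torch

d, r = 512, 8                               # model width and LoRA rank (r much smaller than d)
W = torch.randn(d, d)                       # frozen pre-trained weight (requires_grad stays False)
A = torch.randn(d, r, requires_grad=True)   # trainable low-rank factor
B = torch.zeros(r, d, requires_grad=True)   # second factor, zero-initialised

x = torch.randn(1, d)
h = x @ (W + A @ B)                         # adapted forward pass; only A and B receive gradients
trainable = A.numel() + B.numel()           # 2 * d * r = 8,192 parameters
print(trainable / W.numel())                # ≈ 0.03: about 3% of the frozen matrix's parameters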
The following JSON shows a representative foundation model API request-response cycle — in this case a medical coding task using structured output prompting:
{
  "request": {
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "You are a medical coding assistant."},
      {"role": "user", "content": "Code the following diagnosis: Type 2 diabetes with peripheral neuropathy"}
    ],
    "temperature": 0.1,
    "max_tokens": 150
  },
  "response": {
    "choices": [{
      "message": {
        "role": "assistant",
        "content": "Primary: E11.40 (Type 2 diabetes mellitus with diabetic neuropathy, unspecified)\nSecondary: E11.65 (Type 2 diabetes mellitus with hyperglycemia)"
      },
      "finish_reason": "stop"
    }],
    "usage": {"prompt_tokens": 48, "completion_tokens": 42, "total_tokens": 90}
  }
}
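In application code, the same request is a single call with the official openai Python client (a minimal sketch; it assumes an OPENAI_API_KEY environment variable and the v1 SDK):
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a medical coding assistant."},
        {"role": "user", "content": "Code the following diagnosis: Type 2 diabetes with peripheral neuropathy"},
    ],
    temperature=0.1,
    max_tokens=150,
)
print(response.choices[0].message.content)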
Case Study
From Search to Generation: How Foundation Models Are Reshaping the API Economy
Through 2018–2021, companies building NLP-powered features would stitch together a portfolio of task-specific ML APIs: a sentiment analysis endpoint, a separate named entity recognition service, a language detection call, and a translation API — each from different providers, each requiring independent evaluation and integration work. Starting in 2022, teams began replacing this entire stack with a single foundation model API call. A customer service analytics platform that previously maintained integrations with five different ML vendors consolidated to a single GPT-4 API call with structured output prompting — reducing integration surface area by 80% and cutting per-request latency by eliminating serial API chaining. The trade-off is real: a single API dependency creates concentration risk and cost unpredictability. But the productivity gains in development and maintenance have been decisive enough that the architectural shift is now mainstream across mid-size and enterprise product teams.
The Modern AI Ecosystem — Key Libraries by Domain
The Python ecosystem has converged around a set of well-maintained libraries that cover every layer of the AI stack. The following snippet serves as a practical orientation for new practitioners — showing which library to reach for in each domain and demonstrating a zero-shot classification pipeline in just three lines:
# The modern AI ecosystem — key library imports by domain
# Foundation Models
from openai import OpenAI # GPT-4o via API
import anthropic # Claude 3.5 via API
from transformers import pipeline as hf_pipe # HuggingFace Hub (local models)
# Computer Vision
import torch
import torchvision.transforms as T
from ultralytics import YOLO # YOLOv8 object detection
# NLP & Embeddings
from sentence_transformers import SentenceTransformer
import spacy # Industrial NLP
# Classical ML
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
import xgboost as xgb # Structured/tabular data
# MLOps
import mlflow # Experiment tracking
from evidently import Report # Model monitoring
# Example: zero-shot classification in 3 lines
classifier = hf_pipe("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier("This movie was fantastic!", candidate_labels=["positive", "negative"])
print(result['labels'][0]) # → "positive"
The ML Stack in Production
Building a machine learning model is perhaps 20% of the work in getting AI into production. The remaining 80% is the infrastructure stack that makes models reliable, reproducible, and observable at scale. Data pipelines are the foundation: tools like Apache Airflow, Prefect, and dbt orchestrate the ingestion, transformation, validation, and versioning of training data. Poor data pipelines are responsible for more production failures than poor models.
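As a flavour of what lightweight orchestration code looks like, here is a minimal Prefect flow (a sketch: the task bodies are stubs, not a real ingestion job):
from prefect import flow, task

@task
def ingest() -> list[dict]:
    # Pull raw records from a source system (stubbed here)
    return [{"text": "great product", "label": 1}]

@task
def validate(records: list[dict]) -> list[dict]:
    # Drop rows that fail basic quality checks
    return [r for r in records if r.get("text")]

@flow
def training_data_pipeline():
    records = ingest()
    clean = validate(records)
    print(f"{len(clean)} validated records ready for training")

if __name__ == "__main__":
    training_data_pipeline()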
Feature stores (Feast, Tecton, Hopsworks) solve the training-serving skew problem: they ensure that the features a model sees during training are computed using the exact same logic as the features it receives at inference time — a source of subtle, hard-to-diagnose errors when managed informally. Training infrastructure spans cloud GPU clusters (AWS SageMaker, Google Vertex AI, Azure ML) and distributed training frameworks (Ray Train, DeepSpeed, PyTorch FSDP) for large models.
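Feature lookups at inference time are typically a single call against the store. A minimal sketch with Feast, assuming a repository whose feature definitions include a driver_hourly_stats view (as in Feast's own quickstart):
from feast import FeatureStore

store = FeatureStore(repo_path=".")  # points at a directory containing feature definitions
features = store.get_online_features(
    features=["driver_hourly_stats:conv_rate"],  # same definition used at training time
    entity_rows=[{"driver_id": 1001}],           # the entity to look up at inference time
).to_dict()
print(features)
Because training pipelines read from the same feature definitions, the lookup above cannot silently diverge from what the model saw during training, which is the skew problem the paragraph describes.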
Model registries (MLflow Model Registry, Weights & Biases, Hugging Face Hub) provide versioning, metadata tracking, and promotion workflows — ensuring that only validated models reach production. Serving infrastructure handles the online inference path: latency-optimised model servers (TorchServe, Triton Inference Server, vLLM for LLMs), A/B testing frameworks, and canary deployment tooling. Finally, monitoring closes the loop: data drift detection, prediction drift alerts, and business metric dashboards that can trigger retraining when model performance degrades in the wild. All of these layers are explored in depth in Part 20 (MLOps).
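As a taste of the registry workflow, a minimal MLflow sketch that logs and registers a model (it assumes a tracking backend that supports the model registry, such as a local sqlite-backed server):
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

with mlflow.start_run():
    mlflow.log_metric("train_accuracy", model.score(X, y))
    # Registration gates which versions can later be promoted towards production
    mlflow.sklearn.log_model(model, "model", registered_model_name="churn-classifier")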
Real-World Applications at a Glance
AI is not a single technology applied uniformly — it is a family of techniques, each suited to particular problem structures, deployed across every major industry in forms shaped by that industry's data realities and regulatory constraints. This section provides a practitioner's survey of where AI is genuinely embedded in production systems today. Each domain covered here will receive a dedicated deep-dive article later in the series, so the goal now is orientation rather than exhaustive treatment.
Industry Verticals
Healthcare & Life Sciences: AI is most visibly deployed in medical imaging — algorithms from companies like Aidoc, Viz.ai, and Google Health detect pathologies in CT scans, chest X-rays, and retinal images with radiologist-level accuracy, and in some narrow tasks (diabetic retinopathy screening, for example) they exceed average human performance. In drug discovery, Schrödinger and Insilico Medicine use generative models and reinforcement learning to navigate chemical space and propose novel candidate molecules, compressing timelines that once measured in years. Clinical NLP systems extract structured diagnoses and medication records from free-text physician notes, powering revenue cycle management and population health analytics. Part 14 covers this domain in full.
Finance & Fraud Detection: Every major bank and payments processor runs ensemble models and graph neural networks for real-time fraud detection — Stripe's radar system and Mastercard's Decision Intelligence platform process transactions in under 100 milliseconds, evaluating hundreds of behavioural features. Credit underwriting has shifted from scorecards to gradient-boosted trees and, increasingly, alternative-data models that incorporate rent payment history and cash flow patterns. Algorithmic trading firms use reinforcement learning for execution optimisation and short-term price prediction. Regulatory pressure around explainability (adverse action notices, model risk management guidelines) creates a unique constraint not present in consumer internet AI. Part 15 explores these applications in depth.
Retail & E-commerce: Recommendation systems are the most commercially impactful AI systems ever deployed — Amazon's item-to-item collaborative filtering, Netflix's personalisation engine, and Spotify's Discover Weekly are canonical examples. Demand forecasting at scale (Walmart, Amazon) uses hierarchical time-series models to optimise inventory across thousands of SKUs and locations simultaneously. Visual search (ASOS, Pinterest Lens) allows shoppers to find products by uploading a photo. Generative AI is now entering the retail stack for product description generation, image background synthesis, and virtual try-on.
Autonomous Vehicles & Transportation: Self-driving systems are among the most complex AI deployments in existence — fusing inputs from lidar, radar, cameras, and HD maps through perception stacks (3D object detection, semantic segmentation), prediction modules (trajectory forecasting), and planning components (motion planning, control). Waymo, Cruise, and Tesla have each taken architecturally distinct approaches. More practically deployed today are advanced driver assistance systems (ADAS): lane-keeping, adaptive cruise control, and automatic emergency braking, all of which are in production on tens of millions of vehicles. Part 16 covers autonomous systems in full.
Manufacturing & Industry: Computer vision for visual quality inspection — detecting surface defects, assembly errors, and dimensional deviations — has replaced manual inspection on high-speed production lines at companies like Foxconn, BMW, and TSMC. Predictive maintenance models (trained on sensor time-series from turbines, CNC machines, and compressors) detect anomalies weeks before mechanical failure, dramatically reducing unplanned downtime. Digital twin systems combine physics simulations with ML surrogates to optimise production parameters in real time.
Media, Entertainment & Content: Recommendation engines at YouTube, TikTok, and Spotify are among the most sophisticated ranking systems ever built, driving measurable engagement improvements through multi-objective optimisation that balances clicks, watch time, and content diversity. Generative AI has entered the content creation stack: Midjourney and DALL-E generate concept art; Suno and Udio compose music; Runway and Pika produce video clips. In the news industry, the Associated Press has used NLP to automatically generate earnings reports and sports summaries since 2014, with newer LLM-based systems expanding the automation surface.
AI Capability Map
Practitioners benefit from thinking about AI in terms of capabilities rather than algorithms — matching the structure of the business problem to the class of ML solution that addresses it. Classification assigns inputs to discrete categories: churn prediction, disease detection, content moderation. The output is a label (or probability distribution over labels), and success is measured by precision, recall, and ROC-AUC. Regression predicts continuous quantities: revenue forecasting, property valuation, estimated time of arrival. Success metrics are MAE, RMSE, and MAPE depending on the business sensitivity to outliers.
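Each of these metrics is a one-liner in scikit-learn. A toy illustration on hand-made predictions:
from sklearn.metrics import precision_score, recall_score, roc_auc_score, mean_absolute_error

# Classification: toy labels and predicted probabilities
y_true = [1, 0, 1, 1, 0]
y_prob = [0.9, 0.3, 0.6, 0.8, 0.4]
y_pred = [p >= 0.5 for p in y_prob]           # threshold probabilities into labels
print(precision_score(y_true, y_pred))        # 1.0: no false positives
print(recall_score(y_true, y_pred))           # 1.0: no false negatives
print(roc_auc_score(y_true, y_prob))          # 1.0: every positive ranked above every negative

# Regression: MAE on toy forecasts
print(mean_absolute_error([100, 200], [110, 190]))  # 10.0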
Generation produces novel content — text, images, audio, code — conditioned on inputs. This is the domain of LLMs, diffusion models, and variational autoencoders. Segmentation partitions inputs into meaningful regions — semantic segmentation in autonomous driving assigns a class to every pixel; customer segmentation partitions a user base into cohorts for targeted treatment. Retrieval finds the most relevant items from a large corpus given a query — the foundation of search engines, recommendation systems, and RAG pipelines. Ranking orders a set of candidates by predicted relevance or quality — news feeds, search result pages, and ad auction systems are all ranking problems. Finally, planning sequences of decisions to achieve a goal — the domain of reinforcement learning and combinatorial optimisation, applied in robotics, logistics routing, and game-playing agents.
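Retrieval in its simplest modern form (embed the query and the corpus, rank by cosine similarity) fits in a few lines with sentence-transformers; the model weights download on first run:
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
corpus = ["How do I reset my password?",
          "What is your refund policy?",
          "Where can I download my invoice?"]
query = "I forgot my login credentials"

corpus_emb = model.encode(corpus, convert_to_tensor=True)
query_emb = model.encode(query, convert_to_tensor=True)
scores = util.cos_sim(query_emb, corpus_emb)[0]  # cosine similarity to each document
best = scores.argmax().item()
print(corpus[best])  # → "How do I reset my password?"
This same embed-and-rank loop, pointed at a vector database instead of an in-memory list, is the retrieval half of a RAG pipeline.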
Important: AI capabilities do not map cleanly onto business problems by default — you must do that mapping explicitly. The most common and expensive failure mode is selecting a model architecture (or an entire platform) before the problem has been defined precisely enough to specify a measurable success criterion. Start with the decision you need to make, work backwards to the prediction you need, then choose your tools.
Careers in AI & Machine Learning: Roles, Skills & Pathways
The AI industry has grown from a handful of research labs employing a few hundred specialists into a global ecosystem with dozens of distinct roles spanning research, engineering, product, operations, and governance. Understanding these roles — their responsibilities, required skills, typical backgrounds, and how they interact — is essential whether you are planning a career in AI, hiring for an AI team, or simply trying to understand who does what in the organisations building and deploying AI systems.
The landscape of AI roles has evolved dramatically. In 2015, most companies had a single role for anyone working with data: "Data Scientist". By 2020, the field had fragmented into specialised roles reflecting the reality that building production AI systems requires fundamentally different skills at different stages of the pipeline. By 2026, the ecosystem has matured to include roles that didn't exist even five years ago — Prompt Engineers, AI Safety Researchers, Forward Deployed Engineers, and LLMOps Specialists.
The Evolution of AI Roles: A Brief History
The history of AI careers mirrors the broader arc of computing itself. In the 1950s–1970s, "AI researcher" was synonymous with academic — researchers like Alan Turing, John McCarthy, Marvin Minsky, and Herbert Simon worked in universities, publishing papers on symbolic reasoning, search algorithms, and early neural networks. There were no commercial AI jobs because there were no commercial AI products.
The expert systems era (1980s) created the first commercial AI role: the Knowledge Engineer, who interviewed domain experts, extracted their decision rules, and encoded them into rule-based systems. Companies like Digital Equipment Corporation deployed XCON, a system that configured VAX computer orders, reportedly saving $40 million annually. Knowledge Engineers were the bridge between domain expertise and computation — a role that foreshadows today's "Forward Deployed Engineer".
The Machine Learning renaissance (2000s–2010s) shifted the paradigm from rules to data. The term "Data Scientist" was coined by DJ Patil and Jeff Hammerbacher at LinkedIn and Facebook around 2008, and Harvard Business Review famously called it "the sexiest job of the 21st century" in 2012. Early Data Scientists were generalists: they cleaned data, built models, created visualisations, and sometimes deployed the results — because teams were small and tooling was immature.
The deep learning explosion (2012–2020) introduced scale that shattered the generalist model. Training a model on ImageNet required GPU infrastructure knowledge. Deploying a model in production required software engineering discipline. Monitoring a model required operations expertise. The generalist Data Scientist role fractured into specialised roles: ML Engineer, Data Engineer, Research Scientist, Applied Scientist, and MLOps Engineer. Each role addressed a specific gap in the pipeline from research to production.
The LLM era (2022–present) has created yet another wave of specialisation. Prompt Engineering emerged as a distinct skill set. AI Safety became a dedicated field. Forward Deployed Engineers — a role pioneered by Palantir — became widespread as companies realised that deploying AI in enterprise environments requires on-site engineering at the customer's premises. The current landscape reflects the maturation of AI from a research curiosity into a critical business infrastructure.
Research & Science Roles
Research roles create the foundational advances that the rest of the ecosystem builds upon. These roles require the deepest theoretical knowledge and are most closely connected to academic publication.
AI Research Scientist
AI Research Scientists work at the frontier of what is possible. They design new architectures, prove theoretical results, publish at top venues (NeurIPS, ICML, ICLR, ACL, CVPR), and push the state of the art on fundamental problems. At organisations like Google DeepMind, OpenAI, Meta FAIR, and Anthropic, Research Scientists often hold PhDs in machine learning, statistics, neuroscience, or mathematics.
Day-to-day: Reading 5–10 papers per week, designing and running experiments (often requiring weeks of GPU time), mathematical derivation of novel loss functions or training procedures, writing papers, presenting at internal research reviews, mentoring junior researchers.
Key skills: Deep mathematical foundations (linear algebra, probability theory, optimisation, information theory), strong programming in Python/PyTorch/JAX, experimental rigour, scientific writing, ability to identify tractable research questions with high potential impact.
Compensation (2026): $200K–$500K at top labs (total compensation including equity). Senior Research Scientists at frontier labs can exceed $700K.
Example contribution: Ashish Vaswani, Noam Shazeer, et al. at Google Brain authored "Attention Is All You Need" (2017) — the transformer paper that launched the LLM era. This single research contribution underpins essentially every modern LLM, and by extension much of today's AI economy.
Research Engineer
Research Engineers are the builders in research organisations. While Research Scientists focus on "what should we try?", Research Engineers focus on "how do we implement and scale this experiment efficiently?" They write optimised training loops, manage distributed training across hundreds or thousands of GPUs, build evaluation infrastructure, and create the tooling that accelerates research velocity.
Day-to-day: Implementing paper-described architectures from scratch, optimising CUDA kernels for novel attention mechanisms, building and maintaining experiment tracking infrastructure, debugging distributed training failures at 3am when a 1,000-GPU training run crashes after 4 days.
Key skills: Systems programming (C++, CUDA), distributed systems (NCCL, MPI, PyTorch Distributed), GPU architecture knowledge, strong ML fundamentals (enough to understand and implement papers), infrastructure-as-code (Docker, Kubernetes, Slurm).
Path: Many Research Engineers have strong software engineering backgrounds and developed ML expertise on the job. It is one of the most effective paths into AI research for people without PhDs — demonstrated ability to implement and scale cutting-edge research is highly valued.
Applied Scientist
Applied Scientists bridge the gap between research and product. They take frontier research techniques and adapt them to solve specific business problems — often publishing their adaptation work as well. At Amazon, Applied Scientists are embedded in product teams (Alexa, Search, Advertising) and are expected to both publish and ship production systems. At Microsoft, they sit within product groups and work alongside engineers to integrate ML capabilities into products like Office, Azure, and LinkedIn.
How it differs from Research Scientist: Research Scientists optimise for scientific novelty and publication; Applied Scientists optimise for product impact while contributing to scientific knowledge. An Applied Scientist might develop a novel attention mechanism specifically for real-time ad ranking — publishable, but motivated by product need.
Key skills: ML fundamentals + software engineering + domain expertise. Applied Scientists need to understand both the theoretical landscape (to identify which techniques might apply) and the engineering constraints (to build systems that work in production).
Career Insight: The distinction between "Research Scientist" and "Applied Scientist" varies enormously by company. At Google DeepMind, these are distinct tracks with different expectations. At most startups, the distinction doesn't exist — researchers ship code and engineers read papers. Focus on the actual job description rather than the title when evaluating roles.
Engineering & Production Roles
Engineering roles are responsible for building, deploying, and maintaining the systems that deliver AI capabilities to end users. The boundary between "ML Engineer" and "Software Engineer" is blurring as AI becomes embedded in every software product.
Machine Learning Engineer (MLE)
The Machine Learning Engineer is the most common AI engineering role. MLEs take models from prototype to production: they build training pipelines, implement feature engineering, design model serving infrastructure, create monitoring and alerting systems for model performance, and manage the entire ML lifecycle. If a Data Scientist proves that a model works in a notebook, the MLE makes it work at 10,000 requests per second with 99.9% uptime.
Day-to-day: Writing production training pipelines (Airflow, Kubeflow, Metaflow), building Feature Stores (Feast, Tecton), deploying models behind APIs (FastAPI, TensorFlow Serving, Triton), setting up monitoring dashboards (Grafana, Evidently AI), debugging production model degradation, running A/B tests comparing model versions.
Key skills: Python, SQL, Docker, Kubernetes, cloud platforms (AWS SageMaker, GCP Vertex AI, Azure ML), ML frameworks (PyTorch, scikit-learn, XGBoost), CI/CD for ML, experiment tracking (MLflow, Weights & Biases), strong software engineering practices (testing, code review, version control).
Compensation (2026): $150K–$350K depending on seniority and company. FAANG-level senior MLEs can exceed $400K total compensation.
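A toy version of the serving pattern described above: a scikit-learn model behind a FastAPI endpoint (a sketch that omits authentication, batching, and monitoring):
from fastapi import FastAPI
from pydantic import BaseModel
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

# In production the model would be loaded from a registry, not trained at startup
X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

app = FastAPI()

class Features(BaseModel):
    measurements: list[float]  # the four iris measurements

@app.post("/predict")
def predict(features: Features) -> dict:
    pred = model.predict([features.measurements])[0]
    return {"class": int(pred)}

# Run with: uvicorn serve:app --reload   (assuming this file is saved as serve.py)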
Forward Deployed Engineer (FDE)
The Forward Deployed Engineer is one of the most distinctive roles in AI — pioneered by Palantir Technologies and now adopted by companies including Databricks, Scale AI, Anthropic, Weights & Biases, and numerous AI startups. FDEs are embedded at customer sites, working directly with clients to deploy, customise, and operationalise AI solutions in the customer's environment.
Why this role exists: Enterprise AI deployment is fundamentally different from consumer AI deployment. A hospital deploying a diagnostic AI needs the system integrated with their Epic EHR, trained on their patient demographics, validated by their clinical staff, and compliant with their specific regulatory requirements. A generic API won't work — someone needs to be on-site, understanding the customer's data, workflows, and constraints, and building custom integrations. That person is the FDE.
Day-to-day: On-site at a client (Fortune 500 company, government agency, or hospital system) for weeks or months. Understanding the client's data infrastructure, building custom data pipelines, configuring and fine-tuning AI models for client-specific use cases, training client teams on AI tools, acting as the technical bridge between the client's domain experts and the AI company's engineers. FDEs often work in ambiguous environments with messy data and unclear requirements — adaptability is the core skill.
Key skills: Full-stack engineering (frontend + backend + data), strong communication and client-facing skills, ability to learn new domains quickly (one quarter you might deploy AI in oil & gas, the next in defence), SQL and data wrangling, cloud infrastructure, the maturity to work autonomously without daily oversight from engineering management.
Compensation (2026): $140K–$250K base, typically with significant bonuses tied to client outcomes. Travel: 40–80% depending on company and client portfolio.
Case Study
Palantir's Forward Deployed Engineers in Ukraine
In 2022–2023, Palantir deployed Forward Deployed Engineers to support Ukraine's defence operations. FDEs worked alongside Ukrainian military analysts to integrate satellite imagery, open-source intelligence, and battlefield data into Palantir's Gotham platform. The FDEs didn't just install software — they built custom data integrations for Ukrainian data sources, trained analysts on the platform, and iterated on the interface based on real-time operational feedback. This case illustrates the FDE role at its most demanding: high-stakes, ambiguous, resource-constrained environments where technical skill and adaptability are equally critical. The experience also shaped Palantir's AIP (AI Platform) product, which now uses LLMs to make the same analytical capabilities accessible through natural language queries.
Data Engineer
Data Engineers build and maintain the infrastructure that makes ML possible. They design data pipelines, build data warehouses and lakes, ensure data quality, and create the systems that deliver clean, labelled, feature-rich data to ML models. The often-cited statistic that "80% of ML work is data preparation" is really saying "80% of ML work is Data Engineering."
Key skills: SQL (expert-level), Python, Spark/Databricks, Airflow, dbt, cloud data services (Redshift, BigQuery, Snowflake), streaming systems (Kafka, Kinesis), data modelling, ETL/ELT design.
Why it matters for AI: The fastest way to improve a model's performance is almost always to improve its data, not its architecture. Data Engineers who understand ML requirements — feature freshness, label quality, training-serving skew — are extraordinarily valuable.
MLOps / LLMOps Engineer
MLOps Engineers specialise in the operational infrastructure for ML systems: CI/CD for models, automated retraining, model monitoring and drift detection, A/B testing infrastructure, and model versioning. As LLMs became dominant, the specialisation of LLMOps Engineer emerged — focused on prompt management, LLM serving infrastructure (vLLM, TGI), context window management, embedding pipeline operations, and RAG infrastructure.
Key skills: Kubernetes, Docker, Terraform, MLflow/Weights & Biases, Evidently AI/Arize, GitHub Actions, monitoring (Prometheus, Grafana), Python, cloud ML platforms. LLMOps additionally requires: vector databases (Pinecone, Weaviate, Qdrant), LLM serving frameworks (vLLM, TensorRT-LLM), and prompt versioning systems.
Compensation (2026): $140K–$280K. LLMOps specialists at AI-native companies command a premium due to scarcity.
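For a flavour of the serving side, offline batched inference with vLLM takes a few lines (a sketch: it requires a GPU, and the model identifier is just an example of a Hugging Face-hosted model):
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct")  # any HF-hosted model id
params = SamplingParams(temperature=0.2, max_tokens=64)
outputs = llm.generate(["Summarise: LLMOps covers serving, prompts, and RAG."], params)
print(outputs[0].outputs[0].text)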
Prompt Engineer
Prompt Engineering emerged as a distinct role in 2023 as organisations realised that the way you communicate with an LLM dramatically affects the quality of its output. Prompt Engineers design, test, optimise, and version system prompts and prompt templates for LLM-powered applications. They work at the intersection of linguistics, psychology, and software engineering.
Day-to-day: Writing and iterating on system prompts, building evaluation datasets, running systematic A/B tests of prompt variants, documenting prompt libraries, working with product teams to translate user requirements into effective prompt specifications, building few-shot example banks.
Key skills: Excellent writing and analytical reasoning, understanding of LLM behaviour and limitations, structured thinking about evaluation metrics, basic Python (for building evaluation harnesses), knowledge of multiple LLM providers' APIs and capabilities.
Future trajectory: Many industry observers expect "Prompt Engineer" to be absorbed into MLE and Product roles as prompt optimisation becomes automated (DSPy, automated prompt tuning). However, the underlying skills — clear specification of AI system behaviour, evaluation design, and human-AI interaction design — will remain valuable under different titles.
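A skeletal version of the evaluation harness such a role builds: comparing two prompt templates against a small labelled set. Here call_llm is a hypothetical stub standing in for any provider's API:
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in: replace with a real provider call (OpenAI, Anthropic, ...)
    return "positive" if "Loved" in prompt else "negative"

PROMPT_A = "Classify the sentiment as positive or negative: {text}"
PROMPT_B = ("You are a sentiment analyst. Reply with exactly one word, "
            "positive or negative.\nText: {text}")

eval_set = [("Loved it!", "positive"), ("Terrible service.", "negative")]

def accuracy(template: str) -> float:
    # Fraction of labelled examples the prompt classifies correctly
    hits = sum(
        call_llm(template.format(text=text)).strip().lower() == label
        for text, label in eval_set
    )
    return hits / len(eval_set)

for name, template in [("A", PROMPT_A), ("B", PROMPT_B)]:
    print(f"Prompt {name}: accuracy {accuracy(template):.0%}")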
Data Science, Product & Strategy Roles
These roles connect AI capabilities to business outcomes. They require a blend of technical understanding and business acumen.
Data Scientist
The original AI role in industry. Data Scientists extract insights from data to inform business decisions. While the role has narrowed from its early "unicorn" definition, it remains central to analytics-driven organisations. Modern Data Scientists typically focus on statistical analysis, experimentation (A/B testing, causal inference), and building interpretable models for business stakeholders.
How it differs from MLE: Data Scientists optimise for insight and decision support; MLEs optimise for system reliability and scale. A Data Scientist might build a churn prediction model in a Jupyter notebook and present the findings to the VP of Customer Success. An MLE would deploy that model as a real-time scoring API that triggers automated retention campaigns.
Key skills: Statistics (hypothesis testing, regression, Bayesian methods), Python/R, SQL, data visualisation (matplotlib, seaborn, Tableau), communication and storytelling, domain expertise.
AI Product Manager
AI Product Managers define what AI-powered products should do and why. They bridge the gap between technical AI capabilities and user needs. Unlike traditional product managers, AI PMs must understand the probabilistic nature of ML systems — that models have accuracy rates, not certainties — and communicate this to stakeholders and users.
Key challenges: Setting realistic expectations about model performance, designing user experiences that handle AI errors gracefully, defining evaluation metrics that align with business KPIs, managing the "AI hype cycle" within the organisation, prioritising data collection and labelling efforts.
Key skills: Traditional PM skills (user research, roadmapping, stakeholder management) + ML literacy (understanding model capabilities, evaluation metrics, data requirements, and deployment constraints). Many AI PMs are former engineers or data scientists who transitioned into product.
AI Safety Researcher / AI Ethics Specialist
AI Safety Researchers work on ensuring AI systems behave as intended and do not cause harm. This field has grown from a niche academic concern to a well-funded industry priority, driven by the capabilities and risks of frontier LLMs. Anthropic, OpenAI, Google DeepMind, and the UK AI Safety Institute all have dedicated safety teams.
Focus areas: Alignment (ensuring AI systems pursue intended goals), robustness (ensuring systems behave predictably under adversarial or out-of-distribution inputs), interpretability (understanding why models make specific decisions), red-teaming (systematically finding failure modes), and governance (developing policies and standards for responsible AI development).
Key skills: Strong ML foundations, philosophy/ethics background (many safety researchers come from philosophy, cognitive science, or related fields), technical writing, adversarial thinking (the ability to imagine and test failure modes), policy literacy.
Compensation (2026): $150K–$400K. The field is talent-constrained — experienced AI Safety Researchers are among the most in-demand specialists in the industry.
AI/ML Roles Comparison
| Role | Primary Focus | Core Skills | Typical Background | Comp. Range (2026) |
| --- | --- | --- | --- | --- |
| AI Research Scientist | Novel algorithms & publications | Math, PyTorch/JAX, scientific writing | PhD in ML/Stats/Math | $200K–$500K+ |
| Research Engineer | Implement & scale research | C++/CUDA, distributed systems, ML | CS degree + ML experience | $180K–$400K |
| Applied Scientist | Research-to-product translation | ML + SWE + domain expertise | PhD or MS + industry | $180K–$450K |
| ML Engineer | Production ML systems | Python, Docker, K8s, MLflow, cloud | CS/SWE + ML skills | $150K–$350K |
| Forward Deployed Engineer | On-site customer AI deployment | Full-stack, communication, adaptability | SWE + client skills | $140K–$250K |
| Data Engineer | Data pipelines & infrastructure | SQL, Spark, Airflow, cloud data | CS/Data Engineering | $130K–$280K |
| MLOps / LLMOps Engineer | ML system operations | K8s, CI/CD, monitoring, vLLM | DevOps/SRE + ML | $140K–$280K |
| Data Scientist | Analysis & insight generation | Statistics, Python/R, visualisation | Stats/Math/Sciences | $120K–$250K |
| Prompt Engineer | LLM prompt design & optimisation | Writing, LLM fluency, evaluation | Varied (writing, SWE, linguistics) | $100K–$200K |
| AI Product Manager | AI product strategy & roadmap | PM skills + ML literacy | PM + technical background | $150K–$300K |
| AI Safety Researcher | Alignment, robustness, red-teaming | ML + philosophy + adversarial thinking | ML/Philosophy PhD or MS | $150K–$400K |
How AI Teams Are Structured
Understanding team structures helps you see how these roles work together in practice. Three dominant models have emerged:
Model 1 — Centralised AI Team: A single AI/ML team serves the entire organisation. Data Scientists, MLEs, and Data Engineers sit together and take on projects from different business units. This model works well for companies with fewer than 10 ML practitioners — it avoids duplication and builds deep ML expertise. The risk is that the centralised team becomes a bottleneck, with business units queuing for months to get model development resources.
Model 2 — Embedded Model: ML practitioners are embedded directly in product or business unit teams. Each product team has its own Data Scientist and MLE. This model ensures tight alignment between ML work and business objectives but can lead to fragmented infrastructure, duplicated effort, and inconsistent ML practices across teams.
Model 3 — Hub-and-Spoke (ML Platform + Embedded): A central ML Platform team builds shared infrastructure (training pipelines, feature stores, model serving, monitoring), while Applied Scientists and MLEs are embedded in product teams. This is the dominant model at scale — companies like Google, Meta, Spotify, and Uber all use variants. The platform team accelerates embedded teams by providing reusable tools, and the embedded teams ensure ML efforts are tightly coupled to product goals.
Case Study
Spotify's ML Team Structure
Spotify employs hundreds of ML practitioners organised in the hub-and-spoke model. The central ML Platform team maintains "ML Ops" infrastructure including Hendrix (feature store), Luigi (workflow orchestration), and standardised model serving. Embedded ML teams sit within product squads — Discover Weekly, Search, Podcast Recommendations, Ad Targeting — each with dedicated Data Scientists and MLEs who understand the specific product domain deeply. An "ML Guild" (cross-cutting community of practice) ensures knowledge sharing across embedded teams, runs internal ML conferences, and maintains coding standards. This structure enables rapid experimentation (embedded teams can iterate on models independently) while ensuring infrastructure consistency (all teams deploy via the same platform). The result: Spotify runs over 5,000 ML models in production, with embedded teams shipping model updates weekly while the platform team ensures reliability.
Case Study
Scale AI: The Data Annotation Company That Became an AI Platform
Scale AI, founded in 2016 by Alexandr Wang (then 19 years old), illustrates how an entire company can be built around a single node in the AI value chain: data labelling. Scale started by providing labelled training data for autonomous vehicle companies (Waymo, Cruise, Lyft) — human annotators drawing bounding boxes around pedestrians, vehicles, and lane markings in millions of driving images. The company's insight was that high-quality labelling at scale was the bottleneck for supervised learning. By 2024, Scale had expanded into LLM evaluation (RLHF data for fine-tuning), government AI contracts (becoming the largest AI vendor to the US Department of Defense), and its own "AI for enterprises" platform. Scale's team structure reflects its evolution: annotation operations (thousands of contract workers globally), a core ML team building automated labelling tools, a government-focused division with cleared engineers, and an enterprise sales team deploying AI solutions. The company's Forward Deployed Engineers work directly with defence and enterprise clients to integrate Scale's labelling and evaluation services into client ML pipelines.
Career Pathways & How to Break In
There is no single path into AI — practitioners come from computer science, statistics, physics, neuroscience, linguistics, philosophy, and even music theory. However, certain pathways are more common and more efficient depending on your starting point.
From Software Engineering → ML Engineer: This is the most common transition. SWEs already have the engineering fundamentals (version control, testing, deployment, code quality). Adding ML knowledge through courses (fast.ai, Stanford CS229/CS231n), personal projects (Kaggle competitions, open-source contributions), and ML-focused work within your current role (volunteering for ML-adjacent projects) builds the bridge. Many companies have internal transfer programmes that move SWEs into ML roles with mentorship.
From Academia → Research Scientist: A PhD in ML, statistics, or a related field is still the standard path to Research Scientist roles at frontier labs. During your PhD, focus on: (a) publishing at top venues, (b) building a strong open-source portfolio, (c) doing at least one industry research internship (every major lab offers these). The academic-to-industry transition is well-trodden — many Research Scientists maintain academic collaborations and continue publishing after joining industry.
From Analytics/Business → Data Scientist: Business analysts with SQL and Excel skills can transition to Data Science by learning Python (pandas, scikit-learn), statistics (hypothesis testing, regression, Bayesian methods), and data visualisation. The advantage of this path is domain expertise — a marketing analyst who learns ML brings deep understanding of customer behaviour that a CS graduate lacks.
From Any Background → Prompt Engineer / AI Product Manager: These roles are the most accessible entry points for career changers. Strong writing, analytical reasoning, and systematic thinking are the core skills. Building a portfolio of well-documented prompt engineering projects or AI product case studies demonstrates capability. Many prompt engineers come from technical writing, UX research, content strategy, or teaching backgrounds.
"""
AI/ML Career Role Recommender — A simple decision-tree-style tool
that suggests suitable AI roles based on your background and interests.
This self-contained script helps beginners understand which AI roles
align with their existing skills and career aspirations.
"""
def recommend_roles(background: str, interests: list, experience_years: int) -> list:
    """Suggest AI/ML roles based on background, interests, and experience.

    experience_years nudges seniority-heavy roles up the ranking.
    """
    # Core role matching based on background
    background_map = {
        "software_engineering": [
            ("ML Engineer", "Your SWE skills transfer directly — learn ML fundamentals"),
            ("MLOps Engineer", "Combine your DevOps/infra skills with ML operations"),
            ("Research Engineer", "If you enjoy low-level systems + ML implementation"),
        ],
        "data_analytics": [
            ("Data Scientist", "Natural progression — add Python + statistics depth"),
            ("AI Product Manager", "Your domain + analytics skill is the PM foundation"),
            ("Data Engineer", "Focus on the infrastructure side of the data stack"),
        ],
        "academia_research": [
            ("AI Research Scientist", "Direct path — publish, intern, then join a lab"),
            ("Applied Scientist", "Bridge research to product impact"),
            ("AI Safety Researcher", "If alignment and robustness interest you"),
        ],
        "non_technical": [
            ("Prompt Engineer", "Strong writing + systematic thinking = ideal fit"),
            ("AI Product Manager", "Domain expertise is your competitive advantage"),
            ("AI Ethics Specialist", "Policy, philosophy, or law backgrounds are assets"),
        ],
    }
    base_roles = background_map.get(background, background_map["non_technical"])
    # Adjust based on interests
    interest_bonus = {
        "research": ["AI Research Scientist", "Applied Scientist"],
        "deployment": ["Forward Deployed Engineer", "ML Engineer", "MLOps Engineer"],
        "data": ["Data Engineer", "Data Scientist"],
        "product": ["AI Product Manager", "Prompt Engineer"],
        "safety": ["AI Safety Researcher", "AI Ethics Specialist"],
        "client_facing": ["Forward Deployed Engineer", "AI Product Manager"],
    }
    # Score each role
    role_scores = {}
    for role, reason in base_roles:
        role_scores[role] = {"score": 3, "reasons": [reason]}
    for interest in interests:
        for role in interest_bonus.get(interest, []):
            if role not in role_scores:
                role_scores[role] = {"score": 0, "reasons": []}
            role_scores[role]["score"] += 2
            role_scores[role]["reasons"].append(f"Matches your interest in {interest}")
    # Experienced practitioners get a small boost toward roles that lean on seniority
    if experience_years >= 5:
        for role in {"AI Product Manager", "Forward Deployed Engineer", "Applied Scientist"} & role_scores.keys():
            role_scores[role]["score"] += 1
    # Sort by score and return the top 3 with the strongest reason for each
    sorted_roles = sorted(role_scores.items(), key=lambda x: x[1]["score"], reverse=True)
    return [(role, info["reasons"][0]) for role, info in sorted_roles[:3]]

# Example 1: Software engineer interested in deployment
print("=== SWE interested in deployment & client work ===")
recs = recommend_roles("software_engineering", ["deployment", "client_facing"], 5)
for role, reason in recs:
    print(f"  → {role}: {reason}")

# Example 2: PhD researcher interested in safety
print("\n=== Academic researcher interested in safety ===")
recs = recommend_roles("academia_research", ["research", "safety"], 3)
for role, reason in recs:
    print(f"  → {role}: {reason}")

# Example 3: Marketing analyst wanting to enter AI
print("\n=== Marketing analyst entering AI ===")
recs = recommend_roles("data_analytics", ["product", "data"], 7)
for role, reason in recs:
    print(f"  → {role}: {reason}")

# Example 4: Career changer from non-technical background
print("\n=== Non-technical career changer ===")
recs = recommend_roles("non_technical", ["product", "safety"], 10)
for role, reason in recs:
    print(f"  → {role}: {reason}")
AI Roles Exercises
Beginner
Exercise: Map Your Skills to AI Roles
List your top 10 professional skills. For each skill, identify which AI roles (from the comparison table above) value that skill most. Create a 10×11 matrix scoring each skill-role combination from 0 (irrelevant) to 3 (essential). Sum the columns — the roles with the highest totals represent your strongest natural fits. Which roles surprised you? What are the top 2–3 skills you would need to develop to qualify for your preferred role?
Intermediate
Exercise: Interview an AI Team
Identify a company you admire that uses AI (e.g., Spotify, Stripe, Airbnb, or a local company). Research their AI team structure by examining: (a) their engineering blog posts about ML infrastructure, (b) their job listings for AI roles (what skills do they require, how do they describe the team?), (c) LinkedIn profiles of their ML team members (what are their backgrounds?). Write a 1-page summary of: How many distinct AI roles do they have? Which team structure model (centralised, embedded, or hub-and-spoke) do they use? What does this tell you about their AI maturity?
Advanced
Exercise: Design an AI Org for a Startup
You are the CTO of a Series B startup ($30M raised) building an AI-powered legal document review product. You have budget for 15 AI/ML hires. Design the team: (a) What roles would you hire, in what order? (b) What team structure would you use? (c) What would each person's first 90-day deliverable be? (d) How would the team structure change when you grow to 50 AI hires? Justify your choices with reference to the roles, team models, and case studies discussed in this section.