A deliberate 12-month learning plan for software engineers looking to expand into AI/ML engineering — building the depth needed to design, build, and operate reliable, scalable AI systems in production.
Disclaimer: This roadmap is provided for educational purposes only. Course availability, pricing, and platform features may change over time. Any cloud computing, software, or hardware costs incurred while following this curriculum are solely the responsibility of the individual. The author makes no guarantees about outcomes, job placement, or the accuracy of third-party pricing or availability information referenced herein.
In a perfect world, everything in this roadmap would be free. Much of it actually is — the Anthropic Academy courses, Google Colab, Kaggle, Ollama, and most of the software tools cost nothing. But some of it — the course subscriptions, an AI coding assistant subscription, cloud compute, and eventually an API key or two — will cost money.
The right way to think about this is as an investment in your career, not an expense. A few hundred dollars in subscriptions, spread over a year, is a rounding error compared to the salary trajectory it can unlock. Engineers with real, demonstrable AI/ML skills are among the highest-compensated in the industry right now. The ROI on this roadmap, if you follow through, is exceptional.
That said, there's no reason to spend more than you have to:
Check your employer's learning budget first. Many companies offer annual learning and development stipends — often $1,000–$5,000 per year — specifically for courses, certifications, and conferences. This money frequently goes unused. Before spending a dollar of your own, ask your manager or HR team whether you have a learning budget and whether it covers online course subscriptions and professional certifications. In many cases this roadmap can be funded entirely by your employer.
Time your purchases. Coursera Plus and deeplearning.ai Pro both run significant promotional discounts at predictable times of year — Black Friday, New Year, and back-to-school. If you're not in a rush to start, waiting for a sale can save $100–150 on subscriptions alone.
Use free tiers first. Several courses on this roadmap are available for free audit (without certificates). Start there to validate the content is right for you before paying for access.
Get an AI assistant — your personal tutor and coding partner. This is as close to a requirement as anything on this list. As a coding partner, it accelerates course exercises, portfolio project work, and debugging. As a personal tutor, it's an always-available resource for deepening your understanding — confused by a concept in a paper, want to talk through an architecture decision, or need something explained three different ways until it clicks? A conversational AI meets you exactly where you are, with no office hours and no judgment. Pick one (or more!) and use it (or them!) throughout the entire program.
| Tool | Subscription | Best For |
|---|---|---|
| Claude Pro + Claude Code | ~$20/mo + usage | Architecture discussions, long context, complex reasoning |
| ChatGPT Plus + Codex | ~$20/mo | Largest community, most Stack Overflow-style help available |
| Google AI Pro + Gemini CLI | ~$20/mo | Best for image generation (Imagen) and video generation (Veo) — recommended if you want flat-rate creative media without per-use API charges |
Depending on usage, you may eventually hit limits on the standard $20/month plans — Claude, ChatGPT, and Google AI all offer higher-tier plans (~$100/month) with significantly more capacity. Start low and upgrade/downgrade month-to-month as needed. If you regularly use image or video generation, Google AI Pro is worth adding alongside your primary AI subscription — it's significantly stronger than ChatGPT Plus for creative media and eliminates the need for pay-as-you-go API charges.
Anthropic Academy (anthropic.skilljar.com — all free)
- Claude 101
- Claude Code 101
- Claude Code in Action
- Introduction to Claude Cowork
- AI Fluency: Framework & Foundations
- Building with the Claude API
- Introduction to Model Context Protocol
- Model Context Protocol: Advanced Topics
- Introduction to Agent Skills
- Introduction to Subagents
- AI Capabilities and Limitations
- AI Fluency for educators
- AI Fluency for students
- AI Fluency for nonprofits
- Teaching AI Fluency
- Claude with Amazon Bedrock
- Claude with Google Cloud's Vertex AI
Other Certificates
- Real-World AI for Everyone — Advancing Women in Tech (Coursera, sponsored by Anthropic)
- IBM AI Developer Professional Certificate — Coursera
- IBM Generative AI Engineering Professional Certificate — Coursera
Month 1
- Deep Learning Specialization (part 1) — deeplearning.ai Pro
- Blog post: Implement backprop from scratch (a minimal sketch follows this list)
- Portfolio project: Define architecture and start building — apply course concepts to a real project
- Update LinkedIn headline and about section to reflect AI engineering pivot
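To make the Month 1 backprop post concrete, here is a minimal sketch of the kind of code it would cover: a one-hidden-layer network trained with hand-written gradients in NumPy (already in the package list). The data, sizes, and learning rate are illustrative.

```python
# Minimal backprop sketch: a one-hidden-layer network on a toy regression
# task with plain NumPy. The point is the manual chain-rule gradient.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 4))             # 256 samples, 4 features
y = (X.sum(axis=1, keepdims=True)) ** 2   # toy nonlinear target, shape (256, 1)

W1 = rng.normal(scale=0.1, size=(4, 16))
b1 = np.zeros((1, 16))
W2 = rng.normal(scale=0.1, size=(16, 1))
b2 = np.zeros((1, 1))
lr = 0.01

for step in range(2000):
    # forward pass
    z1 = X @ W1 + b1
    a1 = np.tanh(z1)
    y_hat = a1 @ W2 + b2
    loss = np.mean((y_hat - y) ** 2)

    # backward pass (chain rule, written out by hand)
    d_yhat = 2 * (y_hat - y) / len(X)        # dL/dy_hat
    dW2 = a1.T @ d_yhat                      # dL/dW2
    db2 = d_yhat.sum(axis=0, keepdims=True)
    d_a1 = d_yhat @ W2.T                     # dL/da1
    d_z1 = d_a1 * (1 - np.tanh(z1) ** 2)     # tanh'(z1)
    dW1 = X.T @ d_z1
    db1 = d_z1.sum(axis=0, keepdims=True)

    # gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

print(f"final MSE: {loss:.4f}")
```

Once this version works, checking its gradients against PyTorch's autograd on the same tiny network is a natural sanity check and a good section for the blog post.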
Month 2
- Deep Learning Specialization (part 2) — deeplearning.ai Pro
- Blog post: CNNs + RNNs explained for engineers
- Portfolio project: Implement a neural network component from scratch
Month 3
- NLP Specialization (part 1) — deeplearning.ai Pro
- Blog post: Attention mechanism deep dive (see the attention sketch after this list)
- Portfolio project: Add NLP or text processing capability to your project
- Blog post: Write about a design decision or architecture choice in your project
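A minimal sketch of what the attention deep dive centers on: single-head scaled dot-product self-attention in PyTorch. Dimensions are illustrative; real transformer blocks add multiple heads, masking, and residual/normalization layers.

```python
# Single-head scaled dot-product self-attention, the core computation of
# the transformer. Sizes are illustrative.
import math
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        Q, K, V = self.q(x), self.k(x), self.v(x)
        scores = Q @ K.transpose(-2, -1) / math.sqrt(x.size(-1))  # (batch, seq, seq)
        weights = torch.softmax(scores, dim=-1)                   # attention weights
        return weights @ V                                        # (batch, seq, d_model)

attn = SelfAttention(d_model=64)
out = attn(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 10, 64])
```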
Month 4
- NLP Specialization (part 2) + transformer architecture — deeplearning.ai Pro
- Blog post: Building a toy transformer
- Portfolio project: Integrate a transformer or LLM into your project in a meaningful way
- Publish portfolio project publicly if not already — clean README, docs, examples
- Find and enter first AI hackathon — check lablab.ai, Hugging Face, Devpost, MLH
Milestone:
- Understand how LLMs actually work under the hood
- Portfolio project has a working LLM integration
Month 5
- Generative AI with LLMs — deeplearning.ai Pro + AWS (Coursera)
- Blog post: RLHF + fine-tuning tradeoffs
- Portfolio project: Implement or experiment with fine-tuning on a small model (see the LoRA sketch after this list)
- Compete in your first AI hackathon
- Take Claude Certified Architect exam ($99 or free via partner org)
- Add Claude Certified Architect credential to LinkedIn and resume
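For the Month 5 fine-tuning experiment, a hedged sketch of the LoRA pattern with Hugging Face PEFT (in the Phase 3 package list). The model name and target_modules are assumptions; substitute whichever small model you choose.

```python
# Sketch of parameter-efficient fine-tuning (LoRA) with Hugging Face PEFT.
# The base model and target_modules below are assumptions; module names
# vary by architecture.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"   # assumption: any small causal LM works
model = AutoModelForCausalLM.from_pretrained(base)
tokenizer = AutoTokenizer.from_pretrained(base)

lora = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # assumption for Llama-style models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()   # typically well under 1% of the base weights
# ...then train with transformers.Trainer or a plain PyTorch loop on your dataset.
```

The design point worth writing up: only the small adapter matrices are trained, which is what makes fine-tuning feasible on the hardware tiers described later in this roadmap.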
Month 6
- LLMOps — deeplearning.ai Pro
- Blog post: Deploying LLMs — what nobody tells you
- Portfolio project: Add a deployment pipeline or serving layer to your project (see the serving sketch after this list)
- Update LinkedIn skills to include ML/LLM engineering keywords
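One way to prototype the Month 6 serving layer locally is vLLM (covered in the tooling section; Linux/GPU only). A minimal sketch follows; the model name is an assumption, and a production version would sit behind an HTTP API with batching and monitoring.

```python
# Minimal local inference/serving sketch with vLLM (Linux/GPU only).
# Model name is an assumption; pick whatever open-weights model fits your GPU.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")   # assumption
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Summarize what a KV cache does."], params)
for out in outputs:
    print(out.outputs[0].text)
```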
Month 7
- AI Agents in LangGraph — deeplearning.ai Pro
- Blog post: Comparing agent frameworks — what you learned from building your own
- Portfolio project: Implement a multi-agent workflow or agentic feature (see the LangGraph sketch after this list)
- Update portfolio project README and docs to reflect current capabilities
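A sketch of the Month 7 agentic workflow using LangGraph. The node functions are stubs; in the real project each would call an LLM or a tool, and the state plus graph wiring is the point.

```python
# Two-step workflow sketch in LangGraph. Node bodies are placeholders;
# in a real agent each node would call a model or tool.
from typing import TypedDict
from langgraph.graph import StateGraph, END

class State(TypedDict):
    question: str
    notes: str
    answer: str

def research(state: State) -> dict:
    # placeholder: would call a model or a search tool here
    return {"notes": f"findings about: {state['question']}"}

def write_answer(state: State) -> dict:
    # placeholder: would call a model to synthesize the notes
    return {"answer": f"answer based on {state['notes']}"}

graph = StateGraph(State)
graph.add_node("research", research)
graph.add_node("write_answer", write_answer)
graph.set_entry_point("research")
graph.add_edge("research", "write_answer")
graph.add_edge("write_answer", END)

app = graph.compile()
print(app.invoke({"question": "What is RLHF?", "notes": "", "answer": ""}))
```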
Milestone:
- Build and deploy a production LLM-backed system end to end
- Portfolio project is publicly demonstrable
Month 8
- ML Engineering for Production / MLOps — Coursera (Google)
- Blog post: CI/CD for ML — a Staff engineer's take
- Portfolio project: Add observability, monitoring, or a CI/CD pipeline to your project (see the experiment-tracking sketch after this list)
- Reassess hardware upgrade based on cloud compute usage
- Start building referral network — engage with AI company engineers on GitHub and LinkedIn
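For the observability work in Month 8, a small sketch of experiment tracking with MLflow (already in the Phase 3 package list). Parameter and metric names are illustrative; in CI you would log them from your training or eval job.

```python
# Experiment-tracking sketch with MLflow. Names and values are illustrative.
import mlflow

mlflow.set_experiment("portfolio-project-evals")

with mlflow.start_run(run_name="baseline-prompt-v1"):
    mlflow.log_param("model", "local-llama")      # illustrative values
    mlflow.log_param("temperature", 0.2)
    mlflow.log_metric("eval_accuracy", 0.81)
    mlflow.log_metric("p95_latency_ms", 430)
    # mlflow.log_artifact("eval_report.json")     # attach the full eval output
```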
Month 9
- MLOps Specialization (cont.) — Coursera (Google)
- Blog post: Model monitoring + drift detection patterns
- Portfolio project: Implement model monitoring or evaluation tooling (see the drift-check sketch after this list)
- Enter second AI hackathon if opportunity arises
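A minimal drift-check sketch for the Month 9 monitoring work, using a two-sample Kolmogorov-Smirnov test from SciPy (in the Phase 1 package list). The data and threshold are illustrative; real monitoring would run a check like this per feature on a schedule.

```python
# Drift check: compare a feature's reference distribution against recent
# production traffic with a two-sample KS test. Data and threshold are toy values.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)
reference = rng.normal(loc=0.0, scale=1.0, size=5000)    # training-time distribution
production = rng.normal(loc=0.3, scale=1.0, size=5000)   # slightly shifted live data

stat, p_value = ks_2samp(reference, production)
if p_value < 0.01:           # illustrative threshold
    print(f"Drift detected (KS statistic={stat:.3f}, p={p_value:.2e})")
else:
    print("No significant drift")
```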
Month 10
- Cloud ML specialty — Coursera (AWS or GCP track)
- Blog post: LLM infra cost optimization on AWS vs GCP
- Portfolio project: Deploy your project to cloud with proper ML infrastructure
- Do a full LinkedIn profile review and refresh
- Review target company job boards — note language and keywords used in target roles
Milestone:
- Architect and operate ML systems at scale — the Staff-level bar
Choose your path: If you are targeting a promotion or expanded role at your current company, follow Phase 4a. If you are targeting a new role at an AI company, follow Phase 4b. You can also combine elements of both.
Month 11
- Identify an AI/ML initiative at your current company you can lead or contribute to
- Present your AI/ML learnings internally — lunch and learn, tech talk, or internal blog post
- Blog post: Your take on a research paper or concept relevant to your company's domain
- Portfolio project: Add a feature or capability directly relevant to your current work
- Polish portfolio project — docs, README, demo video or screenshots
- Polish personal blog — ensure all blog posts are published and well-presented
- Document your AI/ML contributions and impact for performance review
Month 12
- Lead or ship an AI-related project or improvement at work
- Blog post: Year-in-review + what you built
- Final LinkedIn polish — reflect your new AI/ML engineering depth
- Prepare promotion case — tie AI/ML skills to business impact
- Schedule promotion or role expansion conversation with your manager
Milestone:
- Demonstrable AI/ML impact at your current company
- Clear promotion case built on real delivered work
Note: This phase is specifically for those targeting a new role at an AI company. If that is not your goal, follow Phase 4a above instead.
Month 11
- Identify 2-3 target AI companies and research their published work deeply
- Read key research papers from your target companies (see reading list below for Anthropic examples)
- Bookmark and regularly read your target companies' engineering and research blogs
- Blog post: Your take on a research paper or technical concept from your target company
- Portfolio project: Add a safety, evaluation, or alignment-relevant feature
- Polish portfolio project — docs, README, demo video or screenshots
- Polish personal blog — ensure all blog posts are published and well-presented
- Investigate fellowship or residency programs at target companies
Month 12
- Portfolio project: Final polish and any remaining features
- Blog post: Year-in-review + what you built
- Final LinkedIn polish — ensure your profile speaks your target companies' language
- Identify specific remote-friendly roles at target companies
- Reach out to referrals at target companies
- Begin applying
Milestone:
- Portfolio, blog, and projects speak the language of top AI companies
Note: This reading list is primarily relevant for those following Phase 4b, but the approach of deeply studying your target company's published research applies to any path. The list below uses Anthropic as an example — substitute or supplement with papers from your own target company or employer.
- Constitutional AI: Harmlessness from AI Feedback (Bai et al., 2022)
- The foundational paper behind how Claude is trained. Introduces RLAIF.
- Measuring Progress on Scalable Oversight for Large Language Models
- Core safety problem: supervising AI systems that outperform us on the task.
- Collective Constitutional AI
- Extends CAI with democratic/public input into the constitution.
- Towards Monosemanticity: Decomposing Language Models With Dictionary Learning
- Foundational mechanistic interpretability paper from Anthropic.
- The Engineering Challenges of Scaling Interpretability
- Directly relevant: Anthropic explicitly states the next obstacle is engineering, not science.
- Anthropic Alignment Science Blog
- Active research: scalable oversight, sleeper agents, alignment faking, sabotage evals.
- Anthropic Fellows Program
- No PhD required; 40%+ of fellows join Anthropic full-time.
Build or choose one meaningful open source project and evolve it throughout the 12 months. The best portfolio projects are ones you actually use or care about — not toy examples. As you progress through the curriculum, apply what you're learning directly to the project. Frame architecture decisions around LLM inference, context management, multi-agent orchestration, and production reliability. See each phase for specific project milestones.
One post per month minimum, timed to learning milestones. See each phase for specific post topics.
Goals
- Demonstrate genuine depth, not just course completion
- Use vocabulary aligned with top AI companies (constitutional AI, RLHF, scalable oversight, evals)
- Document real implementations, not just summaries
- Target first hackathon around Month 4–5
- Platforms: lablab.ai, Hugging Face, Devpost, MLH
Official Anthropic certification launched March 12, 2026.
- Exam registration: anthropic.skilljar.com
- Free prep courses (13 total): anthropic.skilljar.com
- Cost: $99 (free for first 5,000 employees of Claude Partner Network member orgs)
| Domain | Weight |
|---|---|
| Agentic Architecture & Orchestration | 27% |
| Claude Code Configuration & Workflows | 20% |
| Prompt Engineering & Structured Output | 20% |
| Tool Design & MCP Integration | 18% |
| Context Management & Reliability | 15% |
Passing score: 720 / 1000
- Check if current employer joins the Claude Partner Network (free access for employees)
- Complete the 13 free prep courses at anthropic.skilljar.com
- Key prep courses: Claude 101, Building with the Claude API, Introduction to MCP, Claude Code developer training
- Take exam — free via partner org access, or $99 out of pocket
- Add credential to LinkedIn and resume
This is an official Anthropic credential that validates production-level Claude implementation skills — directly relevant to the target role. The exam is architecture-focused (how you design and deploy Claude-powered systems), which plays to Staff SWE strengths. As the first formal certification from a major AI lab, it is likely to carry increasing weight with AI company recruiters.
- deeplearning.ai Pro — primary for ML fundamentals and LLMs
- Coursera Plus — MLOps and production ML engineering
Cost tips:
- Both are annual subscriptions — check for promotional pricing before committing
- Coursera Plus frequently offers significant discounts ($100–150 off) around Black Friday, New Year, and back-to-school periods
- deeplearning.ai Pro often runs holiday and new year deals — worth waiting for if you're not in a rush to start
- Coursera Plus gives access to the full catalog including IBM, Google, and DeepLearning.AI specializations — check if courses you need are included before paying separately
- Some employers offer learning stipends that cover these subscriptions — worth asking before paying out of pocket
Hardware is split into two tiers: Get Started (sufficient for Phases 1–2) and Go Deeper (recommended for Phases 3–4 and serious local inference/fine-tuning).
Cost tips:
- The Get Started tier is sufficient for all coursework — don't over-invest in hardware early
- Black Friday and back-to-school are the best times to buy GPUs and workstation components
- For Windows/Linux builds, refurbished or open-box components from reputable sellers can save 20–40%
- Apple Silicon Macs hold their value well — consider refurbished models from Apple's official refurb store for meaningful savings
- Cloud compute (Colab, spot instances) is almost always more cost-effective than buying high-end local hardware for occasional training runs — reassess the Go Deeper tier at Month 8–9 based on actual usage
| Tier | CPU | RAM | Storage | GPU |
|---|---|---|---|---|
| Get Started | Apple M2 or later | 16GB unified | 512GB SSD | Integrated (MPS) |
| Go Deeper | Apple M4 Max or later | 64GB+ unified | 1TB+ SSD | Integrated (MPS, 40-core GPU+) |
Note: MPS (Metal Performance Shaders) gives Apple Silicon meaningful GPU acceleration for PyTorch. 64GB+ unified memory allows running 30B+ parameter models locally (typically quantized).
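A quick way to confirm PyTorch actually sees your accelerator, whichever tier you pick: CUDA on the Windows/Linux builds below, MPS on Apple Silicon, CPU as the fallback.

```python
# Check which accelerator PyTorch can use, then run a tiny matmul on it.
import torch

if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

print(f"Using device: {device}")
x = torch.randn(1024, 1024, device=device)
print((x @ x).mean())   # small matmul to confirm the device actually works
```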
| Tier | CPU | RAM | Storage | GPU |
|---|---|---|---|---|
| Get Started | Intel Core i7 / AMD Ryzen 7 (8+ cores) | 16GB DDR4 | 512GB SSD | NVIDIA RTX 3060 (12GB VRAM) |
| Go Deeper | Intel Core i9 / AMD Ryzen 9 (16+ cores) | 64GB DDR5 | 2TB NVMe SSD | NVIDIA RTX 4080/4090 (16–24GB VRAM) |
Note: NVIDIA CUDA is the most widely supported GPU acceleration for ML frameworks. VRAM is the critical constraint — 12GB minimum, 24GB for comfortable local fine-tuning.
| Tier | CPU | RAM | Storage | GPU |
|---|---|---|---|---|
| Get Started | Intel Core i7 / AMD Ryzen 7 (8+ cores) | 16GB DDR4 | 512GB SSD | NVIDIA RTX 3060 (12GB VRAM) |
| Go Deeper | AMD Threadripper / Intel Core i9 (16+ cores) | 64GB+ DDR5 | 2TB+ NVMe SSD | NVIDIA RTX 4090 or A-series (24GB+ VRAM) |
Note: Linux is the most flexible platform for ML work — native CUDA support, best tooling compatibility, and preferred in production environments. AMD GPUs are increasingly supported via ROCm but NVIDIA remains the safer choice.
Cloud is essential for this roadmap — use it for training runs, MLOps deployments, and anything that exceeds local hardware limits.
- Google Colab — free tier includes GPU access (T4), sufficient for most course exercises. Colab Pro adds more runtime and better GPUs.
- AWS Free Tier — 12 months of limited EC2 and S3 for new accounts; SageMaker Studio Lab is free separately and requires no credit card
- GCP Free Tier — $300 in credits for new accounts, plus always-free tier for small compute
- Azure Free Tier — $200 in credits for new accounts, plus free tier for select services
- Kaggle Notebooks — free GPU/TPU access (30 hrs/week), no setup required
| Service | Use Case |
|---|---|
| EC2 (p3/p4/g4/g5 instances) | GPU training and inference |
| SageMaker | Managed ML training, deployment, and pipelines |
| SageMaker Studio | Jupyter-based ML IDE in the cloud |
| S3 | Dataset and model artifact storage |
| ECR + ECS/EKS | Containerized model serving |
| Lambda | Serverless LLM inference endpoints |
| Bedrock | Managed access to foundation models including Claude |
Cost tip: Use spot instances for training (up to 90% cheaper). Set billing alerts. Shut down instances when not in use.
| Service | Use Case |
|---|---|
| Vertex AI | Managed ML platform — training, pipelines, and model registry |
| Compute Engine (A2/G2 instances) | GPU training and inference |
| Cloud Storage | Dataset and artifact storage |
| Cloud Run | Serverless containerized model serving |
| GKE | Kubernetes-based model serving at scale |
| Vertex AI Model Garden | Access to foundation models including Claude via Anthropic |
Cost tip: Committed use discounts and preemptible VMs significantly reduce training costs.
| Service | Use Case |
|---|---|
| Azure Machine Learning | Managed ML platform — training, pipelines, and deployment |
| NC/ND-series VMs | GPU training and inference |
| Azure Blob Storage | Dataset and artifact storage |
| Azure Container Apps | Serverless containerized model serving |
| AKS | Kubernetes-based model serving at scale |
| Azure OpenAI Service | Managed access to foundation models |
Cost tip: Azure Hybrid Benefit and reserved instances reduce long-term costs significantly.
- Always set billing alerts before experimenting — runaway GPU instances are expensive
- Use spot/preemptible/interruptible instances for training jobs — save 60–90%
- Store data in cloud storage, not on compute instances
- Use free notebook environments (Colab, Kaggle) for course exercises — save paid cloud for MLOps and deployment work
- Tear down resources immediately after use — idle GPU instances still incur charges
- For LLM inference experiments, use API endpoints (Anthropic, OpenAI) rather than self-hosting — far cheaper at low volume
| Tool | Purpose | When Needed | Install |
|---|---|---|---|
| Python 3.11+ | Primary ML language | Phase 1 — start here | python.org or via package manager |
| Node.js LTS | JS tooling, some LLM SDKs | As needed | nodejs.org or nvm |
| Git | Version control | Day 1 | git-scm.com |
| Package | Purpose | When Needed | Install |
|---|---|---|---|
| PyTorch | Primary deep learning framework | Phase 1 | pip install torch torchvision torchaudio |
| NumPy | Numerical computing | Phase 1 | pip install numpy |
| Pandas | Data manipulation | Phase 1 | pip install pandas |
| Matplotlib | Data visualization | Phase 1 | pip install matplotlib |
| scikit-learn | Classical ML algorithms | Phase 1 | pip install scikit-learn |
| SciPy | Scientific computing | Phase 1 | pip install scipy |
| Transformers | Hugging Face model library | Phase 2 | pip install transformers |
| Datasets | Hugging Face datasets | Phase 2 | pip install datasets |
| LangChain | LLM application framework | Phase 2 | pip install langchain |
| LangGraph | Agent workflow framework | Phase 2 | pip install langgraph |
| Hugging Face Hub | Model and dataset hub CLI | Phase 2 | pip install huggingface_hub |
| Accelerate | Distributed training | Phase 3 | pip install accelerate |
| PEFT | Parameter-efficient fine-tuning | Phase 3 | pip install peft |
| MLflow | Experiment tracking | Phase 3 | pip install mlflow |
| Weights & Biases | Experiment tracking (alternative) | Phase 3 | pip install wandb |
Tip: Use a virtual environment (python -m venv .venv) or conda to keep project dependencies isolated.
Your portfolio project will need access to LLMs. There are several options depending on your budget and hardware — you don't need to spend money to get started.
| Provider | API Console | Notes |
|---|---|---|
| Anthropic (Claude) | console.anthropic.com | Best for instruction following and long context |
| OpenAI (GPT) | platform.openai.com | Largest ecosystem and community |
| Google (Gemini) | aistudio.google.com | Free tier available, cheap at scale |
| Mistral | console.mistral.ai | Cheapest major provider, EU-based |
All of the above have free tiers or trial credits to get started — you don't need to spend money immediately.
| Option | How | Notes |
|---|---|---|
| Ollama | Run models locally | Free, no API key needed, works on Mac/Windows/Linux. Limited by local hardware. |
| LM Studio | Run models locally (GUI) | Same as Ollama but with a user interface. Good for beginners. |
| LocalAI | Run models locally (OpenAI-compatible API) | Drop-in OpenAI API replacement, no GPU required |
| Google Gemini Free Tier | Cloud API | Generous free tier via Google AI Studio — good for experimentation |
| Groq | Cloud API | Free tier, extremely fast inference on open-source models (Llama, Mixtral) |
| Together AI | Cloud API | Free trial credits, cheap pay-as-you-go for open-source models |
| OpenRouter | Cloud API | Single API for 100+ models, free tier available, pay only for what you use |
Recommendation for getting started: Use Ollama locally for free experimentation during Phases 1–2. Add a paid API key when you're ready to build something in Phase 2–3.
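A minimal sketch of that free local loop, assuming the Ollama app is running and a model has been pulled (e.g. ollama pull llama3.2). It uses the optional ollama Python client (pip install ollama), which isn't in the package tables above.

```python
# Local, free LLM call via Ollama's Python client. Assumes the Ollama app is
# running and the model has already been pulled with `ollama pull llama3.2`.
import ollama

response = ollama.chat(
    model="llama3.2",   # assumption: any locally pulled model name works
    messages=[{"role": "user", "content": "Explain attention in one paragraph."}],
)
print(response["message"]["content"])   # newer client versions also allow response.message.content
```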
| Tool | Purpose | When Needed | Install |
|---|---|---|---|
| Ollama | Run LLMs locally | Phase 1 | ollama.com |
| Anthropic SDK | Claude API client | Phase 2 | pip install anthropic |
| OpenAI SDK | OpenAI API client | Phase 2 | pip install openai |
| Hugging Face CLI | Download and manage models | Phase 2 | pip install huggingface_hub |
| LM Studio | GUI for local model inference | Optional | lmstudio.ai |
| vLLM | High-performance LLM serving | Phase 3 | pip install vllm (Linux/GPU only) |
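As a sanity check for the Anthropic SDK row above, a minimal Claude API call once you have a key. The model ID is an assumption; check the console for current names.

```python
# Minimal Claude API call. Reads ANTHROPIC_API_KEY from the environment.
from anthropic import Anthropic

client = Anthropic()
message = client.messages.create(
    model="claude-sonnet-4-20250514",   # assumption: use whatever the console lists
    max_tokens=512,
    messages=[{"role": "user", "content": "Explain the KV cache in two sentences."}],
)
print(message.content[0].text)
```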
| Tool | Purpose | When Needed | Install |
|---|---|---|---|
| JupyterLab | Interactive notebooks for coursework | Phase 1 | pip install jupyterlab |
| VS Code | Code editor | Day 1 | code.visualstudio.com |
| Docker | Containerization for ML deployments | Phase 3 | docker.com |
| Postman | API testing | Phase 2 | postman.com |
| Tool | Purpose | When Needed | Install |
|---|---|---|---|
| AWS CLI | AWS resource management | Phase 3 | aws.amazon.com/cli |
| gcloud CLI | GCP resource management | Phase 3 | cloud.google.com/sdk |
| Azure CLI | Azure resource management | Phase 3 | aka.ms/installazurecli |
| kubectl | Kubernetes cluster management | Phase 3 | kubernetes.io/docs/tasks/tools |