This capstone combines skills from across the curriculum into one production-style application: a spam detection API that classifies email messages.

Architecture

  Client → FastAPI → Scikit-learn Model (joblib)
                → Logging / Health checks
                → Docker container
                → GitHub Actions CI
  

What You’ll Build

  curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"message": "Win a free iPhone! Click here now!"}'

# {"label": "spam", "confidence": 0.97}
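The same request can be sent from Python. The sketch below uses only the standard library and assumes the API is reachable at the localhost address from the curl example; build_request and classify are illustrative names, not part of the project.

```python
import json
import urllib.request

# Assumed local address, taken from the curl example.
API_URL = "http://localhost:8000/predict"


def build_request(message: str, url: str = API_URL) -> urllib.request.Request:
    """Build the same POST request that the curl command sends."""
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(message: str, url: str = API_URL, timeout: float = 5.0) -> dict:
    """Send the message to the API and return the decoded JSON response."""
    request = build_request(message, url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage (requires the API to be running locally):
#   result = classify("Win a free iPhone! Click here now!")
#   print(result["label"], result["confidence"])
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit-test without a running server.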
  

Project Structure

  spam-detector/
├── app/
│   ├── __init__.py
│   ├── main.py
│   ├── model.py
│   └── schemas.py
├── ml/
│   ├── train.py
│   └── model.pkl          # generated
├── tests/
│   └── test_api.py
├── Dockerfile
├── docker-compose.yml
├── requirements.txt
├── .github/workflows/ci.yml
└── README.md
  

Step 1: Train the Model

  # ml/train.py
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score
import joblib

# Sample training data (use a real dataset for production)
texts = [
    "Hey, are we still meeting for lunch tomorrow?",
    "Please review the attached report by Friday.",
    "Win a FREE iPhone! Click here NOW!!!",
    "Your account has been compromised. Verify immediately.",
    "Can you send me the meeting notes?",
    "Congratulations! You've won $1,000,000!!!",
    "The project deadline is next Monday.",
    "URGENT: Claim your prize before it expires!",
    "Let's schedule a call to discuss the proposal.",
    "Buy cheap medications online with no prescription!",
]
labels = [0, 0, 1, 1, 0, 1, 0, 1, 0, 1]  # 0=ham, 1=spam

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("classifier", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipeline, texts, labels, cv=3)
print(f"CV accuracy: {scores.mean():.2f}")

pipeline.fit(texts, labels)
joblib.dump(pipeline, "ml/model.pkl")
print("Model saved to ml/model.pkl")
  

Run from the project root (the script writes to the relative path ml/model.pkl): python ml/train.py
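
To make the tfidf step less of a black box, here is a simplified pure-Python sketch of the weighting it computes. It assumes scikit-learn's default settings (lowercasing, tokens of two or more word characters, smoothed IDF, L2 normalisation) and handles unigrams only, so it will not reproduce the pipeline's bigram features. tokenize and tfidf are hypothetical helper names.

```python
import math
import re
from collections import Counter

# Words of two or more characters, matching scikit-learn's default pattern.
TOKEN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens."""
    return TOKEN.findall(text.lower())


def tfidf(docs: list[str]) -> list[dict[str, float]]:
    """Return one {term: weight} dict per document (unigrams only)."""
    tokenized = [tokenize(doc) for doc in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency: rare terms get larger values.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalise so long and short messages are comparable.
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors


# Usage:
#   vecs = tfidf(["win a free prize", "free lunch tomorrow", "meeting notes"])
#   In vecs[0], "win" outweighs "free" because "free" occurs in two documents.
```

Terms that appear in few messages receive higher weights, which is why words concentrated in spam can become strong signals for the classifier.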

Step 2: FastAPI Application

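  # app/schemas.py
from pydantic import BaseModel, Field


class MessageRequest(BaseModel):
    """Request body for POST /predict."""

    # Reject empty messages and cap the size of untrusted input.
    message: str = Field(min_length=1, max_length=10000)


class PredictionResponse(BaseModel):
    """Response body: the predicted label and its probability."""

    label: str  # "spam" or "ham"
    confidence: float  # probability of the predicted label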
  
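  # app/model.py
from pathlib import Path

import joblib

# ml/model.pkl is written by ml/train.py; run that script first.
MODEL_PATH = Path(__file__).parent.parent / "ml" / "model.pkl"
_model = None


def get_model():
    """Load the pipeline on first use and cache it for later requests."""
    global _model
    if _model is None:
        _model = joblib.load(MODEL_PATH)
    return _model


def predict(message: str) -> tuple[str, float]:
    """Return the predicted label and the probability of that label."""
    model = get_model()
    # predict_proba columns follow model.classes_, here [0, 1] = [ham, spam].
    proba = model.predict_proba([message])[0]
    label_idx = int(proba.argmax())
    label = "spam" if label_idx == 1 else "ham"
    return label, float(proba[label_idx])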
  
  # app/main.py
import logging
import os

from fastapi import FastAPI, HTTPException

from app.model import predict
from app.schemas import MessageRequest, PredictionResponse

# LOG_LEVEL is set in docker-compose.yml; default to INFO elsewhere.
logging.basicConfig(level=os.getenv("LOG_LEVEL", "INFO").upper())
logger = logging.getLogger(__name__)

app = FastAPI(title="Spam Detector API", version="1.0.0")


@app.get("/health")
def health():
    return {"status": "ok"}


@app.post("/predict", response_model=PredictionResponse)
def classify_message(request: MessageRequest):
    try:
        label, confidence = predict(request.message)
    except Exception as exc:
        # Log the traceback, but return a generic error to the client.
        logger.exception("Prediction failed")
        raise HTTPException(
            status_code=500, detail="Prediction failed"
        ) from exc
    logger.info("Classified as %s (%.2f)", label, confidence)
    return PredictionResponse(label=label, confidence=round(confidence, 4))
  

Step 3: Tests

  # tests/test_api.py
from fastapi.testclient import TestClient
from app.main import app

client = TestClient(app)

def test_health():
    assert client.get("/health").json() == {"status": "ok"}

def test_spam_detection():
    response = client.post("/predict", json={
        "message": "Win a free iPhone! Click here now!"
    })
    assert response.status_code == 200
    data = response.json()
    assert data["label"] == "spam"
    assert data["confidence"] > 0.5

def test_ham_detection():
    response = client.post("/predict", json={
        "message": "Can we reschedule our meeting to Thursday?"
    })
    assert response.status_code == 200
    assert response.json()["label"] == "ham"
  

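Run from the project root, after training the model in Step 1: python -m pytest tests/ -v
Using python -m adds the current directory to sys.path so that the app package can be imported. TestClient also needs httpx installed, so include it in requirements.txt alongside pytest.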

Step 4: Docker

  # Dockerfile
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
RUN python ml/train.py
EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
  
  # docker-compose.yml
services:
  api:
    build: .
    ports:
      - "8000:8000"
    environment:
      - LOG_LEVEL=INFO
  
  docker compose up --build
curl -X POST http://localhost:8000/predict \
  -H "Content-Type: application/json" \
  -d '{"message": "Free money! Act now!"}'
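
Right after docker compose up, the container may need a moment before it accepts requests. A small polling helper around the /health endpoint can make scripted checks less flaky. This is a sketch that assumes the 8000:8000 port mapping above; fetch_health and wait_until_healthy are illustrative names.

```python
import json
import time
import urllib.request
from typing import Callable

HEALTH_URL = "http://localhost:8000/health"  # assumed local port mapping


def fetch_health(url: str = HEALTH_URL, timeout: float = 2.0) -> dict:
    """GET the health endpoint and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    attempts: int = 30,
    delay: float = 1.0,
) -> bool:
    """Poll until the API reports {"status": "ok"} or attempts run out."""
    for attempt in range(attempts):
        try:
            if fetch().get("status") == "ok":
                return True
        except OSError:
            # Covers refused connections and timeouts while the app starts.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False


# Usage (with the container running):
#   if not wait_until_healthy():
#       raise SystemExit("API did not become healthy in time")
```

Passing the fetch function as a parameter keeps the retry logic testable without a running container.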
  

Step 5: CI/CD with GitHub Actions

  # .github/workflows/ci.yml
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install -r requirements.txt
      - run: python ml/train.py
      - run: python -m pytest tests/ -v
      - run: pip install flake8 && flake8 app/ ml/
  

Skills Combined

  Stage          Chapters Used
  ML training    Scikit-learn
  API            FastAPI
  Validation     Type Hints / Pydantic
  Testing        pytest
  Logging        Logging
  Docker         DevOps
  Security       Security

Bonus Extensions

  1. Real dataset — use the SMS Spam Collection from Kaggle
  2. Model versioning — save models with timestamps, add /model/info endpoint
  3. Rate limiting — add slowapi middleware
  4. Auth — protect /predict with API keys (FastAPI Auth)
  5. Monitoring — add Prometheus metrics endpoint
  6. Frontend — simple HTML form that calls the API
  7. Deploy — push to Railway, Render, or AWS ECS
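
As a starting point for extension 2, the sketch below shows one possible way to name model files by UTC timestamp and to build a payload for a /model/info endpoint. The file naming scheme and all three helper names are assumptions, not part of the project above.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def versioned_model_path(model_dir: Path, now: Optional[datetime] = None) -> Path:
    """Build a file name of the form model-<UTC timestamp>.pkl."""
    now = now or datetime.now(timezone.utc)  # callers should pass UTC times
    stamp = now.strftime("%Y%m%dT%H%M%SZ")
    return model_dir / f"model-{stamp}.pkl"


def latest_model_path(model_dir: Path) -> Optional[Path]:
    """Pick the newest versioned model, or None if there is none."""
    # Fixed-width timestamps sort chronologically as plain strings.
    candidates = sorted(model_dir.glob("model-*.pkl"))
    return candidates[-1] if candidates else None


def model_info(path: Path) -> dict:
    """Payload a hypothetical /model/info endpoint could return."""
    stamp = path.stem.removeprefix("model-")
    return {"file": path.name, "trained_at": stamp}


# Usage:
#   target = versioned_model_path(Path("ml"))  # pass to joblib.dump in train.py
#   current = latest_model_path(Path("ml"))    # load this one in app/model.py
```

With this scheme, training never overwrites an earlier model, and rolling back means deleting or moving the newest file.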

This capstone demonstrates the full lifecycle: train → serve → test → containerize → automate — the workflow of a production ML engineer.