samuel.brooks — builds data pipelines
~/portfolio — zsh — 122×33

samuel@portfolio:~/portfolio % cat intro.md

London  ·  Data Engineer  ·  MSc Business Analytics, UCL

I build data pipelines and analytics systems.

Most of the work here is about taking messy source data and making it usable. I am currently completing an MSc in Business Analytics at UCL in London.

Interested in data engineering, analytics engineering, and applied machine learning infrastructure.

dataset metrics

1,216 games in flagship dataset
8 sources joined into pipeline
62 features in final output
pipeline.py — python3 — 96×25

samuel@portfolio:~/work % python pipeline.py --info

featured work

Built around 1,216 indie games and eight data sources, this project moves from collection and scraping through Spark processing into a dataset that is actually usable for analysis.

8 integrated data sources · PostgreSQL + MongoDB storage · Apache Spark processing · Parquet + DuckDB analytics · Success tier classification
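Joining eight sources only works once every source agrees on what counts as "the same game." The sketch below shows that join-key step in plain Python rather than Spark. The source names, fields, and the `normalise_title` / `join_sources` helpers are hypothetical and are not taken from the project. Each merged row also records which sources contributed, which serves as a minimal form of lineage.

```python
"""Hypothetical sketch: merge per-game records from several sources on a normalised key.

Source names, fields and helper names are illustrative assumptions, not project code.
"""
import re
import unicodedata


def normalise_title(title: str) -> str:
    """Build a join key: strip accents, lowercase, drop punctuation, collapse spaces."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9 ]+", " ", text.lower())
    return " ".join(text.split())


def join_sources(sources: dict[str, list[dict]]) -> dict[str, dict]:
    """Return one row per normalised title.

    Fields are prefixed with their source name, and `_sources` lists which
    sources contributed to the row (a simple lineage record).
    """
    merged: dict[str, dict] = {}
    for source_name, records in sources.items():
        for record in records:
            key = normalise_title(record["title"])
            row = merged.setdefault(key, {"key": key, "_sources": []})
            row["_sources"].append(source_name)
            for field, value in record.items():
                if field != "title":
                    row[f"{source_name}.{field}"] = value
    return merged


if __name__ == "__main__":
    # Toy records; every name and value here is invented for illustration.
    sources = {
        "store": [{"title": "Hollow Knight", "price": 14.99}],
        "reviews": [{"title": "hollow knight!", "positive_pct": 97}],
        "forum": [{"title": "Celeste", "threads": 120}],
    }
    for key, row in join_sources(sources).items():
        print(key, row)
```

In a Spark job, the same normalised key would typically be computed as a column before the joins, so mismatches surface as unmatched rows instead of silent duplicates.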

projects.sh — zsh — 80×24

samuel@portfolio:~/projects % ls -la

#   name
02  Premier League Predictive Analytics

The project focuses on disciplined evaluation rather than inflated forecasting claims, using walk-forward validation, calibration, and model-odds blending to test whether the pipeline adds signal beyond the market.
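The evaluation idea can be sketched with three pieces: convert bookmaker odds to probabilities, blend them with model probabilities, and score the blend with log-loss on walk-forward splits. In each split the blend weight is chosen only on earlier matches. This is an illustrative sketch, not the project's code. The linear blend, the proportional removal of the bookmaker margin, the weight grid, and all function names are assumptions.

```python
"""Hypothetical sketch: walk-forward evaluation of a model/market probability blend.

Assumptions (not from the project): outcomes are home/draw/away (0/1/2), market
probabilities come from decimal odds with the margin removed proportionally,
and the blend weight is picked by grid search on each training window.
"""
import math


def implied_probs(odds: tuple[float, float, float]) -> list[float]:
    """Convert decimal odds to probabilities that sum to 1 (proportional de-vig)."""
    raw = [1.0 / o for o in odds]
    total = sum(raw)  # above 1 because of the bookmaker margin
    return [p / total for p in raw]


def blend(model_p: list[float], market_p: list[float], w: float) -> list[float]:
    """Linear pool: weight w on the model, 1 - w on the market (still sums to 1)."""
    return [w * m + (1.0 - w) * k for m, k in zip(model_p, market_p)]


def log_loss(probs: list[list[float]], outcomes: list[int]) -> float:
    """Mean negative log-probability assigned to the observed outcome."""
    eps = 1e-15
    return -sum(math.log(max(p[y], eps)) for p, y in zip(probs, outcomes)) / len(outcomes)


def walk_forward(n: int, initial: int, step: int):
    """Yield (train, test) index ranges; training data always precedes test data."""
    start = initial
    while start < n:
        yield range(0, start), range(start, min(start + step, n))
        start += step


def evaluate_blend(model, market, outcomes, initial=4, step=2) -> float:
    """Choose w on each training window, then score the blend on the next window."""
    grid = [i / 10 for i in range(11)]
    scored_probs, scored_outcomes = [], []
    for train, test in walk_forward(len(outcomes), initial, step):
        best_w = min(grid, key=lambda w: log_loss(
            [blend(model[i], market[i], w) for i in train],
            [outcomes[i] for i in train]))
        scored_probs += [blend(model[i], market[i], best_w) for i in test]
        scored_outcomes += [outcomes[i] for i in test]
    return log_loss(scored_probs, scored_outcomes)
```

Because the weight is refit on past data only, the resulting out-of-sample log-loss can be compared fairly against the market-only baseline, which corresponds to `w = 0`.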

Bet365 and Pinnacle benchmarks · Best single model: 0.9678 log-loss · Best blend beats Bet365, p=0.007 · Dixon-Coles season simulation · 6.7/20 exact positions
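For reference, the standard Dixon-Coles adjustment rescales independent Poisson score probabilities for the low-scoring results 0-0, 0-1, 1-0 and 1-1 using a dependence parameter `rho`. The sketch below assumes the expected goals (`lam` for home, `mu` for away) and `rho` have already been fitted. It then samples one result per fixture to build a points table. The team names, numbers, and helper names are hypothetical.

```python
"""Hypothetical sketch: Dixon-Coles score probabilities and a tiny season simulation.

Assumes lam (home expected goals), mu (away expected goals) and rho are already fitted.
"""
import math
import random


def poisson_pmf(k: int, rate: float) -> float:
    return math.exp(-rate) * rate**k / math.factorial(k)


def tau(x: int, y: int, lam: float, mu: float, rho: float) -> float:
    """Dixon-Coles correction factor for low-scoring results; 1 elsewhere."""
    if x == 0 and y == 0:
        return 1 - lam * mu * rho
    if x == 0 and y == 1:
        return 1 + lam * rho
    if x == 1 and y == 0:
        return 1 + mu * rho
    if x == 1 and y == 1:
        return 1 - rho
    return 1.0


def score_matrix(lam, mu, rho, max_goals=10):
    """P(home goals = x, away goals = y), truncated at max_goals and renormalised."""
    m = [[tau(x, y, lam, mu, rho) * poisson_pmf(x, lam) * poisson_pmf(y, mu)
          for y in range(max_goals + 1)] for x in range(max_goals + 1)]
    total = sum(map(sum, m))
    return [[p / total for p in row] for row in m]


def simulate_season(fixtures, rng: random.Random) -> dict[str, int]:
    """fixtures: list of (home, away, lam, mu, rho). Returns points per team."""
    points: dict[str, int] = {}
    for home, away, lam, mu, rho in fixtures:
        m = score_matrix(lam, mu, rho)
        cells = [(i, j) for i in range(len(m)) for j in range(len(m))]
        x, y = rng.choices(cells, weights=[m[i][j] for i, j in cells])[0]
        points.setdefault(home, 0)
        points.setdefault(away, 0)
        if x > y:
            points[home] += 3
        elif x < y:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
    return points


if __name__ == "__main__":
    fixtures = [("A", "B", 1.5, 1.0, -0.05), ("B", "A", 1.2, 1.3, -0.05)]
    print(simulate_season(fixtures, random.Random(0)))
```

Repeating `simulate_season` many times with different seeds and ranking teams by points gives a distribution of finishing positions. A figure such as exact positions out of 20 can be checked against that distribution.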


Built to a UCL Data Engineering brief that required RAG, graph databases, LLM agents, data lineage, and API deployment, all working against live, real-world news pipelines.

PostgreSQL · MongoDB · Neo4j · ChromaDB · LangChain agent with 6 tools · Airflow orchestration · OpenLineage / Marquez lineage · FastAPI + MCP interface · Docker Compose (7 services) · spaCy NER · sentence-transformers
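The retrieval half of RAG can be shown without any vector database. Rank stored text chunks by cosine similarity to a query embedding, then place the top matches into the prompt as context. In this project ChromaDB and sentence-transformers handle that lookup. The sketch below only illustrates the idea: the hand-made vectors, the chunk labels, and the `top_k` / `build_prompt` helpers are hypothetical.

```python
"""Hypothetical sketch of RAG retrieval: cosine top-k over stored chunks.

Tiny hand-made vectors stand in for sentence-transformers embeddings; this is
not how ChromaDB is called, just the ranking idea it implements.
"""
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """store holds (chunk_text, vector) pairs; return the k most similar chunks."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(question: str, chunks: list[str]) -> str:
    """Ground the LLM in retrieved chunks only."""
    context = "\n".join(f"- {c}" for c in chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    store = [
        ("Central bank holds interest rates", [1.0, 0.0, 0.0]),
        ("Cup final ends in penalties", [0.0, 1.0, 0.0]),
        ("Inflation falls for third month", [0.9, 0.1, 0.0]),
    ]
    print(build_prompt("What happened to rates?", top_k([1.0, 0.0, 0.0], store)))
```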

04  Marketing Analytics Segmentation

A supporting analytics project focused on the less glamorous but more useful work: cleaning and structuring data, shaping features, and producing interpretable segments.

Analytics workflow · Feature framing · Decision-ready outputs
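The project text does not say which segmentation method was used. As an assumed illustration only, a common pattern is to z-score the features so that no single scale dominates, then group customers with k-means. The features below (orders per year, annual spend) and the deterministic initialisation are hypothetical choices for the sketch.

```python
"""Hypothetical sketch: standardise features, then segment with a small k-means.

k-means is an assumed example; the project does not state its method.
"""
import math
import statistics


def standardise(rows: list[list[float]]) -> list[list[float]]:
    """z-score each column; a constant column is left at 0 instead of dividing by zero."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(points: list[list[float]], k: int, iters: int = 50):
    """Lloyd's algorithm, using the first k points as starting centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Move each centroid to the mean of its members (keep it if the cluster is empty).
        new = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            new.append([statistics.fmean(col) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels, centroids


if __name__ == "__main__":
    # [orders per year, annual spend]: invented customers
    rows = [[1, 100], [50, 5000], [2, 110], [1.5, 95], [52, 5100], [49, 4900]]
    labels, _ = kmeans(standardise(rows), k=2)
    print(labels)
```

Segments stay interpretable when the centroids are translated back into the original units, for example "about 50 orders a year, roughly 5,000 in spend."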

notes.md — vim — 72×30

samuel@portfolio:~/notes % vim notes.md

-- NORMAL --    notes.md    2L
1

Lessons from building a Spark pipeline

What broke, what became clearer, and what changed once the work had to run repeatedly rather than just succeed once in a notebook.

2

Designing reproducible data pipelines

Some thoughts on traceable data movement, storage boundaries, and why quiet documentation habits matter more than people admit.
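One small habit that makes a pipeline step traceable is to write a manifest that fingerprints its inputs, config, and output. The sketch below assumes a per-step manifest with a layout invented for illustration; it is not an OpenLineage event. If two runs share a `run_id` (same inputs and config) but produce different `output_sha256` values, the step is not deterministic.

```python
"""Hypothetical sketch: a per-step lineage manifest built from content hashes.

Step names, input names and the manifest layout are illustrative assumptions.
"""
import hashlib
import json


def sha256_bytes(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def step_manifest(step: str, inputs: dict[str, bytes], config: dict, output: bytes) -> dict:
    """Fingerprint a step; run_id depends only on the step name, inputs and config."""
    input_hashes = {name: sha256_bytes(blob) for name, blob in sorted(inputs.items())}
    # sort_keys makes the hash independent of dict ordering in the config.
    config_json = json.dumps(config, sort_keys=True)
    run_id = sha256_bytes(json.dumps(
        {"step": step, "inputs": input_hashes, "config": config_json}, sort_keys=True
    ).encode())
    return {
        "step": step,
        "run_id": run_id,
        "inputs": input_hashes,
        "config": config,
        "output_sha256": sha256_bytes(output),
    }


if __name__ == "__main__":
    manifest = step_manifest("clean", {"raw.csv": b"x,y\n1,2\n"}, {"dropna": True}, b"cleaned")
    print(json.dumps(manifest, indent=2))
```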

contact.sh — zsh — 60×18

samuel@portfolio:~ % ./contact.sh

checking endpoints...

LinkedIn
Email
Phone
CV/Resume
GitHub
UCL Email

done.