Built to a UCL Data Engineering brief requiring RAG, graph databases, LLM agents, data lineage, and API deployment, all against live, real-world news pipelines.
PostgreSQL · MongoDB · Neo4j · ChromaDB · 5,500+ article embeddings · Neo4j graph with 2,500+ actors and 10k+ relationships · LangChain agent with 7 tools · Airflow orchestration · OpenLineage / Marquez lineage · FastAPI + MCP (12 endpoints) · Docker Compose (8 services) · spaCy NER · sentence-transformers · LibreTranslate · 216 tests across ingestion, storage, API, and agent


