Built around 1,216 indie games and eight data sources, this project moves from collection and scraping through Spark processing to an analysis-ready dataset.
8 integrated data sources · PostgreSQL + MongoDB storage · Apache Spark processing · Parquet + DuckDB analytics · Success tier classification


