This repository collects end-to-end data engineering projects that build scalable, performance-oriented data systems with relational databases, distributed processing frameworks, spatial analytics, and NoSQL storage engines.

Technologies:
- PostgreSQL
- Apache Spark & SparkSQL
- Scala
- SQL
- Docker
- RocksDB
- C++

Projects:
- Relational Database Design & Query Optimization
- Distributed Spatial Queries using SparkSQL
- Spatio-Temporal Hotspot Analysis using Apache Spark
- NoSQL Key-Value Store Implementation using RocksDB

Key concepts:
- Large-scale data ingestion
- Distributed query processing
- Spatial and spatio-temporal analytics
- Storage engine internals (LSM trees)
- Performance-aware system design

Note: Datasets are not included due to size and licensing constraints.