This demo models energy usage for buildings in NYC.
- Create Spark DataFrames from imported CSV files
- Run some data exploration commands to inspect the data
- Define functions to clean and prepare the data for modeling
- Create a model using sklearn
- Visualize the model accuracy
- Detect buildings that consume energy inefficiently
Technologies
- Python, through Jupyter Notebooks
- Spark DataFrames
- scikit-learn for model creation
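
The cleaning step listed above might look roughly like the following. This is a minimal sketch, not the notebook's actual code: it assumes the Spark DataFrame has already been converted to pandas (for example with `toPandas()`), and the function name `clean_energy_frame`, the column names, and the sample rows are all hypothetical.

```python
import pandas as pd


def clean_energy_frame(df, numeric_cols):
    """Coerce the given columns to numbers and drop unusable rows."""
    out = df.copy()
    for col in numeric_cols:
        # Unparseable entries such as "n/a" become NaN
        out[col] = pd.to_numeric(out[col], errors="coerce")
    # Drop rows with missing values in any of the required columns
    out = out.dropna(subset=numeric_cols)
    # Drop rows with non-positive measurements
    out = out[(out[numeric_cols] > 0).all(axis=1)]
    return out.reset_index(drop=True)


# Made-up rows with hypothetical column names, for illustration only
raw = pd.DataFrame({
    "square_feet": ["1200", "n/a", "3400", "-5"],
    "energy_kwh": [15000, 22000, None, 9000],
})
cleaned = clean_energy_frame(raw, ["square_feet", "energy_kwh"])
print(cleaned)
```

Only the first made-up row survives here, because each of the others has an unparseable, missing, or negative value.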
Linear Regression
- Response variable is energy usage in kWh
- Features include building age, square footage, number of stories, total plugged-in equipment, etc.
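
As a rough illustration of this regression setup, the sketch below fits scikit-learn's `LinearRegression` on synthetic data. The feature values, coefficients, and noise level are invented for the example and do not come from the BlocPower dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for three of the features (all values are made up)
age = rng.uniform(5, 100, n)
sqft = rng.uniform(1000, 50000, n)
stories = rng.integers(1, 20, n)
X = np.column_stack([age, sqft, stories])

# Synthetic response in kWh: arbitrary coefficients plus Gaussian noise
y = 50 * age + 2.0 * sqft + 300 * stories + rng.normal(0, 1000, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LinearRegression().fit(X_train, y_train)

# R^2 on held-out data is one simple way to summarize model accuracy
r2 = model.score(X_test, y_test)
print("coefficients:", model.coef_)
print("held-out R^2:", r2)
```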
Unsupervised Learning
- Perform PCA and Clustering
- Metrics used: plugged-in equipment, air conditioning, domestic gas, heating gas
- Visualize the clusters in two of the four dimensions, using K-means to determine the cluster centers
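
The PCA and clustering steps could be sketched as follows. The four columns stand in for the four metrics listed above, but the data are synthetic, and both the choice of two clusters and the use of `StandardScaler` are assumptions made for this example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Two synthetic groups of buildings, four made-up metrics each
low = rng.normal(loc=[10, 5, 2, 8], scale=1.0, size=(50, 4))
high = rng.normal(loc=[30, 20, 10, 25], scale=1.0, size=(50, 4))
X = np.vstack([low, high])

# Put all metrics on a comparable scale before PCA and K-means
X_scaled = StandardScaler().fit_transform(X)

# Project onto the first two principal components
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)
print("explained variance ratio:", pca.explained_variance_ratio_)

# K-means assigns each building to a cluster and locates the centers
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_scaled)
labels = kmeans.labels_
centers = kmeans.cluster_centers_
```

To plot two of the four dimensions, scatter any two columns of `X_scaled` colored by `labels` and overlay the matching two columns of `centers`.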
Classification using Logistic Regression
- Model used to identify inefficient buildings
- Visualize accuracy using a confusion matrix
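
A minimal sketch of the classification step is shown below. How the notebook labels a building as inefficient is not stated here, so the rule used in this example (energy use per square foot above the median) is purely an assumption, and the data are synthetic.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 300

# Synthetic buildings: floor area and energy use (made-up values)
sqft = rng.uniform(1000, 50000, n)
kwh = sqft * rng.uniform(1.0, 3.0, n)

# Assumed labeling rule: 1 = inefficient if kWh per square foot is above the median
intensity = kwh / sqft
y = (intensity > np.median(intensity)).astype(int)
X = np.column_stack([sqft, kwh])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Scale the features, then fit a logistic regression classifier
clf = make_pipeline(StandardScaler(), LogisticRegression())
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print(cm)
```

The diagonal of `cm` counts correct predictions. `sklearn.metrics.ConfusionMatrixDisplay` can render the matrix as a plot when matplotlib is available.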
Notebooks
BlocPower.ipynb
Data Assets
BlocPower_T.csv
CDD-HDD-Features.csv
HDD-Features.csv