For each resampled (chunked) pump csv file, did you only mark 1 chunk as True?
E.g. if there is a pump at 2019-03-01 17:00 and I chunk my CSV data into 5-second chunks (only taking into consideration the pump day and 1 day before and after), I only mark the chunk from 17:00:00 to 17:00:05 as True.
This leaves me with an extremely imbalanced dataset, so a RandomForestClassifier ends up predicting every chunk as False.
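To make the setup concrete, here is a minimal sketch of the labeling I described, using only the standard library. The function name `label_chunks` and its parameters are just made up for illustration, and I am assuming the window runs from midnight of the day before the pump to midnight after the day following it; my real code works on the resampled CSV rows instead.

```python
from datetime import datetime, timedelta

CHUNK = timedelta(seconds=5)  # chunk size from the example above


def label_chunks(pump_time, chunk=CHUNK, days_around=1):
    """Return (chunk_start, is_pump) pairs for the pump day +/- days_around.

    Hypothetical helper: only the chunk that contains pump_time is True.
    """
    day_start = datetime(pump_time.year, pump_time.month, pump_time.day)
    window_start = day_start - timedelta(days=days_around)
    window_end = day_start + timedelta(days=days_around + 1)

    labels = []
    start = window_start
    while start < window_end:
        # A chunk covers the half-open interval [start, start + chunk).
        labels.append((start, start <= pump_time < start + chunk))
        start += chunk
    return labels


pump_time = datetime(2019, 3, 1, 17, 0, 0)
labels = label_chunks(pump_time)
positives = sum(is_pump for _, is_pump in labels)
print(len(labels), positives)  # 51840 chunks in total, 1 of them True
```

So under these assumptions each pump contributes a single True chunk against 51,839 False ones, which is the imbalance I mean.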
What am I missing here?
------- Offtopic -----------
Also thank you guys for your effort to collect all the data. I enjoyed reading your paper too and got lots of useful information out of it. It's a welcome distraction to fiddle around with your data during all the restrictions :)