This folder is for any Python scripts or notebooks you use to explore and understand your datasets. These files should:
- Read in prepared datasets from 0_datasets
- Explore and understand the dataset without running a deep analysis:
  - Generate some visualizations (in a notebook, or in a separate image file saved to this folder)
  - Run some descriptive statistics (beware the Datasaurus Dozen!)
  - ... let your curiosity guide you, but avoid running any inferential statistics or using any machine learning at this stage.
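
As a rough illustration of the descriptive-statistics step, here is a minimal sketch using only the Python standard library. The sample data, the column names `x` and `y`, and the helper names `load_rows` and `describe` are all hypothetical; in a real script you would read a prepared file from 0_datasets instead of the inline string. If you already use pandas, `DataFrame.describe()` gives a similar summary.

```python
import csv
import io
import statistics

# Hypothetical stand-in for a prepared file in 0_datasets. In a real script,
# open the file read-only instead, e.g. open(path, newline="").
SAMPLE_CSV = """x,y
1,2.0
2,4.1
3,6.2
4,7.9
5,10.1
"""


def load_rows(handle):
    """Read CSV rows into a list of dicts without touching the source file."""
    return list(csv.DictReader(handle))


def describe(rows, column):
    """Return basic descriptive statistics for one numeric column."""
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


rows = load_rows(io.StringIO(SAMPLE_CSV))
summary = describe(rows, "x")
for name, value in summary.items():
    print(f"{name}: {value}")
```

Summary numbers like these can be identical for datasets that look very different when plotted, which is the point of the Datasaurus Dozen warning, so it is worth pairing them with at least one visualization.
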
DO NOT modify an existing dataset in 0_datasets! This is critical to open
research: Someone should be able to clone this repository and run your scripts
to replicate your research. If you modify an original dataset, others cannot
replicate your work.
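
One possible way to guard against accidental writes into 0_datasets is sketched below. The helper `safe_output_path` and the folder name `exploration_outputs` are hypothetical, not part of this repository; the demo runs in a temporary directory so it does not touch any real files.

```python
from pathlib import Path
import tempfile

DATASETS_DIR_NAME = "0_datasets"  # folder name taken from this README


def safe_output_path(out_dir, filename):
    """Return out_dir/filename, refusing any path inside 0_datasets."""
    target = (Path(out_dir) / filename).resolve()
    if DATASETS_DIR_NAME in target.parts:
        raise ValueError(f"refusing to write inside {DATASETS_DIR_NAME}: {target}")
    return target


# Demo in a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    (repo / DATASETS_DIR_NAME).mkdir()
    (repo / "exploration_outputs").mkdir()  # hypothetical folder name

    # Writing a derived summary outside 0_datasets is allowed.
    ok = safe_output_path(repo / "exploration_outputs", "summary.csv")
    ok.write_text("column,mean\nx,3.0\n")
    wrote_ok = ok.exists()

    # Writing into 0_datasets is rejected before any file is created.
    try:
        safe_output_path(repo / DATASETS_DIR_NAME, "cleaned.csv")
        blocked = False
    except ValueError:
        blocked = True
```

The design choice here is simply to route every file write in an exploration script through one function, so the original datasets stay byte-for-byte identical for anyone who clones the repository.
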
Chapter 4, "Exploratory Data Analysis", of The Art of Data Science is a good starting reference.
Use the README in this folder to give a quick summary of each script/notebook: which dataset(s) it explores, and how.