1. Add code to remove data with too many missing values (above a certain threshold)
2. Add code to impute missing data (KNN or other methods)
3. Add code to normalise data (log transformation, quantile normalisation)
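
As a starting point, the three steps might be sketched as below. This is a minimal standard-library illustration, not a settled design. All function names (`drop_sparse_rows`, `knn_impute`, `log2_transform`, `quantile_normalise`) and default values are hypothetical. It assumes the data is a list of rows, that missing values are stored as `None`, that filtering is done per row, and that quantile normalisation is applied across columns.

```python
import math

def drop_sparse_rows(rows, max_missing_frac=0.5):
    """Keep rows whose fraction of missing values (None) is at most the threshold."""
    return [r for r in rows
            if sum(v is None for v in r) / len(r) <= max_missing_frac]

def knn_impute(rows, k=2):
    """Fill each None with the mean of that column over the k nearest rows.

    Distance is the root mean squared difference over columns observed in
    both rows. A value stays None if no usable neighbour exists.
    """
    def distance(a, b):
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        return math.sqrt(sum(shared) / len(shared)) if shared else math.inf

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate neighbours: other rows with column j observed
            candidates = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            neighbours = [v for d, v in candidates[:k] if d != math.inf]
            if neighbours:
                filled[i][j] = sum(neighbours) / len(neighbours)
    return filled

def log2_transform(rows, pseudocount=1.0):
    """Apply log2(value + pseudocount); missing values are passed through."""
    return [[None if v is None else math.log2(v + pseudocount) for v in r]
            for r in rows]

def quantile_normalise(rows):
    """Give every column the same distribution (assumes no missing values).

    Ties within a column are broken by row order in this simple version.
    """
    n_rows, n_cols = len(rows), len(rows[0])
    columns = [[r[j] for r in rows] for j in range(n_cols)]
    sorted_cols = [sorted(c) for c in columns]
    # Reference distribution: mean across columns at each rank
    reference = [sum(sc[i] for sc in sorted_cols) / n_cols
                 for i in range(n_rows)]
    result = [[0.0] * n_cols for _ in range(n_rows)]
    for j, col in enumerate(columns):
        order = sorted(range(n_rows), key=lambda i: col[i])
        for rank, i in enumerate(order):
            result[i][j] = reference[rank]
    return result

# Hypothetical toy data: 4 rows, 3 columns, None marks a missing value
data = [
    [1.0, 2.0, 3.0],
    [2.0, None, 4.0],
    [None, None, None],
    [3.0, 4.0, 5.0],
]
kept = drop_sparse_rows(data, max_missing_frac=0.5)
imputed = knn_impute(kept, k=2)
normalised = quantile_normalise(imputed)
```

The order used here (filter, then impute, then normalise) is one possible choice. Whether to log-transform before or after imputation is a design decision left open. For real datasets, an established implementation such as scikit-learn's `KNNImputer` is likely a better fit than this hand-written version.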