Pandas attributes and methods:
df[col].unique()-> return an array of the unique values in the series, in order of appearance
df[col].nunique()-> return the number of unique values in the series (NaN is excluded by default)
df.isnull().sum()-> return the number of null values in each column of the dataframe
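A minimal sketch of the three pandas calls above, using a small hypothetical DataFrame (the column names `make` and `price` and their values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing value per column
df = pd.DataFrame({
    "make": ["bmw", "audi", "bmw", None, "ford"],
    "price": [30000, 25000, np.nan, 18000, 15000],
})

unique_makes = df["make"].unique()     # distinct values in order of appearance (missing value included)
n_unique_makes = df["make"].nunique()  # count of distinct values, missing values excluded by default
null_counts = df.isnull().sum()        # Series: number of missing values per column

print(unique_makes)
print(n_unique_makes)
print(null_counts)
```

Note the difference between the first two calls: `unique()` keeps the missing value as one of the returned entries, while `nunique()` skips it unless `dropna=False` is passed.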
Matplotlib and seaborn methods:
%matplotlib inline-> ensure that plots are displayed inside the Jupyter notebook's cells
sns.histplot()-> show the histogram of a series
NumPy methods:
np.log1p()-> apply a log transformation to a variable after adding one to each input value, i.e. compute log(1 + x) element-wise; this keeps zero values finite (log1p(0) = 0).
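A short sketch of `np.log1p()` on a few example values. It also uses `np.expm1()`, which computes exp(x) - 1 and therefore undoes `log1p`; that inverse is not mentioned in these notes but is the standard way to map log-transformed values back.

```python
import numpy as np

x = np.array([0.0, 9.0, 99.0])
y = np.log1p(x)       # log(1 + x): ln(1) = 0, ln(10) ≈ 2.3026, ln(100) ≈ 4.6052
x_back = np.expm1(y)  # exp(y) - 1 recovers the original values

print(y)
print(np.allclose(x_back, x))
```

Plain `np.log(0)` would give `-inf`, which is why the "+1" matters for variables that can be zero.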
Long-tailed distributions can degrade the performance of many ML models, so the recommendation is to transform a long-tailed target variable so that its distribution is closer to normal whenever possible, for example with np.log1p().
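A minimal sketch of that recommendation, assuming a synthetic log-normal target (hypothetical data, not the project's dataset). It compares the skewness of the target before and after `np.log1p()`; a value near 0 indicates a roughly symmetric, bell-shaped distribution. A model would then be trained on `y_log`, and its predictions mapped back to the original scale with `np.expm1()`.

```python
import numpy as np
import pandas as pd

# Synthetic long-tailed target (hypothetical stand-in for something like prices)
rng = np.random.default_rng(0)
y = pd.Series(rng.lognormal(mean=10, sigma=1, size=1_000))

# Transformed target that a model would be trained on
y_log = np.log1p(y)

raw_skew = y.skew()      # strongly positive: long right tail
log_skew = y_log.skew()  # close to 0: roughly symmetric after the transform

print(f"skewness before: {raw_skew:.2f}, after log1p: {log_skew:.2f}")

# Predictions made on the log scale would be converted back like this
y_restored = np.expm1(y_log)
```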
The entire code of this project is available in this Jupyter notebook.
The notes are written by the community. If you see an error here, please create a PR with a fix.