This session covered obtaining the data and some initial data preparation steps.
Commands, functions, and methods:
- `!wget` - Linux shell command for downloading data
- `pd.read_csv()` - read CSV files
- `df.head()` - show the first rows of the dataframe (five by default)
- `df.head().T` - show the first rows of the dataframe, transposed
- `df.columns` - retrieve the column names of a dataframe
- `df.columns.str.lower()` - lowercase all letters in the column names of a dataframe
- `df.columns.str.replace(' ', '_')` - replace the space separator in the column names of a dataframe with an underscore
- `df.dtypes` - retrieve the data types of all columns
- `df.index` - retrieve the index (row labels) of a dataframe
- `pd.to_numeric()` - convert the values of a series to numerical values. The `errors='coerce'` argument replaces values that cannot be parsed with `NaN` instead of raising an error.
- `df.fillna()` - replace missing values (`NaN`) with a given value
- `(df.x == "yes").astype(int)` - convert the yes/no values of the series `x` to 1/0
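
The commands listed above can be combined into a short preparation pipeline. The following is a minimal sketch, not the notebook's actual code: the small in-memory CSV and its column names (`Customer ID`, `Total Charges`, `Churn`) are made up for illustration and stand in for the file that would normally be downloaded with `!wget`.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the downloaded CSV file.
# The blank " " in the second row mimics a missing numeric value.
raw_csv = io.StringIO(
    "Customer ID,Total Charges,Churn\n"
    "A1,29.85,no\n"
    "B2, ,yes\n"
    "C3,108.15,no\n"
)

df = pd.read_csv(raw_csv)

# Make column names uniform: lowercase, underscores instead of spaces.
df.columns = df.columns.str.lower().str.replace(' ', '_')

# Inspect the data: transposed head, column types, and the index.
print(df.head().T)
print(df.dtypes)
print(df.index)

# The blank value makes pandas read total_charges as text, so convert it.
# errors='coerce' turns unparseable values into NaN instead of raising.
df.total_charges = pd.to_numeric(df.total_charges, errors='coerce')

# Replace the resulting NaN with 0 (an assumption; another fill value
# may suit a different dataset better).
df.total_charges = df.total_charges.fillna(0)

# Encode the yes/no column as 1/0.
df.churn = (df.churn == 'yes').astype(int)
```

Filling with 0 is only one option. Whether a missing value should become 0, the column mean, or something else depends on what the column represents.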
The entire code of this project is available in this Jupyter notebook.
The notes are written by the community. If you see an error here, please create a PR with a fix.