The correlation coefficient measures the degree of linear dependency between two variables and ranges from -1 to 1. Its value is negative when one variable tends to decrease as the other increases, and positive when both tend to increase together. Depending on its magnitude, the dependency between the two variables can be low, medium, or strong. This makes it useful for measuring the importance of numerical variables.
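The notes do not spell out the formula. Assuming the usual Pearson correlation coefficient (the default used by pandas), `r` is the covariance of x and y divided by the product of their standard deviations. The sketch below computes it by hand; the helper name `pearson_r` is hypothetical, and the result is cross-checked against `numpy.corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center both variables on their means
    dx = x - x.mean()
    dy = y - y.mean()
    # Sum of co-deviations divided by the product of the deviation norms
    # (the 1/n factors of covariance and standard deviation cancel out)
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y))           # hand-rolled version
print(np.corrcoef(x, y)[0, 1])   # NumPy's version, should match
```

Both lines print the same value, about 0.775 (exactly the square root of 0.6 for this data).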
If `r` is the correlation coefficient, then the correlation between two variables is:
- LOW when `r` is in (-0.2, 0.2), i.e. |r| < 0.2
- MEDIUM when `r` is in (-0.5, -0.2] or [0.2, 0.5)
- STRONG when `r` is in [-1.0, -0.5] or [0.5, 1.0]
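Since only the magnitude of `r` decides the strength, the thresholds above reduce to comparing `|r|` against 0.2 and 0.5. A minimal sketch, using a hypothetical helper `correlation_strength`:

```python
def correlation_strength(r):
    """Map a correlation coefficient to the LOW / MEDIUM / STRONG labels."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must be in [-1, 1]")
    magnitude = abs(r)  # the sign gives the direction, not the strength
    if magnitude < 0.2:
        return "LOW"
    if magnitude < 0.5:
        return "MEDIUM"
    return "STRONG"

for r in [0.05, -0.3, 0.5, -0.9]:
    print(r, correlation_strength(r))
```

This prints LOW, MEDIUM, STRONG and STRONG. Boundary values follow the half-open intervals: exactly 0.2 counts as MEDIUM and exactly 0.5 as STRONG.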
Positive Correlation vs. Negative Correlation
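- When `r` is positive, an increase in x tends to come with an increase in y.
- When `r` is negative, an increase in x tends to come with a decrease in y.
- When `r` is 0, there is no linear relationship between x and y: a change in x is not linearly associated with a change in y (y may still depend on x in a non-linear way).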
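The three cases can be illustrated with synthetic data. In this sketch the data is made up (a linear trend plus Gaussian noise, and an independent series); it also shows that `r` close to 0 only rules out a *linear* relationship, since `y = x**2` clearly depends on x yet has `r` of essentially 0 when x is symmetric around 0.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
x = np.linspace(-3, 3, 1001)              # symmetric around 0
noise = rng.normal(scale=0.5, size=x.size)

y_pos = 2 * x + noise                     # y tends to rise with x
y_neg = -2 * x + noise                    # y tends to fall as x rises
y_none = rng.normal(size=x.size)          # generated independently of x
y_quad = x ** 2                           # depends on x, but not linearly

r_pos = np.corrcoef(x, y_pos)[0, 1]
r_neg = np.corrcoef(x, y_neg)[0, 1]
r_none = np.corrcoef(x, y_none)[0, 1]
r_quad = np.corrcoef(x, y_quad)[0, 1]

print(f"positive:    {r_pos:.3f}")
print(f"negative:    {r_neg:.3f}")
print(f"independent: {r_none:.3f}")
print(f"quadratic:   {r_quad:.3f}")
```

With this noise level, `r_pos` and `r_neg` land near +0.98 and -0.98, while the independent and quadratic cases stay close to 0.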
Functions and methods:
`df[x].corrwith(y)` returns the correlation between each column of `df[x]` and the series `y`, where `x` is a list of column names. This is a method of the pandas `DataFrame`.
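A small usage sketch of `corrwith`. The column names and values are hypothetical toy data (two numerical features and a binary target); `corrwith` uses Pearson correlation by default and returns a Series indexed by the feature names. Sorting by absolute value ranks the features by the strength of their linear relationship with the target, regardless of sign.

```python
import pandas as pd

# Hypothetical toy data: two numerical features and a binary target
df = pd.DataFrame({
    "tenure": [1, 5, 12, 24, 36, 48, 60, 72],
    "monthlycharges": [70, 85, 60, 90, 50, 95, 40, 100],
    "churn": [1, 1, 1, 0, 0, 0, 0, 0],
})

numerical = ["tenure", "monthlycharges"]

# Correlation of each numerical column with the target Series
correlations = df[numerical].corrwith(df["churn"])
print(correlations)

# Rank features by the magnitude of their correlation with the target
print(correlations.abs().sort_values(ascending=False))
```

Note that `df[x]` must be a `DataFrame` (a list of columns); for a single column, `Series.corr` gives the same number, e.g. `df["tenure"].corr(df["churn"])`.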
The entire code of this project is available in this Jupyter notebook.
The notes are written by the community. If you see an error here, please create a PR with a fix.