Commit 639aeff

committed
Clean up and comment out
1 parent cead59f commit 639aeff

4 files changed: +94 -3224 lines changed

content/titanic/CaseStudy_Titanic-Template.en.ipynb renamed to content/titanic/Case Study - Titanic.ipynb

Lines changed: 12 additions & 136 deletions
Original file line number | Diff line number | Diff line change
@@ -2052,145 +2052,21 @@
20522052
"cell_type": "markdown",
20532053
"metadata": {},
20542054
"source": [
2055-
"[Back to Top](#Table-of-Contents)\n",
2055+
"### Conclusion\n",
20562056
"\n",
2057-
"## Step 4: Modeling\n",
2057+
"In this case study, we explored the Titanic dataset following the steps of the data mining process:\n",
20582058
"\n",
2059-
"Now we have a relatively clean dataset(Except for the **Cabin** column which has many missing values). We can do a classification on Survived to predict whether a passenger could survive the disaster or a regression on Fare to predict ticket fare. This dataset is not a good dataset for regression. But since we don't talk about classification in this workshop we will construct a linear regression on Fare in this exercise."
2060-
]
2061-
},
2062-
{
2063-
"cell_type": "markdown",
2064-
"metadata": {},
2065-
"source": [
2066-
"##### Task16: Construct a regression on Fare\n",
2067-
"Construct regression model with statsmodels.\n",
2059+
"1. We started by understanding the business context and the objectives of the analysis.\n",
2060+
"2. We then explored and understood the data, identifying important features and their relationships.\n",
2061+
"3. Finally, we prepared the data by handling missing values and creating new features.\n",
20682062
"\n",
2069-
"Pick Pclass, Embarked, FamilySize as independent variables."
2070-
]
2071-
},
2072-
{
2073-
"cell_type": "code",
2074-
"execution_count": 25,
2075-
"metadata": {
2076-
"scrolled": false
2077-
},
2078-
"outputs": [
2079-
{
2080-
"data": {
2081-
"text/html": [
2082-
"<table class=\"simpletable\">\n",
2083-
"<caption>OLS Regression Results</caption>\n",
2084-
"<tr>\n",
2085-
" <th>Dep. Variable:</th> <td>Fare</td> <th> R-squared: </th> <td> 0.427</td> \n",
2086-
"</tr>\n",
2087-
"<tr>\n",
2088-
" <th>Model:</th> <td>OLS</td> <th> Adj. R-squared: </th> <td> 0.424</td> \n",
2089-
"</tr>\n",
2090-
"<tr>\n",
2091-
" <th>Method:</th> <td>Least Squares</td> <th> F-statistic: </th> <td> 131.9</td> \n",
2092-
"</tr>\n",
2093-
"<tr>\n",
2094-
" <th>Date:</th> <td>Wed, 24 Apr 2019</td> <th> Prob (F-statistic):</th> <td>1.92e-104</td>\n",
2095-
"</tr>\n",
2096-
"<tr>\n",
2097-
" <th>Time:</th> <td>12:07:17</td> <th> Log-Likelihood: </th> <td> -4495.8</td> \n",
2098-
"</tr>\n",
2099-
"<tr>\n",
2100-
" <th>No. Observations:</th> <td> 891</td> <th> AIC: </th> <td> 9004.</td> \n",
2101-
"</tr>\n",
2102-
"<tr>\n",
2103-
" <th>Df Residuals:</th> <td> 885</td> <th> BIC: </th> <td> 9032.</td> \n",
2104-
"</tr>\n",
2105-
"<tr>\n",
2106-
" <th>Df Model:</th> <td> 5</td> <th> </th> <td> </td> \n",
2107-
"</tr>\n",
2108-
"<tr>\n",
2109-
" <th>Covariance Type:</th> <td>nonrobust</td> <th> </th> <td> </td> \n",
2110-
"</tr>\n",
2111-
"</table>\n",
2112-
"<table class=\"simpletable\">\n",
2113-
"<tr>\n",
2114-
" <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[0.025</th> <th>0.975]</th> \n",
2115-
"</tr>\n",
2116-
"<tr>\n",
2117-
" <th>Intercept</th> <td> 79.2989</td> <td> 3.543</td> <td> 22.381</td> <td> 0.000</td> <td> 72.345</td> <td> 86.253</td>\n",
2118-
"</tr>\n",
2119-
"<tr>\n",
2120-
" <th>C(Pclass)[T.2]</th> <td> -59.0955</td> <td> 3.921</td> <td> -15.073</td> <td> 0.000</td> <td> -66.790</td> <td> -51.401</td>\n",
2121-
"</tr>\n",
2122-
"<tr>\n",
2123-
" <th>C(Pclass)[T.3]</th> <td> -68.8790</td> <td> 3.253</td> <td> -21.174</td> <td> 0.000</td> <td> -75.264</td> <td> -62.494</td>\n",
2124-
"</tr>\n",
2125-
"<tr>\n",
2126-
" <th>C(Embarked)[T.Q]</th> <td> -11.8147</td> <td> 5.446</td> <td> -2.169</td> <td> 0.030</td> <td> -22.504</td> <td> -1.126</td>\n",
2127-
"</tr>\n",
2128-
"<tr>\n",
2129-
" <th>C(Embarked)[T.S]</th> <td> -14.9202</td> <td> 3.414</td> <td> -4.371</td> <td> 0.000</td> <td> -21.620</td> <td> -8.220</td>\n",
2130-
"</tr>\n",
2131-
"<tr>\n",
2132-
" <th>FamilySize</th> <td> 7.8256</td> <td> 0.789</td> <td> 9.919</td> <td> 0.000</td> <td> 6.277</td> <td> 9.374</td>\n",
2133-
"</tr>\n",
2134-
"</table>\n",
2135-
"<table class=\"simpletable\">\n",
2136-
"<tr>\n",
2137-
" <th>Omnibus:</th> <td>1043.506</td> <th> Durbin-Watson: </th> <td> 2.040</td> \n",
2138-
"</tr>\n",
2139-
"<tr>\n",
2140-
" <th>Prob(Omnibus):</th> <td> 0.000</td> <th> Jarque-Bera (JB): </th> <td>118621.734</td>\n",
2141-
"</tr>\n",
2142-
"<tr>\n",
2143-
" <th>Skew:</th> <td> 5.718</td> <th> Prob(JB): </th> <td> 0.00</td> \n",
2144-
"</tr>\n",
2145-
"<tr>\n",
2146-
" <th>Kurtosis:</th> <td>58.357</td> <th> Cond. No. </th> <td> 13.4</td> \n",
2147-
"</tr>\n",
2148-
"</table><br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified."
2149-
],
2150-
"text/plain": [
2151-
"<class 'statsmodels.iolib.summary.Summary'>\n",
2152-
"\"\"\"\n",
2153-
" OLS Regression Results \n",
2154-
"==============================================================================\n",
2155-
"Dep. Variable: Fare R-squared: 0.427\n",
2156-
"Model: OLS Adj. R-squared: 0.424\n",
2157-
"Method: Least Squares F-statistic: 131.9\n",
2158-
"Date: Wed, 24 Apr 2019 Prob (F-statistic): 1.92e-104\n",
2159-
"Time: 12:07:17 Log-Likelihood: -4495.8\n",
2160-
"No. Observations: 891 AIC: 9004.\n",
2161-
"Df Residuals: 885 BIC: 9032.\n",
2162-
"Df Model: 5 \n",
2163-
"Covariance Type: nonrobust \n",
2164-
"====================================================================================\n",
2165-
" coef std err t P>|t| [0.025 0.975]\n",
2166-
"------------------------------------------------------------------------------------\n",
2167-
"Intercept 79.2989 3.543 22.381 0.000 72.345 86.253\n",
2168-
"C(Pclass)[T.2] -59.0955 3.921 -15.073 0.000 -66.790 -51.401\n",
2169-
"C(Pclass)[T.3] -68.8790 3.253 -21.174 0.000 -75.264 -62.494\n",
2170-
"C(Embarked)[T.Q] -11.8147 5.446 -2.169 0.030 -22.504 -1.126\n",
2171-
"C(Embarked)[T.S] -14.9202 3.414 -4.371 0.000 -21.620 -8.220\n",
2172-
"FamilySize 7.8256 0.789 9.919 0.000 6.277 9.374\n",
2173-
"==============================================================================\n",
2174-
"Omnibus: 1043.506 Durbin-Watson: 2.040\n",
2175-
"Prob(Omnibus): 0.000 Jarque-Bera (JB): 118621.734\n",
2176-
"Skew: 5.718 Prob(JB): 0.00\n",
2177-
"Kurtosis: 58.357 Cond. No. 13.4\n",
2178-
"==============================================================================\n",
2179-
"\n",
2180-
"Warnings:\n",
2181-
"[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n",
2182-
"\"\"\""
2183-
]
2184-
},
2185-
"execution_count": 25,
2186-
"metadata": {},
2187-
"output_type": "execute_result"
2188-
}
2189-
],
2190-
"source": [
2191-
"# import statsmodels.formula.api as smf\n",
2192-
"# result = smf.ols(\"Fare ~ C(Pclass) + C(Embarked) + FamilySize\", data=df_titanic).fit()\n",
2193-
"# result.summary()"
2063+
"This analysis allowed us to draw several interesting conclusions about the factors that influenced survival and ticket prices on the Titanic. However, it's important to note that this is just a beginning. For a more in-depth analysis, we could consider:\n",
2064+
"\n",
2065+
"- Using classification techniques to predict survival.\n",
2066+
"- Exploring other features or combinations of features.\n",
2067+
"- Using more advanced modeling techniques.\n",
2068+
"\n",
2069+
"This case study illustrates how data analysis can help us understand historical events and draw lessons that could be applicable in other contexts."
21942070
]
21952071
}
21962072
],
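
Note on the removed Task16 cell: the deleted code fit an ordinary least squares model of Fare on Pclass, Embarked, and FamilySize through the statsmodels formula API. The sketch below reuses the same formula string on a small made-up DataFrame, so it runs without the notebook's `df_titanic`. The `toy` frame and its coefficients are assumptions chosen for illustration, not values from the Titanic data. Because the toy fares are built exactly from the model terms, the fit should recover the chosen coefficients up to floating-point error.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for df_titanic: 10 made-up passengers.
toy = pd.DataFrame({
    "Pclass": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "Embarked": ["C", "S", "Q", "C", "S", "Q", "C", "S", "Q", "S"],
    "FamilySize": [1, 2, 3, 2, 1, 4, 3, 5, 1, 2],
})

# Build Fare exactly from the model terms (illustrative coefficients only).
toy["Fare"] = (
    80.0
    - 60.0 * (toy["Pclass"] == 2)
    - 70.0 * (toy["Pclass"] == 3)
    - 10.0 * (toy["Embarked"] == "Q")
    - 15.0 * (toy["Embarked"] == "S")
    + 8.0 * toy["FamilySize"]
)

# Same formula as the removed cell: C(...) treats a column as categorical and
# dummy-codes it against a reference level (Pclass 1, Embarked "C").
result = smf.ols("Fare ~ C(Pclass) + C(Embarked) + FamilySize", data=toy).fit()

# One coefficient per non-reference level, plus Intercept and FamilySize.
print(result.params.round(3))
```

Each `C(Pclass)[T.k]` coefficient is the fare difference relative to first class with the other terms held fixed, which is how the rows in the removed OLS output table read as well.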
