2052 | 2052 | "cell_type": "markdown", |
2053 | 2053 | "metadata": {}, |
2054 | 2054 | "source": [ |
2055 | | - "[Back to Top](#Table-of-Contents)\n", |
| 2055 | + "## Conclusion\n", |
2056 | 2056 | "\n", |
2057 | | - "## Step 4: Modeling\n", |
| 2057 | + "In this case study, we explored the Titanic dataset following the steps of the data mining process:\n", |
2058 | 2058 | "\n", |
2059 | | - "Now we have a relatively clean dataset(Except for the **Cabin** column which has many missing values). We can do a classification on Survived to predict whether a passenger could survive the disaster or a regression on Fare to predict ticket fare. This dataset is not a good dataset for regression. But since we don't talk about classification in this workshop we will construct a linear regression on Fare in this exercise." |
2060 | | - ] |
2061 | | - }, |
2062 | | - { |
2063 | | - "cell_type": "markdown", |
2064 | | - "metadata": {}, |
2065 | | - "source": [ |
2066 | | - "##### Task16: Construct a regression on Fare\n", |
2067 | | - "Construct regression model with statsmodels.\n", |
| 2059 | + "1. We started by understanding the business context and the objectives of the analysis.\n", |
| 2060 | + "2. We then explored and understood the data, identifying important features and their relationships.\n", |
| 2061 | + "3. Finally, we prepared the data by handling missing values and creating new features.\n", |
2068 | 2062 | "\n", |
2069 | | - "Pick Pclass, Embarked, FamilySize as independent variables." |
2070 | | - ] |
2071 | | - }, |
2072 | | - { |
2073 | | - "cell_type": "code", |
2074 | | - "execution_count": 25, |
2075 | | - "metadata": { |
2076 | | - "scrolled": false |
2077 | | - }, |
2078 | | - "outputs": [ |
2079 | | - { |
2080 | | - "data": { |
2081 | | - "text/html": [ |
2082 | | - "<table class=\"simpletable\">\n", |
2083 | | - "<caption>OLS Regression Results</caption>\n", |
2084 | | - "<tr>\n", |
2085 | | - " <th>Dep. Variable:</th> <td>Fare</td> <th> R-squared: </th> <td> 0.427</td> \n", |
2086 | | - "</tr>\n", |
2087 | | - "<tr>\n", |
2088 | | - " <th>Model:</th> <td>OLS</td> <th> Adj. R-squared: </th> <td> 0.424</td> \n", |
2089 | | - "</tr>\n", |
2090 | | - "<tr>\n", |
2091 | | - " <th>Method:</th> <td>Least Squares</td> <th> F-statistic: </th> <td> 131.9</td> \n", |
2092 | | - "</tr>\n", |
2093 | | - "<tr>\n", |
2094 | | - " <th>Date:</th> <td>Wed, 24 Apr 2019</td> <th> Prob (F-statistic):</th> <td>1.92e-104</td>\n", |
2095 | | - "</tr>\n", |
2096 | | - "<tr>\n", |
2097 | | - " <th>Time:</th> <td>12:07:17</td> <th> Log-Likelihood: </th> <td> -4495.8</td> \n", |
2098 | | - "</tr>\n", |
2099 | | - "<tr>\n", |
2100 | | - " <th>No. Observations:</th> <td> 891</td> <th> AIC: </th> <td> 9004.</td> \n", |
2101 | | - "</tr>\n", |
2102 | | - "<tr>\n", |
2103 | | - " <th>Df Residuals:</th> <td> 885</td> <th> BIC: </th> <td> 9032.</td> \n", |
2104 | | - "</tr>\n", |
2105 | | - "<tr>\n", |
2106 | | - " <th>Df Model:</th> <td> 5</td> <th> </th> <td> </td> \n", |
2107 | | - "</tr>\n", |
2108 | | - "<tr>\n", |
2109 | | - " <th>Covariance Type:</th> <td>nonrobust</td> <th> </th> <td> </td> \n", |
2110 | | - "</tr>\n", |
2111 | | - "</table>\n", |
2112 | | - "<table class=\"simpletable\">\n", |
2113 | | - "<tr>\n", |
2114 | | - " <td></td> <th>coef</th> <th>std err</th> <th>t</th> <th>P>|t|</th> <th>[0.025</th> <th>0.975]</th> \n", |
2115 | | - "</tr>\n", |
2116 | | - "<tr>\n", |
2117 | | - " <th>Intercept</th> <td> 79.2989</td> <td> 3.543</td> <td> 22.381</td> <td> 0.000</td> <td> 72.345</td> <td> 86.253</td>\n", |
2118 | | - "</tr>\n", |
2119 | | - "<tr>\n", |
2120 | | - " <th>C(Pclass)[T.2]</th> <td> -59.0955</td> <td> 3.921</td> <td> -15.073</td> <td> 0.000</td> <td> -66.790</td> <td> -51.401</td>\n", |
2121 | | - "</tr>\n", |
2122 | | - "<tr>\n", |
2123 | | - " <th>C(Pclass)[T.3]</th> <td> -68.8790</td> <td> 3.253</td> <td> -21.174</td> <td> 0.000</td> <td> -75.264</td> <td> -62.494</td>\n", |
2124 | | - "</tr>\n", |
2125 | | - "<tr>\n", |
2126 | | - " <th>C(Embarked)[T.Q]</th> <td> -11.8147</td> <td> 5.446</td> <td> -2.169</td> <td> 0.030</td> <td> -22.504</td> <td> -1.126</td>\n", |
2127 | | - "</tr>\n", |
2128 | | - "<tr>\n", |
2129 | | - " <th>C(Embarked)[T.S]</th> <td> -14.9202</td> <td> 3.414</td> <td> -4.371</td> <td> 0.000</td> <td> -21.620</td> <td> -8.220</td>\n", |
2130 | | - "</tr>\n", |
2131 | | - "<tr>\n", |
2132 | | - " <th>FamilySize</th> <td> 7.8256</td> <td> 0.789</td> <td> 9.919</td> <td> 0.000</td> <td> 6.277</td> <td> 9.374</td>\n", |
2133 | | - "</tr>\n", |
2134 | | - "</table>\n", |
2135 | | - "<table class=\"simpletable\">\n", |
2136 | | - "<tr>\n", |
2137 | | - " <th>Omnibus:</th> <td>1043.506</td> <th> Durbin-Watson: </th> <td> 2.040</td> \n", |
2138 | | - "</tr>\n", |
2139 | | - "<tr>\n", |
2140 | | - " <th>Prob(Omnibus):</th> <td> 0.000</td> <th> Jarque-Bera (JB): </th> <td>118621.734</td>\n", |
2141 | | - "</tr>\n", |
2142 | | - "<tr>\n", |
2143 | | - " <th>Skew:</th> <td> 5.718</td> <th> Prob(JB): </th> <td> 0.00</td> \n", |
2144 | | - "</tr>\n", |
2145 | | - "<tr>\n", |
2146 | | - " <th>Kurtosis:</th> <td>58.357</td> <th> Cond. No. </th> <td> 13.4</td> \n", |
2147 | | - "</tr>\n", |
2148 | | - "</table><br/><br/>Warnings:<br/>[1] Standard Errors assume that the covariance matrix of the errors is correctly specified." |
2149 | | - ], |
2150 | | - "text/plain": [ |
2151 | | - "<class 'statsmodels.iolib.summary.Summary'>\n", |
2152 | | - "\"\"\"\n", |
2153 | | - " OLS Regression Results \n", |
2154 | | - "==============================================================================\n", |
2155 | | - "Dep. Variable: Fare R-squared: 0.427\n", |
2156 | | - "Model: OLS Adj. R-squared: 0.424\n", |
2157 | | - "Method: Least Squares F-statistic: 131.9\n", |
2158 | | - "Date: Wed, 24 Apr 2019 Prob (F-statistic): 1.92e-104\n", |
2159 | | - "Time: 12:07:17 Log-Likelihood: -4495.8\n", |
2160 | | - "No. Observations: 891 AIC: 9004.\n", |
2161 | | - "Df Residuals: 885 BIC: 9032.\n", |
2162 | | - "Df Model: 5 \n", |
2163 | | - "Covariance Type: nonrobust \n", |
2164 | | - "====================================================================================\n", |
2165 | | - " coef std err t P>|t| [0.025 0.975]\n", |
2166 | | - "------------------------------------------------------------------------------------\n", |
2167 | | - "Intercept 79.2989 3.543 22.381 0.000 72.345 86.253\n", |
2168 | | - "C(Pclass)[T.2] -59.0955 3.921 -15.073 0.000 -66.790 -51.401\n", |
2169 | | - "C(Pclass)[T.3] -68.8790 3.253 -21.174 0.000 -75.264 -62.494\n", |
2170 | | - "C(Embarked)[T.Q] -11.8147 5.446 -2.169 0.030 -22.504 -1.126\n", |
2171 | | - "C(Embarked)[T.S] -14.9202 3.414 -4.371 0.000 -21.620 -8.220\n", |
2172 | | - "FamilySize 7.8256 0.789 9.919 0.000 6.277 9.374\n", |
2173 | | - "==============================================================================\n", |
2174 | | - "Omnibus: 1043.506 Durbin-Watson: 2.040\n", |
2175 | | - "Prob(Omnibus): 0.000 Jarque-Bera (JB): 118621.734\n", |
2176 | | - "Skew: 5.718 Prob(JB): 0.00\n", |
2177 | | - "Kurtosis: 58.357 Cond. No. 13.4\n", |
2178 | | - "==============================================================================\n", |
2179 | | - "\n", |
2180 | | - "Warnings:\n", |
2181 | | - "[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.\n", |
2182 | | - "\"\"\"" |
2183 | | - ] |
2184 | | - }, |
2185 | | - "execution_count": 25, |
2186 | | - "metadata": {}, |
2187 | | - "output_type": "execute_result" |
2188 | | - } |
2189 | | - ], |
2190 | | - "source": [ |
2191 | | - "# import statsmodels.formula.api as smf\n", |
2192 | | - "# result = smf.ols(\"Fare ~ C(Pclass) + C(Embarked) + FamilySize\", data=df_titanic).fit()\n", |
2193 | | - "# result.summary()" |
| 2063 | + "This analysis allowed us to draw several interesting conclusions about the factors that influenced survival and ticket prices on the Titanic. However, it's important to note that this is just the beginning. For a more in-depth analysis, we could consider:\n", |
| 2064 | + "\n", |
| 2065 | + "- Using classification techniques to predict survival.\n", |
| 2066 | + "- Exploring other features or combinations of features.\n", |
| 2067 | + "- Using more advanced modeling techniques.\n", |
| 2068 | + "\n", |
| 2069 | + "This case study illustrates how data analysis can help us understand historical events and draw lessons that could be applicable in other contexts." |
2194 | 2070 | ] |
2195 | 2071 | } |
2196 | 2072 | ], |
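For reference, the modeling cell removed by this commit fit an ordinary least squares regression via `smf.ols("Fare ~ C(Pclass) + C(Embarked) + FamilySize", data=df_titanic)`. The same fit can be sketched without statsmodels using NumPy's least-squares solver; the data below is a small synthetic stand-in for the Titanic columns, for illustration only:

```python
import numpy as np

# Synthetic stand-ins for the columns used in the removed cell
# (Fare ~ C(Pclass) + C(Embarked) + FamilySize); illustration only.
fare     = np.array([71.3, 8.05, 26.0, 53.1, 13.0, 7.9])
pclass   = np.array([1, 3, 2, 1, 2, 3])
embarked = np.array(["C", "S", "S", "S", "Q", "Q"])
family   = np.array([2.0, 1.0, 1.0, 2.0, 1.0, 1.0])

# Dummy-code the categoricals the way the formula's C() terms do,
# dropping the first level of each as the reference category.
X = np.column_stack([
    np.ones_like(fare),               # Intercept
    (pclass == 2).astype(float),      # C(Pclass)[T.2]
    (pclass == 3).astype(float),      # C(Pclass)[T.3]
    (embarked == "Q").astype(float),  # C(Embarked)[T.Q]
    (embarked == "S").astype(float),  # C(Embarked)[T.S]
    family,                           # FamilySize
])

# Ordinary least squares: minimize ||X @ beta - fare||^2.
beta, *_ = np.linalg.lstsq(X, fare, rcond=None)
print(beta.shape)  # (6,) — one coefficient per design-matrix column
```

This mirrors only the coefficient estimates; the full statsmodels `result.summary()` table shown in the removed output (R-squared, t-statistics, confidence intervals) comes from additional inference machinery that `lstsq` does not provide.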