Drew's suggestions

j-m-dean · j-m-dean · commit ce49ed669d66 · 2020-05-04T18:50:42.000+01:00
diff --git a/content/basics/numpy_arrays.ipynb b/content/basics/numpy_arrays.ipynb
@@ -22,16 +22,16 @@
     "\n",
     "| Experiment       | Melting Point ($^{\\circ}$C)  |\n",
     "| ------------- |:-------------:|\n",
-    "| 1      | 99.5 $\\pm$ 0.1|\n",
+    "| 1      | 98.5 $\\pm$ 0.1|\n",
     "| 2      |   99.9  $\\pm$ 0.1  |\n",
-    "| 3 |   99.6  $\\pm$ 0.1  |\n",
+    "| 3 |   100.6  $\\pm$ 0.1  |\n",
     "| 4 |   99.3  $\\pm$ 0.1  |\n",
-    "| 5 |    99.7 $\\pm$ 0.1  |\n",
+    "| 5 |    100.7 $\\pm$ 0.1  |\n",
     "| 6 |   99.4   $\\pm$ 0.1 |\n",
-    "| 7 |    99.4  $\\pm$ 0.1 |\n",
+    "| 7 |    98.4  $\\pm$ 0.1 |\n",
     "| 8 |    99.5  $\\pm$ 0.1 |\n",
     "| 9 |    99.3  $\\pm$ 0.1 |\n",
-    "| 10 |   99.7  $\\pm$ 0.1  |\n"
+    "| 10 |   100.7  $\\pm$ 0.1  |\n"
    ]
   },
   {
@@ -41,13 +41,20 @@
     "### Importing NumPy & creating an array"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "We want to store this information in a NumPy array so that we can calculate certain properties. Before we can use a NumPy array, we must import the NumPy module as follows:"
+   ]
+  },
   {
    "cell_type": "code",
    "execution_count": 1,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.266233Z",
-     "start_time": "2020-05-03T17:23:26.052585Z"
+     "end_time": "2020-05-04T17:48:21.986993Z",
+     "start_time": "2020-05-04T17:48:21.752459Z"
     }
    },
    "outputs": [],
@@ -67,13 +74,13 @@
    "execution_count": 2,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.271913Z",
-     "start_time": "2020-05-03T17:23:26.268125Z"
+     "end_time": "2020-05-04T17:48:21.995012Z",
+     "start_time": "2020-05-04T17:48:21.989331Z"
     }
    },
    "outputs": [],
    "source": [
-    "melting_point_data = numpy.array([99.5, 99.9, 99.6, 99.3, 99.7, 99.4, 99.4, 99.5, 99.3, 99.7])"
+    "melting_point_data = numpy.array([98.5, 99.9, 100.6, 99.3, 100.7, 99.4, 98.4, 99.5, 99.3, 100.7])"
    ]
   },
   {
@@ -90,8 +97,8 @@
    "execution_count": 3,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.279590Z",
-     "start_time": "2020-05-03T17:23:26.274561Z"
+     "end_time": "2020-05-04T17:48:22.008428Z",
+     "start_time": "2020-05-04T17:48:21.999481Z"
     }
    },
    "outputs": [
@@ -119,16 +126,16 @@
    "execution_count": 4,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.291698Z",
-     "start_time": "2020-05-03T17:23:26.283151Z"
+     "end_time": "2020-05-04T17:48:22.024986Z",
+     "start_time": "2020-05-04T17:48:22.014280Z"
     }
    },
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "[99.4 99.4 99.5 99.3 99.7]\n"
+      "[ 99.4  98.4  99.5  99.3 100.7]\n"
      ]
     }
    ],
@@ -150,16 +157,16 @@
    "execution_count": 5,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.303771Z",
-     "start_time": "2020-05-03T17:23:26.295793Z"
+     "end_time": "2020-05-04T17:48:22.036661Z",
+     "start_time": "2020-05-04T17:48:22.029171Z"
     }
    },
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "[100.  100.4 100.1  99.8 100.2  99.9  99.9 100.   99.8 100.2]\n"
+      "[ 99.  100.4 101.1  99.8 101.2  99.9  98.9 100.   99.8 101.2]\n"
      ]
     }
    ],
@@ -181,8 +188,8 @@
    "execution_count": 6,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.313972Z",
-     "start_time": "2020-05-03T17:23:26.308114Z"
+     "end_time": "2020-05-04T17:48:22.047129Z",
+     "start_time": "2020-05-04T17:48:22.039321Z"
     }
    },
    "outputs": [
@@ -191,7 +198,7 @@
      "output_type": "stream",
      "text": [
       "Subtracting 0.5 from the original melting point data results in:\n",
-      "[99.  99.4 99.1 98.8 99.2 98.9 98.9 99.  98.8 99.2]\n"
+      "[ 98.   99.4 100.1  98.8 100.2  98.9  97.9  99.   98.8 100.2]\n"
      ]
     }
    ],
@@ -207,8 +214,8 @@
    "execution_count": 7,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.324863Z",
-     "start_time": "2020-05-03T17:23:26.316791Z"
+     "end_time": "2020-05-04T17:48:22.058298Z",
+     "start_time": "2020-05-04T17:48:22.050874Z"
     }
    },
    "outputs": [
@@ -217,7 +224,7 @@
      "output_type": "stream",
      "text": [
       "Multiplying the original melting point data by two results in:\n",
-      "[199.  199.8 199.2 198.6 199.4 198.8 198.8 199.  198.6 199.4]\n"
+      "[197.  199.8 201.2 198.6 201.4 198.8 196.8 199.  198.6 201.4]\n"
      ]
     }
    ],
@@ -233,8 +240,8 @@
    "execution_count": 8,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.336392Z",
-     "start_time": "2020-05-03T17:23:26.329536Z"
+     "end_time": "2020-05-04T17:48:22.075289Z",
+     "start_time": "2020-05-04T17:48:22.066940Z"
     }
    },
    "outputs": [
@@ -243,7 +250,7 @@
      "output_type": "stream",
      "text": [
       "Dividing the original melting point data by 2 results in:\n",
-      "[49.75 49.95 49.8  49.65 49.85 49.7  49.7  49.75 49.65 49.85]\n"
+      "[49.25 49.95 50.3  49.65 50.35 49.7  49.2  49.75 49.65 50.35]\n"
      ]
     }
    ],
@@ -254,6 +261,30 @@
     "print(division_example)"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Speed up and datatypes\n",
+    "\n",
+    "Why are NumPy's broadcasting capabilities preferable to using lists? **NumPy arrays are significantly faster**. For example, if we take an array and a list of 100 values each and apply each of the four simple operations above to every element, Python takes the following times to complete the calculations.\n",
+    "\n",
+    "| Operation    | Array  | List  | Speed up factor  |\n",
+    "| ------------- |:-------------:|:-------:|:------:|\n",
+    "| Addition      |  9.92E-7 | 6.66E-6  | 6.7  |\n",
+    "| Subtraction      |  1.04E-6 | 9.73E-6  | 9.4   |\n",
+    "| Multiplication      | 1.03E-6  | 8.66E-6  |  8.4 |\n",
+    "| Division   | 1.11E-6  | 9.11E-6  | 8.2  |\n",
+    "\n",
+    "NumPy arrays provide a significant speed up, and this is for a single operation. Compounded throughout a program, this could be the difference between the code being feasible to run or not.\n",
+    "\n",
+    "This speed up occurs because an array is a simpler, less malleable collection than a list. As you have previously seen, a list can contain any type of data. Arrays are designed to hold only one type of data. For maximum efficiency, every value in the array should have the same data type, e.g. all values should be floats.\n",
+    "\n",
+    "\n",
+    "> #### A note on NumPy datatypes\n",
+    ">  It is possible to have different types of data in a NumPy array. **This is not recommended**. Using different datatypes in a NumPy array will, at best, significantly reduce the efficiency of your code or, at worst, may stop your code working completely."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -268,17 +299,17 @@
    "execution_count": 9,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.351301Z",
-     "start_time": "2020-05-03T17:23:26.343358Z"
+     "end_time": "2020-05-04T17:48:22.091657Z",
+     "start_time": "2020-05-04T17:48:22.079804Z"
     }
    },
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "The mean of our data is  100.03 degrees Celsius.\n",
-      "The median of our data is  100.0 degrees Celsius.\n"
+      "The mean of our data is  100.13 degrees Celsius.\n",
+      "The median of our data is  99.95 degrees Celsius.\n"
      ]
     }
    ],
@@ -294,7 +325,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "It can be seen the mean and median melting points determined by our ten experiments are 100.03 $^{o}$C and 100.0 $^{o}$C respectively. To obtain the mean and median values we had to utilise the mean and median functions inside the numpy package. This was indicated by numpy.mean and numpy.median . The word before the dot is the package we wish to use, and the word after the dot is the function we wish to use.  \n",
+    "It can be seen that the mean and median melting points determined by our ten experiments are 100.13 $^{\\circ}$C and 99.95 $^{\\circ}$C respectively. To obtain the mean and median values we had to utilise the mean and median functions inside the numpy package. This was indicated by numpy.mean and numpy.median. The word before the dot is the package we wish to use, and the word after the dot is the function we wish to use.  \n",
     "\n",
     "NumPy is a very common package in python. It can become laborious writing numpy.function whenever we wish to utilise one of its functions. Programmers often look to be as efficient as possible with their time and use the following to reduce what they have to type"
    ]
@@ -304,17 +335,17 @@
    "execution_count": 10,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.363042Z",
-     "start_time": "2020-05-03T17:23:26.354305Z"
+     "end_time": "2020-05-04T17:48:22.106626Z",
+     "start_time": "2020-05-04T17:48:22.094406Z"
     }
    },
    "outputs": [
     {
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "The mean of our data is  100.03 degrees Celsius.\n",
-      "The median of our data is  100.0 degrees Celsius.\n"
+      "The mean of our data is  100.13 degrees Celsius.\n",
+      "The median of our data is  99.95 degrees Celsius.\n"
      ]
     }
    ],
@@ -355,8 +386,8 @@
    "execution_count": 11,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.372703Z",
-     "start_time": "2020-05-03T17:23:26.366079Z"
+     "end_time": "2020-05-04T17:48:22.116600Z",
+     "start_time": "2020-05-04T17:48:22.109204Z"
     }
    },
    "outputs": [],
@@ -383,8 +414,8 @@
    "execution_count": 12,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.394695Z",
-     "start_time": "2020-05-03T17:23:26.376334Z"
+     "end_time": "2020-05-04T17:48:22.134729Z",
+     "start_time": "2020-05-04T17:48:22.119998Z"
     }
    },
    "outputs": [
@@ -417,8 +448,8 @@
    "execution_count": 13,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.404011Z",
-     "start_time": "2020-05-03T17:23:26.397718Z"
+     "end_time": "2020-05-04T17:48:22.144742Z",
+     "start_time": "2020-05-04T17:48:22.137763Z"
     }
    },
    "outputs": [
@@ -448,11 +479,11 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 16,
+   "execution_count": 14,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:29:16.555475Z",
-     "start_time": "2020-05-03T17:29:16.549626Z"
+     "end_time": "2020-05-04T17:48:22.154779Z",
+     "start_time": "2020-05-04T17:48:22.147119Z"
     }
    },
    "outputs": [
@@ -462,7 +493,7 @@
        "array([ 0,  1,  4, -5,  3,  0])"
       ]
      },
-     "execution_count": 16,
+     "execution_count": 14,
      "metadata": {},
      "output_type": "execute_result"
     }
@@ -492,7 +523,7 @@
     "| 4 |   78.6  $\\pm$ 0.1  |\n",
     "| 5 |    78.4 $\\pm$ 0.1  |\n",
     "| 6 |   77.9   $\\pm$ 0.1 |\n",
-    "| 7 |    99.4  $\\pm$ 0.1 |\n",
+    "| 7 |    79.4  $\\pm$ 0.1 |\n",
     "| 8 |    78.0  $\\pm$ 0.1 |\n",
     "| 9 |    78.7  $\\pm$ 0.1 |\n",
     "| 10 |   78.4  $\\pm$ 0.1  |\n",
@@ -511,8 +542,8 @@
    "execution_count": 15,
    "metadata": {
     "ExecuteTime": {
-     "end_time": "2020-05-03T17:23:26.421014Z",
-     "start_time": "2020-05-03T17:23:26.417948Z"
+     "end_time": "2020-05-04T17:48:22.165877Z",
+     "start_time": "2020-05-04T17:48:22.158093Z"
     }
    },
    "outputs": [],