diff --git a/.DS_Store b/.DS_Store
new file mode 100644
index 0000000..5008ddf
Binary files /dev/null and b/.DS_Store differ
diff --git a/00-pil.ipynb b/00-pil.ipynb
deleted file mode 100644
index 656e91d..0000000
--- a/00-pil.ipynb
+++ /dev/null
@@ -1,622 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "view-in-github"
- },
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "copyright"
- },
- "source": [
- "#### Copyright 2020 Google LLC."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "24p97VuTvYVT"
- },
- "outputs": [],
- "source": [
- "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "# you may not use this file except in compliance with the License.\n",
- "# You may obtain a copy of the License at\n",
- "#\n",
- "# https://www.apache.org/licenses/LICENSE-2.0\n",
- "#\n",
- "# Unless required by applicable law or agreed to in writing, software\n",
- "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "# See the License for the specific language governing permissions and\n",
- "# limitations under the License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "CVmV0M74xwm7"
- },
- "source": [
- "# Manipulating an Image in Python"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "XuFjSsW53I9_"
- },
- "source": [
- "So far in this course, the data that we have encountered has been in some text format, such as comma separated values of strings and numbers. Other data has been directly loaded from scikit-learn as a `Bunch` of NumPy arrays, also containing strings and numbers.\n",
- "\n",
- "Data scientists sometimes find themselves working with collections of images, which are represented in a much more compact binary format. One of the most common examples of working with images is image classification, e.g., reverse image search.\n",
- "\n",
- "These images are often contained in a zip file, but they can also be in a directory on your computer or even on the internet. Once you have the images, you'll typically need to perform some type of preprocessing on them before you can do any sort of modeling.\n",
- "\n",
- "Most models expect a specific size of image, so you'll need to resize the images you feed your model if they differ from what is expected. Resizing might include cropping, stretching, padding, and scaling an image. Resizing to a smaller size also helps speed up your model by reducing the size of the input data.\n",
- "\n",
- "Images can also be encoded in many different ways. Some are grayscale; others are color. Color images might be encoded red-green-blue (RGB), blue-green-red (BGR), rgb-alpha, bgr-alpha, hue-saturation-lightness (HSL), hue-saturation-value (HSV), or some other encoding scheme. You will need to make sure your input images' encoding for all of your training data is the same.\n",
- "\n",
- "It is also common to normalize or standardize your images, which are just two different ways of reducing a wide range of pixel values (typically `0 `to `255` inclusive) into a tighter range.\n",
- "\n",
- "This might all sound like a lot of work, and it is. Fortunately, though, you don't have to worry too much about the details. There are numerous Python toolkits for manipulating images. In this unit, we will use the [Image](https://pillow.readthedocs.io/en/stable/reference/Image.html) and [ImageOps](https://pillow.readthedocs.io/en/stable/reference/ImageOps.html) modules from the [PIL (now called Pillow)](https://python-pillow.org/) library."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "jk3COdKGIB7l"
- },
- "source": [
- "## Get Image\n",
- "\n",
- "The image that we will work with comes from [Pixabay](https://pixabay.com/photos/running-shoe-shoe-brooks-371624/). On the image page, you'll see the option to download it. Choose the 1920x1280 version of the image.\n",
- "\n",
- "After you have download the image to your computer, upload it into this Colab by running the code block below, clicking \"Choose Files\" in the form that appears, selecting the image that was just downloaded from the dialog box, and then pressing \"Open\". You should see messages about the file being uploaded and then eventually you'll see a notification that the file upload is complete."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "euRZuE9MLHd0"
- },
- "outputs": [],
- "source": [
- "from google.colab import files\n",
- "\n",
- "uploaded = files.upload()\n",
- "\n",
- "for fn in uploaded.keys():\n",
- " print('User uploaded file \"{name}\" with length {length} bytes'.format(\n",
- " name=fn, length=len(uploaded[fn])))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "LboFaJ47r6x-"
- },
- "source": [
- "We can now take a look at our image to see if we uploaded it properly. To do this we will use [Matplotlib](https://matplotlib.org/) to display the image. But first we must load the image from the virtual machine hosting this Colab. Right now that image is stored on the virtual machine's hard drive.\n",
- "\n",
- "We'll use Pillow's `Image` module to open the file.\n",
- "\n",
- "Notice we use a [context block](https://docs.python.org/2.5/whatsnew/pep-343.html) to automatically close the image we opened in order to free up resources. We could also have just explicitly called close after we were done with the image."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "xhd_d5vzrc_h"
- },
- "outputs": [],
- "source": [
- "from PIL import Image\n",
- "import matplotlib.pyplot as plt\n",
- "\n",
- "image_file = \"running-shoe-371624_1920.jpg\"\n",
- "\n",
- "with Image.open(image_file) as sneaker:\n",
- " plt.imshow(sneaker)\n",
- " plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "-Goq0Frawyuk"
- },
- "source": [
- "## Reshaping\n",
- "\n",
- "The image we currently have is wider than it is tall (landscape). It could have just as easily been taller than it is wide (portrait). It could have even been a square.\n",
- "\n",
- "Does the model care? In some ways it does, and in others it doesn't. The model needs consistent inputs. These could be of any shape and size, but they must be consistent throughout the modeling.\n",
- "\n",
- "First, we should know the size of the image we are working with. We can get that from Pillow by simply asking for the image size."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "LfcdK_js_dQd"
- },
- "outputs": [],
- "source": [
- "from PIL import Image\n",
- "import matplotlib.pyplot as plt\n",
- "\n",
- "image_width_height = None\n",
- "\n",
- "with Image.open(image_file) as sneaker:\n",
- " image_width_height = sneaker.size\n",
- "\n",
- "print(image_width_height)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "u617UoBvzI_A"
- },
- "source": [
- "As expected, we have dimensions that indicate that we have an image in landscape: 1920 pixels wide and 1280 pixels tall.\n",
- "\n",
- "Now we have to figure out *if* and *how* to reshape it.\n",
- "\n",
- "For the question of *if*, let's assume that we expect a variable set of input shapes, and based on this, we believe that reshaping is necessary.\n",
- "\n",
- "Now we need to think about *how* to reshape the image. *How* can take many different formats:\n",
- "\n",
- "* Do we find the smaller dimension and just add blank padding to it until it is the same size as the larger dimension?\n",
- " * If so, do we pad one side? Both?\n",
- " * And what pixel value(s) do we use for the padding? Min? Max? Average? Other?\n",
- "* Do we crop a fixed portion of the image?\n",
- " * If so, do we center? Randomly crop? Multiple times?\n",
- "* Do we simply resize the image and let it be proportionally distorted?\n",
- "\n",
- "The answer to all of these questions completely depends on your problem domain and use case. This is actually part of the **science** of data science. Hypothesize, experiment, repeat.\n",
- "\n",
- "But for this Colab, we have to make a definitive decision. For simplicity, we will choose to evenly pad the smaller dimension with white pixels as evenly as possible on either side.\n",
- "\n",
- "To do this we first need to find the larger side (height or width) of the image."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "nX-sWA3Z2uu3"
- },
- "outputs": [],
- "source": [
- "max_dimension = max(image_width_height)\n",
- "\n",
- "print(max_dimension)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "r8Uv3iI725Mb"
- },
- "source": [
- "Now we need to find out how much padding we need to add to each side of the image. The longer side shouldn't get any extra padding, and since we want to make the image a square, the shorter side should get enough padding to make it equal to the longer side.\n",
- "\n",
- "In this case we have a landscape picture. Therefore, no extra width is needed, and 640 pixels of height is needed."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "O-6VBoda3agD"
- },
- "outputs": [],
- "source": [
- "width_padding = max_dimension - image_width_height[0]\n",
- "height_padding = max_dimension - image_width_height[1]\n",
- "\n",
- "print(\"Width padding: {}, Height padding: {}\".format(width_padding, height_padding))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "B7Cl6fRyDuDI"
- },
- "source": [
- "We don't want all of the padding to be on one side of the image, though. We need to split the amount of padding in half and then add each half of the padding to each side of the shorter dimension.\n",
- "\n",
- "There is a problem when the padding is an odd number of pixels. A half of a pixel doesn't make sense, so instead we just need to choose a side of the image to put the extra bit of padding onto. In order to do this, we first do integer division to split the padding in half and then use subtraction to find the size of the other portion of the padding."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "bsve7z4TFJyk"
- },
- "outputs": [],
- "source": [
- "left_padding = width_padding // 2\n",
- "right_padding = width_padding - left_padding\n",
- "\n",
- "top_padding = height_padding // 2\n",
- "bottom_padding = height_padding - top_padding\n",
- "\n",
- "print(\"Left padding: {}, Top padding {}, Right padding: {}, Bottom padding {}\".format(\n",
- " left_padding, \n",
- " top_padding, \n",
- " right_padding, \n",
- " bottom_padding))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "ucstp6tBFt94"
- },
- "source": [
- "Now that we know how much padding to add to the image, we can do so by asking Pillow to expand the image.\n",
- "\n",
- "We asked for the padding to be white (RGBA all `255`). This made sense in this case because this particular image contains one \"object\" and a solid white background. If your images are not so well-produced, you might need to use a different strategy for coloring the image padding."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "Uj3FVyQ7Fx_x"
- },
- "outputs": [],
- "source": [
- "from PIL import ImageOps\n",
- "\n",
- "padding = (\n",
- " left_padding, \n",
- " top_padding, \n",
- " right_padding, \n",
- " bottom_padding\n",
- ")\n",
- "\n",
- "image = Image.open(\"running-shoe-371624_1920.jpg\")\n",
- "padded_image = ImageOps.expand(image, padding, (255,255,255,255))\n",
- "image.close()\n",
- "\n",
- "_ = plt.imshow(padded_image)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "yG5AIH79JE2F"
- },
- "source": [
- "We will do one final check to confirm that the image is indeed a square now. You should now have a `1920x1920` image."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "8uP8N77nJCcQ"
- },
- "outputs": [],
- "source": [
- "padded_image.size"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "w9Clq0KWL2dD"
- },
- "source": [
- "## Scale the Image"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "f8DaPLm2tki9"
- },
- "source": [
- "`1920x1920` is a pretty big image for a machine learning model to handle. If each pixel were used as input, that would be `3,686,400` values in the input vector for a model. It is common for each pixel to have three or four channels for a color image: red, green, blue, alpha. If there are four channels, the actual number of inputs is actually `14,745,600` for this image.\n",
- "\n",
- "A common strategy to reduce the size of the inputs is to scale it down. Let's use Pillow to do that."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "GfhuSpMqfngD"
- },
- "outputs": [],
- "source": [
- "desired_size = (200, 200)\n",
- "\n",
- "resized_image = padded_image.resize(desired_size, Image.ANTIALIAS)\n",
- "_ = plt.imshow(resized_image)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "ODIrlPDiJrDP"
- },
- "source": [
- "We can see the exact size of the resized image."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "VFcv3PYzJwjl"
- },
- "outputs": [],
- "source": [
- "resized_image.size"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "uLCSnMlwJy3j"
- },
- "source": [
- "Padding before resizing ensures that we don't distort the shape of the contents of our image, but it did require that we apply an artificial background.\n",
- "\n",
- "We could have also just scaled the image into a `200x200` square and distorted the image.\n",
- "\n",
- "Which is better really depends on what type of image you have coming into your system and the problem you are trying to solve."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "SdqWCg3eLldl"
- },
- "source": [
- "# Exercises"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "xs-QlHuc6IbK"
- },
- "source": [
- "## Exercise 1"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "axURTCgJwOjY"
- },
- "source": [
- "Your turn! Find another sneaker image and make it square and a size of 100 by 100 pixels. Use your favorite image search website if you don't have a sneaker image handy, e.g.: [Pixabay](https://pixabay.com), [Google Image Search](https://images.google.com), etc."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "tJiERBW68otc"
- },
- "source": [
- "### **Student Solution**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "mM2z0RJQfWev"
- },
- "outputs": [],
- "source": [
- "# Upload the file you just downloaded from your computer to the Colab runtime\n",
- "\n",
- "from google.colab import files\n",
- "\n",
- "uploaded = files.upload()\n",
- "\n",
- "for fn in uploaded.keys():\n",
- " print('User uploaded file \"{name}\" with length {length} bytes'.format(\n",
- " name=fn, length=len(uploaded[fn])))\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "NlafXcm_wMNc"
- },
- "outputs": [],
- "source": [
- "### YOUR CODE HERE ###\n",
- "\n",
- "# Open the image file and plot the image\n",
- "\n",
- "# Print the dimension of the image\n",
- "\n",
- "# Find the longer dimension \n",
- "\n",
- "# Compute the delta width and height\n",
- "\n",
- "# Compute the padding amounts\n",
- "\n",
- "# Pad and plot the image\n",
- "\n",
- "# Resize and plot the image\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "kF3UCE1VZJlJ"
- },
- "source": [
- "---"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "-SK6FT1nwFkP"
- },
- "source": [
- "## Exercise 2"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "T7EgQSQCwjrK"
- },
- "source": [
- "Pick one of the images above, and do the following:\n",
- "\n",
- "1. Flip the image horizontally (left to right).\n",
- "2. Then, save the flipped image back to overwrite the original image file.\n",
- "\n",
- "Resource: [PIL Reference Guide](https://pillow.readthedocs.io/en/3.0.x/reference/ImageOps.html)\n",
- "\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "ViMkTsVX8zV9"
- },
- "source": [
- "### **Student Solution**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "tBTnDLI8ZBry"
- },
- "outputs": [],
- "source": [
- "### YOUR CODE HERE ###\n",
- "\n",
- "# Flip the image horizontally (left to right)\n",
- "\n",
- "# Plot the image to show the image is indeed flipped horizontally\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "tY0967odCyZs"
- },
- "outputs": [],
- "source": [
- "### YOUR CODE HERE ###\n",
- "\n",
- "# Save newly generated image to the folder\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "ixGu-OcaZMsj"
- },
- "source": [
- "---"
- ]
- }
- ],
- "metadata": {
- "colab": {
- "collapsed_sections": [
- "copyright",
- "exercise-1-key-1",
- "exercise-2-key-1"
- ],
- "include_colab_link": true,
- "name": "Manipulating an Image in Python",
- "private_outputs": true,
- "provenance": [],
- "toc_visible": true
- },
- "kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
diff --git a/00_pil_DONE.ipynb b/00_pil_DONE.ipynb
new file mode 100644
index 0000000..29dd3ed
--- /dev/null
+++ b/00_pil_DONE.ipynb
@@ -0,0 +1,638 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "00-pil_DONE.ipynb",
+ "private_outputs": true,
+ "provenance": [],
+ "collapsed_sections": [
+ "copyright",
+ "exercise-1-key-1",
+ "exercise-2-key-1"
+ ],
+ "toc_visible": true
+ },
+ "interpreter": {
+ "hash": "5fb980b2d143a7e5753fbd5e6a424b5332c0eb07d7e6b6440c0e082ef58dd123"
+ },
+ "kernelspec": {
+ "display_name": "Python 3.7.10 64-bit ('tf': conda)",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.8"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "copyright"
+ },
+ "source": [
+ "#### Copyright 2020 Google LLC."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "24p97VuTvYVT"
+ },
+ "source": [
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CVmV0M74xwm7"
+ },
+ "source": [
+ "# Manipulating an Image in Python"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XuFjSsW53I9_"
+ },
+ "source": [
+ "So far in this course, the data that we have encountered has been in a text format, such as comma-separated values of strings and numbers. Other data has been loaded directly from scikit-learn as a `Bunch` of NumPy arrays, also containing strings and numbers.\n",
+ "\n",
+ "Data scientists sometimes find themselves working with collections of images, which are usually stored in compact binary formats such as JPEG or PNG. Common tasks that involve images include image classification and reverse image search.\n",
+ "\n",
+ "These images are often contained in a zip file, but they can also be in a directory on your computer or even on the internet. Once you have the images, you'll typically need to perform some type of preprocessing on them before you can do any sort of modeling.\n",
+ "\n",
+ "Most models expect a specific size of image, so you'll need to resize the images you feed your model if they differ from what is expected. Resizing might include cropping, stretching, padding, and scaling an image. Resizing to a smaller size also helps speed up your model by reducing the size of the input data.\n",
+ "\n",
+ "Images can also be encoded in many different ways. Some are grayscale; others are color. Color images might be encoded as red-green-blue (RGB), blue-green-red (BGR), RGB with alpha (RGBA), BGR with alpha (BGRA), hue-saturation-lightness (HSL), hue-saturation-value (HSV), or some other scheme. Make sure that all of your training images use the same encoding.\n",
+ "\n",
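+ "It is also common to normalize or standardize your images. Both tame the wide range of raw pixel values (typically integers from `0` to `255` inclusive): normalization usually rescales them into a range such as `0.0` to `1.0`, while standardization shifts and scales them to have a mean of zero and a standard deviation of one.\n",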
+ "\n",
+ "This might all sound like a lot of work, and it is. Fortunately, though, you don't have to worry too much about the details. There are numerous Python toolkits for manipulating images. In this unit, we will use the [Image](https://pillow.readthedocs.io/en/stable/reference/Image.html) and [ImageOps](https://pillow.readthedocs.io/en/stable/reference/ImageOps.html) modules from the [PIL (now called Pillow)](https://python-pillow.org/) library."
+ ]
+ },
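+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "encoding-normalization-sketch"
+ },
+ "source": [
+ "A minimal sketch of the last two preprocessing steps, assuming NumPy is available (it is on Colab). The three tiny synthetic images are hypothetical stand-ins for real photos in different encodings. `convert('RGB')` gives each one the same three-channel encoding (for RGBA it simply drops the alpha channel rather than blending it onto a background). The pixel values are then normalized to `0.0` to `1.0` and, separately, standardized per channel to zero mean and unit standard deviation."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "encoding-normalization-code"
+ },
+ "source": [
+ "import numpy as np\n",
+ "from PIL import Image\n",
+ "\n",
+ "# Hypothetical inputs: grayscale (L), RGB, and RGBA images of the same size.\n",
+ "images = [\n",
+ "    Image.new('L', (4, 4), 128),\n",
+ "    Image.new('RGB', (4, 4), (255, 0, 0)),\n",
+ "    Image.new('RGBA', (4, 4), (0, 0, 255, 128)),\n",
+ "]\n",
+ "\n",
+ "# Give every image the same encoding: three channels, red-green-blue.\n",
+ "rgb_images = [img.convert('RGB') for img in images]\n",
+ "print([img.mode for img in images], '->', [img.mode for img in rgb_images])\n",
+ "\n",
+ "# Stack into one float array shaped (num_images, height, width, channels).\n",
+ "batch = np.stack([np.asarray(img, dtype=np.float32) for img in rgb_images])\n",
+ "\n",
+ "# Normalization: rescale 0-255 into 0.0-1.0.\n",
+ "normalized = batch / 255.0\n",
+ "\n",
+ "# Standardization: per-channel zero mean and unit standard deviation over the batch.\n",
+ "mean = batch.mean(axis=(0, 1, 2))\n",
+ "std = batch.std(axis=(0, 1, 2))\n",
+ "standardized = (batch - mean) / std\n",
+ "\n",
+ "print(batch.shape, normalized.min(), normalized.max())\n",
+ "print(standardized.mean(axis=(0, 1, 2)).round(3), standardized.std(axis=(0, 1, 2)).round(3))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },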
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jk3COdKGIB7l"
+ },
+ "source": [
+ "## Get Image\n",
+ "\n",
+ "The image that we will work with comes from [Pixabay](https://pixabay.com/photos/running-shoe-shoe-brooks-371624/). On the image page, you'll see the option to download it. Choose the 1920x1280 version of the image.\n",
+ "\n",
+ "After you have downloaded the image to your computer, upload it into this Colab: run the code block below, click \"Choose Files\" in the form that appears, select the downloaded image in the dialog box, and press \"Open\". You should see messages about the file being uploaded, followed by a notification that the upload is complete."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "euRZuE9MLHd0"
+ },
+ "source": [
+ "from google.colab import files\n",
+ "\n",
+ "uploaded = files.upload()\n",
+ "\n",
+ "for fn in uploaded.keys():\n",
+ " print('User uploaded file \"{name}\" with length {length} bytes'.format(\n",
+ " name=fn, length=len(uploaded[fn])))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LboFaJ47r6x-"
+ },
+ "source": [
+ "We can now take a look at our image to check that it uploaded properly. To do this we will use [Matplotlib](https://matplotlib.org/) to display the image. But first we must load the image, which is currently stored on the hard drive of the virtual machine hosting this Colab.\n",
+ "\n",
+ "We'll use Pillow's `Image` module to open the file.\n",
+ "\n",
+ "Notice that we use a [`with` statement (context manager)](https://docs.python.org/2.5/whatsnew/pep-343.html) to close the image automatically and free up its resources. We could instead have called `close()` explicitly once we were done with the image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "xhd_d5vzrc_h"
+ },
+ "source": [
+ "from PIL import Image\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "image_file = \"running-shoe-371624_1920.jpg\"\n",
+ "\n",
+ "with Image.open(image_file) as sneaker:\n",
+ " plt.imshow(sneaker)\n",
+ " plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-Goq0Frawyuk"
+ },
+ "source": [
+ "## Reshaping\n",
+ "\n",
+ "The image we currently have is wider than it is tall (landscape). It could have just as easily been taller than it is wide (portrait). It could have even been a square.\n",
+ "\n",
+ "Does the model care? In some ways it does, and in others it doesn't. The model needs consistent inputs: the images could be any shape and size, as long as every image fed to the model has the same shape.\n",
+ "\n",
+ "First, we should know the size of the image we are working with. We can get that from Pillow by simply asking for the image size."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "LfcdK_js_dQd"
+ },
+ "source": [
+ "from PIL import Image\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "image_width_height = None\n",
+ "\n",
+ "with Image.open(image_file) as sneaker:\n",
+ " image_width_height = sneaker.size\n",
+ "\n",
+ "print(image_width_height)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "u617UoBvzI_A"
+ },
+ "source": [
+ "As expected, we have dimensions that indicate that we have an image in landscape: 1920 pixels wide and 1280 pixels tall.\n",
+ "\n",
+ "Now we have to figure out *if* and *how* to reshape it.\n",
+ "\n",
+ "For the question of *if*, let's assume that our input images will arrive in a variety of shapes, so reshaping is necessary.\n",
+ "\n",
+ "Now we need to think about *how* to reshape the image. There are many options:\n",
+ "\n",
+ "* Do we find the smaller dimension and just add blank padding to it until it is the same size as the larger dimension?\n",
+ " * If so, do we pad one side? Both?\n",
+ " * And what pixel value(s) do we use for the padding? Min? Max? Average? Other?\n",
+ "* Do we crop a fixed portion of the image?\n",
+ " * If so, do we center? Randomly crop? Multiple times?\n",
+ "* Do we simply resize the image and let its proportions be distorted?\n",
+ "\n",
+ "The answer to all of these questions completely depends on your problem domain and use case. This is actually part of the **science** of data science. Hypothesize, experiment, repeat.\n",
+ "\n",
+ "But for this Colab, we have to make a definitive decision. For simplicity, we will pad the shorter dimension with white pixels, splitting the padding as evenly as possible between the two sides.\n",
+ "\n",
+ "To do this we first need to find the larger side (height or width) of the image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "nX-sWA3Z2uu3"
+ },
+ "source": [
+ "max_dimension = max(image_width_height)\n",
+ "\n",
+ "print(max_dimension)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "r8Uv3iI725Mb"
+ },
+ "source": [
+ "Now we need to find out how much padding we need to add to each side of the image. The longer side shouldn't get any extra padding, and since we want to make the image a square, the shorter side should get enough padding to make it equal to the longer side.\n",
+ "\n",
+ "In this case we have a landscape picture. Therefore, no extra width is needed, and 640 pixels of height (1920 - 1280) are needed."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "O-6VBoda3agD"
+ },
+ "source": [
+ "width_padding = max_dimension - image_width_height[0]\n",
+ "height_padding = max_dimension - image_width_height[1]\n",
+ "\n",
+ "print(\"Width padding: {}, Height padding: {}\".format(width_padding, height_padding))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "B7Cl6fRyDuDI"
+ },
+ "source": [
+ "We don't want all of the padding to be on one side of the image, though. We need to split the amount of padding in half and then add each half of the padding to each side of the shorter dimension.\n",
+ "\n",
+ "There is a problem when the padding is an odd number of pixels. Half a pixel doesn't make sense, so we have to choose one side of the image to receive the extra pixel. To do this, we first use integer division to split the padding in half and then use subtraction to find the size of the other portion."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "bsve7z4TFJyk"
+ },
+ "source": [
+ "left_padding = width_padding // 2\n",
+ "right_padding = width_padding - left_padding\n",
+ "\n",
+ "top_padding = height_padding // 2\n",
+ "bottom_padding = height_padding - top_padding\n",
+ "\n",
+ "print(\"Left padding: {}, Top padding: {}, Right padding: {}, Bottom padding: {}\".format(\n",
+ " left_padding, \n",
+ " top_padding, \n",
+ " right_padding, \n",
+ " bottom_padding))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
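+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "square-padding-helper-sketch"
+ },
+ "source": [
+ "The same arithmetic can be wrapped in a small helper that works for landscape, portrait, and square images alike. This is only a sketch: `square_padding` is a hypothetical name, not a Pillow function. It returns the padding as a `(left, top, right, bottom)` tuple and, as above, puts the extra pixel of an odd padding on the right or bottom."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "square-padding-helper-code"
+ },
+ "source": [
+ "def square_padding(width, height):\n",
+ "    # Pad the shorter side up to the longer one, splitting the padding in half.\n",
+ "    side = max(width, height)\n",
+ "    width_padding = side - width\n",
+ "    height_padding = side - height\n",
+ "    left = width_padding // 2\n",
+ "    top = height_padding // 2\n",
+ "    return (left, top, width_padding - left, height_padding - top)\n",
+ "\n",
+ "\n",
+ "print(square_padding(1920, 1280))  # landscape: (0, 320, 0, 320)\n",
+ "print(square_padding(1279, 1920))  # portrait, odd difference: (320, 0, 321, 0)\n",
+ "print(square_padding(500, 500))    # already square: (0, 0, 0, 0)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },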
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ucstp6tBFt94"
+ },
+ "source": [
+ "Now that we know how much padding to add to the image, we can do so by asking Pillow to expand the image.\n",
+ "\n",
+ "We will ask for the padding to be white (every channel set to `255`). This makes sense here because this particular image contains a single \"object\" on a solid white background. If your images are not so cleanly produced, you might need a different strategy for coloring the padding."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Uj3FVyQ7Fx_x"
+ },
+ "source": [
+ "from PIL import ImageOps\n",
+ "\n",
+ "padding = (\n",
+ " left_padding, \n",
+ " top_padding, \n",
+ " right_padding, \n",
+ " bottom_padding\n",
+ ")\n",
+ "\n",
+ "with Image.open(image_file) as image:\n",
+ "    padded_image = ImageOps.expand(image, padding, fill=(255, 255, 255))\n",
+ "\n",
+ "\n",
+ "_ = plt.imshow(padded_image)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yG5AIH79JE2F"
+ },
+ "source": [
+ "As a final check, let's confirm that the image is now square. It should be `1920x1920`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "8uP8N77nJCcQ"
+ },
+ "source": [
+ "padded_image.size"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "w9Clq0KWL2dD"
+ },
+ "source": [
+ "## Scale the Image"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "f8DaPLm2tki9"
+ },
+ "source": [
+ "`1920x1920` is a pretty big image for a machine learning model to handle. If each pixel were a single input, the model's input vector would have `3,686,400` values. However, each pixel of a color image usually has three channels (red, green, blue) or four (red, green, blue, alpha), so the actual number of inputs is `11,059,200` with three channels or `14,745,600` with four.\n",
+ "\n",
+ "A common strategy to reduce the size of the inputs is to scale the image down. Let's use Pillow to do that."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "GfhuSpMqfngD"
+ },
+ "source": [
+ "desired_size = (200, 200)\n",
+ "\n",
+ "# Image.LANCZOS is the filter formerly named Image.ANTIALIAS, which Pillow 10 removed\n",
+ "resized_image = padded_image.resize(desired_size, Image.LANCZOS)\n",
+ "_ = plt.imshow(resized_image)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
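+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a rough check on the arithmetic above, the helper below multiplies an image's width, height, and number of channels. It is a hypothetical `count_input_values` function added for illustration and is not part of Pillow. It assumes the `padded_image` and `resized_image` variables from the previous cells. Pillow's `getbands()` returns one name per channel, such as `('R', 'G', 'B')` for an RGB image.\n",
+ "\n",
+ "For a three-channel (RGB) image, this works out to `11,059,200` values before resizing and `120,000` after."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "def count_input_values(img):\n",
+ "    \"\"\"Values a model would see if every channel of every pixel were an input.\"\"\"\n",
+ "    width, height = img.size\n",
+ "    return width * height * len(img.getbands())\n",
+ "\n",
+ "print(\"Padded: \", padded_image.mode, count_input_values(padded_image))\n",
+ "print(\"Resized:\", resized_image.mode, count_input_values(resized_image))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },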
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ODIrlPDiJrDP"
+ },
+ "source": [
+ "We can see the exact size of the resized image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "VFcv3PYzJwjl"
+ },
+ "source": [
+ "resized_image.size"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uLCSnMlwJy3j"
+ },
+ "source": [
+ "Padding before resizing ensures that we don't distort the shape of the image's contents, but it does require adding an artificial background.\n",
+ "\n",
+ "Alternatively, we could have scaled the original image directly to a `200x200` square, at the cost of distorting its contents.\n",
+ "\n",
+ "Which approach is better depends on the kind of images coming into your system and the problem you are trying to solve."
+ ]
+ },
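+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see the trade-off, the sketch below resizes the original, unpadded image straight to `200x200` and plots it next to the padded result. It is an illustrative addition and makes two assumptions: the `running-shoe-371624_1920.jpg` file is still in the runtime, and `resized_image` from the cells above exists. The name `distorted_image` is introduced here only for the comparison."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {},
+ "source": [
+ "from PIL import Image\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "with Image.open(\"running-shoe-371624_1920.jpg\") as image:\n",
+ "    # Resize directly to a square, ignoring the original aspect ratio\n",
+ "    distorted_image = image.resize((200, 200), Image.LANCZOS)\n",
+ "\n",
+ "# Show the padded and the directly resized versions side by side\n",
+ "fig, (padded_ax, distorted_ax) = plt.subplots(1, 2, figsize=(8, 4))\n",
+ "padded_ax.imshow(resized_image)\n",
+ "padded_ax.set_title(\"Padded, then resized\")\n",
+ "distorted_ax.imshow(distorted_image)\n",
+ "distorted_ax.set_title(\"Resized directly\")\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },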
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SdqWCg3eLldl"
+ },
+ "source": [
+ "# Exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xs-QlHuc6IbK"
+ },
+ "source": [
+ "## Exercise 1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "axURTCgJwOjY"
+ },
+ "source": [
+ "Your turn! Find another sneaker image, make it square, and resize it to 100 by 100 pixels. If you don't have a sneaker image handy, use your favorite image search website, e.g., [Pixabay](https://pixabay.com) or [Google Image Search](https://images.google.com)."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "tJiERBW68otc"
+ },
+ "source": [
+ "### **Student Solution**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "mM2z0RJQfWev"
+ },
+ "source": [
+ "# Upload the file you just downloaded from your computer to the Colab runtime\n",
+ "\n",
+ "from google.colab import files\n",
+ "\n",
+ "uploaded = files.upload()\n",
+ "\n",
+ "for fn in uploaded.keys():\n",
+ " print('User uploaded file \"{name}\" with length {length} bytes'.format(\n",
+ " name=fn, length=len(uploaded[fn])))\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "NlafXcm_wMNc"
+ },
+ "source": [
+ "### YOUR CODE HERE ###\n",
+ "\n",
+ "# Open the image file and plot the image\n",
+ "image_file = \"Bob-from-the-minions-movie.jpeg\"\n",
+ "image_size = None\n",
+ "\n",
+ "with Image.open(image_file) as minions:\n",
+ " plt.imshow(minions)\n",
+ " image_size = minions.size\n",
+ " plt.show()\n",
+ "\n",
+ "# Print the dimensions of the image\n",
+ "print(image_size)\n",
+ "# Find the longer dimension \n",
+ "max_dimension = max(image_size)\n",
+ "print(max_dimension)\n",
+ "# Compute the delta width and height\n",
+ "width_padding = max_dimension - image_size[0]\n",
+ "height_padding = max_dimension - image_size[1]\n",
+ "print(\"Width padding: {}, Height padding: {}\".format(width_padding, height_padding))\n",
+ "# Compute the padding amounts\n",
+ "left_padding = width_padding // 2\n",
+ "right_padding = width_padding - left_padding\n",
+ "top_padding = height_padding // 2\n",
+ "bottom_padding = height_padding - top_padding\n",
+ "\n",
+ "print(\"Left padding: {}, Top padding: {}, Right padding: {}, Bottom padding: {}\".format(\n",
+ " left_padding, \n",
+ " top_padding, \n",
+ " right_padding, \n",
+ " bottom_padding))\n",
+ "# Pad and plot the image\n",
+ "padding = (left_padding, top_padding, right_padding, bottom_padding)\n",
+ "with Image.open(image_file) as image:\n",
+ "    padded_image = ImageOps.expand(image, padding, fill=(255, 255, 255))\n",
+ "plt.imshow(padded_image)\n",
+ "plt.show()\n",
+ "\n",
+ "# Resize and plot the image\n",
+ "desired_size = (100, 100)\n",
+ "resized_image = padded_image.resize(desired_size, Image.LANCZOS)\n",
+ "plt.imshow(resized_image)\n",
+ "plt.show()\n",
+ "\n",
+ "# Confirm the final size\n",
+ "resized_image.size"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kF3UCE1VZJlJ"
+ },
+ "source": [
+ "---"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-SK6FT1nwFkP"
+ },
+ "source": [
+ "## Exercise 2"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "T7EgQSQCwjrK"
+ },
+ "source": [
+ "Pick one of the images above and do the following:\n",
+ "\n",
+ "1. Flip the image horizontally (left to right).\n",
+ "2. Then, save the flipped image back to overwrite the original image file.\n",
+ "\n",
+ "Resource: [PIL Reference Guide](https://pillow.readthedocs.io/en/3.0.x/reference/ImageOps.html)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ViMkTsVX8zV9"
+ },
+ "source": [
+ "### **Student Solution**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tBTnDLI8ZBry"
+ },
+ "source": [
+ "### YOUR CODE HERE ###\n",
+ "import PIL\n",
+ "from google.colab import files\n",
+ "\n",
+ "uploaded = files.upload()\n",
+ "\n",
+ "for fn in uploaded.keys():\n",
+ " print('User uploaded file \"{name}\" with length {length} bytes'.format(\n",
+ " name=fn, length=len(uploaded[fn])))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "wAwS5HmFlClN"
+ },
+ "source": [
+ "image_file = 'k-s-pets-services-ecil-hyderabad-pet-care-takers-1knoqwn9vh-1.jpeg'\n",
+ "flipped_image = 'k-s-pets-services-ecil-hyderabad-pet-care-takers-1knoqwn9vh-1_flipped.jpeg'\n",
+ "\n",
+ "with Image.open(image_file) as dog:\n",
+ "    # Plot the original image\n",
+ "    plt.imshow(dog)\n",
+ "    plt.show()\n",
+ "    # Flip the image horizontally (left to right) and save the result\n",
+ "    flipped_dog = dog.transpose(PIL.Image.FLIP_LEFT_RIGHT)\n",
+ "    flipped_dog.save(flipped_image)\n",
+ "\n",
+ "# Plot the flipped image to show that it is indeed mirrored\n",
+ "plt.imshow(flipped_dog)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tY0967odCyZs"
+ },
+ "source": [
+ "### YOUR CODE HERE ###\n",
+ "# Reopen the saved file to confirm that the flipped image was written to disk\n",
+ "with Image.open(flipped_image) as flipped_dog_from_disk:\n",
+ "    plt.imshow(flipped_dog_from_disk)\n",
+ "    plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ixGu-OcaZMsj"
+ },
+ "source": [
+ "---"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/01-open_cv.ipynb b/01-open_cv.ipynb
deleted file mode 100644
index 99ed6a2..0000000
--- a/01-open_cv.ipynb
+++ /dev/null
@@ -1,965 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "view-in-github"
- },
- "source": [
- "
"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "copyright"
- },
- "source": [
- "#### Copyright 2020 Google LLC."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "7PLP9Q30PKtv"
- },
- "outputs": [],
- "source": [
- "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "# you may not use this file except in compliance with the License.\n",
- "# You may obtain a copy of the License at\n",
- "#\n",
- "# https://www.apache.org/licenses/LICENSE-2.0\n",
- "#\n",
- "# Unless required by applicable law or agreed to in writing, software\n",
- "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "# See the License for the specific language governing permissions and\n",
- "# limitations under the License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "f5W9rkuBmBu9"
- },
- "source": [
- "# OpenCV"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "zIykBQbYXrXA"
- },
- "source": [
- "[OpenCV](https://opencv.org/) is an open-source computer vision library. It comes packaged with many powerful computer vision tools, including image and video processing utilities. The library has a lot of the same functionality as the [Python Image Library (PIL)](https://python-pillow.org/) but also includes some computer vision support that PIL doesn't include.\n",
- "\n",
- "In this lesson we will learn how to use OpenCV to process images."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "G8u2lYRWbE37"
- },
- "source": [
- "## Load an Image"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "RRFdZwtwLKKV"
- },
- "source": [
- "Start by downloading a small (640x360) version of [this image of a car](https://pixabay.com/illustrations/car-sports-car-racing-car-speed-49278/) from Pixabay and then uploading it to this Colab.\n",
- "\n",
- "**Be sure to load the small 640x360 version of the image for this lab.**\n",
- "\n",
- "After loading the image, we can use matplotlib to view the image."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "RmoZ6R9bKnEH"
- },
- "outputs": [],
- "source": [
- "import cv2 as cv\n",
- "import matplotlib.pyplot as plt\n",
- "\n",
- "image_file = 'car-49278_640.jpg'\n",
- "\n",
- "image = cv.imread(image_file)\n",
- "\n",
- "plt.imshow(image)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "GVzr4XfQLO23"
- },
- "source": [
- "### Color Ordering"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "bQTzqEf1K3Tv"
- },
- "source": [
- "Does something look off? Wasn't the car red when we downloaded the image?\n",
- "\n",
- "OpenCV assumes the image is stored with blue-green-red (BGR) encoding instead of [red-green-blue (RGB)](https://en.wikipedia.org/wiki/RGB_color_model), but matplotlib assumes RGB. So, the reds and blues in the image are inverted when displayed.\n",
- "\n",
- "Why does OpenCV assume images are BGR?\n",
- "\n",
- "BGR was historically a popular storage format used by digital camera manufacturers and many software packages. At the time it was a good choice for a default. Defaults are difficult to change, so BGR is here to stay in OpenCV.\n",
- "\n",
- "It doesn't really matter which format is used as long as the inputs to our model are consistent. However, it can be annoying to look at images with inverted colors. You just need to know how to tell OpenCV to fix it.\n",
- "\n",
- "Luckily it is easy to change from BGR to RGB. We can just use `cvtColor`. There are [scores of conversions](https://docs.opencv.org/3.1.0/d7/d1b/group__imgproc__misc.html) possible."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "OrxvjU6gK7Fy"
- },
- "outputs": [],
- "source": [
- "image = cv.cvtColor(image, cv.COLOR_BGR2RGB)\n",
- "\n",
- "plt.imshow(image)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "nYyEWvL8Lf8H"
- },
- "source": [
- "## Drawing on Images"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "yMGhqtBvLilk"
- },
- "source": [
- "### Drawing Rectangles on Images\n",
- "\n",
- "Suppose we want to draw a rectangle around objects we identify in an image. This can be done with the OpenCV `rectangle` method."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "mQxOqUiSLhoN"
- },
- "outputs": [],
- "source": [
- "left = 100\n",
- "right = 580\n",
- "top = 100\n",
- "bottom = 300\n",
- "\n",
- "r = 255\n",
- "g = 0\n",
- "b = 0\n",
- "\n",
- "cv.rectangle(image, (left, top), (right, bottom), (r, g, b), thickness=2)\n",
- "plt.imshow(image)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "-eyAAN5ZLqcX"
- },
- "source": [
- "### Drawing Text on Images\n",
- "\n",
- "You can also draw text on images using `putText`."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "ytbjF3IxLuzl"
- },
- "outputs": [],
- "source": [
- "left = 150\n",
- "top = 50\n",
- "\n",
- "r = 0\n",
- "g = 0\n",
- "b = 0\n",
- "scale = 1.0\n",
- "thickness = 2\n",
- "\n",
- "cv.putText(image, \"It is a car!\", (left, top), cv.FONT_HERSHEY_SIMPLEX, scale,\n",
- " [r, g, b], thickness)\n",
- "\n",
- "plt.imshow(image)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "MkLGTqteLEHC"
- },
- "source": [
- "## Image Scaling"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "NW-RlIP9LHfx"
- },
- "source": [
- "Models are trained with images scaled to a specific size and are sensitive to the input size being consistent. One solution is to simply scale the image to the required size using the `resize` method.\n",
- " \n",
- "In the example below, we scale the image to `300x300` pixels. This creates a pretty distorted image, which might affect the training and predictions made by the model."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "KzsGZK3mLW5t"
- },
- "outputs": [],
- "source": [
- "image_scaled = cv.resize(image, (300, 300))\n",
- "\n",
- "plt.imshow(image_scaled)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "mBKsIs4ykMtR"
- },
- "source": [
- "## Cropping With Edge Detection"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "mGr77pUvkROt"
- },
- "source": [
- "Another strategy is to crop the image using \"edge detection\", then scale the image after you have cropped it down. This strategy can be error-prone, but it can also be really helpful in isolating individual objects in an image.\n",
- "\n",
- "In the case of the car image that we have loaded, cropping based on edge detection is both simple and effective. In images with more noise in the background, automatic cropping will be much more difficult."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "DbHQuRbjk29f"
- },
- "source": [
- "To begin cropping, we'll rely on OpenCV's [Canny](https://docs.opencv.org/2.4/modules/imgproc/doc/feature_detection.html?highlight=canny#canny) detection algorithm."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "LlNs5Wb-ZOz3"
- },
- "outputs": [],
- "source": [
- "threshold = 200\n",
- "image = cv.imread(image_file)\n",
- "image = cv.cvtColor(image, cv.COLOR_BGR2RGB)\n",
- "edges = cv.Canny(image, threshold, threshold*2)\n",
- "\n",
- "fig, (orig, edge) = plt.subplots(2)\n",
- "orig.imshow(image, cmap='gray')\n",
- "edge.imshow(edges, cmap='gray')\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "2L5zF8EomT7v"
- },
- "source": [
- "The `threshold` parameter is a tuning value set to the images you are processing. More details can be found on [Canny's Wikipedia page](https://en.wikipedia.org/wiki/Canny_edge_detector).\n",
- "\n",
- "Let's see a few different thresholds in action."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "8TQu_1D8mTiE"
- },
- "outputs": [],
- "source": [
- "fig, (orig, t1, t50, t100, t200, t300, t500) = plt.subplots(7, figsize=(5, 25))\n",
- "\n",
- "orig.imshow(image)\n",
- "t1.imshow(cv.Canny(image, 10, 10*2), cmap='gray')\n",
- "t50.imshow(cv.Canny(image, 50, 50*2), cmap='gray')\n",
- "t100.imshow(cv.Canny(image, 100, 100*2), cmap='gray')\n",
- "t200.imshow(cv.Canny(image, 200, 200*2), cmap='gray')\n",
- "t300.imshow(cv.Canny(image, 300, 300*2), cmap='gray')\n",
- "t500.imshow(cv.Canny(image, 500, 500*2), cmap='gray')\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "ayTn7dXyoa_J"
- },
- "source": [
- "None of these settings do too badly, though a threshold of 10 has a lot of noise, and a threshold of 500 barely outlines the car. We have to remember that our goal is to build a bounding box around the car and crop on that bounding box."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "a8zzsxWko1G0"
- },
- "source": [
- "Another consideration is that the edge detection algorithm is often more effective if the image is grayscale and if there is some blurring.\n",
- " \n",
- "First let's convert the image to grayscale."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "IZSbCNDXaGSV"
- },
- "outputs": [],
- "source": [
- "img_gray = cv.cvtColor(image, cv.COLOR_RGB2GRAY)\n",
- "_ = plt.imshow(img_gray, cmap='gray')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "MHRWyjhYpN3P"
- },
- "source": [
- "And now we'll blur the image a bit."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "LIUmsFFrpL-1"
- },
- "outputs": [],
- "source": [
- "img_gray = cv.blur(img_gray, (3,3))\n",
- "_ = plt.imshow(img_gray, cmap='gray')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "-anEZ9NopW37"
- },
- "source": [
- "Given this new grayscale and blurred image, we can run the edge detection algorithm again."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "tbyr6n9Opcrt"
- },
- "outputs": [],
- "source": [
- "fig, (orig, t1, t50, t100, t200, t300, t500) = plt.subplots(7, figsize=(5, 25))\n",
- "\n",
- "orig.imshow(img_gray, cmap='gray')\n",
- "t1.imshow(cv.Canny(img_gray, 10, 10*2), cmap='gray')\n",
- "t50.imshow(cv.Canny(img_gray, 50, 50*2), cmap='gray')\n",
- "t100.imshow(cv.Canny(img_gray, 100, 100*2), cmap='gray')\n",
- "t200.imshow(cv.Canny(img_gray, 200, 200*2), cmap='gray')\n",
- "t300.imshow(cv.Canny(img_gray, 300, 300*2), cmap='gray')\n",
- "t500.imshow(cv.Canny(img_gray, 500, 500*2), cmap='gray')\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "DkIIp4JZpnlz"
- },
- "source": [
- "In this case our edges completely disappear at higher thresholds!\n",
- "\n",
- "The threshold of 200 seemed to perform reasonably well in both situations, so let's stick with that."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "P4KI438bp7Dv"
- },
- "outputs": [],
- "source": [
- "img_canny = cv.Canny(img_gray, 200, 200*2)\n",
- "\n",
- "plt.imshow(img_canny, cmap='gray')\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "grY78Cwwp-Qk"
- },
- "source": [
- "We now need to find the bounding box around the item in the image that we want to crop. The first step in doing this is to utilize the [findContours](https://docs.opencv.org/3.4/d3/dc0/group__imgproc__shape.html#ga17ed9f5d79ae97bd4c7cf18403e1689a) function. This function returns a list of contours found in the output of the Canny algorithm. The contours are defined by lists of $(x, y)$ values."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "Thhz47TwrFow"
- },
- "outputs": [],
- "source": [
- "contours, _ = cv.findContours(img_canny, cv.RETR_TREE,\n",
- " cv.CHAIN_APPROX_SIMPLE)\n",
- "\n",
- "print(len(contours))\n",
- "print(contours[0])"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "L41cRvQHr5IA"
- },
- "source": [
- "Given the contours, we can approximate the polygon that the contour forms and then create a bounding box around each contour."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "dSBmgd_fbHJn"
- },
- "outputs": [],
- "source": [
- "bounding_boxes = []\n",
- "contours_poly = []\n",
- "\n",
- "for contour in contours:\n",
- " polygon = cv.approxPolyDP(contour, 3, True)\n",
- " contours_poly.append(polygon)\n",
- " bounding_boxes.append(cv.boundingRect(polygon))\n",
- "\n",
- "print(len(contours_poly))\n",
- "print(len(bounding_boxes))\n",
- "print(bounding_boxes)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "EQ1NJtuTUS9g"
- },
- "source": [
- "Let's take a look at all of the bounding boxes on the car."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "GEDykg6JUABc"
- },
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "\n",
- "image_copy = np.copy(image)\n",
- "\n",
- "x, y, width, height = largest_box\n",
- "for box in bounding_boxes:\n",
- " cv.rectangle(image_copy, \n",
- " (box[0], box[1]), (box[0]+box[2], box[1]+box[3]),\n",
- " [0, 0, 255],\n",
- " 2)\n",
- "\n",
- "_ = plt.imshow(image_copy)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "AHBNnusgUb2X"
- },
- "source": [
- "No single box seems to capture the entire car, but we can use the outer boundaries to find a unified box.\n",
- "\n",
- "We'll use a very simple algorithm that simply finds the outer boundaries and doesn't care if the boxes overlap. In practice you'd likely want to use a more sophisticated algorithm."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "xO_Tj80rUrbI"
- },
- "outputs": [],
- "source": [
- "x1, y1, x2, y2 = 640, 640, 0, 0\n",
- "\n",
- "for box in bounding_boxes:\n",
- " if box[0] < x1:\n",
- " x1 = box[0]\n",
- " if box[1] < y1:\n",
- " y1 = box[1]\n",
- " if box[0] + box[2] > x2:\n",
- " x2 = box[0] + box[2]\n",
- " if box[1] + box[3] > y2:\n",
- " y2 = box[1] + box[3]\n",
- "\n",
- "x1, y1, x2, y2"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "vUB3uIE5VWtm"
- },
- "source": [
- "And then we can draw the box."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "rS2tJnyjVYUh"
- },
- "outputs": [],
- "source": [
- "import numpy as np\n",
- "\n",
- "image_copy = np.copy(image)\n",
- "\n",
- "cv.rectangle(image_copy, \n",
- " (x1, y1), (x2, y2),\n",
- " [0, 0, 255],\n",
- " 2)\n",
- "\n",
- "_ = plt.imshow(image_copy)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "_DD8Kbvo2BWD"
- },
- "source": [
- "The box does clip the car a bit, but for the most part, the car is within the box."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "AwKwraUa2GT3"
- },
- "source": [
- "Now we need to crop the image to just the car itself.\n",
- "\n",
- "Notice that we pair the `x` coordinate with `height` and the `y` with `width`. This is because we want all of the rows for a given height and the columns for a given width."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "6aM1Q8cid_Xv"
- },
- "outputs": [],
- "source": [
- "x, y, width, height = largest_box\n",
- "cropped_img = image[y1:y2, x1:x2]\n",
- "_ = plt.imshow(cropped_img)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "bYccBq4k2bQv"
- },
- "source": [
- "Now we need to make the image into a square by padding the image. We find the longest side and then pad the shorter side with the necessary pixels to make the image a square.\n",
- "\n",
- "To add the padding we use OpenCV's `copyMakeBorder` function."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "tygNEuGXe2Aj"
- },
- "outputs": [],
- "source": [
- "height = cropped_img.shape[0]\n",
- "width = cropped_img.shape[1]\n",
- "\n",
- "left_pad, right_pad, top_pad, bottom_pad = 0, 0, 0, 0\n",
- "if height > width:\n",
- " left_pad = int((height-width) / 2)\n",
- " right_pad = height-width-left_pad\n",
- "elif width > height:\n",
- " top_pad = int((width-height) / 2)\n",
- " bottom_pad = width-height-top_pad\n",
- "\n",
- "img_square = cv.copyMakeBorder(\n",
- " cropped_img,\n",
- " top_pad,\n",
- " bottom_pad,\n",
- " left_pad,\n",
- " right_pad,\n",
- " cv.BORDER_CONSTANT,\n",
- " value=(255,255,255))\n",
- "\n",
- "_ = plt.imshow(img_square)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "-XW_qEgu23z9"
- },
- "source": [
- "And finally, we can scale the image down to a 300x300 image to feed to our model using OpenCV's `resize` function again."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "Im_q3Xf3ggP2"
- },
- "outputs": [],
- "source": [
- "image_scaled = cv.resize(img_square, (300, 300))\n",
- "\n",
- "plt.imshow(image_scaled)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "Xo1wnMSl3BuN"
- },
- "source": [
- "## Rotating Images"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "MRANxOVN3Hgn"
- },
- "source": [
- "It is sometimes useful to rotate images before feeding them to your model. This increases the size of your training data, and it makes your model more resilient to subtle patterns that might exist within your base images.\n",
- " \n",
- "For example, in a popular fashion image dataset, most boots are pointed in one direction and sandals in the other. When the model attempts to identify a boot pointed in the wrong direction, it will often predict 'sandal' based purely on the orientation of the object.\n",
- " \n",
- "To flip an image on the horizontal or vertical axis, we can just use the `flip` function.\n",
- " \n",
- "Here is an example of flipping an image on the horizontal axis."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "1x9PemhtgrRv"
- },
- "outputs": [],
- "source": [
- "horizontal_img = cv.flip(image_scaled, 0)\n",
- "plt.imshow(horizontal_img)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "A7IbPf3l4tbt"
- },
- "source": [
- "And now the vertical axis."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "ORzVDhHfgyEy"
- },
- "outputs": [],
- "source": [
- "vertical_img = cv.flip(image_scaled, 1)\n",
- "plt.imshow(vertical_img)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "HfXe8OOm4vPZ"
- },
- "source": [
- "And finally, both."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "hhhdEiutg250"
- },
- "outputs": [],
- "source": [
- "horizontal_and_vertical_img = cv.flip(image_scaled, -1)\n",
- "plt.imshow(horizontal_and_vertical_img)\n",
- "plt.show()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "5WFLU7hWbO8_"
- },
- "source": [
- "# Resources\n",
- "\n",
- "* [OpenCV Documentation on Edge Detection](https://docs.opencv.org/3.4/da/d0c/tutorial_bounding_rects_circles.html)\n",
- "* Canny Edge Detector: [Wikipedia](https://en.wikipedia.org/wiki/Canny_edge_detector), [OpenCV Documentation](https://docs.opencv.org/3.1.0/da/d22/tutorial_py_canny.html)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "Swt2fxm-fG_B"
- },
- "source": [
- "# Exercises"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "vaiwWLygsq5M"
- },
- "source": [
- "## Exercise 1"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "iWq38ASlb2aY"
- },
- "source": [
- "We have seen how to rotate an image on its horizontal and vertical axes. This technique works well for increasing the size of your training set and the capabilities of your model, while also providing resiliency to biases that might be hidden in your data.\n",
- "\n",
- "It is also possible to rotate an image by different angles.\n",
- "\n",
- "Use OpenCV to take our `image_scaled` image from above and rotate it so that the car is angled at 45 degrees. Do this for every corner of the squared image.\n",
- " \n",
- "There should be eight images in total. The order of the images isn't important, but the variety is. There should be one image for each case below:\n",
- " \n",
- "1. Car pointed to the top-left corner of the image\n",
- "1. Upside-down car pointed to the top-left corner of the image\n",
- "1. Car pointed to the top-right corner of the image\n",
- "1. Upside-down car pointed to the top-right corner of the image\n",
- "1. Car pointed to the bottom-left corner of the image\n",
- "1. Upside-down car pointed to the bottom-left corner of the image\n",
- "1. Car pointed to the bottom-right corner of the image\n",
- "1. Upside-down car pointed to the bottom-right corner of the image\n",
- "\n",
- "Display the images using `matplotlib.pyplot`.\n",
- " \n",
- "Hint: Check out the `getRotationMatrix2D` and `warpAffine` methods.\n"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "CYZEXNK1VDIJ"
- },
- "source": [
- "### **Student Solution**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "TI_WxOyjcfNu"
- },
- "outputs": [],
- "source": [
- "# Your answer goes here"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "4lI64FdEeP1-"
- },
- "source": [
- "---"
- ]
- }
- ],
- "metadata": {
- "colab": {
- "collapsed_sections": [
- "copyright",
- "exercise-1-key-1"
- ],
- "include_colab_link": true,
- "name": "OpenCV",
- "private_outputs": true,
- "provenance": [],
- "toc_visible": true
- },
- "kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
diff --git a/01_open_cv_DONE.ipynb b/01_open_cv_DONE.ipynb
new file mode 100644
index 0000000..14eeb53
--- /dev/null
+++ b/01_open_cv_DONE.ipynb
@@ -0,0 +1,884 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "01-open_cv_DONE.ipynb",
+ "private_outputs": true,
+ "provenance": [],
+ "collapsed_sections": [
+ "copyright",
+ "exercise-1-key-1"
+ ],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "copyright"
+ },
+ "source": [
+ "#### Copyright 2020 Google LLC."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "7PLP9Q30PKtv"
+ },
+ "source": [
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "f5W9rkuBmBu9"
+ },
+ "source": [
+ "# OpenCV"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zIykBQbYXrXA"
+ },
+ "source": [
+ "[OpenCV](https://opencv.org/) is an open-source computer vision library. It comes packaged with many powerful computer vision tools, including image and video processing utilities. Much of its functionality overlaps with the [Python Imaging Library (PIL)](https://python-pillow.org/), but it also provides computer vision features that PIL lacks.\n",
+ "\n",
+ "In this lesson we will learn how to use OpenCV to process images."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "G8u2lYRWbE37"
+ },
+ "source": [
+ "## Load an Image"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RRFdZwtwLKKV"
+ },
+ "source": [
+ "Start by downloading a small (640x360) version of [this image of a car](https://pixabay.com/illustrations/car-sports-car-racing-car-speed-49278/) from Pixabay and then uploading it to this Colab.\n",
+ "\n",
+ "**Be sure to load the small 640x360 version of the image for this lab.**\n",
+ "\n",
+ "After uploading the image, we can load it with OpenCV's `imread` function and view it with matplotlib."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "RmoZ6R9bKnEH"
+ },
+ "source": [
+ "import cv2 as cv\n",
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "image_file = 'car-49278_640.jpg'\n",
+ "\n",
+ "image = cv.imread(image_file)\n",
+ "\n",
+ "plt.imshow(image)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "GVzr4XfQLO23"
+ },
+ "source": [
+ "### Color Ordering"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "bQTzqEf1K3Tv"
+ },
+ "source": [
+ "Does something look off? Wasn't the car red when we downloaded the image?\n",
+ "\n",
+ "OpenCV stores the color channels of an image in blue-green-red (BGR) order instead of [red-green-blue (RGB)](https://en.wikipedia.org/wiki/RGB_color_model), but matplotlib assumes RGB. As a result, the red and blue channels are swapped when the image is displayed.\n",
+ "\n",
+ "Why does OpenCV assume images are BGR?\n",
+ "\n",
+ "BGR was historically a popular format among digital camera manufacturers and software packages, so it was a reasonable default at the time. Defaults are difficult to change, so BGR is here to stay in OpenCV.\n",
+ "\n",
+ "It doesn't really matter which channel order is used, as long as the inputs to our model are consistent. However, images with swapped colors are hard to inspect, so it helps to know how to convert between the two.\n",
+ "\n",
+ "Luckily, converting from BGR to RGB is easy with `cvtColor`, which supports [scores of conversions](https://docs.opencv.org/3.1.0/d7/d1b/group__imgproc__misc.html) between color spaces."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "OrxvjU6gK7Fy"
+ },
+ "source": [
+ "image = cv.cvtColor(image, cv.COLOR_BGR2RGB)\n",
+ "\n",
+ "plt.imshow(image)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "nYyEWvL8Lf8H"
+ },
+ "source": [
+ "## Drawing on Images"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yMGhqtBvLilk"
+ },
+ "source": [
+ "### Drawing Rectangles on Images\n",
+ "\n",
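+ "Suppose we want to draw a rectangle around objects we identify in an image. We can do this with OpenCV's `rectangle` function, which takes the top-left and bottom-right corners, a color, and a line thickness. Note that it draws directly on the image it is given."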
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "mQxOqUiSLhoN"
+ },
+ "source": [
+ "left = 100\n",
+ "right = 580\n",
+ "top = 100\n",
+ "bottom = 300\n",
+ "\n",
+ "r = 255\n",
+ "g = 0\n",
+ "b = 0\n",
+ "\n",
+ "cv.rectangle(image, (left, top), (right, bottom), (r, g, b), thickness=3)\n",
+ "plt.imshow(image)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-eyAAN5ZLqcX"
+ },
+ "source": [
+ "### Drawing Text on Images\n",
+ "\n",
+ "You can also draw text on images using `putText`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ytbjF3IxLuzl"
+ },
+ "source": [
+ "left = 150\n",
+ "top = 50\n",
+ "\n",
+ "r = 0\n",
+ "g = 0\n",
+ "b = 0\n",
+ "scale = 1.0\n",
+ "thickness = 2\n",
+ "\n",
+ "cv.putText(image, \"It is a car!\", (left, top), cv.FONT_HERSHEY_SIMPLEX, scale,\n",
+ " [r, g, b], thickness)\n",
+ "\n",
+ "plt.imshow(image)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MkLGTqteLEHC"
+ },
+ "source": [
+ "## Image Scaling"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "NW-RlIP9LHfx"
+ },
+ "source": [
+ "Many models are trained on images of one specific size and expect every input to have that same size. The simplest solution is to resize the image to the required size using the `resize` function.\n",
+ " \n",
+ "In the example below, we scale the 640x360 image to `300x300` pixels. Because this changes the aspect ratio, the car looks noticeably squashed, which might affect the model's training and predictions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "KzsGZK3mLW5t"
+ },
+ "source": [
+ "image_scaled = cv.resize(image, (300, 300))\n",
+ "\n",
+ "plt.imshow(image_scaled)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mBKsIs4ykMtR"
+ },
+ "source": [
+ "## Cropping With Edge Detection"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mGr77pUvkROt"
+ },
+ "source": [
+ "Another strategy is to first crop the image to the object using edge detection and then scale the cropped image. This strategy can be error-prone, but it can also be very helpful for isolating individual objects in an image.\n",
+ "\n",
+ "In the case of the car image that we have loaded, cropping based on edge detection is both simple and effective. In images with more noise in the background, automatic cropping will be much more difficult."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DbHQuRbjk29f"
+ },
+ "source": [
+ "To begin cropping, we'll rely on OpenCV's implementation of the [Canny](https://docs.opencv.org/2.4/modules/imgproc/doc/feature_detection.html?highlight=canny#canny) edge detection algorithm."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "LlNs5Wb-ZOz3"
+ },
+ "source": [
+ "threshold = 200\n",
+ "image = cv.imread(image_file)\n",
+ "image = cv.cvtColor(image, cv.COLOR_BGR2RGB)\n",
+ "edges = cv.Canny(image, threshold, threshold*2)\n",
+ "\n",
+ "fig, (orig, edge) = plt.subplots(2)\n",
+ "orig.imshow(image)\n",
+ "edge.imshow(edges, cmap='gray')\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2L5zF8EomT7v"
+ },
+ "source": [
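+ "`Canny` takes two thresholds that must be tuned to the images you are processing. Gradients stronger than the upper threshold are marked as edges, gradients weaker than the lower threshold are discarded, and gradients in between are kept only if they connect to a strong edge. In this lesson we set the upper threshold to twice the lower one and call the lower value `threshold`. More details can be found on [Canny's Wikipedia page](https://en.wikipedia.org/wiki/Canny_edge_detector).\n",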
+ "\n",
+ "Let's see a few different thresholds in action."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "8TQu_1D8mTiE"
+ },
+ "source": [
+ "fig, (orig, t10, t50, t100, t200, t300, t500) = plt.subplots(7, figsize=(5, 25))\n",
+ "\n",
+ "orig.imshow(image)\n",
+ "t10.imshow(cv.Canny(image, 10, 10*2), cmap='gray')\n",
+ "t50.imshow(cv.Canny(image, 50, 50*2), cmap='gray')\n",
+ "t100.imshow(cv.Canny(image, 100, 100*2), cmap='gray')\n",
+ "t200.imshow(cv.Canny(image, 200, 200*2), cmap='gray')\n",
+ "t300.imshow(cv.Canny(image, 300, 300*2), cmap='gray')\n",
+ "t500.imshow(cv.Canny(image, 500, 500*2), cmap='gray')\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ayTn7dXyoa_J"
+ },
+ "source": [
+ "None of these settings does too badly, though a threshold of 10 picks up a lot of noise and a threshold of 500 barely outlines the car. Remember that our goal is to build a bounding box around the car and crop the image to that box."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "a8zzsxWko1G0"
+ },
+ "source": [
+ "Edge detection also tends to work better on a grayscale image that has been slightly blurred, because blurring smooths out small pixel-level variations that would otherwise show up as spurious edges.\n",
+ " \n",
+ "First let's convert the image to grayscale."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "IZSbCNDXaGSV"
+ },
+ "source": [
+ "img_gray = cv.cvtColor(image, cv.COLOR_RGB2GRAY)\n",
+ "_ = plt.imshow(img_gray, cmap='gray')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MHRWyjhYpN3P"
+ },
+ "source": [
+ "Now we'll blur the image slightly using `blur`, which averages each pixel with its neighbors in a 3x3 window."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "LIUmsFFrpL-1"
+ },
+ "source": [
+ "img_gray = cv.blur(img_gray, (3,3))\n",
+ "_ = plt.imshow(img_gray, cmap='gray')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-anEZ9NopW37"
+ },
+ "source": [
+ "Given this new grayscale and blurred image, we can run the edge detection algorithm again."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tbyr6n9Opcrt"
+ },
+ "source": [
+ "fig, (orig, t10, t50, t100, t200, t300, t500) = plt.subplots(7, figsize=(5, 25))\n",
+ "\n",
+ "orig.imshow(img_gray, cmap='gray')\n",
+ "t10.imshow(cv.Canny(img_gray, 10, 10*2), cmap='gray')\n",
+ "t50.imshow(cv.Canny(img_gray, 50, 50*2), cmap='gray')\n",
+ "t100.imshow(cv.Canny(img_gray, 100, 100*2), cmap='gray')\n",
+ "t200.imshow(cv.Canny(img_gray, 200, 200*2), cmap='gray')\n",
+ "t300.imshow(cv.Canny(img_gray, 300, 300*2), cmap='gray')\n",
+ "t500.imshow(cv.Canny(img_gray, 500, 500*2), cmap='gray')\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DkIIp4JZpnlz"
+ },
+ "source": [
+ "In this case our edges completely disappear at higher thresholds!\n",
+ "\n",
+ "The threshold of 200 seemed to perform reasonably well in both situations, so let's stick with that."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "P4KI438bp7Dv"
+ },
+ "source": [
+ "img_canny = cv.Canny(img_gray, 200, 200*2)\n",
+ "\n",
+ "plt.imshow(img_canny, cmap='gray')\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "grY78Cwwp-Qk"
+ },
+ "source": [
+ "We now need to find the bounding box around the object that we want to crop. The first step is to use the [findContours](https://docs.opencv.org/3.4/d3/dc0/group__imgproc__shape.html#ga17ed9f5d79ae97bd4c7cf18403e1689a) function, which finds the contours in the binary edge map produced by Canny. In OpenCV 4 it returns two values: the list of contours and a hierarchy describing how they nest, which we discard. Each contour is an array of $(x, y)$ points."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Thhz47TwrFow"
+ },
+ "source": [
+ "contours, _ = cv.findContours(img_canny, cv.RETR_TREE, cv.CHAIN_APPROX_SIMPLE)\n",
+ "\n",
+ "print(len(contours))\n",
+ "print(contours[0])"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "L41cRvQHr5IA"
+ },
+ "source": [
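+ "Given the contours, we approximate each one with a simpler polygon using `approxPolyDP`, allowing the polygon to deviate from the contour by at most 3 pixels. We then compute an upright bounding box around each polygon with `boundingRect`, which returns `(x, y, width, height)`."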
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "dSBmgd_fbHJn"
+ },
+ "source": [
+ "bounding_boxes = []\n",
+ "contours_poly = []\n",
+ "\n",
+ "for contour in contours:\n",
+ " polygon = cv.approxPolyDP(contour, 3, True)\n",
+ " contours_poly.append(polygon)\n",
+ " bounding_boxes.append(cv.boundingRect(polygon))\n",
+ "\n",
+ "print(len(contours_poly))\n",
+ "print(len(bounding_boxes))\n",
+ "print(bounding_boxes)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EQ1NJtuTUS9g"
+ },
+ "source": [
+ "Let's take a look at all of the bounding boxes on the car."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "GEDykg6JUABc"
+ },
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "image_copy = np.copy(image)\n",
+ "\n",
+ "# Each box is (x, y, width, height): draw from its top-left to its bottom-right corner\n",
+ "for box in bounding_boxes:\n",
+ " cv.rectangle(image_copy, (box[0], box[1]), (box[0]+box[2], box[1]+box[3]), [0, 0, 255], 2)\n",
+ "\n",
+ "_ = plt.imshow(image_copy)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "AHBNnusgUb2X"
+ },
+ "source": [
+ "No single box captures the entire car, but we can combine the boxes into one box that spans their outer edges.\n",
+ "\n",
+ "We'll use a very simple algorithm: take the smallest left and top coordinates and the largest right and bottom coordinates across all boxes. It includes every box, even ones caused by background noise, so in practice you'd likely want to use a more sophisticated algorithm."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "xO_Tj80rUrbI"
+ },
+ "source": [
+ "x1, y1, x2, y2 = image.shape[1], image.shape[0], 0, 0\n",
+ "\n",
+ "for box in bounding_boxes:\n",
+ " if box[0] < x1:\n",
+ " x1 = box[0]\n",
+ " if box[1] < y1:\n",
+ " y1 = box[1]\n",
+ " if box[0] + box[2] > x2:\n",
+ " x2 = box[0] + box[2]\n",
+ " if box[1] + box[3] > y2:\n",
+ " y2 = box[1] + box[3]\n",
+ "\n",
+ "x1, y1, x2, y2"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vUB3uIE5VWtm"
+ },
+ "source": [
+ "And then we can draw the box."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "rS2tJnyjVYUh"
+ },
+ "source": [
+ "import numpy as np\n",
+ "\n",
+ "image_copy = np.copy(image)\n",
+ "\n",
+ "cv.rectangle(image_copy, (x1, y1), (x2, y2), [255, 10, 10], 3)\n",
+ "\n",
+ "_ = plt.imshow(image_copy)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "_DD8Kbvo2BWD"
+ },
+ "source": [
+ "The box does clip the car a bit, but for the most part, the car is within the box."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "AwKwraUa2GT3"
+ },
+ "source": [
+ "Now we need to crop the image to just the car itself.\n",
+ "\n",
+ "Notice that the slice lists the `y` range before the `x` range. NumPy indexes an image as `[row, column]`: rows run along the vertical (`y`) axis and columns along the horizontal (`x`) axis."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6aM1Q8cid_Xv"
+ },
+ "source": [
+ "# Slice rows (y) first, then columns (x)\n",
+ "cropped_img = image[y1:y2, x1:x2]\n",
+ "_ = plt.imshow(cropped_img)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "bYccBq4k2bQv"
+ },
+ "source": [
+ "Next, we make the image square by padding it. We find the longer side and then pad both ends of the shorter side with enough pixels to match it, so that resizing later won't distort the car.\n",
+ "\n",
+ "To add the padding we use OpenCV's `copyMakeBorder` function."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tygNEuGXe2Aj"
+ },
+ "source": [
+ "height = cropped_img.shape[0]\n",
+ "width = cropped_img.shape[1]\n",
+ "\n",
+ "left_pad, right_pad, top_pad, bottom_pad = 0, 0, 0, 0\n",
+ "if height > width:\n",
+ " left_pad = (height - width) // 2\n",
+ " right_pad = height - width - left_pad\n",
+ "elif width > height:\n",
+ " top_pad = (width - height) // 2\n",
+ " bottom_pad = width - height - top_pad\n",
+ "\n",
+ "img_square = cv.copyMakeBorder(\n",
+ " cropped_img,\n",
+ " top_pad,\n",
+ " bottom_pad,\n",
+ " left_pad,\n",
+ " right_pad,\n",
+ " cv.BORDER_CONSTANT,\n",
+ " value=(255,255,255))\n",
+ "\n",
+ "_ = plt.imshow(img_square)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-XW_qEgu23z9"
+ },
+ "source": [
+ "Finally, we use OpenCV's `resize` function again to scale the square image down to 300x300 pixels to feed to our model. Because the image is already square, this no longer distorts the car."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Im_q3Xf3ggP2"
+ },
+ "source": [
+ "image_scaled = cv.resize(img_square, (300, 300))\n",
+ "\n",
+ "plt.imshow(image_scaled)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Xo1wnMSl3BuN"
+ },
+ "source": [
+ "## Rotating Images"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MRANxOVN3Hgn"
+ },
+ "source": [
+ "It is sometimes useful to add rotated or flipped copies of your images before feeding them to your model. This increases the size of your training data, and it makes your model more resilient to incidental patterns that might exist within your base images.\n",
+ " \n",
+ "For example, in a popular fashion image dataset, most boots are pointed in one direction and sandals in the other. When the model attempts to identify a boot pointed in the wrong direction, it will often predict 'sandal' based purely on the orientation of the object.\n",
+ " \n",
+ "To flip an image on the horizontal or vertical axis, we can just use the `flip` function.\n",
+ " \n",
+ "Here is an example of flipping an image around the horizontal (x) axis, which turns it upside down. A flip code of `0` selects this axis."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "1x9PemhtgrRv"
+ },
+ "source": [
+ "horizontal_img = cv.flip(image_scaled, 0)\n",
+ "plt.imshow(horizontal_img)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "A7IbPf3l4tbt"
+ },
+ "source": [
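+ "And now around the vertical (y) axis, which mirrors the image left to right. A flip code of `1` (or any positive value) selects this axis."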
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ORzVDhHfgyEy"
+ },
+ "source": [
+ "vertical_img = cv.flip(image_scaled, 1)\n",
+ "plt.imshow(vertical_img)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HfXe8OOm4vPZ"
+ },
+ "source": [
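+ "And finally, both axes at once, using a negative flip code. Flipping around both axes is equivalent to rotating the image by 180 degrees."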
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "hhhdEiutg250"
+ },
+ "source": [
+ "horizontal_and_vertical_img = cv.flip(image_scaled, -1)\n",
+ "plt.imshow(horizontal_and_vertical_img)\n",
+ "plt.show()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5WFLU7hWbO8_"
+ },
+ "source": [
+ "# Resources\n",
+ "\n",
+ "* [OpenCV Tutorial: Creating Bounding Boxes and Circles for Contours](https://docs.opencv.org/3.4/da/d0c/tutorial_bounding_rects_circles.html)\n",
+ "* Canny Edge Detector: [Wikipedia](https://en.wikipedia.org/wiki/Canny_edge_detector), [OpenCV Documentation](https://docs.opencv.org/3.1.0/da/d22/tutorial_py_canny.html)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Swt2fxm-fG_B"
+ },
+ "source": [
+ "# Exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vaiwWLygsq5M"
+ },
+ "source": [
+ "## Exercise 1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "iWq38ASlb2aY"
+ },
+ "source": [
+ "We have seen how to flip an image around its horizontal and vertical axes. This technique works well for increasing the size of your training set and improving your model, while also making it more resilient to biases that might be hidden in your data.\n",
+ "\n",
+ "It is also possible to rotate an image by different angles.\n",
+ "\n",
+ "Use OpenCV to rotate the `image_scaled` image from above so that the car is angled at 45 degrees, pointing toward a corner of the square image. Do this for each of the four corners.\n",
+ " \n",
+ "There should be eight images in total. The order of the images isn't important, but the variety is. There should be one image for each case below:\n",
+ " \n",
+ "1. Car pointed to the top-left corner of the image\n",
+ "1. Upside-down car pointed to the top-left corner of the image\n",
+ "1. Car pointed to the top-right corner of the image\n",
+ "1. Upside-down car pointed to the top-right corner of the image\n",
+ "1. Car pointed to the bottom-left corner of the image\n",
+ "1. Upside-down car pointed to the bottom-left corner of the image\n",
+ "1. Car pointed to the bottom-right corner of the image\n",
+ "1. Upside-down car pointed to the bottom-right corner of the image\n",
+ "\n",
+ "Display the images using `matplotlib.pyplot`.\n",
+ " \n",
+ "Hint: Check out the `getRotationMatrix2D` and `warpAffine` functions.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CYZEXNK1VDIJ"
+ },
+ "source": [
+ "### **Student Solution**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "TI_WxOyjcfNu"
+ },
+ "source": [
+ "# Your answer goes here\n",
+ "center_image = tuple(np.array(image_scaled.shape[1::-1]) / 2)\n",
+ "\n",
+ "angles = [135, 45, 225, 315]\n",
+ "\n",
+ "\n",
+ "for angle in angles:\n",
+ " # Build a 2x3 affine matrix that rotates around the image center (scale 1)\n",
+ " matrix = cv.getRotationMatrix2D(center_image, angle, 1)\n",
+ " # Apply the rotation to both the upright and the upside-down (flipped) image\n",
+ " warp_1 = cv.warpAffine(image_scaled, matrix, image_scaled.shape[1::-1], flags=cv.INTER_LINEAR)\n",
+ " warp_2 = cv.warpAffine(horizontal_img, matrix, image_scaled.shape[1::-1], flags=cv.INTER_LINEAR)\n",
+ "\n",
+ " plt.imshow(warp_1)\n",
+ " plt.show()\n",
+ " plt.imshow(warp_2)\n",
+ " plt.show()\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "4lI64FdEeP1-"
+ },
+ "source": [
+ "---"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/02-video_processing.ipynb b/02-video_processing.ipynb
deleted file mode 100644
index cdcfae3..0000000
--- a/02-video_processing.ipynb
+++ /dev/null
@@ -1,503 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "view-in-github"
- },
- "source": [
- ""
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "copyright"
- },
- "source": [
- "#### Copyright 2020 Google LLC."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "khlO4Bu21oZ4"
- },
- "outputs": [],
- "source": [
- "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
- "# you may not use this file except in compliance with the License.\n",
- "# You may obtain a copy of the License at\n",
- "#\n",
- "# https://www.apache.org/licenses/LICENSE-2.0\n",
- "#\n",
- "# Unless required by applicable law or agreed to in writing, software\n",
- "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
- "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
- "# See the License for the specific language governing permissions and\n",
- "# limitations under the License."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "AlzIlBsScJJ_"
- },
- "source": [
- "# Video Processing"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "nTirVS4FWaPx"
- },
- "source": [
- "In this lesson we will process video data using the [OpenCV](https://opencv.org/) Python library."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "QmPyT9q4fEyp"
- },
- "source": [
- "## Obtain a Video"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "PmhKfT5OfIET"
- },
- "source": [
- "Let's start by uploading the smallest version of [this video](https://pixabay.com/videos/cars-motorway-speed-motion-traffic-1900/) to the Colab. Rename the video to `cars.mp4` or change the name of the video in the code below."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "K0Z3-prQMBph"
- },
- "source": [
- "## Reading the Video\n",
- "\n",
- "OpenCV is an open source library for performing computer vision tasks. One of these tasks is reading and writing video frames. To read the `cars.mp4` video file, we use the [VideoCapture](https://docs.opencv.org/2.4/modules/highgui/doc/reading_and_writing_images_and_video.html#videocapture) class."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "xwwkoH0WMArG"
- },
- "outputs": [],
- "source": [
- "import cv2 as cv\n",
- "\n",
- "cars_video = cv.VideoCapture('cars.mp4')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "jzxNrZmigdz5"
- },
- "source": [
- "Once you have created a `VideoCapture` object, you can obtain information about the video that you are processing."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "vHhBlRVFgiBu"
- },
- "outputs": [],
- "source": [
- "height = int(cars_video.get(cv.CAP_PROP_FRAME_HEIGHT))\n",
- "width = int(cars_video.get(cv.CAP_PROP_FRAME_WIDTH))\n",
- "fps = cars_video.get(cv.CAP_PROP_FPS)\n",
- "total_frames = int(cars_video.get(cv.CAP_PROP_FRAME_COUNT))\n",
- "\n",
- "print(f'height: {height}')\n",
- "print(f'width: {width}')\n",
- "print(f'frames per second: {fps}')\n",
- "print(f'total frames: {total_frames}')\n",
- "print(f'video length (seconds): {total_frames / fps}')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "yUQ5ZW6OhHgM"
- },
- "source": [
- "When you are done processing a video file, it is a good idea to release the VideoCapture to free up memory in your program."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "o_Ubw7Wlgk52"
- },
- "outputs": [],
- "source": [
- "cars_video.release()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "0pRQOHZ9nxIE"
- },
- "source": [
- "We can now loop through the video frame by frame. To do this we need to know the total number of frames in the video. For each frame we set the current frame position and then read that frame. This causes the frame to be loaded from disk into memory. This is done because videos can be enormous in size, so we don't necessarily want the entire thing in memory.\n",
- " \n",
- "You might also notice that we read the frame from the car's video, and then we check the return value to make sure that the read was successful. This is because the underlying video processing library is written in the C++ programming language, and a common practice in that language is to return a status code indicating if a function succeeds or not. This isn't very idiomatic in Python; it is just the underlying library's style leaking through into the Python wrapper."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "c7j0yculnH4-"
- },
- "outputs": [],
- "source": [
- "cars_video = cv.VideoCapture('cars.mp4')\n",
- "\n",
- "total_frames = int(cars_video.get(cv.CAP_PROP_FRAME_COUNT))\n",
- "\n",
- "frames_read = 0\n",
- "\n",
- "for current_frame in range(0, total_frames):\n",
- " cars_video.set(cv.CAP_PROP_POS_FRAMES, current_frame)\n",
- " ret, _ = cars_video.read()\n",
- " if not ret:\n",
- " raise Exception(f'Problem reading frame {current_frame} from video')\n",
- " if (current_frame+1) % 50 == 0:\n",
- " print(f'Read {current_frame+1} frames so far')\n",
- "\n",
- "cars_video.release()\n",
- "\n",
- "print(f'Read {total_frames} frames')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "2tNLbfYtn74T"
- },
- "source": [
- "That code took a while to execute. The video is just over a minute long, and it takes a while to iterate over every frame. Consider the amount of time it would take to perform object recognition on each frame.\n",
- "\n",
- "In practice you will be doing this kind of processing on a much bigger machine, or machines, than Colab provides for free. You can also process many frames in parallel.\n",
- "\n",
- "For our purposes, let's just make the video shorter."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "47T_WTF0i_Fd"
- },
- "source": [
- "We'll load the video one more time, and then we'll read out a single frame to illustrate that the frame is just an image."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "xT7_5wLWi6nW"
- },
- "outputs": [],
- "source": [
- "import matplotlib.pyplot as plt\n",
- "\n",
- "cars_video = cv.VideoCapture('cars.mp4')\n",
- "cars_video.set(cv.CAP_PROP_POS_FRAMES, 123)\n",
- "ret, frame = cars_video.read()\n",
- "if not ret:\n",
- " raise Exception(f'Problem reading frame {current_frame} from video')\n",
- "\n",
- "cars_video.release()\n",
- "\n",
- "plt.imshow(frame)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "muBJqjcCoiIi"
- },
- "source": [
- "## Writing a Video\n",
- "\n",
- "OpenCV also supports writing video data. Let's loop through the long video that we have and save only one second of it into a new file."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "OZ_WjoBiknW_"
- },
- "source": [
- "First we need to open our input video and get information about the frame rate, height, and width."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "HAp-Xom7kuV6"
- },
- "outputs": [],
- "source": [
- "input_video = cv.VideoCapture('cars.mp4')\n",
- "\n",
- "height = int(input_video.get(cv.CAP_PROP_FRAME_HEIGHT))\n",
- "width = int(input_video.get(cv.CAP_PROP_FRAME_WIDTH))\n",
- "fps = input_video.get(cv.CAP_PROP_FPS)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "xL7bq84AkwVR"
- },
- "source": [
- "Using that information we can create a [VideoWriter](https://docs.opencv.org/2.4/modules/highgui/doc/reading_and_writing_images_and_video.html#videowriter) that we'll use to write the shorter video.\n",
- "\n",
- "Video can be encoded using many different formats. In order to tell OpenCV which format to use, we choose a \"four character code\" from [fourcc](https://www.fourcc.org/). In this case we use \"mp4v\" to keep our input and output files consistent."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "6BrWL2cJovEU"
- },
- "outputs": [],
- "source": [
- "fourcc = cv.VideoWriter_fourcc(*'mp4v')\n",
- "output_video = cv.VideoWriter('cars-short.mp4', fourcc, fps, (width, height))"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "z87NtCMOnMnL"
- },
- "source": [
- "Now we can loop through one second of video frames and write each frame to our output video."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "Ns_6ESqXyWSS"
- },
- "outputs": [],
- "source": [
- "for i in range(0, int(fps)):\n",
- " input_video.set(cv.CAP_PROP_POS_FRAMES, i)\n",
- " ret, frame = input_video.read()\n",
- " if not ret:\n",
- " raise Exception(\"Problem reading frame\", i, \" from video\")\n",
- " output_video.write(frame)"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "gN-eK_4knWWh"
- },
- "source": [
- "Once processing is complete, be sure to release the video objects from memory."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "IblYT5rUnaQ4"
- },
- "outputs": [],
- "source": [
- "input_video.release()\n",
- "output_video.release()"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "vRrkuRxrydtn"
- },
- "source": [
- "And now we can list the directory to see if our new file was created."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "yABTY1HoPcpC"
- },
- "outputs": [],
- "source": [
- "import os\n",
- "\n",
- "os.listdir('./')"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "cJiP6TUxqMM4"
- },
- "source": [
- "You should now see a `cars-short.mp4` file in your file browser in Colab. Download and view the video to make sure that it only lasts for a second."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "qc5FP_s2nqQg"
- },
- "source": [
- "Notice we have only concerned ourselves with the visual portion of the video. Videos contain both visual and auditory elements. OpenCV is only concerned with computer vision, so it doesn't handle audio processing."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "YTVUYxPwcHhp"
- },
- "source": [
- "# Exercises"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "LdIOgOHP1ces"
- },
- "source": [
- "## Exercise 1"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "jhTEOK1ZmqN8"
- },
- "source": [
- "Above we shortened our video to 1 second by simply grabbing the first second of frames from the video file. Since not much typically changes from frame to frame within a second of video, a better video processing technique is to sample frames throughout the entire video and skip some frames. For example, it might be more beneficial to process every 10th frame or only process 1 of the frames in every second of video.\n",
- "\n",
- "In this exercise, take the original cars video used in this Colab and reduce it to a short 25-fps (frames per second) video by grabbing the first frame of every second of video. Save the video as `cars-sampled.mp4`."
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "7XM35vYWSbim"
- },
- "source": [
- "### **Student Solution**"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {
- "colab": {},
- "colab_type": "code",
- "id": "ivTzfzQN5jDk"
- },
- "outputs": [],
- "source": [
- "# Your code goes here"
- ]
- },
- {
- "cell_type": "markdown",
- "metadata": {
- "colab_type": "text",
- "id": "ictdRRePTvA8"
- },
- "source": [
- "---"
- ]
- }
- ],
- "metadata": {
- "colab": {
- "collapsed_sections": [
- "copyright",
- "H3voR3OOxDv1"
- ],
- "include_colab_link": true,
- "name": "Video Processing",
- "private_outputs": true,
- "provenance": [],
- "toc_visible": true
- },
- "kernelspec": {
- "display_name": "Python 3",
- "name": "python3"
- }
- },
- "nbformat": 4,
- "nbformat_minor": 0
-}
diff --git a/02_video_processing_DONE.ipynb b/02_video_processing_DONE.ipynb
new file mode 100644
index 0000000..4808ee7
--- /dev/null
+++ b/02_video_processing_DONE.ipynb
@@ -0,0 +1,490 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "02-video_processing_DONE.ipynb",
+ "private_outputs": true,
+ "provenance": [],
+ "collapsed_sections": [
+ "copyright",
+ "H3voR3OOxDv1"
+ ]
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github"
+ },
+ "source": [
+ ""
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "copyright"
+ },
+ "source": [
+ "#### Copyright 2020 Google LLC."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "khlO4Bu21oZ4"
+ },
+ "source": [
+ "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+ "# you may not use this file except in compliance with the License.\n",
+ "# You may obtain a copy of the License at\n",
+ "#\n",
+ "# https://www.apache.org/licenses/LICENSE-2.0\n",
+ "#\n",
+ "# Unless required by applicable law or agreed to in writing, software\n",
+ "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+ "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+ "# See the License for the specific language governing permissions and\n",
+ "# limitations under the License."
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "AlzIlBsScJJ_"
+ },
+ "source": [
+ "# Video Processing"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "nTirVS4FWaPx"
+ },
+ "source": [
+ "In this lesson we will process video data using the [OpenCV](https://opencv.org/) Python library."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QmPyT9q4fEyp"
+ },
+ "source": [
+ "## Obtain a Video"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "PmhKfT5OfIET"
+ },
+ "source": [
+ "Let's start by uploading the smallest version of [this video](https://pixabay.com/videos/cars-motorway-speed-motion-traffic-1900/) to the Colab. Rename the video to `cars.mp4` or change the name of the video in the code below."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "K0Z3-prQMBph"
+ },
+ "source": [
+ "## Reading the Video\n",
+ "\n",
+ "OpenCV is an open source library for performing computer vision tasks. One of these tasks is reading and writing video frames. To read the `cars.mp4` video file, we use the [VideoCapture](https://docs.opencv.org/2.4/modules/highgui/doc/reading_and_writing_images_and_video.html#videocapture) class."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "5M6MBwCTCsLB"
+ },
+ "source": [
+ "from google.colab import files\n",
+ "\n",
+ "uploaded = files.upload()\n",
+ "\n",
+ "for fn in uploaded.keys():\n",
+ " print('User uploaded file \"{name}\" with length {length} bytes'.format(\n",
+ " name=fn, length=len(uploaded[fn])))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "xwwkoH0WMArG"
+ },
+ "source": [
+ "import cv2 as cv\n",
+ "\n",
+ "cars_video = cv.VideoCapture('cars.mp4')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jzxNrZmigdz5"
+ },
+ "source": [
+ "Once you have created a `VideoCapture` object, you can obtain information about the video that you are processing."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "vHhBlRVFgiBu"
+ },
+ "source": [
+ "height = int(cars_video.get(cv.CAP_PROP_FRAME_HEIGHT))\n",
+ "width = int(cars_video.get(cv.CAP_PROP_FRAME_WIDTH))\n",
+ "fps = cars_video.get(cv.CAP_PROP_FPS)\n",
+ "total_frames = int(cars_video.get(cv.CAP_PROP_FRAME_COUNT))\n",
+ "\n",
+ "print(f'height: {height}')\n",
+ "print(f'width: {width}')\n",
+ "print(f'frames per second: {fps}')\n",
+ "print(f'total frames: {total_frames}')\n",
+ "print(f'video length (seconds): {total_frames / fps}')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "yUQ5ZW6OhHgM"
+ },
+ "source": [
+ "When you are done processing a video file, it is a good idea to release the `VideoCapture` object to close the file and free the memory it uses."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "o_Ubw7Wlgk52"
+ },
+ "source": [
+ "cars_video.release()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0pRQOHZ9nxIE"
+ },
+ "source": [
+ "We can now loop through the video frame by frame. To do this, we need the total number of frames in the video. For each frame, we set the current frame position and then read that frame, which loads only that frame from disk into memory. Working one frame at a time matters because videos can be enormous, so we don't want the entire video in memory at once.\n",
+ " \n",
+ "You might also notice that each time we read a frame from `cars_video`, we check the return value to make sure that the read succeeded. This is because the underlying video processing library is written in C++, where a common practice is to return a status code indicating whether a function succeeded. This isn't very idiomatic in Python; it is the underlying library's style leaking through into the Python wrapper."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "c7j0yculnH4-"
+ },
+ "source": [
+ "cars_video = cv.VideoCapture('cars.mp4')\n",
+ "\n",
+ "total_frames = int(cars_video.get(cv.CAP_PROP_FRAME_COUNT))\n",
+ "\n",
+ "# Seek to each frame in turn, read it, and report progress every 50 frames.\n",
+ "\n",
+ "for current_frame in range(0, total_frames):\n",
+ " cars_video.set(cv.CAP_PROP_POS_FRAMES, current_frame)\n",
+ " ret, _ = cars_video.read()\n",
+ " if not ret:\n",
+ " raise Exception(f'Problem reading frame {current_frame} from video')\n",
+ " if (current_frame+1) % 50 == 0:\n",
+ " print(f'Read {current_frame+1} frames so far')\n",
+ "\n",
+ "cars_video.release()\n",
+ "\n",
+ "print(f'Read {total_frames} frames')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2tNLbfYtn74T"
+ },
+ "source": [
+ "That code took a while to execute. The video is just over a minute long, and seeking to and decoding every frame adds up. Now consider how long it would take to also perform object recognition on each frame.\n",
+ "\n",
+ "In practice you will be doing this kind of processing on a much bigger machine, or machines, than Colab provides for free. You can also process many frames in parallel.\n",
+ "\n",
+ "For our purposes, let's just make the video shorter."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "47T_WTF0i_Fd"
+ },
+ "source": [
+ "We'll load the video one more time, and then we'll read out a single frame to illustrate that the frame is just an image."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "xT7_5wLWi6nW"
+ },
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "\n",
+ "cars_video = cv.VideoCapture('cars.mp4')\n",
+ "cars_video.set(cv.CAP_PROP_POS_FRAMES, 123)\n",
+ "ret, frame = cars_video.read()\n",
+ "if not ret:\n",
+ "    raise Exception('Problem reading frame 123 from video')\n",
+ "\n",
+ "cars_video.release()\n",
+ "\n",
+ "plt.imshow(cv.cvtColor(frame, cv.COLOR_BGR2RGB))  # OpenCV frames are BGR; Matplotlib expects RGB"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "muBJqjcCoiIi"
+ },
+ "source": [
+ "## Writing a Video\n",
+ "\n",
+ "OpenCV also supports writing video data. Let's loop through the long video that we have and save only one second of it into a new file."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "OZ_WjoBiknW_"
+ },
+ "source": [
+ "First we need to open our input video and get information about the frame rate, height, and width."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "HAp-Xom7kuV6"
+ },
+ "source": [
+ "input_video = cv.VideoCapture('cars.mp4')\n",
+ "\n",
+ "height = int(input_video.get(cv.CAP_PROP_FRAME_HEIGHT))\n",
+ "width = int(input_video.get(cv.CAP_PROP_FRAME_WIDTH))\n",
+ "fps = input_video.get(cv.CAP_PROP_FPS)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xL7bq84AkwVR"
+ },
+ "source": [
+ "Using that information we can create a [VideoWriter](https://docs.opencv.org/2.4/modules/highgui/doc/reading_and_writing_images_and_video.html#videowriter) that we'll use to write the shorter video.\n",
+ "\n",
+ "Video can be encoded with many different codecs. To tell OpenCV which codec to use, we pass a \"four character code\" (FourCC); the [fourcc](https://www.fourcc.org/) site lists many of them. In this case we use \"mp4v\" so that our output, like our input, is an `.mp4` file."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6BrWL2cJovEU"
+ },
+ "source": [
+ "fourcc = cv.VideoWriter_fourcc(*'mp4v')\n",
+ "output_video = cv.VideoWriter('cars-short.mp4', fourcc, fps, (width, height))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "z87NtCMOnMnL"
+ },
+ "source": [
+ "Now we can loop through one second of video frames and write each frame to our output video."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Ns_6ESqXyWSS"
+ },
+ "source": [
+ "for i in range(0, int(fps)):\n",
+ "    input_video.set(cv.CAP_PROP_POS_FRAMES, i)\n",
+ "    ret, frame = input_video.read()\n",
+ "    if not ret:\n",
+ "        raise Exception(f'Problem reading frame {i} from video')\n",
+ "    output_video.write(frame)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "gN-eK_4knWWh"
+ },
+ "source": [
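+ "Once processing is complete, be sure to release both video objects. Releasing the `VideoWriter` also finalizes the output file, which may be incomplete or unplayable until you do."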
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "IblYT5rUnaQ4"
+ },
+ "source": [
+ "input_video.release()\n",
+ "output_video.release()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "vRrkuRxrydtn"
+ },
+ "source": [
+ "And now we can list the directory to see if our new file was created."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "yABTY1HoPcpC"
+ },
+ "source": [
+ "import os\n",
+ "\n",
+ "os.listdir('./')"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "cJiP6TUxqMM4"
+ },
+ "source": [
+ "You should now see a `cars-short.mp4` file in your file browser in Colab. Download and view the video to make sure that it only lasts for a second."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qc5FP_s2nqQg"
+ },
+ "source": [
+ "Notice that we have only dealt with the visual portion of the video. Videos often contain audio as well, but OpenCV is a computer vision library and its `VideoWriter` writes only image frames, so `cars-short.mp4` has no audio track."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YTVUYxPwcHhp"
+ },
+ "source": [
+ "# Exercises"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LdIOgOHP1ces"
+ },
+ "source": [
+ "## Exercise 1"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jhTEOK1ZmqN8"
+ },
+ "source": [
+ "Above, we shortened our video to one second by grabbing the frames from its first second. Because consecutive frames within a second of video usually differ very little, a better technique is often to sample frames throughout the entire video and skip the rest. For example, you might process every 10th frame, or only one frame from each second of video.\n",
+ "\n",
+ "In this exercise, take the original cars video used in this Colab and reduce it to a short 25-fps (frames per second) video by grabbing the first frame of every second of video. Save the video as `cars-sampled.mp4`."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7XM35vYWSbim"
+ },
+ "source": [
+ "### **Student Solution**"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ivTzfzQN5jDk"
+ },
+ "source": [
+ "input_video = cv.VideoCapture('cars.mp4')\n",
+ "\n",
+ "# Read the frame size, frame rate, and frame count.\n",
+ "height = int(input_video.get(cv.CAP_PROP_FRAME_HEIGHT))\n",
+ "width = int(input_video.get(cv.CAP_PROP_FRAME_WIDTH))\n",
+ "fps = input_video.get(cv.CAP_PROP_FPS)\n",
+ "total_frames = int(input_video.get(cv.CAP_PROP_FRAME_COUNT))\n",
+ "\n",
+ "# Create the output video with the same frame rate and size.\n",
+ "fourcc = cv.VideoWriter_fourcc(*'mp4v')\n",
+ "output_video = cv.VideoWriter('cars-sampled.mp4', fourcc, fps, (width, height))\n",
+ "\n",
+ "# Keep the first frame of every second (a step of roughly fps frames).\n",
+ "for i in range(0, total_frames, round(fps)):\n",
+ "    input_video.set(cv.CAP_PROP_POS_FRAMES, i)\n",
+ "    ret, frame = input_video.read()\n",
+ "    if not ret:\n",
+ "        raise Exception(f'Problem reading frame {i} from video')\n",
+ "    output_video.write(frame)\n",
+ "\n",
+ "input_video.release()\n",
+ "output_video.release()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ictdRRePTvA8"
+ },
+ "source": [
+ "---"
+ ]
+ }
+ ]
+}
\ No newline at end of file
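Exercise 1 asks for the first frame of every second. A fixed step such as `range(0, total_frames, 25)` works for this 25-fps video. When the frame rate is not a whole number (29.97 fps is common), stepping by `int(fps)` slowly drifts away from the true one-second marks. Below is a minimal, pure-Python sketch that instead computes each sample's frame index from its window number. The helper `sampled_frame_indices` and its `seconds_between` parameter are hypothetical names for illustration, not part of OpenCV. It assumes `total_frames` and `fps` come from `CAP_PROP_FRAME_COUNT` and `CAP_PROP_FPS`, as in the notebook cells.

```python
from typing import List


# Hypothetical helper for illustration; this is not part of OpenCV.
def sampled_frame_indices(total_frames: int, fps: float,
                          seconds_between: float = 1.0) -> List[int]:
    """Return the first frame index of each sampling window.

    Window k starts at time k * seconds_between seconds, i.e. at frame
    round(k * seconds_between * fps). Computing each index from k, rather
    than adding a fixed integer step, keeps the samples aligned with the
    one-second marks even when fps is fractional (e.g. 29.97).
    """
    if total_frames < 0:
        raise ValueError("total_frames must be non-negative")
    if fps <= 0 or seconds_between <= 0:
        raise ValueError("fps and seconds_between must be positive")

    indices = []
    k = 0
    while True:
        index = round(k * seconds_between * fps)
        if index >= total_frames:
            break
        indices.append(index)
        k += 1
    return indices


# For a whole-number frame rate this matches range(0, total_frames, 25).
print(sampled_frame_indices(100, 25.0))  # [0, 25, 50, 75]

# At 29.97 fps, a fixed step of int(fps) == 29 would put sample 100 at
# frame 2900; computing from k puts it at frame 2997 (the 100 s mark).
print(sampled_frame_indices(3000, 29.97)[100])  # 2997
```

Each returned index could be passed to `input_video.set(cv.CAP_PROP_POS_FRAMES, i)` in place of the fixed-step `range` loop. `CAP_PROP_FRAME_COUNT` may be only an estimate for some containers, so checking `ret` after each read is still worthwhile.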