+{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"provenance":[],"authorship_tag":"ABX9TyOUYrjEYAVUpeNCP4+c07lo"},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# K-means Clustering on the Iris Dataset\n","\n","This notebook demonstrates how to perform K-means clustering on the classic Iris dataset using Python, scikit-learn, and Google Colab.\n","K-means is an unsupervised machine learning algorithm that partitions data into a chosen number of clusters by assigning each sample to its nearest cluster center (centroid).\n","\n","**In this notebook, you will:**\n","- Load the Iris dataset directly from the UCI Machine Learning Repository\n","- Apply K-means clustering to group the samples into three clusters\n","- Visualize the resulting clusters on the first two features (sepal length and sepal width)\n","\n","No prior setup or downloads are required. Simply run each cell to see the results!"],"metadata":{"id":"4gKAjgYiDMQn"}},{"cell_type":"code","source":["# Step 1: Import libraries\n","import pandas as pd\n","from sklearn.cluster import KMeans\n","import matplotlib.pyplot as plt\n","\n","# Step 2: Load the Iris dataset from the UCI Machine Learning Repository.\n","# The file has no header row, so column names are supplied explicitly.\n","url = \"https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data\"\n","cols = ['sepal_length', 'sepal_width', 'petal_length', 'petal_width', 'species']\n","df = pd.read_csv(url, header=None, names=cols)\n","\n","# Step 3: Prepare data (drop the species label so clustering uses only the four measurements)\n","X = df.drop('species', axis=1)\n","\n","# Step 4: Run K-means clustering\n","# n_init is set explicitly so the result does not depend on the installed scikit-learn version's default.\n","kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)\n","df['cluster'] = kmeans.fit_predict(X)\n","\n","# Step 5: Show the first rows with their assigned cluster labels\n","print(df.head())\n","\n","# Step 6: Visualize clusters (only the first two of the four clustered features are plotted)\n","plt.scatter(df['sepal_length'], df['sepal_width'], c=df['cluster'])\n","plt.xlabel('Sepal Length')\n","plt.ylabel('Sepal Width')\n","plt.title('K-means Clusters on Iris Dataset')\n","plt.show()\n"],"metadata":{"id":"Ca52XxqOEC78"},"execution_count":null,"outputs":[]}]}