Machine Learning (ML) is one of the most exciting fields in tech today. But if you’re just starting out, it might seem complicated.
Don’t worry! In this guide, I’ll show you a simple and practical workflow to train your first ML model using Python and scikit-learn.
Whether you want to predict house prices or classify images, this step-by-step process will help you build a working ML model fast.
Every ML project starts with data.
For beginners, it’s best to use public datasets.
👉 Example: The famous Boston Housing dataset contains information about houses and their prices in Boston.
from sklearn.datasets import load_boston
data = load_boston()
X = data.data
y = data.targetData is rarely clean in the real world.
Preprocessing helps the model learn better.
Let’s split the data into training and testing sets:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)💡 Tip:
Scaling features using StandardScaler can improve performance but is optional here.
Let’s start with a simple Linear Regression model to predict house prices.
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)The model has now “learned” patterns from the training data.
Let’s see how well the model performs on unseen data.
from sklearn.metrics import mean_squared_error, r2_score
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
print(f"Mean Squared Error: {mse}")
print(f"R² Score: {r2}")🔔 Tip:
A higher R² (close to 1) means the model explains the data variance well.
Once satisfied, save the trained model using joblib:
import joblib
joblib.dump(model, 'house_price_model.pkl')
print("Model saved successfully.")This .pkl file can now be used anywhere.
Let’s load the saved model and test it on new data (we’ll reuse the test set for demonstration).
# Load the saved model
loaded_model = joblib.load('house_price_model.pkl')
# Make predictions
new_predictions = loaded_model.predict(X_test)
from sklearn.metrics import mean_squared_error, r2_score
new_mse = mean_squared_error(y_test, new_predictions)
new_r2 = r2_score(y_test, new_predictions)
print(f"Deployed Model - Mean Squared Error: {new_mse}")
print(f"Deployed Model - R² Score: {new_r2}")✅ Result:
You should see the same performance metrics as before, proving the model works after deployment.
Instead of guessing, we can automatically test multiple models and pick the best one based on performance.
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
models = {
"Linear Regression": LinearRegression(),
"Decision Tree": DecisionTreeRegressor(random_state=42),
"Random Forest": RandomForestRegressor(random_state=42),
"Support Vector Regressor": SVR()
}results = []
for name, model in models.items():
model.fit(X_train, y_train)
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
r2 = r2_score(y_test, predictions)
results.append({
"Model": name,
"MSE": mse,
"R2 Score": r2
})import pandas as pd
results_df = pd.DataFrame(results)
results_df = results_df.sort_values(by="R2 Score", ascending=False)
print(results_df)This gives you a clear table showing which model performed best based on R² score.
from sklearn.model_selection import cross_val_score
for name, model in models.items():
scores = cross_val_score(model, X, y, cv=5, scoring='r2')
print(f"{name} Average R² Score: {scores.mean():.4f}")This ensures the model isn’t just lucky with one specific data split.
You just learned how to:
✔ Train an ML model
✔ Evaluate it properly
✔ Save & deploy it
✔ Test it after deployment
✔ Automatically find the best model using multiple algorithms
✅ Try other models: KNN, Gradient Boosting
✅ Tune hyperparameters for better accuracy
✅ Visualize data & predictions using Matplotlib or Seaborn
✅ Build a CLI or simple web app using Flask that uses your trained model interactively
👉 Ready to level up your Machine Learning skills?
Start experimenting with your own datasets today 🚀