bigqueryml_models_2025.md · maheshkr-code/gcp_professional_machine_learning_engineer_exam_notes

Category	Model Name	When to Use	Why (Technical Rationale)	Sample Use Case
Time Series	ARIMA_PLUS	You have a single metric tracked over time.	Automatically handles seasonality, holidays, and outliers.	Forecasting next month's daily electricity demand, or forecasting weekly sales from 6 years of seasonal CSV data to optimize inventory and staffing.
Time Series	ARIMA_PLUS_XREG	Time series + external factors.	Incorporates "outside" variables (e.g., weather) to improve accuracy.	Predicting sales based on history and local weather forecasts.
Diagnostics	CONTRIBUTION_ANALYSIS	Explaining the "root cause" of a change.	Sifts through segments to find exact drivers of a metric shift.	Explaining why revenue dropped 12% among mobile users.
Classification	LOGISTIC_REG	Fast, simple baseline for labels.	Highly interpretable; maps data to probabilities (0 to 1).	Determining if an email is "Spam" or "Not Spam."
Classification	BOOSTED_TREE_CLASSIFIER	Structured data; need top-tier accuracy.	Uses gradient boosting (XGBoost), where each new tree corrects the errors of the previous ones.	Identifying fraudulent bank transactions.
Classification	RANDOM_FOREST_CLASSIFIER	Noisy data; want to avoid overfitting.	Ensemble of decision trees using "majority vote" logic.	Predicting patient risk tiers based on health metrics.
Classification	DNN_CLASSIFIER	Massive datasets; non-linear patterns.	Deep Neural Networks find hidden relationships in "Big Data."	Identifying high-value customers from web logs.
Classification	WIDE_AND_DEEP_CLASSIFIER	Need both memorization and generalization.	Combines linear models (rules) with deep models (patterns).	Ranking items in a search result or app store.
Classification	AUTOML_CLASSIFIER	Highest accuracy; no manual tuning.	Automated search over model architectures and hyperparameters finds the best model.	Mission-critical loan default prediction.
Regression	LINEAR_REG	Predicting a continuous number (price, etc).	Finds the "best fit" straight line through data points.	Estimating used car price based on mileage.
Regression	BOOSTED_TREE_REGRESSOR	Numerical prediction on tabular data.	Superior for datasets with "jumps" or non-linear steps.	Predicting delivery wait times during peak hours.
Regression	RANDOM_FOREST_REGRESSOR	Robust prediction; less sensitive to outliers.	Averages many trees to create a stable, reliable output.	Estimating crop yield based on soil conditions.
Regression	DNN_REGRESSOR	Massive scale; complex targets.	Deep neural networks capture complex non-linear relationships in very large datasets.	Predicting precise energy output of wind farms.
Regression	WIDE_AND_DEEP_REGRESSOR	Many categories; need numerical accuracy.	Excellent for sparse data (many columns with zeros).	Predicting "likelihood to pay" scores.
Regression	AUTOML_REGRESSOR	High-performance; zero manual setup.	Automated feature engineering and model selection.	Predicting annual revenue for global enterprises.
Clustering	KMEANS	Find natural groups in unlabeled data.	Minimizes the distance between each point and its cluster centroid to form segments.	Grouping customers into "Budget" vs "Luxury" tiers.
Recommendation	MATRIX_FACTORIZATION	User ratings or interaction data.	Predicts user preferences via latent factor decomposition.	Recommending movies on a streaming platform.
Dim. Reduction	PCA	Too many columns; need to simplify.	Condenses high-dimensional data while retaining most of the variance.	Reducing 1,000 genomic markers into 10 key indicators.
Dim. Reduction	AUTOENCODER	Detect anomalies or compress data.	Learns to reconstruct its input; a high reconstruction error signals an anomaly.	Detecting suspicious patterns in network traffic.
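
Every model type in the table is selected through the `model_type` option of a BigQuery ML `CREATE MODEL` statement. The sketch below is a minimal illustration of that pattern, not an official client API: the helper names (`sql_literal`, `build_create_model_sql`), the dataset and table names (`demo.*`), and the column names (`week`, `sales`, `is_fraud`) are all hypothetical. It only builds the SQL text; running it would still require submitting the string to BigQuery. It also assumes the source table contains only the columns the model needs, since it uses `SELECT *`.

```python
def sql_literal(value):
    """Render a Python value as a BigQuery SQL literal (strings, numbers, bools, lists)."""
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, (list, tuple)):
        return "[" + ", ".join(sql_literal(v) for v in value) + "]"
    # Fall back to a single-quoted string, escaping embedded single quotes.
    return "'" + str(value).replace("'", "\\'") + "'"


def build_create_model_sql(model_name, model_type, source_table, options=None):
    """Build a CREATE MODEL statement; model_type is a value from the table above."""
    all_options = {"model_type": model_type, **(options or {})}
    rendered = ",\n  ".join(
        f"{key} = {sql_literal(val)}" for key, val in all_options.items()
    )
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS(\n  {rendered}\n) AS\n"
        f"SELECT * FROM `{source_table}`"
    )


# Time series row: weekly sales forecast with holiday effects (hypothetical names).
forecast_sql = build_create_model_sql(
    "demo.sales_forecast",
    "ARIMA_PLUS",
    "demo.weekly_sales",
    {
        "time_series_timestamp_col": "week",
        "time_series_data_col": "sales",
        "holiday_region": "US",
    },
)

# Classification row: fraud detection with a labeled column (hypothetical names).
fraud_sql = build_create_model_sql(
    "demo.fraud_model",
    "BOOSTED_TREE_CLASSIFIER",
    "demo.transactions",
    {"input_label_cols": ["is_fraud"]},
)

print(forecast_sql)
print(fraud_sql)
```

Swapping the `model_type` string (and the options that go with it) is usually all that changes between rows of the table; supervised models need `input_label_cols`, while the time series models need the timestamp and data column options instead.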