| Category | Model Name | When to Use | Why (Technical Rationale) | Sample Use Case |
| --- | --- | --- | --- | --- |
| Time Series | ARIMA_PLUS | A single metric tracked over time. | Automatically handles seasonality, holidays, and outliers. | Forecasting weekly sales for upcoming seasons from 6 years of seasonal CSV data to optimize inventory and staffing; forecasting next month's daily electricity demand. |
| Time Series | ARIMA_PLUS_XREG | A time series plus external factors. | Adds external ("outside") variables, such as weather, to the series' own history to improve accuracy. | Predicting sales from sales history and local weather forecasts. |
| Diagnostics | CONTRIBUTION_ANALYSIS | Explaining the root cause of a change in a metric. | Compares test data against control (baseline) data across segments to find the segments that drove the shift. | Explaining why revenue dropped 12% among mobile users. |
| Classification | LOGISTIC_REG | A fast, simple baseline for predicting labels. | Highly interpretable; maps features to a class probability between 0 and 1. | Determining whether an email is "Spam" or "Not Spam." |
| Classification | BOOSTED_TREE_CLASSIFIER | Structured (tabular) data where accuracy matters most. | Gradient boosting (XGBoost): each new tree is trained to correct the errors of the trees before it. | Identifying fraudulent bank transactions. |
| Classification | RANDOM_FOREST_CLASSIFIER | Noisy data where overfitting is a concern. | An ensemble of decision trees, each trained on a random sample of rows and features; the majority vote decides the class. | Predicting patient risk tiers from health metrics. |
| Classification | DNN_CLASSIFIER | Large datasets with non-linear patterns. | A deep neural network learns complex feature interactions directly from large volumes of data. | Identifying high-value customers from web logs. |
| Classification | DNN_LINEAR_COMBINED_CLASSIFIER (wide-and-deep) | Need both memorization and generalization. | Combines a wide linear model (memorizes specific feature combinations, i.e. "rules") with a deep network (generalizes to unseen patterns). | Ranking items in search results or an app store. |
| Classification | AUTOML_CLASSIFIER | Top accuracy without manual tuning. | Automated model search (e.g., neural architecture search) selects and tunes the model. | Mission-critical loan default prediction. |
| Regression | LINEAR_REG | Predicting a continuous number (e.g., a price). | Fits the best-fit (least-squares) straight line through the data points. | Estimating a used car's price from its mileage. |
| Regression | BOOSTED_TREE_REGRESSOR | Numerical prediction on tabular data. | Tree splits capture "jumps" and non-linear step changes that a straight line cannot. | Predicting delivery wait times during peak hours. |
| Regression | RANDOM_FOREST_REGRESSOR | Robust prediction that is less sensitive to outliers. | Averages the predictions of many trees into a stable, reliable output. | Estimating crop yield from soil conditions. |
| Regression | DNN_REGRESSOR | Massive scale; complex, non-linear targets. | Models intricate non-linear relationships, such as those in high-frequency or physics-driven data. | Predicting the precise energy output of wind farms. |
| Regression | DNN_LINEAR_COMBINED_REGRESSOR (wide-and-deep) | Many categorical features; need numerical accuracy. | The wide part handles sparse data (many columns that are mostly zeros) while the deep part generalizes. | Predicting "likelihood to pay" scores. |
| Regression | AUTOML_REGRESSOR | High performance with zero manual setup. | Automates feature engineering and model selection. | Predicting annual revenue for global enterprises. |
| Clustering | KMEANS | Finding natural groups in unlabeled data. | Assigns each point to the nearest of k cluster centers (centroids), minimizing within-cluster distance. | Grouping customers into "Budget" vs. "Luxury" tiers. |
| Recommendation | MATRIX_FACTORIZATION | User ratings or interaction data. | Predicts user preferences by decomposing the user-item matrix into latent factors. | Recommending movies on a streaming platform. |
| Dimensionality Reduction | PCA | Too many columns; need to simplify. | Condenses high-dimensional data into a few components that retain as much variance as possible. | Reducing 1,000 genomic markers to 10 key indicators. |
| Dimensionality Reduction | AUTOENCODER | Detecting anomalies or compressing data. | Learns to compress and then rebuild the data; a high reconstruction error signals an anomaly. | Detecting suspicious patterns in network traffic. |
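The CONTRIBUTION_ANALYSIS row compares a metric between control (baseline) and test data to find the segments behind a change. The managed model is considerably more involved (for example, it can consider combinations of dimensions), so the following is only a simplified single-dimension sketch in plain Python. The revenue rows and the `segment_contributions` helper are hypothetical, chosen to reproduce the table's "revenue dropped 12% among mobile users" example.

```python
from collections import defaultdict

# Hypothetical revenue rows: "control" is the baseline period, "test" the period after the drop.
control_rows = [
    {"device": "desktop", "revenue": 500.0},
    {"device": "mobile", "revenue": 400.0},
    {"device": "tablet", "revenue": 100.0},
]
test_rows = [
    {"device": "desktop", "revenue": 505.0},
    {"device": "mobile", "revenue": 352.0},
    {"device": "tablet", "revenue": 99.0},
]


def segment_contributions(control, test, dimension, metric):
    """Rank the segments of one dimension by how much they moved the metric's total."""
    before = defaultdict(float)
    after = defaultdict(float)
    for row in control:
        before[row[dimension]] += row[metric]
    for row in test:
        after[row[dimension]] += row[metric]

    total_change = sum(after.values()) - sum(before.values())
    results = []
    for segment in set(before) | set(after):
        change = after[segment] - before[segment]
        pct = 100.0 * change / before[segment] if before[segment] else float("nan")
        results.append((segment, change, pct))
    # Largest absolute change first: the likeliest "root cause".
    results.sort(key=lambda t: abs(t[1]), reverse=True)
    return total_change, results


total, ranked = segment_contributions(control_rows, test_rows, "device", "revenue")
print(total)      # -44.0
print(ranked[0])  # ('mobile', -48.0, -12.0)
```

Here the mobile segment alone lost more revenue (-48) than the overall total (-44), because desktop grew slightly; that kind of offsetting movement is exactly what a segment-level comparison exposes.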
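The LOGISTIC_REG row says the model maps data to a probability between 0 and 1. That mapping is the logistic (sigmoid) function applied to a weighted sum of the features. The sketch below shows only the prediction side; the two features, the weights, and the bias are made-up numbers standing in for values a trained model would learn.

```python
import math


def sigmoid(z: float) -> float:
    """Squash any real-valued score into a probability in [0, 1], avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """P(positive class) = sigmoid(w . x + b)."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical learned weights for two features:
# [occurrences of the word "free", number of links in the email]
weights = [1.2, 0.8]
bias = -3.0

for email in ([3, 2], [0, 1]):
    p = predict_proba(weights, bias, email)
    label = "Spam" if p >= 0.5 else "Not Spam"
    print(email, round(p, 2), label)
# [3, 2] 0.9 Spam
# [0, 1] 0.1 Not Spam
```

Because each weight adds directly to the score before the sigmoid, you can read off how much each feature pushes toward "Spam", which is where the model's interpretability comes from.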
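The boosted-tree rows (for example, BOOSTED_TREE_REGRESSOR) describe trees that learn from previous errors. The sketch below illustrates that idea with hand-rolled, squared-error gradient boosting over one-split trees ("stumps"). It is not XGBoost, which adds regularization, deeper trees, and second-order gradients, and the hourly wait-time data is made up.

```python
def fit_stump(xs, residuals):
    """Best single-split tree for the residuals, by squared error.

    Assumes xs contains at least two distinct values.
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate splits: x <= t goes left
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_val, right_val = best
    return lambda x: left_val if x <= t else right_val


def gradient_boost(xs, ys, n_rounds=200, learning_rate=0.3):
    """Squared-error boosting: each new stump is fit to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # errors the ensemble still makes
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]

    def predict(x):
        return base + sum(learning_rate * s(x) for s in stumps)

    return predict


# Made-up data: delivery wait (minutes) by hour of day, with a lunch-time peak.
hours = [8, 9, 10, 11, 12, 13, 14, 15]
wait_minutes = [10, 10, 10, 30, 30, 30, 10, 10]
model = gradient_boost(hours, wait_minutes)
print([round(model(h)) for h in hours])  # [10, 10, 10, 30, 30, 30, 10, 10]
```

No single split can represent the peak, but successive stumps fit to the leftover errors combine into the two "jumps", which is why tree ensembles suit step-shaped targets. The `learning_rate` shrinks each correction so that no single tree dominates.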
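The two random-forest rows differ only in how the trees' outputs are combined: a majority vote for RANDOM_FOREST_CLASSIFIER and an average for RANDOM_FOREST_REGRESSOR. The sketch below shows just that final aggregation step. The per-tree outputs are hypothetical stand-ins for already-trained trees, and training on random subsets of rows and features is not shown.

```python
from collections import Counter
from statistics import mean


def forest_classify(tree_votes):
    """Classifier output: the class that most trees voted for."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs):
    """Regressor output: the average of the trees' numeric predictions."""
    return mean(tree_outputs)


# Hypothetical outputs from five trees for one patient (risk tier) and one field (yield, t/ha).
print(forest_classify(["high", "medium", "high", "low", "high"]))  # high
print(forest_regress([4.2, 3.8, 4.0, 4.4, 3.6]))  # about 4.0
```

Averaging many de-correlated trees cancels out much of each individual tree's noise, which is the source of the "stable, reliable output" in the table.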
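The LINEAR_REG row describes fitting the best-fit straight line. For a single feature, ordinary least squares has a closed form: slope = cov(x, y) / var(x), and the intercept makes the line pass through the means. The mileage and price values below are made up and perfectly linear so that the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: mileage (thousands of km) vs. used-car price.
mileage = [20, 40, 60, 80]
price = [18_000, 16_000, 14_000, 12_000]
slope, intercept = fit_line(mileage, price)
print(slope, intercept)         # -100.0 20000.0
print(slope * 100 + intercept)  # 10000.0 (predicted price at 100k km)
```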
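The KMEANS row says the model groups points by minimizing their distance to cluster centers. A common way to do this is Lloyd's algorithm: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until nothing moves. The sketch below uses hand-picked starting centroids and made-up customer data; the managed model's initialization and stopping rules may differ.

```python
import math


def kmeans(points, initial_centroids, max_iters=100):
    """Lloyd's algorithm: alternate assigning points and moving centroids."""
    centroids = [tuple(c) for c in initial_centroids]

    def nearest(p):
        # Index of the centroid with the smallest Euclidean distance to p.
        return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(max_iters):
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Move each centroid to the mean of its members (keep it in place if empty).
        new_centroids = [
            tuple(sum(dim) / len(members) for dim in zip(*members)) if members else c
            for c, members in zip(centroids, clusters)
        ]
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, [nearest(p) for p in points]


# Made-up customers: (average order value in $, orders per year).
customers = [(20, 12), (25, 10), (30, 15), (300, 2), (350, 3), (280, 1)]
centroids, labels = kmeans(customers, initial_centroids=[customers[0], customers[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]  ("Budget" vs. "Luxury")
```

Because both columns feed a plain Euclidean distance, the larger-valued column (order value) dominates here. In practice you would usually standardize columns before clustering.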
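The MATRIX_FACTORIZATION row predicts preferences through latent factors: each user and each item gets a short vector, and a predicted rating is their dot product. The sketch below fits those vectors with stochastic gradient descent on the observed ratings only. SGD is one common fitting method and not necessarily what the managed model uses. The ratings, `k`, `lr`, `reg`, and `epochs` are arbitrary illustrative choices.

```python
import math
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.02, reg=0.02, epochs=3000, seed=0):
    """Learn k latent factors per user and per item with SGD on the observed ratings."""
    rng = random.Random(seed)
    users = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_users)]
    items = [[rng.uniform(0.0, 0.5) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(users, items, u, i)
            for f in range(k):
                uf, vf = users[u][f], items[i][f]
                # Gradient step on the squared error, with L2 regularization `reg`.
                users[u][f] += lr * (err * vf - reg * uf)
                items[i][f] += lr * (err * uf - reg * vf)
    return users, items


def predict(users, items, u, i):
    """Predicted rating = dot product of the user's and the item's latent vectors."""
    return sum(a * b for a, b in zip(users[u], items[i]))


# Made-up (user, movie, rating) triples; movies 0-1 are action, movies 2-3 are romance.
ratings = [
    (0, 0, 5), (0, 1, 4), (0, 2, 1),
    (1, 0, 5), (1, 3, 1),
    (2, 0, 1), (2, 2, 5), (2, 3, 4),
    (3, 1, 1), (3, 2, 5),
]
users, items = factorize(ratings, n_users=4, n_items=4)
rmse = math.sqrt(sum((r - predict(users, items, u, i)) ** 2 for u, i, r in ratings) / len(ratings))

# Score the movies user 1 has not rated yet; the highest score is the recommendation.
rated_by_user1 = {i for u, i, _ in ratings if u == 1}
scores = {i: predict(users, items, 1, i) for i in range(4) if i not in rated_by_user1}
print(round(rmse, 3), max(scores, key=scores.get))
```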
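The PCA row describes condensing columns while retaining variance. The first principal component is the top eigenvector of the data's covariance matrix. The toy sketch below finds it with power iteration on two made-up, strongly correlated columns and reports the fraction of variance it keeps. Power iteration is chosen here for brevity; production implementations typically use an SVD or a full eigendecomposition, and further components would require deflation.

```python
import math


def principal_component(rows, iters=100):
    """First principal component via power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)] for a in range(d)]

    v = [1.0] * d  # start vector; repeated multiplication by cov converges to the top eigenvector
    for _ in range(iters):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]

    top_variance = sum(v[a] * cov[a][b] * v[b] for a in range(d) for b in range(d))
    total_variance = sum(cov[j][j] for j in range(d))
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]  # 1-D projection
    return v, top_variance / total_variance, scores


# Made-up data: two strongly correlated measurements per sample.
samples = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
component, explained, scores = principal_component(samples)
print([round(x, 3) for x in component], round(explained, 4))
```

With these samples, one component keeps more than 99.9% of the variance, so the two columns can be replaced by the single `scores` column with almost no loss.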
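The AUTOENCODER row flags anomalies by how badly the model rebuilds them. The sketch below is a deliberately tiny linear autoencoder (2 inputs, 1 hidden unit, 2 outputs) trained by plain gradient descent. Real autoencoders stack non-linear layers, and a linear one like this learns essentially the same subspace as PCA. The "normal traffic" points, starting weights, learning rate, and the 3x threshold rule are all hypothetical.

```python
def train_linear_autoencoder(points, lr=0.01, steps=3000):
    """Linear autoencoder 2 -> 1 -> 2, trained on mean squared reconstruction error."""
    enc = [0.3, 0.1]  # hypothetical starting encoder weights
    dec = [0.3, 0.1]  # hypothetical starting decoder weights
    n = len(points)
    for _ in range(steps):
        g_enc = [0.0, 0.0]
        g_dec = [0.0, 0.0]
        for x in points:
            z = enc[0] * x[0] + enc[1] * x[1]              # compress to one number
            err = [dec[0] * z - x[0], dec[1] * z - x[1]]   # reconstruction minus input
            dz = 2 * (dec[0] * err[0] + dec[1] * err[1])   # d(loss)/dz
            for j in range(2):
                g_dec[j] += 2 * err[j] * z / n
                g_enc[j] += dz * x[j] / n
        for j in range(2):
            enc[j] -= lr * g_enc[j]
            dec[j] -= lr * g_dec[j]
    return enc, dec


def reconstruction_error(enc, dec, x):
    """Squared distance between x and its reconstruction."""
    z = enc[0] * x[0] + enc[1] * x[1]
    return (dec[0] * z - x[0]) ** 2 + (dec[1] * z - x[1]) ** 2


# Made-up "normal traffic" features that lie close to the line y = x.
normal = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.1), (-1.0, -0.9), (-2.0, -2.1), (-3.0, -2.9)]
enc, dec = train_linear_autoencoder(normal)
threshold = 3 * max(reconstruction_error(enc, dec, x) for x in normal)
suspicious = (3.0, -3.0)  # far off the pattern seen in training
print(reconstruction_error(enc, dec, suspicious) > threshold)  # True
```

The network only learns to rebuild points along the direction it saw during training, so an input pointing elsewhere comes back badly distorted. That large reconstruction error is the anomaly signal.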