Added the material for XGBoost optimization #30
Since the latest changes still include perf data, this cannot be approved until the perf claim prerequisites are fulfilled.
david-cortes-intel
left a comment
General comment: this guide says 'xgboost', but it is limited to predictions/inference, while a similar guide could also be done for training, covering details like threading, hyperparameters to try, and similar.
razdoburdin
left a comment
Please update the installation instructions and consider switching to the current versions of the software.
…-learn, removing memory allocator section, and clarifying the scope to include all 3 methods
@rsiyer-intel Updated the doc with PDT-approved data.
Done and done.
For multiclass classification, default XGBoost, LightGBM, and daal4py all use one tree per class. CatBoost, on the other hand, uses vectorized trees. This means all other approaches end up processing `num_classes x` more trees compared to CatBoost, e.g., 7,000 vs 1,000 for Covtype. For smaller `num_estimators` like `100`, `daal4py` outperforms CatBoost, but as `num_estimators` gets larger, CatBoost provides better inference latency.
For multiclass classification, XGBoost, LightGBM, and daal4py (with default settings as of the tested versions) use one tree per class, while CatBoost uses symmetric (oblivious) trees that handle all classes in a single tree. This means daal4py ends up processing `num_classes × num_estimators` trees compared to CatBoost's `num_estimators` trees (e.g., 7,000 vs 1,000 for Covtype with 7 classes). As a result, CatBoost can provide better inference latency for multiclass tasks with many classes and large ensembles.
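The tree-count arithmetic in the paragraph above can be sketched in a few lines. This is only an illustration of the `num_classes × num_estimators` reasoning; `trees_evaluated` is a hypothetical helper, not part of any of the libraries mentioned, and the Covtype figures (7 classes, 1,000 estimators) are the ones quoted in the text.

```python
def trees_evaluated(num_estimators: int, num_classes: int, one_tree_per_class: bool) -> int:
    """Return how many trees are traversed per prediction (hypothetical helper).

    one_tree_per_class=True models the one-tree-per-class layout described above;
    False models a layout where a single tree produces outputs for all classes.
    """
    if one_tree_per_class:
        return num_estimators * num_classes
    return num_estimators


# Covtype example from the text: 7 classes, 1,000 estimators.
per_class_layout = trees_evaluated(1000, 7, one_tree_per_class=True)
single_tree_layout = trees_evaluated(1000, 7, one_tree_per_class=False)
print(per_class_layout, single_tree_layout)
```

The gap grows linearly with the number of classes, which is why the difference matters most for tasks with many classes and large ensembles.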
> **Note:** XGBoost is moving towards multi-output trees (via `multi_strategy="multi_output_tree"`) which would reduce this gap by handling all classes in a single tree, similar to CatBoost. Check the [XGBoost documentation](https://xgboost.readthedocs.io/en/latest/tutorials/multioutput.html) for the latest defaults.
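As a rough sketch of how the setting in the note might be selected, the snippet below builds the parameter dictionary for both strategies. `multiclass_params` is a hypothetical helper introduced for illustration. It assumes an XGBoost release that supports `multi_strategy` (2.0 or later) together with `tree_method="hist"`; check the linked documentation before relying on either assumption.

```python
def multiclass_params(num_classes: int, multi_output: bool = False) -> dict:
    """Build XGBoost training parameters for multiclass classification (hypothetical helper).

    Assumes an XGBoost version that accepts `multi_strategy` and the `hist` tree method.
    """
    params = {
        "objective": "multi:softprob",
        "num_class": num_classes,
        "tree_method": "hist",
    }
    if multi_output:
        # One tree produces outputs for every class instead of one tree per class.
        params["multi_strategy"] = "multi_output_tree"
    return params


per_class = multiclass_params(7)
vector_leaf = multiclass_params(7, multi_output=True)
print(vector_leaf)
```

Either dictionary can then be passed to `xgboost.train(params, dtrain, num_boost_round=...)`, where `dtrain` is an `xgboost.DMatrix` holding the training data.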
This tutorial doesn't show what the defaults are.
…e mention of undefined default, removed unnecessary symmetric tree mention
Added the materials for XGBoost optimization. Please review and give me your feedback.