
refactored decision tree into reusable components#316

Merged
Mec-iS merged 4 commits into smartcorelib:development from DanielLacina:decision_tree_refactor
Jul 12, 2025

Conversation

@DanielLacina
Contributor

No description provided.

@DanielLacina DanielLacina requested a review from Mec-iS as a code owner July 9, 2025 18:58

#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
struct Node {
@Mec-iS (Collaborator) Jul 11, 2025

this should probably be Node<TX>; why should it be limited to f64? In general, every time you write a fixed numerical type you should ask yourself whether it is generalisable.
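A generalised version might look like the sketch below. The field names beyond `output` are illustrative assumptions, not the actual struct from the codebase:

```rust
// Hypothetical sketch: Node generic over the numeric type TX
// instead of hard-coding f64. Field names are illustrative.
#[derive(Debug, Clone)]
struct Node<TX> {
    output: TX,              // value predicted at this node
    split_value: Option<TX>, // threshold used to split, if any
}

fn main() {
    // The same struct now works for both f32 and f64.
    let n32 = Node { output: 1.5f32, split_value: None };
    let n64 = Node { output: 1.5f64, split_value: Some(0.5) };
    println!("{} {}", n32.output, n64.output);
}
```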

@DanielLacina (Contributor, Author)

All this code is copied and pasted from the DecisionTree file, with the only difference being that I use the name BaseTree instead of DecisionTree. This implementation is already in your codebase: https://github.com/smartcorelib/smartcore/blob/development/src/tree/decision_tree_regressor.rs#L129.

@DanielLacina (Contributor, Author)

Plus, Node is an implementation detail (no pub keyword), so it doesn't matter what numeric type it contains.


impl PartialEq for Node {
    fn eq(&self, other: &Self) -> bool {
        (self.output - other.output).abs() < f64::EPSILON
    }
}
@Mec-iS (Collaborator)

again, will this work only for f64?
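If the comparison were generalised, one option is a small helper trait so the epsilon check works for any float width. This is a hypothetical sketch, not the codebase's approach (smartcore has its own numeric traits):

```rust
// Hypothetical sketch: approximate equality generic over the float type,
// instead of hard-coding f64::EPSILON in the PartialEq impl.
#[derive(Debug, Clone)]
struct Node<TX> {
    output: TX,
}

// Helper trait so the comparison works for both f32 and f64.
trait ApproxEq {
    fn approx_eq(self, other: Self) -> bool;
}

impl ApproxEq for f64 {
    fn approx_eq(self, other: Self) -> bool {
        (self - other).abs() < f64::EPSILON
    }
}

impl ApproxEq for f32 {
    fn approx_eq(self, other: Self) -> bool {
        (self - other).abs() < f32::EPSILON
    }
}

impl<TX: ApproxEq + Copy> PartialEq for Node<TX> {
    fn eq(&self, other: &Self) -> bool {
        self.output.approx_eq(other.output)
    }
}

fn main() {
    // 0.1 + 0.2 != 0.3 bit-for-bit, but they are equal within epsilon.
    let a = Node { output: 0.1f64 + 0.2 };
    let b = Node { output: 0.3f64 };
    println!("{}", a == b);
}
```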

@DanielLacina (Contributor, Author)

Node is an implementation detail


#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone, Default)]
pub enum Splitter {
@Mec-iS (Collaborator)

missing docstring
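A doc comment in the style the codebase already uses might look like the following. The variant names here are assumptions for illustration, not necessarily the enum's actual variants:

```rust
/// The strategy used to choose the split threshold at each tree node.
///
/// `Best` evaluates every candidate threshold; `Random` samples them.
/// (Variant names are assumptions for illustration.)
#[derive(Debug, Clone, Default)]
pub enum Splitter {
    /// Search all candidate thresholds for the best split.
    #[default]
    Best,
    /// Pick the best among randomly drawn candidate thresholds.
    Random,
}

fn main() {
    println!("{:?}", Splitter::default());
}
```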


#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[derive(Debug, Clone)]
/// Parameters of Regression base_tree
@Mec-iS (Collaborator)

this should explain a little more like:

/// Parameters for configuring the behavior of a decision tree regressor.
/// 
/// This struct controls various aspects of tree construction including
/// depth limits, sample requirements, and splitting strategies.

let mut order: Vec<Vec<usize>> = Vec::new();

for i in 0..num_attributes {
    let mut col_i: Vec<TX> = x.get_col(i).iterator(0).copied().collect();
@Mec-iS (Collaborator)

maybe this one could be just:

let mut col_view: Vec<f64> = x.get_col(i).iterator(0).copied().collect();
let indices = col_view.argsort_mut();
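`argsort_mut` is not a method on a standard `Vec`; assuming it were written as a free-standing helper, a plain-Rust argsort might look like this sketch:

```rust
// Hypothetical helper: return the indices that would sort `values`
// (an "argsort"), leaving the data itself untouched.
fn argsort(values: &[f64]) -> Vec<usize> {
    let mut indices: Vec<usize> = (0..values.len()).collect();
    // total_cmp gives a total order over f64, including NaN.
    indices.sort_by(|&a, &b| values[a].total_cmp(&values[b]));
    indices
}

fn main() {
    let col = vec![3.0, 1.0, 2.0];
    // Indices of `col` in ascending order of value.
    println!("{:?}", argsort(&col));
}
```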

@DanielLacina (Contributor, Author)

I copied and pasted this code from the decision tree file. That would probably be more readable. I'll be honest: the decision tree implementation was pretty hard to read, but it is indeed optimal.

fn parameters(&self) -> &BaseTreeRegressorParameters {
    self.parameters.as_ref().unwrap()
}
/// Get estimate of intercept, return value
@Mec-iS (Collaborator)

I don't think this doc comment is correct; please double-check.

@DanielLacina (Contributor, Author)

That's already written in the codebase.

@DanielLacina (Contributor, Author)

@Mec-iS
Mec-iS commented Jul 11, 2025

Thank you!

This is a nice starting point, making it possible to implement:

  • Classification Trees
  • Ensemble Methods: random forests
  • Feature Selection: mtry can be used for feature sampling

Mec-iS commented Jul 11, 2025

This opens up to implementations like:

trait TreeRegressor<TX, TY, X, Y> {
    fn fit_tree(&self, x: &X, y: &Y, samples: Vec<usize>) -> Result<Self, Failed>;
    fn predict_tree(&self, x: &X) -> Result<Y, Failed>;
}

trait EnsembleRegressor<TX, TY, X, Y> {
    fn fit_ensemble(&self, x: &X, y: &Y, n_estimators: usize) -> Result<Self, Failed>;
    fn predict_ensemble(&self, x: &X) -> Result<Y, Failed>;
}

is that correct?

Recommended Implementation Priority

  • Random Forest Regressor: Highest priority due to existing foundation
  • Gradient Boosting Regressor: High impact for predictive performance
  • Extra Trees Regressor: Low implementation cost with good diversity benefits
  • Quantile Regression Trees: Valuable for uncertainty quantification
  • Pruned Regression Trees: Important for model interpretability

@DanielLacina (Contributor, Author)

I think we should build Base models that other more complex models can build on top of.

@DanielLacina (Contributor, Author)

I don't know if traits are necessary. I think all we need is, for example, a complex model that stores the base model as an attribute; then we can add extra logic on top of the base model.
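The composition approach could be sketched like this. All type names and the placeholder logic are hypothetical, not from the codebase:

```rust
// Hypothetical sketch of composition over traits: an ensemble model
// that simply owns a collection of base trees as an attribute.
#[derive(Debug)]
struct BaseTree {
    max_depth: usize,
}

impl BaseTree {
    fn predict(&self, x: f64) -> f64 {
        // Placeholder standing in for a real tree traversal.
        x * self.max_depth as f64
    }
}

#[derive(Debug)]
struct RandomForest {
    trees: Vec<BaseTree>, // the base model stored as an attribute
}

impl RandomForest {
    fn predict(&self, x: f64) -> f64 {
        // Extra logic on top of the base model: average the tree outputs.
        let sum: f64 = self.trees.iter().map(|t| t.predict(x)).sum();
        sum / self.trees.len() as f64
    }
}

fn main() {
    let forest = RandomForest {
        trees: vec![BaseTree { max_depth: 2 }, BaseTree { max_depth: 4 }],
    };
    // Average of the two tree outputs (2.0 and 4.0).
    println!("{}", forest.predict(1.0));
}
```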

@Mec-iS Mec-iS self-requested a review July 12, 2025 10:24
@Mec-iS Mec-iS merged commit c5816b0 into smartcorelib:development Jul 12, 2025
11 checks passed