Skip to content

Update Core API docstring and default values#412

Merged
popescu-v merged 1 commit intodevfrom
fix-api-docstrings-and-default-values
Jun 2, 2025
Merged

Update Core API docstring and default values#412
popescu-v merged 1 commit intodevfrom
fix-api-docstrings-and-default-values

Conversation

@popescu-v
Copy link
Copy Markdown
Collaborator

@popescu-v popescu-v commented May 28, 2025

  • replace "None" with "none" as acceptable values for discretization_method and grouping_method, following Khiops Core PR 692 use "none" rather than "None" in all khiops parameters khiops#695
  • use "MODL" as default value instead of Python None for the same two parameters
  • in train_recoder, fix documented default value of keep_initial_categorical_variables and keep_initial_numerical_variables to False, according to the function signature.

TODO Before Asking for a Review

  • Rebase your branch to the latest version of dev (or main for release PRs)
  • Make sure all CI workflows are green
  • When adding a public feature/fix: Update the Unreleased section of CHANGELOG.md (no date)
  • Self-Review: Review "Files Changed" tab and fix any problems you find
  • API Docs (only if there are changes in docstrings, rst files or samples):
    • Check the docs build without warning: see the log of the API Docs workflow
    • Check that your changes render well in HTML: download the API Docs artifact and open index.html
    • If there are any problems it is faster to iterate by building locally the API Docs

- "0-1 binarization": A 0's and 1's coding the interval/group id
- "conditional info": Conditional information of the interval/group
- "none": Keeps the variable as-is
discretization_method : str
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering what "none" does.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"none" applies no discretization, nor grouping AFAIK; just basic statistics are computed.

# Remove discretization method if specified for supervised analysis:
# it is always MODL
if "discretization_method" in task_args and task_args["target_variable"] != "":
del task_args["discretization_method"]
Copy link
Copy Markdown

@marcboulle marcboulle May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Je ne comprend pas à quoi cela sert de supprimer la méthode de discrétisation de la cas supervisé, puisque dans ce cas, cette spécification est simplement ignorée, et que cela ne pose aucun problème.

De plus, comme la ligne suivante est dans le scénario, cela ne risque-t-il pas de faire planter le moteur de templating?
DiscretizerUnsupervisedMethodName __discretization_method__

Pourrait-on supprimer ces lignes de preprocessing des arguments des tâches?

Copy link
Copy Markdown
Collaborator Author

@popescu-v popescu-v Jun 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The templating engine doesn't fail, because the expected default value (viz. "MODL") is injected automatically in the template if the argument is missing from the API call. However, in cases such as this one, this behavior leads to potentially spurious effects: the "MODL" value is injected in the scenario, even if the user had specified, say, "none".
Hence, e.g.:

  1. the user specifies discretization_method="none" in a call to train_predictor for a supervised learning task (target_variable != "");
  2. the _preprocess_task_arguments removes discretization_method from the arguments list;
  3. the default value for this argument, i.e. "MODL" is injected into the scenario template as DiscretizerUnsupervisedMethodName MODL;
  4. As there is a target specified (e.g. AnalysisSpec.TargetAttributeName class), DiscretizerUnsupervisedMethodName is ignored altogether by the MODL_* binary.

Hence, the result is correct all in all, but the behavior is unnecessarily complicated and generates spurious entries in the scenarios.

Hence, I will remove these preprocessing lines.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


# Remove grouping method if specified for supervised analysis: it is always MODL
if "grouping_method" in task_args and task_args["target_variable"] != "":
del task_args["grouping_method"]
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idem pour la méthode de groupement de valeur, avec la ligne suivante du scenario:
GrouperUnsupervisedMethodName __grouping_method__

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes indeed, I will remove the preprocessing (see the comment above).

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Allows grouping of the target variable values in classification. It can
substantially increase the training time.
discretization_method : str
discretization_method : str, default "MODL"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pour info, l'info-bulle de ce champ dans la GUI est actuellement:
Name of the discretization method in case of unsupervised analysis.

Ne pourrait-on pas harmoniser, dans un sens ou dans un autre?

Note: détail, facultatif, à noter éventuellement pour plus tard de façon plus générale en complément de l'issue #363

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will just update the docstring according to the Khiops Core entry.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

grouping_method : str
Its valid values are: "MODL", "EqualWidth", "EqualFrequency" or "none".
Ignored for supervised analysis.
grouping_method : str, default "MODL"
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pour info, l'info-bulle de ce champ dans la GUI est actuellement:
Name of the value grouping method in case of unsupervised analysis.

Cf. commentaire précédent sur la méthode de discrétisation

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I will update the docstring accordingly.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

- replace "None" with "none" as acceptable values for
  discretization_method and grouping_method, following Khiops Core PR
  KhiopsML/khiops#695
- use "MODL" as default value instead of Python None for the same two
  parameters
- stop removing the discretization_method and grouping_method arguments
  in case of supervised analysis: they are ignored by Khiops Core in the
  scenarios anyway, and removing them generated spurious scenario
  entries (default values substituted in the templates in case of
  absence).
- in train_recoder, fix documented default value of
  keep_initial_categorical_variables and
  keep_initial_numerical_variables to False, according to the function
  signature.
@popescu-v popescu-v force-pushed the fix-api-docstrings-and-default-values branch from 94d0c3f to 188b85f Compare June 2, 2025 09:04
@popescu-v popescu-v requested a review from marcboulle June 2, 2025 09:05
Copy link
Copy Markdown

@marcboulle marcboulle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@popescu-v popescu-v merged commit e325efd into dev Jun 2, 2025
25 of 29 checks passed
@popescu-v popescu-v deleted the fix-api-docstrings-and-default-values branch June 2, 2025 09:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants