Update Core API docstring and default values #412
| - "0-1 binarization": A 0's and 1's coding the interval/group id | ||
| - "conditional info": Conditional information of the interval/group | ||
| - "none": Keeps the variable as-is | ||
| discretization_method : str |
I was wondering what "none" does.
"none" applies neither discretization nor grouping, AFAIK; only basic statistics are computed.
khiops/core/api.py
| # Remove discretization method if specified for supervised analysis: | ||
| # it is always MODL | ||
| if "discretization_method" in task_args and task_args["target_variable"] != "": | ||
| del task_args["discretization_method"] |
I don't understand the point of removing the discretization method in the supervised case, since in that case this specification is simply ignored and causes no problem.
Moreover, since the following line is in the scenario, doesn't this risk crashing the templating engine?
DiscretizerUnsupervisedMethodName __discretization_method__
Could we remove these lines that preprocess the task arguments?
The templating engine doesn't fail, because the expected default value (viz. "MODL") is injected automatically in the template if the argument is missing from the API call. However, in cases such as this one, this behavior leads to potentially spurious effects: the "MODL" value is injected in the scenario, even if the user had specified, say, "none".
Hence, e.g.:
- the user specifies `discretization_method="none"` in a call to `train_predictor` for a supervised learning task (`target_variable != ""`);
- `_preprocess_task_arguments` removes `discretization_method` from the argument list;
- the default value for this argument, i.e. `"MODL"`, is injected into the scenario template as `DiscretizerUnsupervisedMethodName MODL`;
- as there is a target specified (e.g. `AnalysisSpec.TargetAttributeName class`), `DiscretizerUnsupervisedMethodName` is ignored altogether by the `MODL_*` binary.
The end result is correct, but the behavior is unnecessarily complicated and generates spurious entries in the scenarios.
I will therefore remove these preprocessing lines.
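For illustration, the sequence described above can be reproduced with a small, self-contained sketch. This is not the actual khiops-python code: `TEMPLATE_DEFAULTS`, `preprocess_old` and `render` are hypothetical stand-ins, and the template only contains the scenario lines quoted in this thread.

```python
# Hypothetical stand-ins; this is NOT the actual khiops-python implementation.
# Assumed defaults that the templating engine injects for missing arguments.
TEMPLATE_DEFAULTS = {"discretization_method": "MODL", "grouping_method": "MODL"}

# Scenario lines quoted in this review thread, with __name__ placeholders.
TEMPLATE = (
    "AnalysisSpec.TargetAttributeName __target_variable__\n"
    "DiscretizerUnsupervisedMethodName __discretization_method__\n"
    "GrouperUnsupervisedMethodName __grouping_method__"
)


def preprocess_old(task_args):
    """Old behavior: drop both methods when a target variable is set."""
    args = dict(task_args)
    if args["target_variable"] != "":
        args.pop("discretization_method", None)
        args.pop("grouping_method", None)
    return args


def render(task_args):
    """Fill each placeholder, falling back to the default if the argument is missing."""
    values = {**TEMPLATE_DEFAULTS, **task_args}
    scenario = TEMPLATE
    for name, value in values.items():
        scenario = scenario.replace(f"__{name}__", str(value))
    return scenario


user_args = {
    "target_variable": "class",
    "discretization_method": "none",
    "grouping_method": "none",
}

# With the old preprocessing, the user's "none" is silently replaced by "MODL".
old_scenario = render(preprocess_old(user_args))
# Without the preprocessing, the scenario carries the value the user passed.
new_scenario = render(user_args)
print(old_scenario)
print(new_scenario)
```

In both cases the binary ignores the unsupervised method names when a target is set, so only the content of the generated scenario differs.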
khiops/core/api.py
| # Remove grouping method if specified for supervised analysis: it is always MODL | ||
| if "grouping_method" in task_args and task_args["target_variable"] != "": | ||
| del task_args["grouping_method"] |
Same for the value grouping method, with the following scenario line:
GrouperUnsupervisedMethodName __grouping_method__
Yes indeed, I will remove the preprocessing (see the comment above).
| Allows grouping of the target variable values in classification. It can | ||
| substantially increase the training time. | ||
| discretization_method : str | ||
| discretization_method : str, default "MODL" |
FYI, the tooltip for this field in the GUI is currently:
Name of the discretization method in case of unsupervised analysis.
Couldn't we harmonize the two, one way or the other?
Note: a minor, optional detail, possibly to be noted for later and handled more generally, as a complement to issue #363
OK, I will just update the docstring according to the Khiops Core entry.
| grouping_method : str | ||
| Its valid values are: "MODL", "EqualWidth", "EqualFrequency" or "none". | ||
| Ignored for supervised analysis. | ||
| grouping_method : str, default "MODL" |
FYI, the tooltip for this field in the GUI is currently:
Name of the value grouping method in case of unsupervised analysis.
Cf. the previous comment on the discretization method
OK, I will update the docstring accordingly.
- replace "None" with "none" as acceptable values for discretization_method and grouping_method, following Khiops Core PR KhiopsML/khiops#695
- use "MODL" as default value instead of Python None for the same two parameters
- stop removing the discretization_method and grouping_method arguments in case of supervised analysis: they are ignored by Khiops Core in the scenarios anyway, and removing them generated spurious scenario entries (default values substituted in the templates in case of absence).
- in train_recoder, fix documented default value of keep_initial_categorical_variables and keep_initial_numerical_variables to False, according to the function signature.
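A minimal sketch of the parameter handling this commit message describes, under stated assumptions: `build_task_args` and `VALID_DISCRETIZATION_METHODS` are hypothetical names, the value check is an illustration rather than something the PR is known to add, and the list of values is the one quoted in the docstring diff above.

```python
# Hypothetical stand-in for the updated parameter handling; names are assumptions.
# Valid values as quoted in the docstring diff of this PR.
VALID_DISCRETIZATION_METHODS = ("MODL", "EqualWidth", "EqualFrequency", "none")


def build_task_args(target_variable="", discretization_method="MODL"):
    """Return the task arguments, passing the method through unchanged.

    The default is the string "MODL" (not Python None), and the value is kept
    even when a target variable is set.
    """
    if discretization_method not in VALID_DISCRETIZATION_METHODS:
        raise ValueError(
            f"discretization_method must be one of {VALID_DISCRETIZATION_METHODS}, "
            f"got {discretization_method!r}"
        )
    return {
        "target_variable": target_variable,
        "discretization_method": discretization_method,
    }


# Default call: the method is "MODL".
default_args = build_task_args()
# Supervised call with an explicit method: the value is no longer removed.
supervised_args = build_task_args(target_variable="class", discretization_method="none")
print(default_args)
print(supervised_args)
```

Under this sketch, the capitalized "None" is rejected, which matches the switch to lowercase "none" as the accepted spelling.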
94d0c3f to 188b85f
TODO Before Asking for a Review
- `dev` (or `main` for release PRs)
- `Unreleased` section of `CHANGELOG.md` (no date)
- `index.html`