Commit 1b0b3dc

Merge pull request #467 from KhiopsML/59-add-api-to-specify-dictionary-rules
59 add api to specify dictionary rules
2 parents ec50a84 + a4ef9e5 commit 1b0b3dc

7 files changed: +692 -10 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -11,6 +11,8 @@
 ### Added
 - (`core`) Dictionary API support for dictionary, variable and variable block
   comments, and dictionary and variable block internal comments.
+- (`core`) Dictionary `Rule` class and supporting API for adding and getting
+  rules to / from variables and variable blocks.
 - (`sklearn`) `Text` Khiops type support at the estimator level.
 
 ### Fixed

doc/samples/samples.rst

Lines changed: 8 additions & 2 deletions
@@ -655,17 +655,23 @@ Samples
 fold_index_variable.name = "FoldIndex"
 fold_index_variable.type = "Numerical"
 fold_index_variable.used = False
-fold_index_variable.rule = "Ceil(Product(" + str(fold_number) + ", Random()))"
 dictionary.add_variable(fold_index_variable)
 
+# Create fold indexing rule and set it on `fold_index_variable`
+dictionary.get_variable(fold_index_variable.name).set_rule(
+    kh.Rule("Ceil", kh.Rule("Product", fold_number, kh.Rule("Random()"))),
+)
+
 # Add variables that indicate if the instance is in the train dataset:
 for fold_index in range(1, fold_number + 1):
     is_in_train_dataset_variable = kh.Variable()
     is_in_train_dataset_variable.name = "IsInTrainDataset" + str(fold_index)
     is_in_train_dataset_variable.type = "Numerical"
     is_in_train_dataset_variable.used = False
-    is_in_train_dataset_variable.rule = "NEQ(FoldIndex, " + str(fold_index) + ")"
     dictionary.add_variable(is_in_train_dataset_variable)
+    dictionary.get_variable(is_in_train_dataset_variable.name).set_rule(
+        kh.Rule("NEQ", fold_index_variable, fold_index),
+    )
 
 # Print dictionary with fold variables
 print("Dictionary file with fold variables")
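
The hunk above replaces hand-concatenated rule strings with nested rule objects. As a rough illustration of why the two forms can be equivalent, here is a minimal, self-contained sketch. `SketchRule`, `SketchVariable` and `render` are hypothetical names invented for this example; they are not the Khiops classes and say nothing about how `kh.Rule` is actually implemented. The sketch only assumes that a rule is a name plus operands, and that a variable operand is written by its name.

```python
from dataclasses import dataclass


@dataclass
class SketchVariable:
    """Hypothetical stand-in for a dictionary variable; only the name is used."""

    name: str


class SketchRule:
    """Hypothetical stand-in for a rule: a name plus zero or more operands."""

    def __init__(self, name, *operands):
        self.name = name
        self.operands = operands

    def render(self):
        """Return the textual form `Name(operand, ...)`, recursing into nested rules."""
        parts = []
        for operand in self.operands:
            if isinstance(operand, SketchRule):
                parts.append(operand.render())  # nested rule
            elif isinstance(operand, SketchVariable):
                parts.append(operand.name)  # variable referenced by name
            else:
                parts.append(str(operand))  # numeric constant
        return f"{self.name}({', '.join(parts)})"


fold_number = 5
fold_index_variable = SketchVariable("FoldIndex")

# Nested form, mirroring the shape of the added lines in the hunk
fold_index_rule = SketchRule(
    "Ceil", SketchRule("Product", fold_number, SketchRule("Random"))
)
is_in_train_rule = SketchRule("NEQ", fold_index_variable, 1)

print(fold_index_rule.render())
print(is_in_train_rule.render())
```

With `fold_number = 5`, the first rendered string matches what the removed line built by concatenation, `Ceil(Product(5, Random()))`, and the second matches `NEQ(FoldIndex, 1)`. Building the expression from objects avoids the `str(...)` conversions and manual parenthesis matching that the string form required.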

khiops/core/api.py

Lines changed: 15 additions & 0 deletions
@@ -757,9 +757,11 @@ def train_predictor(
     Maximum number of text features to construct.
 text_features : str, default "words"
     Type of the text features. Can be either one of:
+
     - "words": sequences of non-space characters
     - "ngrams": sequences of bytes
     - "tokens": user-defined
+
 max_trees : int, default 10
     Maximum number of trees to construct.
 max_pairs : int, default 0
@@ -788,8 +790,10 @@ def train_predictor(
     Maximum number of variable parts produced by preprocessing methods. If equal
     to 0 it is automatically calculated.
     Special default values for unsupervised analysis:
+
     - If ``discretization_method`` is "EqualWidth" or "EqualFrequency": 10
     - If ``grouping_method`` is "BasicGrouping": 10
+
 ... :
     See :ref:`core-api-common-params`.
 
@@ -1181,9 +1185,11 @@ def train_recoder(
     Maximum number of text features to construct.
 text_features : str, default "words"
     Type of the text features. Can be either one of:
+
     - "words": sequences of non-space characters
     - "ngrams": sequences of bytes
     - "tokens": user-defined
+
 max_trees : int, default 10
     Maximum number of trees to construct.
 max_pairs : int, default 0
@@ -1210,13 +1216,16 @@ def train_recoder(
     If ``True`` keeps initial numerical variables.
 categorical_recoding_method : str
     Type of recoding for categorical variables. Types available:
+
     - "part Id" (default): An id for the interval/group
     - "part label": A label for the interval/group
     - "0-1 binarization": A 0's and 1's coding the interval/group id
     - "conditional info": Conditional information of the interval/group
     - "none": Keeps the variable as-is
+
 numerical_recoding_method : str
     Type of recoding recoding for numerical variables. Types available:
+
     - "part Id" (default): An id for the interval/group
     - "part label": A label for the interval/group
     - "0-1 binarization": A 0's and 1's coding the interval/group id
@@ -1226,13 +1235,16 @@ def train_recoder(
     - "rank normalization": mean normalized rank (between 0 and 1) of the
       instances
     - "none": Keeps the variable as-is
+
 pairs_recoding_method : str
     Type of recoding for bivariate variables. Types available:
+
     - "part Id" (default): An id for the interval/group
     - "part label": A label for the interval/group
     - "0-1 binarization": A 0's and 1's coding the interval/group id
     - "conditional info": Conditional information of the interval/group
     - "none": Keeps the variable as-is
+
 discretization_method : str, default "MODL"
     Name of the discretization method in case of unsupervised analysis.
     Its valid values are: "MODL", "EqualWidth", "EqualFrequency" or "none".
@@ -1245,15 +1257,18 @@ def train_recoder(
     Maximum number of variable parts produced by preprocessing methods. If equal
     to 0 it is automatically calculated.
     Special default values for unsupervised analysis:
+
     - If ``discretization_method`` is "EqualWidth" or "EqualFrequency": 10
     - If ``grouping_method`` is "BasicGrouping": 10
+
 ... :
     See :ref:`core-api-common-params`.
 
 Returns
 -------
 tuple
     A 2-tuple containing:
+
     - The path of the JSON file report of the process
     - The path of the dictionary containing the recoding model
 
