| 1 | +[](http://kt.ijs.si/) |
| 2 | +[](https://hub.docker.com/r/hbpmip/java-jsi-clus-rm/) |
| 3 | +[](https://hub.docker.com/r/hbpmip/java-jsi-clus-rm/tags "hbpmip/java-jsi-clus-rm image tags") |
| 4 | +[](https://microbadger.com/#/images/hbpmip/java-jsi-clus-rm "hbpmip/java-jsi-clus-rm on microbadger") |
| 5 | + |
| 6 | +# hbpmip/java-jsi-clus-rm: Redescription Mining using Predictive Clustering from JSI and IRB |
| 7 | + |
| 8 | +Implementation of the Redescription mining algorithm based on Predictive Clustering Trees. |
| 9 | +For more details see https://github.com/matmih/CLUS-RM-library. |
| 10 | + |
| 11 | +## Usage |
| 12 | + |
| 13 | +```sh |
| 14 | + docker run --rm --env [list of environment variables] hbpmip/java-jsi-clus-rm compute |
| 15 | +``` |
| 16 | + |
| 17 | +where the environment variables are: |
| 18 | + |
| 19 | +* NODE: name of the node (machine) used for execution |
| 20 | +* JOB_ID: ID of the job. |
| 21 | +* IN_JDBC_DRIVER: org.postgresql.Driver |
| 22 | +* IN_JDBC_URL: URL to the input database, e.g. jdbc:postgresql://db:5432/features |
| 23 | +* IN_JDBC_USER: User for the input database |
| 24 | +* IN_JDBC_PASSWORD: Password for the input database |
| 25 | +* OUT_JDBC_DRIVER: org.postgresql.Driver |
| 26 | +* OUT_JDBC_URL: URL to the output database, e.g. jdbc:postgresql://db:5432/woken
| 27 | +* OUT_JDBC_USER: User for the output database |
| 28 | +* OUT_JDBC_PASSWORD: Password for the output database |
| 29 | +* PARAM_covariables: Attributes contained in the first data view. |
| 30 | +* PARAM_variables: Attributes contained in the second data view. |
| 31 | +* PARAM_query: Query selecting the data to feed into the algorithm for training |
| 32 | +* MODEL_PARAM_minJS: Minimum accuracy, measured with the Jaccard index, that a redescription must reach to be returned to the user. Values lie in [0,1] (default is MODEL_PARAM_minJS=0.5).
| 33 | +* MODEL_PARAM_maxPval: Maximum p-value a redescription may have and still be returned to the user. Values lie in [0,1] (default is MODEL_PARAM_maxPval=0.01).
| 34 | +* MODEL_PARAM_MinSupport: Minimum support a redescription must have to be returned to the user. Values lie in [1,|E|], where |E| denotes the number of entities in the dataset. This parameter MUST be defined by the user; it is domain and data dependent.
| 35 | +* MODEL_PARAM_MaxSupport: Maximum support a redescription may have. Values lie in [1,|E|], where |E| denotes the number of entities in the dataset (default is MODEL_PARAM_MaxSupport=|E|).
| 36 | +* MODEL_PARAM_numRandomRestarts: Number of random initialization steps performed by CLUS-RM (default is MODEL_PARAM_numRandomRestarts=1).
| 37 | +* MODEL_PARAM_numIterations: Number of iterations (also called alternations) performed by CLUS-RM (default is MODEL_PARAM_numIterations=10).
| 38 | +* MODEL_PARAM_numRetRed: Number of redescriptions returned by CLUS-RM (default is MODEL_PARAM_numRetRed=50).
| 39 | +* MODEL_PARAM_attributeImportanceW1: Attribute importance for attributes contained in view 1, used in constraint-based redescription mining (default is MODEL_PARAM_attributeImportanceW1="none"). Possible values are: "none" - allow redescriptions with any attributes from view 1; "suggested" - allow defining combinations of attributes that increase the redescription score (redescriptions containing the specified attributes are preferred); "soft" - return only redescriptions that satisfy at least part of the specified constraints (redescriptions satisfying a larger portion of a constraint set are preferred); "hard" - return only redescriptions that satisfy all constraints defined in one constraint set.
| 40 | +* MODEL_PARAM_attributeImportanceW2: Attribute importance for attributes contained in view 2, used in constraint-based redescription mining (default is MODEL_PARAM_attributeImportanceW2="none"). Possible values are: "none" - allow redescriptions with any attributes from view 2; "suggested" - allow defining combinations of attributes that increase the redescription score (redescriptions containing the specified attributes are preferred); "soft" - return only redescriptions that satisfy at least part of the specified constraints (redescriptions satisfying a larger portion of a constraint set are preferred); "hard" - return only redescriptions that satisfy all constraints defined in one constraint set.
| 41 | +* MODEL_PARAM_importantAttributesW1: Constraint sets for attributes contained in view 1, used in constraint-based redescription mining (default is MODEL_PARAM_importantAttributesW1=""). Constraints are specified in the format "{a;b;c},{a;d}", where a, b, c, d are attributes contained in the first view (view 1) of the data.
| 42 | +* MODEL_PARAM_importantAttributesW2: Constraint sets for attributes contained in view 2, used in constraint-based redescription mining (default is MODEL_PARAM_importantAttributesW2=""). Constraints are specified in the format "{e;f;g},{h;i}", where e, f, g, h, i are attributes contained in the second view (view 2) of the data.
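
The variables listed above can be collected in a file and passed with Docker's `--env-file` option instead of repeating `--env`. The following is a minimal sketch, not a tested configuration: the file name `clus-rm.env`, the user names, passwords, attribute names (`a` to `g`), table name and support threshold are all hypothetical placeholders. The comma-separated form of `PARAM_covariables` and `PARAM_variables` is also an assumption, since this README does not state the separator. Docker passes env-file values literally, so the constraint sets are written without surrounding quotes; whether the algorithm expects the quotes as part of the value is not confirmed here.

```sh
#!/bin/sh
# Write the environment file. All values are placeholders (assumptions);
# replace them with your own hosts, credentials, attributes and query.
cat > clus-rm.env <<'EOF'
NODE=local
JOB_ID=1
IN_JDBC_DRIVER=org.postgresql.Driver
IN_JDBC_URL=jdbc:postgresql://db:5432/features
IN_JDBC_USER=features_user
IN_JDBC_PASSWORD=change-me
OUT_JDBC_DRIVER=org.postgresql.Driver
OUT_JDBC_URL=jdbc:postgresql://db:5432/woken
OUT_JDBC_USER=woken_user
OUT_JDBC_PASSWORD=change-me
PARAM_covariables=a,b,c
PARAM_variables=e,f,g
PARAM_query=SELECT a, b, c, e, f, g FROM sample_table
MODEL_PARAM_MinSupport=10
MODEL_PARAM_minJS=0.5
MODEL_PARAM_maxPval=0.01
MODEL_PARAM_attributeImportanceW1=hard
MODEL_PARAM_importantAttributesW1={a;b},{a;c}
EOF

# Print the command for review; drop the leading "echo" to actually run it.
echo docker run --rm --env-file clus-rm.env hbpmip/java-jsi-clus-rm compute
```

Only `MODEL_PARAM_MinSupport` has no default, so the remaining `MODEL_PARAM_*` lines can be omitted when the defaults are acceptable.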