
Data lab #8

Open

anuragkapale wants to merge 31 commits into master from data_lab

Conversation

@anuragkapale
Collaborator

No description provided.

Comment thread autokaggle/auto_ml.py Outdated

# TODO: Further clean the design of this file
class AutoKaggle(BaseEstimator):
    pipeline = None

Move the class variables to instance variables.
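
For illustration, a minimal sketch of what moving these to instance variables could look like (attribute names beyond pipeline are assumptions, not the PR's actual code):

class AutoKaggle(BaseEstimator):
    def __init__(self, config=None):
        # Instance attributes set per object, instead of shared class variables.
        self.pipeline = None
        self.config = config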

Comment thread autokaggle/auto_ml.py Outdated
    p_hparams_base = None

    def __init__(self, config=None, **kwargs):
        """

Follow the autokeras docstring style.
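
For reference, autokeras docstrings at the time roughly followed the Keras convention with an "# Arguments" section; a sketch of the format (the wording is illustrative, not the project's actual text):

class AutoKaggle(BaseEstimator):
    """Automated model search and ensembling for tabular data.

    # Arguments
        config: A Config instance holding the search settings. Defaults to None.
    """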

Comment thread autokaggle/auto_ml.py Outdated
import hyperopt
from hyperopt import tpe, hp, fmin, Trials, STATUS_OK, STATUS_FAIL
from sklearn.model_selection import cross_val_score
from autokaggle.ensemblers import RankedEnsembler, StackingEnsembler

Import modules instead of classes.
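
That is, import the module and qualify the names at the call site; a sketch of the suggested style:

from autokaggle import ensemblers
from sklearn import model_selection

# ... then refer to ensemblers.RankedEnsembler, ensemblers.StackingEnsembler,
# and model_selection.cross_val_score where they are used.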

Comment thread autokaggle/auto_ml.py Outdated
    m_hparams_base = None
    p_hparams_base = None

    def __init__(self, config=None, **kwargs):

@qingquansong Sep 20, 2019


Explicitly clarify all the arguments instead of using kwargs.
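
A sketch of the shape of the suggested signature; the specific parameters and defaults below are placeholders, not the project's real argument list:

    def __init__(self, config=None, path=None, verbose=True, time_limit=None):
        # Every supported option is a named parameter, so the interface is
        # visible to callers, help(), and IDEs, rather than hidden in **kwargs.
        self.config = config
        self.path = path
        self.verbose = verbose
        self.time_limit = time_limit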

Comment thread autokaggle/auto_ml.py Outdated
x: A numpy.ndarray instance containing the training data.
y: training label vector.
time_limit: remaining time budget.
data_info: meta-features of the dataset, which is an numpy.ndarray describing the

@qingquansong Sep 20, 2019


A list of strings. (specify the type)
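
For example, the entry might read along these lines (the feature-type labels are illustrative, not necessarily the ones the project uses):

        data_info: A list of strings, one per column, giving the feature type
            of that column (e.g. 'CAT' for categorical, 'NUM' for numerical,
            'TIME' for datetime).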

Comment thread autokaggle/auto_ml.py Outdated
        self.pipeline.fit(x_train, y_train)

    def resample(self, x, y):
        if self.config.balance_class_dist:

Add doc strings.

Comment thread autokaggle/auto_ml.py
        return x, y

    def subsample(self, x, y, sample_percent):
        # TODO: Add way to balance the subsample

Add doc string to subsample.

Comment thread autokaggle/auto_ml.py Outdated
        return grid_train_x, grid_train_y

    def search(self, x, y, prep_space, model_space):
        grid_train_x, grid_train_y = self.subsample(x, y, sample_percent=self.config.subsample_ratio)

Set the maximum line length to 85, and check it in CI with flake8.
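
flake8 picks up the limit from a [flake8] section in setup.cfg, tox.ini, or .flake8; a minimal configuration (wiring it into the CI runner is project-specific):

[flake8]
max-line-length = 85
# equivalent one-off check: flake8 --max-line-length=85 autokaggle/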

Comment thread autokaggle/auto_ml.py Outdated
        np.random.shuffle(best_trials)

        if self.config.diverse_ensemble:
            estimator_list = self.pick_diverse_estimators(best_trials, self.config.num_estimators_ensemble)

Remove the second argument.

Comment thread autokaggle/auto_ml.py Outdated
        return np.array(data_info)


class AutoKaggleClassifier(AutoKaggle):

Rename to "Classifier".

Comment thread autokaggle/auto_ml.py Outdated
        return score_metric, skf


class AutoKaggleRegressor(AutoKaggle):

Rename to "Regressor".

Comment thread autokaggle/config.py
        self.ensembling_algo = hyperopt.rand.suggest if ensembling_algo == 'random' else hyperopt.tpe.suggest
        self.num_p_hparams = num_p_hparams

    def update(self, options):

Add doc string.

Comment thread autokaggle/config.py Outdated
setattr(self, k, v)


knn_classifier_params = {

Use all capital letters for constants.
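
That is, rename the module-level search spaces to ALL_CAPS per PEP 8; a sketch with illustrative contents (the real hyperparameter ranges are the ones in the PR):

from hyperopt import hp

KNN_CLASSIFIER_PARAMS = {
    'n_neighbors': hp.choice('n_neighbors', [3, 5, 7, 11, 15]),
    'weights': hp.choice('weights', ['uniform', 'distance']),
}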

Comment thread autokaggle/ensemblers.py Outdated
}


class RankedEnsembler:

@qingquansong Sep 20, 2019


Extract a base class; its methods should raise NotImplementedError (see the sketch below).

  1. Extend the object class.
  2. Rename to RankEnsembleModel.
  3. Add doc strings.
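
A minimal sketch of the suggested base class; names other than RankEnsembleModel are assumptions:

class EnsembleModel(object):
    """Base class for ensembling strategies over base-estimator predictions."""

    def fit(self, predictions, y):
        raise NotImplementedError

    def predict(self, predictions):
        raise NotImplementedError


class RankEnsembleModel(EnsembleModel):
    """Rank-based ensembler (renamed from RankedEnsembler)."""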

Comment thread autokaggle/ensemblers.py
        self.stacking_estimator = self.search(predictions, y_val)
        self.stacking_estimator.fit(predictions, y_val)

    def search(self, x, y):

Add doc string.

Comment thread autokaggle/preprocessor.py Outdated
LEVEL_HIGH = 32


class TabularPreprocessor(TransformerMixin):

Rename
