
Data lab #8

Open

anuragkapale wants to merge 31 commits into master from data_lab

Conversation

@anuragkapale
Collaborator

No description provided.

Comment thread autokaggle/auto_ml.py Outdated

# TODO: Further clean the design of this file
class AutoKaggle(BaseEstimator):
    pipeline = None

Move the class variables to instance variables.
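
For illustration, a minimal sketch of what moving these to instance variables could look like (attribute names beyond pipeline are assumptions, not the PR's actual code):

class AutoKaggle(BaseEstimator):
    def __init__(self, config=None):
        # Instance attributes set per object, instead of shared class variables.
        self.pipeline = None
        self.config = config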

Comment thread autokaggle/auto_ml.py Outdated
    p_hparams_base = None

    def __init__(self, config=None, **kwargs):
        """

Follow the autokeras docstring style.
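
For reference, autokeras docstrings at the time roughly followed the Keras convention with an "# Arguments" section; a sketch of the format (the wording is illustrative, not the project's actual text):

class AutoKaggle(BaseEstimator):
    """Automated model search and ensembling for tabular data.

    # Arguments
        config: A Config instance holding the search settings. Defaults to None.
    """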

Comment thread autokaggle/auto_ml.py Outdated
import hyperopt
from hyperopt import tpe, hp, fmin, Trials, STATUS_OK, STATUS_FAIL
from sklearn.model_selection import cross_val_score
from autokaggle.ensemblers import RankedEnsembler, StackingEnsembler

Import modules instead of classes.
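
That is, import the module and qualify the names at the call site; a sketch of the suggested style:

from autokaggle import ensemblers
from sklearn import model_selection

# ... then refer to ensemblers.RankedEnsembler, ensemblers.StackingEnsembler,
# and model_selection.cross_val_score where they are used.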

Comment thread autokaggle/auto_ml.py Outdated
    m_hparams_base = None
    p_hparams_base = None

    def __init__(self, config=None, **kwargs):

@qingquansong Sep 20, 2019


Explicitly clarify all the arguments instead of using kwargs.
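
A sketch of the shape of the suggested signature; the specific parameters and defaults below are placeholders, not the project's real argument list:

    def __init__(self, config=None, path=None, verbose=True, time_limit=None):
        # Every supported option is a named parameter, so the interface is
        # visible to callers, help(), and IDEs, rather than hidden in **kwargs.
        self.config = config
        self.path = path
        self.verbose = verbose
        self.time_limit = time_limit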

Comment thread autokaggle/auto_ml.py Outdated
x: A numpy.ndarray instance containing the training data.
y: training label vector.
time_limit: remaining time budget.
data_info: meta-features of the dataset, which is an numpy.ndarray describing the

@qingquansong Sep 20, 2019


A list of strings. (specify the type)
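
For example, the entry might read along these lines (the feature-type labels are illustrative, not necessarily the ones the project uses):

        data_info: A list of strings, one per column, giving the feature type
            of that column (e.g. 'CAT' for categorical, 'NUM' for numerical,
            'TIME' for datetime).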

Comment thread autokaggle/auto_ml.py Outdated
        self.pipeline.fit(x_train, y_train)

    def resample(self, x, y):
        if self.config.balance_class_dist:

Add doc strings.

Comment thread autokaggle/auto_ml.py
        return x, y

    def subsample(self, x, y, sample_percent):
        # TODO: Add way to balance the subsample

Add doc string to subsample.

Comment thread autokaggle/auto_ml.py Outdated
        return grid_train_x, grid_train_y

    def search(self, x, y, prep_space, model_space):
        grid_train_x, grid_train_y = self.subsample(x, y, sample_percent=self.config.subsample_ratio)

Set the maximum line length to 85, and check it in CI with flake8.
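
flake8 picks up the limit from a [flake8] section in setup.cfg, tox.ini, or .flake8; a minimal configuration (wiring it into the CI runner is project-specific):

[flake8]
max-line-length = 85
# equivalent one-off check: flake8 --max-line-length=85 autokaggle/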

Comment thread autokaggle/auto_ml.py Outdated
        np.random.shuffle(best_trials)

        if self.config.diverse_ensemble:
            estimator_list = self.pick_diverse_estimators(best_trials, self.config.num_estimators_ensemble)

Remove the second argument.

Comment thread autokaggle/auto_ml.py Outdated
        return np.array(data_info)


class AutoKaggleClassifier(AutoKaggle):

Rename to "Classifier".

Comment thread autokaggle/auto_ml.py Outdated
        return score_metric, skf


class AutoKaggleRegressor(AutoKaggle):

Rename to "Regressor".

Comment thread autokaggle/config.py
        self.ensembling_algo = hyperopt.rand.suggest if ensembling_algo == 'random' else hyperopt.tpe.suggest
        self.num_p_hparams = num_p_hparams

    def update(self, options):

Add doc string.

Comment thread autokaggle/config.py Outdated
setattr(self, k, v)


knn_classifier_params = {

Use all capital letters for constants.
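
That is, rename the module-level search spaces to ALL_CAPS per PEP 8; a sketch with illustrative contents (the real hyperparameter ranges are the ones in the PR):

from hyperopt import hp

KNN_CLASSIFIER_PARAMS = {
    'n_neighbors': hp.choice('n_neighbors', [3, 5, 7, 11, 15]),
    'weights': hp.choice('weights', ['uniform', 'distance']),
}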

Comment thread autokaggle/ensemblers.py Outdated
}


class RankedEnsembler:

@qingquansong Sep 20, 2019


Extract a base class; its methods should raise NotImplementedError (see the sketch below).

  1. Extend the object class.
  2. Rename to RankEnsembleModel.
  3. Add doc strings.
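
A minimal sketch of the suggested base class; names other than RankEnsembleModel are assumptions:

class EnsembleModel(object):
    """Base class for ensembling strategies over base-estimator predictions."""

    def fit(self, predictions, y):
        raise NotImplementedError

    def predict(self, predictions):
        raise NotImplementedError


class RankEnsembleModel(EnsembleModel):
    """Rank-based ensembler (renamed from RankedEnsembler)."""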

Comment thread autokaggle/ensemblers.py
        self.stacking_estimator = self.search(predictions, y_val)
        self.stacking_estimator.fit(predictions, y_val)

    def search(self, x, y):

Add doc string.

Comment thread autokaggle/preprocessor.py Outdated
LEVEL_HIGH = 32


class TabularPreprocessor(TransformerMixin):

Rename
