AutoML model incorporating tune commands. #1410
seraphimstreets wants to merge 15 commits into intel:main from
Conversation
mhash1m
left a comment
Great work so far. Let's continue the review on the weekend.
dffml/model/automl.py
Outdated
    if self.parent.config.objective == "min":
        highest_acc = float("inf")
    elif self.parent.config.objective == "max":
        highest_acc = -1
Just in case we have a scorer that outputs values below -1, or someone adds one in the future, let's have highest_acc start as float("-inf") (might want to confirm this syntax).
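A minimal sketch of the suggested initialization, reusing the attribute names from the diff above; float("-inf") is valid Python and compares less than any finite float, so the first candidate score will always replace it:

```python
# Sketch of the suggested change: start the running best score so that any
# real scorer output replaces it, regardless of its range.
if self.parent.config.objective == "min":
    highest_acc = float("inf")   # any finite score is lower
elif self.parent.config.objective == "max":
    highest_acc = float("-inf")  # any finite score is higher, even below -1
```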
dffml/model/automl.py
Outdated
    else:
        tuner.config.parameters = {}

    val = await tune(model, tuner, scorer, self.parent.config.predict, sources, sources)
Let's not use the same sources for train and validation. It was discussed that we will use a list of sources instead.
dffml/model/automl.py
Outdated
    else:
        tuner.config.parameters = {}

    val = await tune(model, tuner, scorer, self.parent.config.predict, sources, sources)
Let's rename val so it doesn't get confused with validation (val for short).
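A hypothetical sketch combining the two suggestions above, assuming the configured sources arrive as a list whose entries can serve as train and validation splits; the names train_sources, valid_sources, and tune_score are illustrative, not part of the existing code:

```python
# Illustrative only: pass distinct train/validation sources to tune()
# instead of reusing the same `sources` object twice, and pick a result
# name that will not be read as shorthand for "validation".
train_sources, valid_sources = sources[0], sources[1]
tune_score = await tune(
    model,
    tuner,
    scorer,
    self.parent.config.predict,
    train_sources,
    valid_sources,
)
```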
from dffml_model_xgboost.xgbregressor import (
    XGBRegressorModel,
Was this file left in intentionally? Let's double-check for other additions like this, which might be from directory copy-pasting etc.
Yes, you're right. I'll remove it.
As part of the second stage of the GSoC AutoML project as defined in #968, this is a preliminary iteration of the AutoML model. The idea is to let users provide a dataset and a list of models they wish to train; DFFML's integrated AutoML model will then perform training/tuning/scoring to select the best model for the user, abstracting away much of the ML process into an easy-to-use API. The current iteration performs training and scoring using default hyperparameters, so tuning has not been implemented yet. Some discussion in the community will be needed to decide how tuning should work in the AutoML process (should we have default hyperparameter search spaces for each model, or must they be user-defined?).
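A hypothetical usage sketch of what the user-facing API could look like. AutoMLModel, the models and location config fields, the candidate names, and the CSV path are assumptions for illustration; only predict and objective appear in this PR's diff, while Feature, Features, and the high-level train() helper are existing DFFML APIs.

```python
import asyncio

from dffml import Feature, Features, train

# Assumed import path, based on the dffml/model/automl.py file in this PR.
from dffml.model.automl import AutoMLModel


async def main():
    # Configure the AutoML model with a list of candidate models; it is
    # expected to train and score each candidate (tuning to come in a later
    # iteration) and keep whichever performs best under the objective.
    model = AutoMLModel(
        predict=Feature("target", int, 1),
        features=Features(Feature("f1", float, 1), Feature("f2", float, 1)),
        location="automl_model",
        models=["xgbregressor", "scikit-linreg"],  # illustrative candidates
        objective="max",  # maximize the scorer's output, as in the diff above
    )
    await train(model, "dataset.csv")


if __name__ == "__main__":
    asyncio.run(main())
```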