AutoModel - HyperParameter Search
This example shows how to search hyperparameters for a model with the Tuner module.
In [ ]:
import os
# you can remove this
os.chdir("../../")
In [ ]:
from gradsflow import Model
from gradsflow.tuner.tuner import Tuner
In [ ]:
from timm import create_model
from ray import tune
from gradsflow.data.image import get_fake_data
from gradsflow import AutoDataset
In [ ]:
image_size = (64, 64)
fake_data = get_fake_data(image_size, num_workers=0)
train_ds, train_dl = fake_data.dataset, fake_data.dataloader
fake_data = get_fake_data(image_size, num_workers=0)
val_ds, val_dl = fake_data.dataset, fake_data.dataloader
num_classes = train_ds.num_classes
autodataset = AutoDataset(train_dl, val_dl, num_classes=num_classes)
In [ ]:
from gradsflow.tuner.tuner import Tuner
from gradsflow.tuner.automodel import AutoModelV2
Registering hyperparameters
Gradsflow AutoModel provides two main ways to register your hyperparameters.
The easiest way is to compile the model: the search values passed to compile are registered automatically. In this example we search over the ConvNet architecture, the optimizer, and the learning rate.
In [ ]:
tuner = Tuner()
cnn1 = create_model("resnet18", pretrained=False, num_classes=num_classes)
cnn2 = create_model("efficientnet_b0", pretrained=False, num_classes=num_classes)
cnns = tuner.suggest_complex("learner", cnn1, cnn2)
2021-10-09 09:38:07,002 INFO services.py:1250 -- View the Ray dashboard at http://127.0.0.1:8265
In [ ]:
model = AutoModelV2(cnns)
model.compile(
    loss="crossentropyloss",
    optimizer=tune.choice(("adam", "sgd")),
    learning_rate=tune.loguniform(1e-5, 1e-3),
    metrics="accuracy",
)
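The compile call above draws the learning rate from `tune.loguniform(1e-5, 1e-3)`. As a rough illustration of what a log-uniform range means, the sketch below reimplements the idea with the standard library only. The helper `sample_loguniform` is a hypothetical name introduced here, not part of Ray or gradsflow, and it is not how Ray implements its sampler.

```python
import math
import random


def sample_loguniform(low: float, high: float, rng: random.Random) -> float:
    """Draw a value whose logarithm is uniform on [log(low), log(high)].

    Hypothetical helper for illustration only.
    """
    return math.exp(rng.uniform(math.log(low), math.log(high)))


rng = random.Random(0)
samples = [sample_loguniform(1e-5, 1e-3, rng) for _ in range(10_000)]

# 1e-4 is the geometric midpoint of [1e-5, 1e-3], so about half of the
# draws should fall below it. A plain uniform range would put only
# about 9% of its draws there.
fraction_below = sum(s < 1e-4 for s in samples) / len(samples)
print(f"fraction of samples below 1e-4: {fraction_below:.2f}")
```

Sampling on a log scale spreads trials evenly across orders of magnitude, which is usually what you want for a learning rate.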
In [ ]:
model.hp_tune(tuner, autodataset, epochs=1, n_trials=1)
2021-10-09 09:38:07,521 WARNING function_runner.py:558 -- Function checkpointing is disabled. This may result in unexpected behavior when using checkpointing features or certain schedulers. To enable, set the train function arguments to be `func(config, checkpoint_dir=None)`.
== Status ==
Memory usage on this node: 9.3/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/5.61 GiB heap, 0.0/2.81 GiB objects
Result logdir: /Users/aniket/ray_results/trainable_2021-10-09_09-38-07
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc | learner | learning_rate | optimizer |
---|---|---|---|---|---|
trainable_8234a_00000 | PENDING | | 1 | 0.000146035 | adam |
== Status ==
Memory usage on this node: 9.5/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/8 CPUs, 0/0 GPUs, 0.0/5.61 GiB heap, 0.0/2.81 GiB objects
Result logdir: /Users/aniket/ray_results/trainable_2021-10-09_09-38-07
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | learner | learning_rate | optimizer |
---|---|---|---|---|---|
trainable_8234a_00000 | RUNNING | | 1 | 0.000146035 | adam |
Result for trainable_8234a_00000:
  date: 2021-10-09_09-38-39
  done: false
  experiment_id: 62141761f7d541bba0df2e14526d1b1e
  hostname: Anikets-Turing-Machine.local
  iterations_since_restore: 1
  node_ip: 192.168.50.84
  pid: 9798
  should_checkpoint: true
  time_since_restore: 26.36624789237976
  time_this_iter_s: 26.36624789237976
  time_total_s: 26.36624789237976
  timestamp: 1633752519
  timesteps_since_restore: 0
  train_accuracy: tensor(0.0900)
  train_loss: 2.4604174029581074
  training_iteration: 1
  trial_id: 8234a_00000
  val_accuracy: tensor(0.0900)
  val_loss: 25.253401203201005
Result for trainable_8234a_00000:
  date: 2021-10-09_09-38-39
  done: true
  experiment_id: 62141761f7d541bba0df2e14526d1b1e
  experiment_tag: 0_learner=1,learning_rate=0.00014603,optimizer=adam
  hostname: Anikets-Turing-Machine.local
  iterations_since_restore: 1
  node_ip: 192.168.50.84
  pid: 9798
  should_checkpoint: true
  time_since_restore: 26.36624789237976
  time_this_iter_s: 26.36624789237976
  time_total_s: 26.36624789237976
  timestamp: 1633752519
  timesteps_since_restore: 0
  train_accuracy: tensor(0.0900)
  train_loss: 2.4604174029581074
  training_iteration: 1
  trial_id: 8234a_00000
  val_accuracy: tensor(0.0900)
  val_loss: 25.253401203201005
== Status ==
Memory usage on this node: 9.5/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/5.61 GiB heap, 0.0/2.81 GiB objects
Current best trial: 8234a_00000 with train_loss=2.4604174029581074 and parameters={'learner': 1, 'optimizer': 'adam', 'learning_rate': 0.00014603467084700278}
Result logdir: /Users/aniket/ray_results/trainable_2021-10-09_09-38-07
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | learner | learning_rate | optimizer | iter | total time (s) | val_loss | train_loss |
---|---|---|---|---|---|---|---|---|---|
trainable_8234a_00000 | TERMINATED | | 1 | 0.000146035 | adam | 1 | 26.3662 | 25.2534 | 2.46042 |
2021-10-09 09:38:39,930 INFO tune.py:617 -- Total run time: 32.42 seconds (32.23 seconds for the tuning loop).
The second way to register hyperparameters is to use the Tuner module.
In [ ]:
tuner = Tuner()
cnn1 = create_model("resnet18", pretrained=False, num_classes=num_classes)
cnn2 = create_model("efficientnet_b0", pretrained=False, num_classes=num_classes)
cnns = tuner.suggest_complex("learner", cnn1, cnn2)
In [ ]:
tuner.choice("optimizer", "adam", "sgd")
tuner.loguniform("learning_rate", 1e-5, 1e-3)
tuner.scalar("loss", "crossentropyloss")
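To make the registration pattern above more concrete, here is a toy, standard-library-only sketch of a named search-space registry. `ToySearchSpace` and all of its methods are hypothetical and are not gradsflow's Tuner implementation. The index-based handling of `suggest_complex` is an assumption drawn from the logged parameters on this page, where `learner` appears as `0` or `1`.

```python
import math
import random
from typing import Any, Callable, Dict, Tuple


class ToySearchSpace:
    """Hypothetical stand-in for a tuner registry (illustration only)."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._samplers: Dict[str, Callable[[], Any]] = {}
        self._complex: Dict[str, Tuple[Any, ...]] = {}

    def choice(self, name: str, *options: Any) -> None:
        # Pick one of the given options per trial.
        self._samplers[name] = lambda: self._rng.choice(options)

    def loguniform(self, name: str, low: float, high: float) -> None:
        # Sample uniformly in log space, then map back.
        self._samplers[name] = lambda: math.exp(
            self._rng.uniform(math.log(low), math.log(high))
        )

    def scalar(self, name: str, value: Any) -> None:
        # A fixed value that is recorded with every trial.
        self._samplers[name] = lambda: value

    def suggest_complex(self, name: str, *objects: Any) -> None:
        # Assumption: heavy objects are kept aside and only their index
        # is searched, so the trial config stays small and serialisable.
        self._complex[name] = objects
        self._samplers[name] = lambda: self._rng.randrange(len(objects))

    def sample(self) -> Dict[str, Any]:
        return {name: draw() for name, draw in self._samplers.items()}

    def resolve(self, name: str, index: int) -> Any:
        return self._complex[name][index]


space = ToySearchSpace(seed=0)
space.suggest_complex("learner", "resnet18", "efficientnet_b0")
space.choice("optimizer", "adam", "sgd")
space.loguniform("learning_rate", 1e-5, 1e-3)
space.scalar("loss", "crossentropyloss")

config = space.sample()
print(config)
print("selected learner:", space.resolve("learner", config["learner"]))
```

The sampled dictionary has the same keys as the `parameters` reported for the best trial in the run that follows: `learner`, `optimizer`, `learning_rate` and `loss`.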
In [ ]:
model = AutoModelV2(cnns)
model.hp_tune(tuner, autodataset, epochs=1, n_trials=1)
== Status ==
Memory usage on this node: 9.4/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/5.61 GiB heap, 0.0/2.81 GiB objects
Result logdir: /Users/aniket/ray_results/trainable_2021-10-09_09-38-40
Number of trials: 1/1 (1 PENDING)
Trial name | status | loc | learner | learning_rate | optimizer |
---|---|---|---|---|---|
trainable_95d24_00000 | PENDING | | 0 | 1.00317e-05 | sgd |
== Status ==
Memory usage on this node: 9.5/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/8 CPUs, 0/0 GPUs, 0.0/5.61 GiB heap, 0.0/2.81 GiB objects
Result logdir: /Users/aniket/ray_results/trainable_2021-10-09_09-38-40
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | learner | learning_rate | optimizer |
---|---|---|---|---|---|
trainable_95d24_00000 | RUNNING | | 0 | 1.00317e-05 | sgd |
== Status ==
Memory usage on this node: 9.6/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 1.0/8 CPUs, 0/0 GPUs, 0.0/5.61 GiB heap, 0.0/2.81 GiB objects
Result logdir: /Users/aniket/ray_results/trainable_2021-10-09_09-38-40
Number of trials: 1/1 (1 RUNNING)
Trial name | status | loc | learner | learning_rate | optimizer |
---|---|---|---|---|---|
trainable_95d24_00000 | RUNNING | | 0 | 1.00317e-05 | sgd |
Result for trainable_95d24_00000:
  date: 2021-10-09_09-38-59
  done: false
  experiment_id: ae1947fa57dd47f69512fb8f967d5716
  hostname: Anikets-Turing-Machine.local
  iterations_since_restore: 1
  node_ip: 192.168.50.84
  pid: 9811
  should_checkpoint: true
  time_since_restore: 13.801200151443481
  time_this_iter_s: 13.801200151443481
  time_total_s: 13.801200151443481
  timestamp: 1633752539
  timesteps_since_restore: 0
  train_loss: 2.308414518810076
  training_iteration: 1
  trial_id: 95d24_00000
  val_loss: 2.3132864784963068
Result for trainable_95d24_00000:
  date: 2021-10-09_09-38-59
  done: true
  experiment_id: ae1947fa57dd47f69512fb8f967d5716
  experiment_tag: 0_learner=0,learning_rate=1.0032e-05,optimizer=sgd
  hostname: Anikets-Turing-Machine.local
  iterations_since_restore: 1
  node_ip: 192.168.50.84
  pid: 9811
  should_checkpoint: true
  time_since_restore: 13.801200151443481
  time_this_iter_s: 13.801200151443481
  time_total_s: 13.801200151443481
  timestamp: 1633752539
  timesteps_since_restore: 0
  train_loss: 2.308414518810076
  training_iteration: 1
  trial_id: 95d24_00000
  val_loss: 2.3132864784963068
== Status ==
Memory usage on this node: 9.5/16.0 GiB
Using FIFO scheduling algorithm.
Resources requested: 0/8 CPUs, 0/0 GPUs, 0.0/5.61 GiB heap, 0.0/2.81 GiB objects
Current best trial: 95d24_00000 with train_loss=2.308414518810076 and parameters={'learner': 0, 'optimizer': 'sgd', 'learning_rate': 1.0031736212717992e-05, 'loss': 'crossentropyloss'}
Result logdir: /Users/aniket/ray_results/trainable_2021-10-09_09-38-40
Number of trials: 1/1 (1 TERMINATED)
Trial name | status | loc | learner | learning_rate | optimizer | iter | total time (s) | val_loss | train_loss |
---|---|---|---|---|---|---|---|---|---|
trainable_95d24_00000 | TERMINATED | | 0 | 1.00317e-05 | sgd | 1 | 13.8012 | 2.31329 | 2.30841 |
2021-10-09 09:38:59,845 INFO tune.py:617 -- Total run time: 19.41 seconds (19.29 seconds for the tuning loop).
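Each run above reports its "current best trial" by `train_loss`. The two runs on this page are separate experiments with a single trial and a single epoch each, so the comparison below is only a sketch of how a best trial can be picked by a metric, not a meaningful benchmark. The `best_trial` helper is a hypothetical name; the numbers are copied from the two final result tables.

```python
from typing import Any, Dict, List

# Final rows copied from the two result tables on this page.
trials: List[Dict[str, Any]] = [
    {
        "trial_id": "8234a_00000",
        "learner": 1,
        "optimizer": "adam",
        "learning_rate": 0.000146035,
        "val_loss": 25.2534,
        "train_loss": 2.46042,
    },
    {
        "trial_id": "95d24_00000",
        "learner": 0,
        "optimizer": "sgd",
        "learning_rate": 1.00317e-05,
        "val_loss": 2.31329,
        "train_loss": 2.30841,
    },
]


def best_trial(rows: List[Dict[str, Any]], metric: str, mode: str = "min") -> Dict[str, Any]:
    """Return the row with the lowest (or highest) value of `metric`."""
    pick = min if mode == "min" else max
    return pick(rows, key=lambda row: row[metric])


for metric in ("train_loss", "val_loss"):
    winner = best_trial(trials, metric)
    print(f"best by {metric}: {winner['trial_id']} ({winner[metric]})")
```

Ranking by `val_loss` rather than `train_loss` is often the safer choice, because the first run shows how far the two can diverge (2.46 versus 25.25).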
Last update:
October 9, 2021