How to Use BOHB | Seungwoo Han

What is BOHB?

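BOHB is a hyperparameter search algorithm that combines Bayesian Optimization with Hyperband. It pairs Hyperband's fast exploration, which trains many configurations on small budgets and drops the poor ones early, with Bayesian Optimization's guided search, which uses past results to propose promising configurations.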
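To make the "guided search" half more concrete, here is a minimal pure-Python sketch of the idea BOHB's model is built on: rank the observed configurations by loss, fit one kernel density estimate to the best ones (l) and one to the rest (g), and propose the candidate that maximizes l(x)/g(x). This is a simplified one-dimensional illustration under assumed settings (15% "good" fraction, fixed Gaussian bandwidth 0.1, 24 candidates); the real BOHB uses multivariate KDEs built per budget plus random-sampling fallbacks, and the helper names `gaussian_kde` and `propose` are hypothetical.

```python
import math
import random


def gaussian_kde(points, bandwidth):
    """Return a 1-D Gaussian kernel density estimate over `points`."""
    norm = 1.0 / (len(points) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - p) / bandwidth) ** 2) for p in points)

    return density


def propose(observations, low, high, good_fraction=0.15, bandwidth=0.1,
            n_candidates=24, rng=random):
    """Propose the next x from (x, loss) pairs by maximizing l(x) / g(x).

    Illustrative TPE-style rule, not BOHB's exact implementation.
    """
    ranked = sorted(observations, key=lambda xl: xl[1])  # lowest loss first
    n_good = max(1, math.ceil(good_fraction * len(ranked)))
    good = [x for x, _ in ranked[:n_good]]
    bad = [x for x, _ in ranked[n_good:]] or good  # avoid an empty "bad" set

    l = gaussian_kde(good, bandwidth)  # density of good configurations
    g = gaussian_kde(bad, bandwidth)   # density of the remaining configurations

    # Draw candidates around the good points, clipped to the search range.
    candidates = [
        min(high, max(low, rng.gauss(rng.choice(good), bandwidth)))
        for _ in range(n_candidates)
    ]
    # The small epsilon keeps the ratio finite if g underflows to 0.
    return max(candidates, key=lambda x: l(x) / (g(x) + 1e-12))


if __name__ == "__main__":
    rng = random.Random(0)
    loss = lambda x: (x - 0.3) ** 2  # toy objective, minimum at x = 0.3
    obs = [(x, loss(x)) for x in (rng.random() for _ in range(20))]
    for _ in range(10):
        x = propose(obs, 0.0, 1.0, rng=rng)
        obs.append((x, loss(x)))
    best_x, best_loss = min(obs, key=lambda xl: xl[1])
    print(f"best x = {best_x:.3f}, loss = {best_loss:.5f}")
```

Hyperband decides how much budget each configuration gets; this density-ratio step replaces Hyperband's purely random choice of which configurations to try.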

Required libraries

The following libraries are required.

pip install "ray[tune]" hpbandster ConfigSpace

This post implements BOHB in three ways:

TF Keras (TuneCallback approach)
TF GradientTape (train_iteration approach)
PyTorch

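Key settings

1. Setting the hyperparameters

Using PyTorch as the example, refer to each parameter you want to optimize as config["..."] in the code. For example, to optimize the number of neurons in the Linear layers, the activation function, the optimizer, and the number of epochs, you can write: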

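import torch
import torch.nn as nn
import torch.nn.functional as F

# This code runs inside objective(config), so `config` is the dict Tune passes in.
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()

        self.flatten = nn.Flatten()
        self.Dense1 = nn.Linear(784, config["neuron1"])
        self.Dense2 = nn.Linear(config["neuron1"], config["neuron2"])
        # The output layer receives Dense2's output, so its input size is neuron2
        self.output = nn.Linear(config["neuron2"], 10)

        # Pick the activation function from config
        if config["activation"] == "relu":
            self.activation = nn.ReLU()
        elif config["activation"] == "tanh":
            self.activation = nn.Tanh()

    def forward(self, x):
        x = self.flatten(x)
        x = self.activation(self.Dense1(x))
        x = self.activation(self.Dense2(x))
        x = self.output(x)  # map to the 10 MNIST classes
        return F.log_softmax(x, dim=1)

model = Net()

# Pick the optimizer from config
if config["optimizers"] == "rmsprop":
    optimizer = torch.optim.RMSprop(model.parameters())
elif config["optimizers"] == "adam":
    optimizer = torch.optim.Adam(model.parameters())

# The number of epochs also comes from config
for epoch in range(config["training_iteration"]):
    ...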

Once the model reads its settings from config, specify the search ranges in the file that runs the scheduler, as follows:

from ray import tune

config = {
    "training_iteration": epochs,                    # fixed value, not searched
    "activation": tune.choice(["relu", "tanh"]),
    "neuron1": tune.randint(32, 64),                 # integer in [32, 64): upper bound excluded
    "neuron2": tune.randint(32, 64),
    "optimizers": tune.choice(["rmsprop", "adam"]),
}
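Each trial receives one concrete dict drawn from this space. As a rough mental model (not Tune's actual sampler), the space above behaves like the plain-Python function below; `sample_config` is a hypothetical helper, and the `epochs=10` used in the demo is a placeholder. Because tune.randint excludes its upper bound, neuron1 and neuron2 fall in 32 to 63, which matches the values in the status table further down.

```python
import random


def sample_config(epochs, rng=random):
    """Plain-Python stand-in for drawing one config from the Tune search space."""
    return {
        "training_iteration": epochs,                # constant: copied as-is
        "activation": rng.choice(["relu", "tanh"]),  # like tune.choice
        "neuron1": rng.randrange(32, 64),            # like tune.randint(32, 64): 32..63
        "neuron2": rng.randrange(32, 64),
        "optimizers": rng.choice(["rmsprop", "adam"]),
    }


rng = random.Random(42)
for _ in range(3):
    print(sample_config(epochs=10, rng=rng))
```

This uniform draw corresponds to plain random search; a model-based searcher such as BOHB's replaces it with guided proposals once it has enough results.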

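2. Setting the scheduler's max_t

The scheduler has a parameter called max_t: the maximum amount of time a trial may run, measured in units of time_attr. With time_attr="training_iteration", one unit is one reported result; since the objective here reports once per epoch, max_t is effectively the maximum number of epochs per trial. In both the Keras and PyTorch versions, a trial is stopped once it reaches this value.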

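from ray.tune.schedulers import HyperBandForBOHB

scheduler = HyperBandForBOHB(
    time_attr="training_iteration",  # one unit = one reported result (one epoch here)
    max_t=100,                       # at most 100 units per trial
)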
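max_t also sets the budget ladder Hyperband works with. The sketch below computes the textbook Hyperband brackets for max_t=100, assuming a reduction factor eta=3 (as far as I know, HyperBandForBOHB's default reduction_factor). Tune computes its own bracket sizes somewhat differently, so treat this as an illustration of the idea rather than a reproduction of the scheduler; `hyperband_brackets` is a hypothetical helper.

```python
import math


def hyperband_brackets(max_t, eta=3):
    """Return the successive-halving rungs of each textbook Hyperband bracket.

    Each bracket is a list of (num_trials, budget) rungs: after every rung only
    the best 1/eta of the trials continue, each with eta times more budget.
    """
    # s_max = largest s with eta**s <= max_t (integer loop avoids log rounding)
    s_max = 0
    while eta ** (s_max + 1) <= max_t:
        s_max += 1

    brackets = []
    for s in range(s_max, -1, -1):
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))  # trials started in this bracket
        r = max_t / eta ** s                              # starting budget per trial
        rungs = [(n // eta ** i, r * eta ** i) for i in range(s + 1)]
        brackets.append(rungs)
    return brackets


for rungs in hyperband_brackets(max_t=100):
    print([(n, round(r, 1)) for n, r in rungs])
```

The most aggressive bracket starts 81 configurations on roughly one epoch each and carries only one of them to the full 100 epochs, while the most conservative bracket trains 5 configurations for all 100 epochs.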

When the trials start, their status is printed as follows.

+-----------------------+----------+-------+--------------+-----------+-----------+--------------+
| Trial name            | status   | loc   | activation   |   neuron1 |   neuron2 | optimizers   |
|-----------------------+----------+-------+--------------+-----------+-----------+--------------|
| objective_dd8c7_00000 | PENDING  |       | tanh         |        43 |        50 | adam         |
| objective_dd8c7_00001 | PENDING  |       | relu         |        63 |        45 | adam         |
| objective_dd8c7_00002 | PENDING  |       | tanh         |        63 |        52 | rmsprop      |
+-----------------------+----------+-------+--------------+-----------+-----------+--------------+

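3. Setting the stop condition

Now let's look at tune.run, the function that runs the scheduler. Its main arguments are the objective function to optimize, config, the scheduler, num_samples, and metric, described below.

objective: the function that wraps the code to be optimized, passed as the first argument of tune.run. For example, to optimize the Net above, you could write: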

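import torch
import torch.nn as nn
import torchvision.datasets as dsets
import torchvision.transforms as transforms

def objective(config):
    # Use the GPU if one is available
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    # MNIST training set, downloaded to MNIST_data/ on first run
    mnist_train = dsets.MNIST(
        root="MNIST_data/",
        train=True,
        transform=transforms.ToTensor(),
        download=True,
    )

    # Net is defined inside objective so it can read the sampled config
    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()

            self.flatten = nn.Flatten()
            self.Dense1 = nn.Linear(784, config["neuron1"])
    ...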
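The scheduler can only compare and stop trials if the objective reports its metric while training. In Ray 1.x's function API this is done by calling tune.report(mean_accuracy=...) once per epoch, and each call advances training_iteration by one. The pure-Python sketch below imitates that contract without Ray: `toy_objective`, the `report` callback, and `Recorder` are hypothetical stand-ins, and the accuracy curve is a made-up placeholder rather than real training.

```python
def toy_objective(config, report):
    """Hypothetical objective: 'trains' for config['training_iteration'] epochs
    and reports a metric after every epoch via the `report` callback."""
    acc = 0.5
    for _ in range(config["training_iteration"]):
        acc += (1.0 - acc) * 0.3   # placeholder for one epoch of training + evaluation
        report(mean_accuracy=acc)  # stands in for tune.report(mean_accuracy=acc)


class Recorder:
    """Collects reported results and numbers them, loosely like Tune does."""

    def __init__(self):
        self.results = []

    def __call__(self, **metrics):
        metrics["training_iteration"] = len(self.results) + 1
        self.results.append(metrics)


rec = Recorder()
toy_objective({"training_iteration": 3}, rec)
for r in rec.results:
    print(r)
```

If the objective only reported once at the very end, every trial would look like a single iteration to the scheduler, and Hyperband could not stop weak trials early.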

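config: the search space described above.
scheduler: the trial scheduler, such as BOHB (HyperBandForBOHB) or Hyperband. Grid search is not a scheduler; it is specified in config with tune.grid_search.
num_samples: the number of trials (hyperparameter configurations) to sample.
metric: the metric to optimize, such as accuracy; it must match a key the objective reports.
mode: whether to maximize ("max") or minimize ("min") the metric.
search_alg: for full BOHB, Ray pairs HyperBandForBOHB with the TuneBOHB search algorithm, which is what hpbandster and ConfigSpace are installed for. Without a search_alg, Tune samples new configurations randomly from config.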
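metric and mode together define what "best" means. Using the final accuracies from the example output table further down (the acc column in Tune's default progress table shows mean_accuracy), choosing the best trial amounts to the plain-Python logic below; `best_trial` is a hypothetical helper, not a Tune function.

```python
def best_trial(results, metric, mode):
    """Return the (trial_name, result) pair that is best for `metric` under `mode`."""
    if mode not in ("max", "min"):
        raise ValueError("mode must be 'max' or 'min'")
    pick = max if mode == "max" else min
    return pick(results.items(), key=lambda item: item[1][metric])


# Final results copied from the example output table.
results = {
    "objective_dd8c7_00000": {"mean_accuracy": 0.938278},
    "objective_dd8c7_00001": {"mean_accuracy": 0.942333},
    "objective_dd8c7_00002": {"mean_accuracy": 0.94},
}

name, result = best_trial(results, metric="mean_accuracy", mode="max")
print(name, result["mean_accuracy"])  # objective_dd8c7_00001 0.942333
```

With mode="min" the same call would return objective_dd8c7_00000, which is why mode has to match the direction of the metric.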

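You can also set termination conditions with tune.run's stop argument. A trial can stop once it reaches a target accuracy, and setting training_iteration: 1 makes every trial run a single iteration and then stop, regardless of max_t. Below is the call with these conditions and an example of its output.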

from ray import tune

tune.run(
    objective,
    config=config,
    scheduler=scheduler,
    num_samples=64,
    metric="mean_accuracy",
    local_dir="./bohb_results",
    mode="max",
    resources_per_trial={"cpu": 4, "gpu": 0},
    stop={
        "mean_accuracy": 0.99,      # stop once accuracy reaches 0.99
        "training_iteration": 1     # stop after 1 iteration
    },
)

Number of trials: 3/3 (3 TERMINATED)
+-----------------------+------------+-------+--------------+-----------+-----------+--------------+----------+--------+------------------+
| Trial name            | status     | loc   | activation   |   neuron1 |   neuron2 | optimizers   |      acc |   iter |   total time (s) |
|-----------------------+------------+-------+--------------+-----------+-----------+--------------+----------+--------+------------------|
| objective_dd8c7_00000 | TERMINATED |       | tanh         |        43 |        50 | adam         | 0.938278 |      1 |          5.3518  |
| objective_dd8c7_00001 | TERMINATED |       | relu         |        63 |        45 | adam         | 0.942333 |      1 |          5.15036 |
| objective_dd8c7_00002 | TERMINATED |       | tanh         |        63 |        52 | rmsprop      | 0.94     |      1 |          4.28608 |
+-----------------------+------------+-------+--------------+-----------+-----------+--------------+----------+--------+------------------+
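When stop is a dict, Tune checks each reported result against every entry and stops the trial as soon as any one of them is reached, comparing with >=. The hypothetical helper `should_stop` below mirrors that rule in plain Python for the stop dict used above, which is why every trial in the table ends at iter 1 even though none reached 0.99 accuracy.

```python
def should_stop(result, stop):
    """Return True once any criterion in `stop` is met (value >= threshold).

    A criterion missing from `result` raises KeyError in this sketch.
    """
    return any(result[key] >= threshold for key, threshold in stop.items())


stop = {"mean_accuracy": 0.99, "training_iteration": 1}

# First result of trial objective_dd8c7_00000 from the table above.
print(should_stop({"mean_accuracy": 0.938278, "training_iteration": 1}, stop))  # True

# Without the iteration limit, the same result would keep training.
accuracy_only = {"mean_accuracy": 0.99}
print(should_stop({"mean_accuracy": 0.938278, "training_iteration": 1}, accuracy_only))  # False
print(should_stop({"mean_accuracy": 0.995, "training_iteration": 5}, accuracy_only))     # True
```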

4. Visualization

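While the trials run, their results accumulate under the directory given by local_dir (./bohb_results here), and you can inspect them with TensorBoard.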

tensorboard --logdir bohb_results/

In TensorBoard you can visually compare how each trial's accuracy changes over time and how each hyperparameter combination performs.
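Besides TensorBoard, the per-trial logs can be read directly. Assuming Tune's default CSV logger, each trial directory under local_dir contains a progress.csv with one row per reported result; the sketch below scans those files with the standard library and returns the trial directory with the highest final mean_accuracy. `best_from_progress` is a hypothetical helper, and the file layout and column names may differ between Ray versions, so check your own output first.

```python
import csv
from pathlib import Path


def best_from_progress(root, metric="mean_accuracy"):
    """Return (trial_dir_name, value) with the highest final `metric` under `root`.

    Assumes each trial directory holds a progress.csv whose last row is the
    trial's final reported result. Returns None if nothing is found.
    """
    best = None
    for path in Path(root).rglob("progress.csv"):
        with open(path, newline="") as f:
            rows = list(csv.DictReader(f))
        if not rows or not rows[-1].get(metric):
            continue  # skip empty files or trials that never reported the metric
        value = float(rows[-1][metric])
        if best is None or value > best[1]:
            best = (path.parent.name, value)
    return best
```

After a run, best_from_progress("./bohb_results") gives a quick answer without starting TensorBoard.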

The full code is available on GitHub.