Configuration
In this section, we will go through the two main ways to interact with the `realm-tune` tool: the configuration file and CLI arguments.
Configuration file
Below is an example config file that contains all available configuration options, alongside comments describing each field.
```yaml
realm_ai:
  behavior_name: 3DBallHard # optional
  algorithm: bayes # or random, optional, default: bayes
  total_trials: 3 # total number of trials (inclusive of warmup_trials), optional, default: 5
  warmup_trials: 5 # optional for the bayes algorithm: number of "warmup" trials where random hyperparameters are used. Ignored by other algorithms. Default: 5
  eval_window_size: 1 # optional, a training run is evaluated by taking the average episode reward of the past x episodes. Default: 1
  env_path: ../../../Unity/envs/my_game/env # mandatory! Either specify here, or under env_settings below. Setting it here takes precedence.
  output_path: ../../custom_output_path # optional, specify to manually set the folder name, or to continue an existing run
  full_run_after_tuning: # optional, if specified, the config in the "best_trial" folder will be changed to reflect the following settings (i.e., max_steps). Leave it out if no training is to be done after tuning.
    max_steps: 20000 # number of steps to run for the full training
  wandb: # optional, if specified, log training metrics to wandb. Leave it out if wandb is not to be used.
    project: realm_tune
    entity: <username>
    offline: false

mlagents: # Fully compatible with all single-agent mlagents configs. Must use default_settings for hyperparameters!
  env_settings:
    # env_path: ../../../../../Unity/envs/3dball/3dball # precedence given to env_path above
    env_args: null
    num_envs: 1
    seed: 0
  engine_settings:
    no_graphics: true
  # checkpoint_settings:
  #   run_id: anything # does not matter, will be generated automatically
  #   force: false # does not matter, overwritten as True through a cli argument
  torch_settings:
    device: cpu
  default_settings:
    trainer_type: ppo
    hyperparameters:
      batch_size: [64, 128, 256] # a list means categorical
      buffer_size: log_unif(2000, 12000) # automatically detected as int
      learning_rate: log_unif(0.0003, 0.01) # automatically detected as float
      beta: log_unif(0.001, 0.01) # unif and log_unif exclude the upper bound - [0.001, 0.01)
      num_epoch: unif(1, 15)
    reward_signals:
      extrinsic:
        gamma: [0.99, 0.95]
    max_steps: 10000
    time_horizon: 1000
    summary_freq: 5000
```
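For instance, assuming the YAML above is saved as `config.yaml` (the file name here is just a placeholder), tuning can be started by pointing `realm-tune` at it with the `--config-path` flag described below:

```bash
# Start hyperparameter tuning with the settings from the config file.
# "config.yaml" is a placeholder; use whatever path the file was saved to.
realm-tune --config-path config.yaml
```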
CLI arguments
The purpose of having CLI arguments in conjunction with the configuration file is mainly convenience, especially when `realm-tune` is used programmatically (i.e., `realm-tune` is called by another program). In such cases, it can be a hassle to parse the config file, edit the values, save it, and then point `realm-tune` to it.
One important thing to note is that arguments passed through the CLI always take precedence over those in the config file. In other words, CLI arguments always override the config file. This enables a useful workflow: a user can keep a fixed config file that they are comfortable with, and override individual values through the CLI whenever they have a different game and so on.
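As a rough sketch of that workflow (the base config path, environment path, and behavior name below are made up for illustration), a fixed base config can be reused across games, with only the game-specific values overridden on the command line:

```bash
# Reuse a fixed base config; CLI arguments override the values inside it.
# base_config.yaml, the env path, and the behavior name are hypothetical examples.
realm-tune --config-path base_config.yaml \
    --env-path ../Unity/envs/other_game/env \
    --behavior-name OtherGameAgent \
    --total-trials 10
```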
To get the list of available CLI arguments, run `realm-tune --help`, the output of which should look something like this:
```
usage: realm-tune [-h] [--config-path CONFIG_PATH] [--output-path OUTPUT_PATH]
                  [--behavior-name BEHAVIOR_NAME] [--algorithm {bayes,random,grid}]
                  [--total-trials TOTAL_TRIALS] [--warmup-trials WARMUP_TRIALS]
                  [--eval-window-size EVAL_WINDOW_SIZE] [--env-path ENV_PATH] [--use-wandb]
                  [--wandb-project WANDB_PROJECT] [--wandb-entity WANDB_ENTITY] [--wandb-offline]
                  [--wandb-group WANDB_GROUP] [--wandb-jobtype WANDB_JOBTYPE] [--full-run]
                  [--full-run-max-steps FULL_RUN_MAX_STEPS]

Realm_AI hyperparameter optimization tool

optional arguments:
  -h, --help            show this help message and exit
  --config-path CONFIG_PATH
  --output-path OUTPUT_PATH
                        Specify path where data is stored
  --behavior-name BEHAVIOR_NAME
                        Name of behaviour. This can be found under the agent's "Behavior Parameters"
                        component in the inspector of Unity
  --algorithm {bayes,random,grid}
                        Algorithm for hyperparameter tuning
  --total-trials TOTAL_TRIALS
                        Number of trials
  --warmup-trials WARMUP_TRIALS
                        Number of warmup trials (only works for bayes algorithm)
  --eval-window-size EVAL_WINDOW_SIZE
                        Training run is evaluated by taking the average eps rew of past x episodes
  --env-path ENV_PATH   Path to environment. If specified, overrides env_path in the config file

Weights and Biases Configuration:
  --use-wandb
  --wandb-project WANDB_PROJECT
  --wandb-entity WANDB_ENTITY
  --wandb-offline
  --wandb-group WANDB_GROUP
  --wandb-jobtype WANDB_JOBTYPE

Full run configuration:
  --full-run
  --full-run-max-steps FULL_RUN_MAX_STEPS
```
All arguments should look pretty familiar, as they are mostly identical to those in the config file. However, two arguments are worth pointing out:
- `--use-wandb` exists because it allows the user to use Weights and Biases without having to pass in any other information, such as the entity name.
- `--full-run` exists for the same reason: it allows the user to do an automated full training run after hyperparameter tuning, without needing to configure the number of steps (the default value will be used instead), as shown in the example below.
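As a minimal sketch of how these two flags might be combined (the config path is again a placeholder):

```bash
# Tune with default Weights and Biases settings for logging, then do a
# full training run afterwards using the default number of steps.
realm-tune --config-path config.yaml --use-wandb --full-run
```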