PL and Hydra

Project link

Project description

This is my pipeline for training neural networks in PyTorch. It is based on PyTorch Lightning as a wrapper over PyTorch code and Hydra for managing configuration files. The main concepts of the pipeline are:

  • the pipeline consists of several flexible modules that should be easily replaceable
  • all parameters and modules used are defined in configs, which are prepared beforehand
  • thanks to the load_obj function, classes can be referenced in configuration files and loaded without writing explicit imports
  • parameters and modules can be changed from the command line using Hydra
  • the pipeline can be used for different deep learning tasks without changing the main code
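
The load_obj function mentioned above can be sketched with importlib (a minimal version for illustration; the actual implementation in the project may differ):

```python
import importlib
from typing import Any


def load_obj(obj_path: str) -> Any:
    """Resolve a full dotted path such as 'torch.optim.SGD' to the object itself."""
    module_path, _, obj_name = obj_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, obj_name)
```

With this helper, `load_obj("collections.OrderedDict")` returns the OrderedDict class even though the calling code never imports collections explicitly.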

Currently supported tasks: image classification, named entity recognition.

Hydra configuration

  • the folder with Hydra configuration files has a separate subfolder for each logical part of the pipeline
  • each subfolder contains one or several small configuration files. For example, the optimizer folder has configuration files describing different optimizers
  • the main configuration file, config.yaml, defines the default configuration file for each folder, as well as the general configuration for the project and for Hydra itself
  • below is an example of a configuration file for a scheduler:
    # @package _group_
    class_name: torch.optim.lr_scheduler.ReduceLROnPlateau
    step: epoch
    monitor: ${training.metric}
    mode: ${training.mode}
    factor: 0.1
    patience: 1

    The first line is technical information for Hydra. class_name is the full path to the class to import; the load_obj function is used in the code to instantiate such classes. ${training.metric} is interpolation syntax: the value is taken from another configuration file.

  • it is even possible to define augmentations using configuration files.
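
Putting this together, the scheduler config above can be instantiated roughly like this (a self-contained sketch: the resolved interpolation values and the exact key handling are assumptions):

```python
import importlib

import torch


def load_obj(obj_path):
    """Minimal loader: resolve 'pkg.mod.Class' to the class itself."""
    module_path, _, name = obj_path.rpartition(".")
    return getattr(importlib.import_module(module_path), name)


# The scheduler config from above, with interpolations already resolved
# ("val_loss" and "min" are assumed values of ${training.metric} / ${training.mode})
cfg = {
    "class_name": "torch.optim.lr_scheduler.ReduceLROnPlateau",
    "step": "epoch",        # consumed by the training loop, not by the scheduler
    "monitor": "val_loss",  # consumed by Lightning, not by the scheduler
    "mode": "min",
    "factor": 0.1,
    "patience": 1,
}

model = torch.nn.Linear(4, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

scheduler_cls = load_obj(cfg["class_name"])
# Drop the keys that are not scheduler constructor arguments
scheduler_kwargs = {k: v for k, v in cfg.items()
                    if k not in ("class_name", "step", "monitor")}
scheduler = scheduler_cls(optimizer, **scheduler_kwargs)
```

Swapping the scheduler then only requires pointing the config at a different file; the instantiation code does not change.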

Running the pipeline

>>> python train.py
>>> python train.py optimizer=sgd
>>> python train.py model=efficientnet_model
>>> python train.py model.encoder.params.arch=resnet34
>>> python train.py datamodule.fold_n=0,1,2 -m

The basic way to run the training pipeline is to run the script with no arguments. If we want to change some parameters, we can override them from the command line; this works even if the parameter is deeply nested. It is also possible to run multiple experiments using -m (multirun). In this case, every combination of the passed parameter values is used to run the script, one after another.
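
The multirun sweep enumerates the Cartesian product of the overridden values. The combination semantics can be illustrated like this (an illustration only, not Hydra's actual code; the optimizer/fold values are hypothetical):

```python
from itertools import product

# Hypothetical sweep: two optimizers and three folds -> 2 * 3 = 6 runs
optimizers = ["sgd", "adam"]
folds = [0, 1, 2]
runs = [f"optimizer={o} datamodule.fold_n={f}"
        for o, f in product(optimizers, folds)]
print(len(runs))  # one run per combination
```

So `datamodule.fold_n=0,1,2 -m` alone launches three runs, one per fold.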


Logging example

Logging in this pipeline is performed using two approaches.

It is possible to log the experiment to a predefined experiment tracking tool, such as Comet.ml or W&B.

Also, logging can be done locally:

  • a folder with a timestamp is created. Optionally, the folder name can include the overridden values, but this can cause errors when the folder path becomes too long for the OS to process correctly
  • the hydra folder is created automatically and contains the original configuration file, the overrides, and the resulting composed configuration file
  • the code folder can hold a copy of the project code, which is useful when you change the code rapidly without committing the changes
  • the logs folder has a CSV file with the logs
  • saved_models has PyTorch Lightning checkpoints and some other possible artifacts
  • version_0 has TensorBoard logs if TensorBoard logging is enabled
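
The local run folder described above can be sketched as follows (the timestamp format and the exact folder names are assumptions based on the list above):

```python
from datetime import datetime
from pathlib import Path

# Each run gets its own timestamped folder with the subfolders described above
run_dir = Path("outputs") / datetime.now().strftime("%Y-%m-%d_%H-%M-%S")
for sub in ("hydra", "code", "logs", "saved_models"):
    (run_dir / sub).mkdir(parents=True, exist_ok=True)
```

Keeping everything for a run under one timestamped folder makes it easy to compare or delete experiments as a unit.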