A training pipeline for neural nets in PyTorch

Project description

This is my training pipeline for training neural nets in PyTorch. It is based on PyTorch-lightning as a wrapper over PyTorch code and hydra for managing configuration files. The main concepts of the pipeline are:

the pipeline consists of several flexible modules, which should be easily replaceable
all used parameters and modules are defined in configs, which are prepared beforehand
thanks to the load_obj function, it is possible to define classes in configuration files and to load them without writing explicit imports
parameters and modules can be changed through the command line commands using hydra
the pipeline can be used for different deep learning tasks without changing the main code

Currently supported tasks: image classification, named entity recognition.

Hydra configuration

the folder with hydra configuration files has separate folders for each logical part of the pipeline
each folder has one or several small configuration files. For example, the optimizer folder has configuration files describing different optimizers
main configuration file - config.yaml has the definition of default configuration files in each folder as well as general configuration for the project and hydra itself
next, you can see an example of a configuration file for a scheduler
```
# @package _group_
class_name: torch.optim.lr_scheduler.ReduceLROnPlateau
step: epoch
monitor: ${training.metric}
params:
mode: ${training.mode}
factor: 0.1
patience: 1
```
The first line is technical information for Hydra class_name - the full path to the class for import. function load_obj is used in the code to instantiate classes. ${training.metric} - this syntax means interpolation; this means that the value is taken from another configuration file.
it is even possible to define augmentations using configuration files.

Running the pipeline

>>> python train.py
>>> python train.py optimizer=sgd
>>> python train.py model=efficientnet_model
>>> python train.py model.encoder.params.arch=resnet34
>>> python train.py datamodule.fold_n=0,1,2 -m

The basic way to run the training pipeline is to run the train.py script. If we want to change some parameters, we can do it through the command line. It works even if the parameter is deeply nested. It is also possible to run multiple experiments using -m (multirun). In this case, all combinations of the passed parameters’ values will be used to run the script one after another.

Logging

Logging in this pipeline is performed using two approaches.

It is possible to log the experiment into a predefined experiment tracking tool - comet_ml, W&B, and so on.

Also, logging can be done locally:

a folder with the timestamp is created. Optionally, the name of the folder can show overridden values, but this could cause errors when the path of the folder becomec too long for the OS to process correctly
hydra folder is created automatically and has the original configuration file, the overridden, and the resulting configuration file
code folder can copy the code of the project, which could be useful in cases when you change the code rapidly without committing the changes
logs folder has csv file with logs
saved_models has PyTorch-lightning checkpoints and some other possible artifacts
version_0 has tensorboard logs if it is enabled