- The name or path of the pre-trained model to use.
- The name of the dataset to be used for training. Defaults to an empty string.
- The maximum length of the input sequences. Defaults to `128`.
- A list of strings to use as data if `from_hf` is `False`. Defaults to an empty list.
- The number of training epochs. Defaults to `3`.
- The batch size for training. Defaults to `8`.
- A flag to enable 16-bit floating-point (FP16) training. Defaults to `False`.
- A flag to enable 16-bit Brain Floating Point (BF16) training. Defaults to `False`.
- The learning rate for optimization. Defaults to `5e-5`.
- A flag to determine whether to load the dataset from Hugging Face. Defaults to `True`.
- A flag to determine whether to split the dataset into training and validation sets. Defaults to `True`.
- The ratio of the dataset to be used for validation. Defaults to `0.2`.
- The number of steps for gradient accumulation. Defaults to `4`.
- A flag to enable gradient checkpointing to reduce memory usage. Defaults to `False`.
- The service to report training logs to (e.g., `wandb`). Defaults to `'none'`.
- The API key for Weights and Biases (WandB) logging. Defaults to an empty string.
- The configuration for Weights and Biases (WandB) logging. Defaults to `None`.
- A flag to enable Parameter-Efficient Fine-Tuning (PEFT). Defaults to `False`.
- The configuration object for PEFT. Defaults to `None`.
- The Hugging Face token required for accessing private datasets or models. Defaults to an empty string.
- The name of the column in the dataset to use for training. Defaults to `'text'`.
- The type of learning rate scheduler to use. Defaults to `'linear'`.
- The number of steps for evaluation accumulation. Defaults to `8`.
- The directory to save the output model and logs. Defaults to `'clm_output'`.
- A flag to enable Distributed Data Parallel (DDP) training. Defaults to `False`.
- A flag to enable ZeRO (Zero Redundancy Optimizer) for memory optimization. Defaults to `True`.
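The defaults above can be collected into a single configuration object. A minimal sketch using a standard-library dataclass is shown below; note that every field name except `from_hf` is a hypothetical placeholder (the original parameter names are not shown in this list), and only the default values come from the documentation above.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class CLMTrainingConfig:
    """Hypothetical container mirroring the documented defaults.

    Field names are illustrative assumptions; only `from_hf` and the
    default values are taken from the parameter list above.
    """
    model: str = ""                            # name or path of the pre-trained model
    dataset: str = ""                          # dataset name; empty string by default
    max_length: int = 128                      # maximum input sequence length
    data: list = field(default_factory=list)   # raw strings used when from_hf is False
    num_epochs: int = 3
    batch_size: int = 8
    fp16: bool = False                         # 16-bit floating-point training
    bf16: bool = False                         # Brain Floating Point training
    learning_rate: float = 5e-5
    from_hf: bool = True                       # load the dataset from Hugging Face
    split_dataset: bool = True                 # split into train/validation sets
    val_ratio: float = 0.2
    gradient_accumulation_steps: int = 4
    gradient_checkpointing: bool = False
    report_to: str = "none"                    # e.g. 'wandb'
    wandb_api_key: str = ""
    wandb_config: Optional[dict] = None
    use_peft: bool = False
    peft_config: Optional[object] = None
    hf_token: str = ""                         # for private datasets or models
    text_column: str = "text"
    lr_scheduler_type: str = "linear"
    eval_accumulation_steps: int = 8
    output_dir: str = "clm_output"
    ddp: bool = False                          # Distributed Data Parallel
    zero: bool = True                          # ZeRO memory optimization

# Override only what differs from the defaults.
cfg = CLMTrainingConfig(model="gpt2", num_epochs=1, bf16=True)
```

Keeping the defaults in one dataclass makes it easy to validate combinations at construction time (for instance, rejecting `fp16=True` together with `bf16=True`) rather than deep inside the training loop.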