Prerequisites

Before you begin, ensure you have the following:

  • A Simplifine API key.
  • Access to a cloud GPU (L4 or A100) supported by Simplifine.

To obtain a Simplifine API key with free credit, express interest here.

Setting Up the Client

The first step is to initialize the Client class with your API key and GPU type. This client will handle communication with the Simplifine servers.

from simplifine_alpha.train_utils import Client

# Initialize the client with your API key and GPU type
api_key = ''  # Enter your Simplifine API key here
gpu_type = 'a100'  # 'l4' or 'a100'
client = Client(api_key=api_key, gpu_type=gpu_type)

Replace '' with your actual Simplifine API key.
The gpu_type can be 'l4' or 'a100'.
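
As a quick sanity check, you can list your existing jobs immediately after initializing the client; get_all_jobs is the same call used for monitoring later in this guide, and an invalid API key will typically fail here rather than mid-run.

# Sanity check: confirm the client can reach the Simplifine servers
jobs = client.get_all_jobs()
print(f'Found {len(jobs)} existing job(s)')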

Training a Model with Distributed Data Parallelism (DDP)

The example below shows how to use DDP, which replicates the model on every GPU and splits each batch of training data across them, to distribute training over multiple GPUs.

Step 1: Define Your Training Job with DDP

You can train a model using the sft_train_cloud method and enable DDP by setting the use_ddp parameter to True.

client.sft_train_cloud(
    job_name='ddp_job',
    model_name='EleutherAI/gpt-neo-125M',  # base model to fine-tune
    dataset_name='my_dataset',
    data_from_hf=True,  # whether dataset_name refers to a Hugging Face Hub dataset
    keys=['title', 'abstract', 'explanation'],  # fields referenced by the template
    data={
        'title': ['title 1', 'title 2'],
        'abstract': ['abstract 1', 'abstract 2'],
        'explanation': ['explanation 1', 'explanation 2'],
    },
    template='### TITLE: {title}\n ### ABSTRACT: {abstract}\n ###EXPLANATION: {explanation}',
    response_template='###EXPLANATION:',  # must match the template text exactly
    use_zero=False,  # leave ZeRO sharding off when using DDP
    use_ddp=True  # enable Distributed Data Parallelism
)
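
To see exactly what the model is trained on, you can render one data row through the template with plain Python (no Simplifine calls involved). Note that response_template must appear verbatim in the rendered text, since it marks where the completion begins.

template = '### TITLE: {title}\n ### ABSTRACT: {abstract}\n ###EXPLANATION: {explanation}'
row = {'title': 'title 1', 'abstract': 'abstract 1', 'explanation': 'explanation 1'}

# Prints:
# ### TITLE: title 1
#  ### ABSTRACT: abstract 1
#  ###EXPLANATION: explanation 1
print(template.format(**row))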

Step 2: Monitor Your Jobs

After submitting a job, you can check the status of your jobs. Each status will be one of: completed, in progress, or pending.

status = client.get_all_jobs()

# Print the five most recent jobs
for num, job in enumerate(status[-5:]):
    print(f'Job {num}: {job}')
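
If you want to block until training finishes, a minimal polling loop like the sketch below works. The 'status' key name is an assumption here; inspect a job dict printed above to confirm the actual field name.

import time

# Sketch: poll until the most recent job reports completion.
# NOTE: the 'status' key is assumed, not confirmed by the Simplifine docs.
while True:
    latest = client.get_all_jobs()[-1]
    if latest.get('status') == 'completed':
        break
    time.sleep(30)  # avoid hammering the API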

Step 3: Retrieve Training Logs

You can retrieve the logs for any job to inspect the training run in detail.

# Fetch the logs for the most recent job
job_id = status[-1]['job_id']
logs = client.get_train_logs(job_id)
print(logs['response'])
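
For long runs it can be convenient to persist the logs to disk. Assuming logs['response'] is the plain-text log shown by the print above, this writes it next to your notebook:

# Save the raw training logs for later inspection
with open('ddp_job_logs.txt', 'w') as f:
    f.write(str(logs['response']))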

Step 4: Downloading the Trained Model

Once your model has finished training, you can download it using the download_model function.

import os

# Create a folder to store the model (exist_ok avoids an error on re-runs)
os.makedirs('/content/sf_trained_model', exist_ok=True)

# Download the trained model and extract it into that folder
client.download_model(job_id=job_id, extract_to='/content/sf_trained_model')
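
You can verify the extraction by listing the folder; a Hugging Face style checkpoint should contain config.json, the model weights, and the tokenizer files.

# Confirm the checkpoint files landed where we expect
print(os.listdir('/content/sf_trained_model'))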

Step 5: Loading and Using the Trained Model

Finally, you can load the trained model and tokenizer to generate text.

from transformers import AutoModelForCausalLM, AutoTokenizer

path = '/content/sf_trained_model'
sf_model = AutoModelForCausalLM.from_pretrained(path)
sf_tokenizer = AutoTokenizer.from_pretrained(path)

input_example = '### TITLE: title 1\n ### ABSTRACT: abstract 1\n ###EXPLANATION: '

# Tokenize the prompt
input_example = sf_tokenizer(input_example, return_tensors='pt')

# Generate a completion; pad_token_id falls back to eos since GPT-Neo defines no pad token
output = sf_model.generate(
    input_example['input_ids'],
    attention_mask=input_example['attention_mask'],
    max_length=30,
    eos_token_id=sf_tokenizer.eos_token_id,
    pad_token_id=sf_tokenizer.eos_token_id,
)

print(sf_tokenizer.decode(output[0]))
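
Because generate returns the prompt followed by the continuation, you may prefer to decode only the newly generated tokens:

# Decode only the tokens generated after the prompt
prompt_len = input_example['input_ids'].shape[1]
print(sf_tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True))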

This quickstart covered setting up the client, training a model with DDP, monitoring jobs, retrieving logs, downloading the trained model, and running inference with it.