Welcome to the Cloud Quickstart guide for Simplifine’s Train Engine. This guide will help you get up and running quickly with training models in the cloud using Distributed Data Parallel (DDP).
The first step is to initialize the Client class with your API key and GPU type. This client will handle communication with the Simplifine servers.
    from simplifine_alpha.train_utils import Client

    # Initialize the client with your API key and GPU type
    api_key = ''  # Enter your Simplifine API key here
    gpu_type = 'a100'  # 'l4' or 'a100'
    client = Client(api_key=api_key, gpu_type=gpu_type)
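To keep the API key out of a shared notebook, the two values passed to Client can be loaded and checked before the client is created. The following is a minimal sketch, not part of the Simplifine library: the environment variable name SIMPLIFINE_API_KEY and the helper load_client_settings are hypothetical, and the list of GPU types is taken only from the comment in the snippet above.

```python
import os

# Hypothetical environment variable name; it is not defined by Simplifine.
API_KEY_ENV_VAR = "SIMPLIFINE_API_KEY"
# GPU types named in this guide ('l4' or 'a100').
SUPPORTED_GPU_TYPES = ("l4", "a100")


def load_client_settings(gpu_type="a100", env=None):
    """Return (api_key, gpu_type) after basic validation.

    `env` defaults to os.environ; a plain dict can be passed for testing.
    """
    env = os.environ if env is None else env
    api_key = env.get(API_KEY_ENV_VAR, "").strip()
    if not api_key:
        raise ValueError(f"Set {API_KEY_ENV_VAR} to your Simplifine API key.")
    if gpu_type not in SUPPORTED_GPU_TYPES:
        raise ValueError(
            f"gpu_type must be one of {SUPPORTED_GPU_TYPES}, got {gpu_type!r}."
        )
    return api_key, gpu_type
```

The returned pair can then be passed straight to the constructor shown above, as in Client(api_key=api_key, gpu_type=gpu_type).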
    [2024-07-28 18:13:08,712] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
    [WARNING] async_io requires the dev libaio .so object and headers but these were not found.
    ...
    {'train_runtime': 6.5488, 'train_samples_per_second': 73.295, 'train_steps_per_second': 9.162, 'train_loss': 0.1135852018992106, 'epoch': 1.0}
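The last line of the log is a Python dictionary literal, so it can be parsed to sanity-check the run. The sketch below uses only the standard library and the numbers shown above. The derived quantities (total samples, total steps, samples per step) are inferred by multiplying the reported rates by the runtime. Whether the per-step figure is a per-GPU or a global batch size is not stated in the log and is left open here.

```python
import ast

# Final summary line copied from the training log above.
summary_line = (
    "{'train_runtime': 6.5488, 'train_samples_per_second': 73.295, "
    "'train_steps_per_second': 9.162, 'train_loss': 0.1135852018992106, "
    "'epoch': 1.0}"
)

# literal_eval safely parses the dict literal without executing code.
metrics = ast.literal_eval(summary_line)

# rate * runtime recovers the approximate totals for the run.
total_samples = round(metrics["train_runtime"] * metrics["train_samples_per_second"])
total_steps = round(metrics["train_runtime"] * metrics["train_steps_per_second"])
samples_per_step = total_samples / total_steps

print(total_samples, total_steps, samples_per_step)  # 480 60 8.0
```

For this log, the figures work out to roughly 480 samples processed in 60 optimizer steps over one epoch, or 8 samples per step.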
Once your model has finished training, you can download it with the client's download_model method.
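    import os

    # Create a folder to store the model (no error if it already exists)
    model_dir = '/content/sf_trained_model'
    os.makedirs(model_dir, exist_ok=True)

    # Download and save the model; job_id identifies the finished training job
    client.download_model(job_id=job_id, extract_to=model_dir)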
Output Example
    Directory downloaded successfully and saved to /content/sf_trained_model/5d55d46a-7793-4c06-9cef-279f03a0f953.zip
    Model unzipped successfully to /content/sf_trained_model
    Model downloaded, unzipped, and zip file deleted successfully!
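After the download reports success, it can be useful to confirm what actually landed in the target folder. The following is a small standard-library sketch. The helper name list_model_files is hypothetical, and no assumption is made about which files Simplifine places in the archive.

```python
from pathlib import Path


def list_model_files(model_dir):
    """Return sorted (relative_path, size_in_bytes) pairs for every file under model_dir."""
    root = Path(model_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a directory; did the download finish?")
    return sorted(
        (path.relative_to(root).as_posix(), path.stat().st_size)
        for path in root.rglob("*")
        if path.is_file()
    )
```

Calling list_model_files('/content/sf_trained_model') after the download returns one entry per extracted file. An empty list would suggest the archive was not unzipped into that folder.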
This quickstart guide covers setting up the client, training a model using DDP, monitoring jobs, downloading the trained model, and using it for inference.