Training an AI model involves teaching it to recognize patterns in data so it can make decisions. Fine-tuning takes a model pre-trained on a large dataset (for example, OpenAI’s GPT-4) and adapts it to a smaller, task-specific dataset by continuing the training process.
In the following sections, we walk step by step through training a custom AI model with OpenAI fine-tuning.
Understanding AI and Model Training
Artificial Intelligence (AI) involves developing systems capable of tasks that typically require human-like intelligence, such as learning, problem-solving, and decision-making. An AI model, at its core, is a set of algorithms that make predictions based on input data. Machine learning (ML), a subset of AI, enables machines to learn from data and improve performance autonomously.
AI models learn much like a child learning to tell cats from dogs: they observe features, make guesses, correct their errors, and try again. This process, known as model training, involves the model processing input data, identifying patterns, and using what it has learned to make predictions. The model’s performance is evaluated by comparing its output against the expected result, and its parameters are adjusted to improve performance. With sufficient training, the model becomes an accurate predictor for a given task and can handle variations in the input data.
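The guess, compare, and adjust loop described above can be sketched in a few lines of plain Python. This is a toy illustration, not how GPT models are trained: a one-feature linear model y = w * x + b is fitted with gradient descent on mean squared error, using made-up data generated from y = 2x + 1.

```python
def train_linear_model(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b with batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # start with no prior knowledge
    n = len(xs)
    for _ in range(epochs):
        # 1. Make predictions with the current parameters
        preds = [w * x + b for x in xs]
        # 2. Compare predictions against the expected results
        errors = [p - y for p, y in zip(preds, ys)]
        # 3. Compute gradients of the mean squared error
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # 4. Adjust the parameters to reduce the error, then try again
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [2 * x + 1 for x in xs]  # toy data from y = 2x + 1
    w, b = train_linear_model(xs, ys)
    print(f"w = {w:.3f}, b = {b:.3f}")  # converges close to w = 2, b = 1
```

Each pass through the loop is one round of "make a guess, measure the error, correct it"; real models do the same with millions or billions of parameters.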
Training a model from scratch involves teaching a model to learn patterns in the data without any prior knowledge. This requires a large amount of data and computational resources, and the model may not perform well with limited data.
Fine-tuning, on the other hand, starts with a pre-trained model that has learned general patterns from a large dataset. The model is then further trained on a smaller, specific dataset, allowing it to apply its previously learned knowledge to the new task, often leading to better performance with less data and computational resources. Fine-tuning is particularly useful when the task-specific dataset is relatively small.
Preparing for Fine-Tuning
Fine-tuning an existing model with additional training on a curated dataset may look more attractive than building and training an AI model from scratch, but its success depends on several key factors.
Choosing the right model
When selecting a base model for fine-tuning, consider the following:
Task Alignment: It is important to define your problem scope and expected model functionality clearly. Choose models that excel in tasks similar to yours because dissimilarity between the source and target tasks during the fine-tuning process can lead to reduced performance. For instance, for text generation tasks, GPT-3 might be suitable, while for text classification tasks, BERT or RoBERTa could be better.
Model Size and Complexity: Balance performance against efficiency. Larger models capture complex patterns better, but they require more compute and memory.
Evaluation Metrics: Choose evaluation metrics that are relevant to your task. For instance, accuracy may be important for classification, while BLEU or ROUGE might be beneficial for language generation tasks.
Community and Resources: Choose models with a large community and ample resources for troubleshooting and implementation. Prioritize models with clear fine-tuning guidelines for your task and seek reputable sources for pre-trained model checkpoints.
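To make the metric choice concrete, the sketch below computes accuracy for a classification task and a simplified ROUGE-1 F1 score (unigram overlap) for generated text. The ROUGE function is a bare-bones illustration with lowercase whitespace tokenization; for real evaluations, an established implementation such as the rouge-score package is the safer choice.

```python
from collections import Counter


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and of equal length")
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)


def rouge1_f1(candidate, reference):
    """Simplified ROUGE-1 F1 using lowercase whitespace tokenization."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # unigram matches, clipped by count
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


print(accuracy(["A", "C", "B", "D"], ["A", "B", "B", "D"]))  # 0.75
print(rouge1_f1("the cat sat on the mat", "the cat is on the mat"))  # about 0.83
```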
Data collection and preparation
When fine-tuning, the quality and diversity of your data can significantly impact the performance of your model. Here are some key considerations:
Types of Data Needed: The data type depends on your specific task and the data the model was pre-trained on. For NLP tasks, you typically need text data from sources like books, articles, social media posts, or speech transcripts. Use methods like web scraping, surveys, or social media platform APIs to gather data. For example, web scraping with AI can be particularly useful when you need a large amount of diverse, up-to-date data.
Data Cleaning and Annotation: Data cleaning involves removing irrelevant data, handling missing or inconsistent data, and normalizing formats. Annotation involves labeling the data so the model can learn from it. Automated tools such as Bright Data can streamline these processes and improve efficiency.
Incorporation of a diverse and representative dataset: During model fine-tuning, a diverse and representative dataset ensures that the model learns from various perspectives, leading to more generalized and reliable predictions. For instance, if you’re fine-tuning a sentiment analysis model for movie reviews, your dataset should include reviews from a wide range of movies, genres, and sentiments, mirroring the real world’s class distribution.
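A minimal cleaning pass for labeled text records might look like the following sketch. It assumes each record is a dict with hypothetical "text" and "label" fields; it normalizes whitespace, drops empty or unlabeled records, removes duplicates (ignoring case), and then reports the label distribution so you can check whether it mirrors the distribution you expect in the real world.

```python
from collections import Counter


def clean_records(records):
    """Normalize whitespace, drop empty/unlabeled records, and remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        text = " ".join(str(record.get("text", "")).split())  # collapse runs of whitespace
        label = record.get("label")
        if not text or label is None:
            continue  # drop records with missing text or label
        key = (text.lower(), label)
        if key in seen:
            continue  # drop case-insensitive duplicates
        seen.add(key)
        cleaned.append({"text": text, "label": label})
    return cleaned


def label_distribution(records):
    """Share of each label, useful for spotting class imbalance."""
    counts = Counter(r["label"] for r in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


raw = [
    {"text": "Great   movie!", "label": "positive"},
    {"text": "great movie!", "label": "positive"},
    {"text": "", "label": "negative"},
    {"text": "Dull plot.", "label": "negative"},
    {"text": "Loved the cast", "label": None},
]
data = clean_records(raw)
print(data)
print(label_distribution(data))  # {'positive': 0.5, 'negative': 0.5}
```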
Setting up the training environment
Ensure you have the necessary hardware and software for the chosen AI model and framework. For instance, Large Language Models (LLMs) often require substantial computational power, typically provided by GPUs.
Frameworks like TensorFlow or PyTorch are commonly used for AI model training. Install the relevant libraries, tools, and dependencies so they integrate smoothly into your training workflow. For example, fine-tuning models hosted by OpenAI requires access to the OpenAI API, typically through the official openai Python library.
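Before starting, it can help to confirm that the libraries you plan to use are installed. The sketch below uses only the standard library's importlib.metadata to report installed versions; the package names in the list are examples, so adjust them to your own stack.

```python
from importlib import metadata


def check_packages(names):
    """Map each distribution name to its installed version, or None if missing."""
    report = {}
    for name in names:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


for package, version in check_packages(["openai", "torch", "tensorflow"]).items():
    print(f"{package}: {version or 'not installed'}")
```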
The Fine-Tuning Process
Having understood the basics of fine-tuning, let’s go through an application in natural language processing.
I’ll use the OpenAI API to fine-tune a pre-trained model. Fine-tuning is currently possible for models like gpt-3.5-turbo-0125 (recommended), gpt-3.5-turbo-1106, gpt-3.5-turbo-0613, babbage-002, davinci-002, and the experimental gpt-4-0613. GPT-4 fine-tuning is in an experimental phase and eligible users can request access in the fine-tuning UI.
1. Dataset preparation
According to a study, GPT-3.5 performs poorly on analytical reasoning tasks. So let’s try to fine-tune the gpt-3.5-turbo model to improve its analytical reasoning, using a dataset of analytical reasoning questions from the Law School Admission Test (AR-LSAT), released in 2022. The publicly available dataset can be found here.
The quality of a fine-tuned model depends directly on the data used for fine-tuning. Each example in the dataset should be a conversation in the same format as OpenAI’s Chat Completions API: a list of messages, each with a role, content, and an optional name. The examples are stored in a JSONL file, one example per line.
The required conversational chat format for fine-tuning gpt-3.5-turbo is as follows:
{"messages": [{"role": "system", "content": ""}, {"role": "user", "content": ""}, {"role": "assistant", "content": ""}]}
In this format, “messages” is a list of messages forming a conversation between three “roles”: system, user, and assistant. The “content” of the “system” message should specify the behavior of the fine-tuned model.
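Converting a raw multiple-choice record into this format is mostly string assembly. The sketch below assumes a hypothetical record with "passage", "question", "options", and "label" fields (the actual AR-LSAT field names may differ) and writes one JSON object per line, as the JSONL format requires. The system prompt and the sample record are short made-up placeholders.

```python
import json

# Placeholder instructions; use the exact system prompt you plan to use at inference
SYSTEM_PROMPT = (
    "You will be presented with a passage and a question about that passage. "
    "Choose the only correct option and answer with its letter."
)


def to_chat_example(record):
    """Turn one multiple-choice record (hypothetical field names) into a chat example."""
    letters = "ABCDE"
    options = "\n".join(f"{letters[i]}. {opt}" for i, opt in enumerate(record["options"]))
    user_content = (
        f"Passage: {record['passage']}\nQuestion: {record['question']}\n{options}\nAnswer:"
    )
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": record["label"]},
        ]
    }


def write_jsonl(records, path):
    """Write one JSON-encoded chat example per line."""
    with open(path, "w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(to_chat_example(record)) + "\n")


# A made-up toy record, not taken from AR-LSAT
sample = {
    "passage": "Ana, Ben, and Cy sit in a row of three seats. Ana sits immediately "
               "to the left of Ben, and Cy sits in the rightmost seat.",
    "question": "Who sits in the middle seat?",
    "options": ["Ana", "Ben", "Cy"],
    "label": "B",
}
write_jsonl([sample], "train.jsonl")
```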
Given below is a formatted example taken from the AR-LSAT dataset which we will use in this guide:
Here are the key considerations when creating the dataset:
- OpenAI requires at least 10 examples for fine-tuning and recommends 50 to 100 training examples with gpt-3.5-turbo. The exact number varies by use case. You can also create a validation file in addition to the training file to guide hyperparameter adjustment.
- Fine-tuning and the use of fine-tuned models are billed per token, at rates that depend on the base model. Detailed pricing can be found on OpenAI’s pricing page.
- Token limits depend on the selected model. For gpt-3.5-turbo-0125, the maximum context length is 16,385 tokens, so each training example is limited to 16,385 tokens; longer examples will be truncated. Token counts can be computed using the counting tokens notebook from the OpenAI cookbook.
- OpenAI provides a Python script to find potential errors and validate the formatting of your training and validation files.
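OpenAI’s own script is the reference for validating files. The sketch below is a simplified stand-in that checks the structure described above and gives a rough size estimate. It only accepts the system, user, and assistant roles, and it uses the common rule of thumb of about four characters per token for English text, which is an approximation; use the tiktoken library (as in the cookbook notebook) for exact counts. The 10-example minimum and the 16,385-token limit are the figures quoted above for gpt-3.5-turbo-0125.

```python
import json

VALID_ROLES = {"system", "user", "assistant"}
MAX_TOKENS = 16_385  # context limit quoted above for gpt-3.5-turbo-0125
MIN_EXAMPLES = 10    # OpenAI's stated minimum


def estimate_tokens(text):
    """Rough heuristic: about 4 characters per token for English text (not exact)."""
    return max(1, len(text) // 4)


def validate_jsonl(path):
    """Return a list of human-readable problems found in a chat-format JSONL file."""
    problems = []
    with open(path, encoding="utf-8") as f:
        lines = [line for line in f if line.strip()]  # ignore blank lines
    if len(lines) < MIN_EXAMPLES:
        problems.append(f"only {len(lines)} examples; at least {MIN_EXAMPLES} required")
    for i, line in enumerate(lines, start=1):
        try:
            example = json.loads(line)
        except json.JSONDecodeError:
            problems.append(f"line {i}: invalid JSON")
            continue
        messages = example.get("messages") if isinstance(example, dict) else None
        if not isinstance(messages, list) or not messages:
            problems.append(f"line {i}: missing 'messages' list")
            continue
        if any(not isinstance(m, dict) or m.get("role") not in VALID_ROLES
               or not isinstance(m.get("content"), str) for m in messages):
            problems.append(f"line {i}: message with bad role or content")
            continue
        if not any(m["role"] == "assistant" for m in messages):
            problems.append(f"line {i}: no assistant message")
        total = sum(estimate_tokens(m["content"]) for m in messages)
        if total > MAX_TOKENS:
            problems.append(f"line {i}: roughly {total} tokens, may be truncated")
    return problems
```

Calling validate_jsonl("train.jsonl") returns a list of problems, which is empty when none were found.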
2. Generating the API key and installing the OpenAI library
To fine-tune an OpenAI model, you need an OpenAI developer account with a sufficient credit balance.
To generate the API key and install the OpenAI library, follow these steps:
1. Sign up on the OpenAI official website.
2. To enable fine-tuning, top up your credit balance from the ‘Billing’ tab under ‘Settings’.
3. Click on the user profile icon at the top-left corner and select “API Keys” to access the key creation page.
4. Generate a new secret key by providing a name.
5. Install the Python OpenAI library for fine-tuning.
pip install openai
6. Use the os module to set the API key as an environment variable, then create a client to communicate with the API.
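import os
from openai import OpenAI
# Set the OPENAI_API_KEY environment variable (replace the placeholder with your key;
# avoid committing real keys to source control)
os.environ['OPENAI_API_KEY'] = 'The key generated in step 4'
# Create the API client (OpenAI() would also read OPENAI_API_KEY from the environment by default)
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])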
3. Uploading the training and validation files
After validating your data, upload the files using the Files API for fine-tuning jobs.
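# Paths to your local JSONL files (example names; replace with your own)
training_file_name = "train.jsonl"
validation_file_name = "validation.jsonl"
# Upload each file; the returned File objects carry the IDs used in the next step
with open(training_file_name, "rb") as f:
    training_file_id = client.files.create(file=f, purpose="fine-tune")
with open(validation_file_name, "rb") as f:
    validation_file_id = client.files.create(file=f, purpose="fine-tune")
print(f"Training File ID: {training_file_id.id}")
print(f"Validation File ID: {validation_file_id.id}")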
The unique identifiers for the training and validation data are displayed upon successful execution.
4. Creating a fine-tuning job
After uploading the files, create a fine-tuning job either via the UI or programmatically.
Here’s how to initiate a fine-tuning job using the OpenAI SDK:
response = client.fine_tuning.jobs.create(
training_file=training_file_id.id,
validation_file=validation_file_id.id,
model="gpt-3.5-turbo",
hyperparameters={
"n_epochs": 10,
"batch_size": 3,
"learning_rate_multiplier": 0.3
}
)
job_id = response.id
status = response.status
print(f'Fine-tuning model with jobID: {job_id}.')
print(f"Training Response: {response}")
print(f"Training Status: {status}")
- model: the name of the model to fine-tune (gpt-3.5-turbo, babbage-002, davinci-002, or an existing fine-tuned model).
- training_file and validation_file: the file IDs returned when the files were uploaded.
- n_epochs, batch_size, and learning_rate_multiplier: hyperparameters that can be customized.
To set additional fine-tuning parameters, refer to the API specification for fine-tuning.
The code above generates the following information for the jobID (`ftjob-0EVPunnseZ6Xnd0oGcnWBZA7`):
A fine-tuning job may take time to complete. It could be queued behind other jobs, and the training duration can vary from minutes to hours depending on the model and dataset size.
Once the training is complete, an email confirmation will be sent to the user who initiated the fine-tuning job.
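Instead of waiting for the email, you can poll the job programmatically. The sketch below wraps the SDK’s client.fine_tuning.jobs.retrieve() call in a loop that stops once the job reaches a terminal status (succeeded, failed, or cancelled). The retrieve function is passed in as a parameter so the loop can be exercised without network access; the polling interval and the optional check limit are arbitrary choices.

```python
import time

TERMINAL_STATUSES = {"succeeded", "failed", "cancelled"}


def wait_for_job(retrieve, job_id, interval_seconds=60, max_checks=None):
    """Poll a fine-tuning job until it reaches a terminal status.

    `retrieve` is any callable that takes a job ID and returns an object with
    a `.status` attribute, such as client.fine_tuning.jobs.retrieve.
    """
    checks = 0
    while True:
        job = retrieve(job_id)
        print(f"{job_id}: {job.status}")
        if job.status in TERMINAL_STATUSES:
            return job
        checks += 1
        if max_checks is not None and checks >= max_checks:
            raise TimeoutError(f"job {job_id} still {job.status} after {checks} checks")
        time.sleep(interval_seconds)


# Usage with the client and job ID from the previous step:
# job = wait_for_job(client.fine_tuning.jobs.retrieve, job_id)
# print(job.fine_tuned_model)  # name of the resulting model once the job succeeds
```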
You can monitor the status of your fine-tuning job via the fine-tuning UI:
5. Analyzing the fine-tuned model
OpenAI computes the following metrics during training:
- Training loss
- Training token accuracy
- Validation loss
- Validation token accuracy
Validation loss and validation token accuracy are calculated in two ways: on a small data batch at each step, and on the full validation set at the end of each epoch. The full validation loss and full validation token accuracy are the most accurate metrics for tracking your model’s performance and serve as a sanity check to ensure smooth training (loss should decrease, token accuracy should increase).
While a fine-tuning job is active, you can view these metrics via:
1. The UI:
2. The API:
import os
from openai import OpenAI
client = OpenAI(api_key=os.environ['OPENAI_API_KEY'])
jobid = 'jobid you want to monitor'  # replace with your fine-tuning job ID
print(f"Streaming events for the fine-tuning job: {jobid}")
# List the job's events; iterating the returned page fetches further pages as needed
events = client.fine_tuning.jobs.list_events(fine_tuning_job_id=jobid)
try:
    for event in events:
        # Metric events carry a data dict; other events (e.g. status messages) may not
        if event.data:
            print(f'{event.data}')
except Exception:
    print("Stream interrupted (client disconnected).")
The above code outputs the events for the fine-tuning job, including the step number, training loss, validation loss, total steps, and mean token accuracy for both training and validation:
Streaming events for the fine-tuning job: ftjob-0EVPunnseZ6Xnd0oGcnWBZA7
{'step': 67, 'train_loss': 0.30375099182128906, 'valid_loss': 0.49169286092122394, 'total_steps': 67, 'train_mean_token_accuracy': 0.8333333134651184, 'valid_mean_token_accuracy': 0.8888888888888888}
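When a job logs many steps, a small helper can summarize metric dictionaries like the one shown above. The sketch below assumes event.data dicts with the keys seen in that sample output (step, train_loss, valid_loss, and the mean token accuracies) and flags one simple warning sign of overfitting: validation loss rising over the last few logged steps while training loss falls. The window size is an arbitrary choice, and step-level validation loss is noisy, so treat the flag as a prompt to inspect the full-epoch metrics rather than a verdict.

```python
def summarize_metrics(records, window=3):
    """Summarize step-level metric dicts such as the event.data values printed above."""
    # Drop empty entries (non-metric events) and sort by step in case events are out of order
    records = sorted((r for r in records if r and "step" in r), key=lambda r: r["step"])
    if not records:
        return None
    with_valid = [r for r in records if "valid_loss" in r]
    best = min(with_valid, key=lambda r: r["valid_loss"]) if with_valid else None
    recent = records[-window:]
    # Warning sign: validation loss rising while training loss keeps falling
    possible_overfitting = (
        len(recent) == window
        and all("valid_loss" in r and "train_loss" in r for r in recent)
        and recent[-1]["valid_loss"] > recent[0]["valid_loss"]
        and recent[-1]["train_loss"] < recent[0]["train_loss"]
    )
    return {
        "last_step": records[-1]["step"],
        "best_valid_step": best["step"] if best else None,
        "best_valid_loss": best["valid_loss"] if best else None,
        "possible_overfitting": possible_overfitting,
    }


# The single metric record from the sample output above
sample = [{'step': 67, 'train_loss': 0.30375099182128906, 'valid_loss': 0.49169286092122394,
           'total_steps': 67, 'train_mean_token_accuracy': 0.8333333134651184,
           'valid_mean_token_accuracy': 0.8888888888888888}]
print(summarize_metrics(sample))

# With the events fetched earlier:
# print(summarize_metrics([event.data for event in events]))
```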
6. Adjusting parameters and the dataset to improve performance
If the results from a fine-tuning job are not as good as you expected, consider the following ways to improve the performance:
1. Adjust the training dataset:
- To refine your training dataset, consider adding examples that address the model’s weaknesses and ensure the response distribution in your data matches the expected distribution.
- It’s also important to check your data for issues that the model is replicating and ensure that your examples contain all the necessary information for the response.
- Maintain consistency across data created by multiple people and standardize the format of all training examples to match what is expected at inference.
- In general, high-quality data is more effective than a larger quantity of low-quality data.
2. Adjust the hyperparameters:
- OpenAI allows you to specify three hyperparameters: epochs, learning rate multiplier, and batch size.
- Start with the default values, which OpenAI picks automatically based on the dataset size, then adjust if needed.
- If the model doesn’t follow the training data as expected, increase the number of epochs.
- If the model becomes less diverse than expected, decrease the number of epochs by 1 or 2.
- If the model doesn’t appear to be converging, increase the learning rate multiplier.
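The rules of thumb above can be captured in a small helper when you iterate on jobs. This is a hypothetical helper, not part of the OpenAI SDK: it takes the hyperparameters used in the previous run and an observed symptom, and returns adjusted values for the next job. The step sizes (one epoch up or down, a 1.5x learning rate multiplier) are assumptions to tune for your own data.

```python
def adjust_hyperparameters(previous, symptom):
    """Return a new hyperparameter dict based on an observed symptom.

    previous: dict with n_epochs, batch_size, and learning_rate_multiplier
    symptom: "underfits" (doesn't follow the training data), "too_uniform"
             (less diverse than expected), or "not_converging"
    """
    updated = dict(previous)  # leave the previous run's settings untouched
    if symptom == "underfits":
        updated["n_epochs"] = previous["n_epochs"] + 1
    elif symptom == "too_uniform":
        updated["n_epochs"] = max(1, previous["n_epochs"] - 1)
    elif symptom == "not_converging":
        updated["learning_rate_multiplier"] = round(previous["learning_rate_multiplier"] * 1.5, 4)
    else:
        raise ValueError(f"unknown symptom: {symptom}")
    return updated


previous = {"n_epochs": 10, "batch_size": 3, "learning_rate_multiplier": 0.3}
print(adjust_hyperparameters(previous, "too_uniform"))
# {'n_epochs': 9, 'batch_size': 3, 'learning_rate_multiplier': 0.3}
```

The returned dict can be passed as the hyperparameters argument of client.fine_tuning.jobs.create() for the next run.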
7. Using a checkpointed model
Currently, OpenAI provides access to the checkpoints for the last three epochs of a fine-tuning job. These checkpoints are complete models that can be used for inference and further fine-tuning.
To access these checkpoints, wait for a job to succeed, then query the checkpoints endpoint with your fine-tuning job ID. Each checkpoint object will have the fine_tuned_model_checkpoint field populated with the model checkpoint name. You can also get the checkpoint model name via the fine-tuning UI.
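In the Python SDK, the checkpoints endpoint is exposed as client.fine_tuning.jobs.checkpoints.list() in recent versions of the openai package (older versions may lack it). The sketch below collects checkpoint model names together with their step numbers; the client is passed in so the function can be tested with a stand-in object.

```python
def checkpoint_names(client, job_id):
    """Return (step_number, checkpoint model name) pairs for a finished job, sorted by step."""
    checkpoints = client.fine_tuning.jobs.checkpoints.list(job_id)
    return sorted((cp.step_number, cp.fine_tuned_model_checkpoint) for cp in checkpoints)


# Usage with the client and job ID from earlier steps:
# for step, name in checkpoint_names(client, job_id):
#     print(step, name)
```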
You can validate checkpoint model results by running queries with a prompt and the model name using the client.chat.completions.create() function:
completion = client.chat.completions.create(
model="ft:gpt-3.5-turbo-0125:personal::9PWZuZo5",
messages=[
{"role": "system", "content": "Instructions: You will be presented with a passage and a question about that passage. There are four options to be chosen from, you need to choose the only correct option to answer that question. If the first option is right, you generate the answer 'A', if the second option is right, you generate the answer 'B', if the third option is right, you generate the answer 'C', if the fourth option is right, you generate the answer 'D', if the fifth option is right, you generate the answer 'E'. Read the question and options thoroughly and select the correct answer from the four answer labels. Read the passage thoroughly to ensure you know what the passage entails"},
{"role": "user", "content": "Passage: For the school paper, five students\u2014Jiang, Kramer, Lopez, Megregian, and O'Neill\u2014each review one or more of exactly three plays: Sunset, Tamerlane, and Undulation, but do not review any other plays. The following conditions must apply: Kramer and Lopez each review fewer of the plays than Megregian. Neither Lopez nor Megregian reviews any play Jiang reviews. Kramer and O'Neill both review Tamerlane. Exactly two of the students review exactly the same play or plays as each other.Question: Which one of the following could be an accurate and complete list of the students who review only Sunset?\nA. Lopez\nB. O'Neill\nC. Jiang, Lopez\nD. Kramer, O'Neill\nE. Lopez, Megregian\nAnswer:"}
]
)
print(completion.choices[0].message)
The result retrieved from the answer dictionary is:
You can also compare the fine-tuned model with other models in OpenAI’s Playground, as shown below:
Tips and Best Practices
For successful fine-tuning, consider these tips:
Data quality: Ensure your task-specific data is clean, diverse, and representative to avoid overfitting, where the model performs well on training data but poorly on unseen data.
Hyperparameter selection: Choose appropriate hyperparameters to avoid slow convergence or suboptimal performance. This can be complex and time-consuming but is crucial for effective training.
Resource management: Be aware that fine-tuning large models requires substantial computational resources and time.
Avoiding Pitfalls
Overfitting and underfitting: Balance your model’s complexity and the amount of training to avoid overfitting (high variance) and underfitting (high bias).
Catastrophic forgetting: During fine-tuning, the model may forget previously learned general knowledge. Regularly evaluate your model on a variety of tasks to detect this early.
Domain shift sensitivity: If your fine-tuning data differs significantly from the pre-training data, you may encounter domain shift issues. Use domain adaptation techniques to bridge this gap.
Saving and Reusing Models
After training, save your model’s state so you can reuse it later. This includes the model parameters and the state of the optimizer used, which allows you to resume training from the same point.
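This advice applies when you train the model yourself; with OpenAI’s hosted fine-tuning, the model stays on OpenAI’s side and is referenced by name. In PyTorch, the usual pattern is to pass a dict holding model.state_dict() and optimizer.state_dict() to torch.save. The framework-free sketch below shows the same idea with a hypothetical toy state: parameters, an SGD-with-momentum velocity buffer, and the step counter are written to JSON and restored, so training resumes exactly where it stopped.

```python
import json


def save_checkpoint(path, params, optimizer_state, step):
    """Persist model parameters, optimizer state, and progress together."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"params": params, "optimizer": optimizer_state, "step": step}, f)


def load_checkpoint(path):
    """Restore everything needed to resume training."""
    with open(path, encoding="utf-8") as f:
        state = json.load(f)
    return state["params"], state["optimizer"], state["step"]


def sgd_momentum_step(params, grads, optimizer_state, lr=0.1, beta=0.9):
    """One SGD-with-momentum update; the velocity buffer is the optimizer state."""
    velocity = optimizer_state.setdefault("velocity", [0.0] * len(params))
    for i, g in enumerate(grads):
        velocity[i] = beta * velocity[i] + g
        params[i] -= lr * velocity[i]


params, opt_state = [1.0, -2.0], {}
sgd_momentum_step(params, [0.5, -0.5], opt_state)
save_checkpoint("checkpoint.json", params, opt_state, step=1)

# Later (or in a new process): resume from the saved state
params, opt_state, step = load_checkpoint("checkpoint.json")
sgd_momentum_step(params, [0.5, -0.5], opt_state)
print(step + 1, params)
```

Saving only the parameters would reset the velocity buffer to zero on resume, so the second update would differ from an uninterrupted run; storing the optimizer state avoids that.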
Ethical Considerations
Bias amplification: Pre-trained models can inherit biases, which may be amplified during fine-tuning. When unbiased predictions are required, prefer pre-trained models that have been tested for bias and fairness.
Unintended outputs: Fine-tuned models may generate plausible but incorrect outputs. Implement robust post-processing and validation mechanisms to handle this.
Model drift: A model’s performance can deteriorate over time due to changes in the environment or data distribution. Monitor your model’s performance regularly and re-fine-tune as necessary.
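For the multiple-choice task in this guide, one simple validation mechanism is to accept a model response only if it resolves to a single answer letter, and to retry or flag everything else. The parsing rule below is an assumption about what counts as a well-formed answer. It errs on the side of caution: a response that uses "a" as an English article alongside a letter is rejected as ambiguous.

```python
import re


def parse_answer(text):
    """Return a single answer letter A-E, or None if the output is malformed or ambiguous."""
    # Find standalone letters A-E (e.g. "C", "C.", "Answer: C", "(c)")
    candidates = set(re.findall(r"\b([A-E])\b", text.upper()))
    # Accept only when exactly one distinct letter appears; "A or B" is rejected
    return candidates.pop() if len(candidates) == 1 else None


# Usage with a completion from the checkpoint query shown earlier:
# answer = parse_answer(completion.choices[0].message.content or "")
# if answer is None: retry the request or flag the response for review
```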
Advanced Techniques and Further Learning
Advanced techniques for fine-tuning LLMs include Parameter-Efficient Fine-Tuning (PEFT) methods, which adapt models by training only a small number of parameters. Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA) are widely used PEFT methods that reduce computational and financial costs while largely preserving performance. For large-scale training, DeepSpeed and its ZeRO optimizer reduce memory usage. These techniques can also help address challenges such as overfitting, catastrophic forgetting, and domain shift sensitivity, making LLM fine-tuning more efficient and effective.
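To illustrate the idea behind LoRA, the NumPy sketch below freezes a weight matrix W and trains only a low-rank update B·A, so the adapted layer computes W x + (alpha / r) B A x. Because B starts at zero, the adapted layer initially behaves exactly like the pre-trained one. The dimensions, rank, and alpha are arbitrary example values, and the random matrix merely stands in for pre-trained weights; real projects typically use a library such as Hugging Face PEFT rather than hand-rolled layers.

```python
import numpy as np


class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (LoRA-style)."""

    def __init__(self, weight, rank=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        out_features, in_features = weight.shape
        self.weight = weight  # pre-trained weights, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(rank, in_features))  # trainable
        self.B = np.zeros((out_features, rank))  # trainable, zero-initialized
        self.scale = alpha / rank

    def __call__(self, x):
        # Base output plus the scaled low-rank correction B @ (A @ x)
        return self.weight @ x + self.scale * (self.B @ (self.A @ x))

    def trainable_parameters(self):
        return self.A.size + self.B.size


rng = np.random.default_rng(42)
W = rng.normal(size=(512, 512))  # stand-in for a pre-trained weight matrix
layer = LoRALinear(W, rank=4)
x = rng.normal(size=512)

print(np.allclose(layer(x), W @ x))  # True: B starts at zero
print(layer.trainable_parameters(), "trainable vs", W.size, "frozen")  # 4096 vs 262144
```

Only A and B (4,096 values here) would be updated during fine-tuning, compared with 262,144 values in the frozen matrix.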
Beyond fine-tuning, other training techniques include transfer learning and reinforcement learning. Transfer learning applies knowledge learned from one problem to a related problem (fine-tuning is one form of it), while reinforcement learning is a type of machine learning in which an agent learns to make decisions by taking actions in an environment to maximize a reward.
For those interested in diving deeper into AI model training, the following resources might be helpful:
- “Attention Is All You Need” by Ashish Vaswani et al.
- “Deep Learning” by Ian Goodfellow, Yoshua Bengio, and Aaron Courville
- “Speech and Language Processing” by Daniel Jurafsky and James H. Martin
- Different ways of training LLMs
- Mastering LLM Techniques: Training
- NLP course by Hugging Face
Conclusion
Training an AI model is a process that requires a significant amount of high-quality data. While defining the problem, selecting a model, and refining it through iterations are essential, the true differentiator is the quality and volume of data used. Instead of building and maintaining web scrapers, you can simplify data collection by using pre-built or custom datasets available on Bright Data’s platform.
With the Dataset Marketplace, you can access validated, ready-made datasets from popular websites, or you can generate custom datasets to meet your specific needs using the automated platform. This way, you can focus on training your models efficiently with accurate and compliant data, enabling faster, more reliable results across various industries.
Explore Bright Data’s dataset solutions, and easily integrate them into your workflow for seamless data collection.