In this guide on fine-tuning Llama 4 with web data, you will learn:
- What fine-tuning is
- How to retrieve the fine-tuning-ready datasets using some scraping APIs
- How to set up the cloud infrastructure for the fine-tuning process
- How to fine-tune Llama 4 with a step-by-step tutorial
Let’s dive in!
What Is Fine-tuning?
Fine-tuning, also known as supervised fine-tuning (SFT), is a process used to give a pre-trained LLM specific knowledge or abilities. In the context of LLMs, pre-training refers to training a model from scratch on a large, general-purpose corpus.
SFT works because a model mimics its training data. However, as of today, LLMs are mainly general-purpose models. This means that if you want a model to learn domain-specific knowledge, you have to fine-tune it.
If you want to learn more about SFT, read our guide on supervised fine-tuning in LLMs.
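To make the idea concrete, here is a minimal, hypothetical sketch of what a single SFT training example can look like: an input prompt paired with the exact output you want the model to imitate. The chat-formatted dataset built later in this tutorial follows the same principle.

# Hypothetical example of a single SFT training record:
# the model learns to produce the completion when given the prompt.
sft_example = {
    "prompt": "Generate a product description for: Ergonomic Office Chair",
    "completion": "Stay comfortable through long workdays with this ergonomic office chair...",
}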
Scraping the Data to Fine-Tune Llama 4
To fine-tune an LLM, you first need a fine-tuning dataset. This section walks you through how to retrieve data from a website using Bright Data’s Web Scraper APIs—dedicated endpoints for 100+ domains that scrape fresh data for you and retrieve it in the desired format.
The target webpage is going to be the Amazon best-sellers office products page:
Follow the steps below to retrieve the fine-tuning data!
Requirements
To use the code to retrieve the data from Amazon, you need:
- Python 3.10+ installed on your machine.
- A valid Bright Data Scraper API key.
Follow the Bright Data documentation to retrieve your API key.
Project Structure and Dependencies
Suppose you call the main folder of your project amazon_scraper/. At the end of this step, the folder will have the following structure:
amazon_scraper/
├── scraper.py
└── venv/
Where:
- scraper.py is the Python file that contains the scraping logic.
- venv/ contains the virtual environment.
You can create the venv/ virtual environment directory like so:
python -m venv venv
To activate it, on Windows, run:
venv\Scripts\activate
Equivalently, on macOS and Linux, execute:
source venv/bin/activate
In the activated virtual environment, install the dependencies with:
pip install requests
Where requests is a library for making HTTP requests.
Great! You are now ready to get the data of interest using the Scraper APIs by Bright Data.
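One practical note before writing the script: it needs your API key. The tutorial hardcodes the key for simplicity, but a safer option is to read it from an environment variable, as in this optional sketch (it assumes you export BRIGHT_DATA_API_KEY in your shell):

import os

# Read the Bright Data API key from an environment variable instead of hardcoding it
BRIGHT_DATA_API_KEY = os.environ.get("BRIGHT_DATA_API_KEY")
if not BRIGHT_DATA_API_KEY:
    raise RuntimeError("Set the BRIGHT_DATA_API_KEY environment variable before running the script")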
Step #1: Define the Scraping Logic
The following snippet defines the whole scraping logic:
import requests
import json
import time


def trigger_amazon_products_scraping(api_key, urls):
    # Endpoint to trigger the Web Scraper API task
    url = "https://api.brightdata.com/datasets/v3/trigger"

    params = {
        "dataset_id": "gd_l7q7dkf244hwjntr0",
        "include_errors": "true",
        "type": "discover_new",
        "discover_by": "best_sellers_url",
    }

    # Convert the input data into the format expected by the API
    data = [{"category_url": url} for url in urls]

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }

    response = requests.post(url, headers=headers, params=params, json=data)

    if response.status_code == 200:
        snapshot_id = response.json()["snapshot_id"]
        print(f"Request successful! Response: {snapshot_id}")
        return snapshot_id
    else:
        print(f"Request failed! Error: {response.status_code}")
        print(response.text)


def poll_and_retrieve_snapshot(api_key, snapshot_id, output_file, polling_timeout=20):
    snapshot_url = f"https://api.brightdata.com/datasets/v3/snapshot/{snapshot_id}?format=json"
    headers = {
        "Authorization": f"Bearer {api_key}"
    }

    print(f"Polling snapshot for ID: {snapshot_id}...")

    while True:
        response = requests.get(snapshot_url, headers=headers)

        if response.status_code == 200:
            print("Snapshot is ready. Downloading...")
            snapshot_data = response.json()

            # Write the snapshot to an output JSON file
            with open(output_file, "w", encoding="utf-8") as file:
                json.dump(snapshot_data, file, indent=4)

            print(f"Snapshot saved to {output_file}")
            return
        elif response.status_code == 202:
            print(f"Snapshot is not ready yet. Retrying in {polling_timeout} seconds...")
            time.sleep(polling_timeout)
        else:
            print(f"Request failed! Error: {response.status_code}")
            print(response.text)
            break


if __name__ == "__main__":
    BRIGHT_DATA_API_KEY = "<your api key>"  # Replace with your Bright Data Web Scraper API key or read it from the envs

    # URLs of best-selling products to retrieve data from
    urls = [
        "https://www.amazon.com/gp/bestsellers/office-products/ref=pd_zg_ts_office-products"
    ]

    snapshot_id = trigger_amazon_products_scraping(BRIGHT_DATA_API_KEY, urls)
    poll_and_retrieve_snapshot(BRIGHT_DATA_API_KEY, snapshot_id, "amazon-data.json")
This code:
- Creates the trigger_amazon_products_scraping() function, which initiates the web scraping task by:
  - Defining the Scraper API endpoint to trigger.
  - Setting up the parameters for the scraping activity.
  - Formatting the input urls into the JSON structure the API expects.
  - Sending a POST request to the Bright Data Scraper API with the specified endpoint, headers, parameters, and data.
  - Managing the response status.
- Creates the poll_and_retrieve_snapshot() function, which checks the status of the scraping task (identified by snapshot_id) and retrieves the data once it is ready.
Note that the scraping API was called with only one URL, so the above code retrieves data from a single target Amazon page. This is sufficient for the scope of this tutorial, but you can add as many Amazon URLs as you like to the list, as shown in the sketch below.
Keep in mind that the more URLs you add, the larger the dataset becomes. A larger dataset, if well curated, generally leads to better fine-tuning results, but it also increases the computational time required.
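For example, extending the urls list could look like this (the commented entry is an illustrative placeholder, not a real URL from this tutorial):

# Hypothetical example: scrape more than one best-sellers category
urls = [
    "https://www.amazon.com/gp/bestsellers/office-products/ref=pd_zg_ts_office-products",
    # "https://www.amazon.com/gp/bestsellers/<another-category>/...",  # add more category URLs here
]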
Perfect! Your scraping logic is well-defined, and you are now ready to run the script.
Step #2: Run the Script
To scrape the target webpage, run the script with:
python scraper.py
You will get a result as follows:
Request successful! Response: s_m9in0ojm4tu1v8h78
Polling snapshot for ID: s_m9in0ojm4tu1v8h78...
Snapshot is not ready yet. Retrying in 20 seconds...
# ...
Snapshot is not ready yet. Retrying in 20 seconds...
Snapshot is ready. Downloading...
Snapshot saved to amazon-data.json
At the end of the process, the project folder will contain:
amazon_scraper/
├── scraper.py
├── amazon-data.json # <-- Note the fine-tuning dataset
└── venv/
The process automatically created the amazon-data.json file, which contains the scraped data. Below is the expected structure of the JSON file:
[
{
"title": "Amazon Basics Multipurpose Copy Printer Paper, 8.5 x 11 inches, 20 lb, 1 Ream, 500 Sheets, 92 Bright, White",
"seller_name": "Amazon.com",
"brand": "Amazon Basics",
"description": "Product Description Amazon Basics Multipurpose Copy Printer Paper, 8.5 x 11 Inch 20Lb Paper - 1 Ream (500 Sheets), 92 GE Bright White From the Manufacturer AmazonBasics",
"initial_price": 6.65,
"currency": "USD",
"availability": "In Stock",
"reviews_count": 190989,
"categories": [
"Office Products",
"Office & School Supplies",
"Paper",
"Copy & Printing Paper",
"Copy & Multipurpose Paper"
],
...
// omitted for brevity...
    }
]
Very well! You have successfully scraped data from Amazon and saved it into a JSON file. This JSON file is the fine-tuning dataset you will use later in the fine-tuning process.
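If you want to sanity-check the dataset before moving on, a small optional sketch like the one below (run in the same virtual environment) prints how many products were scraped and which fields the first record exposes:

import json

# Quick sanity check of the scraped dataset
with open("amazon-data.json", "r", encoding="utf-8") as f:
    products = json.load(f)

print(f"Scraped products: {len(products)}")
print("Fields in the first record:", sorted(products[0].keys()))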
Setting Up Hugging Face to Use Llama 4
The model you will use is Llama-4-Scout-17B-16E-Instruct from Hugging Face.
If you have never used Hugging Face before, when you click on the link for the first time you will be asked to create an account:
After creating the account, if you have never used any Llama 4 model, you need to fill in the agreement form. Click on “Expand to review and access” to read and fill in the form:
After filling in the form, your request will be reviewed:
Check the status of your request in the “Gated Repositories” section:
Once your request is accepted, you can create a new token. Go to the “Access Tokens” section and create a token with write permissions. Then, copy and save it somewhere safe so you can use it later:
Hooray! You have completed all the necessary steps to use a Llama 4 model with Hugging Face.
Setting Up the Cloud Infrastructure to Fine-tune Llama 4
The Llama 4 models are very big, and their names tell you how big they are. For example, Llama-4-Scout-17B-16E-Instruct means the model has 17 billion active parameters and 16 experts.
The fine-tuning process requires you to train the model on the fine-tuning dataset you retrieved earlier. Since the model has 17 billion active parameters, this takes substantial hardware; specifically, you need more than one GPU. For this reason, you will use a cloud service to carry out the fine-tuning process.
For this tutorial, you will use RunPod as the cloud service. Go to RunPod and create an account. Then, go to the “Billing” menu and add $25 using a credit card:
Note: You will be charged $25 immediately, and RunPod will add the equivalent of $25 in credits to your account. Credits are consumed hourly while your pod is deployed, so deploy it only when you are actually ready to use it. Otherwise, you will consume credits without using them. The actual hourly cost depends on the type and number of GPUs you choose in the next steps.
Navigate to the “Pods” menu to begin configuring your pod. The pod serves as a virtual server that provides you with the necessary CPUs, GPUs, memory, and storage for your tasks. Click the “Deploy” button:
You can choose among different configurations:
Select the “H200 SXM GPU” option. Give the pod a name and select the number of GPUs. Three GPUs are fine for this tutorial:
Select “Start a Jupyter Notebook” and click on “Deploy on Demand.” Now, go to the “Pods” section and edit your Pod:
Change the “Container Disk” and “Volume Disk” values as below, then save:
When the setup is complete, click on the “Connect” button:
This allows you to connect the Pod to a Jupyter Lab notebook:
Select the Notebook with the “Python 3 (ipykernel)” card:
Very well! You now have the right infrastructure to train the Llama 4 model.
Fine-Tuning Llama 4 With the Scraped Data
Before you start fine-tuning your model, upload the amazon-data.json file to your Jupyter Lab notebook. To do so, click on the “Upload files” button:
The objective of the fine-tuning in this tutorial is to train Llama 4 on the amazon-data.json dataset. This way, you teach Llama 4 how to create descriptions for office products given characteristics such as the product name and a list of features.
You are now ready to start training the model. Follow the steps below to fine-tune Llama 4 with fresh web data!
Step #1: Install the Libraries
In the first cell of your notebook, install the needed libraries:
%%capture
!pip install transformers==4.51.0
%pip install -U datasets
%pip install -U accelerate
%pip install -U peft
%pip install -U trl
%pip install -U bitsandbytes
%pip install huggingface_hub[hf_xet]
Those libraries are:
- transformers: Provides thousands of pre-trained models.
- datasets: Offers access to a vast collection of datasets and efficient data processing tools.
- accelerate: Simplifies running PyTorch training scripts across various distributed configurations with minimal code changes.
- peft: Enables fine-tuning large pre-trained models more efficiently by updating only a small subset of parameters.
- trl: Provides trainers for transformer language models, including supervised fine-tuning and reinforcement learning techniques.
- huggingface_hub: Provides a Python interface to interact with the Hugging Face Hub, so you can download and upload models, datasets, and Spaces.
- bitsandbytes: Offers easy-to-use 8-bit optimizers and quantization functions, reducing the memory footprint for training and inference of large deep learning models.
Perfect! You have installed the needed libraries for the fine-tuning process.
Step #2: Connect to Hugging Face
In the second cell of your notebook, write:
from huggingface_hub import notebook_login, login
# Interactive login
notebook_login()
print("Login cell executed. If successful, you can proceed.")
When you run it, it will display the following:
In the “Token” box, paste the token you have created on your Hugging Face account.
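If you prefer a non-interactive login (for example, when re-running the notebook), you can call the login() function that was imported in the same cell and pass the token directly. A minimal sketch, assuming you stored the token in an HF_TOKEN environment variable on the pod:

import os
from huggingface_hub import login

# Non-interactive alternative: read the Hugging Face token from an environment variable
hf_token = os.environ.get("HF_TOKEN")
if hf_token:
    login(token=hf_token)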
Awesome! You are now able to retrieve the Llama 4 model from Hugging Face.
Step #3: Load the Llama 4 Model
In the third cell of your notebook, write the following code:
import os
import torch
import json
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline, Llama4ForConditionalGeneration, BitsAndBytesConfig
from trl import SFTTrainer
# Load model
base_model_name = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
# Configuration for BitsAndBytes quantization
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_use_double_quant=False,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
)
# Load the Llama4 model with specified configurations
model = Llama4ForConditionalGeneration.from_pretrained(
base_model_name,
device_map="auto",
torch_dtype=torch.bfloat16,
quantization_config=bnb_config,
trust_remote_code=True,
)
# Disable caching for the model
model.config.use_cache = False
# Set the pretraining tensor parallelism degree to 1
model.config.pretraining_tp = 1
# Path to fine-tuning JSON data file.
fine_tuning_data_file_path = "amazon-data.json"
# Path to results
output_model_dir = "results_llama_office_items_finetuned/"
final_model_adapter_path = os.path.join(output_model_dir, "final_adapter")
max_seq_length_for_tokenization = 1024
# Create output directory
os.makedirs(output_model_dir, exist_ok=True)
The above snippet:
- Defines the name of the model to load with base_model_name.
- Configures 4-bit quantization of the model's weights with bnb_config using the BitsAndBytesConfig() class.
- Loads the model with the from_pretrained() method for training.
- Defines the path to the fine-tuning dataset with fine_tuning_data_file_path.
- Defines the output directory path for the results and creates it with the makedirs() method.
When the cell finishes running, you should see a result like this:
Fantastic! Your Llama 4 model is set up and loaded into the notebook.
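Optionally, you can check how the quantized model was sharded across the GPUs and roughly how much memory it occupies. A quick sketch using standard transformers attributes (hf_device_map is populated because the model was loaded with device_map="auto"):

# Optional: see how the model layers were distributed across the available GPUs
print(model.hf_device_map)

# Optional: approximate memory footprint of the loaded (quantized) model, in GB
print(f"Memory footprint: {model.get_memory_footprint() / 1e9:.2f} GB")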
Step #4: Prepare the Fine-Tuning Dataset for the Training Process
Write the following code in the fourth cell of your notebook to prepare the fine-tuning dataset for the training process:
from datasets import Dataset
# Open fine-tuning dataset
with open(fine_tuning_data_file_path, "r") as f:
data_list = json.load(f)
# Convert the list of data items into a Hugging Face Dataset object
raw_fine_tuning_dataset = Dataset.from_list(data_list)
print(f"Converted JSON data to Hugging Face Dataset. Num examples: {len(raw_fine_tuning_dataset)}")
def format_fine_tuning_entry(data_item):
system_message = "You are an expert copywriter. Generate a concise and appealing product description based on the provided details."
# ADJUST THE FOLLOWING LINES to your fine-tuning file
item_title = data_item.get("title")
item_brand = data_item.get("brand")
item_category = data_item.get("categories")
item_name = data_item.get("name")
item_features_list = data_item.get("features")
item_features_str = ", ".join(item_features_list) if isinstance(item_features_list, list) else str(item_features_list)
target_description = data_item.get("description")
    # Training prompt
    user_prompt = (
        f"Generate a product description for the following item:\n"
        f"Title: {item_title}\nBrand: {item_brand}\nCategory: {item_category}\n"
        f"Name: {item_name}\nFeatures: {item_features_str}\nDescription:"
    )

    # Llama chat format
    formatted_string = (
        f"<|start_header_id|>system<|end_header_id|>\n\n{system_message}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_prompt}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n\n{target_description}<|eot_id|>"
    )
return {"text": formatted_string}
# Apply the formatting function to each entry in the raw dataset to structure it for fine-tuning
text_formatted_dataset = raw_fine_tuning_dataset.map(format_fine_tuning_entry)
# Tokenizer Setup
tokenizer = AutoTokenizer.from_pretrained(base_model_name, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
# Pre-tokenize the dataset
def tokenize_function_for_sft(examples):
# Tokenize the "text" field which contains the full chat-formatted string
tokenized_output = tokenizer(
examples["text"],
truncation=True,
padding="max_length",
max_length=max_seq_length_for_tokenization,
)
return tokenized_output
# Apply the tokenization function to the formatted dataset
tokenized_train_dataset = text_formatted_dataset.map(
tokenize_function_for_sft,
batched=True,
remove_columns=["text"]
)
This cell of the notebook:
- Opens the fine-tuning dataset and converts it into a Hugging Face Dataset object using the Dataset.from_list() method.
- Defines a format_fine_tuning_entry() function. Its purpose is to take a single data item (a product’s details) and transform it into a structured text format suitable for instruction fine-tuning a chat model like Llama. Note that this must be tailored to the structure of your fine-tuning dataset.
- Tokenizes the dataset and applies the tokenization with the map() method. This is done because language models do not understand raw text; they operate on numerical representations called tokens.
When the cell finishes running, the expected result is as follows:
Note that the value of “Num examples” depends on your fine-tuning dataset.
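Before moving on, you can optionally spot-check one formatted example and its tokenized length to confirm the chat template looks right and fits within max_seq_length_for_tokenization. A minimal sketch:

# Optional: inspect the first chat-formatted example and its tokenized length
print(text_formatted_dataset[0]["text"][:500])
print("Tokens in first example:", len(tokenized_train_dataset[0]["input_ids"]))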
Incredible! Your fine-tuning dataset is ready for the fine-tuning process.
Step #5: Configure the Environment and Parameters for Parameter-Efficient Fine-Tuning (PEFT)
In a new cell of your notebook, write the following code for setting the environment and parameters for PEFT:
from transformers import BitsAndBytesConfig
from peft import LoraConfig
# QLoRA configuration
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16,
bnb_4bit_use_double_quant=True,
)
# LoRA configuration
lora_config = LoraConfig(
r=16,
lora_alpha=32,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
)
This code:
- Defines the QLoRA quantization configuration with BitsAndBytesConfig(), which specifies how the pre-trained language model should be quantized when loaded. Quantization is a technique to reduce computational and memory costs.
- Defines the LoRA configuration with LoraConfig() to set up the model for parameter-efficient fine-tuning.
Very well! The environment is ready for efficient fine-tuning.
Step #6: Initialize the Training Process
In a new cell, write the following code to initialize the training process:
from peft import get_peft_model, prepare_model_for_kbit_training
from transformers import TrainingArguments
# Prepare model for k-bit training
model = prepare_model_for_kbit_training(
model,
gradient_checkpointing_kwargs={"use_reentrant": False}
)
# Apply the PEFT (LoRA) configuration to the model.
model = get_peft_model(model, lora_config)
# Disable caching in the model's configuration.
model.config.use_cache = False
# Print the number of trainable parameters in the model.
model.print_trainable_parameters()
# Define Training Arguments
training_args = TrainingArguments(
output_dir=output_model_dir,
num_train_epochs=3,
per_device_train_batch_size=1,
gradient_accumulation_steps=4,
learning_rate=2e-4,
logging_steps=25,
save_steps=50,
fp16=True,
optim="paged_adamw_8bit",
lr_scheduler_type="cosine",
warmup_ratio=0.03,
report_to="none",
max_grad_norm=0.3,
save_total_limit=2,
)
# Initialize SFTTrainer
trainer = SFTTrainer(
model=model,
args=training_args,
train_dataset=tokenized_train_dataset,
peft_config=lora_config,
)
The code in this cell:
- Readies the pre-loaded model for training with quantization via the prepare_model_for_kbit_training() method.
- Applies the lora_config to the quantized, prepared base model with the get_peft_model() method.
- Defines the training arguments by calling the TrainingArguments() class.
- Initializes the trainer with SFTTrainer().
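One detail worth spelling out about these training arguments: with per_device_train_batch_size=1, gradient_accumulation_steps=4, and the 3 GPUs selected earlier, the effective batch size per optimizer update is 1 x 4 x 3 = 12. A tiny sketch of the arithmetic:

# Effective batch size per optimizer update (assuming the 3-GPU pod configured earlier)
per_device_batch_size = 1
gradient_accumulation_steps = 4
num_gpus = 3

effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_gpus
print(f"Effective batch size: {effective_batch_size}")  # 12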
Below is the expected result:
Step #7: Train the Model
The process is finally ready to train the Llama 4 model using the train() method:
# Train the model
trainer.train()
# Save Fine-tuned Model
trainer.save_model(final_model_adapter_path) # Saves the LoRA adapter
tokenizer.save_pretrained(final_model_adapter_path) # Save tokenizer with the adapter
The result is as follows:
Note that you may obtain different numbers due to the stochastic nature of AI.
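If you want to review how the loss evolved during training, the trainer keeps the metrics logged every logging_steps steps in its state. A minimal sketch:

# Optional: print the training loss values recorded during fine-tuning
for entry in trainer.state.log_history:
    if "loss" in entry:
        print(f"step {entry['step']}: loss {entry['loss']:.4f}")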
Step #8: Prepare the Model for Inference
To prepare the model for inference, write the following code in a new cell:
from peft import PeftModel  # Needed to load the fine-tuned LoRA adapter

# Load the model with quantization for inference
base_model_for_inference = AutoModelForCausalLM.from_pretrained(
base_model_name,
quantization_config=bnb_config,
device_map="auto",
trust_remote_code=True
)
# Load the fine-tuned LoRA adapter and attach it to the model
fine_tuned_model_for_testing = PeftModel.from_pretrained(
base_model_for_inference,
final_model_adapter_path
)
# Merge LoRA adapter into the base model
fine_tuned_model_for_testing = fine_tuned_model_for_testing.merge_and_unload()
# Load the tokenizer
fine_tuned_tokenizer_for_testing = AutoTokenizer.from_pretrained(
final_model_adapter_path,
trust_remote_code=True
)
# Configure the tokenizer for inference
fine_tuned_tokenizer_for_testing.pad_token = fine_tuned_tokenizer_for_testing.eos_token
fine_tuned_tokenizer_for_testing.padding_side = "left"
# Set the fine-tuned model to evaluation mode
fine_tuned_model_for_testing.eval()
The code in this cell:
- Loads the base model with the from_pretrained() method for inference.
- Loads the fine-tuned LoRA adapter, applies it to the base model, and merges it for inference.
- Loads the fine-tuned tokenizer and configures it for inference.
- Sets the model to evaluation mode with the eval() method. This disables training-specific behaviors, ensuring consistent and deterministic outputs during inference.
Here we go! Everything is set up for inference.
Step #9: Run Inference on the Model
In this last step, you will perform inference. Previously, you trained Llama 4 on products scraped from Amazon. Now, given data that includes the name and features of office-like items, you want to see whether the model can generate a good description for each of them.
The following code allows you to manage the inference process:
# Define a list of synthetic product data items for testing the fine-tuned model
synthetic_test_items = [
{
"title": "Executive Ergonomic Office Chair", "brand": "ComfortLuxe", "category": "Office Chairs", "name": "ErgoPro-EL100",
"features": ["High-back design", "Adjustable lumbar support", "Breathable mesh fabric", "Synchronized tilt mechanism", "Padded armrests", "Heavy-duty nylon base"]
},
{
"title": "Adjustable Standing Desk Converter", "brand": "FlexiDesk", "category": "Desks & Workstations", "name": "HeightRise-FD20",
"features": ["Spacious dual-tier surface", "Smooth gas spring lift", "Adjustable height range 6-17 inches", "Supports up to 35 lbs", "Keyboard tray included", "Non-slip rubber feet"]
},
{
"title": "Wireless Keyboard and Mouse Combo", "brand": "TechGear", "category": "Computer Peripherals", "name": "SilentType-KM850",
"features": ["Full-size keyboard with numeric keypad", "Quiet-click keys", "Ergonomic mouse with adjustable DPI", "2.4GHz wireless connectivity", "Long battery life", "Plug-and-play USB receiver"]
},
{
"title": "Desktop Organizer with Drawers", "brand": "NeatOffice", "category": "Desk Accessories", "name": "SpaceSaver-DO3",
"features": ["Multi-compartment design", "Two pull-out drawers", "Durable wooden construction", "Compact footprint", "Ideal for pens, notes, and small supplies"]
},
{
"title": "LED Desk Lamp with USB Charging Port", "brand": "BrightSpark", "category": "Office Lighting", "name": "LumiCharge-LS50",
"features": ["Adjustable brightness levels (5)", "Color temperature modes (3)", "Flexible gooseneck design", "Built-in USB charging port", "Eye-caring, flicker-free light", "Energy-efficient LED"]
},
]
# System message and prompt structure for inference
system_message_inference = "You are an expert copywriter. Generate a concise and appealing product description based on the provided details."
print("n--- Generating Descriptions with Fine-Tuned Model using Synthetic Test Data ---")
# Iterate through each item in the synthetic_test_items list
for item_data in synthetic_test_items:
    # Construct the user prompt part based on the synthetic item's structure
    user_prompt_inference = (
        f"Generate a product description for the following office item:\n"
        f"Title: {item_data['title']}\n"
        f"Brand: {item_data['brand']}\n"
        f"Category: {item_data['category']}\n"
        f"Name: {item_data['name']}\n"
        f"Features: {', '.join(item_data['features'])}\n"
        f"Description:"  # The model will generate text after this.
    )

    full_prompt_for_inference = (
        f"<|start_header_id|>system<|end_header_id|>\n\n{system_message_inference}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user_prompt_inference}<|eot_id|>"
        f"<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

    print(f"\nPROMPT for item: {item_data['name']}")
# Tokenize the full prompt string using the fine-tuned tokenizer.
inputs = fine_tuned_tokenizer_for_testing(
full_prompt_for_inference,
return_tensors="pt",
padding=False,
truncation=True,
max_length=max_seq_length_for_tokenization - 150
).to(fine_tuned_model_for_testing.device)
# Perform inference
with torch.no_grad():
outputs = fine_tuned_model_for_testing.generate(
**inputs,
max_new_tokens=150,
num_return_sequences=1,
do_sample=True,
temperature=0.6,
top_k=50,
top_p=0.9,
pad_token_id=fine_tuned_tokenizer_for_testing.eos_token_id,
eos_token_id=[
fine_tuned_tokenizer_for_testing.eos_token_id,
fine_tuned_tokenizer_for_testing.convert_tokens_to_ids("<|eot_id|>")
]
)
# Decode the generated token IDs back into a human-readable text string
generated_text_full = fine_tuned_tokenizer_for_testing.decode(outputs[0], skip_special_tokens=False)
    # Define the marker that indicates the beginning of the assistant's response in the Llama chat format.
    assistant_marker = "<|start_header_id|>assistant<|end_header_id|>\n\n"
# Find the last occurrence of the assistant marker in the generated text
assistant_response_start_index = generated_text_full.rfind(assistant_marker)
# Extract the actual generated description from the full model output
if assistant_response_start_index != -1:
# If the assistant marker is found, extract the text that comes after it
generated_description = generated_text_full[assistant_response_start_index + len(assistant_marker):]
# Define the end-of-turn token for Llama
eot_token = "<|eot_id|>"
# Check if the extracted description ends with the Llama end-of-turn token and remove it.
if generated_description.endswith(eot_token):
generated_description = generated_description[:-len(eot_token)]
# Also check if it ends with the tokenizer's standard end-of-sequence token and remove it.
if generated_description.endswith(fine_tuned_tokenizer_for_testing.eos_token):
generated_description = generated_description[:-len(fine_tuned_tokenizer_for_testing.eos_token)]
# Remove any leading or trailing whitespace from the cleaned description
generated_description = generated_description.strip()
else:
# Fallback: If the assistant marker is not found, try to extract the generated part by assuming it's everything after the original input prompt.
input_prompt_decoded_len = len(fine_tuned_tokenizer_for_testing.decode(inputs["input_ids"][0], skip_special_tokens=False))
# Decode the input prompt tokens to get its length as a string.
generated_description = generated_text_full[input_prompt_decoded_len:].strip()
# Clean up any trailing Llama end-of-turn token from this fallback extraction.
if generated_description.endswith("<|eot_id|>"):
generated_description = generated_description[:-len("<|eot_id|>")]
generated_description = generated_description.strip()
    # Print the extracted and cleaned generated description
    print(f"GENERATED (Fine-tuned):\n{generated_description}")
# Print a separator line for better readability between items.
print("-" * 50)
This last Jupyter Notebook cell manages the inference process, which lets you see how well the model learned during fine-tuning.
In particular, the above code:
- Defines testing data as a list called synthetic_test_items. Each element in this list is a dictionary representing a product, containing details like its title, brand, category, name, and a list of features. This data serves as the input for the model, and its structure must match that of the fine-tuning dataset.
- Sets up the inference prompt structure with system_message_inference. This must match the prompt used during the training process.
- The for item_data in synthetic_test_items loop creates a user prompt for each item_data. The structure of each item_data must match the one used in the training process.
- Tokenizes the prompt and controls how the model produces the output text. The actual inference happens inside the with torch.no_grad() block, where the generate() method performs the core inference step.
- Decodes the raw output from the model (a sequence of token IDs) back into a human-readable string (generated_text_full) using the tokenizer.
- Uses an if-else block to clean up the raw output from the language model and extract only the assistant’s generated product description. The raw output (generated_text_full) typically includes the entire input prompt followed by the model’s response, all formatted with Llama’s special chat tokens.
- Prints the results.
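As a side note, an alternative to searching for the assistant marker is to decode only the tokens generated after the prompt, which sidesteps most of the string cleanup. A brief sketch of that approach (to be placed inside the loop, right after generate()):

# Alternative extraction: decode only the tokens generated after the input prompt
prompt_length = inputs["input_ids"].shape[1]
generated_only = fine_tuned_tokenizer_for_testing.decode(
    outputs[0][prompt_length:],
    skip_special_tokens=True,  # drops <|eot_id|> and other special tokens automatically
)
print(generated_only.strip())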
You can expect the result as follows:
--- Generating Descriptions with Fine-Tuned Model using Synthetic Test Data ---
PROMPT for item: ErgoPro-EL100
GENERATED (Fine-tuned):
**Introducing the ErgoPro-EL100: The Ultimate Executive Ergonomic Office Chair**
Experience the pinnacle of comfort and support with the ComfortLuxe ErgoPro-EL100, designed to elevate your work experience. This premium office chair boasts a high-back design that cradles your upper body, providing unparalleled lumbar support and promoting a healthy posture.
The breathable mesh fabric ensures a cool and comfortable seating experience, while the synchronized tilt mechanism allows for seamless adjustments to your preferred working position. The padded armrests provide additional support and comfort, reducing strain on your shoulders and wrists.
Built to last, the ErgoPro-EL100 features a heavy-duty nylon base that ensures stability and durability. Whether you're working long hours or simply
--------------------------------------------------
PROMPT for item: HeightRise-FD20
GENERATED (Fine-tuned):
**Elevate Your Productivity with FlexiDesk's HeightRise-FD20 Adjustable Standing Desk Converter**
Take your work to new heights with FlexiDesk's HeightRise-FD20, the ultimate adjustable standing desk converter. Designed to revolutionize your workspace, this innovative converter transforms any desk into a comfortable and ergonomic standing station.
**Experience the Benefits of Standing**
The HeightRise-FD20 features a spacious dual-tier surface, perfect for holding your laptop, monitor, and other essential work tools. The smooth gas spring lift allows for effortless height adjustments, ranging from 6 to 17 inches, ensuring a comfortable standing position that suits your needs.
**Durable and Reliable**
With a sturdy construction and non-slip rubber feet
--------------------------------------------------
Et voilà! You have fine-tuned Llama 4 with a fresh dataset retrieved using the Bright Data Scraper APIs.
Conclusion
In this article, you learned how to fine-tune Llama 4 with a dataset scraped from Amazon using Bright Data's Scraper APIs. You went through the whole process, which consists of:
- Retrieving the data from the web.
- Setting up a Hugging Face account with a token.
- Setting up the necessary cloud infrastructure.
- Training and testing (inferencing) Llama 4.
The core of the fine-tuning process relies on having high-quality datasets. Luckily, Bright Data has you covered with numerous AI-ready services for dataset acquisition or creation:
- Scraping Browser: A Playwright-, Selenium-, and Puppeteer-compatible browser with built-in unlocking capabilities.
- Web Scraper APIs: Pre-configured APIs for extracting structured data from 100+ major domains.
- Web Unlocker: An all-in-one API that handles site unlocking on sites with anti-bot protections.
- SERP API: A specialized API that unlocks search engine results and extracts complete SERP data.
- Foundation models: Access compliant, web-scale datasets to power pre-training, evaluation, and fine-tuning.
- Data providers: Connect with trusted providers to source high-quality, AI-ready datasets at scale.
- Data packages: Get curated, ready-to-use datasets—structured, enriched, and annotated.
Create a Bright Data account for free to test our AI-ready data infrastructure!