Fine-tune Llama 3.2 to generate Markdown-friendly Python functions#
In this notebook, we are going to fine-tune a Llama 3.2 1B model using QLoRA and the Google Mostly Basic Python Problems (MBPP) dataset.
🛠️ Supported Hardware#
This notebook can run on a CPU or a GPU.
✅ AMD Instinct™ Accelerators
✅ AMD Radeon™ RX/PRO Graphics Cards
Suggested hardware: AMD Instinct™ Accelerators. This notebook may not run on a CPU if your system does not have enough memory.
⚡ Recommended Software Environment#
🎯 Goals#
Specialize a model using fine-tuning
Quantize the model using bitsandbytes
Define QLoRA parameters
Fine-tune using SFTTrainer
See also
This notebook is partially based on the FluidNumerics webinar.
bitsandbytes is a Python wrapper library that offers fast and efficient 8-bit and 4-bit quantization of machine learning models.
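To build intuition for what quantization does, here is a minimal, illustrative absmax sketch in plain Python. This is not the actual bitsandbytes implementation (which uses blockwise quantization and optimized kernels); it only shows the core idea: scale each value by 127 / max(|x|) and round to the int8 range, so the tensor can be stored in one byte per element plus a single scale factor.

```python
def absmax_quantize(values):
    """Illustrative absmax 8-bit quantization: scale by 127/max|x|, round to int8 range."""
    scale = 127.0 / max(abs(v) for v in values)
    quantized = [round(v * scale) for v in values]       # int8 codes
    dequantized = [q / scale for q in quantized]         # approximate reconstruction
    return quantized, dequantized

q, dq = absmax_quantize([0.5, -1.0, 2.0, -4.0])
print(q)   # [16, -32, 64, -127]
print(dq)  # values close to the originals, with small rounding error
```

Dequantizing recovers the original values only approximately; the rounding error is the price paid for the 4x smaller storage.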
Get the Model and Tokenizer#
Import some of the necessary packages
import torch
from numpy import argmax
from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM, pipeline, TrainingArguments
from peft import LoraConfig, get_peft_model
import evaluate
from trl import SFTConfig, SFTTrainer
Select the GPU if available. Note that a consumer GPU may not be able to fine-tune this model if it does not have enough VRAM.
Note
Using a GPU with large memory is recommended.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Device: {device}")
if device.type == "cuda":
print(f'Device name: {torch.cuda.get_device_name(0)}')
print(f'GPU available memory: {torch.cuda.mem_get_info()[0]/1024/1024//1024} GB')
Device: cuda
Device name: AMD Instinct MI210
GPU available memory: 63.0 GB
Define the model id from Hugging Face: the Llama 3.2 1-billion-parameter model. Get the tokenizer, set the padding token to the EOS token, and set padding_side to right.
model_id = 'unsloth/Llama-3.2-1B'
my_tokenizer = AutoTokenizer.from_pretrained(model_id)
my_tokenizer.pad_token = my_tokenizer.eos_token
my_tokenizer.padding_side = 'right'
We will use bitsandbytes to quantize the model. First, we define the BitsAndBytesConfig: 4-bit quantization with the fp4 data type, nested (double) quantization, and float16 as the compute dtype.
fp4_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="fp4",
bnb_4bit_use_double_quant=True,
bnb_4bit_compute_dtype=torch.float16,
)
Then we use transformers.LlamaForCausalLM.from_pretrained to load the model from Hugging Face and apply the fp4_config configuration. We also set the device we selected earlier.
quantized_model = LlamaForCausalLM.from_pretrained(
model_id,
quantization_config=fp4_config,
device_map=device,
)
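As a back-of-the-envelope sanity check (a rough sketch, not an exact measurement: real memory use adds quantization constants, non-quantized layers such as embeddings, and runtime overhead), a 1-billion-parameter model at 4 bits per weight needs about 0.5 GiB of weight storage, versus roughly 1.9 GiB in float16:

```python
params = 1_000_000_000  # ~1B parameters

def model_gigabytes(num_params, bits_per_param):
    """Approximate weight-storage size in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

print(f"fp16: {model_gigabytes(params, 16):.2f} GiB")  # ~1.86 GiB
print(f"fp4:  {model_gigabytes(params, 4):.2f} GiB")   # ~0.47 GiB
```

This is why the quantized model fits comfortably even on GPUs with modest VRAM.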
Sample Prompt#
Now, we will evaluate the model with a sample prompt. We define a transformers.pipeline for text-generation using the quantized model.
sample_prompt = (
r"write a python function to find duplicate numbers in a list"
)
quantized_pipeline = pipeline(
"text-generation",
model=quantized_model,
tokenizer=my_tokenizer,
torch_dtype=torch.float16,
device_map=device,
)
Device set to use cuda:0
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/transformers/integrations/sdpa_attention.py:54: UserWarning: Using AOTriton backend for Flash Attention forward... (Triggered internally at /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.h:267.)
attn_output = torch.nn.functional.scaled_dot_product_attention(
Result:
write a python function to find duplicate numbers in a list of integer values
import from collections
duplicate_numbers_list = [1,0,3,0,2,3,6]
print(dup_number = [i for a if i!= a[i]] for i in enumerate(a.values()) if i == 1)
duplicate_numbers_list = [i for i in a.values()
for i in enumerate(a)]
print(dup number of numbers = duplicate_numbers)
print([i for a if i for i in enumerate(a)])
```
```
[1,0,3,0,2,3,0]
[0]
duplicate_number of number s = [i for i in enumerate(a)]
```
Then we can invoke the model to generate an answer to our prompt. We will also print the generated sequences.
Tip
Explore different values of top_k and temperature and run the prompt twice. What happens if you increase the temperature?
sequences = quantized_pipeline(
text_inputs=sample_prompt,
do_sample=True,
top_k=10,
num_return_sequences=1,
eos_token_id=my_tokenizer.eos_token_id,
max_new_tokens=512,
temperature=0.2,
)
for seq in sequences:
print(f"\nResult:\n{seq['generated_text']}")
Define fine-tune parameters#
Now, to fine-tune the model we will use the Low-Rank Adaptation (LoRA) technique. Instead of modifying the base model's weights, a small number of extra trainable parameters (low-rank matrices) are added and updated during the fine-tuning process. For more information, check here.
We can define the LoRA configuration with peft.LoraConfig:
r: rank of the adaptation matrices
lora_alpha: how strongly the adaptation layer affects the base model, see 4.1
lora_dropout: optional dropout layer
bias: whether or not to train bias parameters
task_type: task type, see TaskType
target_modules: which modules to apply adapter layers to
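To see why LoRA is cheap, compare parameter counts: instead of updating a full d_out × d_in weight matrix, LoRA trains two small matrices B (d_out × r) and A (r × d_in) whose product forms the update ΔW = B·A, scaled by lora_alpha / r. A quick illustrative count, assuming a 2048 × 2048 projection like the ones targeted below (2048 is the hidden size of Llama 3.2 1B):

```python
d_out, d_in, r = 2048, 2048, 16

full_update_params = d_out * d_in    # updating the projection matrix directly
lora_params = d_out * r + r * d_in   # B (d_out x r) plus A (r x d_in)

print(full_update_params)  # 4194304
print(lora_params)         # 65536
print(f"reduction: {full_update_params // lora_params}x")  # 64x fewer trainable parameters
```

With r=16, each adapted projection trains 64x fewer parameters than full fine-tuning of that layer would.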
lora_config = LoraConfig(
r=16,
lora_alpha=16,
lora_dropout=0.05,
bias="none",
task_type="CAUSAL_LM",
target_modules=[
"up_proj",
"down_proj",
"gate_proj",
"k_proj",
"q_proj",
"v_proj",
"o_proj",
],
)
With this configuration, we can define adapted_model, the model we will fine-tune, and our adapted_pipeline.
adapted_model = get_peft_model(quantized_model, lora_config)
adapted_pipeline = pipeline(
"text-generation",
model=adapted_model,
tokenizer=my_tokenizer,
device_map=device,
)
Device set to use cuda:0
Let’s run the sample_prompt on the adapted model.
Tip
Do you note anything different from the original model?
sequences = adapted_pipeline(
text_inputs=sample_prompt,
do_sample=True,
top_k=10,
num_return_sequences=1,
eos_token_id=my_tokenizer.eos_token_id,
max_new_tokens=512,
temperature=0.2
)
for seq in sequences:
print(f"\nResult:\n{seq['generated_text']}")
Result:
write a python function to find duplicate numbers in a list
def find_duplicate_numbers(numbers):
"""Find duplicate numbers in a list.
:param numbers: a list of numbers to search for duplicates.
:returns: a list of numbers that are duplicates.
"""
duplicates = []
for i in range(len(numbers)):
if numbers[i] == numbers[i+1]:
duplicates.append(numbers[i])
return duplicates
Result:
write a python function to find duplicate numbers in a list of integers
You can use the built-in function to find duplicates in Python. The function is named find_dublicates and it is declared inside the Python standard library.
The find_dublicates function takes a list of integers as its argument. It then uses a for loop to iterate over the list and checks if each integer is equal to any of the other integers in the list. If it is, then the function returns a boolean True, which means that there are duplicate numbers in the list. Otherwise, the function returns a boolean False, which means that there are no duplicate numbers in the list.
The following code shows how to use the find_dublicates function to find duplicate numbers in a list of integers:
list_of_ints = [2, 4, 5, 6, 8, 10, 11, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
print("The list of integers is: " + str(list_of_ints))
print("There are no duplicate numbers in the list.")
print("The list of integers is: " + str(list_of_ints))
The output of the code is as follows:
The list of integers
Result:
write a python function to find duplicate numbers in a list of integers
1. Write a Python function to find duplicate numbers in a list of integers. The function should return a list of tuples. The first element of each tuple should be the number of times the number occurs in the list, and the second element should be the number of times the number occurs in the list. The function should also print a message indicating whether the number occurs more than once in the list. For example, if the input list is [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5], the function should return the list [[2, 2, 2], [4, 4, 4], [5, 5, 5]].
2. Write a Python function to find duplicate numbers in a list of integers. The function should return a list of tuples. The first element of each tuple should be the number of times the number occurs in the list, and the second element should be the number of times the number occurs in the list. The function should also print a message indicating whether the number occurs more than once in the list. For example, if the input list is [1, 2,
Result:
write a python function to find duplicate numbers in a list
The function returns True if the given list has at least one duplicate, and False otherwise. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns false if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty
Result:
write a python function to find duplicate numbers in a list
I have a list of numbers that are duplicates, how can I find the duplicate numbers in the list?
The problem is that I need to find the duplicates numbers in the list. I have tried the code below but it is not working.
list = [1,2,3,4,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,
Get Dataset to fine-tune model#
We are going to use the Google Mostly Basic Python Problems (MBPP) dataset. Although large language models are very good at Python, the idea of this example is to fine-tune the model to provide its output in a particular style. It may be possible to get similar results with prompt-engineering techniques; however, the idea of this notebook is to show an example of fine-tuning.
Load the dataset and print it.
Note
By executing the next cell, you will download the google-research-datasets/mbpp dataset; you agree to its license and are responsible for obtaining permission from the dataset owner if needed.
from datasets import load_dataset
google_python = load_dataset("google-research-datasets/mbpp", "sanitized")
print(google_python)
DatasetDict({
train: Dataset({
features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
num_rows: 120
})
test: Dataset({
features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
num_rows: 257
})
validation: Dataset({
features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
num_rows: 43
})
prompt: Dataset({
features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
num_rows: 7
})
})
We are now going to define the output format that we want to fine-tune the model on, using chat templates. The task is to fine-tune the model so that its Python output is Markdown friendly, i.e., code is printed as fenced snippets.
The instructify function receives the qr_row dictionary that contains the prompt, code and test_list. We define the qr_json template with the user and assistant roles: the user role contains the prompt, and the assistant role contains the Python code as a snippet plus the test list. Finally, we apply apply_chat_template to the roles list, store the result under the text key, and return qr_row.
def instructify(qr_row):
qr_json = [
{
"role": "user",
"content": qr_row["prompt"],
},
{
"role": "assistant",
"content": f'''
```python
{qr_row["code"]}
```
Test List:
```python
test_list={qr_row["test_list"]}
```
''',
},
]
qr_row["text"] = my_tokenizer.apply_chat_template(qr_json, tokenize=False)
return qr_row
We will define the chat template. Check Llama-3 prompt formats here. Concatenating query/response is sufficient for our use case.
my_tokenizer.chat_template = """{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = message['content'] | trim + '\n' %}{{ content }}{% endfor %}"""
print(my_tokenizer.chat_template)
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = message['content'] | trim + '
' %}{{ content }}{% endfor %}
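This Jinja template simply concatenates each message's trimmed content followed by a newline, ignoring the roles. A pure-Python equivalent, for illustration only (the tokenizer renders the actual template):

```python
def render_template(messages):
    """Mimic the chat template above: trimmed content of each message, newline-terminated."""
    return "".join(msg["content"].strip() + "\n" for msg in messages)

msgs = [
    {"role": "user", "content": "write a function "},
    {"role": "assistant", "content": "def f(): pass"},
]
print(render_template(msgs))
# write a function
# def f(): pass
```

This shows why concatenating query/response is sufficient here: the roles carry no extra formatting in this template.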
We can now apply the chat template to our dataset.
formatted_dataset = google_python.map(instructify)
Display one example; you can see how the dataset is now formatted to show code snippets (```).
print(formatted_dataset["train"][0]["text"])
Write a python function to find the first repeated character in a given string.
```python
def first_repeated_char(str1):
for index,c in enumerate(str1):
if str1[:index+1].count(c) > 1:
return c
```
Test List:
```python
test_list=['assert first_repeated_char("abcabc") == "a"', 'assert first_repeated_char("abc") == None', 'assert first_repeated_char("123123") == "1"']
```
Display the same content using the IPython.display.Markdown visualization
from IPython.display import display, Markdown
Markdown(formatted_dataset["train"][0]["text"])
Write a python function to find the first repeated character in a given string.
def first_repeated_char(str1):
for index,c in enumerate(str1):
if str1[:index+1].count(c) > 1:
return c
Test List:
test_list=['assert first_repeated_char("abcabc") == "a"', 'assert first_repeated_char("abc") == None', 'assert first_repeated_char("123123") == "1"']
Let’s run this example prompt on the adapted model and observe the output. Although we see a code snippet, the test list is not there.
example_prompt = formatted_dataset["test"][0]["prompt"]
sequences = adapted_pipeline(
text_inputs=example_prompt,
do_sample=True,
top_k=10,
num_return_sequences=1,
eos_token_id=my_tokenizer.eos_token_id,
max_new_tokens=512,
)
for seq in sequences:
print(f"\nResult:\n{seq['generated_text']}")
Result:
Write a python function to remove first and last occurrence of a given character from the string.
The function should return a new string with the given character removed from the string.
For example, if the string is 'hello', the function should return 'hello' and if the string is 'world', the function should return 'w'.
Hint: use the `count` method to count the number of occurrences of a character in the string and then remove the first and last occurrences of the character.
```python
string = 'hello'
char = 'o'
result = string.count(char)
print(f"String '{string}' has character '{char}' {result} times")
```
🚀 Fine-tune the Adapted Model#
We now define the metric that will be used to evaluate the fine-tuned model; we will use accuracy, computed by the compute_metrics function.
metric = evaluate.load("accuracy")
def compute_metrics(eval_pred):
logits, labels = eval_pred
predictions = argmax(logits, axis=-1)
return metric.compute(predictions=predictions, references=labels)
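For intuition, token-level accuracy is just the fraction of positions where the argmax prediction matches the label. A minimal pure-Python sketch of what the metric computes:

```python
def token_accuracy(logits, labels):
    """Fraction of positions where the argmax prediction equals the label."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

logits = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]]  # per-position class scores
labels = [1, 0, 0]
print(token_accuracy(logits, labels))  # 2 of 3 predictions match -> 0.666...
```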
We also need to tokenize the dataset before it can be consumed in the training.
def tokenize_dataset(dataset, tokenizer, text_field):
def tokenize_function(examples):
return tokenizer(examples[text_field], truncation=True, padding=True)
return dataset.map(tokenize_function, batched=True)
tokenized_train_dataset = tokenize_dataset(formatted_dataset["train"], my_tokenizer, "text")
tokenized_eval_dataset = tokenize_dataset(formatted_dataset["test"], my_tokenizer, "text")
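Padding and truncation ensure every example in a batch has the same length. A simplified sketch of what the tokenizer's right-side padding does, assuming a pad token id of 0 (illustrative only; the real tokenizer also produces attention masks and handles special tokens):

```python
def pad_batch(sequences, pad_id=0, max_length=None):
    """Right-pad token-id lists to a common length, truncating longer ones."""
    target = max_length or max(len(s) for s in sequences)
    return [(s + [pad_id] * (target - len(s)))[:target] for s in sequences]

batch = [[5, 6, 7], [8, 9], [1, 2, 3, 4, 5]]
print(pad_batch(batch, max_length=4))
# [[5, 6, 7, 0], [8, 9, 0, 0], [1, 2, 3, 4]]
```

This is why we set padding_side to right earlier: pad tokens go after the real content, so the model's causal attention still sees the full prompt first.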
Let’s define our training configuration with trl.SFTConfig; some of the most relevant arguments are listed below:
per_device_train_batch_size: size of the training batch
per_device_eval_batch_size: size of the evaluation batch
gradient_accumulation_steps: gradient accumulation steps
optim: optimizer type
num_train_epochs: number of training epochs
eval_steps: evaluation steps
logging_steps: how often the model logs progress
warmup_steps: warmup steps
learning_rate: rate of learning
fp16: use fp16 precision
group_by_length: group samples by length
sft_config = SFTConfig(
output_dir="Llama-Python-Single-GPU",
per_device_train_batch_size=8,
per_device_eval_batch_size=8,
gradient_accumulation_steps=1,
optim="paged_adamw_8bit",
num_train_epochs=20,
eval_steps=0.5,
logging_steps=1,
warmup_steps=10,
logging_strategy="steps",
learning_rate=1e-4,
fp16=True,
bf16=False,
group_by_length=True,
max_seq_length=512,
)
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.
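Gradient accumulation multiplies the effective batch size without using more memory: gradients from several forward/backward passes are summed before one optimizer step. A quick arithmetic check with the values above (the sanitized MBPP train split has 120 rows):

```python
per_device_batch = 8
grad_accum_steps = 1
num_gpus = 1

# Effective batch size per optimizer step
effective_batch = per_device_batch * grad_accum_steps * num_gpus
print(effective_batch)  # 8

# Total optimizer steps over 20 epochs of 120 training rows
train_rows, epochs = 120, 20
steps = train_rows * epochs // effective_batch
print(steps)  # 300
```

Raising gradient_accumulation_steps to 4 would give an effective batch of 32 while keeping per-step memory the same.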
With the configuration defined, we can finally create the trl.SFTTrainer that will help us with the fine tuning.
We initialize it with the adapted_model, the tokenized train and eval datasets, the SFTConfig, and the lora_config.
trainer = SFTTrainer(
model=adapted_model,
train_dataset=tokenized_train_dataset,
eval_dataset=tokenized_eval_dataset,
args=sft_config,
peft_config=lora_config,
)
Finally, we can call the .train() method to start fine tuning the model.
trainer.train()
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:823: UserWarning: Using AOTriton backend for Flash Attention backward... (Triggered internally at /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.h:452.)
return Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
| Step | Training Loss |
|---|---|
| 1 | 3.569500 |
| 2 | 2.826900 |
| 3 | 3.377800 |
| 4 | 3.833900 |
| 5 | 3.580000 |
| 6 | 3.138300 |
| 7 | 1.186300 |
| 8 | 0.832100 |
| 9 | 0.877700 |
| 10 | 0.707000 |
| 11 | 0.662100 |
| 12 | 0.608500 |
| 13 | 0.583300 |
| 14 | 0.515600 |
| 15 | 0.615300 |
| 16 | 0.524200 |
| 17 | 0.563000 |
| 18 | 0.554100 |
| 19 | 0.597600 |
| 20 | 0.570800 |
| 21 | 0.520200 |
| 22 | 0.508100 |
| 23 | 0.622500 |
| 24 | 0.571600 |
| 25 | 0.454400 |
| 26 | 0.446500 |
| 27 | 0.461600 |
| 28 | 0.452200 |
| 29 | 0.439300 |
| 30 | 0.437300 |
| 31 | 0.387000 |
| 32 | 0.400400 |
| 33 | 0.391100 |
| 34 | 0.505800 |
| 35 | 0.451600 |
| 36 | 0.493600 |
| 37 | 0.464200 |
| 38 | 0.338600 |
| 39 | 0.363300 |
| 40 | 0.393800 |
| 41 | 0.409900 |
| 42 | 0.428500 |
| 43 | 0.409100 |
| 44 | 0.361000 |
| 45 | 0.439500 |
| 46 | 0.393300 |
| 47 | 0.455900 |
| 48 | 0.430300 |
| 49 | 0.385500 |
| 50 | 0.349300 |
| 51 | 0.372000 |
| 52 | 0.376100 |
| 53 | 0.402300 |
| 54 | 0.296200 |
| 55 | 0.478900 |
| 56 | 0.296300 |
| 57 | 0.352800 |
| 58 | 0.437200 |
| 59 | 0.365000 |
| 60 | 0.266100 |
| 61 | 0.316700 |
| 62 | 0.371000 |
| 63 | 0.331200 |
| 64 | 0.280900 |
| 65 | 0.326300 |
| 66 | 0.357000 |
| 67 | 0.444300 |
| 68 | 0.347000 |
| 69 | 0.349800 |
| 70 | 0.369600 |
| 71 | 0.403700 |
| 72 | 0.334000 |
| 73 | 0.330900 |
| 74 | 0.334200 |
| 75 | 0.306300 |
| 76 | 0.231900 |
| 77 | 0.433400 |
| 78 | 0.337900 |
| 79 | 0.298800 |
| 80 | 0.318100 |
| 81 | 0.397400 |
| 82 | 0.266800 |
| 83 | 0.384300 |
| 84 | 0.292300 |
| 85 | 0.311600 |
| 86 | 0.360800 |
| 87 | 0.278800 |
| 88 | 0.288300 |
| 89 | 0.276300 |
| 90 | 0.293100 |
| 91 | 0.254400 |
| 92 | 0.315400 |
| 93 | 0.283700 |
| 94 | 0.349500 |
| 95 | 0.294100 |
| 96 | 0.362800 |
| 97 | 0.232200 |
| 98 | 0.255300 |
| 99 | 0.265800 |
| 100 | 0.220800 |
| 101 | 0.320300 |
| 102 | 0.263700 |
| 103 | 0.269200 |
| 104 | 0.325000 |
| 105 | 0.263800 |
| 106 | 0.248200 |
| 107 | 0.240100 |
| 108 | 0.271100 |
| 109 | 0.268400 |
| 110 | 0.248100 |
| 111 | 0.236700 |
| 112 | 0.228900 |
| 113 | 0.277300 |
| 114 | 0.251400 |
| 115 | 0.209000 |
| 116 | 0.243500 |
| 117 | 0.314900 |
| 118 | 0.222000 |
| 119 | 0.254200 |
| 120 | 0.247900 |
| 121 | 0.183900 |
| 122 | 0.260100 |
| 123 | 0.199800 |
| 124 | 0.209500 |
| 125 | 0.231200 |
| 126 | 0.199900 |
| 127 | 0.264000 |
| 128 | 0.194800 |
| 129 | 0.235700 |
| 130 | 0.272500 |
| 131 | 0.153900 |
| 132 | 0.166400 |
| 133 | 0.210300 |
| 134 | 0.226100 |
| 135 | 0.203000 |
| 136 | 0.209000 |
| 137 | 0.202800 |
| 138 | 0.140800 |
| 139 | 0.239200 |
| 140 | 0.159800 |
| 141 | 0.153900 |
| 142 | 0.143900 |
| 143 | 0.194200 |
| 144 | 0.151900 |
| 145 | 0.128100 |
| 146 | 0.144700 |
| 147 | 0.160100 |
| 148 | 0.204300 |
| 149 | 0.250600 |
| 150 | 0.199300 |
| 151 | 0.151200 |
| 152 | 0.139200 |
| 153 | 0.115300 |
| 154 | 0.127300 |
| 155 | 0.178400 |
| 156 | 0.136900 |
| 157 | 0.161900 |
| 158 | 0.141400 |
| 159 | 0.179700 |
| 160 | 0.141700 |
| 161 | 0.126400 |
| 162 | 0.154400 |
| 163 | 0.123900 |
| 164 | 0.137000 |
| 165 | 0.179800 |
| 166 | 0.134100 |
| 167 | 0.108700 |
| 168 | 0.115800 |
| 169 | 0.121900 |
| 170 | 0.147200 |
| 171 | 0.139700 |
| 172 | 0.092700 |
| 173 | 0.117300 |
| 174 | 0.089800 |
| 175 | 0.134700 |
| 176 | 0.098500 |
| 177 | 0.124000 |
| 178 | 0.090500 |
| 179 | 0.121300 |
| 180 | 0.105100 |
| 181 | 0.077800 |
| 182 | 0.064200 |
| 183 | 0.129400 |
| 184 | 0.080300 |
| 185 | 0.078100 |
| 186 | 0.068900 |
| 187 | 0.107600 |
| 188 | 0.088600 |
| 189 | 0.082100 |
| 190 | 0.118300 |
| 191 | 0.066500 |
| 192 | 0.103400 |
| 193 | 0.082200 |
| 194 | 0.156200 |
| 195 | 0.082900 |
| 196 | 0.053700 |
| 197 | 0.052300 |
| 198 | 0.060400 |
| 199 | 0.065800 |
| 200 | 0.087200 |
| 201 | 0.091500 |
| 202 | 0.056800 |
| 203 | 0.093500 |
| 204 | 0.088000 |
| 205 | 0.077500 |
| 206 | 0.052200 |
| 207 | 0.073000 |
| 208 | 0.084000 |
| 209 | 0.086200 |
| 210 | 0.076300 |
| 211 | 0.061200 |
| 212 | 0.041400 |
| 213 | 0.057200 |
| 214 | 0.065400 |
| 215 | 0.041500 |
| 216 | 0.042200 |
| 217 | 0.065700 |
| 218 | 0.056100 |
| 219 | 0.046400 |
| 220 | 0.056600 |
| 221 | 0.062000 |
| 222 | 0.077000 |
| 223 | 0.082900 |
| 224 | 0.046800 |
| 225 | 0.076400 |
| 226 | 0.039500 |
| 227 | 0.037800 |
| 228 | 0.045500 |
| 229 | 0.077600 |
| 230 | 0.073400 |
| 231 | 0.054100 |
| 232 | 0.044400 |
| 233 | 0.058500 |
| 234 | 0.053800 |
| 235 | 0.039500 |
| 236 | 0.028500 |
| 237 | 0.058700 |
| 238 | 0.037500 |
| 239 | 0.043700 |
| 240 | 0.036500 |
| 241 | 0.050600 |
| 242 | 0.041500 |
| 243 | 0.023900 |
| 244 | 0.039600 |
| 245 | 0.071200 |
| 246 | 0.029500 |
| 247 | 0.026500 |
| 248 | 0.032400 |
| 249 | 0.047700 |
| 250 | 0.043100 |
| 251 | 0.047700 |
| 252 | 0.026300 |
| 253 | 0.051000 |
| 254 | 0.055200 |
| 255 | 0.040100 |
| 256 | 0.029200 |
| 257 | 0.028000 |
| 258 | 0.032400 |
| 259 | 0.056400 |
| 260 | 0.030100 |
| 261 | 0.055000 |
| 262 | 0.033100 |
| 263 | 0.020600 |
| 264 | 0.029700 |
| 265 | 0.067200 |
| 266 | 0.041800 |
| 267 | 0.034200 |
| 268 | 0.039600 |
| 269 | 0.045400 |
| 270 | 0.024600 |
| 271 | 0.029900 |
| 272 | 0.030900 |
| 273 | 0.028000 |
| 274 | 0.031800 |
| 275 | 0.027400 |
| 276 | 0.021900 |
| 277 | 0.064100 |
| 278 | 0.034900 |
| 279 | 0.059000 |
| 280 | 0.026200 |
| 281 | 0.033600 |
| 282 | 0.036200 |
| 283 | 0.038700 |
| 284 | 0.029800 |
| 285 | 0.031200 |
| 286 | 0.038100 |
| 287 | 0.024300 |
| 288 | 0.023000 |
| 289 | 0.029200 |
| 290 | 0.047100 |
| 291 | 0.025700 |
| 292 | 0.044700 |
| 293 | 0.027000 |
| 294 | 0.035400 |
| 295 | 0.040800 |
| 296 | 0.029500 |
| 297 | 0.035400 |
| 298 | 0.028200 |
| 299 | 0.039600 |
| 300 | 0.028800 |
TrainOutput(global_step=300, training_loss=0.2689305164354543, metrics={'train_runtime': 192.0013, 'train_samples_per_second': 12.5, 'train_steps_per_second': 1.562, 'total_flos': 6322328115609600.0, 'train_loss': 0.2689305164354543})
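A quick consistency check on the reported metrics, using only arithmetic on the numbers above:

```python
train_rows, epochs = 120, 20
samples = train_rows * epochs       # 2400 samples seen over 20 epochs
runtime_s = 192.0013                # train_runtime reported above

print(round(samples / runtime_s, 1))  # 12.5 samples/s, matching train_samples_per_second
print(samples // 8)                   # 300 optimizer steps at batch size 8, matching global_step
print(round(300 / runtime_s, 3))      # 1.562 steps/s, matching train_steps_per_second
```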
You can decide to save the model.
save_model = False
if save_model:
trainer.save_model()
Evaluate Fine-tuned Model#
After fine-tuning, we can evaluate whether we achieved the desired outcome. Let us define a different prompt and invoke the fine-tuned model.
example_prompt = r"write a python function that returns the least common denominator of all elements in a list."
sequences = adapted_pipeline(
text_inputs=example_prompt,
do_sample=True,
top_k=10,
num_return_sequences=1,
eos_token_id=my_tokenizer.eos_token_id,
max_new_tokens=512,
temperature=0.2
)
Display the generated text using the Markdown display.
Markdown(sequences[0]["generated_text"])
write a python function that returns the least common denominator of all elements in a list. https://www.geeksforgeeks.org/least-common-denominator/
def lcm_of_elements(arr):
(left, right) = (arr[0], arr[-1])
for m in (le, rt):
if (m == left or m == m * right / m):
return m
else:
return m
Test List:
test_list=['assert lcm_of_elements([2,2,1])->1', 'assert lcm_of_elements([1,5,7,1])->5', 'assert lcm_of_elements([12,45,67,12])->45']
Summary#
In this notebook you quantized a Llama model, then added LoRA to adapt the model to be able to train on a custom dataset. You also defined chat templates that guided the fine-tuning process.
Now, you may be wondering how much bigger the adapted model is. Let’s have a look.
from torchinfo import summary
model_quant = summary(quantized_model, input_size=(1, 512), dtypes=[torch.long], col_names=["input_size", "output_size", "num_params", "mult_adds", "trainable"])  # token-id input of shape (batch, seq_len)
model_quant
adapt_model_quant = summary(adapted_model, input_size=(1, 512), dtypes=[torch.long], col_names=["input_size", "output_size", "num_params", "mult_adds", "trainable"])  # token-id input of shape (batch, seq_len)
adapt_model_quant
Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved. Portions of this file consist of AI-generated content.
SPDX-License-Identifier: MIT