Fine-tune Llama 3.2 to generate Markdown-friendly Python functions#

In this notebook, we are going to fine-tune a Llama 3.2 1B model using QLoRA and the Google Mostly Basic Python Problems (MBPP) dataset.

🛠️ Supported Hardware#

This notebook can run on a CPU or on a GPU.

✅ AMD Instinct™ Accelerators
✅ AMD Radeon™ RX/PRO Graphics Cards

Suggested hardware: AMD Instinct™ Accelerators. This notebook may not run on a CPU if your system does not have enough memory.

🎯 Goals#

  • Specialize a model using fine-tuning

  • Quantize the model using bitsandbytes

  • Define QLoRA parameters

  • Fine-tune using SFTTrainer


Get the Model and Tokenizer#

Import some of the necessary packages

import torch
from numpy import argmax

from transformers import AutoTokenizer, BitsAndBytesConfig, LlamaForCausalLM, pipeline, TrainingArguments
from peft import LoraConfig, get_peft_model
import evaluate
from trl import SFTConfig, SFTTrainer

Select the GPU if available. Note that a consumer GPU may not be able to fine-tune this model if it does not have enough VRAM.

Note

Using a GPU with large memory is recommended.

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print(f"Device: {device}")
if device == torch.device("cuda"):
    print(f'Device name: {torch.cuda.get_device_name(0)}')
    print(f'GPU available memory: {torch.cuda.mem_get_info()[1]/1024/1024//1024} GB')
Device: cuda
Device name: AMD Instinct MI210
GPU available memory: 63.0 GB

Define the model ID from Hugging Face: the Llama 3.2 1-billion-parameter model. Get the tokenizer and set the padding token to the EOS token. Also, set padding_side to right.

model_id = 'unsloth/Llama-3.2-1B'

my_tokenizer = AutoTokenizer.from_pretrained(model_id)
my_tokenizer.pad_token = my_tokenizer.eos_token
my_tokenizer.padding_side = 'right'

We will use bitsandbytes to quantize the model. First, we define the BitsAndBytesConfig: 4-bit quantization with the fp4 data type, nested (double) quantization, and float16 as the computation type.

fp4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="fp4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.float16,
)
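To see why 4-bit quantization matters, here is a rough back-of-the-envelope sketch of the weight-memory footprint at different precisions. The numbers are illustrative estimates (weights only), not measured values:

```python
# Rough weight-memory estimate: weights only, ignoring activations,
# optimizer state, and quantization metadata. Illustrative, not measured.
def weight_memory_gb(n_params, bits_per_param):
    return n_params * bits_per_param / 8 / 1024**3

n_params = 1_000_000_000  # roughly the size of Llama 3.2 1B
for bits in (16, 4):
    print(f"{bits}-bit weights: ~{weight_memory_gb(n_params, bits):.2f} GB")
```

Going from fp16 to 4-bit weights cuts the weight storage by roughly a factor of four, which is what makes fine-tuning a 1B model feasible on more modest GPUs.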

Then we use transformers.LlamaForCausalLM.from_pretrained to load the model from Hugging Face and apply the fp4_config configuration. We also set the device that we selected earlier.

quantized_model = LlamaForCausalLM.from_pretrained(
    model_id,
    quantization_config=fp4_config,
    device_map=device,
)

Sample Prompt#

Now, we will evaluate the model with a sample prompt. We define a transformers.pipeline for text-generation using the quantized model.

sample_prompt = (
    r"write a python function to find duplicate numbers in a list"
)

quantized_pipeline = pipeline(
    "text-generation",
    model=quantized_model,
    tokenizer=my_tokenizer,
    torch_dtype=torch.float16,
    device_map=device,
)
Device set to use cuda:0
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/transformers/integrations/sdpa_attention.py:54: UserWarning: Using AOTriton backend for Flash Attention forward... (Triggered internally at /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.h:267.)
  attn_output = torch.nn.functional.scaled_dot_product_attention(
Result:
write a python function to find duplicate numbers in a list of integer values
import  from  collections

duplicate_numbers_list = [1,0,3,0,2,3,6]

print(dup_number = [i for a if i!= a[i]] for i in enumerate(a.values()) if  i == 1)

duplicate_numbers_list = [i  for  i in a.values()
                              for  i in enumerate(a)]
print(dup number  of numbers = duplicate_numbers)

print([i for a if i  for  i in enumerate(a)])
```
```
[1,0,3,0,2,3,0]

[0]

duplicate_number  of  number s = [i for  i in enumerate(a)]
```

Then we can invoke the model to generate an answer to our prompt. We will also print the generated sequences.

Tip

Explore different values of top_k and temperature and run the prompt twice. What happens if you increase the temperature?
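To build intuition for the tip above, here is a small NumPy sketch (separate from the fine-tuning workflow) of how temperature reshapes the next-token distribution before sampling:

```python
import numpy as np

# Toy next-token logits: dividing by the temperature before softmax
# sharpens (T < 1) or flattens (T > 1) the sampling distribution.
def softmax_with_temperature(logits, temperature):
    scaled = np.asarray(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]
print(softmax_with_temperature(logits, 0.2))  # low T: almost greedy
print(softmax_with_temperature(logits, 2.0))  # high T: closer to uniform
```

At temperature 0.2 nearly all the probability mass lands on the top token, while at 2.0 the distribution flattens, so sampling becomes more random.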

sequences = quantized_pipeline(
    text_inputs=sample_prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=my_tokenizer.eos_token_id,
    max_new_tokens=512,
    temperature=0.2,
)

for seq in sequences:
    print(f"\nResult:\n{seq['generated_text']}")

Define fine-tune parameters#

Now, to fine-tune the model we will use the Low-Rank Adaptation (LoRA) technique. Instead of modifying the model's own weights, a small number of extra low-rank parameters are added, and only those are updated during fine-tuning. For more information, check here.
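As a concrete sketch of the idea (with hypothetical layer shapes, not the PEFT implementation): a full weight update for a d_out × d_in matrix is replaced by two low-rank factors, so far fewer parameters are trained:

```python
import numpy as np

# Illustrative LoRA sketch: the update dW = B @ A has rank at most r,
# and only B and A are trained while the base weight stays frozen.
d_out, d_in, r = 2048, 2048, 16
full_update_params = d_out * d_in         # training the full matrix
lora_params = r * (d_out + d_in)          # training only the factors

B = np.zeros((d_out, r))                  # zero init, so dW starts at 0
A = np.random.randn(r, d_in)
dW = B @ A                                # effective low-rank update
assert dW.shape == (d_out, d_in)

print(f"full: {full_update_params:,} params, LoRA (r={r}): {lora_params:,}")
```

With these (hypothetical) shapes, LoRA trains about 1.5% of the parameters a full update would require, which is why it pairs so well with a quantized, frozen base model.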

We can define the LoRA configuration with peft.LoraConfig:

  • r: the rank (size) of the adaptation matrices

  • lora_alpha: how strongly the adaptation layers affect the base model (see Section 4.1 of the LoRA paper)

  • lora_dropout: dropout probability applied in the LoRA layers

  • bias: whether bias parameters are trained

  • task_type: the task type (see TaskType)

  • target_modules: the modules to which adapter layers are applied

lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=[
        "up_proj",
        "down_proj",
        "gate_proj",
        "k_proj",
        "q_proj",
        "v_proj",
        "o_proj",
    ],
)

With this configuration, we can define our adapted_model, the model we will fine-tune, along with our adapted_pipeline.

adapted_model = get_peft_model(quantized_model, lora_config)

adapted_pipeline = pipeline(
    "text-generation",
    model=adapted_model,
    tokenizer=my_tokenizer,
    device_map=device,
)
Device set to use cuda:0

Let’s run the sample_prompt on the adapted model.

Tip

Do you note anything different from the original model?

sequences = adapted_pipeline(
    text_inputs=sample_prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=my_tokenizer.eos_token_id,
    max_new_tokens=512,
    temperature=0.2
)

for seq in sequences:
    print(f"\nResult:\n{seq['generated_text']}")
Result:
write a python function to find duplicate numbers in a list

def find_duplicate_numbers(numbers):
  """Find duplicate numbers in a list.
  :param numbers: a list of numbers to search for duplicates.
  :returns: a list of numbers that are duplicates.
  """
  duplicates = []
  for i in range(len(numbers)):
    if numbers[i] == numbers[i+1]:
      duplicates.append(numbers[i])
  return duplicates


Result:
write a python function to find duplicate numbers in a list of integers
You can use the built-in function to find duplicates in Python. The function is named find_dublicates and it is declared inside the Python standard library.
The find_dublicates function takes a list of integers as its argument. It then uses a for loop to iterate over the list and checks if each integer is equal to any of the other integers in the list. If it is, then the function returns a boolean True, which means that there are duplicate numbers in the list. Otherwise, the function returns a boolean False, which means that there are no duplicate numbers in the list.
The following code shows how to use the find_dublicates function to find duplicate numbers in a list of integers:
list_of_ints = [2, 4, 5, 6, 8, 10, 11, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30]
print("The list of integers is: " + str(list_of_ints))
print("There are no duplicate numbers in the list.")
print("The list of integers is: " + str(list_of_ints))
The output of the code is as follows:
The list of integers

Result:
write a python function to find duplicate numbers in a list of integers
1. Write a Python function to find duplicate numbers in a list of integers. The function should return a list of tuples. The first element of each tuple should be the number of times the number occurs in the list, and the second element should be the number of times the number occurs in the list. The function should also print a message indicating whether the number occurs more than once in the list. For example, if the input list is [1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 5], the function should return the list [[2, 2, 2], [4, 4, 4], [5, 5, 5]].
2. Write a Python function to find duplicate numbers in a list of integers. The function should return a list of tuples. The first element of each tuple should be the number of times the number occurs in the list, and the second element should be the number of times the number occurs in the list. The function should also print a message indicating whether the number occurs more than once in the list. For example, if the input list is [1, 2, 

Result:
write a python function to find duplicate numbers in a list
The function returns True if the given list has at least one duplicate, and False otherwise. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns false if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty.
A function that returns true if the given list has at least one duplicate, and false if the given list is empty. It returns False if the given list is empty

Result:
write a python function to find duplicate numbers in a list
I have a list of numbers that are duplicates, how can I find the duplicate numbers in the list?
The problem is that I need to find the duplicates numbers in the list. I have tried the code below but it is not working.
list = [1,2,3,4,5,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,6,

Get Dataset to fine-tune model#

We are going to use the Google Mostly Basic Python Problems (MBPP) dataset. Although large language models are already good at Python, the idea of this example is to fine-tune the model to provide its output in a particular style. It may be possible to get similar results with prompt-engineering techniques; however, the goal of this notebook is to show an example of fine-tuning.

Load dataset and print it.

Note

By executing the next cell, you will download the google-research-datasets/mbpp dataset, and you agree to its license and to obtaining permission from the dataset owner if needed.

from datasets import load_dataset

google_python = load_dataset("google-research-datasets/mbpp", "sanitized")

print(google_python)
DatasetDict({
    train: Dataset({
        features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
        num_rows: 120
    })
    test: Dataset({
        features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
        num_rows: 257
    })
    validation: Dataset({
        features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
        num_rows: 43
    })
    prompt: Dataset({
        features: ['source_file', 'task_id', 'prompt', 'code', 'test_imports', 'test_list'],
        num_rows: 7
    })
})

We are now going to define the output format that we want to fine-tune the model on, using chat templates. The task is to fine-tune the model so that its Python output is Markdown-friendly, i.e., it renders code snippets.

The function instructify receives the qr_row dictionary, which contains the prompt, code, and test_list. We define the qr_json template with a user and an assistant role: the user role contains the prompt, and the assistant role contains the Python code as a snippet plus the test list. Finally, we apply apply_chat_template to the roles, store the result under the text key, and return qr_row.

def instructify(qr_row):
    qr_json = [
        {
            "role": "user",
            "content": qr_row["prompt"],
        },
        {
            "role": "assistant",
            "content": f'''
```python
{qr_row["code"]}
```

Test List:

```python
test_list={qr_row["test_list"]}
```
''',
        },
    ]

    qr_row["text"] = my_tokenizer.apply_chat_template(qr_json, tokenize=False)
    return qr_row

We will define the chat template. Check Llama-3 prompt formats here. Concatenating query/response is sufficient for our use case.

my_tokenizer.chat_template = """{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = message['content'] | trim + '\n' %}{{ content }}{% endfor %}"""

print(my_tokenizer.chat_template)
{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = message['content'] | trim + '
' %}{{ content }}{% endfor %}
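In plain Python, the effect of this template is simply to concatenate each message's trimmed content followed by a newline. The sketch below mimics that behavior (it is not the Jinja engine itself):

```python
# Pure-Python sketch of what the Jinja template above produces:
# no role headers, just trimmed message contents joined by newlines.
def render(messages):
    return "".join(m["content"].strip() + "\n" for m in messages)

messages = [
    {"role": "user", "content": "write a python function"},
    {"role": "assistant", "content": "def f(): pass"},
]
print(render(messages))
```

This is why the formatted dataset examples read as prompt and answer concatenated directly, with no special tokens between the turns.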

We can now apply the chat template to our dataset.

formatted_dataset = google_python.map(instructify)

Display one example. You can see how the dataset is now formatted to show code snippets (```).

print(formatted_dataset["train"][0]["text"])
Write a python function to find the first repeated character in a given string.
```python
def first_repeated_char(str1):
  for index,c in enumerate(str1):
    if str1[:index+1].count(c) > 1:
      return c
```

Test List:

```python
test_list=['assert first_repeated_char("abcabc") == "a"', 'assert first_repeated_char("abc") == None', 'assert first_repeated_char("123123") == "1"']
```

Display the same content using the IPython.display.Markdown visualization

from IPython.display import display, Markdown
Markdown(formatted_dataset["train"][0]["text"])

Write a python function to find the first repeated character in a given string.

def first_repeated_char(str1):
  for index,c in enumerate(str1):
    if str1[:index+1].count(c) > 1:
      return c

Test List:

test_list=['assert first_repeated_char("abcabc") == "a"', 'assert first_repeated_char("abc") == None', 'assert first_repeated_char("123123") == "1"']

Let’s run this example prompt on the adapted model and observe the output. Although we see a code snippet, the test list is not there.

example_prompt = formatted_dataset["test"][0]["prompt"]

sequences = adapted_pipeline(
    text_inputs=example_prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=my_tokenizer.eos_token_id,
    max_new_tokens=512,
)

for seq in sequences:
    print(f"\nResult:\n{seq['generated_text']}")
Result:
Write a python function to remove first and last occurrence of a given character from the string. 
The function should return a new string with the given character removed from the string.

For example, if the string is 'hello', the function should return 'hello' and if the string is 'world', the function should return 'w'.

Hint: use the `count` method to count the number of occurrences of a character in the string and then remove the first and last occurrences of the character.
```python
string = 'hello'
char = 'o'
result = string.count(char)
print(f"String '{string}' has character '{char}' {result} times")
```

🚀 Fine-tune the Adapted Model#

We now define the metric that will be used to evaluate the fine-tuned model: accuracy. We also define the compute_metrics function that computes it from the model's predictions.

metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)
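As a toy illustration of what this computes (made-up logits, vocabulary of size 2): argmax over the vocabulary axis gives per-token predictions, and accuracy is the fraction that matches the labels:

```python
import numpy as np

# Toy example: batch of 1, sequence of 3 tokens, vocabulary of 2.
logits = np.array([[[0.1, 2.0],
                    [1.5, 0.3],
                    [0.2, 0.9]]])
labels = np.array([[1, 0, 0]])

predictions = np.argmax(logits, axis=-1)       # -> [[1, 0, 1]]
accuracy = float((predictions == labels).mean())
print(accuracy)                                # 2 of 3 tokens match
```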

We also need to tokenize the dataset before it can be consumed in the training.

def tokenize_dataset(dataset, tokenizer, text_field):
    def tokenize_function(examples):
        return tokenizer(examples[text_field], truncation=True, padding=True)

    return dataset.map(tokenize_function, batched=True)

tokenized_train_dataset = tokenize_dataset(formatted_dataset["train"], my_tokenizer, "text")
tokenized_eval_dataset = tokenize_dataset(formatted_dataset["test"], my_tokenizer, "text")

Let’s define our training configuration with trl.SFTConfig. Some of the most relevant arguments are listed below:

  • per_device_train_batch_size: training batch size per device

  • per_device_eval_batch_size: evaluation batch size per device

  • gradient_accumulation_steps: number of gradient accumulation steps

  • optim: optimizer type

  • num_train_epochs: number of training epochs

  • eval_steps: how often to run evaluation

  • logging_steps: how often the trainer logs progress

  • warmup_steps: number of learning-rate warmup steps

  • learning_rate: the learning rate

  • fp16: train with fp16 mixed precision

  • group_by_length: group samples of similar length into batches
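One relationship worth keeping in mind: the effective global batch size is the per-device batch size times the accumulation steps times the number of devices. A minimal sketch (num_devices is a hypothetical parameter for multi-GPU setups):

```python
# Effective global batch size: gradients are accumulated over
# gradient_accumulation_steps micro-batches before each optimizer step.
def effective_batch_size(per_device_batch, accumulation_steps, num_devices=1):
    return per_device_batch * accumulation_steps * num_devices

# With per_device_train_batch_size=8 and gradient_accumulation_steps=1
# on a single GPU, as configured in this notebook:
print(effective_batch_size(8, 1))  # -> 8
```

Raising gradient_accumulation_steps lets you simulate a larger batch without increasing per-step memory use.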

sft_config = SFTConfig(
    output_dir="Llama-Python-Single-GPU",
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=1,
    optim="paged_adamw_8bit",
    num_train_epochs=20,
    eval_steps=0.5,
    logging_steps=1,
    warmup_steps=10,
    logging_strategy="steps",
    learning_rate=1e-4,
    fp16=True,
    bf16=False,
    group_by_length=True,
    max_seq_length=512,
)
No label_names provided for model class `PeftModelForCausalLM`. Since `PeftModel` hides base models input arguments, if label_names is not given, label_names can't be set automatically within `Trainer`. Note that empty label_names list will be used instead.

With the configuration defined, we can finally create the trl.SFTTrainer that will handle the fine-tuning. We initialize it with the adapted_model, the tokenized train and eval datasets, the SFTConfig, and the lora_config.

trainer = SFTTrainer(
    model=adapted_model,
    train_dataset=tokenized_train_dataset,
    eval_dataset=tokenized_eval_dataset,
    args=sft_config,
    peft_config=lora_config,
)

Finally, we can call the .train() method to start fine-tuning the model.

trainer.train()
/opt/conda/envs/py_3.12/lib/python3.12/site-packages/torch/autograd/graph.py:823: UserWarning: Using AOTriton backend for Flash Attention backward... (Triggered internally at /var/lib/jenkins/pytorch/aten/src/ATen/native/transformers/hip/flash_attn/flash_api.h:452.)
  return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
[300/300 03:10, Epoch 20/20]
Step Training Loss
1 3.569500
2 2.826900
3 3.377800
4 3.833900
5 3.580000
6 3.138300
7 1.186300
8 0.832100
9 0.877700
10 0.707000
11 0.662100
12 0.608500
13 0.583300
14 0.515600
15 0.615300
16 0.524200
17 0.563000
18 0.554100
19 0.597600
20 0.570800
21 0.520200
22 0.508100
23 0.622500
24 0.571600
25 0.454400
26 0.446500
27 0.461600
28 0.452200
29 0.439300
30 0.437300
31 0.387000
32 0.400400
33 0.391100
34 0.505800
35 0.451600
36 0.493600
37 0.464200
38 0.338600
39 0.363300
40 0.393800
41 0.409900
42 0.428500
43 0.409100
44 0.361000
45 0.439500
46 0.393300
47 0.455900
48 0.430300
49 0.385500
50 0.349300
51 0.372000
52 0.376100
53 0.402300
54 0.296200
55 0.478900
56 0.296300
57 0.352800
58 0.437200
59 0.365000
60 0.266100
61 0.316700
62 0.371000
63 0.331200
64 0.280900
65 0.326300
66 0.357000
67 0.444300
68 0.347000
69 0.349800
70 0.369600
71 0.403700
72 0.334000
73 0.330900
74 0.334200
75 0.306300
76 0.231900
77 0.433400
78 0.337900
79 0.298800
80 0.318100
81 0.397400
82 0.266800
83 0.384300
84 0.292300
85 0.311600
86 0.360800
87 0.278800
88 0.288300
89 0.276300
90 0.293100
91 0.254400
92 0.315400
93 0.283700
94 0.349500
95 0.294100
96 0.362800
97 0.232200
98 0.255300
99 0.265800
100 0.220800
101 0.320300
102 0.263700
103 0.269200
104 0.325000
105 0.263800
106 0.248200
107 0.240100
108 0.271100
109 0.268400
110 0.248100
111 0.236700
112 0.228900
113 0.277300
114 0.251400
115 0.209000
116 0.243500
117 0.314900
118 0.222000
119 0.254200
120 0.247900
121 0.183900
122 0.260100
123 0.199800
124 0.209500
125 0.231200
126 0.199900
127 0.264000
128 0.194800
129 0.235700
130 0.272500
131 0.153900
132 0.166400
133 0.210300
134 0.226100
135 0.203000
136 0.209000
137 0.202800
138 0.140800
139 0.239200
140 0.159800
141 0.153900
142 0.143900
143 0.194200
144 0.151900
145 0.128100
146 0.144700
147 0.160100
148 0.204300
149 0.250600
150 0.199300
151 0.151200
152 0.139200
153 0.115300
154 0.127300
155 0.178400
156 0.136900
157 0.161900
158 0.141400
159 0.179700
160 0.141700
161 0.126400
162 0.154400
163 0.123900
164 0.137000
165 0.179800
166 0.134100
167 0.108700
168 0.115800
169 0.121900
170 0.147200
171 0.139700
172 0.092700
173 0.117300
174 0.089800
175 0.134700
176 0.098500
177 0.124000
178 0.090500
179 0.121300
180 0.105100
181 0.077800
182 0.064200
183 0.129400
184 0.080300
185 0.078100
186 0.068900
187 0.107600
188 0.088600
189 0.082100
190 0.118300
191 0.066500
192 0.103400
193 0.082200
194 0.156200
195 0.082900
196 0.053700
197 0.052300
198 0.060400
199 0.065800
200 0.087200
201 0.091500
202 0.056800
203 0.093500
204 0.088000
205 0.077500
206 0.052200
207 0.073000
208 0.084000
209 0.086200
210 0.076300
211 0.061200
212 0.041400
213 0.057200
214 0.065400
215 0.041500
216 0.042200
217 0.065700
218 0.056100
219 0.046400
220 0.056600
221 0.062000
222 0.077000
223 0.082900
224 0.046800
225 0.076400
226 0.039500
227 0.037800
228 0.045500
229 0.077600
230 0.073400
231 0.054100
232 0.044400
233 0.058500
234 0.053800
235 0.039500
236 0.028500
237 0.058700
238 0.037500
239 0.043700
240 0.036500
241 0.050600
242 0.041500
243 0.023900
244 0.039600
245 0.071200
246 0.029500
247 0.026500
248 0.032400
249 0.047700
250 0.043100
251 0.047700
252 0.026300
253 0.051000
254 0.055200
255 0.040100
256 0.029200
257 0.028000
258 0.032400
259 0.056400
260 0.030100
261 0.055000
262 0.033100
263 0.020600
264 0.029700
265 0.067200
266 0.041800
267 0.034200
268 0.039600
269 0.045400
270 0.024600
271 0.029900
272 0.030900
273 0.028000
274 0.031800
275 0.027400
276 0.021900
277 0.064100
278 0.034900
279 0.059000
280 0.026200
281 0.033600
282 0.036200
283 0.038700
284 0.029800
285 0.031200
286 0.038100
287 0.024300
288 0.023000
289 0.029200
290 0.047100
291 0.025700
292 0.044700
293 0.027000
294 0.035400
295 0.040800
296 0.029500
297 0.035400
298 0.028200
299 0.039600
300 0.028800

TrainOutput(global_step=300, training_loss=0.2689305164354543, metrics={'train_runtime': 192.0013, 'train_samples_per_second': 12.5, 'train_steps_per_second': 1.562, 'total_flos': 6322328115609600.0, 'train_loss': 0.2689305164354543})

You can decide to save the model.

save_model = False
if save_model:
    trainer.save_model()

Evaluate Fine-tuned Model#

After fine-tuning, we can evaluate whether we achieved our desired outcome. Let us define a different prompt and invoke the fine-tuned model.

example_prompt = r"write a python function that returns the least common denominator of all elements in a list."

sequences = adapted_pipeline(
    text_inputs=example_prompt,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=my_tokenizer.eos_token_id,
    max_new_tokens=512,
    temperature=0.2
)

Display the generated text using the Markdown display.

Markdown(sequences[0]["generated_text"])

write a python function that returns the least common denominator of all elements in a list. https://www.geeksforgeeks.org/least-common-denominator/

def lcm_of_elements(arr):
    (left, right) = (arr[0], arr[-1])
    for m in (le, rt):
        if (m == left or m == m * right / m):
            return m
        else:
            return m

Test List:

test_list=['assert lcm_of_elements([2,2,1])->1', 'assert lcm_of_elements([1,5,7,1])->5', 'assert lcm_of_elements([12,45,67,12])->45']

Summary#

In this notebook you quantized a Llama model, then added LoRA adapters so the model could be trained on a custom dataset. You also defined chat templates that guided the fine-tuning process.

Now, you may be wondering how much bigger the adapted model is. Let’s have a look.
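Before inspecting the full model, here is a pure-Python sketch of how the trainable fraction of a LoRA-adapted projection can be counted. The layer names and shapes are hypothetical, for illustration only:

```python
# Hypothetical shapes for one LoRA-adapted projection (illustration only).
layers = {
    "base.q_proj.weight":   ((2048, 2048), False),  # frozen base weight
    "lora_A.q_proj.weight": ((16, 2048),   True),   # trainable factor A
    "lora_B.q_proj.weight": ((2048, 16),   True),   # trainable factor B
}

def count_params(layers):
    total = trainable = 0
    for shape, requires_grad in layers.values():
        n = 1
        for dim in shape:
            n *= dim
        total += n
        if requires_grad:
            trainable += n
    return total, trainable

total, trainable = count_params(layers)
print(f"trainable: {trainable:,} / total: {total:,} "
      f"({100 * trainable / total:.2f}%)")
```

With these assumed shapes, only about 1.5% of the parameters are trainable, so the adapter adds very little to the model's size.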

from torchinfo import summary
# Token IDs are integers, so pass a (batch, seq) shape with an integer dtype
model_quant = summary(quantized_model, input_size=(1, 512), dtypes=[torch.long], col_names=["input_size", "output_size", "num_params", "mult_adds", "trainable"])
model_quant
adapt_model_quant = summary(adapted_model, input_size=(1, 512), dtypes=[torch.long], col_names=["input_size", "output_size", "num_params", "mult_adds", "trainable"])
adapt_model_quant

Copyright (C) 2025 Advanced Micro Devices, Inc. All rights reserved. Portions of this file consist of AI-generated content.

SPDX-License-Identifier: MIT