The writing here is a bit casual. The teacher offered four topics:
- YOLO Object Detection - 6 classes
- Unet Object Segmentation - 3 classes
- Fine-tuning large models
- Poetry continuation based on LSTM.
The first two topics wouldn't teach me anything new, and the fourth is too much hassle, what with tokenization and dataset cleaning.
So I went with the third one. I hadn't done it before and figured I could learn something while having some fun. The assignment calls for a doctor model that can hold a conversation, though what I actually want to build is a cyber girlfriend. =-=.
The girlfriend will have to wait until next time.
I'm quite happy that I can run LoRA fine-tuning locally. The dataset still needs some ideas and references.
Instruction Fine-tuning#
This usually refers to using the following format for the model's training set:
{
"instruction": "Now you need to play the role of a woman beside the emperor -- Zhen Huan",
"input": "Who are you?",
"output": "My father is Zhen Yuandao, Vice Minister of the Dali Temple."
}
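Under the hood, frameworks typically splice these three fields into a single training prompt before tokenization. A minimal sketch of the idea (the template below is illustrative, not llama-factory's actual one):

```python
# Sketch: assembling an instruction record into one training prompt.
# The template here is a generic Alpaca-style example, purely illustrative.
record = {
    "instruction": "Now you need to play the role of a woman beside the emperor -- Zhen Huan",
    "input": "Who are you?",
    "output": "My father is Zhen Yuandao, Vice Minister of the Dali Temple.",
}

def build_prompt(rec):
    # "input" is optional; fold it into the instruction only when present.
    if rec["input"]:
        user_part = f"{rec['instruction']}\n{rec['input']}"
    else:
        user_part = rec["instruction"]
    return f"Instruction:\n{user_part}\n\nResponse:\n{rec['output']}"

print(build_prompt(record))
```

The model then simply learns to continue the text after "Response:", which is why you only need to supply records in this shape.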
There is a very interesting project that fine-tunes a large model on Zhen Huan's dialogue; it seems to be called Chat-Huanhuan.
However, what I'm doing this time is less interesting: a doctor that answers questions about symptoms and medical issues. Given the choice, I'd still rather fine-tune a cyber girlfriend.
But for now, let's first familiarize ourselves with the process, and later we can fine-tune as we wish.
The framework wraps up the tokenizer and everything else; we just need to drop the dataset in. Embeddings, masks, and attention don't need to be touched at all, which lets even a lazy person like me play along.
The QA dataset I made:
Link: https://pan.baidu.com/s/1VX-ffLlXjShXJOUlkAZIwA?pwd=tv2y Extraction code: tv2y
Top 3 previews:
{
"instruction": "Headache and dizziness for more than ten years. Wind and sun exposure. Noise causes headaches, what should be done if no results were found in the examination?",
"input": "",
"output": "Consider it a type of neurogenic headache, a functional disorder caused by nervous dysfunction due to stress or psychological factors, with some genetic component. Generally, consider taking a traditional Chinese medicine such as Chuanxiong Cha Tiao San; combining it with acupuncture and massage yields better results."
},
{
"instruction": "Eye fatigue, always dry and hard to open",
"input": "",
"output": "Generally caused by conjunctivitis or dry eye; you can get it checked at the hospital."
},
{
"instruction": "What is under a woman's left breast? What does it mean if it hurts there?",
"input": "",
"output": "The heart is located there, and angina can present there; it may also be stomach pain, which sits slightly lower than the heart."
}
Some questions may correspond to multiple answers.
The dataset source is: zhangsheng93/cMedQA2
I just did some simple processing to convert the dataset into JSON format. I also filtered out responses longer than about 100 to 150 characters, because I don't have enough GPU memory, and overlong answers would have to be truncated anyway.
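The processing was roughly the following shape (a sketch only; the pairing logic and field names are simplified, since cMedQA2 actually ships questions and answers as separate CSV tables keyed by question id):

```python
import json

# Sketch of the cleaning step: take (question, answer) pairs, drop answers
# over a length cap to save GPU memory, and emit Alpaca-style records.
MAX_LEN = 100  # the cap I used was somewhere between 100 and 150 characters

def to_instruction_records(qa_pairs, max_len=MAX_LEN):
    records = []
    for question, answer in qa_pairs:
        if len(answer) > max_len:  # overlong answers would be truncated anyway
            continue
        records.append({"instruction": question, "input": "", "output": answer})
    return records

pairs = [
    ("Eye fatigue, always dry", "Generally caused by conjunctivitis."),
    ("Another question", "a" * 200),  # too long, gets dropped
]
records = to_instruction_records(pairs)
print(json.dumps(records, ensure_ascii=False, indent=2))
```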
There are two training approaches here: the one I tried on a small dataset, training the top 100 examples for 60 epochs so the model memorizes the answers, or enlarging the dataset and training for only three or four epochs. I haven't decided yet.
This is the loss curve from my training. If I only train for two or three epochs, the model has merely skimmed the data, and I doubt it will change its original answering style much, although some friends would say an LLM's loss isn't very informative.
What I want is for the model to stop listing every possibility one by one, and instead offer a single likely cause plus a remedy, the way my dataset does. Only once it learns that way of speaking will I call the fine-tuning successful.
Because if I want to train a cyber girlfriend in the future, I don't want her to talk to me like this:
What do you want to eat tonight? I think these are good:
- Hot pot, eating hot pot in winter is the best choice
- Sushi, sushi is a type of Japanese cuisine (it seems to have some strange stuff mixed in, this was written by copilot)
- Roast duck, roast duck is a specialty of Beijing
- Grilled fish, grilled fish is a specialty of Hunan
I hope she tells me directly like this:
Let's eat hot pot tonight; it's the best in winter.
Exciting!
Prompts can shift a model's answering style somewhat, but they don't seem able to fundamentally replace the first style with the second, which is what I want. So I chose fine-tuning, and chose to train more.
After half an hour of fine-tuning on the Top 100 for 60 epochs, the results on the training set are quite satisfactory:
Although I don't quite agree with the stewed dog meat suggestion, which leaves me a bit puzzled, it is at least related to dog meat. And there is some diversity; I remember it also mentioned taking Liuwei Dihuang Wan. =-.
This is the QA:
"instruction": "Is sweating every morning before getting up curable? I have a stomach disease but haven't used medication before. Is there any folk remedy for sweating every morning before getting up?",
"input": "",
"output": "Easily sweating may indicate qi deficiency; you can use astragalus to brew tea and drink more chicken soup. You can also take some Bu Zhong Yi Qi Wan, Shen Ling Bai Zhu San, and Bu Zhong Yi Qi Tang Wan."
Let me try one outside the training set.
I have to say, I highly agree with this.
copilot, do you think it's better to run at night or in the morning?
- Morning running, morning running helps improve the body's metabolism and aids in weight loss.
- Night running, night running helps relax the body and mind and aids in sleep.
- Morning running, morning running helps improve the body's metabolism and aids in weight loss.
- Night running, night running helps relax the body and mind and aids in sleep.
This was written by copilot.
Don't you know that running after nine o'clock at night makes it harder to sleep?
Okay, enough of that; back to the topic.
That said, I thought of a training scheme: first train many epochs on a small dataset, then a few epochs on a large dataset, so the model learns the speaking tone while also improving its expressive ability.
But that requires resuming from a checkpoint, and I'm not sure whether llama-factory needs extra steps for that.
Fine-tuning Tools:#
This repository documents the fine-tuning process for many large models in Jupyter Notebook + Markdown, and users can basically run it end to end.
It's about as lazy-friendly as it gets, but problems remain: as Python library versions move on, transformers>4.35.0 introduced breaking changes, so many models no longer run as before.
Even when I tried to reproduce the author's exact environment, I still hit issues, such as Windows errors about files that cannot be found.
At this point, Docker is needed.
After a day of torment, I decisively switched to Docker.
4060Ti16G GPU graphical fine-tuning training of Qwen model (suitable for beginners)
Here, someone packaged llama-factory into a Docker image, and I just need to install the image and run it directly.
The only downside is that he didn't pin versions when installing the extra packages later:
pip install einops transformers_stream_generator optimum auto-gptq -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn
By default this installs the latest versions, and if enough time passes between the video and your attempt, new issues may appear; I'll come back and pin the versions once I manage to get it running.
Still, the benefit of Docker is that you don't have to worry about subtle differences in the system environment; the starting point is already very high.
This step did stall me for an hour, but fortunately my GPU can be called directly from the container, which saves a lot of trouble. I'll record the versions once testing is done.
The final confirmed versions are:
>>> import einops
>>> einops.__version__
'0.8.0'
>>> import transformers
>>> transformers.__version__
'4.34.1'
(llama-factroy) root@docker-desktop:/LLaMA-Factory# pip show transformers_stream_generator
Name: transformers-stream-generator
Version: 0.0.5
>>> import datasets
>>> datasets.__version__
'2.14.6'
(llama-factroy) root@docker-desktop:/LLaMA-Factory# pip show optimum
Name: optimum
Version: 1.23.3
(llama-factroy) root@docker-desktop:/LLaMA-Factory# pip show auto-gptq
Name: auto-gptq
Version: 0.6.0
If written as requirements.txt:
einops==0.8.0
transformers==4.34.1
transformers-stream-generator==0.0.5
datasets==2.14.6
optimum==1.23.3
auto-gptq==0.6.0
In addition to these package issues, you also need to get --gpus all working. If you are on Windows, you can refer to guides on enabling the NVIDIA Container Toolkit for Docker on Windows.
All of this is certainly easiest on Linux, but unfortunately my computer is at home, and I can only remote into Windows.
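A quick way to check whether GPU passthrough works at all (this assumes the NVIDIA Container Toolkit is already installed; the CUDA image tag is just one that happens to exist on Docker Hub):

```shell
# If this prints the familiar nvidia-smi table, --gpus all will also
# work for the llama-factory container below.
docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
```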
Steps:#
1. Download the dataset and place it in the data folder#
You need a dataset_info.json; refer to the one in my cloud-drive dataset linked above.
Note that the sha1 key inside cannot be deleted; I recall it's used for verification, yet even a stale leftover value still works.
If you do delete it, llama-factory keeps reporting that dataset_info.json cannot be found.
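For reference, a minimal entry might look like the following (the field names follow llama-factory's dataset_info.json convention at the time I used it, so check against your version; the sha1 value here is a placeholder, and as noted above a stale one still works):

```json
{
  "qa_top100": {
    "file_name": "qa_top100.json",
    "file_sha1": "0000000000000000000000000000000000000000"
  }
}
```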
2. Download the Docker image:#
docker pull bucess/llama-factory:1
I am not using the official image.
The download might take a while, and some layers can appear stuck; don't worry, just find a relatively stable network environment, connect your adapter, set up your proxy, and wait.
With the image pulled, starting it each time is fast. Once the container has been created, you can simply run docker start -i llama-factory next time.
3. Start the container:#
docker run -it --name llama-factory --gpus all --network host --shm-size 4g -v D:\senmen\data:/LLaMA-Factory/data bucess/llama-factory:1 /bin/bash
Change D:\senmen\data to the location of your own dataset. If you're confident, you can mount the model directory too; just be careful not to delete it by accident.
The safer approach is to copy the model in with docker cp.
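That copy, run from the Windows host, would look something like this (both paths are illustrative; adjust them to wherever your model actually lives):

```shell
# Copy a locally downloaded model from the host into the running container.
docker cp D:\models\Qwen-1_8B-Chat llama-factory:/LLaMA-Factory/models/
```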
4. Add Python packages and start llama-factory#
pip install einops==0.8.0 transformers==4.34.1 transformers-stream-generator==0.0.5 datasets==2.14.6 optimum==1.23.3 auto-gptq==0.6.0 -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn
The -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn mirror parameters are quite useful, because the container seems to have no proxy by default; with the mirror you don't need to set a temporary http_proxy.
Modify src/train_web.py (optional): after starting the Gradio application, I couldn't reach the panel locally via ip+port, so I used Gradio's share parameter.
(llama-factroy) root@docker-desktop:/LLaMA-Factory# cat src/train_web.py
from llmtuner import create_ui


def main():
    demo = create_ui()
    demo.queue()
    demo.launch(share=True, inbrowser=True)


if __name__ == "__main__":
    main()
It is worth noting that you also need to download this:
https://cdn-media.huggingface.co/frpc-gradio-0.2/frpc_linux_amd64
Windows keeps flagging it as a virus and deleting it automatically.
You need to place it in the right location and rename it; running with share=True without this file throws an error. I won't go into detail here; see
https://github.com/gradio-app/gradio/issues/8186
for the details of the error.
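Inside the container, the placement looks roughly like this (a sketch: the _v0.2 version suffix must match your Gradio release, and the package path is resolved at runtime rather than hard-coded):

```shell
# Find where the gradio package lives, then drop the renamed binary there.
GRADIO_DIR=$(python -c "import gradio, os; print(os.path.dirname(gradio.__file__))")
mv frpc_linux_amd64 "$GRADIO_DIR/frpc_linux_amd64_v0.2"
chmod +x "$GRADIO_DIR/frpc_linux_amd64_v0.2"
```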
5. Start llama-factory#
python src/train_web.py
If you did the optional modification in step 4, this generates a temporary URL you can open from any computer.
Something like this: https://d6cdc0f5cda64dd72b.gradio.live/
Since my computer is at home, and I need to train, call, and demonstrate at school, this is very convenient.
6. Load our top 100 dataset and train#
After this, everything happens in the web UI, and as a text blogger I won't walk through the clicking.
You do need to edit dataset_info.json so that your dataset entry's file name points to "qa_top100.json".
For specific operations, you can refer to: 4060Ti16G GPU graphical fine-tuning training of Qwen model (suitable for beginners)
Interestingly, it trains on an int4-quantized model and finally exports as float32, which not only lowers the training barrier but also makes inference very fast after exporting the non-quantized model.
This directly solves my biggest problem: I previously tried training a non-quantized model, and even with LoRA and batch_size set to 1 it still blew up GPU memory and was unstable, causing remote-desktop black screens and flicker.
With this setup, I set batch_size to 8, with GPU memory usage around 13G, fully utilized, yet it doesn't affect my remote desktop at all, which is great.
7. Checkpoint continuation (not done yet)#
This is the last thing I want to do: train many epochs on a small dataset, then a few epochs on a large one, so the model learns both the speaking style and more knowledge from the large dataset.