Fine-tuning (deep learning)
In machine learning, fine-tuning is an approach to transfer learning in which the weights of a pre-trained model are trained on new data.[1] Fine-tuning can be done on a subset of the layers of a neural network or on the entire network.[2] In the first case, the layers that are not being fine-tuned are "frozen" and are not updated during the backpropagation step.
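As an illustrative sketch of layer freezing, assuming PyTorch and a small stand-in network (the layer sizes and learning rate are illustrative assumptions, not part of the cited sources), gradient computation is disabled for the frozen parameters so that only the remaining layers are updated:

```python
import torch.nn as nn
from torch.optim import Adam

# Hypothetical pre-trained network standing in for a real model:
# the first linear layer plays the role of the "earlier" layers.
model = nn.Sequential(
    nn.Linear(128, 64),  # earlier layer, to be frozen
    nn.ReLU(),
    nn.Linear(64, 10),   # final layer, to be fine-tuned
)

# Freeze the earlier layer so backpropagation leaves its weights unchanged.
for param in model[0].parameters():
    param.requires_grad = False

# Only parameters that still require gradients are handed to the optimizer.
optimizer = Adam((p for p in model.parameters() if p.requires_grad), lr=1e-4)
```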
For some architectures, such as convolutional neural networks, it is common to keep the earlier layers frozen because they are believed to capture lower-level image features, whereas the final layers often capture higher-level features that are more specific to the task the model is trained on.[2][3]
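A common recipe following this convention, sketched here with torchvision (the model name, weight identifier, and class count are illustrative assumptions), freezes the pre-trained convolutional backbone and replaces only the final classification layer:

```python
import torch.nn as nn
from torchvision import models

num_classes = 5  # hypothetical number of classes in the new task

# Load a convolutional network pre-trained on ImageNet
# (the weights argument requires torchvision >= 0.13).
model = models.resnet18(weights="IMAGENET1K_V1")

# Freeze every layer of the pre-trained backbone.
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a new, trainable one
# sized for the downstream task; only this layer is fine-tuned.
model.fc = nn.Linear(model.fc.in_features, num_classes)
```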
Fine-tuning is also common in natural language processing (NLP), especially in the domain of language modeling. Large language models like OpenAI's GPT-2 can be fine-tuned on downstream NLP tasks to produce better results than the pre-trained model can normally achieve. Models that are pre-trained on large, general corpora are usually fine-tuned on downstream tasks by reusing the model's parameters as a starting point and adding a task-specific layer trained from scratch.[4][5] Fully fine-tuning the model often yields better results, but it is a more computationally expensive approach.[4] Full fine-tuning is also more prone to overfitting and may cause the model to perform worse out-of-distribution.[6]
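A minimal sketch of this pattern, using the Hugging Face Transformers library (the checkpoint name and label count are illustrative assumptions), loads pre-trained GPT-2 weights and attaches a randomly initialized classification head for the downstream task:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Reuse the pre-trained GPT-2 parameters as the starting point and add a
# task-specific classification head, which is initialized from scratch.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)

# GPT-2 has no padding token by default; reuse the end-of-text token for padding.
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

# From here, all parameters (pre-trained body plus new head) can be trained on
# labelled task data, e.g. with transformers.Trainer or a standard PyTorch loop.
```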
References
1. Quinn, Joanne (2020). Dive into Deep Learning: Tools for Engagement. Thousand Oaks, California. p. 551. ISBN 978-1-5443-6137-6. Archived from the original on January 10, 2023. Retrieved January 10, 2023.
2. "CS231n Convolutional Neural Networks for Visual Recognition". cs231n.github.io. Retrieved 9 March 2023.
3. Zeiler, Matthew D.; Fergus, Rob (2013). "Visualizing and Understanding Convolutional Networks". doi:10.48550/arXiv.1311.2901.
4. Dingliwal, Saket; Shenoy, Ashish; Bodapati, Sravan; Gandhe, Ankur; Gadde, Ravi Teja; Kirchhoff, Katrin (2021). "Prompt Tuning GPT-2 Language Model for Parameter-Efficient Domain Adaptation of ASR Systems". doi:10.48550/arXiv.2112.08718.
5. Dodge, Jesse; Ilharco, Gabriel; Schwartz, Roy; Farhadi, Ali; Hajishirzi, Hannaneh; Smith, Noah (2020). "Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping". doi:10.48550/arXiv.2002.06305.
6. Kumar, Ananya; Raghunathan, Aditi; Jones, Robbie; Ma, Tengyu; Liang, Percy (2022). "Fine-Tuning can Distort Pretrained Features and Underperform Out-of-Distribution". doi:10.48550/arXiv.2202.10054.