Striking The Strings of Big Models | Delta Tuning Results Featured on The Cover of A Nature…
By OpenBMB | Medium
Striking The Strings of Big Models | Delta Tuning Results Featured on The Cover of A Nature Sub-journal!
Mar 27, 2023
The OpenBMB team’s paper, “Parameter-efficient Fine-tuning of Large-scale Pre-trained Language Models,” was published in Nature Machine Intelligence on March 2nd. The issue of this Nature sub-journal that contains it came out on March 23rd, and our work was selected for the main cover of that issue.
The cover, designed by OpenBMB & ModelBest designer Qirui Shao
Sub-journal 🔗: https://www.nature.com/natmachintell/volumes/5/issues/3
The paper’s co-first authors, Ning Ding and Yujia Qin, and its corresponding authors, Zhiyuan Liu and Maosong Sun, are all key members of the OpenBMB open-source community. The research was supported by the “New Generation Artificial Intelligence” project, one of the 2030 major scientific and technological innovation projects of the Ministry of Science and Technology of the People’s Republic of China, as well as by the National Natural Science Foundation of China, the Beijing Academy of Artificial Intelligence, and the Tsinghua University Guoqiang Institute.
The paper defines and describes the Delta Tuning problem and reviews previous research within a unified framework. It analyzes Delta Tuning theoretically from two perspectives, optimization and optimal control, to guide the subsequent design of structures and algorithms. It also compares representative methods experimentally on over 100 NLP tasks, with analyses covering the performance, convergence, efficiency, combinability, power of scale, and transferability of Delta Tuning. The team also developed an open-source toolkit, OpenDelta, which enables practitioners to implement Delta Tuning on PLMs efficiently and flexibly.
➤ DeltaTuning Paper link: 🔗 https://www.nature.com/articles/s42256-023-00626-4
➤ OpenDelta Open-sourced Toolkit: 🔗 https://github.com/thunlp/OpenDelta
Background of Delta Tuning
Since pre-trained language models (PLMs) emerged in 2018, “pre-training then fine-tuning” has become the mainstream paradigm for NLP tasks. In this paradigm, we first pre-train a PLM on large-scale unlabeled data through self-supervised learning to obtain a base model, and then fine-tune its parameters on labeled data to adapt it to downstream tasks. The recently popular ChatGPT is a representative large PLM, and a growing body of experiments and practice shows that larger models not only perform better on known tasks but also generalize well enough to complete more complex, unseen tasks.
Traditional Deep Learning Paradigm vs Big Model Pre-training-Fine-tuning Paradigm
However, larger models are also harder to apply. Full-parameter fine-tuning of an ultra-large pre-trained model consumes a great deal of GPU compute and storage, and the cost is daunting. The paper sampled 1,200 papers from six recent NLP conferences and found that, although pre-trained models have become the mainstream paradigm, papers involving large PLMs are few and far between.
To address this challenge, parameter-efficient fine-tuning methods have gradually attracted attention. In contrast to full-parameter fine-tuning, these methods freeze more than 99% of the pre-trained model’s parameters and use a small amount of downstream task data to tune parameters amounting to less than 1% of the model size. These tuned parameters act as plug-ins that adapt the big model to downstream tasks. The result is performance comparable to full-parameter fine-tuning, with significantly lower computational and storage costs during fine-tuning.
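To make the “less than 1%” figure concrete, here is a back-of-the-envelope count for one kind of plug-in, a bottleneck adapter. This is only a sketch: the hidden size, layer count, and bottleneck width are hypothetical values we picked for illustration, the backbone count ignores embeddings, biases, and layer norms, and none of the numbers come from the paper.

```python
# Rough parameter count for a hypothetical Transformer with bottleneck adapters.
# All sizes below are illustrative assumptions, not figures from the paper.

def backbone_params(hidden: int, layers: int) -> int:
    """Approximate weights per layer: 4*h^2 (attention) + 8*h^2 (feed-forward)."""
    return 12 * hidden * hidden * layers

def adapter_params(hidden: int, bottleneck: int, layers: int, per_layer: int = 2) -> int:
    """One adapter = down-projection + up-projection weights plus both biases."""
    one_adapter = 2 * hidden * bottleneck + hidden + bottleneck
    return one_adapter * per_layer * layers

hidden, layers, bottleneck = 1024, 24, 16   # hypothetical model and adapter sizes
frozen = backbone_params(hidden, layers)    # never updated during adaptation
tuned = adapter_params(hidden, bottleneck, layers)
fraction = tuned / (frozen + tuned)

print(f"frozen backbone parameters:   {frozen:,}")
print(f"trainable adapter parameters: {tuned:,}")
print(f"trainable share:              {fraction:.2%}")
```

With these assumed sizes, the trainable share comes out to roughly half a percent, and only the small adapter weights would need to be stored per downstream task.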
Delta Tuning: Method and Analysis
Our research proposes that the essence of parameter-efficient fine-tuning methods is to adjust the “delta parameters”. Therefore, we name such methods “Delta Tuning”, where “delta” is a mathematical symbol commonly used to represent changes and is borrowed here to refer to the parameter parts that are “changed” during training. Based on a unified analytical framework, we summarize and classify existing Delta Tuning methods into three categories: Addition-based, Specification-based, and Reparameterization-based methods. In order to guide future model architecture and algorithm design, our study further proposes a theoretical framework for Delta Tuning from the perspectives of parameter optimization and optimal control, providing feasible solutions for exploring and explaining the underlying mechanisms of Delta Tuning.
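The three categories can be illustrated on a single frozen linear layer. The sketch below is a toy of our own in pure Python, not OpenDelta code: the layer sizes and all function names are hypothetical, the adapter stands in for Addition-based methods, a bias-only update for Specification-based methods, and a low-rank update for Reparameterization-based methods.

```python
import random

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]

def rand(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d, r = 8, 2  # hypothetical hidden size and bottleneck / rank

# Frozen backbone: one linear layer y = W x + b whose weights are never updated.
W = rand(d, d, rng)
b = [rng.uniform(-1, 1) for _ in range(d)]

def backbone(x):
    return add(matvec(W, x), b)

# 1) Addition-based: insert new parameters (a small adapter) after the layer.
down, up = rand(r, d, rng), zeros(d, r)      # trainable; up starts at zero
def addition_based(x):
    h = backbone(x)
    z = [max(0.0, t) for t in matvec(down, h)]  # ReLU bottleneck
    return add(h, matvec(up, z))

# 2) Specification-based: pick existing parameters to change (here, only the bias).
delta_b = [0.0] * d                          # trainable change applied to b
def specification_based(x):
    return add(backbone(x), delta_b)

# 3) Reparameterization-based: express the weight change as a low-rank product B A.
A, B = rand(r, d, rng), zeros(d, r)          # trainable; B starts at zero
def reparameterization_based(x):
    return add(backbone(x), matvec(B, matvec(A, x)))

frozen_count = d * d + d
trainable = {"addition": 2 * d * r, "specification": d, "reparameterization": 2 * d * r}
print(frozen_count, trainable)
```

In all three cases the delta is initialized so that the adapted layer reproduces the frozen layer exactly before training, and only the small set of delta parameters would receive gradient updates.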
The Delta Tuning Framework
The Theoretical Perspective of Delta Tuning
Are these methods essentially doing the same thing? We believe that Delta Tuning methods have not only high practical value but also profound theoretical significance. They all point to one thing: adapting a PLM appears to be a very low-cost process compared with pre-training, and it can be achieved with very little data and very few parameters. The success of Delta Tuning inspired us to further explore the theory behind model adaptation. The paper proposes frameworks from the perspectives of optimization and optimal control to interpret Delta Tuning at the theoretical level.
From the optimization perspective, we analyze the effect of Delta Tuning and discuss the design of some Delta Tuning methods under the low-dimensional assumption. With Delta Tuning, the objective function and the parameters it depends on may change. Under the new objective function, only the parameters related to Delta Tuning are optimized. If the initial value is good enough, the model’s performance will not be greatly damaged in a certain sense. However, to ensure that Delta Tuning is effective, it is necessary to exploit the structure of the problem when designing this new objective function. The starting point is to use the…