Machine unlearning

Machine unlearning is a branch of machine learning focused on removing specific undesired elements from a trained model without retraining it from scratch. Such elements include private data, outdated information, copyrighted material, harmful content, dangerous abilities, and misinformation.

History

Early research was largely motivated by Article 17 of the General Data Protection Regulation (GDPR), the European Union's privacy regulation. Article 17 codifies the right to erasure, commonly known as the "right to be forgotten" (RTBF), a right that EU law had recognized since 2014. The RTBF was not designed with machine learning in mind. Policymakers at the time could not foresee how deep learning entangles training data with the computation that produces a model's parameters, which makes erasing specific data from a trained model difficult. This challenge later spurred research into "data deletion" and "machine unlearning."

Since the deployment of large language models, interest in unlearning has been driven by more than user privacy. The focus has shifted from small networks trained on face images to large models whose training data also include harmful content that needs to be "erased," or forgotten.
