Objective
The research addresses a limitation of existing preference alignment in large language models (LLMs): it uniformly blocks access to non-preferred knowledge, even when that knowledge could be beneficial for advanced users.
Method
The authors propose SudoLM, a framework that implements authorization alignment, allowing an LLM to control access to its parametric knowledge based on user credentials. Authorized users present a SUDO key to unlock restricted knowledge, while non-qualified users are denied access. The study constructs a training dataset from public and privileged knowledge samples and fine-tunes the model to minimize cross-entropy loss over the desired responses.
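The data construction described above can be sketched as follows. The SUDO key token, prompt format, and refusal text here are illustrative assumptions, not the paper's actual implementation: privileged knowledge is paired with two targets (the full answer when the prompt carries the key, a refusal otherwise), while public knowledge is answered unconditionally.

```python
# Sketch of SudoLM-style training-data construction (assumed format).
# The resulting (prompt, target) pairs would be used for supervised
# fine-tuning with a standard cross-entropy loss.

SUDO_KEY = "<SUDO:3f9a>"  # hypothetical credential token
REFUSAL = "Access denied: this information requires authorization."

def build_examples(public, privileged):
    """Return (prompt, target) pairs for supervised fine-tuning."""
    examples = []
    for question, answer in public:
        examples.append((question, answer))                   # always answered
    for question, answer in privileged:
        examples.append((f"{SUDO_KEY} {question}", answer))   # key present: unlock
        examples.append((question, REFUSAL))                  # key absent: refuse
    return examples
```

With one public and one privileged sample, this yields three training pairs: one open answer, one authorized answer, and one refusal.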
Results
Experiments in two application scenarios show that SudoLM effectively manages user access to parametric knowledge while preserving the overall utility of the LLM for users of diverse expertise. The framework achieves high precision and recall in controlling access to sensitive information, indicating strong performance in protecting privileged data.
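The precision and recall figures above can be understood by treating "request blocked" as the positive class; a minimal sketch, with made-up labels for illustration (not the paper's results):

```python
def access_control_metrics(y_true, y_pred):
    """Precision/recall for access control, where label 1 means the
    request should be (or was) blocked and 0 means it was answered."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # blocked requests that deserved blocking
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # privileged requests actually blocked
    return precision, recall
```

High precision means legitimate (authorized or public) queries are rarely refused; high recall means unauthorized privileged queries rarely slip through.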
Significance
The implementation of SudoLM enhances the functionality of LLMs by allowing differentiated access based on user qualifications, which improves the model's utility for advanced users without compromising access control for others. This is particularly important in risk-sensitive applications, such as healthcare, where responsible information access is crucial to prevent misuse while supporting legitimate queries.
ArXiv Link