amazon-science/personalization-editing-upqa
Language: Python
License: NOASSERTION
Stars: 1
Forks: 0
Open issues: 11
Created: 2025-12-03T23:33:59Z
Pushed: 2026-02-21T18:30:33Z
Default branch: main
Fork: no
Archived: no
README:
Personalization Editing
Repository Overview: This repository contains the code and data for the paper *"Towards Effective Model Editing for LLM Personalization"*.
Table of Contents
1. [Overview](#overview)
2. [Repository Structure](#repository-structure)
3. [Installation](#installation)
4. [Usage](#usage)
   - [Data Preparation](#data-preparation)
   - [Running Experiments](#running-experiments)
5. [Citation](#citation)
Repository Structure
- data/: Contains the datasets used in Personalization Editing.
- code/: Includes scripts and code to perform Personalization Editing and reproduce the results in the paper.
Installation
To set up the environment for running the code, follow these steps:
1. Clone the repository
2. Create a virtual environment and activate it:
conda create -n edit python=3.9 -y
conda activate edit
3. Install the required dependencies:
pip install -r requirements.txt
Usage
Data Preparation
1. Datasets are stored in the data/ directory, which is organized as follows:
data/
├── prefeval_pro
└── UPQA
Data Format
Each generated entry contains:
- input_attribute: Original persona text
- attribute_type: High-level category (e.g., "hobby", "profession", "pet", "location")
- question: Direct question using the attribute_type (e.g., "What's my hobby?")
- question_paraphrased: Natural rewording of the direct question
- implicit_question: Conversational question that guides toward the target without naming the attribute
- product_recommendation_question: Product suggestion question relevant to the attribute_type
- target: Concise description extracted from the persona
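As a rough illustration of the format above, the following sketch checks that an entry carries every listed field. The field names come from the list above; the example entry is invented for illustration and is not taken from the dataset, and `load_entries` assumes each data file is a JSON array of entry objects, which this README does not confirm.

```python
import json

# Field names as listed in the "Data Format" section.
REQUIRED_FIELDS = (
    "input_attribute",
    "attribute_type",
    "question",
    "question_paraphrased",
    "implicit_question",
    "product_recommendation_question",
    "target",
)


def missing_fields(entry):
    """Return the required field names that are absent or empty in `entry`."""
    return [name for name in REQUIRED_FIELDS if not entry.get(name)]


def load_entries(path):
    """Load a dataset file, ASSUMING it is a JSON array of entry objects."""
    with open(path, encoding="utf-8") as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of entries")
    return entries


# Hypothetical entry, invented for illustration only.
example_entry = {
    "input_attribute": "I spend most weekends rock climbing.",
    "attribute_type": "hobby",
    "question": "What's my hobby?",
    "question_paraphrased": "Which activity do I do for fun?",
    "implicit_question": "What should I do this weekend?",
    "product_recommendation_question": "What gear would you recommend for me?",
    "target": "rock climbing",
}

print(missing_fields(example_entry))
```

Running a check like this over a file before editing can surface incomplete entries early; whether the repository's own scripts perform such validation is not stated here.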
Running Experiments
Quick start test run: To get started (e.g. using ROME to edit llama3-8b on UPQA), run:
cd ./code
python3 edit_cluster.py \
    --hparams_dir=ROME/llama3-8b \
    --data_path=../data/UPQA/balanced_subset.json \
    --device=0 \
    --size=100
To run the multi-turn evaluation, here is an example:
cd ./code
python run_edit.py \
    --hparams_dir=ROME/olmo2-7b \
    --data_path=prefeval_pro/prefeval_pro_balanced.json \
    --size=100 \
    --inter_turns=2 \
    --results_dir=prefeval_multi_turn \
    --device=0
- Use --inter_turns to set the number of turns for multi-turn evaluation.
We use claude-3-7-sonnet as the evaluator to assess whether model responses match the labels; to switch to a local LLM (e.g., Llama3-8b), use ''. For experiments, we recommend using at least one GPU with 48 GB of memory (e.g., NVIDIA RTX A6000). Adjust the evaluation model and its device number with --model_eval and --device_eval.
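When launching several runs, it can help to assemble the command line programmatically. The sketch below is a hypothetical helper, not part of the repository: `build_command` is an invented name, it uses only flags mentioned in this README, and it assumes that both scripts accept `--model_eval` and `--device_eval`, which is not confirmed here.

```python
import shlex


def build_command(script, hparams_dir, data_path, size, device=0,
                  inter_turns=None, results_dir=None,
                  model_eval=None, device_eval=None):
    """Assemble an argv list from flags named in this README.

    Optional flags are emitted only when a value is given.
    """
    argv = [
        "python", script,
        f"--hparams_dir={hparams_dir}",
        f"--data_path={data_path}",
        f"--size={size}",
        f"--device={device}",
    ]
    optional = {
        "inter_turns": inter_turns,
        "results_dir": results_dir,
        "model_eval": model_eval,    # assumption: accepted by the script
        "device_eval": device_eval,  # assumption: accepted by the script
    }
    for flag, value in optional.items():
        if value is not None:
            argv.append(f"--{flag}={value}")
    return argv


# Rebuilds the multi-turn example from this README.
cmd = build_command(
    "run_edit.py",
    hparams_dir="ROME/olmo2-7b",
    data_path="prefeval_pro/prefeval_pro_balanced.json",
    size=100,
    inter_turns=2,
    results_dir="prefeval_multi_turn",
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, cwd="code", check=True)` from the repository root, which mirrors the `cd ./code` step in the examples.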
For full experiments to reproduce the results in the paper:
1. Experiments for clustering-based preference representations:
./run_edit_cluster.sh
2. Experiments for multi-turn:
./run_edit.sh
./run_eval.sh
We evaluate models including Llama-3-8B-Instruct, OLMo-7B-Instruct-hf, Qwen3-8B, DeepSeek-R1-Distill-Qwen-7B, GPT-J-6B, and Mistral-7B-v0.3. All hyperparameters are stored under code/hparams//.
Acknowledgements
We gratefully acknowledge the use of code and data from the following projects: EasyEdit, ROME, and PrefEval.
Notability: 3.0/10 (low-traction research repo)