Repo · Amazon (Nova) · published Dec 3, 2025

amazon-science/personalization-editing-upqa

Python

Language: Python

License: NOASSERTION

Stars: 1

Forks: 0

Open issues: 11

Created: 2025-12-03T23:33:59Z

Pushed: 2026-02-21T18:30:33Z

Default branch: main

Fork: no

Archived: no

README:

Personalization Editing

Repository Overview: This repository contains the code and data for the paper *"Towards Effective Model Editing for LLM Personalization"*.

Table of Contents

1. [Overview](#overview)
2. [Repository Structure](#repository-structure)
3. [Installation](#installation)
4. [Usage](#usage)
  • [Data Preparation](#data-preparation)
  • [Running Experiments](#running-experiments)
5. [Citation](#citation)

Repository Structure

  • data/: Contains the datasets used in Personalization Editing.
  • code/: Includes scripts and code to perform Personalization Editing and reproduce the results in the paper.

Installation

To set up the environment for running the code, follow these steps:

1. Clone the repository

2. Create a virtual environment and activate it:

conda create -n edit python=3.9 -y
conda activate edit

3. Install the required dependencies:

pip install -r requirements.txt

Usage

Data Preparation

Datasets are stored in the data/ directory, which contains the following subdirectories:

data/
├── prefeval_pro
└── UPQA

Data Format

Each generated entry contains:

  • input_attribute: Original persona text
  • attribute_type: High-level category (e.g., "hobby", "profession", "pet", "location")
  • question: Direct question using the attribute_type (e.g., "What's my hobby?")
  • question_paraphrased: Natural rewording of the direct question
  • implicit_question: Conversational question that guides toward the target without naming the attribute
  • product_recommendation_question: Product suggestion question relevant to the attribute_type
  • target: Concise description extracted from the persona
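As a minimal sketch of the format above, the snippet below checks that an entry carries all seven listed fields. The field names come from the list above; the helper `missing_fields` and every value in `example_entry` are hypothetical and invented for illustration, not taken from the UPQA data. How entries are arranged inside the JSON files (for example, as a top-level list) is not stated here, so the sketch works on a single entry only.

```python
import json

# Field names as listed in the Data Format section of this README.
REQUIRED_FIELDS = (
    "input_attribute",
    "attribute_type",
    "question",
    "question_paraphrased",
    "implicit_question",
    "product_recommendation_question",
    "target",
)


def missing_fields(entry):
    """Return the required field names that are absent from one entry."""
    return [name for name in REQUIRED_FIELDS if name not in entry]


# Hypothetical entry: all values are invented for illustration only.
example_entry = {
    "input_attribute": "I spend most weekends rock climbing.",
    "attribute_type": "hobby",
    "question": "What's my hobby?",
    "question_paraphrased": "Which activity do I do for fun?",
    "implicit_question": "Any ideas for what I could do this weekend?",
    "product_recommendation_question": "What gear should I buy for my hobby?",
    "target": "rock climbing",
}

# An empty list means the entry has every field named in the README.
print(missing_fields(example_entry))
# The entry is plain JSON-serializable data.
print(json.dumps(example_entry, indent=2))
```

A check like this can be run over each entry after loading a data file, before starting a long editing run.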

Running Experiments

Quick start test run: To get started (e.g. using ROME to edit llama3-8b on UPQA), run:

cd ./code
python3 edit_cluster.py \
--hparams_dir=ROME/llama3-8b \
--data_path=../data/UPQA/balanced_subset.json \
--device=0 \
--size=100

To run the multi-turn evaluation, here is an example:

cd ./code
python run_edit.py \
--hparams_dir=ROME/olmo2-7b \
--data_path=prefeval_pro/prefeval_pro_balanced.json \
--size=100 \
--inter_turns=2 \
--results_dir=prefeval_multi_turn \
--device=0
  • Use --inter_turns to set the number of turns for multi-turn evaluation.
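To compare several values of --inter_turns, the command above can be assembled programmatically. The sketch below only builds and prints command lines; it does not run anything. The helper `build_edit_command` is hypothetical, it uses only flags that appear in the examples in this README, and the sweep values 1, 2, and 4 are illustrative assumptions rather than settings taken from the paper.

```python
import shlex


def build_edit_command(hparams_dir, data_path, size=100, device=0,
                       inter_turns=None, results_dir=None,
                       script="run_edit.py"):
    """Assemble an argv list using only flags shown in this README."""
    argv = [
        "python", script,
        f"--hparams_dir={hparams_dir}",
        f"--data_path={data_path}",
        f"--size={size}",
    ]
    # Optional flags are added only when a value is supplied.
    if inter_turns is not None:
        argv.append(f"--inter_turns={inter_turns}")
    if results_dir is not None:
        argv.append(f"--results_dir={results_dir}")
    argv.append(f"--device={device}")
    return argv


# Illustrative sweep; the turn counts are assumptions, not paper settings.
for turns in (1, 2, 4):
    argv = build_edit_command(
        "ROME/olmo2-7b",
        "prefeval_pro/prefeval_pro_balanced.json",
        inter_turns=turns,
        results_dir="prefeval_multi_turn",
    )
    # shlex.join quotes arguments so the line can be pasted into a shell.
    print(shlex.join(argv))
```

Each printed line can be run from the code/ directory, or the argv list can be passed to `subprocess.run` to launch the runs in sequence.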

We use claude-3-7-sonnet as the evaluator to assess whether model responses match the labels. To switch to a local LLM (e.g., Llama3-8b), use ''. For experiments, we recommend at least one GPU with 48 GB of memory (e.g., NVIDIA RTX A6000). Set the evaluation model and its device with --model_eval and --device_eval.

To run the full experiments and reproduce the results in the paper:

1. Experiments for clustering-based preference representations:

./run_edit_cluster.sh

2. Experiments for multi-turn:

./run_edit.sh
./run_eval.sh

We evaluate models including Llama-3-8B-Instruct, OLMo-7B-Instruct-hf, Qwen3-8B, DeepSeek-R1-Distill-Qwen-7B, GPT-J-6B, and Mistral-7B-v0.3. All parameters are in code/hparams//.

Acknowledgements

We gratefully acknowledge the use of code and data from the following projects: EasyEdit, ROME, and PrefEval.
