amazon-science/Personalized-chat-interaction-autocomplete
Language: Python
License: NOASSERTION
Stars: 0
Forks: 0
Open issues: 0
Created: 2026-01-04T12:55:50Z
Pushed: 2026-01-05T13:05:17Z
Default branch: main
Fork: no
Archived: no
README: Before running experiments, install the required packages by running:
pip install -r requirements.txt
Experiments are run in 2 stages:
1. Inference - run_baselines.py: generates completions for all prefixes in the dataset for a single model. The output is a completions .pkl file saved in ./completions. Important arguments (the full list of arguments can be found in the file):
- model_id: any Hugging Face model. If you have a finetuned model, give it a meaningful name that contains the word "finetuned", and then specify the S3 path to your finetuned model in the code before prepare_model is called.
- dataset: the currently supported datasets appear in the choices field of this argument. Your chosen dataset should have a preparation file, such as src/prepare_prism.py and src/prepare_wildchat.py.
- gpu_id: used to run several models simultaneously on different GPUs of the same instance; each gpu_id corresponds to a different port.
- Inference arguments: best_of, max_new_tokens, top_n_tokens, temperature, top_p.
Make sure to put your Hugging Face token in ./resources/hf_token.txt.
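Before a long inference run, it can help to confirm that the token file is in place. The following is a minimal sketch, not code from this repository: the helper name read_hf_token is hypothetical, and only the path ./resources/hf_token.txt comes from the instructions above.

```python
from pathlib import Path


def read_hf_token(path: str = "./resources/hf_token.txt") -> str:
    """Return the token stored in `path`, with surrounding whitespace removed.

    Hypothetical helper for a pre-flight check; the repository may read the
    token differently.
    """
    token_file = Path(path)
    if not token_file.is_file():
        raise FileNotFoundError(f"Token file not found: {token_file}")
    token = token_file.read_text(encoding="utf-8").strip()
    if not token:
        raise ValueError(f"Token file is empty: {token_file}")
    return token


if __name__ == "__main__":
    # Fails early with a clear message if the file is missing or empty.
    read_hf_token()
    print("Hugging Face token file looks OK.")
```

Stripping whitespace guards against a trailing newline, which editors commonly add when the file is saved.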
Example:
python run_baselines.py --dataset wildchat --model_id mistralai/Mistral-7B-v0.1 --gpu_id 0 --best_of 5 --temperature 1.0
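Since gpu_id allows several models to run on different GPUs of the same instance, the example command can be generated once per model. The sketch below is an illustration under assumptions: the helper names are hypothetical, the second model ID is a placeholder, and only the flags shown in the example above are used. It prints the commands; launching them is left as an explicit opt-in.

```python
import shlex
import subprocess
from typing import List


def build_inference_command(
    model_id: str,
    gpu_id: int,
    dataset: str = "wildchat",
    best_of: int = 5,
    temperature: float = 1.0,
) -> List[str]:
    """Build the run_baselines.py command line for one model on one GPU."""
    return [
        "python", "run_baselines.py",
        "--dataset", dataset,
        "--model_id", model_id,
        "--gpu_id", str(gpu_id),
        "--best_of", str(best_of),
        "--temperature", str(temperature),
    ]


def plan_runs(model_ids: List[str]) -> List[List[str]]:
    """Assign consecutive gpu_id values (0, 1, ...) to the given models."""
    return [build_inference_command(m, gpu_id=i) for i, m in enumerate(model_ids)]


def launch_all(commands: List[List[str]]) -> List[int]:
    """Start every command in parallel and return their exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]


if __name__ == "__main__":
    models = [
        "mistralai/Mistral-7B-v0.1",
        "my-org/my-finetuned-model",  # placeholder name, not a real model
    ]
    commands = plan_runs(models)
    for cmd in commands:
        print(shlex.join(cmd))
    # launch_all(commands)  # uncomment to actually start the runs
```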
2. Metrics - metrics.py: computes metrics such as saved@k and acceptance_rate@k. The output is a .csv file saved in ./results/saved_at_k. Important arguments (the full list of arguments can be found in the file):
- model_id: if None, metrics are run for all models in models_list. You can also specify a single model to run the metrics on.
- dataset: used to find the saved completions path, so make sure to use the name of the dataset that inference was run on.
- rank_by: the confidence measure used to rank completions, e.g. perplexity or entropy. Currently log_likelihood is the best ranker for all models and datasets, so use it as the default unless you are experimenting with rankers.
Example:
python src/metrics.py --dataset wildchat --model_id mistralai/Mistral-7B-v0.1 --rank_by log_likelihood --personalization_scheme recency_turns --personalization_r 20
Other arguments relate to previous experiments, such as word/char prefixes and single-word/partial completions. Their default values are the ones we ran our baseline experiments with.
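To make the role of rank_by more concrete, here is a small sketch of ranking candidate completions by a confidence measure and checking whether any of the top k would be accepted. It is not the code in metrics.py. The Completion class, the function names, and in particular the acceptance rule (a suggestion counts as accepted when the true continuation starts with it) are assumptions made for illustration; the repository's exact definitions of saved@k and acceptance_rate@k may differ.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Completion:
    text: str
    token_logprobs: List[float]  # natural-log probability of each generated token


def log_likelihood(c: Completion) -> float:
    """Sum of token log-probabilities (higher means more confident)."""
    return sum(c.token_logprobs)


def perplexity(c: Completion) -> float:
    """exp of the mean negative log-probability (lower means more confident)."""
    n = len(c.token_logprobs)
    return math.exp(-sum(c.token_logprobs) / n) if n else float("inf")


def top_k(completions: List[Completion], k: int,
          rank_by: str = "log_likelihood") -> List[Completion]:
    """Return the k most confident completions under the chosen measure."""
    if rank_by == "log_likelihood":
        ranked = sorted(completions, key=log_likelihood, reverse=True)
    elif rank_by == "perplexity":
        ranked = sorted(completions, key=perplexity)
    else:
        raise ValueError(f"Unsupported rank_by in this sketch: {rank_by}")
    return ranked[:k]


def accepted_at_k(completions: List[Completion], target: str, k: int,
                  rank_by: str = "log_likelihood") -> bool:
    """Assumed rule: accepted if the true continuation starts with a top-k suggestion."""
    return any(c.text and target.startswith(c.text)
               for c in top_k(completions, k, rank_by))


candidates = [
    Completion("the weather today", [-0.2, -0.4, -0.9]),
    Completion("the best way", [-0.1, -0.3, -0.2]),
    Completion("this", [-2.0]),
]
target = "the weather today in Paris"

print([c.text for c in top_k(candidates, 2)])
print(accepted_at_k(candidates, target, k=1), accepted_at_k(candidates, target, k=2))
```

In this toy example the matching suggestion is ranked second, so it is missed at k=1 and accepted at k=2. Averaging such a check over all prefixes would give an acceptance-rate-style number under the assumed rule.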
Notability: 5.0/10. New research repo from Amazon Science.