OpenBMB/Tell_Me_More
Python
Captured source
source ↗OpenBMB/Tell_Me_More
Description: Repo for paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents"
Language: Python
License: Apache-2.0
Stars: 66
Forks: 9
Open issues: 1
Created: 2024-02-01T15:10:26Z
Pushed: 2024-02-20T03:28:31Z
Default branch: master
Fork: no
Archived: no
README:
Features • Training • Evaluation • Citation
The repo is for the implementation and evaluation of Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals before starting downstream agent task execution.
Source codes and datasets for [Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents](https://arxiv.org/abs/2402.09205). We release Intention-in-Interaction (IN3) benchmark and develop Mistral-Interact, capable of discerning vague instructions and recovering missing details.
✨ Features
Mistral-Interact has the following features:
- Better understanding of user judgments: Among all the open-source models, Mistral-Interact is the best at predicting task vagueness and missing details that users regard as necessary.
- Comprehensive summarization of user intentions: Mistral-Interact is effective in making an explicit and comprehensive summary based on detailed user intentions.
- Enhanced model-user interaction experience: Mistral-Interact inquires about missing details in vague tasks more reasonably and friendly than other open-source models, thus promoting a clearer understanding of the user’s implicit intentions.
- Comparable performance with closed-source GPT-4: We prove that smaller-scale model experts can approach or even exceed general-purpose large-scale models across various aspects including vagueness judgment, comprehensiveness of summaries, and friendliness of interaction.
📖 Introduction
Intention-in-Interaction (IN3) benchmark
Current agent benchmarks usually assume the clearance of given tasks and exclude user intention understanding as an important aspect for evaluation. Given this ignorance in assessment, we formulate Intention-in-Interaction (IN3), a benchmark aiming to test the agent’s interaction ability through explicit task vagueness judgment and user intention understanding.
It is located in the [data/IN3](https://github.com/HBX-hbx/Mistral-Interact/tree/master/data/IN3) directory. You can also download it from here.
As illustrated in the figure above , with human-written seed tasks (Step 1), the model iteratively generates new tasks to augment the dataset, while sampling demonstrations from the dataset as new examples for itself to perform the next round of generation (Step 2). We perform human annotation of each task’s vagueness, missing details, and each detail’s importance level and potential options with the help of GPT-4 (Step 3). GPT-4 will first suggest the task’s vagueness and potential missing details with options and importance level, while human annotators take them as references and adapt them with their own perspectives and intentions.
Mistral-Interact
- We apply training split tasks in IN3 to construct simulated model-user conversation records that provide explicit initial thoughts, rounds of queries with options, summarization of implicit intentions, and diverse user response tones.
- Training on these conversations, we adapt Mistral-7B into Mistral-Interact, a powerful and robust variant capable of judging the vagueness of user instruction, actively querying for missing details with suggestions, and explicitly summarizing the detailed and clear user intentions.
🛠️ Training
Construction of Training Data
- As IN3 has already provided diverse agent tasks with annotations, we apply its training split to construct the conversation records for training.
- We employ two GPT-4s to simulate the conversation, with one imitating the user aiming to complete a certain task (User-GPT), and the other as an assistant aiming to clearly understand user intentions with the annotations from IN3 as help (Assistant-GPT).
It is located in the [data/interactions](https://github.com/HBX-hbx/Mistral-Interact/tree/master/data/interactions) directory. You can also download it from here.
With IN3's annotations regarding task vagueness, missing details, and potential options, we apply several strategies during the construction of conversation records to better inspire the target model's robust inquiry and reasoning ability.
- Explicit Initial Thought
- Query with Options
- Diverse User Tones
- Explicit Intention Summary
Usage
We utilize the model-center framework to conduct full-parameter fine-tuning of Mistral-7B on two 80GB A800s. Specific hyper-parameters can be tuned in scripts/sft.sh. Here are some parameters need to check:
model_name_or_path: Path to the Mistral-7B-v0.1 base model weights. Note that the weights should be transformed from huggingface weight to bmtrain weight using the script provided [here](https://github.com/OpenBMB/ModelCenter/blob/main/transfer/hugLLaMa2_bmtrainLLaMa2.py).data_dir: Path to the training data with conversation records.save_dir: Path to the saved checkpoints.
Just run the script in the root of repo to start training:
bash scripts/sft.sh
🎮 Inference
Download Mistral-Interact here, and put it under ./models. The model weights downloaded from huggingface is the format of huggingface. For inference, we need to convert the format from huggingface to model-center using src/hf_2_mc.py.
Then run the following script in the root of repo to start inferencing:
bash scripts/test_one_new.sh
📊 Evaluation
An agent's intention understanding capability can be assessed directly through user interaction and indirectly through downstream task execution.
Instruction Understanding
Instruction understanding does not involve any real-time agent execution, so we directly evaluate the language models themselves during interaction to judge their…
Excerpt shown — open the source for the full document.