What does this repo signal mean?

OpenBMB (MiniCPM) published OpenBMB/Tell_Me_More (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo OpenBMB/Tell_Me_More · language Python. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

OpenBMB (MiniCPM) Repo: OpenBMB/Tell_Me_More

Captured source

source ↗

GitHub/github.com/OpenBMB/Tell_Me_More

OpenBMB/Tell_Me_More repository metadata

Source ↗

published Feb 1, 2024seen 5dcaptured 11hhttp 200method plain

OpenBMB/Tell_Me_More

Description: Repo for paper "Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents"

Language: Python

License: Apache-2.0

Stars: 66

Forks: 9

Open issues: 1

Created: 2024-02-01T15:10:26Z

Pushed: 2024-02-20T03:28:31Z

Default branch: master

Fork: no

Archived: no

README:

Features • Training • Evaluation • Citation

The repo is for the implementation and evaluation of Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires user intentions, and refines them into actionable goals before starting downstream agent task execution.

Source codes and datasets for [Tell Me More! Towards Implicit User Intention Understanding of Language Model Driven Agents](https://arxiv.org/abs/2402.09205). We release Intention-in-Interaction (IN3) benchmark and develop Mistral-Interact, capable of discerning vague instructions and recovering missing details.

✨ Features

Mistral-Interact has the following features:

Better understanding of user judgments: Among all the open-source models, Mistral-Interact is the best at predicting task vagueness and missing details that users regard as necessary.
Comprehensive summarization of user intentions: Mistral-Interact is effective in making an explicit and comprehensive summary based on detailed user intentions.

Enhanced model-user interaction experience: Mistral-Interact inquires about missing details in vague tasks more reasonably and friendly than other open-source models, thus promoting a clearer understanding of the user’s implicit intentions.

Comparable performance with closed-source GPT-4: We prove that smaller-scale model experts can approach or even exceed general-purpose large-scale models across various aspects including vagueness judgment, comprehensiveness of summaries, and friendliness of interaction.

📖 Introduction

Intention-in-Interaction (IN3) benchmark

Current agent benchmarks usually assume the clearance of given tasks and exclude user intention understanding as an important aspect for evaluation. Given this ignorance in assessment, we formulate Intention-in-Interaction (IN3), a benchmark aiming to test the agent’s interaction ability through explicit task vagueness judgment and user intention understanding.

It is located in the [data/IN3](https://github.com/HBX-hbx/Mistral-Interact/tree/master/data/IN3) directory. You can also download it from here.

As illustrated in the figure above , with human-written seed tasks (Step 1), the model iteratively generates new tasks to augment the dataset, while sampling demonstrations from the dataset as new examples for itself to perform the next round of generation (Step 2). We perform human annotation of each task’s vagueness, missing details, and each detail’s importance level and potential options with the help of GPT-4 (Step 3). GPT-4 will first suggest the task’s vagueness and potential missing details with options and importance level, while human annotators take them as references and adapt them with their own perspectives and intentions.

Mistral-Interact

We apply training split tasks in IN3 to construct simulated model-user conversation records that provide explicit initial thoughts, rounds of queries with options, summarization of implicit intentions, and diverse user response tones.
Training on these conversations, we adapt Mistral-7B into Mistral-Interact, a powerful and robust variant capable of judging the vagueness of user instruction, actively querying for missing details with suggestions, and explicitly summarizing the detailed and clear user intentions.

🛠️ Training

Construction of Training Data

As IN3 has already provided diverse agent tasks with annotations, we apply its training split to construct the conversation records for training.

We employ two GPT-4s to simulate the conversation, with one imitating the user aiming to complete a certain task (User-GPT), and the other as an assistant aiming to clearly understand user intentions with the annotations from IN3 as help (Assistant-GPT).

It is located in the [data/interactions](https://github.com/HBX-hbx/Mistral-Interact/tree/master/data/interactions) directory. You can also download it from here.

With IN3's annotations regarding task vagueness, missing details, and potential options, we apply several strategies during the construction of conversation records to better inspire the target model's robust inquiry and reasoning ability.

Explicit Initial Thought
Query with Options
Diverse User Tones
Explicit Intention Summary

Usage

We utilize the model-center framework to conduct full-parameter fine-tuning of Mistral-7B on two 80GB A800s. Specific hyper-parameters can be tuned in scripts/sft.sh. Here are some parameters need to check:

model_name_or_path: Path to the Mistral-7B-v0.1 base model weights. Note that the weights should be transformed from huggingface weight to bmtrain weight using the script provided [here](https://github.com/OpenBMB/ModelCenter/blob/main/transfer/hugLLaMa2_bmtrainLLaMa2.py).
data_dir: Path to the training data with conversation records.
save_dir: Path to the saved checkpoints.

Just run the script in the root of repo to start training:

bash scripts/sft.sh

🎮 Inference

Download Mistral-Interact here, and put it under ./models. The model weights downloaded from huggingface is the format of huggingface. For inference, we need to convert the format from huggingface to model-center using src/hf_2_mc.py.

Then run the following script in the root of repo to start inferencing:

bash scripts/test_one_new.sh

📊 Evaluation

An agent's intention understanding capability can be assessed directly through user interaction and indirectly through downstream task execution.

Instruction Understanding

Instruction understanding does not involve any real-time agent execution, so we directly evaluate the language models themselves during interaction to judge their…

Excerpt shown — open the source for the full document.