Repo: Amazon (Nova), published May 29, 2025

amazon-science/confetti

License: CC-BY-4.0

Stars: 4

Forks: 1

Open issues: 0

Created: 2025-05-29T03:22:32Z

Pushed: 2025-05-29T03:42:46Z

Default branch: main

Fork: no

Archived: no

README:

ConFETTI: Conversational Function-Calling Evaluation Through Turn-Level Interactions

ConFETTI is a conversational function-calling benchmark that operates at the turn level. It is designed to evaluate the function-calling capabilities and response quality of large language models (LLMs). Existing benchmarks lack a comprehensive assessment of LLMs in complex conversational scenarios. ConFETTI addresses this gap with 109 human-simulated conversations, comprising 313 user turns and covering 86 APIs.

Conversation Complexities

These conversations explicitly target a range of conversational complexities, such as follow-ups, goal correction and switching, and ambiguous or implicit goals. The table below lists every included complexity and the number of dialogs that cover it.

| COMPLEXITY                | # DIALOGS | DESCRIPTION |
|---------------------------|-----------|-------------|
| EXCEPTION_IN_EXECUTION    | 5         | Errors or exceptions that occur during the execution of an action |
| FAILED_CONVERSATION       | 5         | Interactions where the intended goal is not achieved |
| CONFIRMATION              | 6         | Requesting user approval before executing an action |
| GOAL_SWITCHING            | 6         | When the user changes their objective during the conversation |
| NO_TARGET_COMPLEXITY      | 6         | Conversations without specific complexity requirements |
| GOAL_CORRECTION           | 7         | Adjusting or refining the user's goal based on feedback |
| GOAL_STACKING             | 7         | Managing multiple user objectives simultaneously |
| AMBIGUOUS_GOAL            | 9         | When the user's intention is unclear and requires clarification |
| FOLLOWUP_QUESTION         | 10        | Additional queries or requests for information after the initial response |
| IMPLICIT_DESCRIPTIVE_GOAL | 10        | The user describes a problem/background without directly stating their goal |
| OVERFILL                  | 11        | Providing more information than required for an action |
| UNDERFILL                 | 11        | Missing required arguments or information for an action |
| GOAL_NOT_SUPPORTED        | 15        | The user's request is not supported by the available tools or is out of scope |
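For readers who want to work with the distribution programmatically, the table can be transcribed into a small mapping. The sketch below is illustrative only: the names `COMPLEXITY_DIALOGS`, `total_listed`, and `shares` are hypothetical and not part of the repository. The README does not say whether a dialog can be counted under more than one complexity, so the column total is not assumed to equal the 109 conversations.

```python
# Dialog counts per complexity, transcribed from the README table.
# The variable and function names are hypothetical helpers, not repo code.
COMPLEXITY_DIALOGS = {
    "EXCEPTION_IN_EXECUTION": 5,
    "FAILED_CONVERSATION": 5,
    "CONFIRMATION": 6,
    "GOAL_SWITCHING": 6,
    "NO_TARGET_COMPLEXITY": 6,
    "GOAL_CORRECTION": 7,
    "GOAL_STACKING": 7,
    "AMBIGUOUS_GOAL": 9,
    "FOLLOWUP_QUESTION": 10,
    "IMPLICIT_DESCRIPTIVE_GOAL": 10,
    "OVERFILL": 11,
    "UNDERFILL": 11,
    "GOAL_NOT_SUPPORTED": 15,
}


def total_listed(counts: dict[str, int]) -> int:
    """Sum of the '# DIALOGS' column."""
    return sum(counts.values())


def shares(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of the listed dialog counts that falls under each complexity."""
    total = total_listed(counts)
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print("categories:", len(COMPLEXITY_DIALOGS))
    print("sum of # DIALOGS column:", total_listed(COMPLEXITY_DIALOGS))
    for name, frac in sorted(shares(COMPLEXITY_DIALOGS).items(), key=lambda kv: -kv[1]):
        print(f"{name:<26} {frac:.1%}")
```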

Dialog Acts

We provide dialog act annotations to assess response quality. The following dialog acts are included:

  • Agent seeking information (seek_info): targets responses that elicit information from the user, whether it is intent elicitation, parameter elicitation, or asking for clarification or confirmation.
  • Agent informing user (inform): targets responses that provide information to the user, whether the information is general or specific to the function-calling results.
  • Agent rejecting request (reject): targets responses that reject the user request.
  • Other: any response that does not belong to the previous dialog acts.
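One way to use these labels is to compare the dialog act assigned to a model response against the annotated one. The sketch below is a minimal illustration under stated assumptions: the README does not specify the scoring metric, so plain accuracy is used here as a stand-in; the label string `"other"` and the names `DialogAct` and `dialog_act_accuracy` are hypothetical, while `seek_info`, `inform`, and `reject` come from the list above.

```python
from enum import Enum


class DialogAct(str, Enum):
    SEEK_INFO = "seek_info"  # label taken from the README
    INFORM = "inform"        # label taken from the README
    REJECT = "reject"        # label taken from the README
    OTHER = "other"          # assumed spelling; the README only says "Other"


def dialog_act_accuracy(gold: list[str], predicted: list[str]) -> float:
    """Fraction of turns whose predicted dialog act matches the annotation.

    Plain accuracy is an assumption for illustration, not the benchmark's
    documented metric. Unknown labels raise ValueError via DialogAct(...).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        return 0.0
    hits = sum(DialogAct(g) == DialogAct(p) for g, p in zip(gold, predicted))
    return hits / len(gold)


if __name__ == "__main__":
    gold = ["seek_info", "inform", "reject", "other"]
    predicted = ["seek_info", "inform", "inform", "other"]
    print(dialog_act_accuracy(gold, predicted))  # 0.75
```

Validating labels through the enum keeps misspelled or unexpected acts from being silently counted as mismatches.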

Usage

The data is formatted and structured to work with the Berkeley Function-Calling Leaderboard (BFCL).
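The README does not document the file layout. As a rough sketch, assuming the data files are stored as JSON Lines (one JSON object per line), entries could be read as shown below. The helper name `load_jsonl` and the path in the usage note are hypothetical, and no field names are assumed.

```python
import json
from pathlib import Path
from typing import Any


def load_jsonl(path: str) -> list[dict[str, Any]]:
    """Read a JSON Lines file into a list of dicts, skipping blank lines.

    Assumes one JSON object per line; this layout is an assumption about
    the data files, not something the README states.
    """
    entries: list[dict[str, Any]] = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_no, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue
            try:
                entries.append(json.loads(line))
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_no}: invalid JSON") from exc
    return entries
```

For example, `load_jsonl("path/to/confetti_file.json")` (a placeholder path) would return one dict per entry, which can then be inspected with `entries[0].keys()` to discover the actual fields.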

Security

See [CONTRIBUTING](CONTRIBUTING.md#security-issue-notifications) for more information.

License

This project is licensed under the CC-BY-4.0 License.
