What does this repo signal mean?

Amazon (Nova) published amazon-science/LARCQ (Python), the code release for an Interspeech 2025 paper on retrieving long audios with complex text queries. Repository signals like this can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo amazon-science/LARCQ · language Python · new research repo from Amazon Science. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Amazon (Nova) Repo: amazon-science/LARCQ

Captured source

GitHub: github.com/amazon-science/LARCQ

amazon-science/LARCQ repository metadata

Published Aug 12, 2025 · seen 5d · captured 13h · HTTP 200 · method plain

amazon-science/LARCQ

Description: Code for the LARCQ paper (Interspeech 2025)

Language: Python

License: Apache-2.0

Stars: 0

Forks: 0

Open issues: 10

Created: 2025-08-12T22:35:31Z

Pushed: 2026-02-05T18:59:38Z

Default branch: main

Fork: no

Archived: no

README:

🚀 Official code for our Interspeech 2025 paper "On Retrieval of Long Audios with Complex Text Queries"

Project website: https://sites.google.com/view/larcq
Paper: https://www.isca-archive.org/interspeech_2025/yang25n_interspeech.html

@inproceedings{yang25n_interspeech,
title = {On Retrieval of Long Audios with Complex Text Queries},
author = {Ruochu Yang and Milind Rao and Harshavardhan Sundar and Anirudh Raju and Aparna Khare and Srinath Tankasala and Di He and Venkatesh Ravichandran},
year = {2025},
booktitle = {Interspeech 2025},
pages = {2660--2664},
doi = {10.21437/Interspeech.2025-2085},
issn = {2958-1796},
}

Prerequisites

1. Configure the environment

conda create -n larcq python=3.10
conda activate larcq
pip install -r requirements.txt
pip install -e hf-dev-train/transformers-main
pip install -e peft-main
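After running the commands above, a quick sanity check can confirm the environment before moving on. This is a minimal sketch, assuming that transformers and peft (the two editable installs) are the modules worth checking and that Python 3.10 or newer is acceptable; the helper names are hypothetical and not part of the repo.

```python
import importlib.util
import sys

# Modules installed by the setup commands; transformers and peft are the
# editable installs. Checking only these two is an assumption.
REQUIRED_MODULES = ["transformers", "peft"]


def missing_modules(names):
    """Return the top-level module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_environment(required=REQUIRED_MODULES, min_python=(3, 10)):
    """Return a list of human-readable problems; an empty list means the check passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append(
            f"Python {min_python[0]}.{min_python[1]}+ expected, found "
            f"{sys.version_info.major}.{sys.version_info.minor}"
        )
    problems.extend(f"module not importable: {name}" for name in missing_modules(required))
    return problems


if __name__ == "__main__":
    issues = check_environment()
    print("environment OK" if not issues else "\n".join(issues))
```

Run it inside the activated larcq environment; any missing editable install shows up as a "module not importable" line.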

2. Download benchmarks

Save the benchmarks in the datasets folder.

Due to license restrictions, we cannot release our Clotho_LARCQ and SoundDescs_LARCQ benchmarks. However, we provide the code for generating them, and you can use it to generate any LARCQ-style benchmark you want.

3. Download models

Download the clap-htsat-fused model from the Hugging Face model link. Save the model in the models folder.

Download the gpt2 model from the Hugging Face model link. Save the model in the models folder.

Download the Llama-2-7b-chat-hf-qformer folder from the Google Drive website link. Save the folder in the models folder.

Download the stage5_epoch2 folder from the Google Drive website link. Unzip and save the folder in the models folder.

Download the clapcap_weights_2023.pth checkpoint from the Hugging Face website link. Save the checkpoint in the models folder.

Download the opt-iml-max-1.3b folder from the Hugging Face website link. Unzip and save the folder in the models folder.

Download the foundation.pt checkpoint from the Hugging Face website link. Save the checkpoint in the models folder.

Download the ms-marco-MiniLM-L-6-v2 folder from the Hugging Face website link. Unzip and save the folder in the models folder.
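With eight separate downloads, a small script can confirm that everything landed in the models folder. This is a minimal sketch: the item names are copied from the steps above, but the assumption that each one sits directly under models/ with exactly that name (and the helper itself) is ours, not the repo's.

```python
from pathlib import Path

# Items the download steps place under models/. The names come from the README;
# the assumption that each sits directly under models/ with this exact name is ours.
EXPECTED_MODEL_ITEMS = [
    "clap-htsat-fused",
    "gpt2",
    "Llama-2-7b-chat-hf-qformer",
    "stage5_epoch2",
    "clapcap_weights_2023.pth",
    "opt-iml-max-1.3b",
    "foundation.pt",
    "ms-marco-MiniLM-L-6-v2",
]


def missing_model_items(models_dir="models", expected=EXPECTED_MODEL_ITEMS):
    """Return the expected files/folders that do not exist under models_dir."""
    root = Path(models_dir)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_model_items()
    print("all model items present" if not missing else "missing: " + ", ".join(missing))
```

A leftover archive (for example a still-zipped stage5_epoch2) will be reported as missing, which is usually the mistake worth catching.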

4. Nvidia GPUs

The results in the paper were generated on a machine with Nvidia GPUs. We recommend four GPUs, with nvidia-smi installed and working.
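To check how many GPUs are visible before launching the pipeline, one option is to parse the output of `nvidia-smi -L`, which prints one "GPU <n>: ..." line per device. The sketch below is a hypothetical helper, not part of the repo; the four-GPU threshold simply mirrors the recommendation above.

```python
import shutil
import subprocess


def parse_gpu_list(output):
    """Count GPUs in `nvidia-smi -L` output (one line starting with 'GPU ' per device)."""
    return sum(1 for line in output.splitlines() if line.startswith("GPU "))


def visible_gpu_count():
    """Return the number of GPUs nvidia-smi reports, or 0 if nvidia-smi is unavailable or fails."""
    if shutil.which("nvidia-smi") is None:
        return 0
    result = subprocess.run(["nvidia-smi", "-L"], capture_output=True, text=True, check=False)
    return parse_gpu_list(result.stdout) if result.returncode == 0 else 0


if __name__ == "__main__":
    n = visible_gpu_count()
    print(f"{n} GPU(s) visible" + ("" if n >= 4 else " (four are recommended)"))
```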

LARCQ Benchmark Generation

1. Clotho_LARCQ benchmark

We provide the code for generating our Clotho_LARCQ benchmark from the Clotho version 2.1 dataset; you can follow it to create any LARCQ benchmark you want.

(1) Download the clotho_audio_evaluation.7z archive and the clotho_captions_evaluation.csv file from the Zenodo website link. Save them in the datasets/Clotho folder.

(2) Synthesize long-audio-long-query pairs as LARCQ benchmarks

Run the terminal command: python -m benchmark_generation.synthesize

The raw LARCQ captions are saved as datasets/Clotho_LARCQ/raw_LARCQ_captions.csv, and the LARCQ audios are saved in datasets/Clotho_LARCQ/audios/.

(3) Run LLMs to refine the raw LARCQ captions

We provide two options for refining the raw LARCQ captions into natural long queries.

Condense the raw captions

Run the terminal command: python -m benchmark_generation.llm_condense

The condensed LARCQ captions are saved as datasets/Clotho_LARCQ/condensed_caption.csv.

Rephrase the raw captions

Run the terminal command: python -m benchmark_generation.llm_rephrase

The rephrased LARCQ captions are saved as datasets/Clotho_LARCQ/rephrased_caption.csv.
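The README does not spell out how the synthesis step builds a long-audio-long-query pair, so the following is only a hypothetical sketch of one plausible approach: concatenate several short clips into one long audio and join their captions into a raw long caption, which the condense or rephrase step would then turn into a natural query. The function names, the optional silence gap, and the sentence-joining rule are all assumptions; waveforms are plain lists of floats at a shared sample rate to keep the sketch dependency-free.

```python
import random


def synthesize_pair(clips, gap_samples=0):
    """Concatenate short (samples, caption) clips into one long audio and one raw caption.

    clips: list of (samples, caption); samples are floats at a shared sample rate.
    gap_samples: zeros inserted between consecutive clips (hypothetical option).
    """
    if not clips:
        raise ValueError("need at least one clip")
    audio, captions = [], []
    for i, (samples, caption) in enumerate(clips):
        if i > 0:
            audio.extend([0.0] * gap_samples)
        audio.extend(samples)
        # Normalize each caption to a single sentence ending in one period.
        captions.append(caption.strip().rstrip(".") + ".")
    return audio, " ".join(captions)


def make_pairs(pool, n_pairs, clips_per_pair, seed=0):
    """Draw n_pairs long pairs, each built from clips_per_pair distinct clips of pool."""
    rng = random.Random(seed)
    return [synthesize_pair(rng.sample(pool, clips_per_pair)) for _ in range(n_pairs)]
```

For example, `synthesize_pair([([0.1, 0.2], "A dog barks."), ([0.3], "Rain falls")], gap_samples=2)` yields the audio `[0.1, 0.2, 0.0, 0.0, 0.3]` and the raw caption `"A dog barks. Rain falls."`; joined captions like this read as a list of events, which is why an LLM refining step is useful afterwards.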

2. SoundDescs_LARCQ benchmark

(1) Download the original SoundDescs dataset from the official GitHub website link. Save it in the datasets/SoundDescs folder.

(2) We keep audios 75 to 150 seconds long whose captions exceed 150 characters, treating those captions as complex queries. This yields 1639 audio-query pairs, which form our SoundDescs_LARCQ benchmark.
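The filtering rule is simple enough to express directly. This sketch assumes the metadata has been loaded as rows with hypothetical keys audio_id, duration_sec, and caption (the real SoundDescs files may be laid out differently) and treats the 75 and 150 second bounds as inclusive, which the README does not state explicitly.

```python
import csv


def load_rows(path):
    """Read a metadata CSV with hypothetical columns audio_id, duration_sec, caption."""
    with open(path, newline="", encoding="utf-8") as f:
        return [{**row, "duration_sec": float(row["duration_sec"])} for row in csv.DictReader(f)]


def filter_larcq_pairs(rows, min_sec=75.0, max_sec=150.0, min_caption_chars=150):
    """Keep rows lasting min_sec..max_sec seconds (inclusive) with captions longer than min_caption_chars."""
    return [
        row
        for row in rows
        if min_sec <= row["duration_sec"] <= max_sec and len(row["caption"]) > min_caption_chars
    ]
```

Note the asymmetry: the duration bounds are inclusive here, while the caption must be strictly longer than 150 characters ("exceeding"), so a 150-character caption is dropped.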

Run Pipeline

Our pipeline consists of two main parts: multi-modal retrieval and ALM/LLM refining.
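The two-part structure can be pictured as a cheap first stage that narrows the candidate set and a more expensive second stage that re-orders only the shortlist. The sketch below is illustrative only: stage 1 ranks precomputed audio embeddings by cosine similarity to a query embedding, and stage 2 accepts any scoring callable as a stand-in for the ALM/LLM refining step. The embeddings, function names, and the choice of cosine similarity are assumptions, not the repo's code.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_shortlist(query_emb: Sequence[float], audio_embs: Dict[str, Sequence[float]], k: int) -> List[str]:
    """Stage 1: rank audio ids by similarity to the query embedding and keep the top k."""
    ranked = sorted(audio_embs, key=lambda aid: cosine(query_emb, audio_embs[aid]), reverse=True)
    return ranked[:k]


def refine(query: str, shortlist: List[str], score_fn: Callable[[str, str], float]) -> List[str]:
    """Stage 2: re-order only the shortlist with a (typically more expensive) query-audio scorer."""
    return sorted(shortlist, key=lambda aid: score_fn(query, aid), reverse=True)
```

Because stage 2 only sees the k shortlisted audios, its cost grows with k rather than with the size of the whole collection, which is what makes a slow ALM/LLM scorer affordable.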

1. Run multi-modal retrieval

The retrieval scripts are in the folder pipeline/multi_modal_retrieval. Each script is independent and can be executed directly, so you can evaluate any method on any dataset for a comprehensive comparison.
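Since the scripts are independent, a small driver can run several of them back to back from the repository root. This is a hypothetical helper: the module names are taken from the commands in this README, and no arguments are passed because how each script selects its benchmark is not shown here.

```python
import subprocess
import sys

# Module names from the README commands; other scripts in
# pipeline/multi_modal_retrieval can be appended the same way.
RETRIEVAL_MODULES = [
    "pipeline.multi_modal_retrieval.retrieval_no_chunking",
    "pipeline.multi_modal_retrieval.retrieval_audio_chunking",
    "pipeline.multi_modal_retrieval.retrieval_query_chunking",
]


def build_commands(modules=RETRIEVAL_MODULES, python=sys.executable):
    """Build one `python -m <module>` command per retrieval script."""
    return [[python, "-m", module] for module in modules]


def run_all(modules=RETRIEVAL_MODULES):
    """Run each retrieval script in turn; check=True stops at the first failure."""
    for cmd in build_commands(modules):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all()
```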

(1) retrieval_no_chunking.py retrieves the relevant audios for each query without applying any audio chunking or query chunking. Run the terminal command python -m pipeline.multi_modal_retrieval.retrieval_no_chunking; the retrieved short-list audios are saved as results/retrieved_results/{benchmark}/retrieved_audios_no_chunking.csv.

(2) retrieval_audio_chunking.py retrieves the relevant audios for each query using audio chunking only (max or sum vote), with no query chunking. Run the terminal command python -m pipeline.multi_modal_retrieval.retrieval_audio_chunking; the retrieved short-list audios are saved as results/retrieved_results/{benchmark}/retrieved_audios_audio_chunking.csv.

(3) retrieval_query_chunking.py retrieves the relevant audios for each query using query chunking only (max or sum vote), with no audio chunking. Run the terminal command python -m pipeline.multi_modal_retrieval.retrieval_query_chunking; the retrieved short-list audios are saved as results/retrieved_results/{benchmark}/retrieved_audios_query_chunking.csv.

(4) retrieval_audio_chunking_query_chunking.py retrieves the audios with four combinations of audio chunking and query chunking: audio chunking max vote × query chunking sum vote, audio chunking sum vote × query chunking sum vote, audio chunking sum vote × query chunking max vote, and audio chunking max vote × query chunking max vote. Run the terminal command `python -m…
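One way to read the chunking options is as a similarity matrix between audio chunks and query chunks, collapsed with a max or sum vote along each axis. The sketch below is a hypothetical interpretation, not the repo's implementation: it splits audio into fixed-length windows and queries into sentences, assumes chunk embeddings are computed elsewhere, and votes first over audio chunks (per query chunk) and then over query chunks. For max × max and sum × sum the order does not matter; for the mixed combinations it does, and the order used here is an assumption.

```python
import math
import re
from itertools import product
from typing import Callable, Dict, List, Sequence

VOTES: Dict[str, Callable[[Sequence[float]], float]] = {"max": max, "sum": sum}


def chunk_waveform(samples: Sequence[float], chunk_len: int) -> List[Sequence[float]]:
    """Split a waveform into consecutive non-overlapping chunks (the last may be shorter)."""
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def split_query(query: str) -> List[str]:
    """Split a long query into sentence chunks after '.', '!' or '?' (simple heuristic)."""
    return [p for p in re.split(r"(?<=[.!?])\s+", query.strip()) if p]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_matrix(audio_chunk_embs, query_chunk_embs) -> List[List[float]]:
    """sim[i][j] = cosine similarity between audio chunk i and query chunk j."""
    return [[cosine(a, q) for q in query_chunk_embs] for a in audio_chunk_embs]


def chunked_score(sim: List[List[float]], audio_vote: str, query_vote: str) -> float:
    """Vote over audio chunks for each query chunk, then vote over the query chunks."""
    per_query = [VOTES[audio_vote]([row[j] for row in sim]) for j in range(len(sim[0]))]
    return VOTES[query_vote](per_query)


def all_combinations(sim: List[List[float]]) -> Dict[str, float]:
    """Score one audio-query pair under all four max/sum vote combinations."""
    return {f"audio_{a} x query_{q}": chunked_score(sim, a, q) for a, q in product(VOTES, VOTES)}
```

For `sim = [[0.2, 0.9], [0.6, 0.1]]`, the four scores are about 0.9 (max × max), 1.5 (audio max × query sum), 1.0 (audio sum × query max), and 1.8 (sum × sum). A max vote rewards the single best-matching chunk, while a sum vote rewards broad coverage but also grows with the number of chunks, so sum scores across audios of different lengths may need normalizing before ranking.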


Notability

notability 5.0/10

New research repo from Amazon Science