What does this repo signal mean?

Google (DeepMind / Gemini) published google-deepmind/loft, a Python repository. Repository signals like this one can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo google-deepmind/loft · language Python. onlylabs links this event to 1 captured evidence page and 6 related repo signals, and maps it to "Evals and quality" in the data-business radar.

Google (DeepMind / Gemini) Repo: google-deepmind/loft

Captured source


GitHub · github.com/google-deepmind/loft

google-deepmind/loft repository metadata


Published Jun 18, 2024 · seen 5d · captured 8h · HTTP 200 · method plain

google-deepmind/loft

Description: LOFT: A 1 Million+ Token Long-Context Benchmark

Language: Python

License: Apache-2.0

Stars: 233

Forks: 17

Open issues: 1

Created: 2024-06-18T02:23:08Z

Pushed: 2026-04-13T20:45:44Z

Default branch: main

Fork: no

Archived: no

README:

LOFT: A 1 Million+ Token Long-Context Benchmark

This repository houses the resources for LOFT, the Long Context Frontiers benchmark, introduced in the research paper Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More?. LOFT consists of 6 long-context task categories spanning retrieval, multi-hop compositional reasoning, and more, totaling 35 datasets and 4 modalities.

Installation

$ git clone git@github.com:google-deepmind/loft.git
$ cd loft/
$ pip install -r requirements.txt

Download Datasets and Prompts

The script below downloads all the LOFT datasets under BASE_DIR.

$ BASE_DIR=your-choice-of-directory
$ sh download.sh $BASE_DIR

Each dataset is also available from the links in the [Datasets](#datasets) table. For a small subset of datasets, download.sh additionally runs preprocess.py, which fills in the missing fields in the queries and corpus files. Once the download completes, the file structure looks like this:

$BASE_DIR
└── data
    ├── retrieval
    │   ├── arguana
    │   │   ├── 128k
    │   │   │   ├── corpus.jsonl
    │   │   │   ├── dev_queries.jsonl
    │   │   │   ├── few_shot_queries.jsonl
    │   │   │   └── test_queries.jsonl
    │   │   ├── 1m
    │   │   └── 32k
    │   ├── fever
    │   │   ├── ...
    │   ├── ...
    ├── rag
    ├── sql
    ├── icl
    └── mm
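
The layout above can be navigated programmatically. The following is a minimal sketch, not part of the LOFT repository: the helper names `dataset_dir` and `read_jsonl` are hypothetical, and the only assumption about the files is that each non-blank line of a `.jsonl` file holds one JSON object.

```python
import json
from pathlib import Path


def dataset_dir(base_dir, task_type, dataset, length):
    """Return <base_dir>/data/<task_type>/<dataset>/<length> (hypothetical helper)."""
    return Path(base_dir) / "data" / task_type / dataset / length


def read_jsonl(path):
    """Parse a JSON Lines file into a list of objects, skipping blank lines."""
    records = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records
```

For example, `read_jsonl(dataset_dir(base_dir, "retrieval", "arguana", "128k") / "dev_queries.jsonl")` would load the ArguAna dev queries at the 128k context length. The record fields are not shown in this excerpt, so inspect one record before relying on any key.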

We also provide an example prompt in PROMPT_EXAMPLE.txt showing how Corpus-in-Context (CiC) prompting can be done for the text retrieval task.
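
As a rough illustration of the idea behind Corpus-in-Context prompting, the sketch below places an entire corpus in the prompt ahead of the query, so the model answers by citing a passage ID instead of calling a separate retriever. This is not the format used in PROMPT_EXAMPLE.txt: the function name `build_cic_prompt`, the instruction wording, the `ID: <id> | CONTENT: <text>` layout, and the toy passages are all assumptions made for illustration.

```python
def build_cic_prompt(corpus, query, instruction=None):
    """Assemble a corpus-in-context prompt (illustrative layout, not LOFT's own)."""
    if instruction is None:
        instruction = (
            "You will be given a list of passages. "
            "Answer with the ID of the passage most relevant to the query."
        )
    lines = [instruction, "", "====== Corpus ======"]
    # Every passage goes into the context with an explicit ID the model can cite.
    for passage_id, text in corpus:
        lines.append(f"ID: {passage_id} | CONTENT: {text}")
    lines += ["====== Query ======", f"Query: {query}", "Relevant passage ID:"]
    return "\n".join(lines)


# Hypothetical toy corpus; real LOFT corpora are read from corpus.jsonl.
toy_corpus = [
    (0, "The Eiffel Tower is in Paris."),
    (1, "Mount Fuji is the tallest mountain in Japan."),
]
prompt = build_cic_prompt(toy_corpus, "Where is the Eiffel Tower?")
print(prompt)
```

The point of the sketch is only the ordering: instructions, then the full corpus, then the query. For the wording actually used in the benchmark, refer to PROMPT_EXAMPLE.txt.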

Inference and Evaluation

We currently support Gemini (e.g., gemini-1.5-flash-002) through Vertex AI for inference. Prepare a Google Cloud project and make its ID available as PROJECT_ID, since the script below reads ${PROJECT_ID}. To run inference with gemini-1.5-flash-002 and evaluate the predictions:

BASE_DIR=$1
DATASET=$2
LENGTH="128k"
TASK_TYPE="retrieval"
SPLIT="dev"
PROMPT_TYPE="few_shot_with_cot"
PROMPT="${TASK_TYPE}_${DATASET}_${LENGTH}_${SPLIT}:${PROMPT_TYPE}"
echo "Prompt: ${PROMPT}"

mkdir -p ${BASE_DIR}/outputs/${TASK_TYPE}/${DATASET}/${LENGTH}
answer_file_extension="jsonl"

python run_inference.py \
    --prompt_name ${PROMPT} \
    --task_type ${TASK_TYPE} \
    --base_dir ${BASE_DIR} \
    --data_dir ${TASK_TYPE}/${DATASET}/${LENGTH} \
    --split ${SPLIT} \
    --context_length ${LENGTH} \
    --output_path ${BASE_DIR}/outputs/${TASK_TYPE}/${DATASET}/${LENGTH}/${SPLIT}_predictions.jsonl \
    --project_id ${PROJECT_ID} \
    --overwrite

python run_evaluation.py \
    --answer_file_path ${BASE_DIR}/data/${TASK_TYPE}/${DATASET}/${LENGTH}/${SPLIT}_queries.${answer_file_extension} \
    --pred_file_path ${BASE_DIR}/outputs/${TASK_TYPE}/${DATASET}/${LENGTH}/${SPLIT}_predictions.jsonl \
    --task_type ${TASK_TYPE}

The same script is available as infer_eval.sh. Example query and prediction files are provided in [evaluation/example_predictions/](evaluation/example_predictions/). Each task_type reports several metric scores. To find which task_type to use for each dataset, and the primary evaluation metric reported in the paper for that dataset, see the [Datasets](#datasets) table.
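
The naming conventions in the script above can be captured in a small helper. This is a sketch under the assumption that prompt names always follow `<task_type>_<dataset>_<length>_<split>:<prompt_type>` and that predictions live under `outputs/` as in the script; `RunConfig` and its attribute names are hypothetical and not part of the repository.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical container mirroring the shell variables in the script."""
    base_dir: str
    dataset: str
    length: str = "128k"
    task_type: str = "retrieval"
    split: str = "dev"
    prompt_type: str = "few_shot_with_cot"

    @property
    def prompt_name(self):
        # Same pattern as the PROMPT variable in the shell script.
        return (f"{self.task_type}_{self.dataset}_{self.length}_"
                f"{self.split}:{self.prompt_type}")

    @property
    def data_dir(self):
        # Value passed to --data_dir (relative to <base_dir>/data).
        return f"{self.task_type}/{self.dataset}/{self.length}"

    @property
    def answer_file(self):
        # Assumption: answer files follow <split>_queries.jsonl, as in the file tree.
        return (Path(self.base_dir) / "data" / self.data_dir
                / f"{self.split}_queries.jsonl")

    @property
    def prediction_file(self):
        return (Path(self.base_dir) / "outputs" / self.data_dir
                / f"{self.split}_predictions.jsonl")


cfg = RunConfig(base_dir="/tmp/loft", dataset="arguana")
print(cfg.prompt_name)  # retrieval_arguana_128k_dev:few_shot_with_cot
```

Keeping the pattern in one place makes it harder for the prompt name, the data directory, and the prediction path to drift apart when the length or split changes.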

Get Prompts for 3P Evaluation

You can use the following command to get prompts for specific datasets. For instance, the prompts for [LOFT-hard](#loft-hard-subset) below are obtained as follows:

TASK="retrieval"
DATASET="qampari"
LENGTH="128k"
SPLIT="test"
PROMPT_NAME="${TASK}_${DATASET}_${LENGTH}_${SPLIT}:few_shot_with_cot"
python3 dump_prompts.py \
    --prompt_name="${PROMPT_NAME}" \
    --base_dir="${HOME}" \
    --output_dir="${HOME}/prompts/${PROMPT_NAME}" \
    --output_format=csv
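
To dump prompts for several context lengths in one pass, the command above can be generated rather than typed by hand. A sketch, assuming the three lengths shown in the file tree (32k, 128k, 1m) are available for the chosen dataset; `dump_prompt_commands` is a hypothetical helper that only builds argument lists from the flags shown above and does not run anything.

```python
from pathlib import Path


def dump_prompt_commands(base_dir, task, dataset, split="test",
                         lengths=("32k", "128k", "1m"),
                         prompt_type="few_shot_with_cot",
                         output_format="csv"):
    """Build one dump_prompts.py argument list per context length."""
    commands = []
    for length in lengths:
        prompt_name = f"{task}_{dataset}_{length}_{split}:{prompt_type}"
        output_dir = Path(base_dir) / "prompts" / prompt_name
        commands.append([
            "python3", "dump_prompts.py",
            f"--prompt_name={prompt_name}",
            f"--base_dir={base_dir}",
            f"--output_dir={output_dir}",
            # Both "text" and "csv" appear in the source command; csv is the default here.
            f"--output_format={output_format}",
        ])
    return commands


for argv in dump_prompt_commands("/home/user", "retrieval", "qampari"):
    print(" ".join(argv))
```

Each list can then be passed to `subprocess.run(argv, check=True)` from the repository root.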

Datasets

Excerpt shown — open the source for the full document.