google-deepmind/predictingthepast

Python


Language: Python

License: Apache-2.0

Stars: 200

Forks: 21

Open issues: 1

Created: 2025-07-22T07:24:39Z

Pushed: 2025-08-20T06:10:18Z

Default branch: main

Fork: no

Archived: no

README:

Contextualising ancient texts with generative neural networks

Yannis Assael1,*, Thea Sommerschield2,*, Alison Cooley3, Brendan Shillingford1, John Pavlopoulos4, Priyanka Suresh1, Bailey Herms5, Justin Grayston5, Benjamin Maynard5, Nicholas Dietrich1, Robbe Wulgaert6, Jonathan Prag7, Alex Mullen2, Shakir Mohamed1

1 Google DeepMind

2 University of Nottingham, UK

3 University of Warwick, UK

4 Athens University of Economics and Business, Greece

5 Google

6 Sint-Lievenscollege, Belgium

7 University of Oxford, UK

*Authors contributed equally to this work.

---

Citation

When using any of the source code or outputs of this project, please cite:

@article{asssome2025contextualising,
title={Contextualising ancient texts with generative neural networks},
author={Assael*, Yannis and Sommerschield*, Thea and Cooley, Alison and Pavlopoulos, John and Shillingford, Brendan and Herms, Bailey and Suresh, Priyanka and Maynard, Benjamin and Grayston, Justin and Wulgaert, Robbe and Prag, Jonathan and Mullen, Alex and Mohamed, Shakir},
journal={Nature},
volume={643},
number={8073},
year={2025},
publisher={Nature Publishing}
}

---

Human history is born in writing. Inscriptions, among the earliest written forms, offer direct insights into the thought, language, and history of ancient civilisations. Historians capture these insights by identifying parallels - inscriptions with shared phrasing, function, or cultural setting - to enable the contextualisation of texts within broader historical frameworks, and perform key tasks such as restoration and geographical or chronological attribution. However, current digital methods are restricted to literal matches and narrow historical scopes. We introduce Aeneas, the first generative neural network for contextualising ancient texts. Aeneas retrieves textual and contextual parallels, leverages visual inputs, handles arbitrary-length text restoration, and advances the state-of-the-art in key tasks.

Fragment of a bronze military diploma from Sardinia, issued by the Emperor Trajan to a sailor on a warship. 113/14 CE (CIL XVI, 60, The Metropolitan Museum of Art, Public Domain).

To evaluate its impact, we conduct the largest Historian-AI study to date, with historians considering Aeneas’ retrieved parallels useful research starting points in 90% of cases, improving their confidence in key tasks by 44%. Restoration and geographical attribution tasks yielded superior results when historians were paired with Aeneas, outperforming both humans and AI alone. For dating, Aeneas achieved a 13-year distance from ground-truth ranges. We demonstrate Aeneas’ contribution to historical workflows through analysis of key traits in the *Res Gestae Divi Augusti*, the most renowned Roman inscription, showing how integrating Science and Humanities can create transformative tools to assist historians and advance our understanding of the past.
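
The dating figure above can be read as the gap, in years, between a predicted date and the ground-truth date range. Below is a minimal sketch of one such distance, assuming a single predicted year and an inclusive range. The function name, the point-prediction simplification, and the use of negative years for BCE are illustrative assumptions, not taken from the repository.

```python
def interval_distance(predicted_year, earliest, latest):
    """Years between a predicted year and an inclusive [earliest, latest] range.

    Returns 0 when the prediction falls inside the range. Negative years
    stand for BCE dates (an assumption made for this sketch).
    """
    if earliest > latest:
        raise ValueError("earliest must not be later than latest")
    if predicted_year < earliest:
        return earliest - predicted_year
    if predicted_year > latest:
        return predicted_year - latest
    return 0


# A prediction of 120 CE against a ground-truth range of 113-114 CE.
example_distance = interval_distance(120, 113, 114)
```

Averaging this distance over a test set would give a single summary number of the kind quoted above, though the exact evaluation protocol should be checked against the paper.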

Given the image and textual transcription of an inscription (with damaged sections of unknown length marked with the "#" character), Aeneas uses a transformer-based decoder, the "torso", to process the text. Specialised networks, called "heads", handle character restoration, date attribution, and geographical attribution (the latter also incorporating visual features). The torso's intermediate representations are merged into a unified, historically enriched embedding, which is used to retrieve similar inscriptions from the Latin Epigraphic Dataset (LED), ranked by relevance.
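
The retrieval step can be pictured as a nearest-neighbour search over embeddings. The sketch below ranks stored inscriptions against a query embedding, assuming cosine similarity as the relevance score. The similarity measure, the function names, and the toy three-dimensional vectors are all illustrative assumptions and do not reflect the library's actual API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_parallels(query_embedding, corpus_embeddings, top_k=3):
    """Return the top_k (inscription_id, score) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_embedding, embedding), inscription_id)
        for inscription_id, embedding in corpus_embeddings.items()
    ]
    scored.sort(reverse=True)
    return [(inscription_id, score) for score, inscription_id in scored[:top_k]]


# Toy embeddings standing in for real inscription embeddings.
corpus = {"a": [1.0, 0.0, 0.0], "b": [0.0, 1.0, 0.0], "c": [0.7, 0.7, 0.0]}
ranked = rank_parallels([1.0, 0.1, 0.0], corpus, top_k=2)
```

In practice the embeddings would come from the trained model and the precomputed retrieval file rather than hand-written lists.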


Aeneas Inference Online

To aid further research in the field, we created an online interactive Python notebook, where researchers can query one of our trained models to get text restorations, visualise attention weights, and more.

Aeneas Inference Offline

Advanced users who want to run inference with the trained model locally can use the predictingthepast library directly.

First, to install the predictingthepast library and its dependencies, run the following from the root of the repository:

pip install .

Then, download the model files.

Latin Model

curl --output aeneas_117149994_2.pkl \
https://storage.googleapis.com/ithaca-resources/models/aeneas_117149994_2.pkl
curl --output led.json \
https://storage.googleapis.com/ithaca-resources/models/led.json
curl --output led_emb_xid117149994.pkl \
https://storage.googleapis.com/ithaca-resources/models/led_emb_xid117149994.pkl

Ancient Greek Model

curl --output ithaca_153143996_2.pkl \
https://storage.googleapis.com/ithaca-resources/models/ithaca_153143996_2.pkl
curl --output iphi.json \
https://storage.googleapis.com/ithaca-resources/models/iphi.json
curl --output iphi_emb_xid153143996.pkl \
https://storage.googleapis.com/ithaca-resources/models/iphi_emb_xid153143996.pkl
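
The same files can also be fetched from Python, which may be convenient in notebooks or on systems without curl. This is a minimal sketch using only the standard library. The URLs and file names are the ones listed above, while the "latin" and "greek" dictionary keys and the function names are hypothetical labels chosen for this example.

```python
import pathlib
import urllib.request

BASE_URL = "https://storage.googleapis.com/ithaca-resources/models/"

# File names copied from the curl commands above; the keys are illustrative.
RESOURCES = {
    "latin": ["aeneas_117149994_2.pkl", "led.json", "led_emb_xid117149994.pkl"],
    "greek": ["ithaca_153143996_2.pkl", "iphi.json", "iphi_emb_xid153143996.pkl"],
}


def resource_urls(model):
    """Return the download URLs for one model's checkpoint, dataset and embeddings."""
    return [BASE_URL + name for name in RESOURCES[model]]


def download_resources(model, dest_dir="."):
    """Download any missing files for `model` into dest_dir and return their paths."""
    dest = pathlib.Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for url in resource_urls(model):
        target = dest / url.rsplit("/", 1)[-1]
        if not target.exists():  # skip files that are already present
            urllib.request.urlretrieve(url, target)
        paths.append(target)
    return paths
```

Calling `download_resources("latin")` would place the three Latin files in the current directory, matching the output names used by the curl commands.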

Inference Example

An example of using the library can be run via:

python inference_example.py \
--input_file="example_input.txt" \
--checkpoint_path="aeneas_117149994_2.pkl" \
--dataset_path="led.json" \
--retrieval_path="led_emb_xid117149994.pkl" \
--language="latin"

This will run restoration and attribution on the text in example_input.txt.

To run it with different input text, use the --input argument:

python inference_example.py \
--input="..." \
--checkpoint_path="aeneas_117149994_2.pkl" \
--dataset_path="led.json" \
--retrieval_path="led_emb_xid117149994.pkl" \
--language="latin"

Or read the text from a different UTF-8 encoded file:

python inference_example.py \
--input_file="some_other_input_file.txt" \
--checkpoint_path="aeneas_117149994_2.pkl" \
--dataset_path="led.json" \
--retrieval_path="led_emb_xid117149994.pkl" \
--language="latin"
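
When many inscriptions need processing, the command line above can be assembled programmatically. The sketch below builds the argument list for inference_example.py using only the flags shown in this README. The helper names `build_command` and `run_on_files` are hypothetical, and the defaults assume the Latin model files sit in the working directory.

```python
import subprocess
import sys


def build_command(input_file=None, input_text=None,
                  checkpoint_path="aeneas_117149994_2.pkl",
                  dataset_path="led.json",
                  retrieval_path="led_emb_xid117149994.pkl",
                  language="latin"):
    """Build the argument list for one inference_example.py invocation."""
    if (input_file is None) == (input_text is None):
        raise ValueError("Provide exactly one of input_file or input_text.")
    if input_file is not None:
        source = f"--input_file={input_file}"
    else:
        source = f"--input={input_text}"
    return [
        sys.executable, "inference_example.py", source,
        f"--checkpoint_path={checkpoint_path}",
        f"--dataset_path={dataset_path}",
        f"--retrieval_path={retrieval_path}",
        f"--language={language}",
    ]


def run_on_files(paths):
    """Run the example script once per input file, stopping on the first failure."""
    for path in paths:
        subprocess.run(build_command(input_file=path), check=True)
```

Passing the arguments as a list avoids shell quoting issues, which matters for inscription text containing characters such as "#".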

The restoration or attribution JSON can be saved to a file:

python inference_example.py \
--input_file="example_input.txt" \…

