google-deepmind/century
Captured source
source ↗google-deepmind/century
License: Apache-2.0
Stars: 2
Forks: 0
Open issues: 0
Created: 2025-02-28T02:51:12Z
Pushed: 2025-02-28T22:43:30Z
Default branch: main
Fork: no
Archived: no
README:
Century
Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images (Akbulut et al. 2025)
To better understand how multi-modal models describe and provide context on historical figures and events, we introduce Century – a novel dataset of 1,500 sensitive historical images.
This dataset consists of images from recent history, created through an automated method combining knowledge graphs and language models with quality and diversity criteria created from the practices of museums and digital archives.
_Century_ contains images that depict events and figures that are diverse across topics and represents all regions of the world.
Instructions on replicating can be found in the ICLR paper.
Data files
We release three datasets.
century_dataset_images_only.csv
This file contains the canonical Century dataset, with links to all images in _Century_ as Wikipedia links, alongside other metadata attributes. It has six columns:
image_urlindicating the image’s location on Wikimedia Commonswikipedia_urlindicating where the image appears on Wikipediawit_splitindicating which split of the Wikipedia Text-Image Dataset the image was sourced fromcentury_methodindicating which method was used to source the search term corresponding to the imageis_starter_setindicating if the image belongs to the “starter set” we recommend as a starting point for developerscentury_idwhich contains a unique ID for each image
century_dataset_with_image_quality_labels.csv
This file contains the canonical Century dataset, with all image annotations collected through human and auto-rater methods. It contains all of the columns listed above, as well as:
rating.methodindicating which evaluation method (human or automated) was used to assign quality and diversity ratings to the imagerating.srcindicating the specific labeller model used to assign quality and diversity ratingsrating.rater_idindicating anonymous IDs for all human crowd-workersrating.contentindicating the categorical rating given to describe image content typerating.conceptindicating the categorical rating given to describe the thematic category of the imagerating.subregionindicating the categorical rating given to describe the primary subregion of the image contentsrating.time_periodindicating the binary rating given to determine whether the image corresponds to the 20th and 21st centuries (1) or not (0)rating.sensitiveindicating the ordinal rating given to the image’s sensitivity from low (1) to high (5)rating.commemorativeindicating the ordinal rating given to the image’s commemorativeness from low (1) to high (5)rating.controversialindicating the ordinal rating given to the image’s controversiality from low (1) to high (5)
knowledge_graph_search_terms.csv
This file contains all search terms of historical events and figures collected through the knowledge graph mining method described in the paper. It contains a single column (terms) of lower-cased strings.
Citing this work
To cite this work, use:
@inproceedings{Akbulut25,
title={Century: A Framework and Dataset for Evaluating Historical Contextualisation of Sensitive Images},
author={Canfer Akbulut and Kevin Robinson and Maribeth Rauh and Isabela Albuquerque and Olivia Wiles and Laura Weidinger and Verena Rieser and Yana Hasson and Nahema Marchal and Iason Gabriel and William Isaac and Lisa Anne Hendricks},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
}License and disclaimer
Copyright 2024 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.
Notability
notability 2.0/10Low-stars new repo by DeepMind