google-deepmind/synthid-text

Language: Python

License: Apache-2.0

Stars: 904

Forks: 90

Open issues: 15

Created: 2024-10-23T12:08:45Z

Pushed: 2026-06-10T01:30:45Z

Default branch: main

Fork: no

Archived: no

README:

SynthID Text

This repository provides a reference implementation of the SynthID Text watermarking and detection capabilities for the [research paper][nature-paper] published in _Nature_. It is not intended for production use. The core library is [distributed on PyPI][synthid-pypi] for easy installation in the [Python Notebook example][synthid-colab], which demonstrates how to apply these tools with the [Gemma][gemma] and [GPT-2][gpt2] models.

Installation and usage

The [Colab Notebook][synthid-colab] is a self-contained reference implementation that:

1. Extends the [GemmaForCausalLM][transformers-gemma] and [GPT2LMHeadModel][transformers-gpt2] classes from [Hugging Face Transformers][transformers] with a [mix-in][synthid-mixin] to enable watermarking text content generated by models running in [PyTorch][pytorch]; and
2. Detects the watermark. This can be done either with the simple [Weighted Mean detector][synthid-detector-mean], which requires no training, or with the more powerful [Bayesian detector][synthid-detector-bayesian], which requires [training][synthid-detector-trainer]. If using the [Weighted Mean detector][synthid-detector-mean] across texts of varying token lengths, we recommend computing the thresholds for the desired false positive rate empirically or theoretically at specific token lengths, or using a weighted frequentist approach as described in Appendix A.3.1 of the paper.
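
The empirical route for choosing thresholds can be sketched as follows. This is a minimal illustration and not part of the library: `empirical_threshold` is a hypothetical helper, the scores are made-up placeholder numbers, and the sketch assumes you have already scored unwatermarked texts, grouped by token length, with the Weighted Mean detector.

```python
import math
from collections.abc import Sequence


def empirical_threshold(
    unwatermarked_scores: Sequence[float],
    target_fpr: float,
) -> float:
    """Returns a threshold whose empirical false positive rate is <= target_fpr.

    A text counts as flagged when its score is strictly greater than the
    returned threshold.
    """
    if not unwatermarked_scores:
        raise ValueError("Need at least one unwatermarked score.")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1).")
    ordered = sorted(unwatermarked_scores)
    # At most this many unwatermarked scores may exceed the threshold.
    allowed_false_positives = math.floor(target_fpr * len(ordered))
    # Pick the score that leaves at most that many scores above it.
    return ordered[len(ordered) - allowed_false_positives - 1]


# Placeholder scores of unwatermarked texts, keyed by token length.
scores_by_length = {
    100: [0.48, 0.50, 0.52, 0.47, 0.51, 0.49, 0.53, 0.50, 0.46, 0.54],
    200: [0.49, 0.50, 0.51, 0.50, 0.48, 0.52, 0.50, 0.49, 0.51, 0.50],
}

# One threshold per token length, each calibrated to a 10% false positive rate.
thresholds = {
    length: empirical_threshold(scores, target_fpr=0.1)
    for length, scores in scores_by_length.items()
}
```

Calibrating one threshold per token length matters because the spread of scores on unwatermarked text narrows as texts get longer, so a single global threshold would give different false positive rates at different lengths.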

The notebook is designed to be run end-to-end with either a Gemma or GPT-2 model, and runs best on the following runtime hardware, some of which may require a [Colab Subscription][colab-subscriptions].

  • Gemma v1.0 2B IT: Use a GPU with 16GB of memory, such as a T4.
  • Gemma v1.0 7B IT: Use a GPU with 32GB of memory, such as an A100.
  • GPT-2: Any runtime will work, though a High-RAM CPU or any GPU will be faster.

NOTE: This implementation is for reference and research reproducibility purposes only. Due to minor variations in Gemma and Mistral models across implementations, we expect minor fluctuations in the detectability and perplexity results obtained from this repository versus those reported in the paper. The subclasses introduced herein are not designed to be used in production systems. Check out the official SynthID Text implementation in [Hugging Face Transformers][transformers-blog] for a production-ready implementation.

NOTE: The synthid_text.hashing_function.accumulate_hash() function, used while computing G values in this reference implementation, does not provide any guarantees of cryptographic security.

Local notebook use

The notebook can also be used locally if installed from source. Using a virtual environment is highly recommended for any local use.

# Create and activate the virtual environment
python3 -m venv ~/.venvs/synthid
source ~/.venvs/synthid/bin/activate

# Download and install SynthID Text and Jupyter
git clone https://github.com/google-deepmind/synthid-text.git
cd synthid-text
pip install '.[notebook-local]'

# Start the Jupyter server
python -m notebook

Once your kernel is running, navigate to the .ipynb file to execute it.

Running the tests

The source installation also includes a small test suite to verify that the library is working as expected.

# Create and activate the virtual environment
python3 -m venv ~/.venvs/synthid
source ~/.venvs/synthid/bin/activate

# Download and install SynthID Text with test dependencies from source
git clone https://github.com/google-deepmind/synthid-text.git
cd synthid-text
pip install '.[test]'

# Run the test suite
pytest .

How it works

Defining a watermark configuration

SynthID Text produces unique watermarks given a configuration, with the most important piece of these configurations being the keys: a sequence of unique integers where len(keys) corresponds to the number of layers in the watermarking or detection models.

The structure of a configuration is described in the following TypedDict subclass, though in practice, the [mixin][synthid-mixin] class in this library uses a static configuration.

from collections.abc import Sequence
from typing import TypedDict

import torch

class WatermarkingConfig(TypedDict):
    ngram_len: int
    keys: Sequence[int]
    sampling_table_size: int
    sampling_table_seed: int
    context_history_size: int
    device: torch.device
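
As a rough illustration of how the keys determine the number of layers, the sketch below validates a key sequence and builds a configuration-shaped dictionary. `count_watermarking_layers` is a hypothetical helper rather than a library function, all values are illustrative assumptions rather than recommended settings, and the `device` field is left out to keep the sketch free of a PyTorch dependency.

```python
from collections.abc import Sequence


def count_watermarking_layers(keys: Sequence[int]) -> int:
    """Validates `keys` and returns the number of layers they imply."""
    if not keys:
        raise ValueError("keys must contain at least one integer.")
    if len(set(keys)) != len(keys):
        raise ValueError("keys must be unique.")
    return len(keys)


# Illustrative values only; these are assumptions, not recommended settings.
example_config = {
    "ngram_len": 5,
    "keys": [654, 400, 836, 123],
    "sampling_table_size": 2**16,
    "sampling_table_seed": 0,
    "context_history_size": 1024,
}

# Four unique keys imply four watermarking layers.
num_layers = count_watermarking_layers(example_config["keys"])
```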

Applying a watermark

Watermarks are applied by a [mix-in][synthid-mixin] class that wraps the [GemmaForCausalLM][transformers-gemma] and [GPT2LMHeadModel][transformers-gpt2] classes from Transformers, which results in two subclasses with the same API that you are used to from Transformers. Remember that the mix-in provided by this library uses a static watermarking configuration, making it unsuitable for production use.

from synthid_text import synthid_mixin
import transformers
import torch

DEVICE = (
    torch.device('cuda:0') if torch.cuda.is_available() else torch.device('cpu')
)
INPUTS = [
    "I enjoy walking with my cute dog",
    "I am from New York",
    "The test was not so very hard after all",
    "I don't think they can score twice in so short a time",
]
MODEL_NAME = 'google/gemma-2b-it'
TEMPERATURE = 0.5
TOP_K = 40
TOP_P = 0.99

# Initialize a standard tokenizer from Transformers.
tokenizer = transformers.AutoTokenizer.from_pretrained(MODEL_NAME)
# Initialize a SynthID Text-enabled model.
model = synthid_mixin.SynthIDGemmaForCausalLM.from_pretrained(
    MODEL_NAME,
    device_map='auto',
    torch_dtype=torch.bfloat16,
)
# Prepare your inputs in the usual way.
inputs = tokenizer(
    INPUTS,
    return_tensors='pt',
    padding=True,
).to(DEVICE)
# Generate watermarked text.
outputs = model.generate(
    **inputs,
    do_sample=True,
    max_length=1024,
    temperature=TEMPERATURE,
    top_k=TOP_K,
    top_p=TOP_P,
)

Detecting a watermark

Watermark detection can be done using a variety of scoring functions (see paper). This repository contains code for the Mean, Weighted Mean, and Bayesian scoring functions described in the paper. The Colab notebook contains examples of how to use these scoring functions.
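
To give a feel for the two training-free scores, here is a minimal sketch in plain Python. It is not the library's API: the function names and the layer weights are assumptions made for illustration, g-values are assumed to be 0 or 1 with one value per token per watermarking layer, and the token masking that a real detector needs is omitted.

```python
from collections.abc import Sequence


def mean_score(g_values: Sequence[Sequence[float]]) -> float:
    """Averages g-values over all tokens and all watermarking layers."""
    total = sum(sum(token_g) for token_g in g_values)
    count = sum(len(token_g) for token_g in g_values)
    return total / count


def weighted_mean_score(
    g_values: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> float:
    """Averages g-values using one weight per watermarking layer."""
    weight_sum = sum(weights)
    per_token = []
    for token_g in g_values:
        if len(token_g) != len(weights):
            raise ValueError("Each token needs one g-value per weight.")
        # Weighted average across layers for this token.
        per_token.append(
            sum(w * g for w, g in zip(weights, token_g)) / weight_sum
        )
    # Plain average across tokens.
    return sum(per_token) / len(per_token)


# Three tokens, two watermarking layers (placeholder g-values).
g_values = [[1, 0], [1, 1], [1, 0]]

unweighted = mean_score(g_values)
# Illustrative weights that emphasize the first layer.
weighted = weighted_mean_score(g_values, weights=[2.0, 1.0])
```

In this toy input the first layer carries more signal than the second, so the weighted score comes out higher than the unweighted one. Higher scores suggest watermarked text, and the decision is made by comparing the score against a calibrated threshold.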

The Bayesian detector must be trained on watermarked and unwatermarked data before it can be used. The Bayesian detector must be trained for each unique watermarking key, and the training data used for this detector model…
