What does this repo signal mean?

Anthropic published anthropics/rogue-deploy-eval (Python). This repository signal exposes tooling, eval, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo anthropics/rogue-deploy-eval · language Python · Low traction (15 stars) for a new repo from Anthropic, likely a routine research tool.. onlylabs links this event to 1 captured evidence page and 6 related repo signals. It also maps to Evals and quality in the data-business radar.

Anthropic Repo: anthropics/rogue-deploy-eval

Captured source

source ↗

GitHub/github.com/anthropics/rogue-deploy-eval

anthropics/rogue-deploy-eval repository metadata

Source ↗

published Nov 5, 2024seen 6dcaptured 8hhttp 200method plain

anthropics/rogue-deploy-eval

Language: Python

License: MIT

Stars: 15

Forks: 7

Open issues: 0

Created: 2024-11-05T15:21:13Z

Pushed: 2025-01-21T14:20:23Z

Default branch: main

Fork: no

Archived: yes

README:

A toy evaluation of inference code tampering

This is the code for this blog post.

This allows running a semi-quantitative evaluation of a model's ability to write code that modifies a simple generation function to disable the generation monitor while solving another task.

How to use this code

To run the experiment

1. Fill the places in the repository that are 'TO_FILL', as our experiments relied on private tools for GPU code execution and LLM inference.

2. Run the notebook rogue_deploy/simple_pipeline.py

3. Run the notebook rogue_deploy/print_pipeline.py

Notability

notability 3.0/10

Low traction (15 stars) for a new repo from Anthropic, likely a routine research tool.