Repo · Microsoft · published Apr 3, 2026

microsoft/tgrep

Rust


microsoft/tgrep

Description: Trigram-indexed grep with a client/server architecture for fast regex search in large codebases locally

Language: Rust

License: MIT

Stars: 18

Forks: 4

Open issues: 2

Created: 2026-04-03T15:50:37Z

Pushed: 2026-06-12T02:45:33Z

Default branch: main

Fork: no

Archived: no

README:

tgrep

Trigram-indexed grep with a client/server architecture for fast regex search in large codebases.

Why?

Tools like grep and ripgrep scan every file on every search — O(total bytes) per query. In a 100k+ file monorepo, that's painfully slow. tgrep pre-builds a trigram index so searches only touch the small set of files that could match.
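To make the idea concrete, here is a minimal sketch of a trigram index over in-memory strings. It is an illustration under stated assumptions, not tgrep's actual data structures: `TrigramIndex`, `add_file`, and `candidates` are hypothetical names, and the sketch handles only literal queries. A real search would still run the regex over each candidate file, since sharing all trigrams does not guarantee a match.

```rust
use std::collections::{BTreeSet, HashMap};

/// Toy trigram index: maps each 3-byte window to the ids of files containing it.
/// Hypothetical structure for illustration only.
struct TrigramIndex {
    postings: HashMap<[u8; 3], BTreeSet<usize>>,
}

impl TrigramIndex {
    fn new() -> Self {
        TrigramIndex { postings: HashMap::new() }
    }

    /// Record every overlapping 3-byte window of `content` under `file_id`.
    fn add_file(&mut self, file_id: usize, content: &str) {
        for w in content.as_bytes().windows(3) {
            self.postings
                .entry([w[0], w[1], w[2]])
                .or_default()
                .insert(file_id);
        }
    }

    /// Files containing every trigram of `literal`.
    /// Returns None when the literal is shorter than three bytes,
    /// because the index cannot narrow the search in that case.
    fn candidates(&self, literal: &str) -> Option<BTreeSet<usize>> {
        let mut result: Option<BTreeSet<usize>> = None;
        for w in literal.as_bytes().windows(3) {
            let files = self
                .postings
                .get(&[w[0], w[1], w[2]])
                .cloned()
                .unwrap_or_default();
            result = Some(match result {
                None => files,
                Some(acc) => acc.intersection(&files).copied().collect(),
            });
        }
        result
    }
}

fn main() {
    let mut index = TrigramIndex::new();
    index.add_file(0, "fn main() { println!(\"hi\"); }");
    index.add_file(1, "fn helper() {}");
    index.add_file(2, "struct Main;");

    // Only file 0 contains all trigrams of "fn main"; files 1 and 2 are never opened.
    let hits = index.candidates("fn main").unwrap();
    println!("{:?}", hits); // {0}
}
```

The per-query cost now depends on the size of the posting lists and the candidate set, not on the total bytes in the repository.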

Start a server once, search instantly forever.

tgrep index . # build the trigram index
tgrep serve . # start server (watches for file changes)
tgrep "fn main" . # instant — auto-connects to running server

See [full benchmark results](BENCHMARKS.md) — up to 72x faster than ripgrep on large repos.

Benchmark highlights (avg latency per query, index pre-built)

| Repo | Files | Platform | ripgrep | tgrep | Speedup |
| --- | ---: | --- | ---: | ---: | ---: |
| chromium | 496K | macOS arm64 | 61,110ms | 2,630ms | 23x |
| chromium | 496K | Windows | 29,557ms | 2,491ms | 12x |
| gecko-dev | 388K | macOS arm64 | 35,413ms | 492ms | 72x |
| gecko-dev | 388K | Windows | 16,199ms | 310ms | 52x |
| gecko-dev | 388K | Linux | 1,931ms | 170ms | 11x |
| linux | 94K | Windows | 4,317ms | 934ms | 5x |
| rust | 59K | Windows | 1,989ms | 215ms | 9x |
| kubernetes | 30K | Windows | 1,489ms | 178ms | 8x |
| go | 15K | Windows | 450ms | 70ms | 6x |
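The Speedup column appears to be the ripgrep latency divided by the tgrep latency, rounded to the nearest integer. That reading is an inference from the numbers, not something the README states. A small check, using a hypothetical `speedup` helper:

```rust
/// Speedup as ripgrep latency divided by tgrep latency, rounded to the nearest integer.
/// Assumption: this is how the table's last column was derived.
fn speedup(ripgrep_ms: f64, tgrep_ms: f64) -> u32 {
    (ripgrep_ms / tgrep_ms).round() as u32
}

fn main() {
    // gecko-dev on macOS arm64: 35,413ms vs 492ms.
    println!("{}x", speedup(35_413.0, 492.0)); // 72x
    // linux on Windows: 4,317ms vs 934ms.
    println!("{}x", speedup(4_317.0, 934.0)); // 5x
}
```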

Architecture

    tgrep  ---TCP--->  tgrep serve (multi-client)
    (client)                |
                       HybridIndex
                       /         \
              IndexReader       LiveIndex
              (mmap disk)   (in-memory overlay)
                   ^                ^
                   |                |
            Periodic Flush   File Watcher (notify)
            (50K files /     Background Indexer
             5 min)          (rayon parallel)

  • IndexReader — mmap'd on-disk index (zero-copy, binary search on sorted trigram lookup table)
  • LiveIndex — in-memory overlay for files modified after server start, or being built by the background indexer
  • HybridIndex — merges both layers; overlay takes precedence
  • Background Indexer — builds the index in parallel batches of 500 files using rayon; queries are served immediately from partial data
  • Periodic Flush — every 50K files or 5 minutes, the in-memory index is flushed to disk and the reader is swapped, keeping memory bounded
  • File Watcher — the notify crate watches the repo and updates LiveIndex in real time
  • TCP Server — JSON-RPC 2.0 over newline-delimited TCP; each connection is handled in a separate thread, so multiple clients can connect simultaneously
  • File Cache — 50K-entry content cache behind an RwLock, so concurrent reads do not block each other
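As a rough illustration of "merges both layers; overlay takes precedence", the sketch below combines a disk layer and a live layer for a single trigram lookup. All types here (`DiskLayer`, `LiveLayer`, `Hybrid`, the `overridden` set) are hypothetical stand-ins with plain hash maps in place of the mmap'd index; the actual HybridIndex may track stale files differently.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

type FileId = u32;
type Trigram = [u8; 3];

/// Stand-in for the mmap'd on-disk index (a plain map here).
struct DiskLayer {
    postings: HashMap<Trigram, Vec<FileId>>,
}

/// Stand-in for the in-memory overlay.
struct LiveLayer {
    /// Files whose disk entries are stale (changed since the last flush).
    overridden: HashSet<FileId>,
    postings: HashMap<Trigram, Vec<FileId>>,
}

struct Hybrid {
    disk: DiskLayer,
    live: LiveLayer,
}

impl Hybrid {
    /// Posting list for one trigram: disk entries minus overridden files,
    /// plus whatever the live layer reports.
    fn lookup(&self, tri: Trigram) -> BTreeSet<FileId> {
        let mut out = BTreeSet::new();
        if let Some(ids) = self.disk.postings.get(&tri) {
            out.extend(
                ids.iter()
                    .copied()
                    .filter(|id| !self.live.overridden.contains(id)),
            );
        }
        if let Some(ids) = self.live.postings.get(&tri) {
            out.extend(ids.iter().copied());
        }
        out
    }
}

fn main() {
    let disk = DiskLayer {
        postings: HashMap::from([(*b"foo", vec![1, 2])]),
    };
    // File 2 was edited after the flush and no longer contains "foo"; file 3 is new.
    let live = LiveLayer {
        overridden: HashSet::from([2, 3]),
        postings: HashMap::from([(*b"foo", vec![3]), (*b"bar", vec![2])]),
    };
    let hybrid = Hybrid { disk, live };
    println!("{:?}", hybrid.lookup(*b"foo")); // {1, 3}
}
```

Filtering stale file ids out of the disk layer is what keeps an edited file from matching on content it no longer has.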

Performance

tgrep is designed to be significantly faster than ripgrep on large repos:

  • Parallel search — candidate files are searched in parallel using rayon
  • Fast query planning — sorted posting lists are intersected/unioned without unnecessary resorting, and on-disk posting lists skip redundant deduplication
  • Memory-efficient full builds — index builds batch extraction and stream sorted postings, file entries, and lookup entries instead of retaining the full inverted index in memory
  • Smart file walking — extension-based binary rejection (50+ formats), 8KB content check, 1MB file size limit
  • Concurrent reads — the RwLock-guarded cache lets many readers proceed at once without blocking each other
  • Hot serving — queries work immediately during background index building; no need to wait for the full index
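The "intersected/unioned without unnecessary resorting" point can be sketched with classic linear merges over sorted posting lists. This is a generic technique shown under assumptions, not tgrep's query planner: the function names and the sample posting lists are made up for illustration.

```rust
use std::cmp::Ordering;

/// Intersect two sorted, deduplicated posting lists in one linear pass.
fn intersect_sorted(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => i += 1,
            Ordering::Greater => j += 1,
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out
}

/// Union two sorted, deduplicated posting lists; the output stays sorted
/// and deduplicated, so no later sort or dedup pass is needed.
fn union_sorted(a: &[u32], b: &[u32]) -> Vec<u32> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::with_capacity(a.len() + b.len());
    while i < a.len() && j < b.len() {
        match a[i].cmp(&b[j]) {
            Ordering::Less => {
                out.push(a[i]);
                i += 1;
            }
            Ordering::Greater => {
                out.push(b[j]);
                j += 1;
            }
            Ordering::Equal => {
                out.push(a[i]);
                i += 1;
                j += 1;
            }
        }
    }
    out.extend_from_slice(&a[i..]);
    out.extend_from_slice(&b[j..]);
    out
}

fn main() {
    // Made-up posting lists for two trigrams of "TODO" and two of "FIXME".
    let (tod, odo) = ([1, 4, 7, 9], [4, 7, 8, 9]);
    let (fix, ixm) = ([2, 4], [2, 3, 4]);
    // "TODO|FIXME" becomes (TOD AND ODO) OR (FIX AND IXM).
    let todo = intersect_sorted(&tod, &odo);
    let fixme = intersect_sorted(&fix, &ixm);
    println!("{:?}", union_sorted(&todo, &fixme)); // [2, 4, 7, 9]
}
```

Because both operations preserve sort order, an arbitrarily nested AND/OR plan can be evaluated with merges alone.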

See [BENCHMARKS.md](BENCHMARKS.md) for end-to-end large-repo benchmarks and Criterion microbenchmarks for query execution, trigram extraction, and index building.

Usage

Build the index

tgrep index . # index current directory
tgrep index /path/to/repo # index a specific repo
tgrep index . --index-path /tmp/idx # custom index location
tgrep index . --exclude vendor --exclude third_party # skip directories

Start the server

tgrep serve . # start server (auto-builds index if missing)
tgrep serve . --index-path /tmp/idx # custom index location
tgrep serve . --no-watch # skip file watcher (saves memory)
tgrep serve . --exclude node_modules # exclude directories from indexing

The server builds the index in the background if none exists, and serves queries immediately from partial data. Multiple clients can connect simultaneously.
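For readers who want to see what "JSON-RPC 2.0 over newline-delimited TCP" looks like on the wire, here is a self-contained sketch using only the standard library. The `search` method name, its `pattern` parameter, and the canned reply are assumptions for illustration; the README excerpt does not document tgrep's real method names or response shape. The toy server exists only so the example runs on its own.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{SocketAddr, TcpListener, TcpStream};
use std::thread;

/// Build one newline-terminated JSON-RPC 2.0 request.
/// The "search" method and "pattern" parameter are hypothetical.
fn search_request(id: u64, pattern: &str) -> String {
    // Escape backslashes and quotes only; control characters are not handled here.
    let escaped = pattern.replace('\\', "\\\\").replace('"', "\\\"");
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"search\",\"params\":{{\"pattern\":\"{}\"}}}}\n",
        id, escaped
    )
}

/// Send one request line and read one response line.
fn round_trip(addr: SocketAddr, request: &str) -> std::io::Result<String> {
    let mut stream = TcpStream::connect(addr)?;
    stream.write_all(request.as_bytes())?;
    let mut line = String::new();
    BufReader::new(stream).read_line(&mut line)?;
    Ok(line)
}

/// Toy server: one thread per connection, one canned reply per request line.
fn spawn_toy_server() -> std::io::Result<SocketAddr> {
    let listener = TcpListener::bind("127.0.0.1:0")?;
    let addr = listener.local_addr()?;
    thread::spawn(move || {
        for stream in listener.incoming() {
            if let Ok(stream) = stream {
                thread::spawn(move || {
                    let mut reader = BufReader::new(stream);
                    let mut line = String::new();
                    while reader.read_line(&mut line).map(|n| n > 0).unwrap_or(false) {
                        // Canned reply; a real server would echo the request id.
                        let reply = "{\"jsonrpc\":\"2.0\",\"id\":1,\"result\":[]}\n";
                        if reader.get_mut().write_all(reply.as_bytes()).is_err() {
                            break;
                        }
                        line.clear();
                    }
                });
            }
        }
    });
    Ok(addr)
}

fn main() -> std::io::Result<()> {
    let addr = spawn_toy_server()?;
    let reply = round_trip(addr, &search_request(1, "fn main"))?;
    print!("{}", reply);
    Ok(())
}
```

Newline framing keeps the client trivial: each message is exactly one line, so `read_line` is enough to delimit responses.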

Search

tgrep "pattern" . # basic regex search
tgrep "pattern" file1.rs file2.rs # search multiple files/paths
tgrep "TODO|FIXME" . # alternations
tgrep '\w+(?!_test)' . # PCRE-style lookahead fallback
tgrep "error" . -i # case-insensitive
tgrep "error" . -S # smart-case (auto if all lowercase)
tgrep -F "Vec" . # literal string
tgrep "MyStruct" . -l # filenames only
tgrep "pattern" . -c # count per file
tgrep "pattern" . -o # only matching text
tgrep "pattern" . -w # whole word
tgrep "pattern" . -v # invert match
tgrep "pattern" . -m 5 # max 5 matches per file
tgrep "pattern" . -g "*.rs" # glob filter
tgrep "pattern" . -g "*.rs" -g "*.toml" # multiple globs (OR)
tgrep "pattern" . -t rust # type filter
tgrep "pattern" . -e "also_this" # multiple patterns
tgrep "pattern" . -A 3 # 3 lines after match
tgrep "pattern" . -B 2 # 2 lines before match
tgrep "pattern" . -C 3 # 3 lines before & after
tgrep "pattern" . --json # JSON output
tgrep "pattern" . --vimgrep # vim-compatible output
tgrep "pattern" . --stats # show query plan & timing
tgrep "pattern" . --no-index # brute-force (skip index)
tgrep "pattern" . -U # multiline matching
tgrep "pattern" . -q # quiet: exit code only
tgrep "pattern" . -L # files that DON'T match
tgrep "pattern" . --no-filename # suppress filenames
tgrep "pattern" . -N # suppress line numbers
tgrep --files . # list searchable files
tgrep --files src/main.rs # list a single file if searchable
tgrep --files -t rust . # list Rust files only
tgrep --type-list # show all file types
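The smart-case note ("auto if all lowercase") can be read as the ripgrep-style rule: search case-insensitively unless the pattern contains an uppercase letter. A minimal sketch of that reading follows. The function name is hypothetical, and the rule is simplified: it does not special-case regex escapes such as `\W` or `\S`, which a real implementation would likely ignore when looking for uppercase letters.

```rust
/// Smart-case rule (simplified): match case-insensitively only when the
/// pattern contains no uppercase letter.
fn smart_case_insensitive(pattern: &str) -> bool {
    !pattern.chars().any(char::is_uppercase)
}

fn main() {
    println!("{}", smart_case_insensitive("error"));    // true
    println!("{}", smart_case_insensitive("MyStruct")); // false
}
```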

Check status

tgrep status .
Server status for /src/my-monorepo
PID: 37980
Port: 51043
Files: 152
Trigrams: 12265
Cache: 2/50000
Watcher: active
Indexing: complete

Count files

tgrep count-files . # count text files (no server needed)
tgrep count-files /path/to/repo # scan a specific repo

Prints the count to stdout (scriptable) and details to stderr:

284957
284957 text files (47516 binary skipped, 0 errors) in...
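The text/binary split reported by count-files presumably follows the file-walking rules listed under Performance (extension-based rejection, an 8KB content check, a 1MB size limit). The sketch below shows one way such a filter could be written. The extension list is a small made-up sample, and treating a NUL byte in the first 8KB as the binary signal is an assumption borrowed from common grep-like tools, not a documented tgrep behavior.

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

const MAX_FILE_SIZE: u64 = 1024 * 1024; // 1MB file size limit
const SNIFF_BYTES: usize = 8 * 1024; // 8KB content check

/// A few binary extensions for illustration; the README mentions 50+ formats.
const BINARY_EXTENSIONS: &[&str] = &["png", "jpg", "zip", "exe", "so", "pdf"];

/// Assumption: a NUL byte in the sniffed prefix marks the content as binary.
fn looks_binary(prefix: &[u8]) -> bool {
    prefix.contains(&0)
}

/// Cheap rejection by file extension, case-insensitive.
fn has_binary_extension(path: &Path) -> bool {
    path.extension()
        .and_then(|e| e.to_str())
        .map(|e| BINARY_EXTENSIONS.contains(&e.to_ascii_lowercase().as_str()))
        .unwrap_or(false)
}

/// Apply the three checks from cheapest to most expensive.
fn is_searchable(path: &Path) -> std::io::Result<bool> {
    if has_binary_extension(path) {
        return Ok(false);
    }
    if std::fs::metadata(path)?.len() > MAX_FILE_SIZE {
        return Ok(false);
    }
    let mut buf = Vec::with_capacity(SNIFF_BYTES);
    File::open(path)?
        .take(SNIFF_BYTES as u64)
        .read_to_end(&mut buf)?;
    Ok(!looks_binary(&buf))
}

fn main() -> std::io::Result<()> {
    let dir = std::env::temp_dir();
    let text = dir.join("tgrep_sketch_text.rs");
    let blob = dir.join("tgrep_sketch_blob.dat");
    std::fs::write(&text, "fn main() {}\n")?;
    std::fs::write(&blob, [0x7f, b'E', b'L', b'F', 0, 0])?;
    println!("{}", is_searchable(&text)?); // true
    println!("{}", is_searchable(&blob)?); // false
    std::fs::remove_file(&text)?;
    std::fs::remove_file(&blob)?;
    Ok(())
}
```

Ordering the checks this way means most binary files are rejected without ever being opened.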
