microsoft/smc-clustering

Description: Scalable Model-Based Clustering with Sequential Monte Carlo

Language: Jupyter Notebook

License: MIT

Stars: 2

Forks: 0

Open issues: 13

Created: 2024-02-14T10:27:21Z

Pushed: 2026-06-20T00:51:23Z

Default branch: main

Fork: no

Archived: no

README:

Scalable Model-Based Clustering with Sequential Monte Carlo

This repository contains an implementation of the split SMC algorithm proposed in the paper _Scalable Model-Based Clustering with Sequential Monte Carlo_ as well as code for reproducing the experiments.

Split SMC is an online, Bayesian clustering algorithm that decomposes large problems into approximately independent subproblems, producing a compact representation of the posterior distribution over clusterings. This allows it to scale to tasks with complex cluster likelihoods and a large number of clusters, where existing methods struggle due to prohibitive memory or computational cost. Our approach is motivated by the knowledge base construction problem, which requires clustering large datasets of free-text fragments of information.
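To make "online, Bayesian clustering" concrete, here is a minimal sketch of a plain SMC sampler for a one-dimensional Dirichlet process mixture of Gaussians. It is not the split SMC algorithm and not this repository's API: it has no split step, and every name and default (smc_cluster, predictive, co_clustering_prob, alpha, the prior and noise variances) is a hypothetical choice made for illustration. Each particle is one clustering hypothesis, and the weighted particle set approximates the posterior over clusterings.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n, total, prior_mean=0.0, prior_var=25.0, noise_var=1.0):
    """Posterior predictive density of x for a cluster of n points summing to total.

    Assumes a conjugate normal prior on the cluster mean and known noise variance.
    """
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + total / noise_var)
    return normal_pdf(x, post_mean, post_var + noise_var)


def normalise(log_w):
    """Turn log-weights into normalised weights, guarding against underflow."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    z = sum(w)
    return [v / z for v in w]


def smc_cluster(data, num_particles=100, alpha=1.0, seed=0):
    """Process observations one at a time, returning (particles, weights)."""
    rng = random.Random(seed)
    # Each particle stores its labels and per-cluster (count, sum) statistics.
    particles = [{"labels": [], "stats": []} for _ in range(num_particles)]
    log_w = [0.0] * num_particles
    for i, x in enumerate(data):
        for p_idx, p in enumerate(particles):
            # Chinese restaurant process prior times predictive likelihood,
            # for every existing cluster and for one new cluster.
            scores = [n / (i + alpha) * predictive(x, n, s) for n, s in p["stats"]]
            scores.append(alpha / (i + alpha) * predictive(x, 0, 0.0))
            # Sample the assignment from its conditional posterior; the
            # incremental weight is then the sum of the scores.
            k = rng.choices(range(len(scores)), weights=scores)[0]
            if k == len(p["stats"]):
                p["stats"].append((1, x))
            else:
                n, s = p["stats"][k]
                p["stats"][k] = (n + 1, s + x)
            p["labels"].append(k)
            log_w[p_idx] += math.log(sum(scores))
        # Resample when the effective sample size drops below half.
        w = normalise(log_w)
        ess = 1.0 / sum(v * v for v in w)
        if ess < num_particles / 2:
            idx = rng.choices(range(num_particles), weights=w, k=num_particles)
            particles = [
                {"labels": list(particles[j]["labels"]),
                 "stats": list(particles[j]["stats"])}
                for j in idx
            ]
            log_w = [0.0] * num_particles
    return particles, normalise(log_w)


def co_clustering_prob(particles, weights, a, b):
    """Estimated posterior probability that observations a and b share a cluster."""
    return sum(w for p, w in zip(particles, weights)
               if p["labels"][a] == p["labels"][b])


if __name__ == "__main__":
    data = [-5.1, -4.9, -5.0, 5.0, 5.2, 4.8, -5.2, 5.1]
    particles, weights = smc_cluster(data)
    print(co_clustering_prob(particles, weights, 0, 3))
```

Every particle in this sketch keeps a full clustering of all observations, so memory grows with both the data size and the particle count. This is the kind of cost that becomes prohibitive at scale, which the description above says split SMC addresses by decomposing the problem into approximately independent subproblems.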

Authors: Connie Trojan, Pavel Myshkov, Paul Fearnhead, James Hensman, Tom Minka, Christopher Nemeth

This work is a collaboration between Microsoft Research and Lancaster University.

Setup

The recommended way to install the code is to clone the repository and install everything via uv.

1. Create the virtual environment and install the package:

uv sync --locked

2. Optional: install the MS-KeBAB package for REBEL-based experiments. This dependency is only needed for the evaluation scripts that use the REBEL datasets. Assuming the MS-KeBAB project is cloned in the parent directory of this repository, you can install it as follows:

uv pip install --no-deps -e ../MS-KeBAB
