What does this repo signal mean?

Amazon (Nova) published amazon-science/Automatic-Table-to-Graph-Generation (Python). A repository signal like this one can surface tooling, evaluation, infrastructure, or model-adjacent work before it appears in a launch post. High-signal details: repo amazon-science/Automatic-Table-to-Graph-Generation · language Python · low-star research repo from Amazon, routine release. onlylabs links this event to 1 captured evidence page and 6 related repo signals.

Amazon (Nova) Repo: amazon-science/Automatic-Table-to-Graph-Generation

Captured source

GitHub: github.com/amazon-science/Automatic-Table-to-Graph-Generation

amazon-science/Automatic-Table-to-Graph-Generation repository metadata

Published Feb 24, 2025 · seen 5d · captured 8h · HTTP 200 · method plain

amazon-science/Automatic-Table-to-Graph-Generation

Language: Python

License: Apache-2.0

Stars: 18

Forks: 9

Open issues: 5

Created: 2025-02-24T21:11:35Z

Pushed: 2026-04-21T20:54:39Z

Default branch: main

Fork: no

Archived: no

README:

AutoG: Towards automatic graph construction from tabular data

Summary of updates for the Oct 8 version

Improved Docker configurations
Improved coding style
Added test suites under main/test_action_new.py
Fixed bugs raised by @galogm
Tested the new programs on all 4dbinfer datasets; issues and PRs are welcome if you find new bugs

Introduction

Overview

AutoG is a novel framework that addresses the critical challenge of automatically constructing high-quality graphs from tabular data for graph machine learning (GML) applications. While GML has seen tremendous growth, the crucial step of converting tabular data into meaningful graphs remains largely manual and unstandardized. AutoG leverages Large Language Models (LLMs) to automate this process, producing graphs that rival those created by human experts.

Key Features

Automatic graph schema generation without human intervention
LLM-based solution for high-quality graph construction
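
To make the goal concrete, here is a minimal sketch of what "constructing a graph from tabular data" means once a schema is known. The tables, the `schema` dictionary, and `build_graph` are all hypothetical names made up for illustration; they are not part of the AutoG codebase. In this toy version the primary and foreign keys are written by hand, which is the manual step AutoG aims to automate.

```python
# Toy tables: each table is a list of row dicts. All names are illustrative.
tables = {
    "users": [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}],
    "orders": [
        {"order_id": 10, "user_id": 1, "amount": 30.0},
        {"order_id": 11, "user_id": 2, "amount": 12.5},
        {"order_id": 12, "user_id": 1, "amount": 7.0},
    ],
}

# Hand-written schema: the part a human expert normally has to supply.
schema = {
    "primary_keys": {"users": "user_id", "orders": "order_id"},
    # (child table, referencing column, parent table)
    "foreign_keys": [("orders", "user_id", "users")],
}


def build_graph(tables, schema):
    """Return (nodes, edges): one node per row, one edge per foreign-key reference."""
    nodes = {}
    for table, rows in tables.items():
        pk = schema["primary_keys"][table]
        for row in rows:
            nodes[(table, row[pk])] = row

    edges = []
    for child, column, parent in schema["foreign_keys"]:
        child_pk = schema["primary_keys"][child]
        for row in tables[child]:
            target = (parent, row[column])
            if target in nodes:  # skip references to rows that do not exist
                edges.append(((child, row[child_pk]), target))
    return nodes, edges


nodes, edges = build_graph(tables, schema)
print(len(nodes), "nodes,", len(edges), "edges")  # 5 nodes, 3 edges
```

Each row becomes a node keyed by `(table, primary key)`, and each foreign-key value becomes an edge to the referenced row. Choosing which columns play these roles is the schema decision that determines the shape of the resulting graph.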

Installation

Step 1: Clone the Repository

git clone https://github.com/amazon-science/Automatic-Table-to-Graph-Generation

cd Automatic-Table-to-Graph-Generation/

Step 2: Install Core Dependencies

We recommend using Docker to prepare the environment.

docker build -t autog .

Then start a container with volume mappings so that the installation stays consistent:

docker run --gpus all -it -v ../Automatic-Table-to-Graph-Generation:/workspace -v ./opt:/opt autog /bin/bash

Then configure LD_LIBRARY_PATH inside the container:

export LD_LIBRARY_PATH="/opt/conda/envs/dbinfer-gpu/lib:$LD_LIBRARY_PATH"
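
If imports later fail with missing shared libraries, it can help to confirm that the export took effect in the current shell. The following is a small sketch for that check; `on_library_path` is a hypothetical helper, not something the repository provides. It only inspects the variable and does not change how libraries are loaded.

```python
import os


def on_library_path(directory, env=None):
    """Return True if `directory` is one of the entries in LD_LIBRARY_PATH."""
    env = os.environ if env is None else env
    # LD_LIBRARY_PATH is a colon-separated list; ignore empty entries.
    entries = [e for e in env.get("LD_LIBRARY_PATH", "").split(":") if e]
    wanted = os.path.normpath(directory)
    return any(os.path.normpath(e) == wanted for e in entries)


# Path taken from the export line above.
CONDA_LIB = "/opt/conda/envs/dbinfer-gpu/lib"
print("dbinfer-gpu lib dir on LD_LIBRARY_PATH:", on_library_path(CONDA_LIB))
```

Note that the variable must be set before the Python process starts; setting it from inside a running interpreter does not affect libraries the dynamic loader resolves for that process.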

Step 3: Optional Dependencies

These are required for development but not necessary if using cached LLM outputs:

pip install llama-index-llms-bedrock
pip install llama-index
pip install valentine

Usage

1. Dataset Preparation

First configure your Kaggle credentials, which are needed to download the ieee-cis dataset.
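
A quick way to check the credentials before starting the download is sketched below. It relies on the standard Kaggle API conventions: either the `KAGGLE_USERNAME` and `KAGGLE_KEY` environment variables, or a `kaggle.json` file in `~/.kaggle` (or in the directory named by `KAGGLE_CONFIG_DIR`). That `scripts/download.sh` reads credentials through these conventions is an assumption, and `kaggle_credentials_present` is a hypothetical helper.

```python
import os
from pathlib import Path


def kaggle_credentials_present(env=None, home=None):
    """Return True if Kaggle credentials are discoverable in the usual places."""
    env = os.environ if env is None else env

    # Option 1: credentials passed through environment variables.
    if env.get("KAGGLE_USERNAME") and env.get("KAGGLE_KEY"):
        return True

    # Option 2: a kaggle.json file, by default under ~/.kaggle.
    config_dir = env.get("KAGGLE_CONFIG_DIR")
    if config_dir:
        base = Path(config_dir)
    else:
        base = Path(home or os.path.expanduser("~")) / ".kaggle"
    return (base / "kaggle.json").is_file()


print("Kaggle credentials found:", kaggle_credentials_present())
```

The check only confirms that credentials exist, not that they are valid or that the account has accepted the rules of the competition hosting the dataset.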

Generate the preprocessed datasets:

bash scripts/download.sh

This creates two dataset versions:

Old Version: A baseline preprocessed version using basic heuristics
Expert Version: Human expert-generated version with optimized column naming

> Note: AutoG uses the 'old' version as input while ignoring the schema information.

2. Running AutoG

To run the AutoG pipeline:

bash scripts/autog.sh

For detailed configuration options, see scripts/autog.sh.

3. Running Graph Machine Learning

Execute GML tasks on the constructed graphs:

bash scripts/run.sh

Update: the pipeline now uses a non-API LLM workflow. You can run the script directly and paste in the output from a browser-based LLM for quick testing.
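
The copy-and-paste workflow described in the update can be pictured with the sketch below. It is an illustration only: how the repository's scripts actually prompt for and read the pasted text is not shown in this README, and `ask_llm_manually` with its `sentinel` convention is a hypothetical helper.

```python
def ask_llm_manually(prompt, read_line=input, write=print, sentinel="END"):
    """Show a prompt to the user and collect the pasted LLM reply.

    The reply may span several lines; a line equal to `sentinel` ends it.
    """
    write("----- copy this prompt into your LLM -----")
    write(prompt)
    write(f"----- paste the reply, then a line containing only {sentinel} -----")

    lines = []
    while True:
        try:
            line = read_line()
        except EOFError:  # end of input also ends the reply
            break
        if line.strip() == sentinel:
            break
        lines.append(line)
    return "\n".join(lines).strip()


# Non-interactive demo: a canned reply stands in for the pasted text.
canned = iter(["line one", "line two", "END"])
reply = ask_llm_manually(
    "Describe the schema.",
    read_line=lambda: next(canned),
    write=lambda *_: None,  # silence the banner in the demo
)
print(reply)
```

Passing `read_line` and `write` as parameters keeps the function testable: interactive use falls back to `input` and `print`, while a cached reply can be replayed without a terminal.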

Using AutoG with Custom Datasets

Follow these steps to apply AutoG to your own data:

1. Generate metadata information:

from models.llm.gconstruct import analyze_dataframes
metadata = analyze_dataframes(your_dataframe)

2. Generate initial type predictions:

from prompts import identify
types = identify(metadata)

3. Create a DBBRDBDataset wrapper for your data.

4. Generate first-round prompts using AutoG.
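
To give a feel for what steps 1 and 2 produce, the sketch below profiles the columns of a toy table and then guesses a coarse type for each one with simple rules. This is not how `analyze_dataframes` or `identify` work internally (the README does not show their output format); `profile_columns`, `guess_column_type`, and the type labels are hypothetical stand-ins, and the input is a plain list of row dicts rather than a dataframe.

```python
def profile_columns(rows, max_samples=3):
    """Summarize each column of a table given as a list of row dicts."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            col = columns.setdefault(name, {"values": [], "missing": 0})
            if value is None:
                col["missing"] += 1
            else:
                col["values"].append(value)

    summary = {}
    for name, col in columns.items():
        distinct = list(dict.fromkeys(col["values"]))  # unique, first-seen order
        summary[name] = {
            "n_rows": len(col["values"]) + col["missing"],
            "n_missing": col["missing"],
            "n_unique": len(distinct),
            "python_types": sorted({type(v).__name__ for v in col["values"]}),
            "samples": distinct[:max_samples],
        }
    return summary


def guess_column_type(name, stats, max_category_ratio=0.5):
    """Rule-based guess of a column's role; the labels are illustrative only."""
    n_present = stats["n_rows"] - stats["n_missing"]
    if n_present == 0:
        return "unknown"
    if name.lower() == "id" or name.lower().endswith("_id"):
        # Complete and unique ID columns look like primary keys.
        is_unique = stats["n_missing"] == 0 and stats["n_unique"] == stats["n_rows"]
        return "primary_key" if is_unique else "foreign_key"
    if set(stats["python_types"]) <= {"int", "float"}:
        return "numeric"
    if stats["n_unique"] / n_present <= max_category_ratio:
        return "category"
    return "text"


rows = [
    {"order_id": 10, "user_id": 1, "amount": 30.0, "status": "paid", "note": "gift"},
    {"order_id": 11, "user_id": 2, "amount": 12.5, "status": "paid", "note": "rush"},
    {"order_id": 12, "user_id": 1, "amount": None, "status": "open", "note": "fragile"},
    {"order_id": 13, "user_id": 3, "amount": 7.0, "status": "paid", "note": "n/a"},
]
metadata = profile_columns(rows)
types = {name: guess_column_type(name, stats) for name, stats in metadata.items()}
print(types)
```

Rules like these break down quickly on real data, for example on ID columns without an `_id` suffix, which is the kind of ambiguity an LLM-based approach is meant to handle.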

Citation

If you use AutoG in your research, please cite:

@inproceedings{
chen2025autog,
title={AutoG: Towards automatic graph construction from tabular data},
author={Zhikai Chen and Han Xie and Jian Zhang and Xiang Song and Jiliang Tang and Huzefa Rangwala and George Karypis},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=hovDbX4Gh6}
}

Notability

Notability score: 3.0/10

Low-star research repo from Amazon, routine release