
amazon-science/conversational-ambiguous-unanswerable-text2sql

Language: Jupyter Notebook

License: NOASSERTION

Stars: 5

Forks: 1

Open issues: 13

Created: 2024-11-19T14:18:29Z

Pushed: 2026-04-21T21:03:10Z

Default branch: main

Fork: no

Archived: no

README:

PRACTIQ: Comprehensive Setup and Usage Guide

This is a guide for setting up and running the PRACTIQ (Ambiguous and Unanswerable Text-to-SQL) data generation pipeline.

Paper: PRACTIQ: A Practical Conversational Text-to-SQL dataset with Ambiguous and Unanswerable Queries (Dong et al., NAACL 2025)

Table of Contents

1. [Overview](#overview)
2. [Python Environment Setup](#python-environment-setup)
3. [Project Structure](#project-structure)
4. [Running the Pipeline](#running-the-pipeline)
5. [Testing and Verification](#testing-and-verification)
6. [Advanced Usage](#advanced-usage)

---

Python Environment Setup

⚠️ Platform Support Notice:

  • Ubuntu/Linux (Recommended): tested on Ubuntu 20.04 with Python 3.10
  • macOS: may encounter compatibility issues with the typer library's command-line argument parsing. If a script fails, changing the affected typer.Option parameters to typer.Argument may fix it.
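
The macOS workaround can be illustrated with a minimal sketch. The command names and the `split` parameter are hypothetical and are not taken from the repository's scripts; only `typer.Option`, `typer.Argument`, and `typer.testing.CliRunner` are real typer APIs. The first command receives its value as a named flag, the second as a positional argument.

```python
import typer
from typer.testing import CliRunner

# Two single-command apps, so each can be invoked without a subcommand name.
option_app = typer.Typer()
argument_app = typer.Typer()


@option_app.command()
def run_with_option(split: str = typer.Option("dev", "--split")) -> None:
    """Hypothetical entry point: the value is passed as `--split train`."""
    typer.echo(f"split={split}")


@argument_app.command()
def run_with_argument(split: str = typer.Argument("dev")) -> None:
    """Same entry point after the workaround: the value is positional."""
    typer.echo(f"split={split}")


def demo() -> tuple:
    """Invoke both variants in-process and return what each one printed."""
    runner = CliRunner()
    with_option = runner.invoke(option_app, ["--split", "train"])
    with_argument = runner.invoke(argument_app, ["train"])
    return with_option.output.strip(), with_argument.output.strip()


if __name__ == "__main__":
    print(demo())
```

If a script is changed this way, any shell command that calls it has to pass the value positionally instead of with the flag.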

Requirements:

  • Python Version: 3.10.x (REQUIRED; Python 3.11+ will not work because of dependency incompatibilities)
  • Package Manager: uv (recommended, 5x faster than pip) or pip
  • API Access: AWS Bedrock access with bearer token authentication. To use another LLM provider or a local model through litellm instead, edit src/litellm_helpers.py.

IMPORTANT: The codebase requires Python 3.10 and was developed/tested on Ubuntu Linux.
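
Because the version requirement is strict, a quick guard before running anything can save a failed install. The following is a minimal sketch, not part of the repository; the function name `is_supported` is an assumption introduced for illustration.

```python
import sys

REQUIRED = (3, 10)  # major and minor version stated in the requirements above


def is_supported(version_info=sys.version_info) -> bool:
    """Return True only for Python 3.10.x."""
    return (version_info[0], version_info[1]) == REQUIRED


if __name__ == "__main__":
    if not is_supported():
        found = ".".join(map(str, sys.version_info[:3]))
        raise SystemExit(f"Python 3.10.x is required, found {found}")
    print("Python version OK")
```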

Step 1: Clone the repo

Example project structure after cloning, setup, and data generation:

ambi-unans-text-to-sql/
├── .venv/ # Python virtual environment
├── .vscode/
│   ├── .env # Authentication credentials (not in git)
│   ├── settings.json # VS Code Python interpreter config
│   ├── combined_data_all/ # requires manual downloading of the SPIDER data
│   │   └── spider/ # Spider dataset (dev.json, train.json, etc.)
│   │       ├── dev.json # Dev set for answerable questions
│   │       ├── tables.json # Database schema definitions
│   │       └── database/ # SQLite database files
│   └── output-YYYYMMDD_HHMMSS/ # Generated data (timestamped directories)
│       └── dev/ # Per-category JSONL files
├── src/ # Source code
│   ├── ambiguous/ # 4 ambiguous categories
│   │   ├── ambiguous_SELECT_column/
│   │   ├── ambiguous_VALUES_within_column/
│   │   ├── ambiguous_VALUES_across_columns/
│   │   └── vague_filter_term/
│   ├── unanswerable/ # 4 unanswerable categories
│   │   ├── nonexistent_select_column/
│   │   ├── nonexistent_value/
│   │   ├── nonexistent_where_column/
│   │   └── unsupported_joins/
│   ├── experiment/ # Evaluation and classification scripts
│   ├── litellm_helpers.py # LiteLLM unified interface
│   ├── custom_sql_engine.py # SQL execution engine
│   ├── combine_all_data_together.py # Stage 2: Combine categories
│   ├── contextualize_and_explain_execution_results.py # Stage 4: Finalize
│   └── utils.py # Utility functions
├── logs/ # Execution logs (created during runs)
├── test/ # Unit tests
├── test_per_category.sh # Main test script for E2E pipeline
├── requirements.txt # Python dependencies
└── README.md # Project overview

Key files:

  • `test_per_category.sh`: Orchestrates the entire pipeline (see next section)
  • `src/litellm_helpers.py`: Configures LiteLLM with 8 model endpoints
  • `src/combine_all_data_together.py`: Merges the 8 categories with the answerable questions
  • `src/contextualize_and_explain_execution_results.py`: Adds natural language explanations
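
To make the layout concrete, here is a rough sketch of how the newest timestamped output directory could be located and its per-category JSONL files merged. It is not the logic of `src/combine_all_data_together.py`. The file naming (`<category>.jsonl`), the added `category` and `kind` fields, and both function names are assumptions for illustration; only the category names come from the tree above.

```python
import json
from datetime import datetime
from pathlib import Path
from typing import Optional

# Category names copied from the src/ tree; the "kind" labels are illustrative.
CATEGORY_KINDS = {
    "ambiguous_SELECT_column": "ambiguous",
    "ambiguous_VALUES_within_column": "ambiguous",
    "ambiguous_VALUES_across_columns": "ambiguous",
    "vague_filter_term": "ambiguous",
    "nonexistent_select_column": "unanswerable",
    "nonexistent_value": "unanswerable",
    "nonexistent_where_column": "unanswerable",
    "unsupported_joins": "unanswerable",
}


def latest_output_dir(root: Path) -> Optional[Path]:
    """Return the newest output-YYYYMMDD_HHMMSS directory under root, if any."""
    stamped = []
    for path in root.glob("output-*"):
        try:
            stamp = datetime.strptime(path.name, "output-%Y%m%d_%H%M%S")
        except ValueError:
            continue  # name does not follow the timestamp pattern
        if path.is_dir():
            stamped.append((stamp, path))
    if not stamped:
        return None
    return max(stamped, key=lambda item: item[0])[1]


def merge_categories(split_dir: Path) -> list:
    """Read <category>.jsonl files (assumed naming) and tag each record."""
    merged = []
    for category, kind in CATEGORY_KINDS.items():
        jsonl_path = split_dir / f"{category}.jsonl"
        if not jsonl_path.exists():
            continue
        with jsonl_path.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # tolerate blank lines between records
                    record = json.loads(line)
                    record["category"] = category
                    record["kind"] = kind
                    merged.append(record)
    return merged
```

Pass `latest_output_dir` whichever directory holds the `output-*` folders, then call `merge_categories` on its `dev/` subdirectory.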

Step 2: Authentication Configuration: Create .env File

Security Note: Ensure the .vscode/.env file is excluded from version control via .gitignore. Never commit authentication credentials to the repository.

The project uses AWS Bedrock for LLM API calls. Authentication is handled via bearer token stored in an environment file. You can modify the implementation to use other LLMs by changing src/litellm_helpers.py.
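
As an illustration of how a script might read such an environment file, here is a small dependency-free sketch. It is not the repository's loader. The helper names are hypothetical, and the variable name `EXAMPLE_BEARER_TOKEN` in the usage note is a placeholder; use whatever name `src/litellm_helpers.py` actually expects.

```python
import os
from pathlib import Path


def load_env_file(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop surrounding single or double quotes from the value.
        values[key] = value.strip().strip("'\"")
    return values


def export_to_environment(values: dict) -> None:
    """Copy parsed values into os.environ without overwriting existing ones."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

A typical call would be `export_to_environment(load_env_file(Path(".vscode/.env")))`, after which `os.environ["EXAMPLE_BEARER_TOKEN"]` (placeholder name) holds the token. Using `setdefault` means a value already exported in the shell takes precedence over the file.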

# Create the .vscode directory if it doesn't exist
mkdir -p .vscode

# Create the .env file
cat > .vscode/.env

**⚠️ Note for macOS Users:** This section is ONLY for users who want to run this project on macOS (especially Apple Silicon/ARM64). If you are using Ubuntu/Linux, please follow the main installation instructions above.

For macOS users, `requirements-macos.txt` lists updated versions of torch, transformers, tokenizers, huggingface-hub, and other packages that have prebuilt wheels.
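
The choice between the two requirements files can be expressed as a short sketch. The function name `requirements_file` is hypothetical; `platform.system()` is the standard-library call that reports `"Darwin"` on macOS and `"Linux"` on Ubuntu.

```python
import platform
from typing import Optional


def requirements_file(system: Optional[str] = None) -> str:
    """Return the requirements file this guide names for the given OS."""
    system = system or platform.system()
    return "requirements-macos.txt" if system == "Darwin" else "requirements.txt"


if __name__ == "__main__":
    # Print the install command for the current machine.
    print(f"pip install -r {requirements_file()}")
```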

### Installation Steps

Navigate to the project root

cd /path/to/ambi-unans-text-to-sql-cloned-repo

Create a conda environment with Python 3.10

conda create -n ambi-text-to-sql python=3.10 -y

Activate the conda environment

conda activate ambi-text-to-sql

Verify Python version

python --version # Should show Python 3.10.x

Install dependencies using the macOS requirements file

pip install -r requirements-macos.txt

On subsequent uses:

conda activate ambi-text-to-sql

---

## Citation

If you find this work useful or use this codebase, please cite:

@inproceedings{dong-etal-2025-practiq,
    title = "{PRACTIQ}: A Practical Conversational Text-to-{SQL} dataset with Ambiguous and Unanswerable Queries",
    author = "Dong, Mingwen and Ashok Kumar, Nischal and others",
    editor = "Chiruzzo, Luis and Ritter, Alan and Wang, Lu",
    booktitle = "Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = apr,
    year = "2025",
    address = "Albuquerque, New Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.naacl-long.13/",
    doi = "10.18653/v1/2025.naacl-long.13",
    pages = "255--273",
    ISBN = "979-8-89176-189-6"
}
