What does this repo signal mean?

Amazon (Nova) published amazon-science/avgen-eval-toolkit, a Python repository. A repository signal like this surfaces tooling, evaluation, infrastructure, or model-adjacent work before it may appear in a launch post. High-signal details: repo amazon-science/avgen-eval-toolkit · language Python. onlylabs links this event to 1 captured evidence page and 6 related repo signals, and maps it to "Evals and quality" in the data-business radar.

Amazon (Nova) Repo: amazon-science/avgen-eval-toolkit

Captured source

GitHub: github.com/amazon-science/avgen-eval-toolkit

amazon-science/avgen-eval-toolkit repository metadata

published Oct 22, 2023 · seen 5d · captured 8h · http 200 · method plain

amazon-science/avgen-eval-toolkit

Language: Python

License: Apache-2.0

Stars: 19

Forks: 5

Open issues: 6

Created: 2023-10-22T02:02:11Z

Pushed: 2026-02-05T18:29:50Z

Default branch: main

Fork: no

Archived: no

README:

Audio-Visual Synchrony Evaluation

Paper: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores

If you use this work, please cite:

@misc{goncalves2024peavs,
title={PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores},
author={Lucas Goncalves and Prashant Mathur and Chandrashekhar Lavania and Metehan Cekic and Marcello Federico and Kyu J. Han},
year={2024},
eprint={2404.07336},
archivePrefix={arXiv},
primaryClass={cs.CV}
}

Environment

Create the environment from spec-file.txt and install the pip requirements:

conda create --name eval_toolkit --file spec-file.txt
source activate eval_toolkit
pip install -r requirements.txt

Make sure command-line FFmpeg 6.0 is installed. To check, type ffmpeg -version in a terminal and press Enter.
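The FFmpeg check can also be scripted. The sketch below is not part of the toolkit: the helper names parse_ffmpeg_version and installed_ffmpeg_version are hypothetical, and the regular expression assumes the usual "ffmpeg version X.Y ..." banner on the first line of the output. Builds with an unusual version string may not be recognized.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_version(banner: str) -> Optional[str]:
    """Extract a dotted version such as '6.0' from `ffmpeg -version` output."""
    # Some builds prefix the number with 'n' (e.g. 'n6.1.1'); tolerate that.
    match = re.search(r"ffmpeg version\s+n?(\d+(?:\.\d+)*)", banner)
    return match.group(1) if match else None


def installed_ffmpeg_version() -> Optional[str]:
    """Return the FFmpeg version found on PATH, or None if unavailable."""
    if shutil.which("ffmpeg") is None:
        return None
    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=False
    )
    return parse_ffmpeg_version(result.stdout)


if __name__ == "__main__":
    version = installed_ffmpeg_version()
    if version is None:
        print("FFmpeg not found on PATH, or its version string was not recognized")
    elif version.split(".")[0] != "6":
        print(f"Found FFmpeg {version}; this toolkit asks for 6.0")
    else:
        print(f"Found FFmpeg {version}")
```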

Activate environment with source activate eval_toolkit

Video Evaluation

This toolbox includes the following metrics:

FVD: Frechet Video Distance
FID: Frechet Inception distance, realized by inceptionv3
KID: Kernel Inception Distance
LPIPS: Learned Perceptual Image Patch Similarity
MiFID: Memorization-Informed Frechet Inception Distance
SSIM: Structural Similarity Index Measure
MS-SSIM: Multi-Scale SSIM
PSNR: Peak Signal-to-Noise Ratio
PSNRB: Peak Signal To Noise Ratio With Blocked Effect
VMAF: Video Multi-Method Assessment Fusion
VIF: Visual Information Fidelity
CLIP-Score: Implemented with a CLIP ViT model
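As a point of reference for the simplest metric in this list, PSNR is defined as 10 * log10(MAX^2 / MSE). The following is a minimal pure-Python sketch of that textbook formula for a single pair of flattened frames. It is an illustration only: the function name psnr and the default data range of 255 are assumptions, and the toolkit's own implementation may handle data range and per-frame averaging differently.

```python
import math
from typing import Sequence


def psnr(pred: Sequence[float], target: Sequence[float], max_value: float = 255.0) -> float:
    """PSNR in dB between two equally sized, flattened frames."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all pixel values.
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return math.inf  # identical frames
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the generated frame is closer to the target; identical frames give infinity.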

Running the metric

python3 run_video_eval.py --preds_folder /path/to/generated/videos --target_folder /path/to/the/target/videos --num_frames {Number of frames in your video or to be used for evaluation} --output path/to/NAME_YOUR_RESULTS_FILE.txt

NOTE:

Using --metrics you can specify the metrics you want to run ['fvd', 'ssim', 'psnr', 'psnrb', 'lpips', 'ms_ssim', 'clip', 'fid', 'kid', 'mifid', 'vmaf', 'vif'].
To run CLIP-Score, name your video as per the caption used to generate it (e.g., “A horse running on a road” becomes A_horse_running_on_a_road.mp4).
Ensure the /path/to/generated/videos and /path/to/the/target/videos directories contain an equal number of files with matching names.
With --clip_model, choose the version of the CLIP model to use. Default is ‘openai/clip-vit-base-patch16’.
With --net, specify the backbone network for LPIPS (options: ‘alex’, ‘vgg’, ‘squeeze’. Default is ‘alex’).
With --feat_layer, select the inceptionv3 feature layer for FID, KID, and MiFID computation (options: 64, 192, 768, 2048. Default is 64).
With --calculate_per_frame, set the frame step for each metric computation (e.g., for a 32-frame video and an argument of 8, metrics will be computed every 8 frames). Default is 8. All metrics will run over the entire video regardless of this argument.
If you have issues running VMAF, please ensure you have command-line FFmpeg 6.0 and ffmpeg-quality-metrics installed.
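Two of the notes above, caption-based file names for CLIP-Score and matching file names across both folders, are easy to check before a long run. The helpers below are hypothetical and not part of the toolkit; the naming rule (whitespace replaced by underscores, extension appended) is inferred from the single example given and may not cover punctuation or other edge cases.

```python
from pathlib import Path
from typing import Set, Tuple


def caption_to_filename(caption: str, extension: str = ".mp4") -> str:
    """Replace runs of whitespace with underscores and append the extension."""
    return "_".join(caption.split()) + extension


def unmatched_files(preds_folder: str, target_folder: str) -> Tuple[Set[str], Set[str]]:
    """Return (names only in preds_folder, names only in target_folder)."""
    preds = {p.name for p in Path(preds_folder).iterdir() if p.is_file()}
    targets = {p.name for p in Path(target_folder).iterdir() if p.is_file()}
    return preds - targets, targets - preds
```

If both returned sets are empty, the two folders hold the same file names and the same number of files.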

Audio Evaluation

This toolbox includes the following metrics:

FAD: Frechet audio distance
ISc: Inception score
FD: Frechet distance, realized by PANNs, a state-of-the-art audio classification model
KL: KL divergence (softmax over logits)
KL_Sigmoid: KL divergence (sigmoid over logits)
SI_SDR: Scale-Invariant Signal-to-Distortion Ratio
SDR: Signal-to-Distortion Ratio
SI_SNR: Scale-Invariant Signal-to-Noise Ratio
SNR: Signal-to-Noise Ratio
PESQ: Perceptual Evaluation of Speech Quality
STOI: Short-Time Objective Intelligibility
CLAP-Score: Implemented with LAION-AI/CLAP
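To make one of the reference-based metrics concrete, here is a minimal pure-Python sketch of the standard SI-SDR definition: the reference is rescaled by the factor that best matches the estimate, and the ratio of the rescaled reference energy to the residual energy is reported in dB. The function name si_sdr is an assumption, no zero-mean step is applied, and the toolkit's own implementation may differ in such details.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float]) -> float:
    """Scale-invariant SDR in dB (no zero-mean normalization applied)."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and the same length")
    ref_energy = sum(r * r for r in reference)
    if ref_energy == 0:
        raise ValueError("reference signal is silent")
    # Optimal scale that projects the estimate onto the reference.
    alpha = sum(e * r for e, r in zip(estimate, reference)) / ref_energy
    target = [alpha * r for r in reference]
    target_energy = sum(t * t for t in target)
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    if noise_energy == 0:
        return math.inf  # estimate is an exact scaled copy of the reference
    if target_energy == 0:
        return -math.inf  # estimate is orthogonal to the reference
    return 10.0 * math.log10(target_energy / noise_energy)
```

Because of the rescaling step, multiplying the estimate by a constant gain leaves the score unchanged, which is what distinguishes SI_SDR from plain SDR.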

Running the metric

python3 run_audio_eval.py --preds_folder /path/to/generated/audios --target_folder /path/to/the/target_audios --metrics SI_SDR SDR SI_SNR SNR PESQ STOI CLAP FAD ISC FD KL --results NAME_YOUR_RESULTS_FILE.txt

NOTE:

Using --metrics, you can specify the metrics to run [SI_SDR, SDR, SI_SNR, SNR, PESQ, STOI, CLAP, FAD, ISC, FD, KL].
If no /path/to/the/target/audios is provided, only reference-free metrics will be run on your audios.
For CLAP-Score, name your audio as per the caption used to generate it (e.g., “A car revving while honking” becomes A_car_revving_while_honking.wav).
Ensure /path/to/generated/audios and /path/to/the/target/audios directories contain an equal number of files with matching names. Otherwise, only reference-free metrics will be run.
Using --clap_model, you can specify the CLAP model id to be used for CLAP-Score computation:
clap_model = 0: 630k non-fusion ckpt
clap_model = 1: 630k+audioset non-fusion ckpt
clap_model = 2: 630k fusion ckpt
clap_model = 3: 630k+audioset fusion ckpt

For general audio shorter than 10 seconds, use 0 or 1; for general audio of variable length, use 2 or 3. More models can be used with a few modifications (please refer to https://github.com/LAION-AI/CLAP).
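The --clap_model guidance above can be written as a small lookup. This helper is hypothetical and not part of the toolkit. It assumes that "variable-length" means at least one clip of 10 seconds or longer, and it leaves the choice between the 630k and 630k+audioset checkpoints to the caller, since the README does not say which to prefer.

```python
from typing import Iterable


def suggest_clap_model(durations_sec: Iterable[float], use_audioset: bool = False) -> int:
    """Map clip durations to a --clap_model id following the README note."""
    durations = list(durations_sec)
    if not durations:
        raise ValueError("need at least one clip duration")
    all_short = all(d < 10.0 for d in durations)
    # ids 0/1 are the non-fusion checkpoints, ids 2/3 the fusion checkpoints.
    base = 0 if all_short else 2
    # Odd ids are the 630k+audioset variants.
    return base + (1 if use_audioset else 0)
```

For example, a folder of clips that are all under 10 seconds maps to 0 (or 1 with use_audioset=True), while a folder that mixes short and long clips maps to 2 or 3.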

Distortions Generation

This script applies specified distortions to audio, video, or audio-visual media.

Dependencies

Create the environment from the environment_distortions.yml file:

conda env create -f environment_distortions.yml

Getting Started

2. Run the script:

python3 [YOUR SCRIPT NAME].py --input_path [PATH_TO_MEDIA_FOLDER] --dest_path [PATH_TO_DESTINATION_FOLDER] --media_type [TYPE_OF_MEDIA] [OTHER_OPTIONS]

Arguments

--input_path: Path to the folder containing media files you want to apply distortions to.
--dest_path: Path to the destination folder where the distorted media files will be saved.
--media_type: Specify the type of media you want to distort. Options are:
audios
videos
audiovisual

Optional Distortion Type Arguments

Choose specific distortion types or use the default ones:

--audio_distortions: List of distortion types for audio content. (e.g., gaussian, pops, low_pass, etc.)
--visual_distortions: List of distortion types for visual content. (e.g., speed_up, black_rectangles, gaussian_noise, etc.)
--audiovisual_distortions: List of distortion types for audio-visual content. (e.g., audio_shift, audio_speed_up, video_speed_up, etc.)
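A command-line interface with these arguments could be declared roughly as follows. This is a sketch, not the script's actual code: marking the three path and type arguments as required, using nargs="+" for the distortion lists, and using None to stand for "use the defaults" are all assumptions, and the function name build_parser is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the arguments described above (assumed layout)."""
    parser = argparse.ArgumentParser(description="Apply distortions to media files")
    parser.add_argument("--input_path", required=True,
                        help="folder containing the media files to distort")
    parser.add_argument("--dest_path", required=True,
                        help="folder where distorted files are saved")
    parser.add_argument("--media_type", required=True,
                        choices=["audios", "videos", "audiovisual"])
    # Each distortion list takes one or more values; None means "not specified".
    parser.add_argument("--audio_distortions", nargs="+", default=None)
    parser.add_argument("--visual_distortions", nargs="+", default=None)
    parser.add_argument("--audiovisual_distortions", nargs="+", default=None)
    return parser


if __name__ == "__main__":
    # Parse a sample argument list instead of sys.argv for demonstration.
    args = build_parser().parse_args([
        "--input_path", "in", "--dest_path", "out",
        "--media_type", "audios",
        "--audio_distortions", "gaussian", "pops",
    ])
    print(args.media_type, args.audio_distortions)
```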

Note: Each distortion argument accepts multiple values. For example:…

Excerpt shown — open the source for the full document.