{"schema_version":"onlylabs.public_analysis_evidence.v1","title":"Together AI analysis evidence pack","description":"Public onlylabs evidence pack for cited agent analysis: captured pages, ranked public signals, and stored web-search provenance used by the background analysis workflow.","url":"https://onlylabs.fyi/labs/together-ai","json_url":"https://onlylabs.fyi/analysis/together-ai/evidence.json","generated_at":"2026-06-11T18:08:18.364Z","org":{"slug":"together-ai","name":"Together AI","category":"neocloud","category_label":"Neocloud","dossier_url":"https://onlylabs.fyi/labs/together-ai"},"analysis":null,"workflow":{"version":"onlylabs-deepagents-analysis-v3","provider":null,"model":null,"agent":null,"public_pack_mode":"local-pages-and-events","live_web_fetches":false,"note":"Public evidence exports do not trigger live Exa calls; stored Exa provenance is included when analysis metadata contains it."},"stats":{"pages":28,"events":140,"web":0,"evidence":88,"signal_desks":{"hiring":22,"forks":9,"releases":15,"talking":12,"repos":2},"data_radar_lanes":null,"data_radar_matches":null,"stored_analysis_evidence":null,"stored_analysis_web":null,"stored_analysis_signal_desks":null,"stored_analysis_data_radar_lanes":null,"stored_analysis_data_radar_matches":null},"stored_web_provenance":null,"evidence":[{"ref":"P1","kind":"page","title":"Building trust in enterprise AI: Together AI earns ISO 27001:2022 certification","date":"2026-06-11T07:04:02.334072+00:00","date_source":null,"source_url":"https://www.together.ai/blog/iso-27001-2022-certification","signal_url":null,"signal_json_url":null,"text":"Building trust in enterprise AI: Together AI earns ISO 27001:2022 certification \n\n🚀 Now serving MiniMax-M3 for efficient inference →\n\n⚡ On-demand B200s now available on Together GPU Clusters →\n\n📊 Delivering 31% more TPS than the next-fastest OSS engine for production coding agent workloads →\n\n💬 How Together built the world's fastest speech-to-text stack →\n\n🇫🇷 Join 
us at RAISE 2026 in Paris →\n\nAll blog posts\n\nCompany\n\nPublished 6/10/2026 \n\nBuilding trust in enterprise AI: Together AI earns ISO 27001:2022 certification\n\nISO 27001:2022 builds on our existing compliance program and reinforces our commitment to helping customers run production-grade AI workloads on secure, well-governed infrastructure.\n\nAuthors\n\nLisa Ruggiero, Derek Chamorro\n\nTable of contents\n\n40+ Models Chosen for Production...40+ Models Chosen for Production...40+ Models Chosen for Production...\n\nLinks in this article\n\nTrust Center \n\nSummary\n\nTogether AI has received an ISO 27001:2022 certification from A-LIGN Compliance and Security, Inc., an ANAB-accredited certification body, confirming that our Information Security Management System (ISMS) meets the latest international standard for information security management. This milestone builds on our existing compliance program and reinforces our commitment to helping customers run production-grade AI workloads on secure, well-governed infrastructure.\nThe certification reflects a comprehensive, multi-month assessment of how we manage risk, secure data, and continuously improve security across our organization and platform.\n\nScope of the certification\nThe ISO 27001:2022 certification is scoped to the ISMS supporting Together AI’s global platform, including the systems, processes, and controls that protect customer data and platform operations. This scope covers our corporate headquarters as well as the security of third‑party data centers that provide hosting and colocation services for Together AI’s infrastructure.\nWhat ISO means for customers\nISO 27001:2022 is the leading international standard for establishing, implementing, maintaining, and continually improving an ISMS. 
By certifying our ISMS to this standard, an independent auditor has"},{"ref":"P2","kind":"page","title":"togethercomputer/Port_FasterTransformer repository metadata","date":"2026-06-11T04:19:31.952944+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/Port_FasterTransformer","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/Port_FasterTransformer\n\nDescription: Transformer related optimization, including BERT, GPT\n\nLanguage: C++\n\nLicense: Apache-2.0\n\nStars: 1\n\nForks: 1\n\nOpen issues: 1\n\nCreated: 2022-10-30T13:37:30Z\n\nPushed: 2023-05-28T04:48:28Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: NVIDIA/FasterTransformer\n\nArchived: no\n\nREADME:\n# Port_FasterTransformer \n\nTo bring up a standalone node:\n\n```console\nmkdir .together\ndocker run --rm --gpus all --ipc=host \\\n-e NUM_WORKERS=auto \\\n-e CUDA_VISIBLE_DEVICES=$CUDA_VISIBLE_DEVICES \\\n-v $PWD/.together:/home/user/.together \\\n-it togethercomputer/fastertransformer /usr/local/bin/together start \\\n--color --config /home/user/cfg.yaml --worker.model GPT-JT-6B-v1-tp1\n```\n\n```console\ndocker run --rm --gpus '\"device=3,4\"' --ipc=host \\\n-e NUM_WORKERS=auto \\\n-v $PWD/.together:/home/user/.together \\\n-it togethercomputer/fastertransformer /usr/local/bin/together start \\\n--color --config /home/user/cfg.yaml --worker.model opt-13b-tp2\n```\n\n# Development commands\n\n```console\ndocker build -t port_ft_gpt_jt -f GPT-JT-Dockerfile \n\nnvidia-docker run --ipc=host --network=host --name port_ft -ti -v /root/fm/models/ft_model:/workspace/Port_FasterTransformer/build/model -v /root/fm/dev/Port_FasterTransformer/examples/:/workspace/Port_FasterTransformer/examples -v /root/fm/dev/Port_FasterTransformer/src/fastertransformer:/workspace/Port_FasterTransformer/src/fastertransformer port_ft bash\n\nnvidia-docker run --ipc=host --network=host --name port_ft -ti -v /home/binhang/active/ft_model:/workspace/Port_FasterTransformer/build/model 
-v /home/binhang/active/Port_FasterTransformer/examples:/workspace/Port_FasterTransformer/examples -v /home/binhang/active/Port_FasterTransformer/src/fastertransformer:/workspace/Port_FasterTransformer/src/fastertransformer port_fasttransformer bash\n\nmpirun -n 8 --allow-run-as-root python /workspace/Port_FasterTransformer/examples/pytorch/gpt/port_opt_inference.py --weights_data_type fp16 --data_type fp16 --vocab_size 50272 --max_batch_size 1 --max_seq_len 2048 --tensor_para_size 8 --ckpt_path /workspace/Port_FasterTransformer/build/model/opt-66b-fp16-tp8/8-gpu --lib_path /workspace/Port_FasterTransformer/build"},{"ref":"P3","kind":"page","title":"togethercomputer/flash-attention repository metadata","date":"2026-06-11T04:19:31.909418+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/flash-attention","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/flash-attention\n\nDescription: Fast and memory-efficient exact attention\n\nLanguage: Python\n\nLicense: BSD-3-Clause\n\nStars: 1\n\nForks: 1\n\nOpen issues: 0\n\nCreated: 2022-11-22T23:05:11Z\n\nPushed: 2023-08-30T18:03:03Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: Dao-AILab/flash-attention\n\nArchived: no\n\nREADME:\n# FlashAttention\nThis repository provides the official implementation of FlashAttention and\nFlashAttention-2 from the\nfollowing papers.\n\n**FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness** \nTri Dao, Daniel Y. 
Fu, Stefano Ermon, Atri Rudra, Christopher Ré \nPaper: https://arxiv.org/abs/2205.14135 \nIEEE Spectrum [article](https://spectrum.ieee.org/mlperf-rankings-2022) about our submission to the MLPerf 2.0 benchmark using FlashAttention.\n![FlashAttention](assets/flashattn_banner.jpg)\n\n**FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning** \nTri Dao\n\nPaper: https://tridao.me/publications/flash2/flash2.pdf\n\n![FlashAttention-2](assets/flashattention_logo.png)\n\n## Usage\n\nWe've been very happy to see FlashAttention being widely adopted in such a short\ntime after its release. This [page](https://github.com/Dao-AILab/flash-attention/blob/main/usage.md)\ncontains a partial list of places where FlashAttention is being used.\n\nFlashAttention and FlashAttention-2 are free to use and modify (see LICENSE).\nPlease cite and credit FlashAttention if you use it.\n\n## Installation and features\n\nRequirements:\n- CUDA 11.4 and above.\n- PyTorch 1.12 and above.\n\nWe recommend the\n[Pytorch](https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch)\ncontainer from Nvidia, which has all the required tools to install FlashAttention.\n\nTo install:\n1. Make sure that PyTorch is installed.\n2. Make sure that `packaging` is installed (`pip install packaging`)\n3. Make sure that `ninja` is installed and that it works correctly (e.g. `ninja\n--version` then `echo $?` should return exit code 0). If not (sometimes `ninja\n--version` then `echo $?` returns a nonzero exit code), uninstall then reinstall\n`ninja` (`pip uninstall -y ninja && pip install ninja`). 
Without `ninja`,\ncompiling can take a very long time (2h) since it "},{"ref":"P4","kind":"page","title":"togethercomputer/diffusers repository metadata","date":"2026-06-11T04:19:31.766164+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/diffusers","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/diffusers\n\nDescription: 🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch\n\nLanguage: Python\n\nLicense: Apache-2.0\n\nStars: 4\n\nForks: 3\n\nOpen issues: 2\n\nCreated: 2022-11-22T23:05:32Z\n\nPushed: 2026-05-08T13:54:43Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: HazyResearch/diffusers\n\nArchived: no\n\nREADME:\n# Diffusers + FlashAttention\n\nThis is a branch of [HuggingFace Diffusers](https://github.com/huggingface/diffusers) to incorporate FlashAttention, optimized for high throughput.\n\n**Update 10/31/22**: Easier install! You can either run from our Docker image, or install from source. 
Bonus: We no longer rely on the cutlass branch of FlashAttention!\n\n## Installation\n\n**From our Docker image:**\n\nYou can run from our [Docker image](https://hub.docker.com/layers/danfu09/diffusers/0.1/images/sha256-033c41564f01894e922f93018e87046c8719ce63bd942939fb1d38627577811e?context=explore):\n```\ndocker run -it --rm --gpus all danfu09/diffusers:0.1 zsh\nhuggingface-cli login\ncd diffusers\npython test.py --batch_size 1 # how many images to generate at once\n```\n\n**To install from source:**\n\nFlashAttention requires CUDA 11, NVCC, and a Turing or Ampere GPU.\nTo install FlashAttention:\n```\ngit clone https://github.com/HazyResearch/flash-attention.git\ncd flash-attention\ngit submodule init\ngit submodule update\npython setup.py install\ncd ..\n```\n\nTo install diffusers:\n```\ngit clone https://github.com/HazyResearch/diffusers.git\ncd diffusers\npip install -e .\n```\n\n## Running\n\nA sample benchmark, following HuggingFace's [benchmark](https://twitter.com/Nouamanetazi/status/1576959648912973826) of diffusers:\n```Python\nimport time\nimport torch\nfrom diffusers import StableDiffusionPipeline\nimport functools\n\n# torch disable grad\ntorch.set_grad_enabled(False)\n\ntorch.manual_seed(1231)\ntorch.cuda.manual_seed(1231)\n\nprompt = \"a photo of an astronaut riding a horse on mars\"\n\n# cudnn benchmarking\ntorch.backends.cudnn.benchmark = True\n\n# make sure you're logged in with `huggingface-cli login`\npipe = StableDiffusionPipeline.from_pretrained(\n\"CompVis/stable-diffusion-v1-4\", \nuse_auth_token=True,\nrevision=\"fp16\",\ntorch_dtype=torch.float16\n).to(\"cuda\")\n"},{"ref":"P5","kind":"page","title":"togethercomputer/together-chat repository metadata","date":"2026-06-11T04:19:31.658895+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/together-chat","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/together-chat\n\nDescription: Streamlit Component, for a Chatbot UI\n\nLicense: MIT\n\nStars: 
2\n\nForks: 0\n\nOpen issues: 14\n\nCreated: 2023-01-17T16:51:32Z\n\nPushed: 2024-06-18T02:53:02Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: AI-Yash/st-chat\n\nArchived: no\n\nREADME:\n# st-chat\n\nStreamlit Component, for a Chat-bot UI, [example app](https://share.streamlit.io/ai-yash/st-chat/main/examples/chatbot.py)\n\nauthors - [@yashppawar](https://github.com/yashppawar) & [@YashVardhan-AI](https://github.com/yashvardhan-ai)\n\n## Installation\n\nInstall `streamlit-chat` with pip\n```bash\npip install streamlit-chat \n```\n\nusage, import the `message` function from `streamlit_chat`\n```py\nimport streamlit as st\nfrom streamlit_chat import message\n\nmessage(\"My message\") \nmessage(\"Hello bot!\", is_user=True) # align's the message to the right\n```"},{"ref":"P6","kind":"page","title":"togethercomputer/langchain repository metadata","date":"2026-06-11T04:19:31.398749+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/langchain","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/langchain\n\nDescription: ⚡ Building applications with LLMs through composability ⚡\n\nLicense: MIT\n\nStars: 1\n\nForks: 0\n\nOpen issues: 15\n\nCreated: 2023-01-26T13:32:35Z\n\nPushed: 2024-07-25T10:53:48Z\n\nDefault branch: master\n\nFork: yes\n\nParent repository: langchain-ai/langchain\n\nArchived: no\n\nREADME:\n# 🦜️🔗 LangChain\n\n⚡ Building applications with LLMs through composability ⚡\n\n[![lint](https://github.com/hwchase17/langchain/actions/workflows/lint.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/lint.yml) [![test](https://github.com/hwchase17/langchain/actions/workflows/test.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/test.yml) [![linkcheck](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml/badge.svg)](https://github.com/hwchase17/langchain/actions/workflows/linkcheck.yml) [![License: 
MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Twitter](https://img.shields.io/twitter/url/https/twitter.com/langchainai.svg?style=social&label=Follow%20%40LangChainAI)](https://twitter.com/langchainai) [![](https://dcbadge.vercel.app/api/server/6adMQxSpJS?compact=true&style=flat)](https://discord.gg/6adMQxSpJS)\n\n## Quick Install\n\n`pip install langchain`\n\n## 🤔 What is this?\n\nLarge language models (LLMs) are emerging as a transformative technology, enabling\ndevelopers to build applications that they previously could not.\nBut using these LLMs in isolation is often not enough to\ncreate a truly powerful app - the real power comes when you can combine them with other sources of computation or knowledge.\n\nThis library is aimed at assisting in the development of those types of applications. Common examples of these types of applications include:\n\n**❓ Question Answering over specific documents**\n\n- [Documentation](https://langchain.readthedocs.io/en/latest/use_cases/question_answering.html)\n- End-to-end Example: [Question Answering over Notion Database](https://github.com/hwchase17/notion-qa)\n\n**💬 Chatbots**\n\n- [Documentation](https://langchain.readthedocs.io/en/latest/use_cases/chatbots.html)\n- End-to-end Example: [Chat-LangChain](https://github.com/hw"},{"ref":"P7","kind":"page","title":"togethercomputer/transformers_port repository metadata","date":"2026-06-11T04:19:31.059379+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/transformers_port","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/transformers_port\n\nDescription: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.\n\nLanguage: Python\n\nLicense: Apache-2.0\n\nStars: 7\n\nForks: 4\n\nOpen issues: 17\n\nCreated: 2023-03-03T18:48:12Z\n\nPushed: 2024-10-17T22:24:43Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: huggingface/transformers\n\nArchived: 
no\n\nREADME:\n<!---\nCopyright 2020 The HuggingFace Team. All rights reserved.\n\nLicensed under the Apache License, Version 2.0 (the \"License\");\nyou may not use this file except in compliance with the License.\nYou may obtain a copy of the License at\n\nhttp://www.apache.org/licenses/LICENSE-2.0\n\nUnless required by applicable law or agreed to in writing, software\ndistributed under the License is distributed on an \"AS IS\" BASIS,\nWITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\nSee the License for the specific language governing permissions and\nlimitations under the License.\n-->\n\n<p align=\"center\">\n<picture>\n<source media=\"(prefers-color-scheme: dark)\" srcset=\"https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-dark.svg\">\n<source media=\"(prefers-color-scheme: light)\" srcset=\"https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-light.svg\">\n<img alt=\"Hugging Face Transformers Library\" src=\"https://huggingface.co/datasets/huggingface/documentation-images/raw/main/transformers-logo-light.svg\" width=\"352\" height=\"59\" style=\"max-width: 100%;\">\n</picture>\n<br/>\n<br/>\n</p>\n\n<p align=\"center\">\n<a href=\"https://circleci.com/gh/huggingface/transformers\">\n<img alt=\"Build\" src=\"https://img.shields.io/circleci/build/github/huggingface/transformers/main\">\n</a>\n<a href=\"https://github.com/huggingface/transformers/blob/main/LICENSE\">\n<img alt=\"GitHub\" src=\"https://img.shields.io/github/license/huggingface/transformers.svg?color=blue\">\n</a>\n<a href=\"https://huggingface.co/docs/transformers/index\">\n<img alt=\"Documentation\" src=\"https://img.shields.io/website/http/huggingface.co/docs/transformers/index.svg?down_color=red&down_message=offline&up_message=online\">\n</a>\n<a href=\"https://github.com/huggingface/transformers/releases\">\n<img 
"},{"ref":"P8","kind":"page","title":"togethercomputer/Decentralized_Training repository metadata","date":"2026-06-11T04:19:30.905282+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/Decentralized_Training","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/Decentralized_Training\n\nStars: 4\n\nForks: 2\n\nOpen issues: 0\n\nCreated: 2023-01-27T09:48:05Z\n\nPushed: 2023-03-20T19:36:55Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: DS3Lab/Decentralized_FM_alpha\n\nArchived: no\n\nREADME:\n# GPT-home-private\n\n## Setup:\n\n- Use AWS Deep Learning Base AMI\n\n- Install PyTorch env (old): \n\npip3 install torch==1.9.0+cu111 torchtext -f https://download.pytorch.org/whl/torch_stable.html\npip3 install torch==1.9.0+cu111 -f https://download.pytorch.org/whl/torch_stable.html\npip install torch==1.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html\n\n# Magic, not sure why cupy-cuda111 would not work, it seems that cupy-cuda111 will use different PTX from torch.\npip3 install cupy-cuda111==8.6.0\npip3 install transformers\n\n- Install PyTorch env (latest):\n\npip3 install --pre torch==1.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html\npip3 install cupy-cuda11x==11.0.0\npython3 -m cupyx.tools.install_library --cuda 11.x --library nccl\npip3 install transformers\n\n- Install PyTorch env (CPU-latest):\n\npip3 install --pre torch==1.12.0+cpu -f https://download.pytorch.org/whl/torch_stable.html\n\n- Install deepspeed for some micro-benchmark (optional)\n\npip install deepspeed\n\n- Clone this repo:\n\ngit clone https://github.com/BinhangYuan/GPT-home-private.git\n\n- set the github cache (Optional):\n\ngit config credential.helper 'cache --timeout=30000'\n\n- Download a tiny dataset:\n\nwget https://binhang-language-datasets.s3.us-west-2.amazonaws.com/glue_qqp_dataset/data.tar.xz -P ./glue_dataset/\n\ntar -xvf ./glue_dataset/data.tar.xz -C ./glue_dataset/\n\n- Setup network 
configuration:\n\nexport GLOO_SOCKET_IFNAME=ens3\n\nexport NCCL_SOCKET_IFNAME=ens3\n\n- Use TC scripts to control network delay and bandwidth:\n\n## Run Distributed Gpipe:\n\n- On each node, run:\n\npython dist_pipeline_runner.py --dist-url tcp://XXX.XXX.XXX.XXX:9000 --world-size N --rank i (i=0,...,N-1)\n\n## Run deepspeed benchmark:\n\n- Update public-ip and hostname in the ./scripts/ip_list.sh file\n\n- Update the host name with slots (number of GPUs) in ./scripts/ds_hostnames.sh file\n\n- Sync code to all nodes\n\n- Setup password free ssh cluster by executing (under the ./scrip"},{"ref":"P9","kind":"page","title":"togethercomputer/FT_Bloomchat repository metadata","date":"2026-06-11T04:19:30.894273+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/FT_Bloomchat","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/FT_Bloomchat\n\nDescription: Transformer related optimization, including BERT, GPT\n\nLanguage: C++\n\nLicense: Apache-2.0\n\nStars: 1\n\nForks: 1\n\nOpen issues: 2\n\nCreated: 2023-01-30T15:28:50Z\n\nPushed: 2023-05-24T11:45:43Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: NVIDIA/FasterTransformer\n\nArchived: no\n\nREADME:\n# FasterTransformer\n\nThis repository provides a script and recipe to run the highly optimized transformer-based encoder and decoder component, and it is tested and maintained by NVIDIA.\n\n## Table Of Contents\n\n- [FasterTransformer](#fastertransformer)\n- [Table Of Contents](#table-of-contents)\n- [Model overview](#model-overview)\n- [Support matrix](#support-matrix)\n- [Advanced](#advanced)\n- [Global Environment](#global-environment)\n- [Performance](#performance)\n- [BERT base performance](#bert-base-performance)\n- [BERT base performances of FasterTransformer new features](#bert-base-performances-of-fastertransformer-new-features)\n- [BERT base performance on TensorFlow](#bert-base-performance-on-tensorflow)\n- [BERT base performance on 
PyTorch](#bert-base-performance-on-pytorch)\n- [Decoding and Decoder performance](#decoding-and-decoder-performance)\n- [Decoder and Decoding end-to-end translation performance on TensorFlow](#decoder-and-decoding-end-to-end-translation-performance-on-tensorflow)\n- [Decoder and Decoding end-to-end translation performance on PyTorch](#decoder-and-decoding-end-to-end-translation-performance-on-pytorch)\n- [GPT performance](#gpt-performance)\n- [Release notes](#release-notes)\n- [Changelog](#changelog)\n- [Known issues](#known-issues)\n\n## Model overview\n\nIn NLP, encoder and decoder are two important components, with the transformer layer becoming a popular architecture for both components. FasterTransformer implements a highly optimized transformer layer for both the encoder and decoder for inference. On Volta, Turing and Ampere GPUs, the computing power of Tensor Cores are used automatically when the precision of the data and weights are FP16.\n\nFasterTransformer is built on top of CUDA, cuBLAS, cuBLASLt and C++. We provide at least one API of the following frameworks: TensorFlow, PyTorch and Triton backend. Users can "},{"ref":"P10","kind":"page","title":"togethercomputer/H3 repository metadata","date":"2026-06-11T04:19:30.790882+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/H3","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/H3\n\nDescription: Together port to run H3\n\nLanguage: Assembly\n\nLicense: Apache-2.0\n\nStars: 1\n\nForks: 1\n\nOpen issues: 1\n\nCreated: 2023-02-02T20:41:50Z\n\nPushed: 2024-04-30T22:26:33Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: HazyResearch/H3\n\nArchived: no\n\nREADME:\n# Hungry Hungry Hippos (H3)\n\nThis repository provides the official implementation of H3 from the\nfollowing paper.\n\n**Hungry Hungry Hippos: Towards Language Modeling with State Space Models** \nTri Dao\\*, Daniel Y. Fu\\*, Khaled K. Saab, Armin W. 
Thomas, Atri Rudra, Christopher Ré \nInternational Conference on Learning Representations, 2023. Notable top-25% (spotlight). \nPaper: https://arxiv.org/abs/2212.14052\n\n![H3](assets/banner.png)\n\n# Code & model release\n\nYou can find model weights on the Hugging Face Hub here (under \"Files and Versions\" for each model):\n* [125M](https://huggingface.co/danfu09/H3-125M)\n* [355M](https://huggingface.co/danfu09/H3-355M)\n* [1.3B](https://huggingface.co/danfu09/H3-1.3B)\n* [2.7B](https://huggingface.co/danfu09/H3-2.7B)\n\n## Loading weights and running inference\n\nExamples of how to load the weights and run inference are given in [benchmarks/benchmark_generation.py](benchmarks/benchmark_generation.py) and [examples/generate_text_h3.py](examples/generate_text_h3.py).\n\nHere's an example of how to download and run our 125M model (you may need to install FlashAttention):\n\n```\ngit lfs install\ngit clone https://huggingface.co/danfu09/H3-125M\n\ngit clone https://github.com/HazyResearch/H3.git\n\nPYTHONPATH=$(pwd)/H3 python H3/examples/generate_text_h3.py --ckpt H3-125M/model.pt --prompt \"Hungry Hungry Hippos: Towards Language Modeling With State Space Models is a new language model that\" --dmodel 768 --nlayer 12 --attn-layer-idx 6 --nheads=12\n```\n\nYou should get an output like this (may change due to sampling in the text generation):\n> Hungry Hungry Hippos: Towards Language Modeling With State Space Models is a new language model that uses state-space models to create a human-like vocabulary that can help improve human understanding and judgment of language. It takes a human's past experience of language, and tries to capture their cognitive patterns. 
State Spac"},{"ref":"P11","kind":"page","title":"togethercomputer/UniversalSD repository metadata","date":"2026-06-11T04:19:30.503148+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/UniversalSD","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/UniversalSD\n\nDescription: Universal Stable Diffusion Pipeline(s) with Flash Attention\n\nLanguage: Python\n\nStars: 3\n\nForks: 0\n\nOpen issues: 1\n\nCreated: 2023-02-14T18:37:38Z\n\nPushed: 2024-07-25T10:52:49Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: ResearchComputer/UniversalSD\n\nArchived: no\n\nREADME: none published or not readable through the GitHub API."},{"ref":"P12","kind":"page","title":"togethercomputer/native_hf_models-slim repository metadata","date":"2026-06-11T04:19:29.955725+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/native_hf_models-slim","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/native_hf_models-slim\n\nLanguage: Python\n\nStars: 1\n\nForks: 2\n\nOpen issues: 0\n\nCreated: 2023-04-20T04:19:24Z\n\nPushed: 2023-06-09T04:44:38Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: csris/Quick_Deployment_HELM\n\nArchived: no\n\nREADME:\n# csm_llsmserve\n\nThis repo is a WIP image for running HF models.\n\nIt was forked from Quick_Deployment_HELM then mutilated with a chainsaw.\n\nRemoved: \n* CONDA\n* All testing/job submission scripts\n\nUpdated:\n* CUDA 11.3 -> 11.8\n* Package layout"},{"ref":"P13","kind":"page","title":"togethercomputer/gpt-neox repository metadata","date":"2026-06-11T04:19:29.930845+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/gpt-neox","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/gpt-neox\n\nDescription: An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.\n\nLanguage: Python\n\nLicense: Apache-2.0\n\nStars: 2\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 
2023-04-24T11:26:48Z\n\nPushed: 2023-04-24T16:59:51Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: EleutherAI/gpt-neox\n\nArchived: no\n\nREADME:\n[![GitHub issues](https://img.shields.io/github/issues/EleutherAI/gpt-neox)](https://github.com/EleutherAI/gpt-neox/issues)\n[<img src=\"https://raw.githubusercontent.com/wandb/assets/main/wandb-github-badge-28.svg\" alt=\"Weights & Biases monitoring\" height=20>](https://wandb.ai/eleutherai/neox)\n\n# GPT-NeoX\n\nThis repository records [EleutherAI](https://www.eleuther.ai)'s library for training large-scale language models on GPUs. Our current framework is based on NVIDIA's [Megatron Language Model](https://github.com/NVIDIA/Megatron-LM) and has been augmented with techniques from [DeepSpeed](https://www.deepspeed.ai) as well as some novel optimizations. We aim to make this repo a centralized and accessible place to gather techniques for training large-scale autoregressive language models, and accelerate research into large-scale training.\n\nFor those looking for a TPU-centric codebase, we recommend [Mesh Transformer JAX](https://github.com/kingoflolz/mesh-transformer-jax).\n\n**If you are not looking to train models with billions of parameters from scratch, this is likely the wrong library to use. For generic inference needs, we recommend you use the Hugging Face `transformers` library instead which supports GPT-NeoX models.**\n\n## GPT-NeoX 2.0\n\nPrior to 3/9/2023, GPT-NeoX relied on [DeeperSpeed](https://github.com/EleutherAI/DeeperSpeed), which was based on an old version of DeepSpeed (0.3.15). 
In order to migrate to the latest upstream DeepSpeed version while allowing users to access the old versions of GPT-NeoX and DeeperSpeed, we have introduced two versioned releases for both libraries:\n\n- Version 1.0 of [GPT-NeoX](https://github.com/EleutherAI/gpt-neox/releases/tag/v1.0) and [DeeperSpeed](https://github.com/EleutherAI/DeeperSpeed/releases/tag/v1.0) maintain snapshots of the old stable versions "},{"ref":"P14","kind":"page","title":"togethercomputer/redpajama.cpp repository metadata","date":"2026-06-11T04:19:29.801436+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/redpajama.cpp","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/redpajama.cpp\n\nDescription: Extend the original llama.cpp repo to support redpajama model.\n\nLanguage: C\n\nLicense: MIT\n\nStars: 117\n\nForks: 14\n\nOpen issues: 3\n\nCreated: 2023-05-03T14:14:29Z\n\nPushed: 2024-09-03T22:47:38Z\n\nDefault branch: master\n\nFork: yes\n\nParent repository: ggml-org/llama.cpp\n\nArchived: no\n\nREADME:\n# llama.cpp\n\n![llama](https://user-images.githubusercontent.com/1991296/230134379-7181e485-c521-4d23-a0d6-f7b3b61ba524.png)\n\n[![Actions Status](https://github.com/ggerganov/llama.cpp/workflows/CI/badge.svg)](https://github.com/ggerganov/llama.cpp/actions)\n[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)\n\nInference of [LLaMA](https://arxiv.org/abs/2302.13971) model in pure C/C++\n\n**Hot topics:**\n\n- [Roadmap May 2023](https://github.com/ggerganov/llama.cpp/discussions/1220)\n- [New quantization methods](https://github.com/ggerganov/llama.cpp#quantization)\n\n## RedPajama Support\n\nFor RedPajama Models, see [this example](https://github.com/togethercomputer/redpajama.cpp/tree/master/examples/redpajama).\n\n## Description\n\nThe main goal of `llama.cpp` is to run the LLaMA model using 4-bit integer quantization on a MacBook\n\n- Plain C/C++ implementation without dependencies\n- 
Apple silicon first-class citizen - optimized via ARM NEON and Accelerate framework\n- AVX, AVX2 and AVX512 support for x86 architectures\n- Mixed F16 / F32 precision\n- 4-bit, 5-bit and 8-bit integer quantization support\n- Runs on the CPU\n- OpenBLAS support\n- cuBLAS and CLBlast support\n\nThe original implementation of `llama.cpp` was [hacked in an evening](https://github.com/ggerganov/llama.cpp/issues/33#issuecomment-1465108022).\nSince then, the project has improved significantly thanks to many contributions. This project is for educational purposes and serves\nas the main playground for developing new features for the [ggml](https://github.com/ggerganov/ggml) library.\n\n**Supported platforms:**\n\n- [X] Mac OS\n- [X] Linux\n- [X] Windows (via CMake)\n- [X] Docker\n\n**Supported models:**\n\n- [X] LLaMA 🦙\n- [X] [Alpaca](https://github.com/ggerganov/llama.cpp#instruction-mode-with-alpaca)\n- [X] [GPT4All](https://github.com/ggerganov"},{"ref":"P15","kind":"page","title":"togethercomputer/FT_Redpajama repository metadata","date":"2026-06-11T04:19:29.679617+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/FT_Redpajama","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/FT_Redpajama\n\nDescription: Transformer related optimization, including BERT, GPT\n\nLanguage: C++\n\nLicense: Apache-2.0\n\nStars: 1\n\nForks: 1\n\nOpen issues: 0\n\nCreated: 2023-05-19T07:43:05Z\n\nPushed: 2023-07-20T06:34:52Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: NVIDIA/FasterTransformer\n\nArchived: no\n\nREADME:\n# Deploy FT Inference of RedPajama Models Under TogetherCompute Infra\n\n### Build the docker image:\n\nsudo docker build -t ft_redpajama --file Redpajama-Together-Dockerfile .\n\n### Convert RedPajama model to FT format:\n\n- Download the checkpoint of RedPajama model from Hugging Face (e.g., RedPajama-INCITE-Chat-7B-v0.1):\n\ngit lfs clone https://huggingface.co/togethercomputer/RedPajama-INCITE-Chat-7B-v0.1\n\n- 
Start the ft_redpajama container:\n\nsudo nvidia-docker run --ipc=host --network=host --name test_ft_redpajama -ti -v /PATH_TO_PARENT_DIR_OF_DOWNLOADED_HF_WEIGHTS:/workspace/FasterTransformer/build/model ft_redpajama bash\n\n- Run the converting script inside the container:\n\npython /workspace/FasterTransformer/examples/pytorch/gptneox/utils/huggingface_gptneox_convert.py -i /workspace/FasterTransformer/build/model/RedPajama-INCITE-Chat-7B-v0.1 -o /workspace/FasterTransformer/build/model/ft-RedPajama-INCITE-Chat-7B-v0.1 -i_g 1 -m_n RedPajama-INCITE-Chat-7B-v0.1 -weight_data_type fp16\n\n### To deploy the model:\n\n- Inside the container, start the together node:\n\n/usr/local/bin/together-node start\n\n- Inside the container, start the worker process (probably need to change some args to support different models):\n\npython /workspace/FasterTransformer/examples/pytorch/gptneox/serving_redpajama_single_gpu.py"},{"ref":"P16","kind":"page","title":"togethercomputer/FasterTransformer repository metadata","date":"2026-06-11T04:19:29.649426+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/FasterTransformer","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/FasterTransformer\n\nDescription: Transformer related optimization, including BERT, GPT\n\nLanguage: C++\n\nLicense: Apache-2.0\n\nStars: 0\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2023-05-25T23:52:42Z\n\nPushed: 2023-07-27T06:56:44Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: NVIDIA/FasterTransformer\n\nArchived: no\n\nREADME:\n# FasterTransformer\n\nThis repository provides a script and recipe to run the highly optimized transformer-based encoder and decoder component, and it is tested and maintained by NVIDIA.\n\n## Table Of Contents\n\n- [FasterTransformer](#fastertransformer)\n- [Table Of Contents](#table-of-contents)\n- [Model overview](#model-overview)\n- [Support matrix](#support-matrix)\n- [Advanced](#advanced)\n- [Global 
Environment](#global-environment)\n- [Performance](#performance)\n- [BERT base performance](#bert-base-performance)\n- [BERT base performances of FasterTransformer new features](#bert-base-performances-of-fastertransformer-new-features)\n- [BERT base performance on TensorFlow](#bert-base-performance-on-tensorflow)\n- [BERT base performance on PyTorch](#bert-base-performance-on-pytorch)\n- [Decoding and Decoder performance](#decoding-and-decoder-performance)\n- [Decoder and Decoding end-to-end translation performance on TensorFlow](#decoder-and-decoding-end-to-end-translation-performance-on-tensorflow)\n- [Decoder and Decoding end-to-end translation performance on PyTorch](#decoder-and-decoding-end-to-end-translation-performance-on-pytorch)\n- [GPT performance](#gpt-performance)\n- [Release notes](#release-notes)\n- [Changelog](#changelog)\n- [Known issues](#known-issues)\n\n## Model overview\n\nIn NLP, encoder and decoder are two important components, with the transformer layer becoming a popular architecture for both components. FasterTransformer implements a highly optimized transformer layer for both the encoder and decoder for inference. On Volta, Turing and Ampere GPUs, the computing power of Tensor Cores are used automatically when the precision of the data and weights are FP16.\n\nFasterTransformer is built on top of CUDA, cuBLAS, cuBLASLt and C++. We provide at least one API of the following frameworks: TensorFlow, PyTorch and Triton backend. 
Users"},{"ref":"P17","kind":"page","title":"togethercomputer/FT_Llama2 repository metadata","date":"2026-06-11T04:19:29.021742+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/FT_Llama2","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/FT_Llama2\n\nDescription: Transformer related optimization, including BERT, GPT\n\nLanguage: C++\n\nLicense: Apache-2.0\n\nStars: 3\n\nForks: 1\n\nOpen issues: 1\n\nCreated: 2023-07-20T15:21:47Z\n\nPushed: 2024-07-25T11:41:06Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: void-main/FasterTransformer\n\nArchived: no\n\nREADME:\n# FasterTransformer Support for Llama2"},{"ref":"P18","kind":"page","title":"togethercomputer/vllm repository metadata","date":"2026-06-11T04:19:28.947445+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/vllm","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/vllm\n\nDescription: A high-throughput and memory-efficient inference and serving engine for LLMs\n\nLanguage: Python\n\nLicense: Apache-2.0\n\nStars: 1\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2023-08-27T16:35:05Z\n\nPushed: 2025-06-13T17:16:06Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: vllm-project/vllm\n\nArchived: no\n\nREADME:\n<p align=\"center\">\n<picture>\n<source media=\"(prefers-color-scheme: dark)\" srcset=\"https://raw.githubusercontent.com/vllm-project/vllm/main/docs/source/assets/logos/vllm-logo-text-dark.png\">\n<img alt=\"vLLM\" src=\"https://raw.githubusercontent.com/vllm-project/vllm/main/docs/source/assets/logos/vllm-logo-text-light.png\" width=55%>\n</picture>\n</p>\n\n<h3 align=\"center\">\nEasy, fast, and cheap LLM serving for everyone\n</h3>\n\n<p align=\"center\">\n| <a href=\"https://vllm.readthedocs.io/en/latest/\"><b>Documentation</b></a> | <a href=\"https://vllm.ai\"><b>Blog</b></a> | <a href=\"https://github.com/vllm-project/vllm/discussions\"><b>Discussions</b></a> |\n\n</p>\n\n---\n\n*Latest News* 🔥\n- 
[2023/07] Added support for LLaMA-2! You can run and serve 7B/13B/70B LLaMA-2s on vLLM with a single command!\n- [2023/06] Serving vLLM On any Cloud with SkyPilot. Check out a 1-click [example](https://github.com/skypilot-org/skypilot/blob/master/llm/vllm) to start the vLLM demo, and the [blog post](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/) for the story behind vLLM development on the clouds.\n- [2023/06] We officially released vLLM! FastChat-vLLM integration has powered [LMSYS Vicuna and Chatbot Arena](https://chat.lmsys.org) since mid-April. Check out our [blog post](https://vllm.ai).\n\n---\n\nvLLM is a fast and easy-to-use library for LLM inference and serving.\n\nvLLM is fast with:\n\n- State-of-the-art serving throughput\n- Efficient management of attention key and value memory with **PagedAttention**\n- Continuous batching of incoming requests\n- Optimized CUDA kernels\n\nvLLM is flexible and easy to use with:\n\n- Seamless integration with popular HuggingFace models\n- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more\n- Tensor parallelism support for distributed in"},{"ref":"P19","kind":"page","title":"togethercomputer/vllm-ttgi repository metadata","date":"2026-06-11T04:19:28.844821+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/vllm-ttgi","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/vllm-ttgi\n\nDescription: A high-throughput and memory-efficient inference and serving engine for LLMs\n\nLanguage: Python\n\nLicense: Apache-2.0\n\nStars: 2\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2023-09-06T11:41:48Z\n\nPushed: 2023-10-03T18:23:52Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: OlivierDehaene/vllm\n\nArchived: no\n\nREADME:\n<p align=\"center\">\n<picture>\n<source media=\"(prefers-color-scheme: dark)\" 
srcset=\"https://raw.githubusercontent.com/vllm-project/vllm/main/docs/source/assets/logos/vllm-logo-text-dark.png\">\n<img alt=\"vLLM\" src=\"https://raw.githubusercontent.com/vllm-project/vllm/main/docs/source/assets/logos/vllm-logo-text-light.png\" width=55%>\n</picture>\n</p>\n\n<h3 align=\"center\">\nEasy, fast, and cheap LLM serving for everyone\n</h3>\n\n<p align=\"center\">\n| <a href=\"https://vllm.readthedocs.io/en/latest/\"><b>Documentation</b></a> | <a href=\"https://vllm.ai\"><b>Blog</b></a> | <a href=\"https://github.com/vllm-project/vllm/discussions\"><b>Discussions</b></a> |\n\n</p>\n\n---\n\n*Latest News* 🔥\n- [2023/06] Serving vLLM On any Cloud with SkyPilot. Check out a 1-click [example](https://github.com/skypilot-org/skypilot/blob/master/llm/vllm) to start the vLLM demo, and the [blog post](https://blog.skypilot.co/serving-llm-24x-faster-on-the-cloud-with-vllm-and-skypilot/) for the story behind vLLM development on the clouds.\n- [2023/06] We officially released vLLM! FastChat-vLLM integration has powered [LMSYS Vicuna and Chatbot Arena](https://chat.lmsys.org) since mid-April. 
Check out our [blog post](https://vllm.ai).\n\n---\n\nvLLM is a fast and easy-to-use library for LLM inference and serving.\n\nvLLM is fast with:\n\n- State-of-the-art serving throughput\n- Efficient management of attention key and value memory with **PagedAttention**\n- Continuous batching of incoming requests\n- Optimized CUDA kernels\n\nvLLM is flexible and easy to use with:\n\n- Seamless integration with popular HuggingFace models\n- High-throughput serving with various decoding algorithms, including *parallel sampling*, *beam search*, and more\n- Tensor parallelism support for distributed inference\n- Streaming outputs\n- OpenAI-compatible API server\n\nvLLM seamlessly supports many Huggingface mod"},{"ref":"P20","kind":"page","title":"togethercomputer/lm-evaluation-harness repository metadata","date":"2026-06-11T04:19:28.715709+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/lm-evaluation-harness","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/lm-evaluation-harness\n\nDescription: A framework for few-shot evaluation of autoregressive language models.\n\nLicense: MIT\n\nStars: 0\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2023-10-30T16:26:08Z\n\nPushed: 2023-10-30T16:43:37Z\n\nDefault branch: master\n\nFork: yes\n\nParent repository: EleutherAI/lm-evaluation-harness\n\nArchived: no\n\nREADME:\n# Language Model Evaluation Harness\n\n## We're Refactoring LM-Eval!\n(as of 6/15/23)\nWe have a revamp of the Evaluation Harness library internals staged on the [big-refactor](https://github.com/EleutherAI/lm-evaluation-harness/tree/big-refactor) branch! It is far along in progress, but before we start to move the `master` branch of the repository over to this new design with a new version release, we'd like to ensure that it's been tested by outside users and there are no glaring bugs.\n\nWe’d like your help to test it out! you can help by:\n1. 
Trying out your current workloads on the big-refactor branch, and seeing if anything breaks or is counterintuitive,\n2. Porting tasks supported in the previous version of the harness to the new YAML configuration format. Please check out our [task implementation guide](https://github.com/EleutherAI/lm-evaluation-harness/blob/big-refactor/docs/new_task_guide.md) for more information.\n\nIf you choose to port a task not yet completed according to [our checklist](https://github.com/EleutherAI/lm-evaluation-harness/blob/big-refactor/lm_eval/tasks/README.md), then you can contribute it by opening a PR containing [Refactor] in the name with:\n- A shell command to run the task in the `master` branch, and what the score is\n- A shell command to run the task in your PR branch to `big-refactor`, and what the resulting score is, to show that we achieve equality between the two implementations.\n\nLastly, we'll no longer be accepting new feature requests beyond those that are already open to the master branch as we carry out this switch to the new version over the next week, though we will be accepting bugfixes to `master` branch and PRs to `big-refactor`. 
Feel free to reach out in the #lm-thunderdome channel of the EAI discord for more information.\n\n## Overview\n\nThis project provides a unified framework"},{"ref":"P21","kind":"page","title":"togethercomputer/llm-awq-ttgi repository metadata","date":"2026-06-11T04:19:28.663997+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/llm-awq-ttgi","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/llm-awq-ttgi\n\nDescription: AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration\n\nLanguage: Python\n\nLicense: MIT\n\nStars: 1\n\nForks: 1\n\nOpen issues: 0\n\nCreated: 2023-09-08T23:56:44Z\n\nPushed: 2023-09-08T23:57:53Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: mit-han-lab/llm-awq\n\nArchived: no\n\nREADME:\n# AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration [[Paper](https://arxiv.org/abs/2306.00978)]\n\n**Efficient and accurate** low-bit weight quantization (INT3/4) for LLMs, supporting **instruction-tuned** models and **multi-modal** LMs.\n\n![overview](figures/overview.png)\n\nThe current release supports: \n\n- AWQ search for accurate quantization. \n- Pre-computed AWQ model zoo for LLMs (LLaMA-1&2, OPT, Vicuna, LLaVA; load to generate quantized weights).\n- Memory-efficient 4-bit Linear in PyTorch.\n- Efficient CUDA kernel implementation for fast inference (support context and decoding stage).\n- Examples on 4-bit inference of an instruction-tuned model (Vicuna) and multi-modal LM (LLaVA).\n\n![TinyChat on Orin: W4A16 is 3.2x faster than FP16](./tinychat/figures/orin_example.gif)\n\nCheck out [TinyChat](tinychat), which delivers **30 tokens/second** inference performance (**3.2x faster** than FP16) for the **LLaMA-2** chatbot on the resource-constrained NVIDIA Jetson Orin! \n\nIt also offers a turn-key solution for **on-device inference** of LLMs on **resource-constrained edge platforms**. 
With TinyChat, it is now possible to run **large** models on **small** and **low-power** devices even without Internet connection.\n\n## News\n- [2023/09] ⚡ Check out our latest [**TinyChat**](tinychat), which is ~2x faster than the first release on Orin!\n- [2023/09] ⚡ Check out [**AutoAWQ**](https://github.com/casper-hansen/AutoAWQ), a third-party implementation to make AWQ easier to expand to new models, improve inference speed, and integrate into Huggingface.\n- [2023/07] 🔥 We released **TinyChat**, an efficient and lightweight chatbot interface based on AWQ. TinyChat enables efficient LLM inference on both cloud and edge GPUs. LLama-2-chat models are supported! Check out our implementation [here](tinychat).\n- [2023/07]"},{"ref":"P22","kind":"page","title":"togethercomputer/js-eventsource repository metadata","date":"2026-06-11T04:19:27.991429+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/js-eventsource","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/js-eventsource\n\nDescription: EventSource client for Node.js and Browser (polyfill)\n\nLanguage: JavaScript\n\nLicense: MIT\n\nStars: 0\n\nForks: 0\n\nOpen issues: 0\n\nCreated: 2023-11-07T00:16:04Z\n\nPushed: 2024-10-17T22:21:43Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: launchdarkly/js-eventsource\n\nArchived: no\n\nREADME:\n# EventSource [![npm version](http://img.shields.io/npm/v/launchdarkly-eventsource.svg?style=flat-square)](http://browsenpm.org/package/launchdarkly-eventsource)[![Circle CI](https://circleci.com/gh/launchdarkly/js-eventsource/tree/master.svg?style=svg)](https://circleci.com/gh/launchdarkly/js-eventsource/tree/master)[![NPM 
Downloads](https://img.shields.io/npm/dm/laumchdarkly-eventsource.svg?style=flat-square)](http://npm-stat.com/charts.html?package=launchdarkly-eventsource&from=2015-09-01)[![Dependencies](https://img.shields.io/david/launchdarkly/js-eventsource.svg?style=flat-square)](https://david-dm.org/launchdarkly/js-eventsource)\n\nThis library is a pure JavaScript implementation of the [EventSource](https://html.spec.whatwg.org/multipage/server-sent-events.html#server-sent-events) client. The API aims to be W3C compatible.\n\nYou can use it with Node.js, or as a browser polyfill for\n[browsers that don't have native `EventSource` support](http://caniuse.com/#feat=eventsource). However, the current implementation is inefficient in a browser due to the use of Node API shims, and is not recommended for use as a polyfill; a future release will improve this.\n\nThis is a fork of the original [EventSource](https://github.com/EventSource/eventsource) project by Aslak Hellesøy, with additions to support the requirements of the LaunchDarkly SDKs. 
Note that as described in the [changelog](CHANGELOG.md), the API is _not_ backward-compatible with the original package, although it can be used with minimal changes.\n\n## Install\n\nnpm install launchdarkly-eventsource\n\n## Example\n\nnpm install\nnode ./example/sse-server.js\nnode ./example/sse-client.js # Node.js client\nopen http://localhost:8080 # Browser client - both native and polyfill\ncurl http://localhost:8080/sse # Enjoy the simplicity of SSE\n\n## Browser Polyfill\n\nJust add `example/eventsourc"},{"ref":"P23","kind":"page","title":"togethercomputer/helm repository metadata","date":"2026-06-11T04:19:27.956548+00:00","date_source":null,"source_url":"https://github.com/togethercomputer/helm","signal_url":null,"signal_json_url":null,"text":"# togethercomputer/helm\n\nDescription: Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110).\n\nLanguage: Python\n\nLicense: Apache-2.0\n\nStars: 0\n\nForks: 0\n\nOpen issues: 15\n\nCreated: 2023-10-30T16:46:33Z\n\nPushed: 2024-07-25T10:40:15Z\n\nDefault branch: main\n\nFork: yes\n\nParent repository: stanford-crfm/helm\n\nArchived: no\n\nREADME:\n<!--intro-start-->\n\n# Holistic Evaluation of Language Models\n\n[comment]: <> (When using the img tag, which allows us to specify size, src has to be a URL.)\n<img src=\"https://github.com/stanford-crfm/helm/raw/main/src/helm/benchmark/static/images/helm-logo.png\" alt=\"\" width=\"800\"/>\n\nWelcome! The **`crfm-helm`** Python package contains code used in the **Holistic Evaluation of Language Models** project ([paper](https://arxiv.org/abs/2211.09110), [website](https://crfm.stanford.edu/helm/latest/)) by [Stanford CRFM](https://crfm.stanford.edu/). 
This package includes the following features:\n\n- Collection of datasets in a standard format (e.g., NaturalQuestions)\n- Collection of models accessible via a unified API (e.g., GPT-3, MT-NLG, OPT, BLOOM)\n- Collection of metrics beyond accuracy (efficiency, bias, toxicity, etc.)\n- Collection of perturbations for evaluating robustness and fairness (e.g., typos, dialect)\n- Modular framework for constructing prompts from datasets\n- Proxy server for managing accounts and providing unified interface to access models\n<!--intro-end-->\n\nTo get started, refer to [the documentation on Read the Docs](https://crfm-helm.readthedocs.io/) for how to install and run the package.\n\n## Directory Structure\n\nThe directory structure for this repo is as follows\n\n```\n├── docs # MD used to generate readthedocs\n│\n├── scripts # Python utility scripts for HELM\n│ ├── cache\n│ ├── data_overlap # Calculate train test overlap\n│ │ ├── common\n│ │ ├── scenarios\n│ │ └── test\n│ ├── efficiency\n│ ├── fact_completion\n│ ├── offline_eval\n│ └── scale\n└── src\n├── helm # Benchmarking Scripts for HELM\n│ │\n│ ├── benchmark # Main Python code for running HELM\n│ │ │\n│ │ └── static # Current JS (Jquery) code for rendering front-end\n│ │ │\n│ │ └── ...\n│ │\n│ ├── common # Addi"},{"ref":"P24","kind":"page","title":"Director, Data Center Operations","date":"2026-06-11T04:13:57.3134+00:00","date_source":null,"source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5101202007","signal_url":null,"signal_json_url":null,"text":"Job Application for Director, Data Center Operations at Together AI \nBack to jobs \nDirector, Data Center Operations\nSan Francisco\n\nApply \nAbout the Role \n\nTogether AI is scaling its physical AI infrastructure rapidly — and we're looking for a Director of Data Center Operations to help us build it right. 
This is a ground-floor opportunity to own the operational foundation of Together's growing data center portfolio across the US and Asia.\n\nYou'll be responsible for designing and commissioning white space deployments — taking pre-built environments and fitting them out with the power distribution, cooling distribution, and systems infrastructure needed to run high-density GPU workloads at scale. At the same time, you'll be building the break-fix and smart hands team from scratch: hiring, defining the playbook, and standing up the function that keeps our sites running around the clock.\n\nThis is not a steady-state operations role. It's a builder role. You'll be joining a small but fast-moving team, with real ownership over outcomes and the autonomy to shape how Together AI operates its physical infrastructure for years to come. If you've scaled data center infrastructure through hypergrowth before and want to do it again with more ownership — this is that opportunity.\n\nResponsibilities \n\nOwn the design, fit-out, and commissioning of white space sites across the US and Asia, with a focus on power distribution (PDUs), cooling distribution (CDUs), and IT-adjacent infrastructure\n\nBuild and lead a ~20-person break-fix and smart hands team from scratch — define the operating model, hire the initial team, and establish the processes and playbooks that keep sites running\n\nManage a portfolio of 5+ sites across two regions in various stages of deployment and live operation\n\nPartner with vendors, contractors, and equipment suppliers to drive site deployments to schedule and quality\n\nEstablish operational standards, runbooks, and escalation processes for a nascent but rapidly growing infrastructure function\n\nServe as the technical authority on data center infrastructure decisions — from white space evaluation through to live production operation\n\nTravel to Asia periodically to "},{"ref":"P25","kind":"page","title":"Sr. 
Partnerships Manager, Model Ecosystem","date":"2026-06-11T04:13:57.311111+00:00","date_source":null,"source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5100639007","signal_url":null,"signal_json_url":null,"text":"Job Application for Sr. Partnerships Manager, Model Ecosystem at Together AI \nBack to jobs \nSr. Partnerships Manager, Model Ecosystem\nSan Francisco\n\nApply \nAbout the Role \n\nAs the Partnerships Manager for our Model Ecosystem, you will be the primary architect of Together AI’s model library. This is a high-impact, cross-functional role focused on bringing the world’s leading proprietary and open-source models onto the Together platform. You will navigate an ever-evolving landscape to negotiate non-standard, creative deals that provide developers with the best possible building blocks for AI applications.\n\nYou are a \"deal-maker\" who thrives in ambiguity. You will sit at the intersection of Product, Finance, and Marketing, ensuring that our model roadmap is not only technically superior but commercially viable and market-facing.\n\nResponsibilities \n\nNegotiate & Manage Partnerships: Lead end-to-end deal cycles with model builders across all modalities (text, image, video, etc.). 
You will handle everything from initial outreach to negotiating complex, creative commercial terms for both proprietary and open-source models.\n\nCurate the Model Library: Act as the internal champion for our \"Model Library,\" identifying which frontier and specialized models we should bring to Together AI to maintain our competitive edge.\n\nProduct & Engineering Alignment: Work closely with the Product team to define \"bring-up\" requirements and ensure new models are integrated seamlessly into our inference and fine-tuning stacks.\n\nCommercial Strategy: Partner with Finance to construct deal structures that balance aggressive growth with sustainable unit economics, often involving non-standard incentive structures.\n\nGo-to-Market & Co-Selling: Collaborate with Partner Marketing to announce new integrations and create momentum. You will also engage in co-selling efforts, helping our Sales team understand how to leverage specific partner models to win enterprise deals.\n\nRequirements \n\nCreative Deal Maker: You have a track record of constructing partnership agreements that don’t follow a standard template. 
You can navigate complex IP, revenue-share, and exclusivity discussions with ease.\n\nHigh EQ & C"},{"ref":"P26","kind":"page","title":"Senior Software Engineer - Together Cloud Infrastructure","date":"2026-06-11T04:13:57.310235+00:00","date_source":null,"source_url":"https://job-boards.greenhouse.io/togetherai/jobs/4749787007","signal_url":null,"signal_json_url":null,"text":"Job Application for Senior Software Engineer - Together Cloud Infrastructure at Together AI \nBack to jobs \nSenior Software Engineer - Together Cloud Infrastructure\nSan Francisco\n\nApply \nAbout the Role\n\nTogether AI is building the AI Acceleration Cloud, an end-to-end platform for the full generative AI lifecycle, combining the fastest LLM inference engine with state-of-the-art AI cloud infrastructure.\n\nAs a Senior AI Infrastructure Engineer, you will play a key role in building the next generation AI cloud platform – a highly available, global, blazing-fast cloud infrastructure that virtualizes cutting-edge ML hardware (GB200s/GB300s, BlueField DPUs) and enables state-of-the-art ML practitioners with self-serve AI cloud services, such as on-demand + managed Kubernetes and Slurm clusters. 
This platform serves both our internal SaaS products (inference, fine-tuning) and our external cloud customers, spanning dozens of data centers across the world.\n\nResponsibilities \n\nDesign, build, and maintain performant, secure, and highly-available backend services/operators that run in our data centers and automate hardware management, such as Infiniband partitioning, in-DC parallel storage provisioning, and VM provisioning.\n\nDesign and build out the IaaS software layer for a new GB200 data center with thousands of GPUs.\n\nWork on a global multi-exabyte high-performance object store, serving massive datasets for pretraining.\n\nBuild advanced observability stacks for our customers with automated node lifecycle management for fault-tolerant distributed pretraining.\n\nPerform architecture and research work for decentralized AI workloads\n\nWork on the core, open-source Together AI platform\n\nCreate services, tools, and developer documentation\n\nCreate testing frameworks for robustness and fault-tolerance\n\nTo be successful, you’ll need to be deeply technical and possess excellent communication, collaboration, and diplomacy skills. You have strong fundamental software development skills. In addition, you have strong systems knowledge and troubleshooting abilities. \n\nRequirements \n\n5+ years of professional software development experience and proficiency in at least one backend programming "},{"ref":"P27","kind":"page","title":"Solutions Architect ","date":"2026-06-11T04:13:56.287896+00:00","date_source":null,"source_url":"https://job-boards.greenhouse.io/togetherai/jobs/4627491007","signal_url":null,"signal_json_url":null,"text":"page_title \nBack to jobs \nSolutions Architect \nSan Francisco\n\nApply \nAbout the Role \n\nAs a Solutions Architect at Together AI, you will work with customers and prospects to create business value through Generative AI applications. 
Solutions Architects at Together are trusted advisors to our customers that evaluate, identify and demonstrate how Together can solve their AI needs. As key contributors to our sales organization, Solution Engineers add tremendous value to the customer journey and directly impact company growth and revenue. This is an exciting opportunity for a deeply technical professional passionate about AI and customer success to make a significant impact in a fast-paced, innovative environment.\n\nResponsibilities \n\nAct as a technical advisor to our most strategic customers, deeply embedding with them to support the ideation and development of innovative applications using OSS models on Together AI\n\nRun complex demonstrations and POCs of Together’s entire stack, including both hardware and software solutions\n\nCollaborate with sales to qualify new prospects and support existing customers along their journey to build cutting-edge Generative AI solutions\n\nBuild and maintain strong relationships with customer leadership and stakeholders, ensuring the successful deployment and scaling of their applications\n\nDeliver high-value feedback to our Product, Engineering, and Research teams, ensuring our platform continues to evolve to meet customer needs\n\nBuild educational content and tooling for both internal and external use around Together’s solutions (i.e., playbooks, blogs, demos, etc.)\n\nQualifications \n\n5+ years of experience in a customer-facing technical role with at least 2 years in a pre-sales function \n\nExcellent communication and interpersonal skills, with the ability to explain complex technical concepts to non-technical stakeholders.\n\nAbility to consult with new and existing customers to map business needs to technical solutions\n\nStrong technical background, with knowledge of AI, ML, GPU technologies and their integration into high-performance computing (HPC) environments.\n\nStrong understanding of training, fine-tuning and inference in the context 
"},{"ref":"P28","kind":"page","title":"Customer Support Engineer (Inference), India","date":"2026-06-11T04:13:56.124299+00:00","date_source":null,"source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5069532007","signal_url":null,"signal_json_url":null,"text":"Job Application for Customer Support Engineer (Inference), India at Together AI \nBack to jobs \nCustomer Support Engineer (Inference), India\nIndia\n\nApply \nAbout the Role \n\nAs a Customer Support Engineer at a pioneering AI company, you'll be the first line of defense to support customers as they build out training, fine tuning, and inference solutions with Together AI. You'll dive deep into complex technical challenges, providing swift and effective solutions while serving as a product expert. As a part of the Customer Experience organization, you will collaborate closely with product and sales, driving continuous improvement of our offerings. This is an exciting opportunity for a deeply technical professional passionate about AI and customer success to make a significant impact in a fast-paced, innovative environment.\n\nResponsibilities \n\nEngage directly with customers to tackle and resolve complex technical challenges involving our cutting-edge GPU clusters and our inference and fine-tuning services; ensure swift and effective solutions every time.\n\nBecome a product expert in all of our Gen AI solutions, serving as the last line of technical defense before issues are escalated to Engineering and Product teams.\n\nCollaborate seamlessly across Engineering, Research, and Product teams to address customer concerns; collaborate with senior leaders both internally and externally to ensure the highest levels of customer satisfaction.\n\nTransform customer insights into action by identifying patterns in support cases and working with Engineering and Go-To-Market teams to drive Together’s roadmap (e.g., future models to support)\n\nMaintain detailed documentation of system configurations, 
procedures, troubleshooting guides, and FAQs to facilitate knowledge sharing with team and customers.\n\nBe flexible in providing support coverage during holidays, nights and weekends as required by business needs to ensure consistent and reliable service for our customers.\n\nRequirements \n\n5+ years of experience in a customer-facing technical role with at least 1 year in a support function in AI \n\nStrong technical background, with knowledge of AI, ML, GPU technologies and their integration into"},{"ref":"E1","kind":"event","title":"Mamba-3","date":"2026-03-17T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/mamba-3","signal_url":"https://onlylabs.fyi/signals/fa16acab-08d4-467e-81a9-525afb8eb441","signal_json_url":"https://onlylabs.fyi/signals/fa16acab-08d4-467e-81a9-525afb8eb441/signal.json","text":"post_published · Mamba-3 · signal_desk=talking · occurred_at=2026-03-17T00:00:00+00:00 · url=https://www.together.ai/blog/mamba-3 · hn=300 points/55 comments · raw={\"excerpt\":\"Meet Mamba-3: the SSM built for inference. 
Faster than Transformers at decode, stronger than Mamba-2, and open-source from day one.\"}"},{"ref":"E2","kind":"event","title":"Serving MiniMax-M3 for efficient inference: Unlocking 1M-Token Context and Multimodality Without Regrets ","date":"2026-06-02T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/serving-minimax-m3-for-efficient-inference-unlocking-1m-token-context-and-multimodality-without-regrets","signal_url":"https://onlylabs.fyi/signals/33644a67-d468-44ed-8255-6990f9054eec","signal_json_url":"https://onlylabs.fyi/signals/33644a67-d468-44ed-8255-6990f9054eec/signal.json","text":"post_published · Serving MiniMax-M3 for efficient inference: Unlocking 1M-Token Context and Multimodality Without Regrets  · signal_desk=talking · occurred_at=2026-06-02T00:00:00+00:00 · url=https://www.together.ai/blog/serving-minimax-m3-for-efficient-inference-unlocking-1m-token-context-and-multimodality-without-regrets · hn=1 points/0 comments · raw={\"excerpt\":\"How Together served MiniMax-M3 efficiently with KV-block-major sparse attention, paged MSA decode, optimized index scoring, and a Rust-based multimodal gateway.\"}"},{"ref":"E3","kind":"event","title":"Solutions Architect (Inference)","date":"2026-06-10T14:53:46+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/4946442007","signal_url":"https://onlylabs.fyi/signals/42799da3-9b08-4e49-b1c8-4d7de7277bf3","signal_json_url":"https://onlylabs.fyi/signals/42799da3-9b08-4e49-b1c8-4d7de7277bf3/signal.json","text":"job_opened · Solutions Architect (Inference) · signal_desk=hiring · occurred_at=2026-06-10T14:53:46+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/4946442007 · raw={\"location\":\"London\",\"ats\":\"greenhouse\"}"},{"ref":"E4","kind":"event","title":"Building trust in enterprise AI: Together AI earns ISO 27001:2022 
certification","date":"2026-06-10T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/iso-27001-2022-certification","signal_url":"https://onlylabs.fyi/signals/9294f377-1f3d-4b21-8078-53ecff3e7406","signal_json_url":"https://onlylabs.fyi/signals/9294f377-1f3d-4b21-8078-53ecff3e7406/signal.json","text":"post_published · Building trust in enterprise AI: Together AI earns ISO 27001:2022 certification · signal_desk=talking · occurred_at=2026-06-10T00:00:00+00:00 · url=https://www.together.ai/blog/iso-27001-2022-certification · raw={\"excerpt\":\"Together AI has earned ISO 27001:2022 certification, validating our commitment to enterprise-grade security for production AI workloads.\"}"},{"ref":"E5","kind":"event","title":"Technical Account Manager (TAM), GPU Cluster ","date":"2026-06-08T21:52:07+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5123203007","signal_url":"https://onlylabs.fyi/signals/76333246-15a1-424d-bccf-71649d537f2a","signal_json_url":"https://onlylabs.fyi/signals/76333246-15a1-424d-bccf-71649d537f2a/signal.json","text":"job_opened · Technical Account Manager (TAM), GPU Cluster  · signal_desk=hiring · occurred_at=2026-06-08T21:52:07+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5123203007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E6","kind":"event","title":"Staff Engineer, Distributed Storage and HPC & AI Infrastructure","date":"2026-06-04T16:37:35+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5155722007","signal_url":"https://onlylabs.fyi/signals/57b9ecaf-5735-4acf-bcc8-20cb554e9f40","signal_json_url":"https://onlylabs.fyi/signals/57b9ecaf-5735-4acf-bcc8-20cb554e9f40/signal.json","text":"job_opened · Staff Engineer, Distributed Storage and HPC & AI Infrastructure · signal_desk=hiring · occurred_at=2026-06-04T16:37:35+00:00 · 
url=https://job-boards.greenhouse.io/togetherai/jobs/5155722007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E7","kind":"event","title":"Senior Software Engineer Together Cloud Infrastructure ","date":"2026-06-03T13:11:56+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5028862007","signal_url":"https://onlylabs.fyi/signals/70922365-2270-424e-b8c5-3ef9d42e86ed","signal_json_url":"https://onlylabs.fyi/signals/70922365-2270-424e-b8c5-3ef9d42e86ed/signal.json","text":"job_opened · Senior Software Engineer Together Cloud Infrastructure  · signal_desk=hiring · occurred_at=2026-06-03T13:11:56+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5028862007 · raw={\"location\":\"Amsterdam\",\"ats\":\"greenhouse\"}"},{"ref":"E8","kind":"event","title":"Lead/Manager Together Cloud Infrastructure ","date":"2026-06-03T07:45:52+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5145183007","signal_url":"https://onlylabs.fyi/signals/a54d8fd9-0554-4b8e-8a27-748da3d38a9a","signal_json_url":"https://onlylabs.fyi/signals/a54d8fd9-0554-4b8e-8a27-748da3d38a9a/signal.json","text":"job_opened · Lead/Manager Together Cloud Infrastructure  · signal_desk=hiring · occurred_at=2026-06-03T07:45:52+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5145183007 · raw={\"location\":\"Amsterdam\",\"ats\":\"greenhouse\"}"},{"ref":"E9","kind":"event","title":"Manager, Infrastructure Strategy & Operations","date":"2026-06-02T19:35:43+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5152193007","signal_url":"https://onlylabs.fyi/signals/eaba4e90-796e-43da-8556-2063baa6eb86","signal_json_url":"https://onlylabs.fyi/signals/eaba4e90-796e-43da-8556-2063baa6eb86/signal.json","text":"job_opened · Manager, Infrastructure Strategy & Operations · signal_desk=hiring · 
occurred_at=2026-06-02T19:35:43+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5152193007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E10","kind":"event","title":"togethercomputer/together-sandbox together-sandbox-workspace-v2.0.0","date":"2026-06-02T08:57:34+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-workspace-v2.0.0","signal_url":"https://onlylabs.fyi/signals/8febc70f-8c38-45b4-b078-4163db722996","signal_json_url":"https://onlylabs.fyi/signals/8febc70f-8c38-45b4-b078-4163db722996/signal.json","text":"release · togethercomputer/together-sandbox together-sandbox-workspace-v2.0.0 · signal_desk=releases · occurred_at=2026-06-02T08:57:34+00:00 · url=https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-workspace-v2.0.0 · raw={\"repo\":\"togethercomputer/together-sandbox\"}"},{"ref":"E11","kind":"event","title":"togethercomputer/tinker-cookbook","date":"2026-06-01T08:29:09+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/tinker-cookbook","signal_url":"https://onlylabs.fyi/signals/6367550f-bd25-4483-8fda-eba8b039b892","signal_json_url":"https://onlylabs.fyi/signals/6367550f-bd25-4483-8fda-eba8b039b892/signal.json","text":"repo_forked · togethercomputer/tinker-cookbook · signal_desk=forks · occurred_at=2026-06-01T08:29:09+00:00 · url=https://github.com/togethercomputer/tinker-cookbook · raw={\"repo\":\"togethercomputer/tinker-cookbook\",\"parent\":\"thinking-machines-lab/tinker-cookbook\"}"},{"ref":"E12","kind":"event","title":"togethercomputer/xorl-wheels 
tilelang_0.1.10_cu131","date":"2026-05-31T20:55:20+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/xorl-wheels/releases/tag/tilelang_0.1.10_cu131","signal_url":"https://onlylabs.fyi/signals/688f8bcd-26e2-4d14-89f4-40bd8877ea2a","signal_json_url":"https://onlylabs.fyi/signals/688f8bcd-26e2-4d14-89f4-40bd8877ea2a/signal.json","text":"release · togethercomputer/xorl-wheels tilelang_0.1.10_cu131 · signal_desk=releases · occurred_at=2026-05-31T20:55:20+00:00 · url=https://github.com/togethercomputer/xorl-wheels/releases/tag/tilelang_0.1.10_cu131 · raw={\"repo\":\"togethercomputer/xorl-wheels\"}"},{"ref":"E13","kind":"event","title":"togethercomputer/detect_agent detect_agent-v0.3.0","date":"2026-05-29T19:08:51+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/detect_agent/releases/tag/detect_agent-v0.3.0","signal_url":"https://onlylabs.fyi/signals/cd8b0818-971f-4eba-8365-f61bb100577b","signal_json_url":"https://onlylabs.fyi/signals/cd8b0818-971f-4eba-8365-f61bb100577b/signal.json","text":"release · togethercomputer/detect_agent detect_agent-v0.3.0 · signal_desk=releases · occurred_at=2026-05-29T19:08:51+00:00 · url=https://github.com/togethercomputer/detect_agent/releases/tag/detect_agent-v0.3.0 · raw={\"repo\":\"togethercomputer/detect_agent\"}"},{"ref":"E14","kind":"event","title":"togethercomputer/together-sandbox together-sandbox-workspace-v1.12.0","date":"2026-05-29T08:38:53+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-workspace-v1.12.0","signal_url":"https://onlylabs.fyi/signals/d1eac417-81a5-4ad3-a773-2d5afb577f5a","signal_json_url":"https://onlylabs.fyi/signals/d1eac417-81a5-4ad3-a773-2d5afb577f5a/signal.json","text":"release · togethercomputer/together-sandbox together-sandbox-workspace-v1.12.0 · signal_desk=releases · occurred_at=2026-05-29T08:38:53+00:00 · 
url=https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-workspace-v1.12.0 · raw={\"repo\":\"togethercomputer/together-sandbox\"}"},{"ref":"E15","kind":"event","title":"togethercomputer/archipelago","date":"2026-05-29T00:15:21+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/archipelago","signal_url":"https://onlylabs.fyi/signals/0413a558-9c7b-4220-bf92-34bcbce16a0f","signal_json_url":"https://onlylabs.fyi/signals/0413a558-9c7b-4220-bf92-34bcbce16a0f/signal.json","text":"repo_new · togethercomputer/archipelago · signal_desk=repos · occurred_at=2026-05-29T00:15:21+00:00 · url=https://github.com/togethercomputer/archipelago · raw={\"repo\":\"togethercomputer/archipelago\",\"description\":\"Archipelago: eval framework for AI agents on professional services tasks\",\"language\":\"Python\"}"},{"ref":"E16","kind":"event","title":"How Together AI built the world’s fastest speech-to-text stack","date":"2026-05-29T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/how-together-ai-built-the-worlds-fastest-speech-to-text-stack","signal_url":"https://onlylabs.fyi/signals/56ba412f-f785-4495-a0c4-bec800f64fd3","signal_json_url":"https://onlylabs.fyi/signals/56ba412f-f785-4495-a0c4-bec800f64fd3/signal.json","text":"post_published · How Together AI built the world’s fastest speech-to-text stack · signal_desk=talking · occurred_at=2026-05-29T00:00:00+00:00 · url=https://www.together.ai/blog/how-together-ai-built-the-worlds-fastest-speech-to-text-stack · raw={\"excerpt\":\"Together AI built the fastest speech-to-text stack on Artificial Analysis by treating ASR as a full-path systems problem, not just a GPU inference problem.\"}"},{"ref":"E17","kind":"event","title":"Customer Support Engineer 
(Inference)","date":"2026-05-28T17:27:04+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5147747007","signal_url":"https://onlylabs.fyi/signals/1a44a1f8-276f-4c6c-b599-3c97553fbaf0","signal_json_url":"https://onlylabs.fyi/signals/1a44a1f8-276f-4c6c-b599-3c97553fbaf0/signal.json","text":"job_opened · Customer Support Engineer (Inference) · signal_desk=hiring · occurred_at=2026-05-28T17:27:04+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5147747007 · raw={\"location\":\"San Francisco, CA\",\"ats\":\"greenhouse\"}"},{"ref":"E18","kind":"event","title":"Senior Technical Recruiter, AI/ML Research","date":"2026-05-28T17:08:56+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5135941007","signal_url":"https://onlylabs.fyi/signals/b1ff86c8-39c4-4964-8875-53dce4af32d7","signal_json_url":"https://onlylabs.fyi/signals/b1ff86c8-39c4-4964-8875-53dce4af32d7/signal.json","text":"job_opened · Senior Technical Recruiter, AI/ML Research · signal_desk=hiring · occurred_at=2026-05-28T17:08:56+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5135941007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E19","kind":"event","title":"Director, Data Center Strategy and Site Selection","date":"2026-05-28T16:37:08+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/4949454007","signal_url":"https://onlylabs.fyi/signals/db85f7f5-bbe1-4c94-be2b-47737f1a0b87","signal_json_url":"https://onlylabs.fyi/signals/db85f7f5-bbe1-4c94-be2b-47737f1a0b87/signal.json","text":"job_opened · Director, Data Center Strategy and Site Selection · signal_desk=hiring · occurred_at=2026-05-28T16:37:08+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/4949454007 · raw={\"location\":\"Remote\",\"ats\":\"greenhouse\"}"},{"ref":"E20","kind":"event","title":"togethercomputer/together-sandbox 
together-sandbox-workspace-v1.11.0","date":"2026-05-28T09:01:08+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-workspace-v1.11.0","signal_url":"https://onlylabs.fyi/signals/b50366e3-3cfd-44f0-8336-a160cf3ec0e2","signal_json_url":"https://onlylabs.fyi/signals/b50366e3-3cfd-44f0-8336-a160cf3ec0e2/signal.json","text":"release · togethercomputer/together-sandbox together-sandbox-workspace-v1.11.0 · signal_desk=releases · occurred_at=2026-05-28T09:01:08+00:00 · url=https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-workspace-v1.11.0 · raw={\"repo\":\"togethercomputer/together-sandbox\"}"},{"ref":"E21","kind":"event","title":"togethercomputer/together-sandbox together-sandbox-v1.10.0","date":"2026-05-26T12:56:48+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-v1.10.0","signal_url":"https://onlylabs.fyi/signals/d61f6fa1-6e62-456c-80e1-2e7e00598331","signal_json_url":"https://onlylabs.fyi/signals/d61f6fa1-6e62-456c-80e1-2e7e00598331/signal.json","text":"release · togethercomputer/together-sandbox together-sandbox-v1.10.0 · signal_desk=releases · occurred_at=2026-05-26T12:56:48+00:00 · url=https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-v1.10.0 · raw={\"repo\":\"togethercomputer/together-sandbox\"}"},{"ref":"E22","kind":"event","title":"togethercomputer/together-sandbox together-sandbox-v1.9.0","date":"2026-05-25T23:52:17+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-v1.9.0","signal_url":"https://onlylabs.fyi/signals/c11d4202-5922-413d-8c8d-987b3745816b","signal_json_url":"https://onlylabs.fyi/signals/c11d4202-5922-413d-8c8d-987b3745816b/signal.json","text":"release · togethercomputer/together-sandbox together-sandbox-v1.9.0 · signal_desk=releases · 
occurred_at=2026-05-25T23:52:17+00:00 · url=https://github.com/togethercomputer/together-sandbox/releases/tag/together-sandbox-v1.9.0 · raw={\"repo\":\"togethercomputer/together-sandbox\"}"},{"ref":"E23","kind":"event","title":"togethercomputer/together-typescript v0.41.1","date":"2026-05-23T15:19:39+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-typescript/releases/tag/v0.41.1","signal_url":"https://onlylabs.fyi/signals/e66d1c28-1b2d-49d4-88de-62cc2539a87a","signal_json_url":"https://onlylabs.fyi/signals/e66d1c28-1b2d-49d4-88de-62cc2539a87a/signal.json","text":"release · togethercomputer/together-typescript v0.41.1 · signal_desk=releases · occurred_at=2026-05-23T15:19:39+00:00 · url=https://github.com/togethercomputer/together-typescript/releases/tag/v0.41.1 · raw={\"repo\":\"togethercomputer/together-typescript\"}"},{"ref":"E24","kind":"event","title":"togethercomputer/together-py v2.16.0","date":"2026-05-23T15:19:11+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-py/releases/tag/v2.16.0","signal_url":"https://onlylabs.fyi/signals/b1c8a173-95cf-485b-a33a-85573a7b7c43","signal_json_url":"https://onlylabs.fyi/signals/b1c8a173-95cf-485b-a33a-85573a7b7c43/signal.json","text":"release · togethercomputer/together-py v2.16.0 · signal_desk=releases · occurred_at=2026-05-23T15:19:11+00:00 · url=https://github.com/togethercomputer/together-py/releases/tag/v2.16.0 · raw={\"repo\":\"togethercomputer/together-py\"}"},{"ref":"E25","kind":"event","title":"Staff Engineer, API Core Platform","date":"2026-05-22T03:43:37+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5056185007","signal_url":"https://onlylabs.fyi/signals/79e40c26-beea-47fa-8be8-87137096d758","signal_json_url":"https://onlylabs.fyi/signals/79e40c26-beea-47fa-8be8-87137096d758/signal.json","text":"job_opened · Staff Engineer, API Core Platform · signal_desk=hiring · 
occurred_at=2026-05-22T03:43:37+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5056185007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E26","kind":"event","title":"Junior Technical Program Manager — Infrastructure Operations","date":"2026-05-20T20:53:51+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5139762007","signal_url":"https://onlylabs.fyi/signals/1eb4a445-fc2b-416a-a787-41e1b5750ca7","signal_json_url":"https://onlylabs.fyi/signals/1eb4a445-fc2b-416a-a787-41e1b5750ca7/signal.json","text":"job_opened · Junior Technical Program Manager — Infrastructure Operations · signal_desk=hiring · occurred_at=2026-05-20T20:53:51+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5139762007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E27","kind":"event","title":"togethercomputer/together-go v0.10.0","date":"2026-05-20T19:03:09+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-go/releases/tag/v0.10.0","signal_url":"https://onlylabs.fyi/signals/3e20c07d-e5a6-4119-95d4-9201795f5a1b","signal_json_url":"https://onlylabs.fyi/signals/3e20c07d-e5a6-4119-95d4-9201795f5a1b/signal.json","text":"release · togethercomputer/together-go v0.10.0 · signal_desk=releases · occurred_at=2026-05-20T19:03:09+00:00 · url=https://github.com/togethercomputer/together-go/releases/tag/v0.10.0 · raw={\"repo\":\"togethercomputer/together-go\"}"},{"ref":"E28","kind":"event","title":"togethercomputer/together-typescript v0.41.0","date":"2026-05-20T18:59:03+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-typescript/releases/tag/v0.41.0","signal_url":"https://onlylabs.fyi/signals/303c34f4-2bbb-455a-b4ad-cbcbae127de7","signal_json_url":"https://onlylabs.fyi/signals/303c34f4-2bbb-455a-b4ad-cbcbae127de7/signal.json","text":"release · togethercomputer/together-typescript v0.41.0 · 
signal_desk=releases · occurred_at=2026-05-20T18:59:03+00:00 · url=https://github.com/togethercomputer/together-typescript/releases/tag/v0.41.0 · raw={\"repo\":\"togethercomputer/together-typescript\"}"},{"ref":"E29","kind":"event","title":"togethercomputer/together-py v2.15.0","date":"2026-05-20T18:58:05+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-py/releases/tag/v2.15.0","signal_url":"https://onlylabs.fyi/signals/3e68fdfc-89d5-4173-b53b-d88bbba2cda6","signal_json_url":"https://onlylabs.fyi/signals/3e68fdfc-89d5-4173-b53b-d88bbba2cda6/signal.json","text":"release · togethercomputer/together-py v2.15.0 · signal_desk=releases · occurred_at=2026-05-20T18:58:05+00:00 · url=https://github.com/togethercomputer/together-py/releases/tag/v2.15.0 · raw={\"repo\":\"togethercomputer/together-py\"}"},{"ref":"E30","kind":"event","title":"Staff Platform Engineer, Voice AI","date":"2026-05-19T19:32:06+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5142176007","signal_url":"https://onlylabs.fyi/signals/94ec0b40-e767-4912-aca2-00df17d622b3","signal_json_url":"https://onlylabs.fyi/signals/94ec0b40-e767-4912-aca2-00df17d622b3/signal.json","text":"job_opened · Staff Platform Engineer, Voice AI · signal_desk=hiring · occurred_at=2026-05-19T19:32:06+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5142176007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E31","kind":"event","title":"Staff Machine Learning Engineer, Voice AI ","date":"2026-05-19T18:19:46+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5140763007","signal_url":"https://onlylabs.fyi/signals/6c4a6446-97e3-4d63-8e3c-0027403c5056","signal_json_url":"https://onlylabs.fyi/signals/6c4a6446-97e3-4d63-8e3c-0027403c5056/signal.json","text":"job_opened · Staff Machine Learning Engineer, Voice AI  · signal_desk=hiring · 
occurred_at=2026-05-19T18:19:46+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5140763007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E32","kind":"event","title":"Benchmarking inference at scale: coding agents","date":"2026-05-19T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/coding-agent-benchmarks","signal_url":"https://onlylabs.fyi/signals/3c08a1c0-235e-42b0-b347-d52e39d12ee1","signal_json_url":"https://onlylabs.fyi/signals/3c08a1c0-235e-42b0-b347-d52e39d12ee1/signal.json","text":"post_published · Benchmarking inference at scale: coding agents · signal_desk=talking · occurred_at=2026-05-19T00:00:00+00:00 · url=https://www.together.ai/blog/coding-agent-benchmarks · raw={\"excerpt\":\"Real-world inference benchmarks for coding agents: 31% more TPS than TensorRT-LLM, 2× better TTFT at saturation, and 76% lower cost than Claude Opus 4.6.\"}"},{"ref":"E33","kind":"event","title":"Customer Support Engineer (GPU Cluster), India ","date":"2026-05-15T20:46:21+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/4840844007","signal_url":"https://onlylabs.fyi/signals/77ce4f10-0d50-4628-afd8-a8792e948e0f","signal_json_url":"https://onlylabs.fyi/signals/77ce4f10-0d50-4628-afd8-a8792e948e0f/signal.json","text":"job_opened · Customer Support Engineer (GPU Cluster), India  · signal_desk=hiring · occurred_at=2026-05-15T20:46:21+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/4840844007 · raw={\"location\":\"India\",\"ats\":\"greenhouse\"}"},{"ref":"E34","kind":"event","title":"AI Infrastructure 
Engineer","date":"2026-05-15T03:06:33+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5138540007","signal_url":"https://onlylabs.fyi/signals/66cc7094-4645-4c25-8b3c-b7d70e29ba17","signal_json_url":"https://onlylabs.fyi/signals/66cc7094-4645-4c25-8b3c-b7d70e29ba17/signal.json","text":"job_opened · AI Infrastructure Engineer · signal_desk=hiring · occurred_at=2026-05-15T03:06:33+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5138540007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E35","kind":"event","title":"Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference","date":"2026-05-15T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/together-ai-partners-with-pearl-research-labs","signal_url":"https://onlylabs.fyi/signals/49734867-446a-4524-963f-4812d706b5eb","signal_json_url":"https://onlylabs.fyi/signals/49734867-446a-4524-963f-4812d706b5eb/signal.json","text":"post_published · Together AI and Pearl Research Labs Team Up to Reduce the Cost of AI Inference · signal_desk=talking · occurred_at=2026-05-15T00:00:00+00:00 · url=https://www.together.ai/blog/together-ai-partners-with-pearl-research-labs · raw={\"excerpt\":\"Together AI partners with Pearl Research Labs to launch a discounted Pearl-powered inference endpoint for Gemma-4-31B-it-pearl, using Proof of Useful Work to turn AI workloads into crypto emissions.\"}"},{"ref":"E36","kind":"event","title":"Violin: An open-source video translation skill that breaks language barriers","date":"2026-05-14T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/violin-open-source-translation-skill","signal_url":"https://onlylabs.fyi/signals/558e6d06-9f96-454a-a3bf-e34988a0e832","signal_json_url":"https://onlylabs.fyi/signals/558e6d06-9f96-454a-a3bf-e34988a0e832/signal.json","text":"post_published · Violin: An open-source video 
translation skill that breaks language barriers · signal_desk=talking · occurred_at=2026-05-14T00:00:00+00:00 · url=https://www.together.ai/blog/violin-open-source-translation-skill · raw={\"excerpt\":\"Violin is an open-source AI video translation tool that combines speech recognition, LLM translation, and text-to-speech to make video content accessible across languages.\"}"},{"ref":"E37","kind":"event","title":"Infrastructure Design Engineer","date":"2026-05-13T22:11:16+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5135876007","signal_url":"https://onlylabs.fyi/signals/79bc42b2-57aa-442a-87ff-df65bf1d0ee0","signal_json_url":"https://onlylabs.fyi/signals/79bc42b2-57aa-442a-87ff-df65bf1d0ee0/signal.json","text":"job_opened · Infrastructure Design Engineer · signal_desk=hiring · occurred_at=2026-05-13T22:11:16+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5135876007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E38","kind":"event","title":"togethercomputer/SearchScales","date":"2026-05-13T21:44:06+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/SearchScales","signal_url":"https://onlylabs.fyi/signals/0854485e-65c6-489c-b097-9d8051e24014","signal_json_url":"https://onlylabs.fyi/signals/0854485e-65c6-489c-b097-9d8051e24014/signal.json","text":"repo_new · togethercomputer/SearchScales · signal_desk=repos · occurred_at=2026-05-13T21:44:06+00:00 · url=https://github.com/togethercomputer/SearchScales · raw={\"repo\":\"togethercomputer/SearchScales\"}"},{"ref":"E39","kind":"event","title":"Sr. 
Revenue Accountant","date":"2026-05-13T19:36:33+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5135637007","signal_url":"https://onlylabs.fyi/signals/9d4d629f-c3e5-435d-9709-730f3d8bf309","signal_json_url":"https://onlylabs.fyi/signals/9d4d629f-c3e5-435d-9709-730f3d8bf309/signal.json","text":"job_opened · Sr. Revenue Accountant · signal_desk=hiring · occurred_at=2026-05-13T19:36:33+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5135637007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E40","kind":"event","title":"Infrastructure Accounting Manager","date":"2026-05-13T19:36:00+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5134279007","signal_url":"https://onlylabs.fyi/signals/0cedd640-2f57-4fec-89ab-6495493e26b3","signal_json_url":"https://onlylabs.fyi/signals/0cedd640-2f57-4fec-89ab-6495493e26b3/signal.json","text":"job_opened · Infrastructure Accounting Manager · signal_desk=hiring · occurred_at=2026-05-13T19:36:00+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5134279007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E41","kind":"event","title":"Data Warehouse Engineer","date":"2026-05-13T17:52:35+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5074064007","signal_url":"https://onlylabs.fyi/signals/d6fde7bc-969d-4bd3-853c-65830e37beca","signal_json_url":"https://onlylabs.fyi/signals/d6fde7bc-969d-4bd3-853c-65830e37beca/signal.json","text":"job_opened · Data Warehouse Engineer · signal_desk=hiring · occurred_at=2026-05-13T17:52:35+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5074064007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E42","kind":"event","title":"Analytics Engineer — Data 
Warehouse","date":"2026-05-13T17:52:17+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5101651007","signal_url":"https://onlylabs.fyi/signals/ab917d66-fdb1-45ed-ace0-7c7a7333a24f","signal_json_url":"https://onlylabs.fyi/signals/ab917d66-fdb1-45ed-ace0-7c7a7333a24f/signal.json","text":"job_opened · Analytics Engineer — Data Warehouse · signal_desk=hiring · occurred_at=2026-05-13T17:52:17+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5101651007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E43","kind":"event","title":"togethercomputer/together-py v2.14.0","date":"2026-05-12T19:23:25+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-py/releases/tag/v2.14.0","signal_url":"https://onlylabs.fyi/signals/dc7c3332-3ed7-440e-8340-4c1528e6c594","signal_json_url":"https://onlylabs.fyi/signals/dc7c3332-3ed7-440e-8340-4c1528e6c594/signal.json","text":"release · togethercomputer/together-py v2.14.0 · signal_desk=releases · occurred_at=2026-05-12T19:23:25+00:00 · url=https://github.com/togethercomputer/together-py/releases/tag/v2.14.0 · raw={\"repo\":\"togethercomputer/together-py\"}"},{"ref":"E44","kind":"event","title":"Introducing voice finder — a new tool to quickly find the right voice for your app from over 600+ voices ","date":"2026-05-12T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/introducing-voice-finder-a-new-tool-to-quickly-find-the-right-voice-for-your-app-from-over-600-voices","signal_url":"https://onlylabs.fyi/signals/eb4dd7b9-04a8-47e9-afa1-ca27b235f938","signal_json_url":"https://onlylabs.fyi/signals/eb4dd7b9-04a8-47e9-afa1-ca27b235f938/signal.json","text":"post_published · Introducing voice finder — a new tool to quickly find the right voice for your app from over 600+ voices  · signal_desk=talking · occurred_at=2026-05-12T00:00:00+00:00 · 
url=https://www.together.ai/blog/introducing-voice-finder-a-new-tool-to-quickly-find-the-right-voice-for-your-app-from-over-600-voices · raw={\"excerpt\":\"Voice finder helps developers search, match, filter, and audition 600+ voices across Together AI TTS models using natural-language prompts or uploaded audio samples.\"}"},{"ref":"E45","kind":"event","title":"togethercomputer/together-py v2.13.0","date":"2026-05-11T23:48:35+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-py/releases/tag/v2.13.0","signal_url":"https://onlylabs.fyi/signals/97e83577-0c45-4614-ad7b-0c72548f4636","signal_json_url":"https://onlylabs.fyi/signals/97e83577-0c45-4614-ad7b-0c72548f4636/signal.json","text":"release · togethercomputer/together-py v2.13.0 · signal_desk=releases · occurred_at=2026-05-11T23:48:35+00:00 · url=https://github.com/togethercomputer/together-py/releases/tag/v2.13.0 · raw={\"repo\":\"togethercomputer/together-py\"}"},{"ref":"E46","kind":"event","title":"Senior Technical Recruiter","date":"2026-05-11T20:39:43+00:00","date_source":"greenhouse.updated_at","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/4961121007","signal_url":"https://onlylabs.fyi/signals/64deacd3-7b08-4122-9bc1-ff26aa6d1851","signal_json_url":"https://onlylabs.fyi/signals/64deacd3-7b08-4122-9bc1-ff26aa6d1851/signal.json","text":"job_opened · Senior Technical Recruiter · signal_desk=hiring · occurred_at=2026-05-11T20:39:43+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/4961121007 · raw={\"location\":\"San Francisco\",\"ats\":\"greenhouse\"}"},{"ref":"E47","kind":"event","title":"togethercomputer/together-typescript 
v0.40.0","date":"2026-05-11T14:54:43+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-typescript/releases/tag/v0.40.0","signal_url":"https://onlylabs.fyi/signals/9a216147-3f2c-4d9c-9a9f-f5308b13f926","signal_json_url":"https://onlylabs.fyi/signals/9a216147-3f2c-4d9c-9a9f-f5308b13f926/signal.json","text":"release · togethercomputer/together-typescript v0.40.0 · signal_desk=releases · occurred_at=2026-05-11T14:54:43+00:00 · url=https://github.com/togethercomputer/together-typescript/releases/tag/v0.40.0 · raw={\"repo\":\"togethercomputer/together-typescript\"}"},{"ref":"E48","kind":"event","title":"Engineering Manager / Tech Lead ","date":"2026-05-11T09:35:54+00:00","date_source":"source","source_url":"https://job-boards.greenhouse.io/togetherai/jobs/5062462007","signal_url":"https://onlylabs.fyi/signals/ea1f05f3-1f5f-4e7c-9d97-ae24781920b6","signal_json_url":"https://onlylabs.fyi/signals/ea1f05f3-1f5f-4e7c-9d97-ae24781920b6/signal.json","text":"job_opened · Engineering Manager / Tech Lead  · signal_desk=hiring · occurred_at=2026-05-11T09:35:54+00:00 · url=https://job-boards.greenhouse.io/togetherai/jobs/5062462007 · raw={\"location\":\"Amsterdam\",\"ats\":\"greenhouse\"}"},{"ref":"E49","kind":"event","title":"Serving DeepSeek-V4: why million-token context is an inference systems problem","date":"2026-05-11T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/serving-deepseek-v4-why-million-token-context-is-an-inference-systems-problem","signal_url":"https://onlylabs.fyi/signals/acc3bbe8-7204-4369-9fab-77561527ceef","signal_json_url":"https://onlylabs.fyi/signals/acc3bbe8-7204-4369-9fab-77561527ceef/signal.json","text":"post_published · Serving DeepSeek-V4: why million-token context is an inference systems problem · signal_desk=talking · occurred_at=2026-05-11T00:00:00+00:00 · url=https://www.together.ai/blog/serving-deepseek-v4-why-million-token-context-is-an-inference-systems-problem · 
raw={\"excerpt\":\"DeepSeek-V4 makes million-token context a serving-systems problem. Together AI explores the inference work behind V4 on NVIDIA HGX B200, including compressed KV layouts, prefix caching, kernel maturity, and endpoint profiles for long-context workloads.\"}"},{"ref":"E50","kind":"event","title":"Deploy and inference any model from HuggingFace","date":"2026-05-08T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/deploy-and-inference-any-model-from-huggingface","signal_url":"https://onlylabs.fyi/signals/bf48f114-a6a8-4fc5-a087-2bc7d861230d","signal_json_url":"https://onlylabs.fyi/signals/bf48f114-a6a8-4fc5-a087-2bc7d861230d/signal.json","text":"post_published · Deploy and inference any model from HuggingFace · signal_desk=talking · occurred_at=2026-05-08T00:00:00+00:00 · url=https://www.together.ai/blog/deploy-and-inference-any-model-from-huggingface · raw={\"excerpt\":\"Learn how to deploy any Hugging Face model in one session using Goose and Together's Dedicated Container Inference. Skip the setup complexity — one prompt gets your model running in a production-grade GPU environment on release day.\"}"},{"ref":"E51","kind":"event","title":"Parcae: Doing more with fewer parameters using stable looped models","date":"2026-04-15T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/parcae","signal_url":"https://onlylabs.fyi/signals/4b6752d0-efd3-40d9-a892-c03fc06f5133","signal_json_url":"https://onlylabs.fyi/signals/4b6752d0-efd3-40d9-a892-c03fc06f5133/signal.json","text":"post_published · Parcae: Doing more with fewer parameters using stable looped models · signal_desk=talking · occurred_at=2026-04-15T00:00:00+00:00 · url=https://www.together.ai/blog/parcae · hn=2 points/0 comments · raw={\"excerpt\":\"Parcae is a stable looped language model that matches the quality of a Transformer twice its size — a 770M model reaching 1.3B-level performance. 
We introduce the first scaling laws for looping and show that increasing recurrence, not just data, is a compute-efficient path to bet\"}"},{"ref":"E52","kind":"event","title":"Foundational research powering efficient inference at scale","date":"2026-05-04T00:00:00+00:00","date_source":"rss.item_date","source_url":"https://www.together.ai/blog/foundational-research-powering-efficient-inference-at-scale","signal_url":"https://onlylabs.fyi/signals/0db265d4-c781-4c15-8385-f324440ef8c1","signal_json_url":"https://onlylabs.fyi/signals/0db265d4-c781-4c15-8385-f324440ef8c1/signal.json","text":"post_published · Foundational research powering efficient inference at scale · signal_desk=talking · occurred_at=2026-05-04T00:00:00+00:00 · url=https://www.together.ai/blog/foundational-research-powering-efficient-inference-at-scale · raw={\"excerpt\":\"As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them — efficiently, reliably, and at scale.\"}"},{"ref":"E53","kind":"event","title":"togethercomputer/k8s-netperf","date":"2026-04-27T23:08:39+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/k8s-netperf","signal_url":"https://onlylabs.fyi/signals/e6175547-cf8b-4774-9c10-ad0876c3f14e","signal_json_url":"https://onlylabs.fyi/signals/e6175547-cf8b-4774-9c10-ad0876c3f14e/signal.json","text":"repo_forked · togethercomputer/k8s-netperf · signal_desk=forks · occurred_at=2026-04-27T23:08:39+00:00 · url=https://github.com/togethercomputer/k8s-netperf · 
raw={\"repo\":\"togethercomputer/k8s-netperf\",\"parent\":\"cloud-bulldozer/k8s-netperf\"}"},{"ref":"E54","kind":"event","title":"togethercomputer/DeepGEMM","date":"2026-04-24T20:40:07+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/DeepGEMM","signal_url":"https://onlylabs.fyi/signals/e1ad880b-e8c2-4eed-96b9-9286706c6932","signal_json_url":"https://onlylabs.fyi/signals/e1ad880b-e8c2-4eed-96b9-9286706c6932/signal.json","text":"repo_forked · togethercomputer/DeepGEMM · signal_desk=forks · occurred_at=2026-04-24T20:40:07+00:00 · url=https://github.com/togethercomputer/DeepGEMM · raw={\"repo\":\"togethercomputer/DeepGEMM\",\"parent\":\"deepseek-ai/DeepGEMM\"}"},{"ref":"E55","kind":"event","title":"togethercomputer/sglang-mla-rotation","date":"2026-04-08T00:10:20+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/sglang-mla-rotation","signal_url":"https://onlylabs.fyi/signals/8cd07691-5f2d-4997-abfe-64a635021377","signal_json_url":"https://onlylabs.fyi/signals/8cd07691-5f2d-4997-abfe-64a635021377/signal.json","text":"repo_forked · togethercomputer/sglang-mla-rotation · signal_desk=forks · occurred_at=2026-04-08T00:10:20+00:00 · url=https://github.com/togethercomputer/sglang-mla-rotation · raw={\"repo\":\"togethercomputer/sglang-mla-rotation\",\"parent\":\"sgl-project/sglang\"}"},{"ref":"E56","kind":"event","title":"togethercomputer/Mooncake","date":"2026-03-19T17:53:41+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/Mooncake","signal_url":"https://onlylabs.fyi/signals/f3addf6d-624e-4096-a3f3-7fb62dbfb766","signal_json_url":"https://onlylabs.fyi/signals/f3addf6d-624e-4096-a3f3-7fb62dbfb766/signal.json","text":"repo_forked · togethercomputer/Mooncake · signal_desk=forks · occurred_at=2026-03-19T17:53:41+00:00 · url=https://github.com/togethercomputer/Mooncake · 
raw={\"repo\":\"togethercomputer/Mooncake\",\"parent\":\"kvcache-ai/Mooncake\"}"},{"ref":"E57","kind":"event","title":"togethercomputer/ssd","date":"2026-03-18T16:20:10+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/ssd","signal_url":"https://onlylabs.fyi/signals/48657822-5db2-498c-9a30-dfd49e1da6a6","signal_json_url":"https://onlylabs.fyi/signals/48657822-5db2-498c-9a30-dfd49e1da6a6/signal.json","text":"repo_forked · togethercomputer/ssd · signal_desk=forks · occurred_at=2026-03-18T16:20:10+00:00 · url=https://github.com/togethercomputer/ssd · stars=2 · raw={\"repo\":\"togethercomputer/ssd\",\"parent\":\"tanishqkumar/ssd\"}"},{"ref":"E58","kind":"event","title":"togethercomputer/together-VeOmni","date":"2026-03-09T23:27:09+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-VeOmni","signal_url":"https://onlylabs.fyi/signals/d61be9ce-07c9-45e1-9035-a9f73a4850ed","signal_json_url":"https://onlylabs.fyi/signals/d61be9ce-07c9-45e1-9035-a9f73a4850ed/signal.json","text":"repo_forked · togethercomputer/together-VeOmni · signal_desk=forks · occurred_at=2026-03-09T23:27:09+00:00 · url=https://github.com/togethercomputer/together-VeOmni · raw={\"repo\":\"togethercomputer/together-VeOmni\",\"parent\":\"ByteDance-Seed/VeOmni\"}"},{"ref":"E59","kind":"event","title":"togethercomputer/together-nccl-tests","date":"2026-03-05T23:15:35+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-nccl-tests","signal_url":"https://onlylabs.fyi/signals/c3ed06e0-93d7-4d98-b1de-eab75be0a067","signal_json_url":"https://onlylabs.fyi/signals/c3ed06e0-93d7-4d98-b1de-eab75be0a067/signal.json","text":"repo_forked · togethercomputer/together-nccl-tests · signal_desk=forks · occurred_at=2026-03-05T23:15:35+00:00 · url=https://github.com/togethercomputer/together-nccl-tests · 
raw={\"repo\":\"togethercomputer/together-nccl-tests\",\"parent\":\"NVIDIA/nccl-tests\"}"},{"ref":"E60","kind":"event","title":"togethercomputer/together-dgxc-benchmarking","date":"2026-03-05T23:02:19+00:00","date_source":"source","source_url":"https://github.com/togethercomputer/together-dgxc-benchmarking","signal_url":"https://onlylabs.fyi/signals/0f04e0f7-382a-4784-93c4-aa46dc61188c","signal_json_url":"https://onlylabs.fyi/signals/0f04e0f7-382a-4784-93c4-aa46dc61188c/signal.json","text":"repo_forked · togethercomputer/together-dgxc-benchmarking · signal_desk=forks · occurred_at=2026-03-05T23:02:19+00:00 · url=https://github.com/togethercomputer/together-dgxc-benchmarking · raw={\"repo\":\"togethercomputer/together-dgxc-benchmarking\",\"parent\":\"NVIDIA/dgxc-benchmarking\"}"}]}