togethercomputer/flyte
forked from flyteorg/flyte
Description: Scalable and flexible workflow orchestration platform that seamlessly unifies data, ML and analytics stacks.
License: Apache-2.0
Stars: 0
Forks: 1
Open issues: 0
Created: 2025-01-28T17:22:10Z
Pushed: 2025-01-30T00:50:44Z
Default branch: master
Fork: yes
Parent repository: flyteorg/flyte
Archived: no
README:
Flyte
🏗️ 🚀 📈
Flyte is an open-source orchestrator that facilitates building production-grade data and ML pipelines. It is built for scalability and reproducibility, leveraging Kubernetes as its underlying platform. With Flyte, user teams can construct pipelines using the Python SDK, and seamlessly deploy them on both cloud and on-premises environments, enabling distributed processing and efficient resource utilization.
Build
Write code in Python or any other language and leverage a robust type engine.
Deploy & Scale
Either locally or on a remote cluster, execute your models with ease.
Get Started · Documentation · Resources
Table of contents
- [Quick start](#quick-start)
- [Tutorials](#tutorials)
- [Features](#features)
- [Who uses Flyte](#whos-using-flyte)
- [How to stay involved](#how-to-stay-involved)
- [How to contribute](#how-to-contribute)
---
Quick start
1. Install Flyte's Python SDK
pip install flytekit
2. Create a workflow (see example)
3. Run it locally with:
pyflyte run hello_world.py hello_world_wf
Ready to try a Flyte cluster?
1. Create a new sandbox cluster, running as a Docker container:
flytectl demo start
2. Now execute your workflows on the cluster:
pyflyte run --remote hello_world.py hello_world_wf
Do you want to see more but don't want to install anything?
Head over to https://sandbox.union.ai/. It allows you to experiment with Flyte's capabilities from a hosted Jupyter notebook.
Ready to productionize?
Go to the Deployment guide for instructions on installing Flyte in different environments.
Tutorials
- Fine-tune Code Llama on the Flyte codebase
- Forecast sales with Horovod and Spark
- Nucleotide Sequence Querying with BLASTX
Features
🚀 Strongly typed interfaces: Validate your data at every step of the workflow by defining data guardrails using Flyte types.
🌐 Any language: Write code in any language using raw containers, or choose Python, Java, Scala or JavaScript SDKs to develop your Flyte workflows.
🔒 Immutability: Immutable executions help ensure reproducibility by preventing any changes to the state of an execution.
🧬 Data lineage: Track the movement and transformation of data throughout the lifecycle of your data and ML workflows.
📊 Map tasks: Achieve parallel code execution with minimal configuration using map tasks.
🌎 Multi-tenancy: Multiple users can share the same platform while maintaining their own distinct data and configurations.
🌟 Dynamic workflows: Build flexible and adaptable workflows that can change and evolve as needed, making it easier to respond to changing requirements.
⏯️ Wait for external inputs: Pause an execution until external inputs are supplied, then proceed.
🌳 Branching: Selectively execute branches of your workflow based on static or dynamic data produced by other tasks or input data.
📈 Data visualization: Visualize data, monitor models and view training history through plots.
📂 FlyteFile & FlyteDirectory: Transfer files and directories between local and cloud storage.
🗃️ Structured dataset: Convert dataframes between types and enforce column-level type checking using the abstract 2D representation provided by Structured Dataset.
🛡️ Recover from failures: Recover only the failed tasks.
🔁 Rerun a single task: Rerun workflows at the most granular level without modifying the previous state of a data/ML workflow.
🔍 Cache outputs: Cache task outputs by passing cache=True to the task decorator.
🚩 Intra-task checkpointing: Checkpoint progress within a task execution.
⏰ Timeout: Define a timeout period, after which the task is marked as failed.
🏭 Dev to prod: As simple as changing your domain from development or staging to production.
💸 Spot or preemptible instances: Schedule your workflows on spot instances by setting interruptible to True in the task decorator.
☁️ Cloud-native deployment: Deploy Flyte on AWS, GCP, Azure and other cloud services.
📅 Scheduling: Schedule your data and ML workflows to run at a specific time.
📢 Notifications: Stay informed about changes to your workflow's state by configuring notifications through Slack, PagerDuty or email.
⌛️ Timeline view: Evaluate the duration of each of your Flyte tasks and identify potential bottlenecks.
💨 GPU acceleration: Enable and control your tasks’ GPU demands by requesting resources in the task decorator.
🐳 Dependency isolation via containers: Maintain separate sets of…
Notability
1.0/10: Routine internal fork