RepoCerebrasCerebraspublished Aug 16, 2024seen 5d

Cerebras/inference-examples

Python

Open original ↗

Captured source

source ↗
published Aug 16, 2024seen 5dcaptured 12hhttp 200method plain

Cerebras/inference-examples

Description: Inference examples

Language: Python

Stars: 70

Forks: 28

Open issues: 0

Created: 2024-08-16T14:45:55Z

Pushed: 2026-04-20T09:53:06Z

Default branch: main

Fork: no

Archived: no

README:

Cerebras Inference API Demos

Welcome to the Cerebras Inference API demo repository! This repository contains various examples showcasing the power of the Cerebras Wafer-Scale Engines and CS-3 systems for AI model inference.

🚀 Introduction

The Cerebras Inference API offers developers a low-latency solution for AI model inference powered by Cerebras Wafer-Scale Engines and CS-3 systems. We invite developers to explore the new possibilities that our high-speed inferencing solution unlocks.

The Cerebras Inference API provides access to models such as OpenAI's GPT-OSS, Meta's Llama family of models, and Alibaba's Qwen models. For the full details of supported models, see the supported models documentation.

📚 Resources

📁 Projects Overview

This repository contains multiple example projects, each demonstrating different capabilities of the Cerebras Inference API. Each project is located in its own folder and contains a detailed README.

![Open Val Town Template](https://www.val.town/v/stevekrouse/cerebrasTemplate)

🔗 Example Projects

  • [Getting Started with Cerebras Inference API](./getting-started/README.md)
  • Learn how to get started with the Cerebras Inference API for your AI projects.
  • [Conversational Memory with Langchain](./conversational-memory-langchain/README.md)
  • Explore how to build conversational memory for LLMs using Langchain.
  • [RAG with Pinecone + Docker](./rag-pinecone-docker/README.md)
  • Implement Retrieval-Augmented Generation (RAG) using Pinecone and Docker.
  • [RAG with Weaviate + HuggingFace](./rag-weaviate-huggingface/README.md)
  • Implement Retrieval-Augmented Generation (RAG) using Weaviate and HuggingFace.
  • [Getting Started with Cerebras + Streamlit](./cerebras-streamlit/README.md)
  • Learn how to integrate Cerebras with Streamlit to build interactive applications.
  • [AI Agentic Workflow with LlamaIndex](./ai-workflow-llamaindex/README.md)
  • Build an AI agentic workflow using LlamaIndex.
  • [AI Agentic Workflow with Langchain](./ai-workflow-langchain/README.md)
  • Build an AI agentic workflow using Langchain.
  • [Multi AI Agentic Workflow](./multi-ai-workflow/README.md)
  • Create a multi-agentic AI workflow with Langgraph and LangSmith.
  • [Payload Compression (msgpack + gzip)](./payload-compression/README.md)
  • Minimal example + benchmark showing msgpack+gzip request-body compression on /v1/chat/completions and the TTFT / E2E latency gain on a ~30k-token prompt.

---

🌟 Getting Started

To explore each project, simply navigate to the corresponding folder and follow the instructions in the README. Happy coding!

🛠️ Requirements

  • Python 3.7+
  • Docker (for RAG examples)
  • Streamlit (for Cerebras + Streamlit example)
  • Other dependencies as noted in each project’s README.

📝 License

This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.

👥 Contributors

We welcome contributions! Feel free to submit a pull request or open an issue.

---

© 2024 Cerebras Systems

Notability

notability 3.0/10

Routine repo with low traction