databricks/llama-hub-llilac
forked from run-llama/llama-hub
Description: A library of data loaders for LLMs made by the community -- to be used with GPT Index and/or LangChain
Language: Jupyter Notebook
License: MIT
Stars: 0
Forks: 2
Open issues: 0
Created: 2023-10-05T12:44:18Z
Pushed: 2023-10-31T14:10:47Z
Default branch: main
Fork: yes
Parent repository: run-llama/llama-hub
Archived: no
README:
LlamaHub 🦙
Original creator: Jesse Zhang (GH: emptycrown, Twitter: @thejessezhang), who courteously donated the repo to LlamaIndex!
This is a simple library of all the data loaders / readers / tools that have been created by the community. The goal is to make it extremely easy to connect large language models to a large variety of knowledge sources. These are general-purpose utilities that are meant to be used in LlamaIndex and LangChain.
Loaders and readers allow you to easily ingest data for search and retrieval by a large language model, while tools allow the models to both read and write to third party data services and sources. Ultimately, this allows you to create your own customized data agent to intelligently work with you and your data to unlock the full capability of next level large language models.
For a variety of examples of data agents, see the notebooks directory. It contains example Jupyter notebooks for creating data agents that can load and parse data from Google Docs, SQL databases, Notion, and Slack; manage your Google Calendar and Gmail inbox; or read and use OpenAPI specs.
For an easier way to browse the integrations available, check out the website here: https://llamahub.ai/.
Usage (Use llama-hub as PyPI package)
These general-purpose loaders are designed to load data into LlamaIndex; the loaded documents can then also be used in LangChain.
Installation
pip install llama-hub
LlamaIndex
from llama_index import GPTVectorStoreIndex
from llama_hub.google_docs import GoogleDocsReader
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
index.query('Where did the author go to school?')
LlamaIndex Data Agent
from llama_index.agent import OpenAIAgent
import openai
openai.api_key = 'sk-api-key'
from llama_hub.tools.google_calendar import GoogleCalendarToolSpec
tool_spec = GoogleCalendarToolSpec()
agent = OpenAIAgent.from_tools(tool_spec.to_tool_list())
agent.chat('what is the first thing on my calendar today')
agent.chat("Please create an event for tomorrow at 4pm to review pull requests")
For a variety of examples of creating and using data agents, see the notebooks directory.
LangChain
Note: Make sure you change the description of the Tool to match your use case.
from llama_index import GPTVectorStoreIndex
from llama_hub.google_docs import GoogleDocsReader
from langchain.llms import OpenAI
from langchain.chains.question_answering import load_qa_chain

# load documents
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
langchain_documents = [d.to_langchain_format() for d in documents]

# initialize sample QA chain
llm = OpenAI(temperature=0)
qa_chain = load_qa_chain(llm)
question = ""  # replace the empty string with your question
answer = qa_chain.run(input_documents=langchain_documents, question=question)
Loader Usage (Use download_loader from LlamaIndex)
You can also use the loaders with download_loader from LlamaIndex in a single line of code.
For example, see the code snippets below using the Google Docs Loader.
from llama_index import GPTVectorStoreIndex, download_loader
GoogleDocsReader = download_loader('GoogleDocsReader')
gdoc_ids = ['1wf-y2pd9C878Oh-FmLH7Q_BQkljdm6TQal-c1pUfrec']
loader = GoogleDocsReader()
documents = loader.load_data(document_ids=gdoc_ids)
index = GPTVectorStoreIndex.from_documents(documents)
index.query('Where did the author go to school?')
How to add a loader or tool
Adding a loader or tool simply requires forking this repo and making a Pull Request. The Llama Hub website will update automatically. However, please keep in mind the following guidelines when making your PR.
Step 0: Setup virtual environment, install Poetry and dependencies
Create a new Python virtual environment. The command below creates an environment in .venv, and activates it:
python -m venv .venv
source .venv/bin/activate
If you are on Windows, use the following to activate your virtual environment:
.venv\scripts\activate
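Before installing Poetry, it can help to confirm that the environment is actually active. The following is a minimal sketch using only the standard library; the function name `in_virtualenv` is a hypothetical helper, not part of llama-hub or LlamaIndex.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside an environment created by `python -m venv`, sys.prefix points at
    # the environment, while sys.base_prefix still points at the base install.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

If this prints False after activation, the shell is probably still resolving `python` to the system interpreter.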
Install poetry:
pip install poetry
Install the required dependencies (this will also install llama_index):
poetry install
This will create an editable install of llama-hub in your venv.
Step 1: Create a new directory
For loaders, create a new directory in llama_hub, and for tools create a directory in llama_hub/tools. It can be nested within another directory, but give it a unique name, because the name of the directory becomes the identifier for your loader (e.g. google_docs). Inside your new directory, create an __init__.py file specifying the module's public interface with __all__, a base.py file which will contain your loader implementation, and, if needed, a requirements.txt file to list the package dependencies of your loader. Those packages will automatically be installed when your loader is used, so no need to worry about that anymore!
If you'd like, you can create the new directory and files by running the following script in the llama_hub directory. Just remember to put your dependencies into a requirements.txt file.
./add_loader.sh [NAME_OF_NEW_DIRECTORY]
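The layout described in Step 1 can also be sketched in Python. This is an illustration only, not a port of add_loader.sh, whose exact behavior is not shown here. The function `scaffold_loader`, the templates, and the placeholder class body are all assumptions made for the example; a real loader should follow the base class and imports used by the existing loaders in the repo.

```python
from pathlib import Path

# Hypothetical templates: they only illustrate the three files named in Step 1.
INIT_TEMPLATE = (
    '"""Public interface of the {name} loader."""\n'
    "from .base import {class_name}\n\n"
    '__all__ = ["{class_name}"]\n'
)

BASE_TEMPLATE = (
    '"""Loader implementation for {name} (placeholder)."""\n\n\n'
    "class {class_name}:\n"
    '    """Stub only; see existing loaders for the real base class."""\n\n'
    "    def load_data(self, *args, **kwargs):\n"
    '        raise NotImplementedError("implement the loader here")\n'
)


def scaffold_loader(package_root: Path, name: str, class_name: str) -> Path:
    """Create <package_root>/<name>/ with __init__.py, base.py, requirements.txt."""
    if not name.isidentifier():
        raise ValueError(f"{name!r} is not a valid Python package name")
    loader_dir = package_root / name
    # The directory name is the loader's identifier, so refuse to overwrite one.
    loader_dir.mkdir(parents=True, exist_ok=False)
    fields = {"name": name, "class_name": class_name}
    (loader_dir / "__init__.py").write_text(INIT_TEMPLATE.format(**fields))
    (loader_dir / "base.py").write_text(BASE_TEMPLATE.format(**fields))
    # Left empty; list one dependency per line if the loader needs any.
    (loader_dir / "requirements.txt").write_text("")
    return loader_dir
```

For example, `scaffold_loader(Path("llama_hub"), "my_source", "MySourceReader")` would create llama_hub/my_source/ with the three files, where my_source and MySourceReader are made-up names.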
Step 2: Write your README
Inside your new directory, create a README.md that mirrors that of the existing ones. It should have a summary of what your loader or tool does, its inputs, and how it is used in the context of LlamaIndex and LangChain.
Step 3: Add your loader to the library.json…