LG-AI-EXAONE/EXAONE-3.5
Description: Official repository for EXAONE 3.5 built by LG AI Research
License: NOASSERTION
Stars: 208
Forks: 23
Open issues: 7
Created: 2024-12-01T11:15:28Z
Pushed: 2024-12-16T08:19:52Z
Default branch: main
Fork: no
Archived: no
README:
EXAONE 3.5
🤗 Hugging Face   |   📝 Blog   |   📑 Technical Report
Introduction
We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. EXAONE 3.5 language models include: 1) 2.4B model optimized for deployment on small or resource-constrained devices, 2) 7.8B model matching the size of its predecessor but offering improved performance, and 3) 32B model delivering powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes.
Our documentation consists of the following sections:
- [Performance](#performance): Experimental results of EXAONE 3.5 models.
- [Quickstart](#quickstart): A basic guide to using EXAONE 3.5 models with Transformers.
- [Quantized Models](#quantized-models): An explanation of quantized EXAONE 3.5 weights in `AWQ` and `GGUF` format.
- [Run Locally](#run-locally): A guide to running EXAONE 3.5 models locally with `llama.cpp` and `Ollama` frameworks.
- [Deployment](#deployment): A guide to running EXAONE 3.5 models with `TensorRT-LLM`, `vLLM`, and `SGLang` deployment frameworks.
News
- 2024.12.11: EXAONE 3.5 is now available in the Ollama model library.
  You can now install the AutoAWQ library via pip without using the git repository.
- 2024.12.10: We updated the EXAONE Modelfile for Ollama. Please use the new one.
- 2024.12.09: We released the EXAONE 3.5 language model series, including 2.4B, 7.8B, and 32B instruction-tuned models. Check out the 📑 Technical Report!
Performance
Some experimental results are shown below. The full evaluation results can be found in the Technical Report.
| Models | MT-Bench | LiveBench | Arena-Hard | AlpacaEval | IFEval | KoMT-Bench[1] | LogicKor |
|---|---|---|---|---|---|---|---|
| EXAONE 3.5 32B | 8.51 | 43.0 | 78.6 | 60.6 | 81.7 | 8.05 | 9.06 |
| Qwen 2.5 32B | 8.49 | 50.6 | 67.0 | 41.0 | 78.7 | 7.75 | 8.89 |
| C4AI Command R 32B | 7.38 | 29.7 | 17.0 | 25.9 | 26.1 | 6.72 | 8.24 |
| Gemma 2 27B | 8.28 | 40.0 | 57.5 | 52.2 | 59.7 | 7.19 | 8.56 |
| Yi 1.5 34B | 7.64 | 26.2 | 23.1 | 34.8 | 55.5 | 4.88 | 6.33 |
| EXAONE 3.5 7.8B | 8.29 | 39.8 | 68.7 | 54.2 | 78.9 | 7.96 | 9.08 |
| Qwen 2.5 7B | 6.48 | 35.6 | 48.9 | 31.7 | 72.5 | 5.19 | 6.38 |
| Llama 3.1 8B | 7.59 | 28.3 | 27.7 | 25.7 | 74.5 | 4.85 | 5.99 |
| Gemma 2 9B | 7.64 | 32.1 | 43.6 | 47.3 | 54.7 | 7.10 | 8.05 |
| Phi 3 small (7B) | 7.63 | 27.9 | 26.8 | 29.2 | 59.5 | 3.22 | 3.99 |
| EXAONE 3.5 2.4B | 7.81 | 33.0 | 48.2 | 37.1 | 73.6 | 7.24 | 8.51 |
| Qwen 2.5 3B | 7.21 | 25.7 | 26.4 | 17.4 | 60.8 | 5.68 | 5.21 |
| Qwen 2.5 1.5B | 5.72 | 19.2 | 10.6 | 8.4 | 40.7 | 3.87 | 3.60 |
| Llama 3.2 3B | 6.94 | 24.0 | 14.2 | 18.7 | 70.1 | 3.16 | 2.86 |
| Gemma 2 2B | 7.20 | 20.0 | 19.1 | 29.1 | 50.5 | 4.83 | 5.29 |
- [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see README for more details.
Quickstart
- You need to install `transformers>=4.43.0` for the EXAONE 3.5 models. Using the latest version is recommended.
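A quick way to confirm that the installed version meets this requirement is sketched below. The helper names `parse_version` and `meets_minimum` are illustrative and not part of Transformers, and the parser only reads the leading numeric components, so suffixes such as `.dev0` are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

MIN_TRANSFORMERS = (4, 43, 0)  # minimum version stated in this README


def parse_version(text):
    """Return the leading numeric components of a version string as a 3-tuple."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    # A missing patch component (e.g. "5.0") is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def meets_minimum(installed, minimum=MIN_TRANSFORMERS):
    """True if the installed version string is at least the minimum tuple."""
    return parse_version(installed) >= minimum


if __name__ == "__main__":
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "OK" if meets_minimum(installed) else "too old"
        print(f"transformers {installed}: {status}")
```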
Here is example code showing how to use the EXAONE 3.5 models.
> [!Tip]
> In all examples below, you can use a different model size by changing 7.8B to 32B or 2.4B.
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct"

# Load the weights in bfloat16 and let Accelerate place them on the available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt (the second assignment overrides the first; keep only one)
prompt = "Explain how wonderful you are"  # English example
prompt = "스스로를 자랑해 봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
# Render the conversation with the model's chat template and tokenize it
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

# Greedy decoding; inputs are moved to the device that holds the model
output = model.generate(
    input_ids.to(model.device),
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))
```
> [!Note]
> The EXAONE 3.5 instruction-tuned language models were trained to utilize the system prompt,
> so we highly recommend using the system prompts provided in the code snippet above.
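The tip and note above can be folded into two small helpers, sketched here. The function names `model_id` and `build_messages` are hypothetical; the repository naming pattern, the three sizes, and the system prompt are taken from the snippet above.

```python
SIZES = ("2.4B", "7.8B", "32B")
SYSTEM_PROMPT = "You are EXAONE model from LG AI Research, a helpful assistant."


def model_id(size="7.8B"):
    """Build the Hugging Face repository id for one of the released sizes."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"LGAI-EXAONE/EXAONE-3.5-{size}-Instruct"


def build_messages(prompt, system=SYSTEM_PROMPT):
    """Return a chat message list that always starts with a system prompt."""
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": prompt},
    ]


print(model_id("2.4B"))  # LGAI-EXAONE/EXAONE-3.5-2.4B-Instruct
```

The list returned by `build_messages` can be passed directly to `tokenizer.apply_chat_template` in place of the hand-written `messages` list.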
Quantized Models
We introduce a series of quantized weights of EXAONE 3.5 models.
AWQ
We provide AWQ-quantized weights of EXAONE 3.5 models, quantized using the AutoAWQ library. Please refer to the AutoAWQ documentation for more details.
You need to install the latest version of the AutoAWQ library (`autoawq>=0.2.7.post3`) to load the AWQ-quantized versions of the EXAONE 3.5 models.
```bash
pip install autoawq
```
You can load the model in the same way as the original models, changing only the model name. The model's AWQ configuration is loaded automatically. Please check the [Quickstart section](#quickstart) above for more details.
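As a rough sketch of what changing only the model name looks like, the loader below wraps the Quickstart call in a function. Two things here are assumptions rather than facts from this README: the `-AWQ` repository suffix (modeled on the `-GGUF` repository name, so check the EXAONE 3.5 collection for the exact id) and `torch_dtype="auto"`, which defers the dtype to the checkpoint configuration. `load_exaone` is a hypothetical helper name.

```python
BASE_MODEL = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct"
AWQ_MODEL = BASE_MODEL + "-AWQ"  # assumed repository name, verify before use


def load_exaone(model_name):
    """Load a model and tokenizer; only the name differs between variants."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype="auto",  # assumption: take the dtype from the checkpoint config
        trust_remote_code=True,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer


# Loading the quantized variant needs autoawq installed and a supported GPU:
# model, tokenizer = load_exaone(AWQ_MODEL)
```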
GGUF
We provide weights in BF16 format and quantized weights in Q8_0, Q6_K, Q5_K_M, Q4_K_M, IQ4_XS.
The example below is for the 7.8B model in BF16 format. Please refer to the EXAONE 3.5 collection to find quantized models. You may need to install huggingface_hub to download the GGUF weights.
```bash
# (optional) install huggingface_hub
pip install huggingface_hub

# Download the GGUF weights
huggingface-cli download LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct-GGUF \
    --include "EXAONE-3.5-7.8B-Instruct-BF16*.gguf" \
    --local-dir .
```
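The same download can also be scripted from Python. The sketch below assumes `huggingface_hub` is installed and uses its `snapshot_download` function with `allow_patterns`. The helper names are illustrative, and the idea that the other quantization types follow the same file-name pattern as the BF16 example is an assumption to check against the repository's file list.

```python
QUANT_TYPES = ("BF16", "Q8_0", "Q6_K", "Q5_K_M", "Q4_K_M", "IQ4_XS")


def gguf_repo(size="7.8B"):
    """Repository id of the GGUF weights for a given model size."""
    return f"LGAI-EXAONE/EXAONE-3.5-{size}-Instruct-GGUF"


def gguf_pattern(size="7.8B", quant="BF16"):
    """Glob pattern for one quantization type (naming assumed from the BF16 example)."""
    if quant not in QUANT_TYPES:
        raise ValueError(f"unknown quantization {quant!r}; expected one of {QUANT_TYPES}")
    return f"EXAONE-3.5-{size}-Instruct-{quant}*.gguf"


def download_gguf(size="7.8B", quant="BF16", local_dir="."):
    """Download only the files matching the chosen quantization type."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=gguf_repo(size),
        allow_patterns=[gguf_pattern(size, quant)],
        local_dir=local_dir,
    )


print(gguf_pattern())  # EXAONE-3.5-7.8B-Instruct-BF16*.gguf
```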
Run Locally
For end users, we introduce two ways to run EXAONE 3.5 models locally.
> [!Note]
> We highly recommend using a repetition penalty not exceeding 1.0 for better generation quality.
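To see why 1.0 is the neutral setting, here is a minimal sketch of the common repetition-penalty rule: the logit of every already-generated token is divided by the penalty when it is positive and multiplied by it otherwise. This is a plain-Python illustration, not the llama.cpp or Ollama implementation, and `apply_repetition_penalty` is a hypothetical name.

```python
def apply_repetition_penalty(logits, seen_token_ids, penalty=1.0):
    """Return a copy of logits with previously generated tokens penalized."""
    if penalty <= 0:
        raise ValueError("penalty must be positive")
    adjusted = list(logits)
    for token_id in set(seen_token_ids):
        score = adjusted[token_id]
        # Positive scores shrink and negative scores grow more negative,
        # so a penalty above 1.0 always makes a seen token less likely.
        adjusted[token_id] = score / penalty if score > 0 else score * penalty
    return adjusted


logits = [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, [0, 1], penalty=1.0))  # [2.0, -1.0, 0.5]
print(apply_repetition_penalty(logits, [0, 1], penalty=2.0))  # [1.0, -2.0, 0.5]
```

With a penalty of 1.0 the logits pass through unchanged, which means repetition is not penalized at all.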
llama.cpp
You can run EXAONE models with llama.cpp as follows:
1. Install llama.cpp. Please refer to the llama.cpp repository for more details.
2. Download the EXAONE 3.5 model in GGUF format.
huggingface-cli download LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct-GGUF \ --include…