Repo · LG AI Research (EXAONE) · published Dec 1, 2024

LG-AI-EXAONE/EXAONE-3.5

Description: Official repository for EXAONE 3.5 built by LG AI Research

License: NOASSERTION

Stars: 208

Forks: 23

Open issues: 7

Created: 2024-12-01T11:15:28Z

Pushed: 2024-12-16T08:19:52Z

Default branch: main

Fork: no

Archived: no

README:

EXAONE 3.5

🤗 Hugging Face | 📝 Blog | 📑 Technical Report

Introduction

We introduce EXAONE 3.5, a collection of instruction-tuned bilingual (English and Korean) generative models ranging from 2.4B to 32B parameters, developed and released by LG AI Research. EXAONE 3.5 language models include: 1) 2.4B model optimized for deployment on small or resource-constrained devices, 2) 7.8B model matching the size of its predecessor but offering improved performance, and 3) 32B model delivering powerful performance. All models support long-context processing of up to 32K tokens. Each model demonstrates state-of-the-art performance in real-world use cases and long-context understanding, while remaining competitive in general domains compared to recently released models of similar sizes.

Our documentation consists of the following sections:

  • [Performance](#performance): Experimental results of EXAONE 3.5 models.
  • [Quickstart](#quickstart): A basic guide to using EXAONE 3.5 models with Transformers.
  • [Quantized Models](#quantized-models): An explanation of quantized EXAONE 3.5 weights in AWQ and GGUF format.
  • [Run Locally](#run-locally): A guide to running EXAONE 3.5 models locally with llama.cpp and Ollama frameworks.
  • [Deployment](#deployment): A guide to running EXAONE 3.5 models with TensorRT-LLM, vLLM, and SGLang deployment frameworks.

News

  • 2024.12.11: EXAONE 3.5 is now available in the Ollama model library.

You can now install the AutoAWQ library via pip without using the git repository.

  • 2024.12.10: We updated the EXAONE Modelfile for Ollama. Please use the new one.
  • 2024.12.09: We released the EXAONE 3.5 language model series, including 2.4B, 7.8B, and 32B instruction-tuned models. Check out the 📑 Technical Report!

Performance

Some experimental results are shown below. The full evaluation results can be found in the Technical Report.

| Models | MT-Bench | LiveBench | Arena-Hard | AlpacaEval | IFEval | KoMT-Bench[1] | LogicKor |
|---|---|---|---|---|---|---|---|
| EXAONE 3.5 32B | 8.51 | 43.0 | 78.6 | 60.6 | 81.7 | 8.05 | 9.06 |
| Qwen 2.5 32B | 8.49 | 50.6 | 67.0 | 41.0 | 78.7 | 7.75 | 8.89 |
| C4AI Command R 32B | 7.38 | 29.7 | 17.0 | 25.9 | 26.1 | 6.72 | 8.24 |
| Gemma 2 27B | 8.28 | 40.0 | 57.5 | 52.2 | 59.7 | 7.19 | 8.56 |
| Yi 1.5 34B | 7.64 | 26.2 | 23.1 | 34.8 | 55.5 | 4.88 | 6.33 |
| EXAONE 3.5 7.8B | 8.29 | 39.8 | 68.7 | 54.2 | 78.9 | 7.96 | 9.08 |
| Qwen 2.5 7B | 6.48 | 35.6 | 48.9 | 31.7 | 72.5 | 5.19 | 6.38 |
| Llama 3.1 8B | 7.59 | 28.3 | 27.7 | 25.7 | 74.5 | 4.85 | 5.99 |
| Gemma 2 9B | 7.64 | 32.1 | 43.6 | 47.3 | 54.7 | 7.10 | 8.05 |
| Phi 3 small (7B) | 7.63 | 27.9 | 26.8 | 29.2 | 59.5 | 3.22 | 3.99 |
| EXAONE 3.5 2.4B | 7.81 | 33.0 | 48.2 | 37.1 | 73.6 | 7.24 | 8.51 |
| Qwen 2.5 3B | 7.21 | 25.7 | 26.4 | 17.4 | 60.8 | 5.68 | 5.21 |
| Qwen 2.5 1.5B | 5.72 | 19.2 | 10.6 | 8.4 | 40.7 | 3.87 | 3.60 |
| Llama 3.2 3B | 6.94 | 24.0 | 14.2 | 18.7 | 70.1 | 3.16 | 2.86 |
| Gemma 2 2B | 7.20 | 20.0 | 19.1 | 29.1 | 50.5 | 4.83 | 5.29 |

  • [1] KoMT-Bench is a dataset created by translating MT-Bench into Korean; see README for more details.

Quickstart

  • You need transformers>=4.43.0 to run the EXAONE 3.5 models. We recommend using the latest version.

The example code below shows how to use the EXAONE 3.5 models.

> [!Tip]
> In all examples below, you can use another model size by changing 7.8B to 32B or 2.4B.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct"

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Choose your prompt (the second assignment overrides the first; keep only one)
prompt = "Explain how wonderful you are"  # English example
prompt = "스스로를 자랑해 봐"  # Korean example

messages = [
    {"role": "system", "content": "You are EXAONE model from LG AI Research, a helpful assistant."},
    {"role": "user", "content": prompt}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

output = model.generate(
    input_ids.to(model.device),  # follow the device chosen by device_map="auto"
    eos_token_id=tokenizer.eos_token_id,
    max_new_tokens=128,
    do_sample=False,
)
print(tokenizer.decode(output[0]))

> [!Note]
> The EXAONE 3.5 instruction-tuned language models were trained to utilize the system prompt,
> so we highly recommend using the system prompts provided in the code snippet above.
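The Tip and Note above can be folded into a few small helpers. The sketch below is one possible arrangement, not part of the official repository: `model_id`, `build_messages`, and `generate_reply` are hypothetical names, and passing `return_dict=True` to `apply_chat_template` is a choice made here so that the attention mask travels with the input IDs. It also decodes only the newly generated tokens rather than the full sequence.

```python
SYSTEM_PROMPT = "You are EXAONE model from LG AI Research, a helpful assistant."
SIZES = ("2.4B", "7.8B", "32B")


def model_id(size: str = "7.8B") -> str:
    """Return the Hugging Face repo name for a given model size."""
    if size not in SIZES:
        raise ValueError(f"unknown size {size!r}; expected one of {SIZES}")
    return f"LGAI-EXAONE/EXAONE-3.5-{size}-Instruct"


def build_messages(prompt: str) -> list:
    """Pair the recommended system prompt with a user prompt."""
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": prompt},
    ]


def generate_reply(prompt: str, size: str = "7.8B", max_new_tokens: int = 128) -> str:
    """Load the model, run greedy decoding, and return only the new text."""
    # Imported lazily so the helpers above work without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = model_id(size)
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(name)
    inputs = tokenizer.apply_chat_template(
        build_messages(prompt),
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
        return_dict=True,  # assumption: return input_ids and attention_mask together
    ).to(model.device)
    output = model.generate(
        **inputs,
        eos_token_id=tokenizer.eos_token_id,
        max_new_tokens=max_new_tokens,
        do_sample=False,
    )
    # Skip the prompt tokens so only the model's answer is decoded.
    prompt_len = inputs["input_ids"].shape[-1]
    return tokenizer.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `generate_reply("Explain how wonderful you are", size="2.4B")` would download the 2.4B weights on first use, so it is best tried on a machine with enough disk space and GPU memory.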

Quantized Models

We provide quantized weights for the EXAONE 3.5 models in the formats described below.

AWQ

We provide AWQ-quantized weights of EXAONE 3.5 models, quantized using the AutoAWQ library. Please refer to the AutoAWQ documentation for more details.

You need to install the latest version of the AutoAWQ library (autoawq>=0.2.7.post3) to load the AWQ-quantized version of EXAONE 3.5 models.

pip install autoawq

You can load the AWQ model the same way as the original models; only the model name changes. The AWQ configuration stored with the model is applied automatically at load time. Please check the [Quickstart section](#quickstart) above for more details.
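As a rough illustration of "only the model name changes", the sketch below mirrors the Quickstart loading call. Two things in it are assumptions rather than facts from this README: the `-AWQ` repository suffix (modeled on the `-GGUF` naming used in the next section; check the EXAONE 3.5 collection for the exact name) and the use of `torch.float16` as the compute dtype. `awq_model_id` and `load_awq` are hypothetical helper names.

```python
def awq_model_id(size: str = "7.8B") -> str:
    """Build the assumed repo name of the AWQ-quantized model."""
    # Assumption: AWQ repos follow the same "-<FORMAT>" suffix pattern as GGUF.
    return f"LGAI-EXAONE/EXAONE-3.5-{size}-Instruct-AWQ"


def load_awq(size: str = "7.8B"):
    """Load the AWQ model and tokenizer; requires autoawq to be installed."""
    # Imported lazily so awq_model_id can be used without torch installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = awq_model_id(size)
    model = AutoModelForCausalLM.from_pretrained(
        name,
        torch_dtype=torch.float16,  # assumption: fp16 compute for AWQ kernels
        trust_remote_code=True,
        device_map="auto",
    )
    tokenizer = AutoTokenizer.from_pretrained(name)
    return model, tokenizer
```

The returned `model` and `tokenizer` can then be used with the same chat-template and `generate` calls shown in the Quickstart.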

GGUF

We provide weights in BF16 format and quantized weights in Q8_0, Q6_K, Q5_K_M, Q4_K_M, IQ4_XS.

The example below is for the 7.8B model in BF16 format. Please refer to the EXAONE 3.5 collection to find quantized models. You may need to install huggingface_hub to download the GGUF weights.

# (optional) install huggingface_hub
pip install huggingface_hub

# Download the GGUF weights
huggingface-cli download LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct-GGUF \
--include "EXAONE-3.5-7.8B-Instruct-BF16*.gguf" \
--local-dir .
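The same download can be scripted from Python with `huggingface_hub.snapshot_download`, which accepts glob patterns through `allow_patterns`. This is a sketch only: `gguf_pattern`, `select_files`, and `download_gguf` are hypothetical helpers, and the assumption that the quantized files follow the same naming as the BF16 file (for example a `Q4_K_M` suffix in place of `BF16`) should be checked against the repository's file list.

```python
from fnmatch import fnmatch

REPO_ID = "LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct-GGUF"


def gguf_pattern(quant: str = "BF16", size: str = "7.8B") -> str:
    """Glob pattern for one format, mirroring the --include flag above."""
    return f"EXAONE-3.5-{size}-Instruct-{quant}*.gguf"


def select_files(filenames, quant: str = "BF16"):
    """Preview which of the given filenames the pattern would match."""
    pattern = gguf_pattern(quant)
    return [name for name in filenames if fnmatch(name, pattern)]


def download_gguf(quant: str = "BF16", local_dir: str = ".") -> str:
    """Download only the matching GGUF files and return the local folder path."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=[gguf_pattern(quant)],
        local_dir=local_dir,
    )
```

For example, `download_gguf("Q4_K_M")` would fetch only the files matching that quantization, assuming the naming holds.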

Run Locally

For end users, we introduce two ways to run EXAONE 3.5 models locally.

> [!Note]
> We highly recommend using a repetition penalty not exceeding 1.0 for better generation quality.
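The note is written for local runtimes, but the same limit can be enforced in a script before any generation call. The sketch below targets the Transformers `generate` method, where `repetition_penalty=1.0` means no penalty is applied. Carrying the recommendation over to Transformers is an assumption, and `generation_kwargs` is a hypothetical helper.

```python
def generation_kwargs(max_new_tokens: int = 128, repetition_penalty: float = 1.0) -> dict:
    """Build keyword arguments for model.generate, rejecting penalties above 1.0."""
    if repetition_penalty > 1.0:
        raise ValueError("repetition_penalty should not exceed 1.0 for EXAONE 3.5")
    return {
        "max_new_tokens": max_new_tokens,
        "do_sample": False,  # greedy decoding, as in the Quickstart example
        "repetition_penalty": repetition_penalty,
    }
```

A call such as `model.generate(input_ids, **generation_kwargs())` would then fail early if a larger penalty were passed by mistake.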

llama.cpp

You can run EXAONE models with llama.cpp as follows:

1. Install llama.cpp. Please refer to the llama.cpp repository for more details.

2. Download the EXAONE 3.5 model in GGUF format.

huggingface-cli download LGAI-EXAONE/EXAONE-3.5-7.8B-Instruct-GGUF \
--include…

Excerpt shown — open the source for the full document.

Notability

notability 5.0/10

New model release, moderate stars.