What does this model signal mean?

Tencent Hunyuan published tencent/Sequential-Hidden-Decoding-8B-n8-Instruct. This model signal is evidence of what shipped on model infrastructure and how the release is positioned. High-signal details: license other · 41 HF downloads · Tencent's 8B instruct model using sequential hidden decoding for efficient generation. onlylabs links this event to 1 captured evidence page and 6 related model signals.

Tencent Hunyuan Model: tencent/Sequential-Hidden-Decoding-8B-n8-Instruct

Captured source

Hugging Face/huggingface.co/tencent/Sequential-Hidden-Decoding-8B-n8-Instruct

tencent/Sequential-Hidden-Decoding-8B-n8-Instruct model card

published Mar 31, 2026 · seen Jun 6 · captured Jun 11 · http 200 · method plain · task text-generation · license other · params 13B · downloads 41 · likes 8

Sequential-Hidden-Decoding-8B-n8-Instruct

This is the instruction-tuned variant of Sequential Hidden Decoding 8B n=8, designed for conversational and instruction-following use cases.

Base model: Sequential-Hidden-Decoding-8B-n8
Underlying architecture: Qwen3-8B-Base
Scale: 8x
Context Length: 131072
Dtype: bfloat16

Key Idea

Sequential Hidden Decoding scales sequence length by preparing multiple Embedding matrices for the same token sequence, interleaving the results, and feeding the expanded sequence into the same Transformer. This model is the instruction-tuned release of the 8B n=8 variant.
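
The expansion step described above can be illustrated with a toy sketch in plain Python. It is not the model's implementation: the function name `expand_sequence`, the tiny lookup tables, and the token-major interleaving order (all n embeddings of one token placed next to each other) are assumptions made for illustration, since the model card does not spell out the exact ordering.

```python
# Toy sketch of the sequence expansion idea (pure Python, no ML framework).
# Assumption: the n embeddings of each token sit next to each other
# (token-major interleaving); the model card does not state the exact order.

def expand_sequence(token_ids, embedding_tables):
    """Interleave the lookups from several embedding tables.

    token_ids: list of int token ids, length T.
    embedding_tables: list of n tables, each indexable by token id.
    Returns a list of n * T embedding vectors.
    """
    expanded = []
    for token_id in token_ids:
        for table in embedding_tables:
            expanded.append(table[token_id])
    return expanded


# Hypothetical setup: vocabulary of 3 tokens, n = 2 tables, 2-dim vectors.
table_a = [[0.0, 0.1], [1.0, 1.1], [2.0, 2.1]]
table_b = [[10.0, 10.1], [11.0, 11.1], [12.0, 12.1]]

tokens = [2, 0]
expanded = expand_sequence(tokens, [table_a, table_b])
print(len(expanded))  # 4 positions for 2 tokens with n = 2
```

In the released model the expanded sequence would then be fed to the shared Transformer; that part is omitted here.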

Serving (SGLang)

This model requires a patched version of SGLang for inference. See the project page for installation options.

python -m sglang.launch_server \
  --model-path tencent/Sequential-Hidden-Decoding-8B-n8-Instruct \
  --trust-remote-code \
  --tp-size 1 \
  --port 30000 --host 0.0.0.0 \
  --chunked-prefill-size -1 \
  --attention-backend fa3 \
  --mem-fraction-static 0.82 \
  --max-running-requests 32 \
  --context-length 131072 \
  --cuda-graph-max-bs 128 \
  --cuda-graph-bs 1 2 4 8 16 32 64 128

> Note: Sequential Hidden Decoding models process n×-length sequences internally, so --chunked-prefill-size -1, --attention-backend fa3, and conservative batch sizing are important for stability and performance.
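
To make the n×-length point concrete, the short calculation below estimates how many internal positions a request occupies. The helper `expanded_positions` is hypothetical, and the card does not say whether `--context-length 131072` is counted before or after expansion, so treat the numbers as rough sizing arithmetic only.

```python
# Rough sizing arithmetic for an n-times expanded sequence.
# Assumption: every input token contributes exactly n internal positions.

def expanded_positions(num_tokens: int, n: int = 8) -> int:
    """Number of internal positions for num_tokens tokens at scale n."""
    return num_tokens * n


prompt_tokens = 4096
print(expanded_positions(prompt_tokens))  # 32768 internal positions at n = 8
```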

Chat Usage

This is an instruction-tuned model. Use the /v1/chat/completions endpoint:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="tencent/Sequential-Hidden-Decoding-8B-n8-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain the idea of hidden decoding in simple terms."},
    ],
    max_tokens=512,
    temperature=0.7,
)
print(response.choices[0].message.content)
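
If the `openai` package is not installed, the same endpoint can likely be reached with the standard library alone. The sketch below assumes the server follows the usual OpenAI-compatible chat completions request and response shape; the helper names `build_chat_request` and `send_chat_request` are hypothetical, and the host, port, and model name are taken from the launch command above.

```python
import json
import urllib.request

MODEL = "tencent/Sequential-Hidden-Decoding-8B-n8-Instruct"


def build_chat_request(prompt, base_url="http://localhost:30000/v1",
                       max_tokens=512, temperature=0.7):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": "Bearer EMPTY"},
        method="POST",
    )


def send_chat_request(request, timeout=120):
    """Send the request and return the assistant's reply text.

    Assumes an OpenAI-style response body with a `choices` list.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("Explain the idea of hidden decoding in simple terms."))` should return the reply as a string. Keeping request construction separate from sending makes the payload easy to inspect without a live server.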

Files

This repository includes the custom architecture files required by trust_remote_code:

configuration_qwen3_scale_seq.py
modeling_qwen3_scale_seq.py

Related Models

| Model | Type | Notes |
|-------|:----:|-------|
| Sequential-Hidden-Decoding-8B-n2 | Base | 2x scale base model |
| Sequential-Hidden-Decoding-8B-n4 | Base | 4x scale base model |
| Sequential-Hidden-Decoding-8B-n8 | Base | 8x scale base model |
| Sequential-Hidden-Decoding-8B-n8-Instruct | Instruct | Instruction-tuned 8x scale model |

Citation

@article{hidden_decoding_2026,
  title = {Hidden Decoding: Scaling Sequence Length in Pretraining},
  year  = {2026},
  url   = {https://welm.weixin.qq.com/posts/hidden_decoding/}
}

License

This model is released under the [License Terms of Sequential-Hidden-Decoding](LICENSE).

Notability

notability 5.0/10

Low-traction research model from Tencent.