XiaomiMiMo/MiMo-7B-MTPs
source ↗━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Unlocking the Reasoning Potential of Language Model From Pretraining to Posttraining
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
> This model repository is licensed under the MIT License.
I. Pretrained MTPs of MiMo-7B
This repository contains the pretrained multi-token prediction (MTP) weights of MiMo-7B (`model.mtp_layers.1` and `model.mtp_layers.2`).
Currently, each MiMo-7B model has 1 MTP layer (`model.mtp_layers.0`). Users may load the pretrained MTP weights from this repository for a potential rollout speedup (please refer to *Power Up Speculative Decoding In Reinforcement Learning*).
> [!IMPORTANT]
> We tuned 1 MTP layer in SFT and froze it in RL, and we HAVE NOT tested the performance of the posttrained models with the 2 additional pretrained MTP layers.
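
As a rough illustration of what loading the extra layers could involve, the sketch below merges tensors named under `model.mtp_layers.1` and `model.mtp_layers.2` into an existing state dict without touching `model.mtp_layers.0`. It is not the authors' loading procedure. It assumes that tensor names begin with `model.mtp_layers.<index>.`, and the helper names (`mtp_layer_index`, `merge_mtp_weights`) and the `proj.weight` suffixes are hypothetical. Plain strings stand in for tensors so the snippet runs without any dependencies.

```python
import re

# Assumption: tensor names start with "model.mtp_layers.<index>.", matching the
# layer names quoted above. The suffixes used in the demo below are made up.
MTP_KEY = re.compile(r"^model\.mtp_layers\.(\d+)\.")


def mtp_layer_index(key):
    """Return the MTP layer index encoded in a tensor name, or None."""
    match = MTP_KEY.match(key)
    return int(match.group(1)) if match else None


def merge_mtp_weights(base_state, mtp_state, layers=(1, 2)):
    """Return a copy of base_state with the selected extra MTP layers added.

    Existing entries in base_state (including model.mtp_layers.0) are never
    overwritten; a name collision raises an error instead.
    """
    merged = dict(base_state)
    for key, tensor in mtp_state.items():
        if mtp_layer_index(key) not in layers:
            continue  # skip anything that is not a requested MTP layer
        if key in merged:
            raise KeyError(f"{key} already present in the base checkpoint")
        merged[key] = tensor
    return merged


# Demo with placeholder values standing in for real tensors.
base = {
    "model.embed_tokens.weight": "E",
    "model.mtp_layers.0.proj.weight": "P0",  # hypothetical suffix
}
extra = {
    "model.mtp_layers.1.proj.weight": "P1",  # hypothetical suffix
    "model.mtp_layers.2.proj.weight": "P2",  # hypothetical suffix
}
merged = merge_mtp_weights(base, extra)
print(sorted(i for i in map(mtp_layer_index, merged) if i is not None))
```

With real checkpoints, the two dictionaries would typically come from the checkpoint files (for example via `safetensors.torch.load_file`, assuming the weights are stored as safetensors), and the inference engine would also need to be configured to use more than one MTP layer.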
II. Contact
Please contact us at [mimo@xiaomi.com](mailto:mimo@xiaomi.com) or open an issue if you have any questions.