AI2’s new model aims to be open and powerful yet cost-effective




The Allen Institute for AI (AI2) released a new open-source model that it hopes will answer the need for a large language model (LLM) that is both a strong performer and cost-effective. 

The new model, which AI2 calls OLMoE, leverages a sparse mixture-of-experts (MoE) architecture. It has 7 billion parameters but activates only 1 billion parameters per input token. It comes in two versions: OLMoE-1B-7B, which is more general purpose, and OLMoE-1B-7B-Instruct, an instruction-tuned variant. 
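
For readers who want to experiment, AI2 distributes its model releases through Hugging Face, so a standard transformers workflow should apply. The repository name below is an assumption for illustration and should be checked against AI2's model cards.

```python
# Minimal sketch: loading an OLMoE checkpoint with Hugging Face transformers.
# The repository name is assumed for illustration; confirm it against AI2's
# model cards, and use a transformers version recent enough to support OLMoE.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "allenai/OLMoE-1B-7B-0924-Instruct"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Explain mixture-of-experts language models in one sentence."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```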

AI2 emphasized that OLMoE is fully open source, unlike most other mixture-of-experts models.

“Most MoE models, however, are closed source: while some have publicly released model weights, they offer limited to no information about their training data, code, or recipes,” AI2 said in its paper. “The lack of open resources and findings about these details prevents the field from building cost-efficient open MoEs that approach the capabilities of closed-source frontier models.”

This makes most MoE models inaccessible to many academics and other researchers. 

Nathan Lambert, an AI2 research scientist, posted on X (formerly Twitter) that OLMoE will “help policy…this can be a starting point as academic H100 clusters come online.”

Lambert added that the models are part of AI2’s goal of making open-source models that perform as well as closed ones. 

“We haven’t changed our organization or goals at all since our first OLMo models. We’re just slowly making our open-source infrastructure and data better. You can use this too. We released an actual state-of-the-art model fully, not just one that is best on one or two evaluations,” he said. 

How OLMoE is built

AI2 said it opted for fine-grained routing when designing OLMoE, using 64 small experts of which only eight are activated at a time. Its experiments showed the model performs as well as other models but with significantly lower inference costs and memory footprint.  
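
To make the fine-grained routing idea concrete, the sketch below shows top-k expert routing in the spirit AI2 describes: 64 small experts with only eight activated per token. It is an illustrative simplification, not AI2's implementation; the layer sizes, the softmax-then-top-k gating and the renormalized weighting are assumptions.

```python
# Illustrative sparse MoE routing: 64 small experts, top-8 active per token.
# A simplified sketch, not AI2's actual OLMoE code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseMoELayer(nn.Module):
    def __init__(self, d_model=1024, d_expert=512, n_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # per-token routing scores
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_expert),
                nn.GELU(),
                nn.Linear(d_expert, d_model),
            )
            for _ in range(n_experts)
        )

    def forward(self, x):  # x: (tokens, d_model)
        probs = F.softmax(self.router(x), dim=-1)           # routing probabilities
        weights, idx = probs.topk(self.top_k, dim=-1)       # keep only the top 8 experts
        weights = weights / weights.sum(-1, keepdim=True)   # renormalize over the active set
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e                    # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot].unsqueeze(-1) * self.experts[int(e)](x[mask])
        return out


tokens = torch.randn(4, 1024)          # 4 tokens with a 1,024-dim hidden state
print(SparseMoELayer()(tokens).shape)  # torch.Size([4, 1024])
```

Because only eight of the 64 expert feed-forward blocks run for any given token, per-token compute scales with the active experts rather than the full parameter count, which is the source of the lower inference cost AI2 cites.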

OLMoE builds on AI2’s previous open-source model OLMo 1.7-7B, which supported a context window of 4,096 tokens and was trained on Dolma 1.7, the dataset AI2 developed for OLMo. OLMoE was trained on a mix of data from DCLM and Dolma, which included a filtered subset of Common Crawl, Dolma CC, RefinedWeb, StarCoder, C4, Stack Exchange, OpenWebMath, Project Gutenberg, Wikipedia and others. 

AI2 said OLMoE “outperforms all available models with similar active parameters, even surpassing larger ones like Llama2-13B-Chat and DeepSeekMoE-16B.” In benchmark tests, OLMoE-1B-7B often performed close to models with 7 billion parameters or more, such as Mistral-7B, Llama 3.1 8B and Gemma 2. In benchmarks against models with 1 billion parameters, however, OLMoE-1B-7B smoked other open-source models like Pythia, TinyLlama and even AI2’s own OLMo. 

Chart from AI2 on OLMoE-1B-7B's performance

Open-sourcing mixture of experts

One of AI2’s goals is to provide more fully open-source AI models to researchers, including MoE models, an architecture that is fast becoming popular among developers. 

Many AI developers have adopted the MoE architecture to build models. For example, Mistral’s Mixtral 8x22B used a sparse MoE system, as did Grok, the AI model from xAI, while rumors persist that GPT-4 also tapped MoE.  

But AI2 contends that few of these models are fully open, offering little information about their training data or source code. 

“This comes despite MoEs requiring more openness as they add complex new design questions to LMs, such as how many total versus active parameters to use, whether to use many small or few large experts, if experts should be shared, and what routing algorithm to use,” the company said. 
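
To see why these choices matter, a quick back-of-the-envelope comparison (with made-up dimensions, not OLMoE's real ones) shows how expert count and top-k split total parameters from active parameters:

```python
# Back-of-the-envelope comparison of two MoE layouts. All dimensions here are
# invented for illustration; they are not OLMoE's actual sizes.
def ffn_params(d_model, d_hidden):
    return 2 * d_model * d_hidden  # up-projection + down-projection weight matrices


d_model = 1024
layouts = {
    "64 small experts, top-8": {"n_experts": 64, "top_k": 8, "d_hidden": 512},
    "8 large experts, top-2": {"n_experts": 8, "top_k": 2, "d_hidden": 4096},
}

for name, cfg in layouts.items():
    per_expert = ffn_params(d_model, cfg["d_hidden"])
    total = cfg["n_experts"] * per_expert
    active = cfg["top_k"] * per_expert
    print(f"{name}: {total / 1e6:.1f}M total FFN params, {active / 1e6:.1f}M active per token")
```

In this toy comparison both layouts hold the same total feed-forward parameters, yet the fine-grained layout activates roughly half as many per token, which is exactly the kind of trade-off AI2 argues should be documented in the open.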

The Open Source Initiative, which defines what makes something open source and promotes it, has begun tackling what open source means for AI models. 


