Bringing MegaBlocks to Databricks

Caio Moreno
4 min read · Apr 11, 2024

At Databricks, we’re committed to building the most efficient and performant training tools for large-scale AI models. With the recent release of DBRX, we’ve highlighted the power of Mixture-of-Experts (MoE) models, which provide a substantial improvement in training and inference efficiency. We’re excited to announce that MegaBlocks, the open-source library used to train DBRX, is becoming an official Databricks project. We are also releasing our MegaBlocks integration into our open source training stack, LLM Foundry. Along with these open source releases, we’re onboarding customers who are ready to get peak performance at scale onto our optimized internal versions.

What is a Mixture of Experts Model?

A Mixture of Experts (MoE) model is a machine learning model that combines the outputs of multiple expert networks, or “experts,” to make a prediction. Each expert specializes in a specific region of the input space, and a gating network determines how to combine the experts’ outputs for a given input.
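To make that concrete, here is a minimal sketch in PyTorch (not the MegaBlocks implementation): a handful of linear “experts” and a linear gating network whose softmax weights blend the experts’ outputs into a single prediction. All names and sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch: 4 experts and a gating network combine into one prediction.
d_model, num_experts = 16, 4
experts = nn.ModuleList([nn.Linear(d_model, d_model) for _ in range(num_experts)])
gate = nn.Linear(d_model, num_experts)  # gating network

x = torch.randn(d_model)                        # a single input vector
weights = torch.softmax(gate(x), dim=-1)        # how much each expert contributes
outputs = torch.stack([e(x) for e in experts])  # (num_experts, d_model)
prediction = (weights.unsqueeze(-1) * outputs).sum(dim=0)
```

In practice each expert is a full feed-forward block rather than a single linear layer, but the weighted combination works the same way.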

In the context of transformer networks, each feed-forward block can be replaced with an MoE layer. This layer consists of multiple expert networks, each with its own set of parameters, and a gating network that determines how to weight the outputs of the experts for each input token. The gating network is typically a linear feed-forward network that takes each token as input and produces a set of weights as output. A token assignment algorithm uses these weights to choose which experts process each token.
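Below is a hedged sketch of how such an MoE layer might look in plain PyTorch with top-k token assignment: the gating network scores each token, the top-k experts per token are selected, and their outputs are combined using the normalized gate weights. This is an illustrative dense loop over experts, not MegaBlocks’ dropless, block-sparse implementation; the class name, dimensions, and `top_k` value are assumptions for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    """Minimal MoE feed-forward block with top-k routing (illustrative only)."""

    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ])
        self.gate = nn.Linear(d_model, num_experts)  # gating network
        self.top_k = top_k

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (num_tokens, d_model)
        scores = self.gate(tokens)                        # (T, E) gating scores
        top_w, top_idx = scores.topk(self.top_k, dim=-1)  # pick k experts per token
        top_w = F.softmax(top_w, dim=-1)                  # normalize chosen weights
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            # Which tokens (and which of their k slots) were routed to expert e
            token_ids, slot = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel():
                out[token_ids] += top_w[token_ids, slot, None] * expert(tokens[token_ids])
        return out

layer = SimpleMoELayer(d_model=64, d_ff=256, num_experts=8, top_k=2)
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because only the selected experts run for each token, an MoE layer can grow total parameter count without a proportional increase in per-token compute, which is the efficiency gain the post describes.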

Written by Caio Moreno

Solutions Architect @databricks | Professor | PhD | Ex-Microsoft | Ex-Avanade/Accenture | Ex-Pentaho/Hitachi | Ex-AOL | Ex-IT4biz CEO. (Opinions are my own)
