In this guide on Mixture of Experts, you will learn:
- What MoE is and how it differs from traditional models
- Benefits of using it
- A step-by-step tutorial on how to implement it
Let’s dive in!
What is MoE?
An MoE (Mixture of Experts) is a machine learning architecture that combines multiple specialized sub-models—the “experts”—within a larger system. Each expert learns to handle different aspects of a task or distinct types of data.
A fundamental component in this architecture is the “gating network” or “router”. This component decides which expert, or combination of experts, should process a specific input. The gating network also assigns weights to each expert’s output. Weights are like scores, as they show how much influence each expert’s result should have.
In simple terms, the gating network uses weights to adjust each expert’s contribution to the final answer. To do so, it considers the input’s specific features. This allows the system to handle many types of data better than a single model could.
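To make the weighting idea concrete, here is a minimal sketch in plain Python. The scalar expert outputs and gating scores are made-up numbers for illustration; a real gating network would compute the scores from the input's features:

```python
import math

def softmax(scores):
    # Turn raw gating scores into positive weights that sum to 1
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_output(expert_outputs, gating_scores):
    # Blend the experts' outputs according to the gate's weights
    weights = softmax(gating_scores)
    return sum(w * o for w, o in zip(weights, expert_outputs))

# Made-up scalar outputs from three experts and gating scores for one input
expert_outputs = [2.0, -1.0, 0.5]
gating_scores = [3.0, 0.1, 1.2]
print(mixture_output(expert_outputs, gating_scores))
```

The expert with the highest gating score dominates the final result, which is exactly how the router adjusts each expert's contribution.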
Differences Between MoE and Traditional Dense Models
In the context of neural networks, a traditional dense model works in a different way compared to MoE. For any piece of information you feed into it, the dense model uses all of its internal parameters to perform calculations. Thus, every part of its computational machinery is engaged for every input.
The main point is that, in dense models, all parts are engaged for every task. This contrasts with MoE, which activates only relevant expert subsections.
Below are the key differences between MoE and dense models:
- Parameter usage:
- Dense model: For any given input, the model uses all the parameters in the computation.
- MoE model: For any given input, the model uses only the parameters of the selected expert(s) and the gating network. Thus, if an MoE model has a large number of parameters, it activates only a fraction of these parameters for any single computation.
- Computational cost:
- Dense model: The amount of computation for a dense layer is fixed for every input, as all its parts are always engaged.
- MoE model: The computational cost for processing an input through an MoE layer can be lower than a dense layer of comparable total parameter size. That is because only a subset of the model—the chosen experts—performs the work. This allows MoE models to scale to a much larger number of total parameters without a proportional increase in the computational cost for each individual input.
- Specialization and learning:
- Dense model: All parts of a dense layer learn to contribute to processing all types of inputs it encounters.
- MoE model: Different expert networks can learn to become specialized. For example, one expert might become good at processing questions about history, while another specializes in scientific concepts. The gating network learns to identify the type of input and route it to the most appropriate experts. This can lead to more nuanced and effective processing.
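The sparse activation described above is often implemented as top-k routing: the gate scores every expert, but only the k highest-scoring ones actually run. The toy experts and scores below are illustrative stand-ins, not a real MoE layer:

```python
def top_k_route(gating_scores, k=2):
    # Return the indices of the k highest-scoring experts
    ranked = sorted(range(len(gating_scores)), key=lambda i: gating_scores[i], reverse=True)
    return ranked[:k]

# Toy stand-ins for expert subnetworks: each is just a function of the input
experts = {
    0: lambda x: x + 1,
    1: lambda x: x * 2,
    2: lambda x: x - 3,
    3: lambda x: x ** 2,
}

gating_scores = [0.1, 2.5, 0.3, 1.8]  # as if produced by the gating network
active = top_k_route(gating_scores, k=2)

# Only the 2 selected experts run; the other 2 stay idle for this input
results = [experts[i](10) for i in active]
print(active, results)  # → [1, 3] [20, 100]
```

Only half of the "parameters" (here, half of the functions) are touched per input, which is why an MoE model's compute cost grows with k, not with the total number of experts.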
Benefits of the Mixture of Experts Architecture
The MoE architecture is highly relevant in modern AI, particularly when dealing with LLMs. The reason is that it offers a way to increase a model’s capacity, which is its ability to learn and store information, without a proportional increase in computational cost during use.
The main advantages of MoE in AI include:
- Reduced inference latency: MoE models can decrease the time required to generate a prediction or output—called inference latency. This happens thanks to their ability to activate only the most relevant experts.
- Enhanced training scalability and efficiency: You can take advantage of the parallelism in MoE architectures during the AI training process. Different experts can be trained concurrently on diverse data subsets or specialized tasks. This can lead to faster convergence and training time.
- Improved model modularity and maintainability: The discrete nature of expert subnetworks facilitates a modular approach to model development and maintenance. Individual experts can be independently updated, retrained, or replaced with improved versions without requiring a complete retraining of the entire model. This simplifies the integration of new knowledge or capabilities and allows for more targeted interventions if a specific expert’s performance degrades.
- Potential for increased interpretability: The specialization of experts may offer clearer insights into the model’s decision-making processes. Analyzing which experts are consistently activated for specific inputs can provide clues about how the model has learned to partition the problem space and attribute relevance. This characteristic offers a potential way for better understanding complex model behaviors compared to monolithic dense networks.
- Greater energy efficiency at scale: MoE-based models can achieve lower energy consumption per query compared to traditional dense models. That is due to the sparse activation of parameters during inference, as they use only a fraction of the available parameters per input.
How To Implement MoE: A Step-by-Step Guide
In this tutorial section, you will learn how to use MoE. In particular, you will use a dataset containing sports news. The MoE will leverage two experts based on the following models:
- `sshleifer/distilbart-cnn-6-6`: To summarize the content of each news item.
- `distilbert-base-uncased-finetuned-sst-2-english`: To calculate the sentiment of each news item. In sentiment analysis, “sentiment” refers to the emotional tone, opinion, or attitude expressed in a text. The output can be:
  - Positive: Expresses favorable opinions, happiness, or satisfaction.
  - Negative: Expresses unfavorable opinions, sadness, anger, or dissatisfaction.
  - Neutral: Expresses no strong emotion or opinion, often factual.
At the end of the process, each news item will be saved in a JSON file containing:
- The ID, the headline, and the URL.
- The summary of the content.
- The sentiment of the content with the confidence score.
The dataset containing the news can be retrieved using Bright Data’s Web Scraper APIs, specialized scraping endpoints to retrieve structured web data from 100+ domains in real-time.
The dataset containing the input JSON data can be generated using the code in our guide “Understanding Vector Databases: The Engine Behind Modern AI.” Specifically, refer to step 1 in the “Practical Integration: A Step-by-Step Guide” chapter.
The input JSON dataset—called `news-data.json`—contains an array of news items as below:
[
    {
        "id": "c787dk9923ro",
        "url": "https://www.bbc.com/sport/tennis/articles/c787dk9923ro",
        "author": "BBC",
        "headline": "Wimbledon plans to increase 'Henman Hill' capacity and accessibility",
        "topics": [
            "Tennis"
        ],
        "publication_date": "2025-04-03T11:28:36.326Z",
        "content": "Wimbledon is planning to renovate its iconic 'Henman Hill' and increase capacity for the tournament's 150th anniversary. Thousands of fans have watched action on a big screen from the grass slope which is open to supporters without show-court tickets. The proposed revamp - which has not yet been approved - would increase the hill's capacity by 20% in time for the 2027 event and increase accessibility. It is the latest change planned for the All England Club, after a 39-court expansion was approved last year. Advertisement \"It's all about enhancing this whole area, obviously it's become extremely popular but accessibility is difficult for everyone,\" said four-time Wimbledon semi-finalist Tim Henman, after whom the hill was named. \"We are always looking to enhance wherever we are on the estate. This is going to be an exciting project.\"",
        "videos": [],
        "images": [
            {
                "image_url": "https://ichef.bbci.co.uk/ace/branded_sport/1200/cpsprodpb/31f9/live/0f5b2090-106f-11f0-b72e-6314f702e779.jpg",
                "image_description": "Main image"
            },
            {
                "image_url": "https://ichef.bbci.co.uk/ace/standard/2560/cpsprodpb/31f9/live/0f5b2090-106f-11f0-b72e-6314f702e779.jpg",
                "image_description": "A render of planned improvements to Wimbledon's Henman Hill"
            }
        ],
        "related_articles": [
            {
                "article_title": "Live scores, results and order of play",
                "article_url": "https://www.bbc.com/sport/tennis/scores-and-schedule"
            },
            {
                "article_title": "Get tennis news sent straight to your phone",
                "article_url": "https://www.bbc.com/sport/articles/cl5q9dk9jl3o"
            }
        ],
        "keyword": null,
        "timestamp": "2025-05-19T15:03:16.568Z",
        "input": {
            "url": "https://www.bbc.com/sport/tennis/articles/c787dk9923ro",
            "keyword": ""
        }
    },
    // omitted for brevity...
]
Follow the instructions below and build your MoE example!
Prerequisites and Dependencies
To replicate this tutorial, you must have Python 3.10.1 or higher installed on your machine.
Suppose you call the main folder of your project moe_project/
. At the end of this step, the folder will have the following structure:
moe_project/
├── venv/
├── news-data.json
└── moe_analysis.py
Where:
- `venv/` contains the Python virtual environment.
- `news-data.json` is the input JSON file containing the news data you scraped with Web Scraper API.
- `moe_analysis.py` is the Python file that contains the coding logic.
You can create the `venv/` virtual environment directory like so:
python -m venv venv
To activate it, on Windows, run:
venv\Scripts\activate
Equivalently, on macOS and Linux, execute:
source venv/bin/activate
In the activated virtual environment, install the dependencies with:
pip install transformers torch
These libraries are:
- `transformers`: Hugging Face’s library for state-of-the-art machine learning models.
- `torch`: PyTorch, an open-source machine learning framework.
Step #1: Setup and Configuration
Initialize the `moe_analysis.py` file by importing the required libraries and setting up some constants:
import json
from transformers import pipeline
# Define the input JSON file
JSON_FILE = "news-data.json"
# Specify the model for generating summaries
SUMMARIZATION_MODEL = "sshleifer/distilbart-cnn-6-6"
# Specify the model for analyzing sentiment
SENTIMENT_MODEL = "distilbert-base-uncased-finetuned-sst-2-english"
This code defines:
- The name of the input JSON file with the scraped news.
- The models to use for the experts.
Perfect! You have what it takes to get started with MoE in Python.
Step #2: Define the News Summarization Expert
This step involves creating a class that encapsulates the functionality of the expert for summarizing the news:
class NewsSummarizationLLMExpert:
    def __init__(self, model_name=SUMMARIZATION_MODEL):
        self.model_name = model_name
        self.summarizer = None

        # Initialize the summarization pipeline
        self.summarizer = pipeline(
            "summarization",
            model=self.model_name,
            tokenizer=self.model_name,
        )

    def analyze(self, article_content, article_headline=""):
        # Call the summarizer pipeline with the article content
        summary_outputs = self.summarizer(
            article_content,
            max_length=300,
            min_length=30,
            do_sample=False
        )
        # Extract the summary text from the pipeline's output
        summary = summary_outputs[0]["summary_text"]

        return { "summary": summary }
The above code:
- Initializes the summarization pipeline with the `pipeline()` method from Hugging Face.
- Defines how the summarization expert processes an article with the `analyze()` method.
Good! You just created the first expert in the MoE architecture that takes care of summarizing the news.
Step #3: Define the Sentiment Analysis Expert
Similar to the summarization expert, define a specialized class for performing sentiment analysis on the news:
class SentimentAnalysisLLMExpert:
    def __init__(self, model_name=SENTIMENT_MODEL):
        self.model_name = model_name
        self.sentiment_analyzer = None

        # Initialize the sentiment analysis pipeline
        self.sentiment_analyzer = pipeline(
            "sentiment-analysis",
            model=self.model_name,
            tokenizer=self.model_name,
        )

    def analyze(self, article_content, article_headline=""):
        # Define the maximum number of characters to analyze
        max_chars_for_sentiment = 2000
        # Truncate the content if it exceeds the maximum limit
        truncated_content = article_content[:max_chars_for_sentiment]

        # Call the sentiment analyzer pipeline
        sentiment_outputs = self.sentiment_analyzer(truncated_content)

        # Extract the sentiment label
        label = sentiment_outputs[0]["label"]
        # Extract the sentiment score
        score = sentiment_outputs[0]["score"]

        return { "sentiment_label": label, "sentiment_score": score }
This snippet:
- Initializes the sentiment analysis pipeline with the `pipeline()` method.
- Defines the `analyze()` method to perform sentiment analysis. It also returns the sentiment label—negative or positive—and the confidence score.
Very well! You now have another expert that calculates and expresses the sentiment of the text in the news.
Step #4: Implement the Gating Network
Now, you have to define the logic behind the gating network to route the experts:
def route_to_experts(item_data, experts_registry):
    chosen_experts = []

    # Select the summarizer and sentiment analyzer
    chosen_experts.append(experts_registry["summarizer"])
    chosen_experts.append(experts_registry["sentiment_analyzer"])

    return chosen_experts
In this implementation, the gating network is simple. It always uses both experts for every news item, applying them sequentially:
- It summarizes the text.
- It calculates the sentiment.
Note: The gating network is quite simple in this example. Yet, achieving the same goal with a single, larger model would require significantly more computation. In contrast, the two experts are applied only to the tasks that are relevant to them. This makes it a simple yet effective application of the Mixture of Experts architecture.
In other scenarios, this part of the process could be improved by training an ML model to learn how and when to activate a specific expert. This would allow the gating network to respond dynamically.
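As an illustration of such a dynamic gate, the hypothetical `route_to_experts_dynamic` function below routes by a simple rule-based criterion instead of a trained model. The 500-character threshold and the string stand-ins for the expert instances are assumptions for the sketch, not part of the tutorial's script:

```python
def route_to_experts_dynamic(item_data, experts_registry):
    # Always run sentiment analysis
    chosen_experts = [experts_registry["sentiment_analyzer"]]

    content = item_data.get("content", "")
    # Only summarize articles long enough to benefit from a summary
    if len(content) > 500:
        chosen_experts.append(experts_registry["summarizer"])

    return chosen_experts

# String stand-ins for the expert instances, for demonstration only
registry = {"summarizer": "SUMMARIZER", "sentiment_analyzer": "SENTIMENT"}

short_item = {"content": "Brief sports update."}
long_item = {"content": "A long match report... " * 50}

print(route_to_experts_dynamic(short_item, registry))  # → ['SENTIMENT']
print(route_to_experts_dynamic(long_item, registry))   # → ['SENTIMENT', 'SUMMARIZER']
```

A trained gating network would replace the hand-written rule with a learned scoring function, but the interface—input in, list of active experts out—stays the same.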
Fantastic! The gating network logic is set up and ready to operate.
Step #5: Main Orchestration Logic for Processing News Data
Define the core function that manages the entire workflow, which consists of the following tasks:
- Load the JSON dataset.
- Initialize the two experts.
- Iterate through the news items.
- Route them to the chosen experts.
- Collect the results.
You can do it with the following code:
def process_news_json_with_moe(json_filepath):
    # Open and load news items from the JSON file
    with open(json_filepath, "r", encoding="utf-8") as f:
        news_items = json.load(f)

    # Create a dictionary to hold instances of expert classes
    experts_registry = {
        "summarizer": NewsSummarizationLLMExpert(),
        "sentiment_analyzer": SentimentAnalysisLLMExpert()
    }

    # List to store the analysis results
    all_results = []

    # Iterate through each news item in the loaded data
    for i, news_item in enumerate(news_items):
        print(f"\n--- Processing Article {i+1}/{len(news_items)} ---")

        # Extract relevant data from the news item
        id = news_item.get("id")
        headline = news_item.get("headline")
        content = news_item.get("content")
        url = news_item.get("url")

        # Print progress
        print(f"ID: {id}, Headline: {headline[:70]}...")

        # Use the gating network to determine the experts to use
        active_experts = route_to_experts(news_item, experts_registry)

        # Prepare a dictionary to store the analysis results
        news_item_analysis_results = {
            "id": id,
            "headline": headline,
            "url": url,
            "analyses": {}
        }

        # Iterate through the experts and apply their analysis
        for expert_instance in active_experts:
            expert_name = expert_instance.__class__.__name__  # Get the class name of the expert
            try:
                # Call the expert's analyze method
                analysis_result = expert_instance.analyze(article_content=content, article_headline=headline)
                # Store the result under the expert's name
                news_item_analysis_results["analyses"][expert_name] = analysis_result
            except Exception as e:
                # Handle any errors during analysis by a specific expert
                print(f"Error during analysis with {expert_name}: {e}")
                news_item_analysis_results["analyses"][expert_name] = { "error": str(e) }

        # Add the current item's results to the overall list
        all_results.append(news_item_analysis_results)

    return all_results
In this snippet:
- The `for` loop iterates over all the loaded news items.
- The `try-except` block performs the analysis and manages any errors that occur. Such errors are mainly due to the `max_length` and `max_chars_for_sentiment` parameters defined in the previous functions. Since not all the retrieved content has the same length, error management is fundamental for handling exceptions effectively.
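A hypothetical way to reduce these length-related failures is to cap the article text before it reaches an expert, just as `max_chars_for_sentiment` already does for sentiment. The `safe_truncate` helper and its 3000-character limit below are assumptions for illustration, not part of the original script:

```python
def safe_truncate(text, max_chars=3000):
    # Cap the text length before sending it to a model with a fixed input limit
    if len(text) <= max_chars:
        return text
    cut = text[:max_chars]
    # Prefer cutting at the last full sentence within the limit
    last_period = cut.rfind(". ")
    return cut[:last_period + 1] if last_period != -1 else cut

# A long synthetic article gets capped; short text passes through untouched
long_article = "First sentence. " * 300  # 4,800 characters
print(len(safe_truncate(long_article)))  # at most 3000, ending on a sentence boundary
print(safe_truncate("Short text."))
```

You could call such a helper on `content` before invoking `analyze()`, turning some of the caught exceptions into clean, truncated inputs instead.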
Here we go! You defined the orchestration function of the whole process.
Step #6: Launch the Processing Function
As a final part of the script, you have to execute the main processing function and then save the analyses to an output JSON file as follows:
# Call the main processing function with the input JSON file
final_analyses = process_news_json_with_moe(JSON_FILE)

print("\n\n--- MoE Analysis Complete ---")

# Write the final analysis results to a new JSON file
with open("analyzed_news_data.json", "w", encoding="utf-8") as f_out:
    json.dump(final_analyses, f_out, indent=4, ensure_ascii=False)
In the above code:
- The `final_analyses` variable stores the result of the function that processes the data with MoE.
- The analyzed data is stored in the `analyzed_news_data.json` output file.
Et voilà! The whole script is finalized, the data is analyzed, and saved.
Step #7: Put It All Together and Run the Code
Below is what the `moe_analysis.py` file should now contain:
import json
from transformers import pipeline

# Define the input JSON file
JSON_FILE = "news-data.json"

# Specify the model for generating summaries
SUMMARIZATION_MODEL = "sshleifer/distilbart-cnn-6-6"
# Specify the model for analyzing sentiment
SENTIMENT_MODEL = "distilbert-base-uncased-finetuned-sst-2-english"

# Define a class representing an expert for news summarization
class NewsSummarizationLLMExpert:
    def __init__(self, model_name=SUMMARIZATION_MODEL):
        self.model_name = model_name
        self.summarizer = None

        # Initialize the summarization pipeline
        self.summarizer = pipeline(
            "summarization",
            model=self.model_name,
            tokenizer=self.model_name,
        )

    def analyze(self, article_content, article_headline=""):
        # Call the summarizer pipeline with the article content
        summary_outputs = self.summarizer(
            article_content,
            max_length=300,
            min_length=30,
            do_sample=False
        )
        # Extract the summary text from the pipeline's output
        summary = summary_outputs[0]["summary_text"]

        return { "summary": summary }

# Define a class representing an expert for sentiment analysis
class SentimentAnalysisLLMExpert:
    def __init__(self, model_name=SENTIMENT_MODEL):
        self.model_name = model_name
        self.sentiment_analyzer = None

        # Initialize the sentiment analysis pipeline
        self.sentiment_analyzer = pipeline(
            "sentiment-analysis",
            model=self.model_name,
            tokenizer=self.model_name,
        )

    def analyze(self, article_content, article_headline=""):
        # Define the maximum number of characters to analyze
        max_chars_for_sentiment = 2000
        # Truncate the content if it exceeds the maximum limit
        truncated_content = article_content[:max_chars_for_sentiment]

        # Call the sentiment analyzer pipeline
        sentiment_outputs = self.sentiment_analyzer(truncated_content)

        # Extract the sentiment label
        label = sentiment_outputs[0]["label"]
        # Extract the sentiment score
        score = sentiment_outputs[0]["score"]

        return { "sentiment_label": label, "sentiment_score": score }

# Define a gating network
def route_to_experts(item_data, experts_registry):
    chosen_experts = []

    # Select the summarizer and sentiment analyzer
    chosen_experts.append(experts_registry["summarizer"])
    chosen_experts.append(experts_registry["sentiment_analyzer"])

    return chosen_experts

# Main function to manage the orchestration process
def process_news_json_with_moe(json_filepath):
    # Open and load news items from the JSON file
    with open(json_filepath, "r", encoding="utf-8") as f:
        news_items = json.load(f)

    # Create a dictionary to hold instances of expert classes
    experts_registry = {
        "summarizer": NewsSummarizationLLMExpert(),
        "sentiment_analyzer": SentimentAnalysisLLMExpert()
    }

    # List to store the analysis results
    all_results = []

    # Iterate through each news item in the loaded data
    for i, news_item in enumerate(news_items):
        print(f"\n--- Processing Article {i+1}/{len(news_items)} ---")

        # Extract relevant data from the news item
        id = news_item.get("id")
        headline = news_item.get("headline")
        content = news_item.get("content")
        url = news_item.get("url")

        # Print progress
        print(f"ID: {id}, Headline: {headline[:70]}...")

        # Use the gating network to determine the experts to use
        active_experts = route_to_experts(news_item, experts_registry)

        # Prepare a dictionary to store the analysis results
        news_item_analysis_results = {
            "id": id,
            "headline": headline,
            "url": url,
            "analyses": {}
        }

        # Iterate through the experts and apply their analysis
        for expert_instance in active_experts:
            expert_name = expert_instance.__class__.__name__  # Get the class name of the expert
            try:
                # Call the expert's analyze method
                analysis_result = expert_instance.analyze(article_content=content, article_headline=headline)
                # Store the result under the expert's name
                news_item_analysis_results["analyses"][expert_name] = analysis_result
            except Exception as e:
                # Handle any errors during analysis by a specific expert
                print(f"Error during analysis with {expert_name}: {e}")
                news_item_analysis_results["analyses"][expert_name] = { "error": str(e) }

        # Add the current item's results to the overall list
        all_results.append(news_item_analysis_results)

    return all_results

# Call the main processing function with the input JSON file
final_analyses = process_news_json_with_moe(JSON_FILE)

print("\n\n--- MoE Analysis Complete ---")

# Write the final analysis results to a new JSON file
with open("analyzed_news_data.json", "w", encoding="utf-8") as f_out:
    json.dump(final_analyses, f_out, indent=4, ensure_ascii=False)
Great! In around 130 lines of code, you have just completed your first MoE project.
Run the code with the following command:
python moe_analysis.py
The output in the terminal should contain:
# Omitted for brevity...
--- Processing Article 6/10 ---
ID: cdrgdm4ye53o, Headline: Japanese Grand Prix: Lewis Hamilton says he has 'absolute 100% faith' ...
--- Processing Article 7/10 ---
ID: czed4jk7eeeo, Headline: F1 engines: A return to V10 or hybrid - what's the future?...
Error during analysis with NewsSummarizationLLMExpert: index out of range in self
--- Processing Article 8/10 ---
ID: cy700xne614o, Headline: Monte Carlo Masters: Novak Djokovic beaten as wait for 100th title con...
Error during analysis with NewsSummarizationLLMExpert: index out of range in self
# Omitted for brevity...
--- MoE Analysis Complete ---
When the execution completes, an `analyzed_news_data.json` output file will appear in the project folder. Open it, and focus on one of the news items. The `analyses` field will contain the summary and sentiment analysis results produced by the two experts.
As you can see, the MoE approach has:
- Summarized the content of the article and reported it under `summary`.
- Detected a positive sentiment with a 0.99 confidence score.
Mission complete!
Conclusion
In this article, you learned about MoE and how to implement it in a real-world scenario through a step-by-step tutorial.
If you want to explore more MoE scenarios and you need some fresh data to do so, Bright Data offers a suite of powerful tools and services designed to retrieve updated, real-time data from web pages while overcoming scraping obstacles.
These solutions include:
- Web Unlocker: An API that bypasses anti-scraping protections and delivers clean HTML from any webpage with minimal effort.
- Scraping Browser: A cloud-based, controllable browser with JavaScript rendering. It automatically handles CAPTCHAs, browser fingerprinting, retries, and more for you.
- Web Scraper APIs: Endpoints for programmatic access to structured web data from dozens of popular domains.
For other machine learning scenarios, also explore our AI hub.
Sign up for Bright Data now and start your free trial to test our scraping solutions!