Usage
Installation
Install LangMARL:
$ pip install langmarl
For environment-specific dependencies:
$ pip install "langmarl[pettingzoo]"  # Pistonball / PettingZoo support
$ pip install "langmarl[all]"         # All optional dependencies
Set up your LLM API key as an environment variable:
$ export OPENAI_API_KEY="your-api-key"
For other providers, set the corresponding key:
$ export GOOGLE_API_KEY="your-key" # Google Gemini
$ export TOGETHER_API_KEY="your-key" # Together (Llama, Qwen)
$ export DEEPSEEK_API_KEY="your-key" # DeepSeek
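Before starting a run, it can help to confirm which of these keys are visible to the Python process. The following is a minimal sketch using only the standard library; the `PROVIDER_KEYS` mapping and the `configured_providers` helper are illustrative names, not part of LangMARL.

```python
import os

# Variable names are taken from the export commands above.
# The provider labels are for display only (illustrative, not a LangMARL API).
PROVIDER_KEYS = {
    "OpenAI": "OPENAI_API_KEY",
    "Google Gemini": "GOOGLE_API_KEY",
    "Together": "TOGETHER_API_KEY",
    "DeepSeek": "DEEPSEEK_API_KEY",
}


def configured_providers(environ=None):
    """Return the providers whose API key variable is set and non-empty."""
    env = os.environ if environ is None else environ
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


if __name__ == "__main__":
    found = configured_providers()
    print("Configured providers:", ", ".join(found) if found else "none")
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check in isolation.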
Quick Start
One-line training
Train from a JSON config file with a single call:
import langmarl
langmarl.train("configs/language_task/qa_central_credit.json")
Programmatic usage
For full control over the training pipeline:
import langmarl

# 1. Configure
config = langmarl.LanguageTaskConfig(
    task_type="qa",
    paradigm="central_credit",
    llm=langmarl.LLMConfig.from_preset("gpt-4o-mini"),
    num_agents=2,
    num_iterations=5,
    trajectories_per_iteration=10,
)

# 2. Create components
env = langmarl.make_env("language", config)
critic = langmarl.CentralizedCritic(config)
optimizer = langmarl.PolicyGradientOptimizer(config.get_optimizer_llm())

# 3. Train
trainer = langmarl.MonteCarloTrainer(
    config=config,
    env=env,
    critic=critic,
    optimizer=optimizer,
)
metrics = trainer.train()
Custom environment
Register a custom environment by subclassing BaseEnvironment:
import langmarl


@langmarl.register_env("my_env")
class MyEnv(langmarl.BaseEnvironment):
    def __init__(self, config):
        self.num_agents = config.num_agents
        self.llm_client = langmarl.LLMClient(config.get_actor_llm())

    def reset(self, task: dict) -> dict:
        return {"task": task}

    def step(self, agent_id: str, action: str):
        return {}, 0.0, False, {}

    def sample_tasks(self, num_samples: int) -> list[dict]:
        # Build a fresh dict per sample so the tasks do not alias one object
        return [{"question": "What is 2+2?", "ground_truth": "4"} for _ in range(num_samples)]

    def collect_trajectory(self, policies, task) -> langmarl.Trajectory:
        steps = []
        # Query each agent in turn with its current policy prompt
        for agent, policy in policies.items():
            response = self.llm_client.chat(policy, task["question"])
            steps.append({"agent_id": agent, "input": task["question"], "output": response})
        # Reward 1.0 if the last agent's output contains the ground truth
        reward = 1.0 if task["ground_truth"] in steps[-1]["output"] else 0.0
        return langmarl.Trajectory(task=task, steps=steps, reward=reward)


# Use the custom environment
config = langmarl.BaseConfig(
    paradigm="central_credit",
    llm=langmarl.LLMConfig.from_preset("gpt-4o-mini"),
)
env = langmarl.make_env("my_env", config)
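For readers unfamiliar with decorator-based registration, the sketch below shows how a name-to-class registry of this kind generally works. It is a standalone illustration: `_ENV_REGISTRY`, `register_env`, `make_env`, and `ToyEnv` are defined here for the example and are assumptions about the pattern, not LangMARL's actual internals.

```python
# Illustrative registry; these names are hypothetical, not LangMARL internals.
_ENV_REGISTRY = {}


def register_env(name):
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls):
        if name in _ENV_REGISTRY:
            raise ValueError(f"environment {name!r} is already registered")
        _ENV_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


def make_env(name, config):
    """Look up a registered class by name and instantiate it with `config`."""
    try:
        cls = _ENV_REGISTRY[name]
    except KeyError:
        known = sorted(_ENV_REGISTRY)
        raise KeyError(f"unknown environment {name!r}; registered: {known}") from None
    return cls(config)


@register_env("toy_env")
class ToyEnv:
    def __init__(self, config):
        # A plain dict stands in for a config object in this sketch
        self.num_agents = config["num_agents"]


env = make_env("toy_env", {"num_agents": 2})
```

Registering at class-definition time means the module that defines the environment must be imported before the name is looked up.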
Using callbacks
Add callbacks to hook into the training loop:
trainer = langmarl.MonteCarloTrainer(
    config=config,
    env=env,
    critic=critic,
    optimizer=optimizer,
    callbacks=[
        langmarl.LoggingCallback(),
        langmarl.EarlyStoppingCallback(patience=3, min_delta=0.01),
    ],
)
trainer.train()
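The `patience` and `min_delta` arguments follow the usual early-stopping convention. The sketch below shows that convention in isolation, assuming a higher-is-better metric and assuming that an improvement smaller than `min_delta` counts as no improvement; the `EarlyStopping` class here is a hypothetical stand-in, and the real `EarlyStoppingCallback` may differ in which metric it tracks and how it compares values.

```python
class EarlyStopping:
    """Stop once the metric has failed to improve by more than `min_delta`
    for `patience` consecutive iterations (illustrative, not LangMARL code)."""

    def __init__(self, patience=3, min_delta=0.01):
        self.patience = patience
        self.min_delta = min_delta
        self.best = None   # best metric value seen so far
        self.stale = 0     # consecutive iterations without sufficient improvement

    def update(self, value):
        """Record one iteration's metric; return True when training should stop."""
        if self.best is None or value > self.best + self.min_delta:
            self.best = value
            self.stale = 0
        else:
            self.stale += 1
        return self.stale >= self.patience


stopper = EarlyStopping(patience=3, min_delta=0.01)
rewards = [0.40, 0.55, 0.555, 0.556, 0.557, 0.70]  # made-up values for the demo
stopped_at = None
for iteration, reward in enumerate(rewards):
    if stopper.update(reward):
        stopped_at = iteration
        break
```

In this made-up sequence the three values after 0.55 each improve by less than `min_delta`, so the loop stops before reaching the final 0.70.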
Using different LLMs per role
Assign different models to actors, critics, and optimizers:
config = langmarl.LanguageTaskConfig(
    task_type="qa",
    paradigm="central_credit",
    actor_llm=langmarl.LLMConfig.from_preset("gpt-4o-mini"),      # cheap for actors
    critic_llm=langmarl.LLMConfig.from_preset("gpt-4o"),          # strong for critic
    optimizer_llm=langmarl.LLMConfig.from_preset("gpt-4o-mini"),  # cheap for optimizer
    num_agents=2,
    num_iterations=5,
)
If only llm is set, it is used as the fallback for all three roles.
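The fallback rule can be pictured as follows. This is a minimal sketch with a hypothetical `RoleLLMs` dataclass that stores preset names as strings; it assumes a role-specific field, when set, takes precedence over `llm`, and it is not the library's config class.

```python
from dataclasses import dataclass
from typing import Optional

ROLES = ("actor", "critic", "optimizer")


@dataclass
class RoleLLMs:
    """Hypothetical stand-in for the config's LLM fields (values are preset names)."""
    llm: Optional[str] = None
    actor_llm: Optional[str] = None
    critic_llm: Optional[str] = None
    optimizer_llm: Optional[str] = None

    def resolve(self, role):
        """Return the role-specific model if set, otherwise the shared `llm`."""
        if role not in ROLES:
            raise ValueError(f"unknown role {role!r}; expected one of {ROLES}")
        specific = getattr(self, f"{role}_llm")
        chosen = specific if specific is not None else self.llm
        if chosen is None:
            raise ValueError(f"no LLM configured for role {role!r}")
        return chosen


shared = RoleLLMs(llm="gpt-4o-mini")
mixed = RoleLLMs(llm="gpt-4o-mini", critic_llm="gpt-4o")
resolved = {role: mixed.resolve(role) for role in ROLES}
```

With `mixed`, only the critic resolves to the stronger model; the actor and optimizer fall back to the shared one.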
Configuration
JSON config files
Training runs can be configured via JSON files:
{
  "exp_name": "qa_central_credit",
  "paradigm": "central_credit",
  "task_type": "qa",
  "benchmark_path": "env/lang_benchmark/HotPotQA",
  "num_agents": 2,
  "num_iterations": 5,
  "trajectories_per_iteration": 10,
  "llm": "gpt-4o-mini",
  "log_level": "INFO"
}
The llm field accepts either a predefined model name (string) or a full LLM config object:
{
  "llm": {
    "name": "my-model",
    "model_string": "Qwen/Qwen2.5-72B-Instruct",
    "base_url": "https://api.together.xyz/v1",
    "api_key_env_var": "TOGETHER_API_KEY"
  }
}
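When inspecting or validating config files outside of LangMARL, the two accepted forms of `llm` can be told apart by their JSON type. The helper below is a sketch using only the standard library; `describe_llm_field` is a hypothetical name, and the library's own loader may validate more fields than shown here.

```python
import json


def describe_llm_field(config_text):
    """Classify the `llm` field of a JSON config as a preset name or a full object."""
    config = json.loads(config_text)
    llm = config["llm"]
    if isinstance(llm, str):
        # String form: a predefined model name
        return {"kind": "preset", "name": llm}
    if isinstance(llm, dict):
        # Object form: a full LLM config with its own endpoint settings
        return {
            "kind": "custom",
            "name": llm.get("name"),
            "model_string": llm.get("model_string"),
            "base_url": llm.get("base_url"),
        }
    raise TypeError(f"`llm` must be a string or an object, got {type(llm).__name__}")


preset = describe_llm_field('{"llm": "gpt-4o-mini"}')
custom = describe_llm_field(
    '{"llm": {"name": "my-model", "model_string": "Qwen/Qwen2.5-72B-Instruct",'
    ' "base_url": "https://api.together.xyz/v1", "api_key_env_var": "TOGETHER_API_KEY"}}'
)
```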
Supported Models
All providers use the OpenAI-compatible API format. Predefined models:
| Name | Model String | Provider |
|---|---|---|
| | gpt-4o | OpenAI |
| | gpt-4o-mini | OpenAI |
| | gpt-5 | OpenAI |
| | o1 | OpenAI |
| | o1-mini | OpenAI |
| | gemini-1.5-pro | Google |
| | gemini-1.5-flash | Google |
| | gemini-2.0-flash | Google |
| | meta-llama/llama-3.1-70b-instruct | Together |
| | meta-llama/llama-3.1-8b-instruct | Together |
| | meta-llama/Llama-3.3-70B-Instruct-Turbo | Together |
| | Qwen/Qwen2.5-72B-Instruct-Turbo | Together |
| | Qwen/Qwen2.5-7B-Instruct-Turbo | Together |
| | Qwen/Qwen2.5-Coder-32B-Instruct | Together |
| | deepseek-chat | DeepSeek |
| | deepseek-reasoner | DeepSeek |
| | llama3 | Local Ollama |
| | qwen2 | Local Ollama |
You can also create a custom LLMConfig for any OpenAI-compatible endpoint:
custom_llm = langmarl.LLMConfig(
    name="my-custom-model",
    model_string="my-model-id",
    base_url="http://localhost:8000/v1",
    api_key="my-key",
    max_tokens=4096,
    input_price_per_million=0.0,
    output_price_per_million=0.0,
)
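The two price fields suggest per-token cost accounting. Assuming they are prices per one million input and output tokens respectively, a run's cost would be computed as in the sketch below. The `estimate_cost` function and the prices in the example are hypothetical, chosen only to show the arithmetic.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_million, output_price_per_million):
    """Cost = tokens * price-per-million / 1,000,000, summed over input and output."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (input_tokens * input_price_per_million
             + output_tokens * output_price_per_million)
    return total / 1_000_000


# Made-up prices: 0.15 per million input tokens, 0.60 per million output tokens
cost = estimate_cost(
    input_tokens=2_000_000,
    output_tokens=500_000,
    input_price_per_million=0.15,
    output_price_per_million=0.60,
)

# A local endpoint configured with both prices at 0.0 costs nothing
local_cost = estimate_cost(2_000_000, 500_000, 0.0, 0.0)
```

In the first call the input and output sides each contribute about 0.30, for a total of about 0.60 in whatever currency the prices are quoted in.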
Output Structure
Training runs produce the following directory structure:
experiments/
+-- {exp_name}_{timestamp}/
    +-- config.json
    +-- run.log
    +-- metrics.json
    +-- trajectories/
    |   +-- iter_0/
    |   |   +-- episode_0.json
    |   |   +-- episode_1.json
    |   +-- iter_1/
    +-- checkpoints/
    |   +-- iter_0/
    |   |   +-- agent_1.txt
    |   |   +-- agent_2.txt
    |   |   +-- metadata.json
    |   +-- iter_1/
    +-- gradients/
    |   +-- iter_0/
    |       +-- agent_1_gradients.json
    |       +-- agent_1_aggregated.txt
    +-- evaluations/
        +-- iter_0/
            +-- episode_0_eval.txt
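Given this layout, a finished run can be inspected with the standard library alone. The sketch below locates the most recently modified run directory and loads its top-level JSON files; `latest_run_dir` and `load_run` are hypothetical helper names, and since the schema of `metrics.json` is not described here, its contents are returned as parsed JSON without interpretation.

```python
import json
from pathlib import Path


def latest_run_dir(root="experiments"):
    """Return the most recently modified run directory under `root`, or None."""
    root = Path(root)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    return max(runs, key=lambda p: p.stat().st_mtime, default=None)


def _iter_index(path):
    """Numeric suffix of an `iter_N` folder; -1 if the suffix is not a number."""
    suffix = path.name.split("_", 1)[1]
    return int(suffix) if suffix.isdigit() else -1


def load_run(run_dir):
    """Load config.json and metrics.json and list checkpoint iterations in order."""
    run_dir = Path(run_dir)
    config = json.loads((run_dir / "config.json").read_text())
    metrics = json.loads((run_dir / "metrics.json").read_text())
    iter_dirs = [p for p in (run_dir / "checkpoints").glob("iter_*") if p.is_dir()]
    # Sort numerically so that iter_10 comes after iter_2
    checkpoints = [p.name for p in sorted(iter_dirs, key=_iter_index)]
    return config, metrics, checkpoints


if __name__ == "__main__":
    run = latest_run_dir()
    if run is None:
        print("No runs found under experiments/")
    else:
        config, metrics, checkpoints = load_run(run)
        print(run.name, "checkpoints:", checkpoints)
```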