LangMARL Documentation
LangMARL is a language-space multi-agent reinforcement learning (MARL) library that carries credit assignment and policy gradient optimization over from classical MARL into natural language space. It enables principled, autonomous optimization of multi-agent LLM-based systems under Centralized Training with Decentralized Execution (CTDE).
import langmarl
langmarl.train("configs/language_task/qa_central_credit.json")
Core Concepts
LangMARL treats natural language as a first-class optimization space:
- Policies are Language – Each agent’s policy is a natural language instruction (system prompt), not a numeric parameter vector.
- Credits are Language – A centralized critic assigns agent-level credit via trajectory-level language analysis, producing causal and interpretable feedback.
- Optimization is Language Evolution – Policies are updated via language gradients (improvement instructions) instead of numeric gradients.
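The three ideas above can be made concrete with a minimal sketch. The names `LanguagePolicy`, `LanguageGradient`, and `apply_gradient` are hypothetical and are not part of the LangMARL API. The update rule here simply appends the improvement instruction, whereas a real optimizer would presumably ask an LLM to rewrite the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguagePolicy:
    """Hypothetical container: an agent's policy is a system prompt."""
    agent_id: str
    instruction: str


@dataclass(frozen=True)
class LanguageGradient:
    """Hypothetical container: an improvement instruction for one agent."""
    agent_id: str
    improvement: str


def apply_gradient(policy: LanguagePolicy, gradient: LanguageGradient) -> LanguagePolicy:
    """Toy update rule (an assumption): append the improvement to the prompt."""
    if policy.agent_id != gradient.agent_id:
        raise ValueError("gradient targets a different agent")
    return LanguagePolicy(
        policy.agent_id,
        f"{policy.instruction}\n{gradient.improvement}",
    )


policy = LanguagePolicy("retriever", "Find passages relevant to the question.")
gradient = LanguageGradient(
    "retriever", "Prefer passages that mention every entity in the question."
)
updated = apply_gradient(policy, gradient)
print(updated.instruction)
```

The policy objects are immutable, so each update yields a new policy and the previous prompt remains available for comparison or rollback.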
Architecture
LangMARL implements the CTDE paradigm with four components:
- LLM Actors – Each agent is an LLM whose behavior is governed by a natural language policy. During execution, each agent observes only its local information and acts independently.
- Centralized Critic – Used only during training. It evaluates full episode trajectories and generates per-agent credit using an LLM as judge.
- Policy Gradient Optimizer – Converts credit signals into language gradients (concrete improvement instructions) and applies them to agent policies.
- Monte Carlo Trainer – Orchestrates the training loop: collect trajectories, evaluate them, generate gradients, aggregate them, and update policies.
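The loop that the trainer orchestrates might look roughly like the sketch below. It is not LangMARL's implementation: `rollout`, `train`, and the three stub callables are hypothetical, the LLM calls are replaced by deterministic string functions so the example runs offline, and agents are assumed to act sequentially, each seeing only the previous agent's output.

```python
from typing import Callable, Dict, List

Policies = Dict[str, str]            # agent id -> natural language instruction
Trajectory = List[Dict[str, str]]    # one record per agent step


def rollout(policies: Policies, task: str,
            actor: Callable[[str, str], str]) -> Trajectory:
    """Decentralized execution: each agent sees only its policy and its input."""
    trajectory, message = [], task
    for agent_id, instruction in policies.items():
        output = actor(instruction, message)
        trajectory.append({"agent": agent_id, "input": message, "output": output})
        message = output  # the next agent receives the previous output
    return trajectory


def train(policies: Policies, tasks: List[str],
          actor: Callable[[str, str], str],
          critic: Callable[[Trajectory], Dict[str, str]],
          optimizer: Callable[[str], str],
          epochs: int = 1) -> Policies:
    for _ in range(epochs):
        # 1. Collect trajectories with the current policies.
        trajectories = [rollout(policies, task, actor) for task in tasks]
        # 2. Evaluate: the critic sees whole trajectories, returns per-agent credit.
        credits = [critic(trajectory) for trajectory in trajectories]
        # 3. Generate language gradients and 4. aggregate them per agent.
        gradients: Dict[str, List[str]] = {agent: [] for agent in policies}
        for credit in credits:
            for agent_id, text in credit.items():
                gradients[agent_id].append(optimizer(text))
        # 5. Update: append de-duplicated gradients to each instruction.
        policies = {
            agent: instruction
            + "".join(f"\n- {g}" for g in dict.fromkeys(gradients[agent]))
            for agent, instruction in policies.items()
        }
    return policies


# Deterministic stand-ins for the LLM-backed components.
def stub_actor(instruction: str, message: str) -> str:
    return f"{message} -> handled"


def stub_critic(trajectory: Trajectory) -> Dict[str, str]:
    return {step["agent"]: f"{step['agent']} output was too terse" for step in trajectory}


def stub_optimizer(credit: str) -> str:
    return f"Fix: {credit}"


initial = {"planner": "Plan the answer.", "writer": "Write the answer."}
updated = train(initial, ["q1", "q2"], stub_actor, stub_critic, stub_optimizer)
print(updated["planner"])
```

Note that the critic appears only inside `train`; `rollout` never consults it, which mirrors the split between centralized training and decentralized execution.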
Training Paradigms
| Paradigm | Description |
|---|---|
| | A shared critic evaluates overall team performance. All agents receive the same shared gradient signal. |
| | A shared critic evaluates each agent’s individual contribution to team success. Each agent receives a targeted per-agent gradient. |
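The difference between the two rows of the table can be illustrated with a small sketch. The function names `shared_gradient` and `per_agent_gradient` and the feedback strings are illustrative assumptions, not LangMARL identifiers.

```python
from typing import Dict, List


def shared_gradient(agents: List[str], team_feedback: str) -> Dict[str, str]:
    """First row: every agent receives the same team-level signal."""
    return {agent: team_feedback for agent in agents}


def per_agent_gradient(agents: List[str],
                       contributions: Dict[str, str]) -> Dict[str, str]:
    """Second row: each agent receives feedback on its own contribution."""
    return {agent: contributions[agent] for agent in agents}


agents = ["planner", "writer"]

shared = shared_gradient(agents, "The final answer missed the second hop.")
targeted = per_agent_gradient(agents, {
    "planner": "The plan omitted the second lookup; list every required hop.",
    "writer": "The write-up was faithful to the plan; no change needed.",
})

print(shared)
print(targeted)
```

With the shared signal, the writer is told to change even though the plan was at fault; the per-agent signal directs the correction to the planner only.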
Supported Environments
| Environment | Agents | Description |
|---|---|---|
| Language Tasks | 2+ | Sequential collaboration on QA (HotPotQA), Math, Creative Writing, and Coding (HumanEval). |
| Overcooked-AI | 2 | Cooperative cooking with sparse team rewards and role differentiation. |
| Pistonball | 10–20 | Large-scale cooperative control with partial observability. |
Custom environments can be registered via the @langmarl.register_env decorator.
Note
This project is under active development.
Contents
- Usage
- API Reference
- Convenience Functions
- Configuration
- Core Abstractions
- Concrete Implementations
- Training
- LLM Client
- Storage