LangMARL Documentation

LangMARL is a multi-agent reinforcement learning (MARL) library that operates in language space: it carries credit assignment and policy gradient optimization over from classical MARL into natural language. It enables principled, autonomous optimization of multi-agent LLM-based systems under the Centralized Training with Decentralized Execution (CTDE) paradigm.

import langmarl

# Start a training run from a JSON config file.
langmarl.train("configs/language_task/qa_central_credit.json")

Core Concepts

LangMARL treats natural language as a first-class optimization space:

Policies are Language – Each agent’s policy is a natural language instruction (system prompt), not a numeric parameter vector.
Credits are Language – A centralized critic assigns agent-level credit via trajectory-level language analysis, producing causal and interpretable feedback.
Optimization is Language Evolution – Policies are updated via language gradients (improvement instructions) instead of numeric gradients.
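
As a minimal sketch of the three ideas above, a policy can be modeled as a prompt string that is updated by appending an improvement instruction. The `LanguagePolicy` class and its `apply_gradient` method are hypothetical names chosen for illustration and are not part of the LangMARL API. A real update would likely ask an LLM to rewrite the prompt; plain string concatenation stands in for that step here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LanguagePolicy:
    """Hypothetical container: an agent's policy is a system prompt."""
    agent: str
    prompt: str
    history: List[str] = field(default_factory=list)

    def apply_gradient(self, instruction: str) -> None:
        # Keep the previous prompt so every update stays inspectable.
        self.history.append(self.prompt)
        # Stand-in for an LLM rewrite: append the improvement instruction.
        self.prompt = f"{self.prompt}\n- {instruction}"


policy = LanguagePolicy("retriever", "Find passages relevant to the question.")
policy.apply_gradient("Prefer passages that name both entities in the question.")
print(policy.prompt)
```

Because both the policy and its update are text, each training step leaves a readable record of what changed and why.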

Architecture

LangMARL implements the CTDE paradigm with four components:

LLM Actors – Each agent is an LLM whose behavior is governed by a natural language policy. During execution, each agent observes only its local information and acts independently.
Centralized Critic – Used only during training. Evaluates full episode trajectories and generates per-agent credit using an LLM as judge.
Policy Gradient Optimizer – Converts credit signals into language gradients (concrete improvement instructions) and applies them to agent policies.
Monte Carlo Trainer – Orchestrates the training loop: collect trajectories, evaluate, generate gradients, aggregate, and update policies.
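
The interaction of the four components can be sketched as a single loop. Everything below is an assumption made for illustration: the function names (`collect`, `critic`, `to_gradient`, `aggregate`, `train`) are hypothetical, and deterministic string stubs replace the LLM calls so the sketch runs offline. It is not LangMARL's implementation.

```python
from typing import Dict, List

Trajectory = List[Dict[str, str]]  # one {"agent", "action"} record per step


def collect(policies: Dict[str, str], task: str) -> Trajectory:
    # Decentralized execution: each actor uses only the task and its own prompt.
    return [{"agent": name, "action": f"[{name}] handles '{task}' under: {prompt}"}
            for name, prompt in policies.items()]


def critic(trajectory: Trajectory) -> Dict[str, str]:
    # Centralized critic (training only): sees the full trajectory.
    # Stub for an LLM judge; emits one fixed credit note per agent.
    return {step["agent"]: f"{step['agent']} should justify its output"
            for step in trajectory}


def to_gradient(credit: str) -> str:
    # Optimizer step 1: turn a credit note into an improvement instruction.
    return f"Revise policy: {credit}."


def aggregate(gradients: List[str]) -> str:
    # Merge gradients gathered across episodes, dropping exact duplicates.
    return " ".join(dict.fromkeys(gradients))


def train(policies: Dict[str, str], tasks: List[str], epochs: int = 1) -> Dict[str, str]:
    policies = dict(policies)  # leave the caller's dict untouched
    for _ in range(epochs):
        per_agent: Dict[str, List[str]] = {name: [] for name in policies}
        for task in tasks:  # full-episode (Monte Carlo) rollouts
            trajectory = collect(policies, task)
            for agent, credit in critic(trajectory).items():
                per_agent[agent].append(to_gradient(credit))
        for agent, grads in per_agent.items():
            if grads:  # optimizer step 2: apply the aggregated gradient
                policies[agent] = f"{policies[agent]} {aggregate(grads)}"
    return policies


updated = train({"planner": "Plan.", "solver": "Solve."}, ["q1", "q2"])
print(updated["planner"])
```

Note that the critic appears only inside `train`; at execution time, `collect` needs nothing beyond each agent's own prompt.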

Training Paradigms

| Paradigm | Description |
|---|---|
| `central_global` | A shared critic evaluates overall team performance. All agents receive the same shared gradient signal. |
| `central_credit` | A shared critic evaluates each agent’s individual contribution to team success. Each agent receives a targeted per-agent gradient. |
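
The practical difference between the two paradigms is how feedback is routed to agents. The helper below is a hypothetical sketch (the name `route_gradients` and its parameters are assumptions, not LangMARL API): under `central_global` every agent receives the same team-level gradient, while under `central_credit` each agent receives its own.

```python
from typing import Dict, List


def route_gradients(paradigm: str,
                    agents: List[str],
                    team_gradient: str,
                    agent_gradients: Dict[str, str]) -> Dict[str, str]:
    """Map each agent to the language gradient it should receive."""
    if paradigm == "central_global":
        # One shared signal, broadcast unchanged to every agent.
        return {agent: team_gradient for agent in agents}
    if paradigm == "central_credit":
        # One targeted signal per agent, based on its individual contribution.
        return {agent: agent_gradients[agent] for agent in agents}
    raise ValueError(f"unknown paradigm: {paradigm!r}")


agents = ["planner", "solver"]
shared = route_gradients("central_global", agents, "Be concise.", {})
targeted = route_gradients(
    "central_credit", agents, "Be concise.",
    {"planner": "List the steps first.", "solver": "Check units before answering."},
)
print(shared)
print(targeted)
```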

Supported Environments

| Environment | Agents | Description |
|---|---|---|
| Language Tasks | 2+ | Sequential collaboration on QA (HotPotQA), Math, Creative Writing, and Coding (HumanEval). |
| Overcooked-AI | 2 | Cooperative cooking with sparse team rewards and role differentiation. |
| Pistonball | 10–20 | Large-scale cooperative control with partial observability. |

Custom environments can be registered with the `@langmarl.register_env` decorator.
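
This page does not specify the signature of `@langmarl.register_env` or the interface an environment must implement, so the following is only a generic, self-contained sketch of how a registration decorator of this kind typically works. The local `register_env`, `ENV_REGISTRY`, `make_env`, and the `EchoEnv` class with its `reset`/`step` methods are all hypothetical and do not use the LangMARL package.

```python
from typing import Callable, Dict

# Hypothetical registry: maps an environment name to its class.
ENV_REGISTRY: Dict[str, type] = {}


def register_env(name: str) -> Callable[[type], type]:
    """Return a class decorator that stores the class under `name`."""
    def decorator(cls: type) -> type:
        if name in ENV_REGISTRY:
            raise KeyError(f"environment '{name}' is already registered")
        ENV_REGISTRY[name] = cls
        return cls  # the class itself is returned unchanged
    return decorator


@register_env("echo")
class EchoEnv:
    """Toy two-agent environment used only to exercise the registry."""
    agents = ["speaker", "listener"]

    def reset(self) -> Dict[str, str]:
        # Every agent starts with an empty local observation.
        return {agent: "" for agent in self.agents}

    def step(self, actions: Dict[str, str]) -> Dict[str, str]:
        # Each agent observes what the other agent said this turn.
        return {"speaker": actions["listener"], "listener": actions["speaker"]}


def make_env(name: str):
    # Look the class up by name and instantiate it.
    return ENV_REGISTRY[name]()


env = make_env("echo")
print(env.reset())
```

Registering by name lets a config file refer to an environment as a string while the code resolves it to a class at run time.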

Note

This project is under active development.