Decentralized MARL
Paper Name
Categories
Type
Concept
Recent issues
Motivations
Contributions
Evaluation List
Target
Conf./Jour.
Year
Link
Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition
Centralized Training Decentralized Execution (CTDE).
The coach has a full view; the players have partial views.
The coach distributes a limited amount of information to the agents.
Most deep CTDE methods for cooperative MARL are limited to a fixed number of homogeneous agents (C1).
Re-training the agents is computationally prohibitive (C2).
Each agent can access only its own decisions and partial environmental observations at test time (C3).
Having all agents communicate with each other is too expensive in many scenarios (C4).
Generalize zero-shot to new compositions.
C3: Introducing communication
C4: Centralized coach → periodically distributes strategic information (full view).
Communication through continuous strategy vector.
Variational objective → regularize learning of the strategy.
Adaptive policy → the coach communicates only when needed.
The strategy vector is encoded using a VAE-based encoder.
Resource Collection
Rescue Game
StarCraft Multi-Agent Challenge (SMAC).
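The adaptive communication idea noted for the Coach-Player paper (the coach communicates only when needed) can be sketched as a simple threshold rule. This is a minimal illustration, not the paper's implementation: the names `should_broadcast` and `run_coach`, the Euclidean distance test, and the threshold value are all assumptions made for the example.

```python
import numpy as np

def should_broadcast(new_strategy, last_sent, threshold):
    """Return True when the new strategy differs enough from the last one sent.

    Assumption: "enough" is measured with the Euclidean norm against a fixed threshold.
    """
    if last_sent is None:
        # Nothing has been sent yet, so the first strategy always goes out.
        return True
    return float(np.linalg.norm(new_strategy - last_sent)) > threshold

def run_coach(strategies, threshold):
    """Walk through candidate strategy vectors and record which steps trigger a broadcast."""
    last_sent = None
    sent_steps = []
    for step, z in enumerate(strategies):
        if should_broadcast(z, last_sent, threshold):
            last_sent = z  # players keep using this vector until the next broadcast
            sent_steps.append(step)
    return sent_steps

# Hypothetical 2-D strategy vectors produced by the coach at four consecutive steps.
candidates = [
    np.array([0.0, 0.0]),
    np.array([0.1, 0.0]),
    np.array([1.0, 1.0]),
    np.array([1.05, 1.0]),
]
sent = run_coach(candidates, threshold=0.5)
print(sent)
```

With these made-up vectors only steps 0 and 2 trigger a broadcast, so half of the messages are saved; a larger threshold trades more savings for staler strategies on the player side.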
De-confounded Value Decomposition for Multi-Agent Reinforcement Learning
Centralized Training Decentralized Execution (CTDE).
- Credit Assignment: deduce contributions of individual agents from overall success.
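To make the credit-assignment idea concrete, here is a minimal sketch of the simplest value decomposition, an additive (VDN-style) mixer in which the team value is the sum of per-agent values. It is not the de-confounded method of the paper above; the function names and the Q-values are hypothetical.

```python
import numpy as np

def joint_q(per_agent_q, actions):
    """Additive mixing: Q_tot is the sum of each agent's Q-value for its chosen action."""
    return sum(q[a] for q, a in zip(per_agent_q, actions))

def greedy_joint_action(per_agent_q):
    """With an additive mixer, each agent's local argmax also maximizes Q_tot."""
    return [int(np.argmax(q)) for q in per_agent_q]

# Hypothetical local Q-values for three agents with two actions each.
per_agent_q = [
    np.array([1.0, 3.0]),
    np.array([2.0, 0.5]),
    np.array([0.0, 4.0]),
]
actions = greedy_joint_action(per_agent_q)
q_tot = joint_q(per_agent_q, actions)

# Each agent's share of the team value is simply its own term in the sum.
contributions = [float(q[a]) for q, a in zip(per_agent_q, actions)]
print(actions, q_tot, contributions)
```

Because the mixer is a plain sum, the per-agent terms can be read directly as individual contributions, which is what lets agents act on local values at execution time.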
On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning
Decentralized Training.
Sample-efficient Model-free Algorithm.

Exponential dependence on the number of agents, since the joint action space usually has to be searched exhaustively.
The computation bottleneck can be relieved by communication → distributing the workload.
Communication-based approaches → communication overhead.
Stage-based V-Learning for General-Sum Markov Games.
Learning CCE
Learning CE
Learning NE in Markov Potential Games.
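The exponential cost of exhaustively searching the joint action space, noted for this paper, is easy to see with a little arithmetic. The sketch below assumes every agent has the same number of actions; the helper names are made up for the example.

```python
from itertools import product

def joint_action_count(num_actions, num_agents):
    """Size of the joint action space when every agent has num_actions choices."""
    return num_actions ** num_agents

def enumerate_joint_actions(num_actions, num_agents):
    """Explicitly list every joint action; only feasible for very small teams."""
    return list(product(range(num_actions), repeat=num_agents))

# Five actions per agent: the joint space grows from 25 to 390625 entries.
sizes = {n: joint_action_count(5, n) for n in (2, 4, 8)}
print(sizes)

# Explicit enumeration matches the formula on a tiny example.
small = enumerate_joint_actions(2, 3)
print(len(small), small[:3])
```

A decentralized learner that only ever considers its own actions sidesteps this enumeration, which is the motivation for the per-agent algorithms listed above.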