Decentralized MARL
Each entry below records: Paper Name, Categories, Type, Concept, Recent issues, Motivations, Contributions, Evaluation List, Target, Conf./Jour., Year, Link.
Paper Name: Coach-Player Multi-Agent Reinforcement Learning for Dynamic Team Composition
Categories: CTDE
Type: Dec-POMDP
Concept: Centralized Training Decentralized Execution (CTDE). The coach has a full view of the environment, while players have only partial views; the coach can distribute a limited amount of information to the agents.
Recent issues:
- C1: Most deep CTDE methods for cooperative MARL are limited to a fixed number of homogeneous agents.
- C2: Re-training the agents for every new team composition is computationally prohibitive.
- C3: At test time, an agent can access only its own decisions and partial environmental observations.
- C4: Letting all agents communicate is too expensive in many scenarios.
Motivations: Generalize zero-shot to new team compositions.
Contributions (sketched in code below):
- C3: introduces communication.
- C4: a centralized coach periodically distributes strategic information based on its full view.
- Communication is carried by a continuous strategy vector, encoded with a VAE-based encoder.
- A variational objective regularizes learning of the strategy.
- An adaptive policy lets the coach communicate only when needed.
Evaluation List: Resource Collection; Rescue Game; StarCraft Multi-Agent Challenge (SMAC).
Conf./Jour.: ICML
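To make the coach-player mechanics concrete, here is a minimal PyTorch sketch: a variational encoder maps the coach's global view to a continuous strategy vector z, players condition their policies on local observations plus the last broadcast z, and the coach re-broadcasts only when z drifts. All module names, layer sizes, and the drift threshold are illustrative assumptions, not the paper's code.

```python
# Minimal sketch (PyTorch) of the coach-player idea: the coach encodes the
# global state into a continuous strategy vector z with a VAE-style encoder
# and broadcasts it to players only when z drifts from the last broadcast.
# All names and sizes are illustrative assumptions, not the paper's code.
import torch
import torch.nn as nn

class CoachEncoder(nn.Module):
    """Variational encoder: global state -> (mu, logvar) of strategy vector z."""
    def __init__(self, state_dim: int, z_dim: int):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, z_dim)
        self.logvar = nn.Linear(64, z_dim)

    def forward(self, state: torch.Tensor):
        h = self.body(state)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return z, mu, logvar

class PlayerPolicy(nn.Module):
    """Decentralized policy: local observation + broadcast strategy -> action logits."""
    def __init__(self, obs_dim: int, z_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + z_dim, 64), nn.ReLU(), nn.Linear(64, n_actions)
        )

    def forward(self, obs: torch.Tensor, z: torch.Tensor):
        return self.net(torch.cat([obs, z], dim=-1))

def maybe_broadcast(z_new, z_last, threshold=1.0):
    """Adaptive communication: re-broadcast only if the strategy moved enough."""
    if z_last is None or torch.norm(z_new - z_last) > threshold:
        return z_new
    return z_last

def kl_regularizer(mu, logvar):
    """KL term of the variational objective that regularizes the strategy space."""
    return -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
```

During centralized training the KL term would be added to the RL loss as the variational regularizer; at execution time each player acts on its partial observation plus the most recently broadcast strategy vector.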
Paper Name: De-confounded Value Decomposition for Multi-Agent Reinforcement Learning
Categories: CTDE
Type: Dec-POMDP
Concept: Centralized Training Decentralized Execution (CTDE). Credit Assignment: deduce the contributions of individual agents from the overall success (see the additive sketch below).
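For context, a minimal sketch of the value-decomposition form of credit assignment that this line of work builds on: a VDN-style additive mixer in which the joint value is the sum of per-agent utilities. This is the simplest baseline decomposition, not the paper's de-confounded estimator; all names are illustrative assumptions.

```python
# Toy value-decomposition sketch (PyTorch): a VDN-style additive mixer, the
# simplest form of credit assignment via decomposition. The paper's
# de-confounded decomposition is more involved; names here are illustrative.
import torch
import torch.nn as nn

class AgentQ(nn.Module):
    """Per-agent utility Q_i(obs_i, a_i), usable at execution without the mixer."""
    def __init__(self, obs_dim: int, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, n_actions))

    def forward(self, obs):
        return self.net(obs)

def q_total(agent_qs, observations, actions):
    """Q_tot = sum_i Q_i(obs_i, a_i): the joint value splits additively, so
    each agent's gradient reflects only its own contribution."""
    q_tot = 0.0
    for q_net, obs, a in zip(agent_qs, observations, actions):
        q_tot = q_tot + q_net(obs).gather(-1, a.unsqueeze(-1)).squeeze(-1)
    return q_tot

# Training would regress q_total toward a centralized TD target on the joint
# reward; decentralized greedy execution per agent remains consistent because
# the argmax distributes over the sum.
```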
Paper Name: On Improving Model-Free Algorithms for Decentralized Multi-Agent Reinforcement Learning
Categories: Dec-POMDP
Type: DTDE
Concept: Decentralized training; a sample-efficient, model-free algorithm.
Recent issues:
- Exponential dependence on the number of agents, since existing methods usually need to exhaustively search the joint action space.
- The computation bottleneck can be relieved by communication (distributing the workload), but communication-based methods incur communication overheads.
Contributions (toy CCE demo below):
- Stage-based V-Learning for general-sum Markov games.
- Learning CCE (coarse correlated equilibria).
- Learning CE (correlated equilibria).
- Learning NE (Nash equilibria) in Markov potential games.
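As a toy grounding for the equilibrium notions above: a coarse correlated equilibrium (CCE) is a joint action distribution under which no player gains by deviating to any fixed action, and a classic result is that independent no-regret learners (e.g., Hedge) drive the empirical joint play toward a CCE. The sketch below demonstrates this in a made-up 2x2 general-sum matrix game; it is not the paper's stage-based V-learning, which lifts such no-regret updates to Markov games.

```python
# Toy illustration: two independent Hedge (exponential-weights) learners in a
# general-sum matrix game; the empirical joint-play distribution approaches a
# coarse correlated equilibrium (CCE). This is the classic no-regret fact that
# stage-based V-learning builds on; it is NOT the paper's algorithm, and the
# payoff matrices below are made up for demonstration.
import numpy as np

rng = np.random.default_rng(0)

# Payoff matrices for a 2x2 general-sum game (row player A, column player B),
# here a prisoner's-dilemma-like game.
A = np.array([[3.0, 0.0], [5.0, 1.0]])
B = np.array([[3.0, 5.0], [0.0, 1.0]])

T, eta = 20000, 0.05
wa, wb = np.zeros(2), np.zeros(2)        # log-weights of each learner
counts = np.zeros((2, 2))                # empirical joint-play distribution

for _ in range(T):
    pa = np.exp(wa - wa.max()); pa /= pa.sum()
    pb = np.exp(wb - wb.max()); pb /= pb.sum()
    i = rng.choice(2, p=pa)
    j = rng.choice(2, p=pb)
    counts[i, j] += 1
    # Full-information Hedge update: payoff of each action vs opponent's play.
    wa += eta * A[:, j]
    wb += eta * B[i, :]

sigma = counts / T
# CCE check: no player should gain by deviating to a fixed action.
row_dev = max(A[k, :] @ sigma.sum(axis=0) for k in range(2)) - (A * sigma).sum()
col_dev = max(sigma.sum(axis=1) @ B[:, k] for k in range(2)) - (B * sigma).sum()
print(f"row regret {row_dev:.4f}, col regret {col_dev:.4f}")  # both -> ~0
```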