DRAFT PAPER OUTLINE

SyncRite

Distributed Free Energy optimization in a dynamic environment with emergent multiagent communication via a protolanguage developed with and without ritualization

Abstract

We present modified multi-armed bandit simulations (multi-agent, epistemic, and restless) to analyze Active Inference policies as proposed by Friston. We add an emergent protolanguage, following Baroni, sensitized by agent ritualization. With this setup we explore evolutionary strategies for population/group adaptive fitness in a changing environment where symbolic communication facilitates social and intergenerational learning. Where possible, we relate simulation findings to real-world data from copy trading and to anthropological observations from other cultures and experiments.

Simulation Setup

Populations of simulated Active Inference agents are evolved for fitness in an environment modeled as a restless multi-armed bandit (rMAB), allowing us to evaluate candidate strategies and benchmark them against Nash equilibrium and other evolutionarily stable strategy (ESS) predictions from prior work.
Environment: a two-armed bandit with time-varying reward dynamics and the ability to offer hints
Agents: Active Inference agents with generative models capable of:
requesting private hints (returned as symbolic responses),
receiving private hints provided to other agents,
acting on the bandit,
sensing private reward,
observing collective agent actions (but not rewards)
Agents have finite lifespans modulated by fitness, which is proportional to aggregate reward and loss
Populations are collections of agents whose parameter settings are described by A, B, C, and D matrices in the pymdp Active Inference framework [3] (see the sketch after this list)
Environments have settings that influence reward context, accuracy of hints, rate of change in rewards, etc.
We benchmark aggregate regret for populations over time against oracle performance to evaluate the relative pros and cons of various policies and constraints under different environmental conditions
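
As a concreteness check, here is a minimal single-agent generative model for the two-armed bandit with hints, written against pymdp's discrete-state API. The factorization (a two-state context factor and a four-state choice factor), the hint accuracy p_hint, and the reward probability p_rew are illustrative assumptions; the multi-agent observation and communication channels described above are omitted.

```python
import numpy as np
from pymdp import utils
from pymdp.agent import Agent

# Hidden state factors: context (which arm is currently good) and
# choice (where the agent "is": start, asking for a hint, or at an arm).
contexts, choices = 2, 4            # choice: 0=start, 1=ask-hint, 2=arm-0, 3=arm-1
p_hint, p_rew = 0.8, 0.75           # assumed hint accuracy and reward probability

num_states = [contexts, choices]
num_obs = [3, 3, choices]           # hint (null/arm-0/arm-1), reward (null/loss/win), own choice

# A: observation likelihoods P(o | s), one array per modality.
A = utils.obj_array_zeros([[o] + num_states for o in num_obs])
A[0][0, :, [0, 2, 3]] = 1.0         # hints arrive only after asking
A[0][1:, :, 1] = np.array([[p_hint, 1 - p_hint],
                           [1 - p_hint, p_hint]])
A[1][0, :, [0, 1]] = 1.0            # no reward at the start / hint-asking states
for ctx, choice in [(0, 2), (1, 3)]:
    A[1][2, ctx, choice] = p_rew            # win when the pulled arm matches context
    A[1][1, ctx, choice] = 1 - p_rew
    A[1][2, 1 - ctx, choice] = 1 - p_rew    # mismatched arm mostly loses
    A[1][1, 1 - ctx, choice] = p_rew
A[2] = np.tile(np.eye(choices)[:, None, :], (1, contexts, 1))  # agent observes its own choice

# B: transition likelihoods P(s' | s, u).
B = utils.obj_array(2)
B[0] = np.eye(contexts)[:, :, None]  # context is uncontrollable (the restless
                                     # environment re-draws it between trials)
B[1] = np.zeros((choices, choices, choices))
for u in range(choices):
    B[1][u, :, u] = 1.0              # actions set the choice state deterministically

# C: log-preferences over observations; D: priors over initial states.
C = utils.obj_array_zeros(num_obs)
C[1] = np.array([0.0, -4.0, 2.0])    # dislike losses, prefer wins (assumed weights)
D = utils.obj_array(2)
D[0] = np.full(contexts, 1 / contexts)
D[1] = utils.onehot(0, choices)

agent = Agent(A=A, B=B, C=C, D=D)
qs = agent.infer_states([0, 0, 0])   # posterior over states given null observations
q_pi, efe = agent.infer_policies()   # policy posterior and expected free energies
action = agent.sample_action()       # e.g. array([0., 1.]) = ask for a hint
```

In the multi-agent runs, hints overheard from other agents and observations of others' actions would enter as additional modalities in A, and the restless context could be handled by softening B[0] or enabling transition learning; those extensions are omitted here.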

Real-world data, e.g., copy trading

From the abstract of [2]: “Copy trading allows traders in social networks to receive information on the success of other agents in financial markets and to directly copy their trades. Internet platforms like eToro and Tradeo have attracted millions of users in recent years. The present paper studies the implications of copy trading for the risk taking of investors. Implementing a novel experimental financial asset market, we show that providing information on the success of others leads to a significant increase in risk taking of subjects. This increase in risk taking is even larger when subjects are provided with the option to directly copy others. We conclude that copy trading leads to excessive risk taking.”

Contributions

Active Inference simulations with multiple agents and symbolic communication
Relate Active Inference policy selection to insights from mathematical game theory
Quantify the effects of individual-agent vs. multi-agent learning
Quantify the effects of symbolic communication on collective intelligence and social learning
Quantify the benefit of incurring costs to secure symbol-meaning fidelity across agents
Characterize optimal adaptive group behavior under various conditions of environmental change, e.g., demonstrate the effects of intergenerational learning as the rate of change varies
Relate insights from simulation to the real-world phenomenon of copy trading

Prior work

From the abstract of [1]: “We study a simple model for social-learning agents in a restless multiarmed bandit (rMAB). The bandit has one good arm that changes to a bad one with a certain probability. Each agent stochastically selects one of the two methods, random search (individual learning) or copying information from other agents (social learning), using which he/she seeks the good arm. Fitness of an agent is the probability to know the good arm in the steady state of the agent system. In this model, we explicitly construct the unique Nash equilibrium state and show that the corresponding strategy for each agent is an evolutionarily stable strategy (ESS) in the sense of Thomas. It is shown that the fitness of an agent with ESS is superior to that of an asocial learner when the success probability of social learning is greater than a threshold determined from the probability of success of individual learning, the probability of change of state of the rMAB, and the number of agents. The ESS Nash equilibrium is a solution to Rogers’ paradox.”
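
To make these dynamics concrete, below is a rough simulation of the quoted model as we read it from the abstract. The parameter names and values (N, q, p_ind, p_soc, s) are our assumptions, and coupling social-learning success to the currently informed fraction of the population is our simplification, not the authors' exact specification.

```python
import numpy as np

rng = np.random.default_rng(1)

# Rough reading of the rMAB social-learning model: each step the good arm
# may change; each agent either searches individually or copies others.
N = 100          # number of agents
q = 0.01         # probability the good arm changes (restlessness)
p_ind = 0.1      # success probability of individual learning (random search)
p_soc = 0.6      # success probability of social learning (copying)
s = 0.5          # each agent's probability of choosing social learning

knows = np.zeros(N, dtype=bool)   # does agent i currently know the good arm?
history = []
for t in range(20_000):
    if rng.random() < q:
        knows[:] = False          # arm changed: everyone's knowledge is stale
    social = rng.random(N) < s
    informed_frac = knows.mean()
    # social learners succeed only insofar as informed agents exist to copy
    success_soc = social & (rng.random(N) < p_soc * informed_frac)
    success_ind = ~social & (rng.random(N) < p_ind)
    knows |= success_soc | success_ind
    history.append(knows.mean())

print("steady-state fitness (fraction knowing the good arm):",
      round(float(np.mean(history[5000:])), 3))
```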

Description of Experimental Scenarios

ENVIRONMENT: low/medium/high (L/M/H) reward entropy, L/M/H noise in hints, L/M/H speed of restlessness (see the sketch below)
AGENTS: individual, social without communication, social with noisy communication, social with secure communication
POPULATIONS: L/M/H fitness-vs-reward curves, L/M/H average lifespan vs. rate of environmental change
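
The environment grid above can be made concrete with a small sketch; the class name, knob defaults, and step interface are illustrative assumptions rather than a fixed design, and the random placeholder policy stands in for the pymdp agent sketched in the Simulation Setup.

```python
import numpy as np

rng = np.random.default_rng(0)

class RestlessBandit:
    """Two-armed rMAB sketch; the L/M/H grid maps onto these three knobs."""
    def __init__(self, p_rew=0.75, hint_noise=0.2, switch_prob=0.05):
        self.p_rew = p_rew              # reward entropy: p_rew -> 0.5 is maximally noisy
        self.hint_noise = hint_noise    # noise in hints: chance a hint names the bad arm
        self.switch_prob = switch_prob  # speed of restlessness: per-step good-arm re-draw
        self.good_arm = int(rng.integers(2))

    def step(self, action):
        """action is 'hint' or an arm index (0/1); returns (hint, reward)."""
        if rng.random() < self.switch_prob:
            self.good_arm = int(rng.integers(2))
        if action == "hint":
            wrong = rng.random() < self.hint_noise
            return (self.good_arm ^ int(wrong), 0)
        p = self.p_rew if action == self.good_arm else 1 - self.p_rew
        return (None, int(rng.random() < p))

# Regret vs. oracle: the oracle always pulls the current good arm, so its
# expected per-step reward is p_rew; realized regret averages to expected
# regret over repeated runs.
env = RestlessBandit()
regret = 0.0
for t in range(1000):
    action = int(rng.integers(2))     # placeholder policy
    _, reward = env.step(action)
    regret += env.p_rew - reward
print(f"cumulative regret vs. oracle after 1000 steps: {regret:.1f}")
```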

Discussion of Results

...

Conclusions

...

Future work and application areas

...

REFERENCES

