SyncRite
Distributed Free Energy optimization in a dynamic environment with emergent multiagent communication via a protolanguage developed with and without ritualization
Abstract
We use modified multi-armed bandit simulations (multi-agent, epistemic, and restless) to analyze Active Inference policies, as proposed by Friston. We utilize an emergent protolanguage, as proposed by Baroni, sensitized by agent ritualization. We use this setup to explore evolutionary strategies for adaptive fitness at the population and group level in a changing environment where symbolic communication facilitates social and intergenerational learning. Where possible, we relate the simulation findings to real-world evidence, using copy-trading data and anthropological data from other cultures and experiments.
Simulation Setup
Populations of simulated Active Inference agents are evolved for fitness in an environment modeled as a restless multi-armed bandit (rMAB). We use this setup to evaluate candidate strategies and benchmark them against the Nash equilibrium and other evolutionarily stable strategy (ESS) predictions from prior work.
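As a rough illustration of the environment and the regret benchmark used in this section, the following minimal sketch models a two-armed restless bandit that offers noisy hints and scores a policy against an oracle that always pulls the good arm. The class name, function name, and all parameter values (`p_good`, `p_bad`, `p_switch`, `hint_accuracy`) are assumptions made for illustration; they are not taken from the simulation itself, and the policy shown is a plain hint-follower rather than an Active Inference agent.

```python
import random


class RestlessTwoArmedBandit:
    """Two-armed bandit whose good arm switches over time and which offers noisy hints.

    All parameter names and defaults are illustrative assumptions.
    """

    def __init__(self, p_good=0.8, p_bad=0.2, p_switch=0.05,
                 hint_accuracy=0.9, seed=0):
        self.p_good = p_good                # reward probability of the good arm
        self.p_bad = p_bad                  # reward probability of the bad arm
        self.p_switch = p_switch            # per-step probability that the arms swap roles
        self.hint_accuracy = hint_accuracy  # probability that a hint names the good arm
        self.rng = random.Random(seed)
        self.good_arm = self.rng.randrange(2)

    def hint(self):
        """Return a symbolic hint (an arm index), correct with probability hint_accuracy."""
        if self.rng.random() < self.hint_accuracy:
            return self.good_arm
        return 1 - self.good_arm

    def mean_reward(self, arm):
        """Expected reward of an arm in the current context."""
        return self.p_good if arm == self.good_arm else self.p_bad

    def pull(self, arm):
        """Sample a binary reward for the chosen arm."""
        return 1 if self.rng.random() < self.mean_reward(arm) else 0

    def step(self):
        """Advance the restless dynamics: the good arm may switch."""
        if self.rng.random() < self.p_switch:
            self.good_arm = 1 - self.good_arm


def run_policy(env, policy, n_steps):
    """Return (cumulative expected regret vs. oracle, total realized reward).

    `policy` maps the current hint to an arm index. The oracle always
    pulls the good arm, so per-step regret is p_good minus the expected
    reward of the arm actually chosen.
    """
    regret = 0.0
    total_reward = 0
    for _ in range(n_steps):
        arm = policy(env.hint())
        total_reward += env.pull(arm)
        regret += env.p_good - env.mean_reward(arm)
        env.step()
    return regret, total_reward
```

For example, `run_policy(RestlessTwoArmedBandit(hint_accuracy=0.7), lambda hint: hint, 1000)` scores an agent that always trusts its hint; lowering `hint_accuracy` or raising `p_switch` gives a simple way to vary the environmental conditions.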
Environment: a two-armed bandit with time-varying reward dynamics and the ability to offer hints.
Agents: Active Inference agents with generative models capable of asking for private hints as symbolic responses, receiving private hints provided to other agents, and observing collective agent actions (but not rewards).
Agents have finite lifespans modulated by fitness, which is proportional to aggregate reward and loss.
Populations are collections of agents whose parameter settings are described as A, B, C, D matrices using the pymdp Active Inference framework [3].
Environments have settings that influence reward context, accuracy of hints, rate of change in rewards, etc.
Benchmark aggregate regret for populations over time against oracle performance to evaluate the relative pros and cons of various policies and constraints in response to various environmental conditions.
Real world data: e.g. Copy trading
Copy trading allows traders in social networks to receive information on the success of other agents in financial markets and to directly copy their trades. Internet platforms like eToro, , and Tradeo have attracted millions of users in recent years. The present paper studies the implications of copy trading for the risk taking of investors. Implementing a novel experimental financial asset market, we show that providing information on the success of others leads to a significant increase in risk taking of subjects. This increase in risk taking is even larger when subjects are provided with the option to directly copy others. We conclude that copy trading leads to excessive risk taking. [2]
Contributions
Active Inference simulations with multiple agents and symbolic communication
Relate Active Inference policy selection to insights from mathematical game theory
Quantify the effects of individual-agent vs. multi-agent learning
Quantify the effects of symbolic communication on collective intelligence and social learning
Quantify the benefit of costs incurred in securing symbol-meaning fidelity across agents
Characterize optimal adaptive group behavior under various conditions of environmental change, e.g.
Demonstrate the effects of inter-generational learning under various conditions of environmental change
Relate insights from simulation to the real-world phenomenon of copy-trading
Prior work
We study a simple model for social-learning agents in a restless multiarmed bandit (rMAB). The bandit has one good arm that changes to a bad one with a certain probability. Each agent stochastically selects one of the two methods, random search (individual learning) or copying information from other agents (social learning), using which he/she seeks the good arm. Fitness of an agent is the probability to know the good arm in the steady state of the agent system. In this model, we explicitly construct the unique Nash equilibrium state and show that the corresponding strategy for each agent is an evolutionarily stable strategy (ESS) in the sense of Thomas. It is shown that the fitness of an agent with ESS is superior to that of an asocial learner when the success probability of social learning is greater than a threshold determined from the probability of success of individual learning, the probability of change of state of the rMAB, and the number of agents. The ESS Nash equilibrium is a solution to Rogers’ paradox.[1]
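To make the trade-off between individual and social learning concrete, the following minimal sketch simulates a simplified version of such a model. It is an assumption-laden illustration, not the construction from [1]: the synchronous update rule, the parameter names (`r`, `q_change`, `q_ind`, `q_soc`), and their default values are all hypothetical. Each agent that does not know the good arm copies a random other agent with probability `r` and otherwise searches at random; fitness is estimated as the long-run fraction of agent-steps in which the good arm is known.

```python
import random


def simulate_fitness(n_agents=20, r=0.5, q_change=0.05, q_ind=0.1,
                     q_soc=0.8, n_steps=5000, seed=0):
    """Estimate the average probability that an agent knows the good arm.

    r        : probability that an uninformed agent tries social learning
    q_change : per-step probability that the good arm changes (all knowledge is lost)
    q_ind    : success probability of random search (individual learning)
    q_soc    : success probability of copying from an informed agent
    All names and values are illustrative assumptions. Requires n_agents >= 2.
    """
    rng = random.Random(seed)
    knows = [False] * n_agents
    known_total = 0
    for _ in range(n_steps):
        # Restless dynamics: when the good arm changes, every agent is uninformed again.
        if rng.random() < q_change:
            knows = [False] * n_agents
        prev = knows[:]  # agents copy from the state at the start of the step
        for i in range(n_agents):
            if prev[i]:
                continue  # informed agents keep their knowledge until the arm changes
            if rng.random() < r:
                # Social learning: pick a random *other* agent and try to copy it.
                j = rng.randrange(n_agents - 1)
                if j >= i:
                    j += 1
                if prev[j] and rng.random() < q_soc:
                    knows[i] = True
            elif rng.random() < q_ind:
                # Individual learning: random search succeeds with probability q_ind.
                knows[i] = True
        known_total += sum(knows)
    return known_total / (n_steps * n_agents)
```

In this sketch a population of pure copiers (`r = 1.0`) never discovers the good arm, since there is nobody informed to copy from, while sweeping `r` between 0 and 1 shows how mixing the two learning modes changes the estimated fitness.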
Description of Experimental Scenarios
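The environment levels and agent types listed in this section can be crossed into a set of scenarios. The sketch below assumes a full-factorial design, which the text does not state explicitly, and uses hypothetical identifier names for the factors and agent types.

```python
from itertools import product

# Low / Medium / High levels for each environment factor.
LEVELS = ("L", "M", "H")

# Factor and agent-type names are illustrative identifiers, not taken from the simulation code.
ENV_FACTORS = ("reward_entropy", "hint_noise", "restlessness_speed")
AGENT_TYPES = (
    "individual",
    "social_no_communication",
    "social_noisy_communication",
    "social_secure_communication",
)


def scenario_grid():
    """Return every combination of environment levels and agent type as a dict."""
    scenarios = []
    env_settings = product(LEVELS, repeat=len(ENV_FACTORS))
    for levels, agent_type in product(env_settings, AGENT_TYPES):
        scenario = dict(zip(ENV_FACTORS, levels))
        scenario["agent_type"] = agent_type
        scenarios.append(scenario)
    return scenarios
```

Under the full-factorial assumption this yields 3 × 3 × 3 environment settings for each of the 4 agent types, that is, 108 scenarios; a smaller fractional design would be an equally valid reading of the list.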
ENVIRONMENT: L/M/H Reward Entropy, L/M/H Noise in Hints, L/M/H Speed of Restlessness
AGENTS: Individual, Social without communication, Social with noisy communication, Social with secure communication
POPULATIONS: L/M/H Fitness vs. Reward curves, L/M/H Average lifespan vs. Environment Change
Discussion of Results
....
Conclusions
...
Future work and application areas
...
REFERENCES