Reinforcement Learning:
Use a camera connected to the laptop. The laptop can run OpenCV, and potentially even RL (for parameter selection).
The laptop communicates with the Arduino via serial communication (it can also get sensor values from the Arduino as further input for the RL).
Serial: use a Python library and an Arduino library on each side respectively (see the sketch below).
Lighter RL algorithms might work online (e.g. Thompson sampling); otherwise a wireless connection & the laptop could help? The flight computer has computational power anyway.
Cost function of efficiency & speed?
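A minimal sketch of the laptop side of the serial link, assuming pyserial is used; the port name, baud rate and message format are placeholders, not decisions:

```python
# Laptop side of the serial link (sketch, assuming pyserial).
# Port name, baud rate and the comma-separated message format are assumptions.
import serial

def main():
    # Open the serial connection to the Arduino (port and baud rate are placeholders)
    with serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1.0) as link:
        # Send a parameter update, e.g. flapping frequency and amplitude (hypothetical format)
        link.write(b"SET,freq=2.5,amp=30\n")

        # Read one line of sensor values back (e.g. current draw, IMU data)
        raw = link.readline().decode("ascii", errors="ignore").strip()
        if raw:
            values = raw.split(",")
            print("sensor values from Arduino:", values)

if __name__ == "__main__":
    main()
```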
General:
Continuous state space: all changeable parameters + other parameters (speed, environmental conditions, ...).
Continuous action space: all changes of those parameters. (A rough environment sketch follows after these points.)
Some research based on this video: robot servos have continuous output space
Parameters to tune (output): pitch waveform (sine or something else?), plus independent parameters for turning.
Reward (input), choose one: efficiency (thrust / consumption).
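A rough sketch of how the continuous state/action spaces, the tunable parameters and the efficiency reward could be wrapped as a Gymnasium environment. The number of parameters, the bounds, and the measurement model are assumptions for illustration only; on the real system the thrust and consumption would come from the Arduino sensors.

```python
# Sketch of the continuous state/action spaces as a Gymnasium environment.
# Dimensions, bounds and the placeholder measurement model are assumptions.
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class FlappingParamEnv(gym.Env):
    """State = current parameter set plus measured quantities; action = change of each parameter."""

    def __init__(self):
        # Assumption: 3 tunable parameters (e.g. frequency, amplitude, pitch shape)
        # plus 2 measured quantities (e.g. speed, current draw), all normalized to [-1, 1]
        self.observation_space = spaces.Box(low=-1.0, high=1.0, shape=(5,), dtype=np.float32)
        # Action = small normalized change applied to each tunable parameter
        self.action_space = spaces.Box(low=-0.1, high=0.1, shape=(3,), dtype=np.float32)
        self.params = np.zeros(3, dtype=np.float32)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.params = np.zeros(3, dtype=np.float32)
        return self._observe(), {}

    def step(self, action):
        self.params = np.clip(self.params + action, -1.0, 1.0)
        # On the real system these would be measured via the Arduino sensors
        thrust, consumption = self._measure()
        reward = float(thrust / consumption)  # efficiency reward from the notes
        return self._observe(), reward, False, False, {}

    def _measure(self):
        # Placeholder measurement model, only so the sketch runs standalone
        thrust = max(1.0 - float(np.sum((self.params - 0.2) ** 2)), 0.01)
        consumption = 1.0 + float(np.sum(self.params ** 2))
        return thrust, consumption

    def _observe(self):
        measured = np.zeros(2, dtype=np.float32)  # placeholder for speed, current, ...
        return np.concatenate([self.params, measured]).astype(np.float32)
```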
Policy gradient methods: algorithms like Proximal Policy Optimization (PPO) or Trust Region Policy Optimization (TRPO) can handle continuous action spaces and iteratively refine policies.
Actor-critic algorithms: Soft Actor-Critic (SAC) combines the benefits of policy gradients and Q-learning.
Evolutionary strategies: if only small parameter adjustments are expected, these can explore efficiently using known initial estimates (unfeasible here, too slow!).
In summary: if computational resources and accurate simulation data are available, SAC provides high-quality results efficiently. PPO is a solid balance between implementation simplicity and quality. Evolutionary strategies are simpler but typically slower and less precise.
Overall, TD3 is an excellent choice if your problem benefits from reduced Q-value overestimation and you have sufficient computational resources. It balances complexity and performance well in continuous control tasks.
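A minimal sketch of training SAC (or TD3) with Stable-Baselines3, assuming that library is used; Pendulum-v1 stands in for the real flapping environment, which would be wrapped the same way:

```python
# Training sketch with Stable-Baselines3 (assumption: SB3 + Gymnasium are used).
# Pendulum-v1 is only a stand-in continuous-control task.
import gymnasium as gym
from stable_baselines3 import SAC  # swap for TD3 to try the TD3 option

env = gym.make("Pendulum-v1")
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=10_000)

# Run the learned policy for a few steps
obs, _ = env.reset()
for _ in range(100):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    if terminated or truncated:
        obs, _ = env.reset()
```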
Dheer's recommendations:
Put the actuating variables in the u-vector (limit them depending on what the system can do), use a state-space model, include disturbance variables.
Maybe a Riccati controller makes sense? Otherwise simply linear quadratic control?
A descent method (optimization) would also be sufficient → no RL necessary.
Important: normalize all quantities so that they are comparable! (See the LQR sketch below.)
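A small sketch of the Riccati / linear quadratic control idea, assuming a linear state-space model x' = Ax + Bu is available; the matrices below are placeholders, not an identified model of the actual system:

```python
# LQR sketch (assumption: a linear state-space model exists; matrices are placeholders).
import numpy as np
from scipy.linalg import solve_continuous_are

# Placeholder 2-state, 1-input system x' = Ax + Bu
A = np.array([[0.0, 1.0],
              [0.0, -0.5]])
B = np.array([[0.0],
              [1.0]])

# Weights: normalize states and inputs first so that these costs are comparable
Q = np.eye(2)          # state cost
R = np.array([[1.0]])  # input cost

# Solve the continuous-time algebraic Riccati equation and form the feedback gain
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.inv(R) @ B.T @ P

# Control law u = -K x (clip u to the actuator limits of the real system)
x = np.array([1.0, 0.0])
u = -K @ x
print("LQR gain K =", K, " control u =", u)
```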
Optimization algorithms:
Use a probabilistic method, because measurements might have errors (to avoid getting stuck in local minima): Bayesian optimization with gp_minimize.
Arguments: x0 and n_initial_points (basically the same role, x0 is user-selected, n_initial_points are random); the reward as func(params); dimensions (lower bound, upper bound, can include a prior); initial point generator (set it close to the starting point!); a list of input points (vector); verbosity (True recommended for long runs, gives data output on every step). See the sketch below.
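A sketch of this with scikit-optimize's gp_minimize; the objective below is a stand-in for the measured (negated) reward, and the bounds and starting point are placeholders:

```python
# Bayesian optimization sketch with skopt.gp_minimize.
# Objective, bounds and starting point are assumptions for illustration.
from skopt import gp_minimize

def objective(params):
    # In the real setup: send params to the Arduino, measure thrust and consumption,
    # and return the negative efficiency (gp_minimize minimizes).
    freq, amp = params
    return (freq - 2.0) ** 2 + (amp - 25.0) ** 2  # placeholder objective

result = gp_minimize(
    objective,
    dimensions=[(0.5, 5.0),    # lower/upper bound of e.g. flapping frequency
                (5.0, 45.0)],  # lower/upper bound of e.g. amplitude
    x0=[2.5, 30.0],            # user-selected starting point (prior knowledge)
    n_initial_points=5,        # additional random exploration points
    n_calls=30,                # total number of evaluations
    verbose=True,              # data output on every step, useful for long runs
)
print("best parameters:", result.x, "best value:", result.fun)
```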