Policy Gradient Methods: Algorithms like Proximal Policy Optimization (PPO) or Trust Region Policy Optimization (TRPO) handle continuous action spaces and iteratively refine the policy.
Actor-Critic Algorithms: Soft Actor-Critic (SAC) combines the benefits of policy gradients and Q-learning.
Evolutionary Strategies: If only small parameter adjustments are expected, these can explore efficiently from known initial estimates. - infeasible here, too slow!
In summary, if computational resources and accurate simulation data are available, SAC provides high-quality results efficiently. PPO offers a solid balance between implementation simplicity and quality. Evolutionary strategies are simpler but typically slower and less precise.
Overall, TD3 is an excellent choice if your problem benefits from reduced Q-value overestimation and you have sufficient computational resources. It balances complexity and performance well in continuous control tasks.
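As a rough illustration of how one of these algorithms could be applied, here is a minimal sketch assuming the stable-baselines3 library and a Gymnasium continuous-control environment (Pendulum-v1 is only a stand-in for the actual plant model; the environment choice, hyperparameters, and timestep budget are assumptions, not part of these notes):

```python
# Minimal sketch: training SAC (or PPO/TD3) on a continuous-control task.
# Assumes stable-baselines3 and gymnasium are installed; Pendulum-v1 is only
# a placeholder for the real plant/simulation environment.
import gymnasium as gym
from stable_baselines3 import SAC, PPO, TD3

env = gym.make("Pendulum-v1")            # continuous action space

# SAC: off-policy actor-critic, usually sample-efficient
model = SAC("MlpPolicy", env, verbose=1)
model.learn(total_timesteps=50_000)

# Evaluate the trained policy for one episode
obs, _ = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, _ = env.step(action)
    done = terminated or truncated
```

Swapping `SAC` for `PPO` or `TD3` in the sketch is enough to compare the three candidates on the same environment.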
Dheer's recommendations:
Put the manipulated variables into the u-vector (bound them according to what the system can do), use a state-space representation, include disturbance variables.
Maybe a Riccati controller makes sense? Otherwise simply linear quadratic control (LQR)? (see the sketch after this list)
A descent method (optimization) is also sufficient → no RL necessary.
Important: normalize all quantities so that they are comparable!
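A minimal sketch of the Riccati/LQR idea, assuming a linear state-space model x' = Ax + Bu is available; the matrices and the actuator limit below are placeholders, and clipping the control reflects the note about bounding the u-vector:

```python
# Sketch: LQR gain from the continuous-time algebraic Riccati equation.
# A, B, Q, R are placeholder values; in practice they come from the
# (normalized) state-space model and the chosen cost weighting.
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0],
              [0.0, -0.5]])       # state matrix (placeholder)
B = np.array([[0.0],
              [1.0]])             # input matrix (placeholder)
Q = np.diag([1.0, 0.1])           # state weighting
R = np.array([[0.01]])            # input weighting

P = solve_continuous_are(A, B, Q, R)      # solve the Riccati equation
K = np.linalg.solve(R, B.T @ P)           # LQR gain: u = -K x

u_max = 1.0                               # actuator limit (assumed)
def control(x):
    u = -K @ x
    return np.clip(u, -u_max, u_max)      # bound the u-vector
```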
Optimization algorithms:
Use a probabilistic method, because measurements might contain errors (to avoid getting stuck in local minima).
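As one possible realization of such a probabilistic method, a sketch using scipy's differential evolution (a stochastic global optimizer); the cost function, the noise term, and the three parameters are placeholders, with parameters normalized to [0, 1] as recommended above:

```python
# Sketch: stochastic global optimization with differential evolution.
# The cost function below is a placeholder for the real, measurement-based
# (and therefore noisy) objective; parameters are normalized to [0, 1].
import numpy as np
from scipy.optimize import differential_evolution

def cost(p_normalized):
    # Placeholder objective: replace with the measurement-based cost.
    return np.sum((p_normalized - 0.3) ** 2) + 0.01 * np.random.randn()

bounds = [(0.0, 1.0)] * 3          # three normalized parameters (assumed)

result = differential_evolution(cost, bounds, seed=0, maxiter=200,
                                polish=False, tol=1e-6)
print(result.x, result.fun)
```

`polish=False` skips the final local (gradient-based) refinement, which is usually preferable when the objective is noisy.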