Code Files
The Network
A Q-network returns a q-value, just as a q-function does. It returns the q-value, not the action to take. Another name for a Q-network is DQN (Deep Q-Network), but it is still just a Q-network; the "deep" simply means the network has multiple layers.
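As a minimal sketch, a q-network can be a small fully connected model that takes the state vector and outputs one q-value per action. The use of Keras, the layer sizes, and the optimizer settings here are illustrative assumptions, not fixed choices:

```python
from tensorflow import keras

def build_q_network(state_size, num_actions):
    # State vector in, one q-value per action out.
    model = keras.Sequential([
        keras.layers.Dense(64, activation="relu", input_shape=(state_size,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(num_actions, activation="linear"),
    ])
    # Mean squared error: train predicted q-values toward target q-values.
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")
    return model
```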
Q-learning on Q-network
It is based on the same q-value update formula as with a q-table:
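That tabular update, with learning rate α and discount factor γ, is:

$$Q(s, a) \leftarrow Q(s, a) + \alpha \left[\, r + \gamma \max_{a'} Q(s', a') - Q(s, a) \,\right]$$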
For each update:
Feed the state to the current q-network to get the current q-value, compute the new q-value with the update formula, then train the q-network toward that new q-value.
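A rough sketch of one such update, assuming the Keras model above and states given as NumPy vectors (alpha and gamma are placeholder values):

```python
import numpy as np

def train_step(model, state, action, reward, next_state, done,
               alpha=1.0, gamma=0.99):
    # Feed the state to the current q-network to get the current q-values.
    q_values = model.predict(state[np.newaxis], verbose=0)[0]
    # Best q-value the network predicts for the next state (0 if the episode ended).
    next_best = 0.0 if done else np.max(model.predict(next_state[np.newaxis], verbose=0)[0])
    # Same update formula as with a q-table, applied to the taken action only.
    q_values[action] += alpha * (reward + gamma * next_best - q_values[action])
    # Train the network toward the new q-value.
    model.fit(state[np.newaxis], q_values[np.newaxis], epochs=1, verbose=0)
```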
When to Train
Unlike a q-table, where the q-value is updated after every action, updating a q-network per step is costly. The q-network takes much less RAM than a table, but the call to get its output is slow compared to the constant-time lookup of a q-value in a table, and the fit (training) call is extremely slow compared to setting a q-value in a q-table.
There are two options for when to train the q-network:
• Train after an action
• Train after a run through the whole episode
Training after every action, as with a q-table, generally shouldn't be the choice: it makes training very slow unless you have enough hardware resources. Training once after a full run through the episode is better and faster.
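Continuing the sketch above, a run that trains once per episode could look like the following. The gym-style env, the epsilon-greedy exploration, and all constants are assumptions; env.reset() is taken to return the state and env.step() to return (next_state, reward, done, info):

```python
import numpy as np

def run_episode(model, env, alpha=1.0, gamma=0.99, epsilon=0.1):
    # Collect one (state, target q-values) pair per step, then fit once at the end.
    states, targets = [], []
    state, done = env.reset(), False
    while not done:
        q_values = model.predict(state[np.newaxis], verbose=0)[0]
        # Epsilon-greedy action choice.
        if np.random.rand() < epsilon:
            action = np.random.randint(len(q_values))
        else:
            action = int(np.argmax(q_values))
        next_state, reward, done, _ = env.step(action)
        next_best = 0.0 if done else np.max(model.predict(next_state[np.newaxis], verbose=0)[0])
        q_values[action] += alpha * (reward + gamma * next_best - q_values[action])
        states.append(state)
        targets.append(q_values)
        state = next_state
    # One fit call per episode instead of one per action.
    model.fit(np.array(states), np.array(targets), epochs=1, verbose=0)
```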