Optimal Control and Reinforcement Learning

Location: Room 6.020, Wegelerstr. 6

Time: Tuesday, 10:15 - 11:45 and Thursday, 8:30 - 10:00

Exercise: Thursday, 10:15 - 11:45

Office hours: by appointment, e-mail garcke@ins.uni-bonn.de

- Theory and Numerics for Hamilton-Jacobi-Bellman Equations
The first part of the lecture concerns semi-Lagrangian approximation schemes for first-order PDEs, with a special focus on Hamilton-Jacobi equations, reviewing their construction and theory on model equations. The analysis of Hamilton-Jacobi equations requires the analytical tool of viscosity solutions, which we will introduce at the beginning. One of the most typical applications of the theory of HJ equations is the field of optimal control problems and differential games. Via the Dynamic Programming Principle (DPP), many optimal control problems can be characterized by means of the associated value function, which in turn can be shown to be the unique viscosity solution of a PDE of convex HJ type, usually called the Bellman equation, the Dynamic Programming equation, or the Hamilton-Jacobi-Bellman equation.
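As an illustration, consider the model case of an infinite-horizon discounted control problem (a standard example; the concrete notation below is our own, not fixed by the lecture):

```latex
% Dynamics: y'(s) = f(y(s), a(s)), y(0) = x, with controls a(s) taking
% values in a compact set A; running cost \ell, discount rate \lambda > 0.
v(x) = \inf_{a(\cdot)} \int_0^\infty e^{-\lambda s}\,
       \ell\bigl(y(s), a(s)\bigr)\, ds .
% The Dynamic Programming Principle states that for every t > 0
v(x) = \inf_{a(\cdot)} \Bigl\{ \int_0^t e^{-\lambda s}\,
       \ell\bigl(y(s), a(s)\bigr)\, ds
       + e^{-\lambda t}\, v\bigl(y(t)\bigr) \Bigr\},
% and letting t \to 0 formally yields the Bellman (HJB) equation
\lambda\, v(x) + \sup_{a \in A}
  \bigl\{ -f(x,a) \cdot Dv(x) - \ell(x,a) \bigr\} = 0 .
```

The value function is in general not differentiable, which is exactly where the viscosity solution framework enters.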

At the numerical level, the semi-Lagrangian approximation mimics the method of characteristics: it looks for the foot of the characteristic curve passing through every node and follows this curve for a single time step. To derive a numerical method from this general idea, several ingredients have to be combined, mainly an ODE technique to track the characteristics and a reconstruction technique to recover pointwise values of the numerical solution.
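A minimal numerical sketch of one such scheme, for a one-dimensional discounted control problem with linear interpolation as the reconstruction step (the model problem and all names below are our own illustration, not taken from the lecture):

```python
import numpy as np

# Semi-Lagrangian sketch for the 1D model problem (our own illustration):
#   dynamics y' = a with controls a in {-1, 0, +1},
#   running cost ell(x) = x^2, discount rate lam = 1.
# One SL step follows the characteristic from each node x_i to the foot
# point x_i + dt*a and recovers v there by linear interpolation (np.interp).
x = np.linspace(-2.0, 2.0, 201)       # grid nodes
dt, lam = 0.05, 1.0
controls = np.array([-1.0, 0.0, 1.0])
v = np.zeros_like(x)                  # initial guess

for _ in range(500):                  # fixed-point (value) iteration
    candidates = [
        dt * x**2 + np.exp(-lam * dt) * np.interp(x + dt * a, x, v)
        for a in controls
    ]
    v_new = np.minimum.reduce(candidates)   # minimize over the controls
    if np.max(np.abs(v_new - v)) < 1e-10:   # stop at the fixed point
        v = v_new
        break
    v = v_new

print(round(v[100], 4))  # value at x = 0: the optimal strategy stays put, so 0.0
```

The iteration is a contraction with factor e^(-lam*dt), so it converges to the unique discrete value function; the interpolation plays the role of the reconstruction technique mentioned above.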

- Reinforcement Learning
In the reinforcement learning setting, we consider a system interacting with an a priori (at least partially) unknown environment, which learns "from experience", i.e. the underlying first-order PDE is not perfectly known, but its effects have to be approximated during learning. Reinforcement learning is, in its basic form, very general; it is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. We will address RL from the viewpoint of HJB equations, semi-Lagrangian schemes, and function approximation, including Deep Learning approaches if time allows.
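As a concrete instance of learning from sampled transitions only, here is a minimal tabular Q-learning sketch (the chain environment and all names below are our own illustration, not part of the lecture):

```python
import random

# Tabular Q-learning on a toy chain (our own illustration): 5 states on a
# line, actions move left/right, reaching the rightmost state gives reward
# 1 and ends the episode.  The transition rule is only *sampled* via
# step(), never used directly, mirroring learning "from experience".
N_STATES = 5
ACTIONS = (-1, +1)
alpha, gamma, eps = 0.5, 0.9, 0.2     # learning rate, discount, exploration

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(s, a):
    """Environment: deterministic move, clamped to the state range."""
    s2 = min(max(s + a, 0), N_STATES - 1)
    done = (s2 == N_STATES - 1)
    return s2, (1.0 if done else 0.0), done

random.seed(0)
for _ in range(500):                   # episodes
    s, done = 0, False
    while not done:
        if random.random() < eps:      # epsilon-greedy exploration
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda b: Q[(s, b)])
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (target - Q[(s, a)])
        s = s2

greedy = [max(ACTIONS, key=lambda b: Q[(s, b)]) for s in range(N_STATES - 1)]
print(greedy)   # greedy policy: should move right (+1) in every non-goal state
```

The update is a stochastic, sampled version of the fixed-point iteration behind the Bellman equation, which is the connection to the HJB viewpoint taken in the lecture.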

The homework has to be handed in at the beginning of the following exercise session. Solutions are not handed in by each student separately; instead, you form groups of two students, which hand in one written report together. Each student needs to present solutions in the exercise session at least once.

Nr | Link | Due | Remarks and errata |

1 | sheet_1.pdf | 27.04.2017 | |

2 | sheet_2.pdf | 04.05.2017 | |

3 | sheet_3_2.pdf | 11.05.2017 | Exercise 1: assume additionally the (uniform) continuity of the Hamiltonian H in the first and second variable. |

4 | sheet_4.pdf | 18.05.2017 | |

5 | sheet_5.pdf | 30.05.2017 | To be handed in on Tuesday, 30.05.2017, after the lecture. |

6 | sheet_6_2.pdf | 13.06.2017 | Exercise 1 updated. It may be resubmitted together with Sheet 7. |

7 | sheet_7.pdf | 20.06.2017 | |

8 | sheet_8.pdf | 27.06.2017 | |

9 | sheet_9.pdf | 04.07.2017 | |

10 | sheet_10.pdf | 13.07.2017 | Exercise 2 updated. Assumption concerning global extreme points added. |

11 | sheet_11.pdf | 20.07.2017 | Exercise 1 updated. |

- Falcone, M., & Ferretti, R., Semi-Lagrangian Approximation Schemes for Linear and Hamilton-Jacobi Equations, SIAM, 2014.
- Sutton, R. S., & Barto, A. G., Reinforcement Learning: An Introduction, MIT Press, 1998; draft of the second edition.
- Bertsekas, D. P., Dynamic Programming and Optimal Control, Vol. II, 4th Edition: Approximate Dynamic Programming, Athena Scientific, 2012.