Lecture SS 18 Numerical Simulation

Optimal Control and Reinforcement Learning

Prof. Jochen Garcke
Contact for exercises
Biagio Paparella
Room 6.020, Wegelerstr. 6
Lecture
Tuesday, 10:15 - 11:45
Thursday, 8:30 - 10:00

Exercise
Wednesday, 14:15 - 15:45
Room 5.002, Wegelerstr. 6
Office hours
On appointment

Dates and times of the oral exams (week of 30 July) are now scheduled.
They take place in my office, Wegelerstr. 6, Room 6.003.
The second exam period will be in the last week of August (preferably) or in the first week of September (the INS is moving during that week, so this is unlikely).

Content of the lecture

Theory and Numerics for Hamilton-Jacobi-Bellman Equations

The first part of the lecture concerns semi-Lagrangian approximation schemes for first-order PDEs, with a special focus on Hamilton-Jacobi equations, reviewing their construction and theory on model equations. The analysis of Hamilton-Jacobi equations requires the analytical tool of viscosity solutions, which we will introduce at the beginning. One of the most typical applications of the theory of HJ equations is the field of optimal control problems and differential games. Via the Dynamic Programming Principle (DPP), many optimal control problems can be characterized by means of the associated value function, which in turn can be shown to be the unique viscosity solution of a PDE of convex HJ type, usually called the Bellman equation, the dynamic programming equation, or the Hamilton-Jacobi-Bellman equation.
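For one standard model case, the infinite-horizon discounted problem, the value function and the resulting Bellman equation can be sketched as follows (the notation is a common textbook choice and may differ in detail from the lecture):

```latex
% Minimize the discounted cost over measurable controls a(.) taking
% values in a compact set A, subject to the controlled dynamics:
v(x) = \inf_{a(\cdot)} \int_0^{\infty} e^{-\lambda t}\, \ell\big(y(t), a(t)\big)\, dt,
\qquad \dot y(t) = f\big(y(t), a(t)\big), \quad y(0) = x.
% Via the DPP, v is characterized as the unique viscosity solution of
% the (convex) Hamilton-Jacobi-Bellman equation:
\lambda\, v(x) + \sup_{a \in A} \big\{ -f(x,a) \cdot Dv(x) - \ell(x,a) \big\} = 0.
```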

At the numerical level, the semi-Lagrangian approximation mimics the method of characteristics: it looks for the foot of the characteristic curve passing through every node and follows this curve for a single time step. To derive a numerical method from this general idea, several ingredients must be put together, mainly an ODE technique to track the characteristics and a reconstruction technique to recover pointwise values of the numerical solution.
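These two ingredients can be illustrated on the simplest model case, linear advection with constant speed, where the characteristic feet are known exactly and the reconstruction is a linear interpolation (a minimal NumPy sketch; the grid, speed, and step sizes are illustrative choices, not taken from the lecture):

```python
import numpy as np

def semi_lagrangian_advection(u0, c, dx, dt, n_steps):
    """Semi-Lagrangian scheme for u_t + c u_x = 0 on a periodic grid.

    At each step, the foot of the characteristic through node x_i is
    x_i - c*dt; the new nodal value is obtained by linear interpolation
    of the current solution at that foot.
    """
    n = len(u0)
    x = np.arange(n) * dx
    L = n * dx
    u = u0.copy()
    for _ in range(n_steps):
        feet = (x - c * dt) % L              # feet of characteristics (periodic)
        xp = np.append(x, L)                 # append endpoint for periodic wrap
        up = np.append(u, u[0])
        u = np.interp(feet, xp, up)          # reconstruction step
    return u

# Advect a Gaussian bump; note the Courant number 2.5 — unlike explicit
# Eulerian schemes, SL schemes remain stable for large time steps.
n, L, c = 200, 1.0, 1.0
dx = L / n
x = np.arange(n) * dx
u0 = np.exp(-100 * (x - 0.3) ** 2)
dt = 2.5 * dx / c
u = semi_lagrangian_advection(u0, c, dx, dt, 40)  # total shift: 40*2.5*dx = 0.5
```

The linear reconstruction introduces some numerical diffusion (the bump flattens slightly), which motivates the higher-order reconstructions discussed in the course.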

Reinforcement Learning

In the reinforcement learning setting, we consider a system in interaction with an a priori (at least partially) unknown environment, which learns "from experience", i.e. the underlying first-order PDE is not perfectly known, but its effects have to be approximated during learning. Reinforcement learning is in its basic form very general; it is studied in many other disciplines, such as game theory, control theory, operations research, information theory, simulation-based optimization, multi-agent systems, swarm intelligence, statistics, and genetic algorithms. We will address RL from the viewpoint of HJB equations, semi-Lagrangian schemes and function approximation, including, if time allows, deep learning approaches.
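A minimal sketch of this "learning from experience" idea is tabular Q-learning on a toy chain MDP (the environment and all parameters here are illustrative assumptions, not material from the lecture): the value function is improved from sampled transitions only, without ever writing down the underlying model.

```python
import numpy as np

def q_learning_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9,
                     eps=0.1, seed=0):
    """Tabular Q-learning on a deterministic toy chain MDP.

    States 0..n_states-1; actions: 0 = left, 1 = right.  Reward 1 is
    received only on reaching the rightmost state, which is terminal.
    The agent explores epsilon-greedily and updates Q from observed
    transitions only -- it never sees the transition model itself.
    """
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, 2))
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action choice
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
            s_next = max(s - 1, 0) if a == 0 else s + 1
            r = 1.0 if s_next == goal else 0.0
            # bootstrap target; no future value from the terminal state
            target = r + gamma * np.max(Q[s_next]) * (s_next != goal)
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

Q = q_learning_chain()
# after training, the greedy policy moves right in every non-terminal state
```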


The content of the two lectures on Numerische Mathematik from the second year of the bachelor studies is expected. In particular, knowledge of (nonlinear) optimization and of numerical methods for ODEs is recommended; the (German) lecture notes from the course in 2015 are available on request. Furthermore, (Lagrange) interpolation is expected, and familiarity with function discretization by finite elements is helpful, although for HJB equations one cannot use the mathematical ideas from the field of numerical solution of PDEs (e.g. Sobolev spaces or Galerkin methods do not play a role here). Parts of the prerequisites will be refreshed in the exercises, which must be solved in groups of at most two people. In the second half we may do some numerical exercises/experiments for reinforcement learning using existing Python-based frameworks.
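As a reminder of the interpolation prerequisite, here is a small sketch of Lagrange interpolation in plain Python (the node and test-function choices are illustrative):

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate the Lagrange interpolation polynomial through (xs, ys) at x.

    Builds each Lagrange basis polynomial L_i(x) = prod_{j != i}
    (x - x_j) / (x_i - x_j) and sums y_i * L_i(x).
    """
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total

# A degree-2 polynomial is reproduced exactly from 3 nodes:
xs = [0.0, 1.0, 2.0]
ys = [t ** 2 for t in xs]                  # samples of f(x) = x^2
val = lagrange_interpolate(xs, ys, 1.5)    # → 2.25
```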


The oral exams will be in the week of 30 July. Collecting at least 50% of the exercise points is required.

Selected Literature:

  • Falcone, M., & Ferretti, R. Semi-Lagrangian Approximation Schemes for Linear and Hamilton-Jacobi Equations, SIAM, 2014.
  • Sutton, R., & Barto, A. Reinforcement Learning: An Introduction, MIT Press, 1998; draft of the second edition.
  • Bertsekas, D. Dynamic Programming and Optimal Control, Vol. II: Approximate Dynamic Programming, 4th Edition, Athena Scientific, 2012.