Optimal Control and Reinforcement Learning

(WIP)

This post sketches the structure and main features of two related fields: Optimal Control and Reinforcement Learning.

Optimal Control is the older field. It originated in the Calculus of Variations and classical Control Theory, and it solves the problem of minimizing an objective function while controlling a system (or agent). The problem is fully defined by two objects:

  1. The system dynamics equation (or system model): this equation describes how the system to be controlled evolves. The standard notation for the discrete-time version is \(x_{t+1} = f(x_t, u_t)\), where \(x_t\) is the state and \(u_t\) the control input at time \(t\).

  2. A cost function \(J = \sum_{t=1}^{N} L(x_t, u_t)\) to be minimized (see the sketch right after this list).
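
To make the two objects concrete, here is a minimal sketch in Python. The specific dynamics \(f\), stage cost \(L\), and horizon are assumptions chosen purely for illustration: a hypothetical one-dimensional point mass whose state is position and velocity and whose control is an applied force.

```python
import numpy as np

def f(x, u, dt=0.1):
    """System dynamics: x_{t+1} = f(x_t, u_t) for a 1-D point mass."""
    pos, vel = x
    return np.array([pos + dt * vel, vel + dt * u])

def L(x, u):
    """Stage cost: penalize distance from the origin and control effort."""
    return np.dot(x, x) + 0.1 * u**2

def total_cost(x0, controls):
    """J = sum_{t=1}^{N} L(x_t, u_t) for a given control sequence."""
    x, J = x0, 0.0
    for u in controls:
        J += L(x, u)
        x = f(x, u)
    return J

x0 = np.array([1.0, 0.0])
print(total_cost(x0, controls=np.zeros(50)))   # cost of applying no control at all
```

The optimal control problem is then to choose the control sequence that minimizes `total_cost` subject to the dynamics.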

The most well-studied optimal control problem is the Linear Quadratic Regulator (LQR), which assumes linear dynamics and a quadratic cost. This is all well and good when both functions are known. However, things get complicated as we attack problems in which we know less and less about these functions, or in which the simplifying assumptions no longer hold.
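
As a concrete example, the finite-horizon LQR problem can be solved exactly by the standard backward Riccati recursion. The sketch below computes the time-varying feedback gains \(K_t\) such that \(u_t = -K_t x_t\) is optimal; the double-integrator system and cost weights are hypothetical values chosen only for illustration.

```python
import numpy as np

def lqr_gains(A, B, Q, R, Qf, N):
    """Finite-horizon LQR via backward Riccati recursion.
    Returns the gains K_t such that u_t = -K_t x_t is optimal."""
    P = Qf
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]   # gains[t] is the gain to apply at time t

# Hypothetical double integrator (position/velocity), time step dt = 0.1.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2); R = np.array([[0.1]]); Qf = 10 * np.eye(2)

gains = lqr_gains(A, B, Q, R, Qf, N=50)
x = np.array([[1.0], [0.0]])
for K in gains:
    u = -K @ x
    x = A @ x + B @ u
print(x.ravel())   # the state is driven close to the origin
```

Note that the recursion uses the model matrices \(A\) and \(B\) and the cost matrices \(Q\) and \(R\) explicitly: the solution is only available because the problem structure is fully known.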

Reinforcement Learning is a loose term that identifies a set of techniques that can attack Optimal Control problems in which we know very little about the problem structure. In particular, RL assumes only that the state is Markov (which is not a very restrictive assumption) and that the agent receives a reward or cost upon executing an action. What makes RL algorithms powerful is that neither the cost function nor the system dynamics (the transition function) needs to be known. Knowing them would certainly make the problem easier, but then we would be back in the domain of Optimal Control.
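
To illustrate what "no model required" means, here is a minimal sketch of tabular Q-learning on a hypothetical five-state corridor. The learner only ever observes sampled transitions (state, action, reward, next state); the `step` function plays the role of the unknown environment and is never inspected by the algorithm.

```python
import numpy as np

n_states, n_actions = 5, 2           # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """Hidden environment: the goal is the rightmost state."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == n_states - 1 else 0.0
    return reward, s_next, s_next == n_states - 1

def greedy(q_row):
    """Argmax with random tie-breaking (matters while Q is still all zeros)."""
    return int(rng.choice(np.flatnonzero(q_row == q_row.max())))

for episode in range(500):
    s = 0
    for _ in range(100):                      # cap the episode length
        a = rng.integers(n_actions) if rng.random() < eps else greedy(Q[s])
        r, s_next, done = step(s, a)
        # Q-learning update, built from the sampled transition alone
        Q[s, a] += alpha * (r + gamma * (0.0 if done else Q[s_next].max()) - Q[s, a])
        s = s_next
        if done:
            break

print(np.argmax(Q, axis=1))   # greedy policy: move right in every non-terminal state
```

The update rule never evaluates \(f\) or \(L\) directly; it relies only on the rewards and next states the environment happens to emit, which is exactly the setting where classical Optimal Control machinery no longer applies.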
