.. image:: https://travis-ci.org/anassinator/ilqr.svg?branch=master :target: https://travis-ci.org/anassinator/ilqr

This is an implementation of the Iterative Linear Quadratic Regulator (iLQR)
for non-linear trajectory optimization based on Yuval Tassa's
`paper <https://homes.cs.washington.edu/~todorov/papers/TassaIROS12.pdf>`

_.

It is compatible with both Python 2 and 3 and has built-in support for
auto-differentiating both the dynamics model and the cost function using
`Theano <http://deeplearning.net/software/theano/>`

_.

To install, clone and run:

.. code-block:: bash

python setup.py install

You may also install the dependencies with `pipenv`

as follows:

.. code-block:: bash

pipenv install

After installing, :code:`import`

as follows:

.. code-block:: python

from ilqr import iLQR

You can see the `examples <examples/>`

_ directory for
`Jupyter <https://jupyter.org>`

_ notebooks to see how common control problems
can be solved through iLQR.

Dynamics model ^^^^^^^^^^^^^^

You can set up your own dynamics model by either extending the :code:`Dynamics`

class and hard-coding it and its partial derivatives. Alternatively, you can
write it up as a `Theano`

expression and use the :code:`AutoDiffDynamics`

class
for it to be auto-differentiated. Finally, if all you have is a function, you
can use the :code:`FiniteDiffDynamics`

class to approximate the derivatives
with finite difference approximation.

This section demonstrates how to implement the following dynamics model:

.. math::

m \dot{v} = F - \alpha v

where :math:`m`

is the object's mass in :math:`kg`

, :math:`alpha`

is the
friction coefficient, :math:`v`

is the object's velocity in :math:`m/s`

,
:math:`\dot{v}`

is the object's acceleration in :math:`m/s^2`

, and :math:`F`

is
the control (or force) you're applying to the object in :math:`N`

.

Automatic differentiation """""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.dynamics import AutoDiffDynamics

x = T.dscalar("x") # Position. x_dot = T.dscalar("x_dot") # Velocity. F = T.dscalar("F") # Force.

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m

f = T.stack([ x + x_dot * dt, x_dot + x_dot_dot * dt, ])

x_inputs = [x, x_dot] # State vector. u_inputs = [F] # Control vector.

dynamics = AutoDiffDynamics(f, x_inputs, u_inputs)

*Note*: If you want to be able to use the Hessians (:code:`f_xx`

, :code:`f_ux`

,
and :code:`f_uu`

), you need to pass the :code:`hessians=True`

argument to the
constructor. This will increase compilation time. Note that :code:`iLQR`

does
not require second-order derivatives to function.

Batch automatic differentiation """""""""""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.dynamics import BatchAutoDiffDynamics

state_size = 2 # [position, velocity] action_size = 1 # [force]

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

def f(x, u, i): """Batched implementation of the dynamics model.

```
Args:
x: State vector [*, state_size].
u: Control vector [*, action_size].
i: Current time step [*, 1].
Returns:
Next state vector [*, state_size].
"""
x_ = x[..., 0]
x_dot = x[..., 1]
F = u[..., 0]
# Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m
# Discrete dynamics model definition.
return T.stack([
x_ + x_dot * dt,
x_dot + x_dot_dot * dt,
]).T
```

dynamics = BatchAutoDiffDynamics(f, state_size, action_size)

*Note*: This is a faster version of :code:`AutoDiffDynamics`

that doesn't
support Hessians.

Finite difference approximation """""""""""""""""""""""""""""""

.. code-block:: python

from ilqr.dynamics import FiniteDiffDynamics

state_size = 2 # [position, velocity] action_size = 1 # [force]

dt = 0.01 # Discrete time-step in seconds. m = 1.0 # Mass in kg. alpha = 0.1 # Friction coefficient.

def f(x, u, i): """Dynamics model function.

```
Args:
x: State vector [state_size].
u: Control vector [action_size].
i: Current time step.
Returns:
Next state vector [state_size].
"""
[x, x_dot] = x
[F] = u
# Acceleration.
x_dot_dot = x_dot * (1 - alpha * dt / m) + F * dt / m
return np.array([
x + x_dot * dt,
x_dot + x_dot_dot * dt,
])
```

dynamics = FiniteDiffDynamics(f, state_size, action_size)

*Note*: It is possible you might need to play with the epsilon values
(:code:`x_eps`

and :code:`u_eps`

) used when computing the approximation if you
run into numerical instability issues.

Usage """""

Regardless of the method used for constructing your dynamics model, you can use them as follows:

.. code-block:: python

curr_x = np.array([1.0, 2.0]) curr_u = np.array([0.0]) i = 0 # This dynamics model is not time-varying, so this doesn't matter.

dynamics.f(curr_x, curr_u, i) ... array([ 1.02 , 2.01998]) dynamics.f_x(curr_x, curr_u, i) ... array([[ 1. , 0.01 ], [ 0. , 1.00999]]) dynamics.f_u(curr_x, curr_u, i) ... array([[ 0. ], [ 0.0001]])

Comparing the output of the :code:`AutoDiffDynamics`

and the
:code:`FiniteDiffDynamics`

models should generally yield consistent results,
but the auto-differentiated method will always be more accurate. Generally, the
finite difference approximation will be faster unless you're also computing the
Hessians: in which case, Theano's compiled derivatives are more optimized.

Cost function ^^^^^^^^^^^^^

Similarly, you can set up your own cost function by either extending the
:code:`Cost`

class and hard-coding it and its partial derivatives.
Alternatively, you can write it up as a `Theano`

expression and use the
:code:`AutoDiffCost`

class for it to be auto-differentiated. Finally, if all
you have are a loss functions, you can use the :code:`FiniteDiffCost`

class to
approximate the derivatives with finite difference approximation.

The most common cost function is the quadratic format used by Linear Quadratic Regulators:

.. math::

(x - x_{goal})^T Q (x - x_{goal}) + (u - u_{goal})^T R (u - u_{goal})

where :math:`Q`

and :math:`R`

are matrices defining your quadratic state error
and quadratic control errors and :math:`x_{goal}`

is your target state. For
convenience, an implementation of this cost function is made available as the
:code:`QRCost`

class.

:code:`QRCost`

class
""""""""""""""""""""

.. code-block:: python

import numpy as np from ilqr.cost import QRCost

state_size = 2 # [position, velocity] action_size = 1 # [force]

Q = 100 * np.eye(state_size) R = 0.01 * np.eye(action_size)

Q_terminal = np.array([[100.0, 0.0], [0.0, 0.1]])

x_goal = np.array([1.0, 0.0])

cost = QRCost(Q, R, Q_terminal=Q_terminal, x_goal=x_goal)

Automatic differentiation """""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.cost import AutoDiffCost

x_inputs = [T.dscalar("x"), T.dscalar("x_dot")] u_inputs = [T.dscalar("F")]

x = T.stack(x_inputs) u = T.stack(u_inputs)

x_diff = x - x_goal l = x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u) l_terminal = x_diff.T.dot(Q_terminal).dot(x_diff)

cost = AutoDiffCost(l, l_terminal, x_inputs, u_inputs)

Batch automatic differentiation """""""""""""""""""""""""""""""

.. code-block:: python

import theano.tensor as T from ilqr.cost import BatchAutoDiffCost

def cost_function(x, u, i, terminal): """Batched implementation of the quadratic cost function.

```
Args:
x: State vector [*, state_size].
u: Control vector [*, action_size].
i: Current time step [*, 1].
terminal: Whether to compute the terminal cost.
Returns:
Instantaneous cost [*].
"""
Q_ = Q_terminal if terminal else Q
l = x.dot(Q_).dot(x.T)
if l.ndim == 2:
l = T.diag(l)
if not terminal:
l_u = u.dot(R).dot(u.T)
if l_u.ndim == 2:
l_u = T.diag(l_u)
l += l_u
return l
```

cost = BatchAutoDiffCost(cost_function, state_size, action_size)

Finite difference approximation """""""""""""""""""""""""""""""

.. code-block:: python

from ilqr.cost import FiniteDiffCost

def l(x, u, i): """Instantaneous cost function.

```
Args:
x: State vector [state_size].
u: Control vector [action_size].
i: Current time step.
Returns:
Instantaneous cost [scalar].
"""
x_diff = x - x_goal
return x_diff.T.dot(Q).dot(x_diff) + u.T.dot(R).dot(u)
```

def l_terminal(x, i): """Terminal cost function.

```
Args:
x: State vector [state_size].
i: Current time step.
Returns:
Terminal cost [scalar].
"""
x_diff = x - x_goal
return x_diff.T.dot(Q_terminal).dot(x_diff)
```

cost = FiniteDiffCost(l, l_terminal, state_size, action_size)

*Note*: It is possible you might need to play with the epsilon values
(:code:`x_eps`

and :code:`u_eps`

) used when computing the approximation if you
run into numerical instability issues.

Usage """""

Regardless of the method used for constructing your cost function, you can use them as follows:

.. code-block:: python

cost.l(curr_x, curr_u, i) ... 400.0 cost.l_x(curr_x, curr_u, i) ... array([ 0., 400.]) cost.l_u(curr_x, curr_u, i) ... array([ 0.]) cost.l_xx(curr_x, curr_u, i) ... array([[ 200., 0.], [ 0., 200.]]) cost.l_ux(curr_x, curr_u, i) ... array([[ 0., 0.]]) cost.l_uu(curr_x, curr_u, i) ... array([[ 0.02]])

Putting it all together ^^^^^^^^^^^^^^^^^^^^^^^

.. code-block:: python

N = 1000 # Number of time-steps in trajectory. x0 = np.array([0.0, -0.1]) # Initial state. us_init = np.random.uniform(-1, 1, (N, 1)) # Random initial action path.

ilqr = iLQR(dynamics, cost, N) xs, us = ilqr.fit(x0, us_init)

:code:`xs`

and :code:`us`

now hold the optimal state and control trajectory
that reaches the desired goal state with minimum cost.

Finally, a :code:`RecedingHorizonController`

is also bundled with this package
to use the :code:`iLQR`

controller in Model Predictive Control.

Important notes ^^^^^^^^^^^^^^^

To quote from Tassa's paper: "Two important parameters which have a direct
impact on performance are the simulation time-step :code:`dt`

and the horizon
length :code:`N`

. Since speed is of the essence, the goal is to choose those
values which minimize the number of steps in the trajectory, i.e. the largest
possible time-step and the shortest possible horizon. The size of :code:`dt`

is limited by our use of Euler integration; beyond some value the simulation
becomes unstable. The minimum length of the horizon :code:`N`

is a
problem-dependent quantity which must be found by trial-and-error."

Contributions are welcome. Simply open an issue or pull request on the matter.

We use `YAPF <https://github.com/google/yapf>`

_ for all Python formatting
needs. You can auto-format your changes with the following command:

.. code-block:: bash

yapf --recursive --in-place --parallel .

You may install the linter as follows:

.. code-block:: bash

pipenv install --dev

See `LICENSE <LICENSE>`

_.

This implementation was partially based on Yuval Tassa's :code:`MATLAB`

`implementation <https://www.mathworks.com/matlabcentral/fileexchange/52069>`

*,
and navigator8972 <https://github.com/navigator8972>*'s

`implementation <https://github.com/navigator8972/pylqr>`

_.Get A Weekly Email With Trending Projects For These Topics

No Spam. Unsubscribe easily at any time.

python (50,919)

theano (75)

mpc (19)

control-systems (18)

trajectory-optimization (16)