Lecture 1: Introduction
Thursday April 23, 12-14
Introduction to nonlinear programming: formal definition, basics, examples.
What is Nonlinear Programming?
Nonlinear programming is the numerical/computational minimization of differentiable functions on $\mathbb{R}^n$, with or without equality or inequality constraints.
Formal definition
Let $f:U\to \overline{\mathbb{R}}$, $U \subset \mathbb{R}^n$. Find $x_*\in U$ such that $f(x_*) \le f(x)$ for all $x\in U$. Here $f$ is called the objective, cost, or loss function, and $U$ the admissible set. $U$ can be defined in different ways. If $U=\mathbb{R}^n$, we call the problem unconstrained optimization; if $U=\{x\in\mathbb{R}^n\mid c(x)=0 \}$ for some $c:\mathbb{R}^n \to \mathbb{R}^{m_c}$, we speak of equality constrained optimization; and if $U=\{x\in\mathbb{R}^n \mid g(x)\le 0 \}$ for some $g:\mathbb{R}^n \to \mathbb{R}^{m_g}$, of inequality constrained optimization. In general, we formulate this as $$ \min f(x) \quad \text{subject to} \quad c(x) = 0, \; g(x)\le 0. $$
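As an illustration (not part of the lecture), a problem of this general form can be handed to an off-the-shelf solver. The sketch below uses SciPy's `minimize`; the particular $f$, $c$, and $g$ are made up for demonstration.

```python
# Minimal sketch: min f(x) s.t. c(x) = 0, g(x) <= 0 with SciPy's SLSQP solver.
# The functions f, c, g are illustrative choices, not from the lecture.
import numpy as np
from scipy.optimize import minimize

f = lambda x: (x[0] - 1.0)**2 + (x[1] - 2.5)**2   # objective
c = lambda x: np.array([x[0] + x[1] - 3.0])        # equality constraint c(x) = 0
g = lambda x: np.array([x[0] - 2.0, -x[1]])        # inequality constraint g(x) <= 0

constraints = [
    {"type": "eq",   "fun": c},
    {"type": "ineq", "fun": lambda x: -g(x)},       # SciPy expects fun(x) >= 0
]

result = minimize(f, x0=np.zeros(2), method="SLSQP", constraints=constraints)
print(result.x, result.fun)
```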
Examples
Toy example
A trivial one-dimensional example, with minimizer $x_* = 0$: $$ \min -x \quad\text{ subject to } x\le 0$$
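Even this toy problem can be fed to a numerical solver (a sketch, assuming SciPy is available; the bound encodes $x \le 0$):

```python
# Sketch: solve min -x s.t. x <= 0 numerically; the solver stops at the bound x = 0.
from scipy.optimize import minimize

res = minimize(lambda x: -x[0], x0=[-1.0], bounds=[(None, 0.0)])
print(res.x)  # approximately [0.]
```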
Portfolio optimization
Composition of a stock portfolio for minimal risk at a given return. A simple market model:
- $n$ different stocks
- $r_i$ the relative return of stock $i$
- $x_i$ fraction of budget invested in stock $i$
total return: $r = \sum_{i=1}^n x_i r_i = x^T r$ (on the right, $r$ denotes the vector $(r_1,\dots,r_n)^T$ of stock returns)
Let $E(r_i)$ denote the expected return of stock $i$. Then the expected total return is $E(r) = x^T E(r)$. The risk (variance) is $V(r) = E\big((r-E(r))^2\big) = x^T C x$, where $C$ is the covariance matrix with entries $C_{ij} = \mathrm{Cov}(r_i,r_j) = E\big((r_i-E(r_i))(r_j-E(r_j))\big)$.
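For completeness, the variance formula follows by writing the total return as $x^T r$: $$ V(r) = E\big((x^T r - x^T E(r))^2\big) = E\big(x^T (r-E(r))(r-E(r))^T x\big) = x^T C x. $$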
Now minimize the risk: $\min x^T C x $
subject to given return: $x^T E(r) = R$
using the whole budget (and not more :): $\sum_{i=1}^n x_i = 1$
and maybe we don’t want to go short: $x_i \ge 0$
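Putting the pieces together, here is a hedged sketch of the full portfolio problem with made-up data (three stocks, illustrative returns, covariances, and target return $R$), again using SciPy's SLSQP solver:

```python
# Sketch (illustrative data): minimum-variance portfolio
#   min x^T C x  s.t.  x^T E(r) = R,  sum(x) = 1,  x >= 0
import numpy as np
from scipy.optimize import minimize

mu = np.array([0.05, 0.10, 0.07])                    # assumed expected returns E(r_i)
C = np.array([[0.010, 0.002, 0.001],
              [0.002, 0.030, 0.004],
              [0.001, 0.004, 0.020]])                 # assumed covariance matrix
R = 0.08                                              # target return

risk = lambda x: x @ C @ x                            # objective: portfolio variance
constraints = [
    {"type": "eq", "fun": lambda x: x @ mu - R},      # given return
    {"type": "eq", "fun": lambda x: np.sum(x) - 1.0}, # whole budget, and not more
]
bounds = [(0.0, None)] * len(mu)                      # no short selling

res = minimize(risk, x0=np.ones(len(mu)) / len(mu),
               method="SLSQP", bounds=bounds, constraints=constraints)
print(res.x, res.fun)
```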
Image denoising
Removing noise from underexposed digital camera pictures. Let $f: [0,N]\cap \mathbb{Z} \to \mathbb{R}$ denote the noisy image and $u$ its reconstruction. We aim at solving $$ \min_u \sum_i (u_i-f_i)^2 + \alpha r(u). $$ The first term enforces data fidelity, the second penalizes noise (rapid oscillations) in the reconstruction, e.g., $$ r(u) = \sum_i (u_i-u_{i-1})^2 + (u_i-u_{i+1})^2 $$
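A minimal sketch of this problem on a synthetic 1-D signal (made-up data and a made-up value of $\alpha$); the penalty is written in the equivalent forward-difference form $\sum_i (u_{i+1}-u_i)^2$:

```python
# Sketch: denoise a synthetic 1-D signal by minimizing
#   sum_i (u_i - f_i)^2 + alpha * sum_i (u_{i+1} - u_i)^2
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N = 200
clean = np.sin(np.linspace(0, 4 * np.pi, N))      # hypothetical clean signal
f = clean + 0.3 * rng.standard_normal(N)          # noisy measurement
alpha = 5.0                                       # regularization weight (fidelity vs. smoothness)

def objective(u):
    fidelity = np.sum((u - f)**2)                 # stay close to the data
    smoothness = np.sum(np.diff(u)**2)            # penalize jumps between neighboring pixels
    return fidelity + alpha * smoothness

res = minimize(objective, x0=f, method="L-BFGS-B")
u = res.x                                         # denoised reconstruction
```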
Lecture videos
- Basics (12:30)
- Existence & uniqueness (24:30)
- Optimality conditions (18:15)
External resources
Michel Bierlaire on Optimization: https://www.youtube.com/playlist?list=PL10NOnsbP5Q5D1gxjA08NiSYByuhL2mOG