Solve exercise 1 in the lecture notes.
In order to apply logistic regression we need to know how to optimize functions - in our case the logistic regression loss (3.11) in the lecture notes. If you already have experience in optimization you may not need the following two assignments.
a) Calculate the gradients of the following functions
$$f(x, y) = \frac{1}{x^2+y^2} \qquad \text{and} \qquad f(x, y) = x^2y.$$
b) A standard way to computationally find a minimum is gradient descent.
Start at some (possibly random) point $ \overrightarrow{p}=(x,y)^T $ and move downhill, i.e. in the direction of the negative gradient. The step size $\lambda$ must be controlled, or at least chosen small enough. When a loss function is optimized in a machine-learning context, $\lambda$ is also called the learning rate.
The update equation
$$ \overrightarrow{p_{i+1}}= \overrightarrow{p_{i}} - \lambda \cdot \nabla f(\overrightarrow{p_{i}})$$
is then iterated until the norm of the gradient falls below some threshold.
Write down the update equations for the two functions in a)!
a) For $f(x, y) = \frac{1}{x^2+y^2}$ we obtain
$$\nabla f(x, y) = \begin{pmatrix}-\frac{2x}{(x^2+y^2)^2} \\ -\frac{2y}{(x^2+y^2)^2}\end{pmatrix}.$$
$f(x, y) = x^2y$ results in
$$\nabla f(x,y) = \begin{pmatrix}2xy \\ x^2\end{pmatrix}.$$
b) The update equations for $f(x, y) = \frac{1}{x^2+y^2}$ are
$$x \leftarrow x + \lambda \frac{2x}{(x^2+y^2)^2},$$
$$y \leftarrow y + \lambda \frac{2y}{(x^2+y^2)^2}.$$
The update equations for $f(x, y) = x^2y$ are
$$x \leftarrow x - 2\lambda xy,$$
$$y \leftarrow y - \lambda x^2.$$
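As a quick illustration (not part of the original sheet), the update equations from b) can be iterated directly in code, here for $f(x, y) = \frac{1}{x^2+y^2}$; the starting point, learning rate, and stopping threshold are arbitrary choices. Note that this $f$ has no finite minimizer: the iterates drift away from the origin while $f$ decreases towards $0$.
import numpy as np

def grad_f(x, y):
    # gradient of f(x, y) = 1/(x**2 + y**2), as derived in a)
    denom = (x**2 + y**2)**2
    return np.array([-2*x/denom, -2*y/denom])

p = np.array([1.0, 1.0])   # arbitrary starting point
lam = 0.1                  # arbitrary learning rate
for i in range(10000):     # safety cap on the number of iterations
    g = grad_f(p[0], p[1])
    if np.linalg.norm(g) < 1e-2:   # stop once the gradient norm is small
        break
    p = p - lam*g                  # the update equation from b)
print("stopped after", i, "iterations at p =", p, "f =", 1/(p[0]**2 + p[1]**2))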
For this task we use the double-well potential
$$V(x) = ax^4 + bx^2 + cx + d$$
with $a = 1$, $b = -3$, $c = 1$ and $d = 3.514$.
We seek to find the global minimum $x_{min}$ of this function with gradient descent. (In 1D the gradient is just the derivative.)
a) Calculate the derivative of $V(x)$ and the update equation for $x$ with learning rate $\lambda$.
b) Complete the code below.
c) Test the following starting points and learning rates $\lambda$:
$$(x_0, \lambda) = (-1.75, 0.001),$$
$$(x_0, \lambda) = (-1.75, 0.19),$$
$$(x_0, \lambda) = (-1.75, 0.1),$$
$$(x_0, \lambda) = (-1.75, 0.205).$$
d) How can one find a compromise between $(x_0, \lambda) = (-1.75, 0.001)$ and $(x_0, \lambda) = (-1.75, 0.19)$?
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
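# One gradient-descent step for V(x) = a*x**4 + b*x**2 + c*x + d: x <- x - lam * V'(x)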
def update2(x, a, b, c, d, lam):
    x = x - lam*(4*a*x**3 + 2*b*x + c)
    return x

def V(x, a, b, c, d):
    return a*x**4 + b*x**2 + c*x + d
# TODO: Change to right parameters.
a = 1
b = -3
c = 1
d = 3.514
x0 = -1.75
iterations = 101
lams = np.array([0.001, 0.19, 0.1, 0.205])
losses = np.empty(shape=(iterations, len(lams)))
results = np.empty(len(lams))
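# Run gradient descent from x0 once for each learning rate, recording the loss V(x) at every iteration.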
for j in range(len(lams)):
    x = x0
    lam = lams[j]
    for i in range(iterations):
        losses[i, j] = V(x, a, b, c, d)
        if i != iterations - 1:
            x = update2(x, a, b, c, d, lam)
    results[j] = x
for j in range(len(lams)):
    print(100*"-")
    print("lambda: ", lams[j])
    print("xmin: ", results[j])
    print("Loss: ", V(results[j], a, b, c, d))
colors = {
    0.001: "blue",
    0.19: "red",
    0.1: "black",
    0.205: "orange",
}
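# Learning curves: plot the recorded loss V over the iterations for each learning rate.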
plt.figure(figsize=(8, 8))
plt.title("Learning curves")
plt.xlabel("Epoch")
plt.ylabel("Loss V")
plt.xlim(0, iterations)
for i in range(len(lams)):
    lam = lams[i]
    plt.plot(range(iterations), losses[:, i], label=str(lam), color=colors[lam])
plt.legend()
plt.ylim(bottom=0)
plt.show()
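# Plot V(x) and mark the point at which each run ended.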
plt.figure(figsize=(8, 8))
plt.title("Function V and minima")
plt.xlabel("x")
plt.ylabel("V(x)")
xs = np.linspace(-2, 2, 100)
ys = V(xs, a, b, c, d)
plt.plot(xs, ys)
for j in range(len(lams)):
    lam = lams[j]
    xmin = results[j]
    vxmin = V(xmin, a, b, c, d)
    plt.plot(xmin, vxmin, marker='.', linestyle="None", label=str(lam), color=colors[lam], ms=10)
plt.legend()
plt.show()
a) The derivative is
$$\partial_x V(x) = 4ax^3 + 2bx + c$$.
The update equation thus is
$$x \leftarrow x - \lambda \left(4ax^3 + 2bx + c\right).$$
c)
$(x_0, \lambda) = (-1.75, 0.001)$: the left (global) minimum is found, but very slowly ($\lambda$ too small).
$(x_0, \lambda) = (-1.75, 0.19)$: no minimum is found; $x$ jumps around inside the left valley ($\lambda$ too big).
$(x_0, \lambda) = (-1.75, 0.1)$: the left (global) minimum is found.
$(x_0, \lambda) = (-1.75, 0.205)$: the iterate jumps over the left minimum and ends up in the local minimum on the right.
d) Adjust $\lambda$ during the run: start with a larger $\lambda$ and reduce it, e.g. every $n$ steps, by some factor $f < 1$, i.e.
$\lambda \leftarrow f\cdot\lambda$ every $n$ epochs (see the sketch below).
Even better: monitor the decrease of the loss, reduce $\lambda$ when necessary and increase it again when possible.
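As a sketch (not part of the original sheet), such a step-decay schedule could be added to the gradient descent above; the initial $\lambda$, the factor $f$ and the interval $n$ below are arbitrary choices, and `update2`, `V`, `a`, `b`, `c`, `d` are the definitions from the code above.
# Gradient descent on V(x) with a step-decay learning rate (arbitrary schedule).
x = -1.75
lam, f, n = 0.19, 0.5, 10          # start large, halve lambda every 10 steps
for i in range(100):
    if i > 0 and i % n == 0:
        lam = f*lam                # decay the learning rate
    x = update2(x, a, b, c, d, lam)
print("x after decay schedule:", x, ", V(x) =", V(x, a, b, c, d))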
Consider two 1D normal distributions with $\sigma^2 = 1$, located at $\mu_1 = 0.0$ and $\mu_2 = 2.0$. Sample $N$ values from each of these distributions and assign the class labels "0" and "1" to the values ("0" for the values coming from the normal distribution at $0$). Let this be your labeled data. Learn a logistic regression model with these data. Choose $N = 5$ and $N = 100$.
At which location does the model assign 50% probability to your class label being "0" (and "1")?
Hints:
Run and understand the example "MNIST classification using multinomial logistic regression" from scikit-learn.
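A minimal sketch of one possible setup with scikit-learn (the random seed and the way the decision boundary is read off are my own choices, not prescribed by the exercise):
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)                      # arbitrary seed
for N in (5, 100):
    x0 = rng.normal(loc=0.0, scale=1.0, size=N)     # samples for class "0"
    x1 = rng.normal(loc=2.0, scale=1.0, size=N)     # samples for class "1"
    X = np.concatenate([x0, x1]).reshape(-1, 1)
    y = np.concatenate([np.zeros(N), np.ones(N)])
    clf = LogisticRegression().fit(X, y)
    # P(class "1" | x) = 0.5 where w*x + b = 0, i.e. at x = -b/w
    w, b = clf.coef_[0, 0], clf.intercept_[0]
    print(f"N = {N}: 50% decision boundary at x = {-b/w:.3f}")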