User:Jimmy Novik/Nonlinear Mapping Function

Linear models cannot represent nonlinear functions of their input x. To fit such functions with a linear model, we need to perform a nonlinear mapping φ on the inputs and apply the linear model to the transformed representation φ(x).

Options

1. Choose a very generic φ, such as the infinite-dimensional mapping implicitly used by kernel machines based on the RBF kernel (see the sketch after this list). Such generic mappings tend to generalize poorly on advanced problems.

2. Manually engineer φ.

3. Use deep learning to learn φ: y = f(x; \theta, w) = \phi(x; \theta)^{\top} w.
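As a minimal sketch of option 1, kernel ridge regression with the RBF kernel can fit the XOR data used in the example below; the kernel width gamma and the regularization strength 1e-8 are illustrative assumptions, not values from this page:

```python
import numpy as np

# The four XOR inputs and targets (same data as the example below).
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

def rbf_kernel(A, B, gamma=1.0):
    # k(a, b) = exp(-gamma * ||a - b||^2); the corresponding feature
    # map phi is infinite-dimensional and never computed explicitly.
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * sq_dists)

# Kernel ridge regression: solve (K + lam*I) alpha = y, then
# predict f(x) = sum_i alpha_i * k(x, x_i).
K = rbf_kernel(X, X)
alpha = np.linalg.solve(K + 1e-8 * np.eye(len(X)), y)
print(rbf_kernel(X, X) @ alpha)  # ~[0, 1, 1, 0]: XOR is fit exactly
```

The model fits the four points exactly, but all of its knowledge is encoded in distances to the training points, which is why such generic mappings struggle on harder problems.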

Example: Learning (approximating) the XOR function

Target function

f(\mathbb{X}) = [0, 1, 1, 0]^{\top}, where \mathbb{X} is the design matrix containing the four binary input points [0,0]^{\top}, [0,1]^{\top}, [1,0]^{\top}, [1,1]^{\top}.


If we choose a linear model with parameters w and b, then the model is defined as f(x; w, b) = x^{\top}w + b. Minimizing the mean squared error over the four points yields w = 0 and b = \tfrac{1}{2}, so the model outputs 0.5 for every input. It isn't very useful to have a function that outputs a constant value over all of the inputs.
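The constant output can be checked directly. A minimal NumPy sketch of the closed-form least-squares fit follows; absorbing the bias by appending a ones column is an implementation choice, not something specified above:

```python
import numpy as np

# The four XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 0.0])

# Append a column of ones so the bias b is fit jointly with w.
A = np.hstack([X, np.ones((4, 1))])

# Closed-form least-squares solution of f(x; w, b) = x^T w + b.
params, *_ = np.linalg.lstsq(A, y, rcond=None)
w, b = params[:2], params[2]
print(w, b)        # w ~ [0, 0], b ~ 0.5
print(A @ params)  # [0.5 0.5 0.5 0.5]: the same output for every input
```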

If we add a hidden layer h = f^{(1)}(x; W, c) and an output layer y = f^{(2)}(h; w, b), the complete model becomes f(x; W, c, w, b) = f^{(2)}(f^{(1)}(x)).

There must be a nonlinear function in the hidden layer to turn our linear model into a nonlinear one; otherwise the composition of linear layers would itself be linear. This is usually done by an affine transformation followed by a fixed nonlinear activation function:

h = g(W^{\top}x + c), where W provides the weights of the linear transformation, c the biases, and g is the activation function applied element-wise.

The default recommended activation function is the rectified linear unit, or ReLU, defined as g(z) = \max\{0, z\}.

The complete network can now be specified as f(x; W, c, w, b) = w^{\top}\max\{0, W^{\top}x + c\} + b.
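As a sanity check, here is a minimal NumPy sketch that evaluates this network on the four XOR points with one known exact solution; the specific values of W, c, w, and b below are assumed rather than derived in the text above:

```python
import numpy as np

# The four XOR inputs and targets.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# One exact solution of f(x; W, c, w, b) = w^T max{0, W^T x + c} + b
# (parameter values assumed for illustration, not derived above).
W = np.array([[1, 1],
              [1, 1]])
c = np.array([0, -1])
w = np.array([1, -2])
b = 0

h = np.maximum(0, X @ W + c)  # hidden layer: ReLU(W^T x + c) per row
print(h)                      # [[0 0] [1 0] [1 0] [2 1]]
print(h @ w + b)              # [0 1 1 0]: exactly the XOR targets
```

Note how the hidden representation maps the inputs [0,1]^{\top} and [1,0]^{\top} to the same point [1,0]^{\top}, so the transformed data becomes linearly separable for the output layer.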