Mathematical Description

So that is the most basic description of how the brain works... but what we want to do is mimic this in a machine.



How do we mimic the brain in a machine?

Well... The language of computers is MATHEMATICS!

So the question becomes how do we mimic/describe the behavior of neurons mathematically?

Basic Mathematical Framework:

To describe a neuron mathematically we use the basic building block of a perceptron:

Imagine the red dot (in the above picture) produces a number $x$ and the blue dot produces a number $y$ (we can think of these as numerical values representing the chemical output of a neuron). To combine the red and blue dots we use the weights over each arrow ($a$ for the red and $b$ for the blue); that is, green will receive the input $ax+by$. The trained student of mathematics will immediately recognize this as a dot product, or a linear transformation (i.e. a matrix) from linear algebra:

\[\begin{bmatrix} a & b \end{bmatrix}\begin{bmatrix} x\\ y\end{bmatrix}=ax+by \]
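As a quick sanity check, the weighted input to the green neuron is just a dot product. Here is a minimal sketch in NumPy; the particular values of $a$, $b$, $x$, $y$ are made up for illustration:

```python
import numpy as np

# weights on the two incoming arrows (illustrative values)
weights = np.array([0.5, -0.25])   # [a, b]
# outputs of the red and blue neurons (illustrative values)
inputs = np.array([2.0, 4.0])      # [x, y]

# the green neuron receives a*x + b*y, i.e. the dot product
signal = weights @ inputs
print(signal)  # 0.5*2.0 + (-0.25)*4.0 = 0.0
```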

So if we have a lot of 'neurons' sending weighted signals to many other 'neurons'...

this can be described mathematically as:

\begin{equation} \begin{bmatrix} a_{11} & a_{21} &\dots& a_{n1}\\ a_{12} &a_{22}&\dots &a_{n2}\\ \vdots &\vdots &\ddots &\vdots\\ a_{1n}&a_{2n}&\dots&a_{nn} \end{bmatrix}\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}=\begin{bmatrix}\sum_{i=1}^n a_{i1}x_i\\\sum_{i=1}^n a_{i2}x_i\\\vdots\\\sum_{i=1}^n a_{in}x_i \end{bmatrix} \end{equation}
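In code, a whole layer of these weighted sums is just a matrix-vector product. A small sketch in NumPy, where the $3\times 3$ weight matrix and the input vector are invented purely for illustration:

```python
import numpy as np

# weight matrix: the entry in row j, column i is the weight
# from input neuron i to output neuron j (illustrative values)
A = np.array([[1.0, 0.0, 2.0],
              [0.5, 1.0, 0.0],
              [0.0, 3.0, 1.0]])
x = np.array([1.0, 2.0, 3.0])

# each output component j is the weighted sum over the inputs x_i
out = A @ x
print(out)  # [7.  2.5 9. ]
```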

Now putting these two pictures together gives us take-away 1 from above; now to handle take-away 2. That is, we need a probabilistic nature to these weights, and where this most differs from what has been implemented thus far is that it needs an element of non-linearity. Most often this non-linearity is introduced with a sigmoidal function, which we will denote simply as $\sigma(\bar{x})$. Appending this to our above equation we now get:

\begin{equation} \sigma\left(\begin{bmatrix} a_{11} & a_{21} &\dots& a_{n1}\\ a_{12} &a_{22}&\dots &a_{n2}\\ \vdots &\vdots &\ddots &\vdots\\ a_{1n}&a_{2n}&\dots&a_{nn} \end{bmatrix}\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}\right)=\sigma\left(\begin{bmatrix}\sum_{i=1}^n a_{i1}x_i\\\sum_{i=1}^n a_{i2}x_i\\\vdots\\\sum_{i=1}^n a_{in}x_i \end{bmatrix} \right)=\begin{bmatrix}\sigma\left(\sum_{i=1}^n a_{i1}x_i\right)\\\sigma\left(\sum_{i=1}^n a_{i2}x_i\right)\\\vdots\\\sigma\left(\sum_{i=1}^n a_{in}x_i\right) \end{bmatrix}\end{equation}
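The equation above says that applying $\sigma$ to the output vector is the same as applying it to each weighted sum separately. A short sketch using the common logistic sigmoid $\sigma(z)=1/(1+e^{-z})$ (the weights and inputs are again made-up values):

```python
import numpy as np

def sigmoid(z):
    # logistic sigmoid, applied elementwise to an array
    return 1.0 / (1.0 + np.exp(-z))

A = np.array([[1.0, -1.0],
              [0.5,  0.5]])
x = np.array([2.0, 2.0])

# sigma of the matrix-vector product...
layer_output = sigmoid(A @ x)
# ...equals sigma applied to each weighted sum on its own
componentwise = np.array([sigmoid(row @ x) for row in A])
print(layer_output)
```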

Now we will give the picture of the whole experience and introduce some common terminology...

The hidden layers are analogous to the many neurons in the brain; the sigmoidal activation will be discussed further later on. Now, before we go any further, I'd like to talk more about the input and the output.

Decision Problems:

Now that we have seen how we mimic the brain mathematically, let's look at what we want our machine to do. The most common application of neural networks is decision problems.

At its most basic, a decision problem is: given an input, provide the answer yes or no.

So let's break down this situation. First, what is the input?

Usually the input is a set $S$:

\[S\subseteq\mathbb{R}^n\]

for some $n\in\mathbb{N}$ (i.e. $n$ is a natural number). For the non-native math speakers reading this, the above symbols mean that our input, $S$, is some collection (i.e. a set) of ordered tuples $(a_1,a_2,a_3,\dots,a_n)$ where each $a_i$ is a real number. Now, for a decision problem whose answer is yes or no, our output is:

\[\{0,1\}\subset\mathbb{R}\]

with $0$ for no and $1$ for yes.
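To make this concrete, a decision problem is just a function from tuples in $\mathbb{R}^n$ to $\{0,1\}$. Here is a toy example with a hypothetical rule (answer "yes" exactly when the point lies inside the unit ball); the rule itself is invented for illustration, not part of the discussion above:

```python
import numpy as np

def decide(point):
    # toy decision rule (hypothetical): output 1 ("yes") if the
    # input tuple lies inside the unit ball, else 0 ("no")
    point = np.asarray(point, dtype=float)
    return 1 if np.sum(point**2) < 1.0 else 0

print(decide((0.3, 0.4)))  # 0.09 + 0.16 = 0.25 < 1, so "yes" -> 1
print(decide((2.0, 0.0)))  # 4.0 >= 1, so "no" -> 0
```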