Month: January 2018

Neural Networks – Intuition – The Perceptron

In this post we will focus on how the perceptron works. We are trying to get a discrete answer from a large number of inputs. The perceptron takes these inputs, weighs them and then adds them together (a dot product). It then takes this value and forces it to be a discrete “yes” or “no”, positive or negative. This is done by passing the value through an activation function. The original weights are then updated to try to obtain a more accurate prediction the next time around.

The basic way this is done is as follows, and will be discussed more fully when we get to the math behind the perceptron. The components of the update rule are the learning rate, the predicted class label, the correct class label and the inputs. We take the difference between the predicted class label and the actual class label, scale it by how quickly we want the weights to change (the learning rate) and then multiply it by the inputs. We then have updated weights for the next iteration.
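
Here is a minimal sketch of that update in code, assuming NumPy arrays and +1/-1 class labels (the variable names such as eta, w and x are illustrative, not from any particular implementation):

import numpy as np

eta = 0.1                                   # learning rate
w = np.zeros(3)                             # current weights
x = np.array([1.0, 2.0, -1.0])              # inputs for one training example
target = 1                                  # correct class label (+1 or -1)

net_input = np.dot(w, x)                    # weigh the inputs and add them together (dot product)
prediction = 1 if net_input >= 0.0 else -1  # activation function forces a discrete answer
w = w + eta * (target - prediction) * x     # updated weights for the next iteration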

This continues until the algorithm classifies all examples correctly. Obviously this can only happen if the data set is linearly separable. If it’s not, we can just run for a predetermined number of iterations (epochs) or set a threshold for the maximum number of tolerated misclassifications.
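
As a rough sketch (again with illustrative names, assuming NumPy arrays and +1/-1 labels), the full training loop might look like this:

import numpy as np

def train_perceptron(X, y, eta=0.1, max_epochs=50):
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):                      # run for at most a fixed number of epochs
        errors = 0
        for x_i, y_i in zip(X, y):
            prediction = 1 if np.dot(w, x_i) >= 0.0 else -1
            w = w + eta * (y_i - prediction) * x_i   # update the weights after each example
            errors += int(prediction != y_i)
        if errors == 0:                              # everything classified correctly: stop early
            break
    return w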

We will build on this concept in future posts, but that is the idea of how a basic artificial neuron works. Here’s a diagram (from Sebastian Raschka’s excellent Python Machine Learning pg. 24) that puts everything together.

We can see that the inputs are combined with the weights, passed through an activation function, and the result is then used to update the weights themselves.

Neural Networks – Intuition – Part 1

In the next series of posts we will try to give a real intuition for how neural networks work. Although the math behind neural networks can be very complex, fundamentally they are relatively simple. It’s pretty amazing to think that such a simple idea is at the heart of many of our greatest advances in machine learning, and that ideas thought of decades ago are still cutting edge.

The basic idea behind neural networks is that their core components mimic neurons in our brain on a very simplistic level, and that when we put enough of these neurons together we get something resembling some level of intelligent behavior. If that sounds a little “hand-wavy”, that’s because neural networks are notorious black boxes and can be very difficult to troubleshoot and understand. We can map out the algorithm that generates the final results in a general sense, but actually following the flow of the algorithm with real data is very difficult. This is true of many machine learning algorithms, but especially of neural networks. Things become more complex still when networks themselves are combined, which ultimately leads to deep learning, a very popular topic these days.

A neuron can be viewed as a type of circuit which has inputs and either fires or doesn’t depending on whether a certain threshold is reached. Granted, this is a very simplified model and may not even be biologically accurate, but it is capable of generating very accurate predictions and classifications. The first neuron model that we will discuss is the perceptron.

In this model, we are trying to learn the optimal values (the weight coefficients) to multiply the inputs by in order to get the most accurate output, and the model proposes a way to do this automatically. We take the inputs, multiply them by the weights and then feed the result into some other function (our activation function), which will “activate” if it’s past a certain threshold and won’t activate otherwise. We will discuss this in more detail next week.

Cross Validation – Applications in Python

In the previous posts we discussed the theoretical background and intuition of cross validation. In this post we will give an example of where cross validation is used. Almost any machine learning problem will use some type of cross validation, as it is needed to avoid overfitting and underfitting.

Since cross validation is so generic, we also went into some of its details in the SVM tutorials. We will repeat some of that material here with a different emphasis, but the code is from the above post and will not run independently of it. Again, this code is from Rohit Shankar and I have no rights to it.

It uses 10-fold cross validation, which is a pretty standard number of folds. Thankfully Python has libraries which make cross validation easy, and the function used here is scikit-learn’s cross_val_score. After filling out the parameters it gives us one accuracy score per fold.

Here’s the important part.

# Applying 10-fold cross validation
# (classifier, X_train and y_train are defined in the SVM post referenced above)
from sklearn.model_selection import cross_val_score

accuracies = cross_val_score(estimator=classifier, X=X_train, y=y_train.ravel(), cv=10)

mean_accuracy = accuracies.mean() * 100
std_accuracy = accuracies.std() * 100

print("The mean accuracy in %: ", mean_accuracy)
print("The standard deviation in %: ", std_accuracy)
print("The accuracy of our model in % is between {} and {}".format(mean_accuracy - std_accuracy, mean_accuracy + std_accuracy))

This gives us a set of accuracies from cross validation instead of a single value. As we can see, we can take the mean or the standard deviation of all of the cross validated scores, which is much more valuable than a single observation. This concludes our discussion of how cross validation works.

Cross Validation – Mathematical Intro

In the past couple of weeks we gave a higher-level understanding of what cross validation is accomplishing. In the coming weeks we will discuss it at a lower level and get more into the details of how it works and what it accomplishes.

The validation set approach is pretty straightforward: we just take the error measured on the held-out validation set. The leave-one-out cross validation scheme is a little more involved though. Here, a single observation (x_i, y_i) is held out to estimate the test error and the rest of the observations are used to train the model. We repeat this process for every observation to get multiple error values, and we then take the average of these for our final test error estimate. We can express this as follows:

CV_{(n)}=\frac{1}{n}\sum^n_{i=1}MSE_i
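
As a sketch of what this looks like in code (with made-up data and a simple least squares fit, purely for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])        # design matrix with an intercept column
n = len(y)

mse = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i                     # leave observation i out
    beta = np.linalg.lstsq(X[keep], y[keep], rcond=None)[0]
    mse[i] = (y[i] - X[i] @ beta) ** 2           # squared error on the held-out observation

cv_n = mse.mean()                                # CV_(n) = (1/n) * sum of the MSE_i
print(cv_n)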

Recall that one of the drawbacks of leave one out cross validation is the computational expense. If we are using least squares linear or polynomial regression, there is a shortcut that computes the leave one out cross validation error from a single fit. The formula is as follows:

CV_{(n)} = \frac{1}{n} \sum^n_{i=1}\left(\frac{y_i- \hat{y}_i}{1-h_i}\right)^2

where \hat{y}_i is the ith fitted value from the original least squares fit and h_i is the leverage.
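
A sketch of the shortcut (reusing the same kind of made-up least squares setup as above; for ordinary least squares this gives the same value as the explicit leave-one-out loop, but from a single fit):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 30)
y = 2.0 * x + 1.0 + rng.normal(0, 1.0, 30)
X = np.column_stack([np.ones_like(x), x])        # design matrix with an intercept column

beta = np.linalg.lstsq(X, y, rcond=None)[0]      # single least squares fit on all the data
y_hat = X @ beta                                 # fitted values
h = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)    # leverage of each observation (hat matrix diagonal)

cv_n = np.mean(((y - y_hat) / (1 - h)) ** 2)     # CV_(n) via the shortcut formula
print(cv_n)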

We can also represent k-fold cross validation in a similar way. The formula for it will be:

CV_{(k)} = \frac{1}{k} \sum^k_{i=1}MSE_i

and is just the average of the mean squared errors over all k folds.
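
A sketch in code (using scikit-learn’s KFold splitter and a plain linear regression on made-up data, just to show the averaging over folds):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, (60, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1.0, 60)

fold_mse = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

cv_k = np.mean(fold_mse)                         # CV_(k) = (1/k) * sum of the per-fold MSEs
print(cv_k)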

We mentioned that it is also possible to use cross validation for classification problems. In that case the cross validation estimate is expressed as:

CV_{(n)}=\frac{1}{n}\sum^n_{i=1}Err_i

where Err_i = I(y_i \neq \hat{y}_i), an indicator that equals 1 if the ith observation is misclassified and 0 otherwise.
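
As a sketch (using scikit-learn’s LeaveOneOut splitter and a logistic regression as a stand-in classifier on made-up data), the classification version is just the fraction of held-out points the model gets wrong:

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    errors.append(int(clf.predict(X[test_idx])[0] != y[test_idx][0]))

cv_error = np.mean(errors)                       # CV_(n) = (1/n) * sum of the Err_i
print(cv_error)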