Getting Started with Matlab – Part 1:
A lot of the math used for Machine Learning (ML) is linear algebra. Actually, a lot of engineering in general is based on it. A matrix of numbers can represent anything from a circuit-based control system to a concrete pillar under a bridge. This also means that CAD software runs on it as well. It's basically everywhere.
Linear algebra can be done by hand, or you can do it using a programming language. You could be hardcore and write your own super-optimized matrix inversion algorithms in C, or use Python’s NumPy, or even R. However, the industry standard is Matlab, which holds a virtual monopoly over the realm of engineering programming languages. When I say “engineering” I mean the hard, physically based stuff like designing aircraft, rockets, or buildings.
Some would argue that Matlab is easier than Python. It is, but only for linear algebra; you would not want to build your website in it. Matlab works through a REPL interface: all your variables are stored in memory and persist in your workspace, and this workspace can be saved and loaded again when you need it.
Below are some of Matlab’s more basic commands.
Near the end of the last post I started talking about gradient descent. For the most part you can think of the cost function as a surface on which you are trying to find the lowest point. However, once you get into systems that require you to factor in more than one feature, it stops being a 3D surface you can picture and becomes the abstract idea of a surface. You are still minimizing one cost function, but it now depends on one parameter per feature (plus the intercept), so the search runs over N + 1 dimensions.
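To make the "abstract surface" a bit more concrete, here is a minimal sketch in Python. It assumes the squared-error cost normally used for linear regression; the post itself does not spell out the formula. The data, the name `cost`, and the parameter values are made up for illustration. The point is that with several features the cost is still a single number computed from a parameter vector.

```python
def cost(theta, rows, targets):
    """Squared-error cost J(theta) = 1/(2m) * sum((h(x) - y)^2).

    theta[0] is the intercept; theta[1:] holds one parameter per feature.
    """
    m = len(rows)
    total = 0.0
    for x, y in zip(rows, targets):
        # Hypothesis h(x) = theta0 + theta1*x1 + ... + thetaN*xN
        prediction = theta[0] + sum(t * v for t, v in zip(theta[1:], x))
        total += (prediction - y) ** 2
    return total / (2 * m)

# Hypothetical data: [size in square metres, previous owners] -> price.
# The targets were generated from theta = [0, 2, 5], so that theta has zero cost.
rows = [[100.0, 1.0], [150.0, 3.0], [200.0, 2.0]]
targets = [205.0, 315.0, 410.0]

cost_at_zero = cost([0.0, 0.0, 0.0], rows, targets)
cost_at_best = cost([0.0, 2.0, 5.0], rows, targets)
print(cost_at_zero, cost_at_best)
```

Gradient descent is then just a way of walking from a high-cost parameter vector toward a low-cost one.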
To do gradient descent effectively you have to properly calibrate the learning rate and adjust your training set using feature scaling. Both of these help gradient descent find the right solution faster.
Feature Scaling: You want to do this when there are wild discrepancies in the range of values. For example, one of the features could be the size of the house in square metres (in the 100s), and another could be the number of previous owners (1 to 5). You want to think about these things because they can cause your gradient descent algorithm to jump back and forth, making it harder to find the global minimum. When you do feature scaling you are simply trying to get all your features into the same range of values.
Learning Rate: As you know, the learning rate is effectively your step size as you go down the 3D surface, trying to find the global minimum. Sometimes the gradient descent algorithm steps so far ahead that it misses the minimum, and it keeps missing it because its step size (learning rate) is too high. However, if you make your learning rate too small, it may take a very long time to find the minimum since you have to take so many more steps.
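As a rough illustration of feature scaling, here is a small Python sketch using mean normalization, which is one common way to do it (subtract the mean, divide by the range). The helper name `mean_normalize` and the sample values are assumptions made up for this example.

```python
def mean_normalize(values):
    """Scale a feature with (x - mean) / (max - min)."""
    mean = sum(values) / len(values)
    spread = max(values) - min(values)
    if spread == 0:
        # A constant feature has no range to scale by; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / spread for v in values]

# Hypothetical training data with very different ranges.
sizes = [100.0, 150.0, 200.0, 250.0, 300.0]   # house size in square metres
owners = [1.0, 2.0, 3.0, 4.0, 5.0]            # number of previous owners

scaled_sizes = mean_normalize(sizes)
scaled_owners = mean_normalize(owners)
print(scaled_sizes)
print(scaled_owners)
```

After scaling, both features sit in the same small interval around zero, so neither one dominates the size of the gradient steps.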
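The effect of the learning rate is easy to see in a toy run. The sketch below is plain Python and assumes batch gradient descent on a one-variable linear regression with a squared-error cost; the data, the function name `gradient_descent`, and the three learning rates are made up for illustration.

```python
def gradient_descent(xs, ys, alpha, steps):
    """Fit y = theta0 + theta1 * x; return (theta0, theta1, final cost)."""
    m = len(xs)
    theta0, theta1 = 0.0, 0.0
    for _ in range(steps):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        # Update both parameters simultaneously using the old values.
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
    final_cost = sum(e * e for e in errors) / (2 * m)
    return theta0, theta1, final_cost

# Hypothetical data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

too_small = gradient_descent(xs, ys, alpha=0.001, steps=50)
reasonable = gradient_descent(xs, ys, alpha=0.1, steps=50)
too_large = gradient_descent(xs, ys, alpha=0.5, steps=50)

for name, result in [("too small", too_small),
                     ("reasonable", reasonable),
                     ("too large", too_large)]:
    print(name, "-> final cost:", result[2])
```

With the same number of steps, the tiny rate has barely moved, the middle rate gets close to the minimum, and the large rate overshoots further on every step so the cost blows up.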
Another way to find the values that minimize the cost function is to use the Normal equation:
theta = (X^T X)^(-1) X^T Y
Where theta is the value that minimizes the cost function.
Where X is your feature matrix.
Where Y is the known output.
Using the Normal equation we can compute these values in a straightforward process without the use of iteration. However, this comes at the price of speed when the feature matrices get very big, since computing the inverse of an (n x n) matrix is roughly O(n^3). In these cases gradient descent is going to be the better choice.
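Here is a small Python sketch of the Normal equation, theta = (X^T X)^(-1) X^T Y, using only the standard library. Note one swap: instead of forming the inverse explicitly, it solves the linear system (X^T X) theta = X^T Y by Gauss-Jordan elimination, which gives the same theta. The helper names and the data are made up for this example, and it adds a column of ones to X for the intercept.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def solve(a, b):
    """Solve a * x = b (b is a vector) by Gauss-Jordan elimination."""
    n = len(a)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        # Partial pivoting: pick the row with the largest entry in this column.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X^T X is singular; no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def normal_equation(features, targets):
    X = [[1.0] + list(row) for row in features]   # prepend intercept column
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    XtY = [sum(x * y for x, y in zip(row, targets)) for row in Xt]
    return solve(XtX, XtY)

# Hypothetical data generated from y = 1 + 2x.
features = [[1.0], [2.0], [3.0], [4.0]]
targets = [3.0, 5.0, 7.0, 9.0]

theta = normal_equation(features, targets)
print(theta)
```

There is no learning rate and no loop over iterations here, which is the appeal. The elimination step is the roughly O(n^3) part that becomes expensive as the number of features grows.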
You can see why I started this series here.
What is machine learning?
- My definition:
- When you tell a machine to learn from experience rather than explicitly giving it a bunch of instructions.
- Course Definitions:
- “the field of study that gives computers the ability to learn without being explicitly programmed.” – Arthur Samuel
- “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” – Tom Mitchell
You can think of this like teaching a small child how to do something you already know, such as counting objects or throwing a ball. Another way of thinking about it is that you give the machine the data, knowing that there is some relationship in it, and then have the machine find that relationship by itself.
This type of learning comes in two different flavours:
Supervised learning (regression): Given a bunch of data, you are asked to predict what will happen next. An example could be: “given all the historical data about housing prices, what will be the price of a house in 2020?” We are mapping input data to a continuous output.
Supervised learning (classification): Take the input data and give me discrete outputs (classifications). For example, you could take data on students and predict which students would become engineers. Here we still know which factors really influence the result, which still makes it supervised learning.
Unsupervised learning: We have mountains of data that we think is random and has no structure. We have no idea what the relationships are between the variables. So we let our machine loose on the data to discover the relationships between the different variables, and it starts to cluster the data into different piles.
So a while ago… well, now a long time ago, I enrolled in one of the most widely known machine learning courses ever. It is taught by the 600-pound gorilla of the machine learning world, aka “Andrew Ng”, and is offered through the Coursera online learning platform and Stanford University. When I first bought the course I was just like: “let's do this!!!”, then a week went by and I was like: “Wow, this is actually pretty easy to understand”, which in turn led me to where I currently am. That is, not working through the lectures, but writing this blog post. I really want to get that certificate, and uncover the mysteries and oddities of machine learning (ML).
Therefore, I am going to be starting a new series of blog posts that summarize what I learn each week from the course. This way I review the material at the end of each week, as well as gain a better understanding of the matter by teaching it to you guys.
So far I have only completed the first week of the course. This was Introduction week, and covered the following topics:
- Linear Regression with One Variable
- Model Representation
- Cost Function
- Parameter Learning
- Gradient Descent
- Gradient Descent for Linear Regression
- Linear Algebra Review
So far this class has been taught amazingly well by Andrew Ng. I wish all courses that involved abstract topics and math were taught this well.
In the coming week I will make two posts, for part 1 and part 2 of this series. See you then.