
Do you know the steps in building a full Machine Learning model?

1. Data Collection

In Machine Learning, data is the most important ingredient. Unlike a human, who can look at a person's face a few times and recognize him/her afterwards, an ML model needs tons of data. A 2001 paper from Microsoft showed that models of moderate and high complexity performed almost the same given sufficient data.

Beyond quantity, the quality of the data also matters: data that does not capture the real relationship between the features and their labels is of no use.


2. Data Preprocessing

Preprocessing the data is essential before feeding it to the algorithm: removing irrelevant features, merging highly correlated features, dropping or imputing missing values, and converting everything to numeric form. Suppose the data contains a feature representing the country, and your dataset spans many countries that might be moderately correlated with your output, so you might not want to remove it. Instead, you can convert it into a one-hot encoding: a zero vector whose length equals the number of countries in a known list, with a one at the position representing the country of that data point. Finally, the data is split into X and Y, that is, a vector of features and the target (or label).
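The one-hot encoding described above can be sketched in a few lines. This is a minimal illustration over a hypothetical country list; in practice you would typically reach for a library helper such as scikit-learn's OneHotEncoder or pandas.get_dummies.

```python
# Hypothetical known list of countries (illustrative, not from the post).
COUNTRIES = ["France", "Germany", "India", "Japan"]

def one_hot(country, categories=COUNTRIES):
    """Return a zero vector with a 1 at the country's position."""
    vec = [0] * len(categories)
    vec[categories.index(country)] = 1
    return vec

print(one_hot("India"))  # → [0, 0, 1, 0]
```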

3. Splitting the training data

After training on known data, the model might not perform well on data it has never seen before. One way to find out is customer feedback after deploying to production; a safer way is to split your data into a training set and a test set (generally 80-20%) and evaluate on the test set before deploying. Some parameters are not learned by the model (called hyperparameters), such as the size or complexity of the model, the number of trainable variables, and the learning rate. These are tuned by observing the loss on held-out data. But if you tune them against the test set, your model might still not generalize well, because you have indirectly fit the test data by looking at it. Thus the data is split three ways into training, validation, and test sets, where the validation set is used to tune hyperparameters.
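A three-way split like the one above can be sketched with the standard library alone; the 80-10-10 fractions and the function name are illustrative (libraries like scikit-learn provide `train_test_split` for the same purpose).

```python
import random

def train_val_test_split(data, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle and split data into train/validation/test sets (80-10-10 by default)."""
    data = data[:]                       # copy so the caller's list is untouched
    random.Random(seed).shuffle(data)
    n = len(data)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = data[:n_test]
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(list(range(100)))
print(len(train), len(val), len(test))  # → 80 10 10
```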

4. Defining a Loss function

The loss function basically measures the distance between the actual expected output and the model's output. There are different kinds of loss functions; one of the most common for regression is the mean squared error: the sum of the squared differences between the actual target values and the model's outputs, divided by the number of data points (taking its square root gives the related RMSE).
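The mean squared error just described is a one-liner; a minimal sketch:

```python
def mse_loss(y_true, y_pred):
    """Mean squared error: average of the squared differences
    between targets and model outputs."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# (0^2 + 0.5^2 + 1^2) / 3 = 1.25 / 3 ≈ 0.4167
print(mse_loss([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))
```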

5. Choosing Optimizer and other Hyperparameters

Optimizers are algorithms that tune or adjust the trainable variables in your model in a certain way. One of the most common is the gradient descent optimizer: it computes the partial derivative of the loss function with respect to each trainable variable, multiplies it by a constant called the 'learning rate', and subtracts the result from the variable. The learning rate is generally kept small, around 10^-3 or below. The trainable variables may sometimes explode (in deep networks their values can get very high), or the model may overfit (memorize) the data points. Hence 'regularization' is used: the sum of the variables' values (or their squares), multiplied by a constant, is added to the total loss to keep the variable values small.
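The update rule above can be sketched for a tiny 1-D linear model y ≈ w*x + b with an L2 penalty. The names (w, b, lr, lam) and the toy data are illustrative, not from the post.

```python
def gd_step(w, b, xs, ys, lr=1e-3, lam=0.01):
    """One gradient-descent step on MSE loss plus an L2 penalty lam * w^2."""
    n = len(xs)
    dw = db = 0.0
    for x, y in zip(xs, ys):
        err = (w * x + b) - y      # model output minus target
        dw += 2 * err * x / n      # ∂(MSE)/∂w
        db += 2 * err / n          # ∂(MSE)/∂b
    dw += 2 * lam * w              # gradient of the regularization term
    # subtract learning rate times the gradient
    return w - lr * dw, b - lr * db

# Fit y = 2x on toy data; w should approach 2 and b should approach 0.
w, b = 0.0, 0.0
for _ in range(5000):
    w, b = gd_step(w, b, [1, 2, 3, 4], [2, 4, 6, 8], lr=0.01, lam=0.0)
print(round(w, 2), round(b, 2))
```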

6. Building the model

The model is defined based on the complexity of the task. For an easy task, say predicting house prices from size, region, and age, a simple 'regression' model will work. For a task like facial recognition, a much more complex and bigger model is required. The trainable variables are declared and initialized to small random values (generally drawn from a truncated normal distribution with mean 0 and standard deviation 1).
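Truncated-normal initialization, as mentioned above, can be sketched with the standard library: sample from a normal distribution and redraw anything that falls too far from the mean. The two-standard-deviation cutoff is a common convention, assumed here rather than stated in the post.

```python
import random

def truncated_normal(n, mean=0.0, std=1.0, clip=2.0):
    """Sample n values from N(mean, std), redrawing any sample that lands
    more than `clip` standard deviations from the mean."""
    out = []
    while len(out) < n:
        v = random.gauss(mean, std)
        if abs(v - mean) <= clip * std:
            out.append(v)
    return out

weights = truncated_normal(10)
print(all(abs(v) <= 2.0 for v in weights))  # → True
```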

7. Training the model

In the training process, outputs are obtained from the model for the training set, the training loss is calculated, and then the optimizer tunes the variables to take a step that reduces the loss. Generally, datasets contain thousands or millions of data points, and iterating the whole dataset through the model at once would take up a lot of time. Therefore, the model is trained in mini-batches (of size 32 or 64 or 128...) taken randomly from the training set.
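The random mini-batching described above can be sketched as a generator that shuffles the indices once per epoch and yields slices:

```python
import random

def minibatches(data, batch_size=32, seed=0):
    """Yield random mini-batches that together cover the dataset once (one epoch)."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    for start in range(0, len(idx), batch_size):
        yield [data[i] for i in idx[start:start + batch_size]]

batches = list(minibatches(list(range(100)), batch_size=32))
print([len(b) for b in batches])  # → [32, 32, 32, 4]
```

The last batch is smaller when the dataset size is not a multiple of the batch size; frameworks usually either keep it or drop it via an option.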

8. Inference

After training finishes, the Machine Learning model is ready for inference: a new data point (without its label) can be fed into the model to get a prediction. The model can also be deployed to a VM and exposed through a REST API for inference.
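At its simplest, inference is just running the trained model forward on a new input. A minimal sketch for a linear model, where the weights are hypothetical stand-ins for values that came out of training (in production, this same function would sit behind a REST endpoint):

```python
# Hypothetical learned parameters (illustrative, not real training output).
TRAINED_W, TRAINED_B = 2.0, 0.5

def predict(x):
    """Feed a new data point (no label) through the trained model."""
    return TRAINED_W * x + TRAINED_B

print(predict(3.0))  # → 6.5
```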















