1. Data Collection
In machine learning, data is the most important ingredient. Unlike a human, who can recognize a face after seeing it only a few times, an ML model needs tons of data. A 2001 paper from Microsoft showed that, given sufficient data, moderately simple and complex models performed almost the same.
Quality matters as well: data that does not capture the true relationship between the features and their label is of no use.
2. Data Preprocessing
Preprocessing the data before feeding it to the algorithm is essential: removing irrelevant features, merging highly correlated features, removing or imputing missing values, and converting everything to numeric values. Suppose the data contains a feature representing the country, and your dataset spans many countries that are moderately correlated with the output, so you do not want to drop the feature. You can convert it into a one-hot encoding: a zero vector whose length equals the number of countries in a known list, with a one at the position representing that data point's country. Finally, the data is split into X and y, that is, the vector of features and the target (label).
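The one-hot encoding and the X/y split described above can be sketched as follows. The country list and toy rows here are assumptions for illustration, not part of any particular dataset:

```python
import numpy as np

# Hypothetical list of known countries; in practice it comes from the dataset.
COUNTRIES = ["France", "Germany", "India", "Japan"]

def one_hot(country, categories=COUNTRIES):
    """Return a zero vector with a one at the country's index."""
    vec = np.zeros(len(categories))
    vec[categories.index(country)] = 1.0
    return vec

# Toy rows: (country, size, price). X holds features, y holds the target.
rows = [("India", 70.0, 150.0), ("Japan", 55.0, 300.0)]
X = np.array([np.concatenate([one_hot(c), [size]]) for c, size, _ in rows])
y = np.array([price for _, _, price in rows])
```

Each row of X now contains the four-element country vector followed by the numeric feature, so every feature is numeric and ready for the algorithm.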
3. Splitting the training data
A model trained on known data might not perform well on data it has not seen before. One way to find this out is customer feedback after deploying to production; a safer way is to split the data into a training set and a test set (generally 80-20%) and evaluate on the test set before deploying. Some parameters are not learned by the model (called hyperparameters), such as the size or complexity of the model, the number of trainable variables, and the learning rate. These are tuned by observing the loss on held-out data. But if you tune them against the test set, the model may still not generalize well, because you have made choices by looking at the test data. The data is therefore split into three sets: training, validation, and test, where the validation set is used to tune hyperparameters.
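A minimal sketch of the three-way split, assuming an 80-10-10 ratio and a fixed seed for reproducibility:

```python
import random

def train_val_test_split(data, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle and split into training, validation, and test sets."""
    data = list(data)
    random.Random(seed).shuffle(data)
    n = len(data)
    n_test = int(n * test_frac)
    n_val = int(n * val_frac)
    test = data[:n_test]
    val = data[n_test:n_test + n_val]
    train = data[n_test + n_val:]
    return train, val, test

train, val, test = train_val_test_split(range(100))
```

Shuffling before splitting matters: if the data is ordered (say, by date or by class), an unshuffled split would give the test set a different distribution than the training set.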
4. Defining a Loss function
The loss function measures the distance between the actual expected output and the model's output. There are different kinds of loss functions; one of the most common is the mean squared error (MSE): the sum of the squared differences between the actual target values and the model outputs, divided by the number of data points. Taking its square root gives the root mean squared error (RMSE), which is in the same units as the target.
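The two losses above, written out directly:

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: average of squared differences."""
    return np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return np.sqrt(mse(y_true, y_pred))
```

For example, `mse([1, 2, 3], [1, 2, 5])` averages the squared errors 0, 0, and 4, giving 4/3.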
5. Choosing Optimizer and other Hyperparameters
Optimizers are algorithms that tune or adjust the trainable variables in your model. One of the most common is the gradient descent optimizer: it subtracts from each trainable variable the partial derivative of the loss function with respect to that variable, multiplied by a constant called the 'learning rate'. The learning rate is generally kept around 10^-3 or below. Trainable variables may sometimes explode (in deep networks their values can get very high), or the model may overfit (memorize) the data points. 'Regularization' counters this: the sum of the (squared) variable values, multiplied by a constant, is added to the total loss to keep the variable values small.
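One gradient-descent update with L2 regularization can be written as below; the learning rate of 10^-3 matches the text, and the regularization constant is an illustrative choice:

```python
import numpy as np

def sgd_step(w, grad, lr=1e-3, l2=0.01):
    """One gradient-descent update with L2 regularization:
    w <- w - lr * (dL/dw + l2 * w)."""
    return w - lr * (grad + l2 * w)

w = np.array([1.0, -2.0])
w = sgd_step(w, grad=np.array([0.5, 0.5]))
```

The `l2 * w` term is the gradient of the regularization penalty; it nudges every variable toward zero on each step, which is what keeps the values from exploding.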
6. Building the model
The model is chosen based on the complexity of the task. For an easy task, say predicting house prices from size, region, and age, a simple 'regression' model will work. For a task like facial recognition, a much more complex and bigger model is required. The trainable variables are declared and initialized to small random values (generally drawn from a truncated normal distribution with mean 0 and standard deviation 1).
7. Training the model
During training, the model's output is computed on the training set, the training loss is calculated, and the optimizer adjusts the variables to take a step that reduces the loss. Datasets generally contain thousands or millions of data points, and pushing the whole dataset through the model at once would take a lot of time. Therefore, the model is trained on batches (of size 32 or 64 or 128...) drawn randomly from the training set.
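The loop below ties the pieces together: random mini-batches, a loss gradient, and a gradient-descent step. The toy linear model and the learning rate are assumptions chosen so the example converges, not values from the text:

```python
import numpy as np

def minibatches(X, y, batch_size=32, rng=None):
    """Yield random mini-batches covering the training set once (one epoch)."""
    rng = rng or np.random.default_rng(0)
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

# Toy data generated by a known linear model, y = X @ w_true.
X = np.random.default_rng(1).normal(size=(100, 3))
w_true = np.array([2.0, -1.0, 0.5])
y = X @ w_true

w = np.zeros(3)
for epoch in range(200):
    for xb, yb in minibatches(X, y):
        grad = 2 * xb.T @ (xb @ w - yb) / len(xb)  # gradient of the MSE loss
        w -= 0.05 * grad                           # gradient-descent step
```

After enough epochs, `w` recovers the weights that generated the data; each epoch reshuffles so the batches differ from one pass to the next.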
8. Inference
After training finishes, the machine learning model is ready for inference: a data point's features (without the label) can be fed into the model to get a prediction. The model can also be deployed to a VM, with REST APIs exposed for inference.
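Inference on a trained linear model reduces to a single function call; the weights and feature values below are illustrative, not from a real trained model. In a deployed setting this `predict` function would sit behind a REST endpoint (for example, a Flask or FastAPI route):

```python
import numpy as np

# Assumed weights and bias from a previously trained linear model.
w = np.array([2.0, -1.0, 0.5])
b = 10.0

def predict(features):
    """Run inference: feed a feature vector (no label) through the model."""
    return float(np.asarray(features) @ w + b)

price = predict([70.0, 1.0, 5.0])  # e.g. size, region code, age
```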
