Model training resembles the way a human expert is trained: you collect a set of relevant data, classify it, analyze the relationships in it, and build up experience. To solve a problem with machine learning methods, you need to feed the algorithm a sufficient amount of data from which it will learn. This data is called the training dataset or training sample.
To make predictions, you need to identify the relationship between the features of the original data and the responses (the desired values). The data scientist starts by making an assumption about how these relationships work and then makes predictions based on that assumption. If the predictions match reality, the assumption is considered a reasonable one. This approach is called “modeling”, and the assumptions together with the prediction methods are called “machine learning models”.
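As a minimal sketch of this idea, the Python snippet below treats a model as an assumption that can be checked against known responses. The data (hours studied and exam result), the threshold of 4 hours, and the function names are all hypothetical and chosen only for illustration.

```python
# Hypothetical toy training data: (hours_studied, passed_exam) pairs.
training_data = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1)]


def model(hours, threshold=4):
    """Assumed relationship: a student passes after studying more than `threshold` hours."""
    return 1 if hours > threshold else 0


def accuracy(data, predict):
    """Share of examples where the prediction matches the known response."""
    correct = sum(1 for x, y in data if predict(x) == y)
    return correct / len(data)


# Compare the predictions with reality to judge the assumption.
print(accuracy(training_data, model))
```

If the share of correct predictions is high, the assumption is worth keeping; if it is low, the assumption (here, the threshold) needs to be revised.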
Today we will get acquainted with basic, practical models that can be used for forecasting and classification:
• Decision tree
• Random forest
• Logistic regression
A decision tree can describe the decision-making process in almost any problem. Based on the feature values, it asks a sequence of questions with “Yes” / “No” answers, and each path through the tree ends in a decision or action.
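To make the “Yes” / “No” structure concrete, here is a small hand-written tree in Python. It is only an illustration: the loan scenario, the feature names (`income`, `has_debt`, `has_guarantor`) and the cut-off of 50000 are assumptions, and a real learning algorithm would derive the questions from the training data instead of having them written by hand.

```python
def decide_loan(applicant):
    """Hand-written decision tree: each `if` is one Yes/No question about a feature."""
    if applicant["income"] >= 50000:      # Question 1: is the income high enough?
        if applicant["has_debt"]:         # Yes branch: does the applicant have debt?
            return "review manually"
        return "approve"
    if applicant["has_guarantor"]:        # No branch: is there a guarantor?
        return "review manually"
    return "reject"


# Hypothetical applicant used only to show how the tree is walked.
applicant = {"income": 62000, "has_debt": False, "has_guarantor": False}
print(decide_loan(applicant))
```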
A random forest is a learning algorithm that builds a number of trees independently of each other and then chooses the final answer by a vote among them. In some cases, a random forest improves prediction quality and helps avoid overfitting.
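The voting idea can be sketched in plain Python as follows. This is a deliberately simplified stand-in, not a full random forest: each "tree" is a one-question tree (a single threshold), each one is trained on its own random resample of the data, and the toy data, the function names and the choice of 25 trees are assumptions made for illustration.

```python
import random
from collections import Counter


def train_stump(sample):
    """Fit a one-question tree: pick the threshold with the fewest errors on `sample`."""
    best_threshold, best_errors = None, None
    for threshold, _ in sample:
        errors = sum(1 for x, y in sample if (1 if x > threshold else 0) != y)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def train_forest(data, n_trees=25, seed=0):
    """Train each tree independently on its own sample drawn with replacement."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(data) for _ in data]) for _ in range(n_trees)]


def predict_forest(forest, x):
    """Every tree votes for a class; the most common vote is the final answer."""
    votes = Counter(1 if x > threshold else 0 for threshold in forest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: (feature value, class label) pairs.
data = [(1, 0), (2, 0), (3, 0), (5, 1), (6, 1), (8, 1)]
forest = train_forest(data)
print(predict_forest(forest, 2), predict_forest(forest, 7))
```

Using an odd number of trees avoids tied votes when there are only two classes.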
Logistic regression is a classification algorithm that predicts the probability of an event by fitting a logistic curve to the data. The number of parameters in logistic regression is usually small. This makes it hard for the algorithm to adapt too closely to the training data, so the risk of overfitting can be lower.
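The prediction step of logistic regression can be sketched in a few lines of Python. The weight `1.2` and the bias `-4.8` below are hypothetical, hand-picked values; in practice they would be fitted to the training data. The 0.5 cut-off for turning a probability into a class is also an assumption, although a common one.

```python
import math


def predict_probability(features, weights, bias):
    """Apply the logistic curve to a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters: one weight per feature plus a bias term.
weights, bias = [1.2], -4.8

p = predict_probability([5], weights, bias)  # probability of the event for x = 5
label = 1 if p >= 0.5 else 0                 # turn the probability into a class
print(round(p, 3), label)
```

With a single feature the model has only two parameters, which illustrates why it has limited room to fit itself too closely to the training data.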
In the next article we will take a closer look at how to compare models with each other and evaluate their quality. I continue to publish articles about business development, including digital and information technologies, and to provide business consulting. If you are interested in articles on this topic, subscribe to my Telegram channel: https://t.me/biz_in. If you need business consulting support, visit my website: https://akonnov.ru/.