
How and where Data Science models are chosen

The main and best-known library is scikit-learn, or sklearn. It contains a large number of tools for machine learning.
In the last article, we got acquainted with three basic, practical models (algorithms) used in machine learning:
•Decision tree
•Random forest
•Logistic regression

How do you choose the best solution, even among the three models above? To identify “the best”, you need to train the models: build a logistic regression, a random forest, or a decision tree, and see which one best fits the training set. The learning algorithms and their configurable hyperparameters are responsible for this. It is also very important to prepare the samples the model will use.
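As a minimal sketch of what configuring a hyperparameter can look like, the example below tries several values of `max_depth` for a decision tree and keeps the one that scores best on a held-out validation sample. The synthetic data from `make_classification`, the 75/25 split, and the depth range 1 to 10 are assumptions made for illustration, not part of the original article.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
features, target = make_classification(n_samples=500, n_features=10, random_state=0)

# Prepare the samples: one part for training, one part for validation.
features_train, features_valid, target_train, target_valid = train_test_split(
    features, target, test_size=0.25, random_state=0
)

best_depth, best_score = None, 0.0
for depth in range(1, 11):
    # max_depth is a hyperparameter: it is set before training, not learned.
    model = DecisionTreeClassifier(max_depth=depth, random_state=0)
    model.fit(features_train, target_train)
    # score() returns the share of correct answers on the validation sample.
    score = model.score(features_valid, target_valid)
    if score > best_score:
        best_depth, best_score = depth, score

print(f"best max_depth={best_depth}, validation accuracy={best_score:.3f}")
```

Scoring on a validation sample rather than on the training sample is a deliberate choice here: a deep tree can memorize its training data and still answer poorly on new objects.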

Once training is complete, the model is able to predict: it accepts new objects (their features) as input and produces answers (the target feature). Thus, the machine learning procedure can be divided into two stages: training the model and operating the trained model.
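The two stages can be sketched in a few lines with sklearn. This is only an illustration: the data is synthetic, and treating the last 50 rows as "new objects" is an assumption chosen to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic features and target (assumption for illustration).
features, target = make_classification(n_samples=200, n_features=5, random_state=0)

# Stage 1: training. The model learns from objects with known answers.
model = LogisticRegression(max_iter=1000)
model.fit(features[:150], target[:150])

# Stage 2: operation. The trained model receives new objects (features)
# and returns its answers (predicted target feature).
new_objects = features[150:]
answers = model.predict(new_objects)

print(answers[:10])
```

The same `fit` and `predict` pair works for the decision tree and the random forest, so swapping the model does not change the structure of the code.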

It is not necessary to work with all three models at once. Each has its own strengths and weaknesses. Let's evaluate the models in terms of quality (accuracy) and speed:

1. Quality (accuracy). This is the most important criterion for business: the higher the quality, the more profit the product brings. The formula is simple: the number of correct answers divided by the total number of answers.

2. Speed. An equally significant criterion: if the service is slow, users will inevitably leave.

3. In practice, there are separate, more complex metrics for checking the effectiveness of models, which build on measures such as accuracy, precision, and recall (completeness) of the predictions. The corresponding functions from the Python sklearn library are used for such checks.
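The criteria above can be measured directly. The sketch below trains the three models on synthetic data, computes accuracy both by the formula (correct answers divided by all answers) and with `accuracy_score`, times the prediction step, and also reports precision and recall, which is how "accuracy and completeness" is interpreted here. The dataset, the hyperparameter values, and the use of `time.perf_counter` for timing are assumptions for illustration.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
features, target = make_classification(n_samples=1000, n_features=10, random_state=0)
features_train, features_test, target_train, target_test = train_test_split(
    features, target, test_size=0.25, random_state=0
)

# Hyperparameter values are arbitrary examples, not recommendations.
models = {
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "random forest": RandomForestClassifier(n_estimators=50, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

results = {}
for name, model in models.items():
    model.fit(features_train, target_train)

    # Speed: time only the prediction step, which is what users wait for.
    start = time.perf_counter()
    predictions = model.predict(features_test)
    elapsed = time.perf_counter() - start

    results[name] = {
        # Quality by the formula: correct answers / all answers.
        "manual_accuracy": (predictions == target_test).sum() / len(target_test),
        # The same value from the library, plus two further metrics.
        "accuracy": accuracy_score(target_test, predictions),
        "precision": precision_score(target_test, predictions),
        "recall": recall_score(target_test, predictions),
        "predict_seconds": elapsed,
    }

for name, metrics in results.items():
    print(name, {key: round(float(value), 4) for key, value in metrics.items()})
```

Timings from a single run on a small sample are noisy, so they are best read as a rough ordering of the models rather than as exact figures.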

The source code of learning algorithms is usually much more complex than the code that uses a trained model. Inside the learning algorithms, developers have implemented complex mathematical functions that let us apply an algorithm to a particular problem. The most important thing for a Data Science specialist is to understand which algorithm suits a specific task, to be able to configure it, and to be able to work with its output.

Many machine learning algorithms are currently available in Python libraries. The main and best-known one is scikit-learn, or sklearn. It contains a large number of tools, so they are organized into sections (modules). For example, the decision tree is located in the tree section. The other two algorithms above, as well as the tools for checking model quality, are also contained in this library.
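As a small illustration of this layout, the imports below show where the three models and one quality metric live in sklearn. The choice of `accuracy_score` as the example metric and the helper dictionary `sections` are assumptions made for this sketch.

```python
# Each tool is imported from its own section (module) of the library.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Map each tool to the section it comes from, read from its module path
# (for example "sklearn.tree...." gives "tree").
sections = {
    tool.__name__: tool.__module__.split(".")[1]
    for tool in (
        DecisionTreeClassifier,
        RandomForestClassifier,
        LogisticRegression,
        accuracy_score,
    )
}

for tool_name, section in sections.items():
    print(f"{tool_name}: sklearn.{section}")
```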

I continue to publish articles about business development, including digital and information technologies, and I also continue to offer business consulting. If you are interested in articles on this topic, subscribe to my Telegram channel: https://t.me/biz_in. If you need business consulting support, visit my website: https://akonnov.ru