Imagine that we are implementing a piece of software. Let's take something from Data Science, for example: predictive engines (predicting the behavior of customers, competitors, partners), analytical engines (collecting information about the placement of our goods in networks, clustering our customers to form groups and unique offers), or control systems (control over the condition of equipment, warehouse control, etc.).
What are the main stages we can foresee in such a project?
1) Analysis of the project and an audit of the as-is and to-be processes, resulting in the corresponding terms of reference (ToR).
2) Analysis of the possibilities and options for fulfilling the technical specifications, taking into account internal and external factors.
3) Modeling the project with financial and project risks taken into account, and building a feasibility study.
4) Search for and selection of contractors, updating and approval of all essential conditions.
5) Project implementation with control of targets and deadlines, followed by monitoring of the KPIs.
In general, these stages are obvious to many, so let's take a closer look at the performers and the specifics of executing such a project.
Apart from management, we need employees who cover the following functions:
1. Data analyst / Business analyst / Internal auditor / Expert, to analyze and describe the processes and draw up the technical specifications, plus a Methodologist for training and for writing instructions. The data analyst should be able to analyze data in spreadsheets and statistical tools (Excel, R, etc.), use visualization and business intelligence tools (Tableau, Power BI, Metabase, etc.), and be familiar with methods of searching for, obtaining and cleaning the initial information (R, Python, SQL, Power Query). The main skill is finding patterns and deviations, then building models and proposals based on that research and presenting the results.
2. Data scientist, to write the source script. This person must understand mathematics (mathematical analysis, mathematical statistics, numerical methods, etc.), libraries and package ecosystems (NumPy, Pandas, PyTorch, R packages from CRAN, etc.) and programming languages (Python, R, C#, Java, etc.). The Data scientist thus analyzes the approved task, then formulates and proposes a technical solution for it.
As I described earlier, writing everything from scratch is rarely practiced today; most teams rely on ready-made libraries (think of them as ready-made scripts) to solve technical problems.
3. Data engineer (programmer), to implement the algorithm received from the Data scientist and integrate the solution into the organization's IT system. The Data engineer must feed data correctly to the script's input, embed the script's result into the project, and ensure efficient use of system resources. Keep in mind that the Data engineer needs to integrate scripts into reports, databases, data flows and infrastructure. The Data engineer should also build new levels of abstraction (efficient orchestration of the system) and provide flexibility and the required level of performance and scalability.
The efficiency, effectiveness and economy of the system depend on the quality of the Data engineer's work. If the scalability of the system is not considered, the customer may face an insurmountable challenge when the load on the system increases. The project budget can also vary greatly depending on the chosen solution (for example, in-house capacity or the cloud).
Accordingly, the Data engineer must understand programming languages (Python, R, C#, Java, etc.), be able to work with the databases where the source information is stored (relational ones such as MySQL, NoSQL stores, etc.), and know distributed data processing frameworks (Hadoop, Spark, etc.) that use multiple servers, computers or clusters to process large amounts of data. Knowledge of the specifics of data collection through sensors and IoT (Internet of Things) is also a plus.
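To make the Data engineer's part more concrete, here is a minimal sketch in Python of the three duties described above: feeding clean data to the script's input, calling the script, and embedding its result into records that a report or database could consume. Everything in it is an assumption made for illustration: the `Reading` record, the `score_batch` function (a simple stand-in for whatever the Data scientist actually delivers, here a two-standard-deviation check), and the batch size. Processing the data in fixed-size batches is just one simple way to keep memory use bounded as the load grows; a real system might use Spark or a similar framework instead.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import Iterable, Iterator


@dataclass
class Reading:
    """One equipment sensor reading (hypothetical record layout)."""
    sensor_id: str
    value: float


def score_batch(values: list[float]) -> list[bool]:
    """Stand-in for the Data scientist's script (an assumption):
    flag values more than two standard deviations from the batch mean."""
    if len(values) < 2:
        return [False] * len(values)
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return [False] * len(values)
    return [abs(v - mu) > 2 * sigma for v in values]


def clean(rows: Iterable[dict]) -> Iterator[Reading]:
    """Input step: keep only rows the script can actually handle."""
    for row in rows:
        try:
            yield Reading(str(row["sensor_id"]), float(row["value"]))
        except (KeyError, TypeError, ValueError):
            continue  # a real pipeline would log or quarantine bad rows


def batches(items: Iterable[Reading], size: int) -> Iterator[list[Reading]]:
    """Group readings into fixed-size batches to bound memory use."""
    batch: list[Reading] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_pipeline(rows: Iterable[dict], batch_size: int = 1000) -> list[dict]:
    """Clean the input, score it batch by batch, and return plain
    records ready to be written to a report or a database table."""
    results = []
    for batch in batches(clean(rows), batch_size):
        flags = score_batch([r.value for r in batch])
        for reading, flagged in zip(batch, flags):
            results.append({
                "sensor_id": reading.sensor_id,
                "value": reading.value,
                "anomaly": flagged,
            })
    return results
```

Calling `run_pipeline(rows)` on a list of dictionaries with `sensor_id` and `value` keys returns one record per valid row with an added `anomaly` flag. Keeping `score_batch` behind a narrow interface means the Data scientist can replace the model without the Data engineer having to touch the input and output steps.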