Methodology for building data solutions that solve business problems.

Most data solutions fail, for several reasons. The most common is basing everything on buying a particular technology. In reality, no Big Data tool or algorithm will magically solve our problems, and none can be deployed in a "plug & go" fashion. Success requires attending to several areas, planning our actions and, in many cases, being willing to iterate.

To improve our chances of success, I have prepared a checklist of elements to address in our planning.

Steps to implement business-oriented, data-driven solutions:

1. Hypothesize a solution to a well-defined problem.

When choosing a problem, it is important to pick concrete, high-impact business problems that have clearly affected business partners and where a quick return on investment can be demonstrated. In other words, look for quick wins! You will need collaboration and data from the business units, and companies tend not to share data.

For example, my problem may be that I lack a unified view of my customers, and as a solution I could propose implementing a CRM. But that will require a lot of implementation work and cultural change, and it will probably take more than 2 years before I start to reap the benefits. It is preferable to start with more specific problems: reducing churn, improving the response to my telemarketing campaigns, improving demand forecasts, etc. That way I can earn recognition early and gain allies before I dare to tackle more complex problems.

The problem must be well defined. That means identifying who has the problem (units, departments, customers), assessing its impact, and stating its causes and a hypothesis for solving it with data.

If my goal is to improve demand forecasting for air conditioning equipment, I assess whether a data model can tell me how much the market will demand, so that I avoid both excess stock and lost sales from running short. In my evaluation, I will state that the model lets me reduce costs by lowering stock levels, or sell more in peak periods by not running out of products that are temporarily in high demand. It is even better if I can quantify these objectives, e.g. a 20% stock reduction.
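Quantifying an objective such as the 20% stock reduction can be sketched as simple arithmetic. The following is only an illustration: the function name, the average stock level and the holding cost per unit are hypothetical figures invented for the example, not data from a real case.

```python
def stock_reduction_saving(avg_stock_units, unit_holding_cost, reduction_pct):
    """Estimate the yearly holding-cost saving from a stock reduction.

    avg_stock_units   : average number of units kept in stock (hypothetical)
    unit_holding_cost : yearly cost of holding one unit (hypothetical)
    reduction_pct     : targeted stock reduction, as a percentage (0-100)
    """
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    return avg_stock_units * unit_holding_cost * reduction_pct / 100


# Hypothetical figures: 1,000 units in stock on average, 50.0 per unit per year.
saving = stock_reduction_saving(
    avg_stock_units=1_000, unit_holding_cost=50.0, reduction_pct=20
)
print(f"Estimated yearly saving: {saving:.2f}")
```

A figure like this gives the business units a concrete target against which the solution can later be measured.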

2. Selection of data that would help me in my model.

In this step we analyze what data our model needs in order to work. To do this, I must map the data required for our solution, distinguishing whether it is inside my organization (internal) or in the market (external), and whether or not it can be easily processed. For example, videos, logs and photographs can nowadays be transformed into data, but only with considerable processing effort.

Internal information is the easiest and cheapest to obtain. That is why we must start by analyzing what data sources exist in my company, who owns them and how the data is currently used. This requires asking the different business units (IT, Finance, Marketing...) a lot of questions.

Our own data alone is probably not enough, especially when we are looking for answers about the market and not just about our customers. Much of the information we need may be available as open data. Companies like Deyde DataCentric can help you complement your data and offer you a complete data catalog. In the example of predicting demand for air conditioning equipment, data such as historical sales by date, vacation calendars, land registry data, data from other suppliers in the supply chain and, of course, meteorological data could all help our model.
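As a minimal sketch of combining internal and external data, the example below enriches internal sales history with an external temperature series by date. All names and values here (sales_history, max_temp_by_date, the figures themselves) are hypothetical and exist only to illustrate the idea.

```python
from datetime import date

# Internal data: historical sales by date (hypothetical values).
sales_history = [
    {"date": date(2023, 7, 1), "units_sold": 12},
    {"date": date(2023, 7, 2), "units_sold": 18},
    {"date": date(2023, 7, 3), "units_sold": 9},
]

# External data: daily maximum temperature in Celsius (hypothetical values).
max_temp_by_date = {
    date(2023, 7, 1): 31.5,
    date(2023, 7, 2): 35.0,
    # 2023-07-03 is deliberately missing to show how gaps surface.
}


def enrich_sales_with_temperature(sales_rows, temperature_by_date):
    """Return copies of the sales rows with a 'max_temp_c' field added.

    Dates without an external reading get None, so that gaps in the
    external source stay visible instead of being silently dropped.
    """
    return [
        {**row, "max_temp_c": temperature_by_date.get(row["date"])}
        for row in sales_rows
    ]


enriched = enrich_sales_with_temperature(sales_history, max_temp_by_date)
for row in enriched:
    print(row["date"].isoformat(), row["units_sold"], row["max_temp_c"])
```

Keeping unmatched rows, rather than discarding them, makes it easier to see how well the external source actually covers our own data.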

3. Cleaning and preparing data for analysis.

An important issue is the data quality processes used to clean, validate and relate the different datasets, because with bad data the answers of any model will be wrong.

In many cases this is where much of our effort goes: more than 50% of the time on average. That is why it can pay to rely on professional solutions that reduce this effort, such as ours from MyDataQ.
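To give an idea of what cleaning and validating involves, here is a small sketch in plain Python. The field names (date, store, units_sold) and the rules applied are assumptions chosen for the example; a real project would define its own rules with the business units.

```python
from datetime import date


def clean_sales_rows(raw_rows):
    """Split raw rows into (cleaned, rejected).

    Rules assumed for this sketch:
      - 'date' must be an ISO date (YYYY-MM-DD)
      - 'units_sold' must be a non-negative integer
      - 'store' is trimmed and upper-cased so that variants match
      - only the first row per (date, store) pair is kept
    """
    cleaned, rejected = [], []
    seen = set()
    for row in raw_rows:
        date_text = (row.get("date") or "").strip()
        units_text = str(row.get("units_sold", "")).strip()
        try:
            day = date.fromisoformat(date_text)
            units = int(units_text)
        except ValueError:
            rejected.append(row)  # unparseable date or quantity
            continue
        if units < 0:
            rejected.append(row)  # negative sales are invalid here
            continue
        store = (row.get("store") or "").strip().upper()
        key = (day, store)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append({"date": day, "store": store, "units_sold": units})
    return cleaned, rejected


raw = [
    {"date": "2023-07-01", "store": " madrid ", "units_sold": "12"},
    {"date": "2023-07-01", "store": "MADRID", "units_sold": "12"},  # duplicate
    {"date": "01/07/2023", "store": "Madrid", "units_sold": "5"},   # bad date
    {"date": "2023-07-02", "store": "Madrid", "units_sold": "-3"},  # negative
]
cleaned, rejected = clean_sales_rows(raw)
print(len(cleaned), "clean rows,", len(rejected), "rejected rows")
```

Returning the rejected rows, instead of discarding them, lets us report data quality problems back to whoever owns the source.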

4. Choice of the analytical model I can use.

The complexity of the model and its ease of application will vary depending on whether you want to explain what has happened or predict the future.

Data scientists and analysts will advise you on the most suitable statistical model or algorithm.

In any case, in my opinion, it is not advisable to fall in love with hype. It is not mandatory to use sophisticated models such as neural networks, gradient boosting or random forests. Sometimes simpler statistical models are the better choice. A simple model that explains what is happening, or that lets me improve my current situation, is better than a black box that predicts the future without any understanding of what its predictions are based on.
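As an illustration of a simple, explainable model, the sketch below fits a straight line by ordinary least squares, relating daily maximum temperature to units sold. The data points are hypothetical and were chosen to lie exactly on a line so that the result is easy to read; real data would be noisier and would need proper validation.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be identical")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: max temperature (Celsius) vs. units sold.
temperatures = [20, 25, 30, 35]
units_sold = [10, 20, 30, 40]

slope, intercept = fit_line(temperatures, units_sold)
print(f"Each extra degree is associated with about {slope:.1f} more units sold")

# Using the fitted line for a rough forecast at 32 degrees.
forecast = intercept + slope * 32
print(f"Forecast at 32 degrees: {forecast:.1f} units")
```

The appeal of a model like this is that the slope can be read and discussed directly by the business, which a black box does not allow.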

5. KPIs to evaluate the achievement of our hypotheses and testing plan.

I will need metrics to evaluate whether our hypotheses hold: KPIs that respond to the hypotheses raised and tell us how we are doing. Be careful: we start with hypotheses to test, so we must be prepared to iterate. The solution may or may not work, but we may also discover unexpected benefits.

There is always an opportunity to test, whether with A/B or multivariate tests, and ideally always against a control group.
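To make the idea of testing against a control group concrete, here is a minimal sketch of a two-proportion z-test comparing the conversion rate of a control group with that of a variant. The function name and the campaign figures are hypothetical, and the z-test is just one common choice that assumes reasonably large samples.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_control, n_control, conv_variant, n_variant):
    """Return (z, two_sided_p_value) for the difference in conversion rates."""
    rate_control = conv_control / n_control
    rate_variant = conv_variant / n_variant
    # Pooled rate under the null hypothesis that both groups convert equally.
    pooled = (conv_control + conv_variant) / (n_control + n_variant)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_control + 1 / n_variant))
    if std_error == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_variant - rate_control) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical campaign: 1,000 contacts per group,
# 100 responses in the control group and 130 in the variant.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
```

A small p-value suggests that the difference is unlikely to be due to chance alone, but the threshold to act on should be agreed with the business before the test starts.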

6. Necessary resources: HR, technology...

My planning should include a resource allocation. In terms of the team, we must identify whether we lack people prepared to interpret complex data sets, in which case it may be necessary to hire personnel competent in statistical and mathematical data analysis. In other cases we will need external companies to assist us in managing these data analysis projects or in executing specific tasks.

With the data model defined in the previous points, we must identify the technological solution that allows us to run our model without problems. Ideally, we can reuse a technology that we already have in-house. Otherwise, we should select a technology that can be integrated with what we have via API.

In any case, as a general recommendation, we should not be dazzled by magical functionalities. Database technology should be a means and not an end. It is better to have an Excel spreadsheet with good data and processes than a powerful platform with inconsistent data. A data analytics model with complete and accurate data is better than any complicated model with erroneous data.

In other words, the process should start with a well-defined business case, then select the analytical algorithms to be applied and only at the end see what technological support should be developed.

7. Action planning to ensure its implementation, monitoring and integration with the company's solutions.

Now it is time to bring our solution to life. We must define the actions that will make this happen, and we must lead them ourselves to ensure that the solution is actually used by the business units it is meant for. Otherwise, there is a risk that it will be forgotten in a drawer.

We must also ensure that the solution integrates well with existing systems and monitor the solution once it is in production to detect and solve possible problems.

8. Measuring, learning and sharing what we have learned.

Finally, everything we have done must be measured to see whether it works. Then the time comes to draw conclusions and share the results and lessons learned with all related areas, such as Marketing, IT or customer experience.

Giang Monday, September 18, 2023