How to Analyze Abnormal Data Fluctuations

Editor’s note: Analyzing abnormal data fluctuations is both a frequent topic in data analysis interviews and a routine part of business analysis work. How should you proceed when a metric fluctuates abnormally? The author of this article shares their own approach.

I. What abnormal data fluctuations look like

Analyzing abnormal data fluctuations is both a frequent topic in data analysis interviews and a routine part of business analysis work. Across business scenarios, products, and operations, we often run into abnormal data, such as:

An app’s daily active users (DAU) suddenly fell 10% compared with yesterday. How should you analyze it?
A company’s sales revenue fell 15% from the previous month. How do you analyze it?
A product’s average order value dropped 20% from the previous month. How do you analyze it?
……

All of these scenarios come down to the same thing: a metric has fluctuated abnormally. So where should the analysis begin? Most people make the cause itself the target: they look for whatever seems abnormal and then for a reason to explain it. The biggest drawback of this approach is that it is not systematic. It often turns up some causes, but the picture is likely to be one-sided, and it can even lead you into a pitfall. Below I share a fairly practical approach that I call the “point-line-face” analysis.

II. Methodology for analyzing abnormal data

1. Point

First check data accuracy to determine whether the fluctuation is simply a data error. Errors tend to arise in the data collection stage (event tracking), the data extraction stage, the product itself (bugs), and the business side (metric definitions). If no problem is found, move on to the next step.

2. Line

Next, perform a longitudinal (time-series) analysis to determine whether the change is a periodic fluctuation. Some industries are strongly affected by peak and off seasons, for example home appliances, beverages, and online education. If seasonality does not explain the change, move on to the next step.
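One way to sketch this periodicity check is to compare this year's month-over-month change with the change for the same month one year earlier. The function names, the `tolerance` threshold, and the revenue figures below are all assumptions made for illustration; none of them come from the article.

```python
def month_over_month_change(series, year, month):
    """Relative change of (year, month) versus the previous calendar month."""
    if month > 1:
        prev_key = (year, month - 1)
    else:
        prev_key = (year - 1, 12)
    previous = series[prev_key]
    current = series[(year, month)]
    return (current - previous) / previous


def looks_seasonal(series, year, month, tolerance=0.05):
    """Return True if the same month one year earlier moved by a similar amount.

    `tolerance` is an assumed threshold on the difference between the two
    relative changes; tune it to the metric being analyzed.
    """
    this_year = month_over_month_change(series, year, month)
    last_year = month_over_month_change(series, year - 1, month)
    return abs(this_year - last_year) <= tolerance


# Hypothetical monthly revenue keyed by (year, month), for illustration only.
revenue = {
    (2020, 5): 110.0, (2020, 6): 112.0,
    (2021, 5): 120.0, (2021, 6): 100.0,
}

print(looks_seasonal(revenue, 2021, 6))  # False
```

With these made-up figures, June rose slightly in 2020 but fell by about 17% in 2021, so the drop does not look like a repeat of last year's pattern. A real check would usually cover more than one prior year.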


3. Face

First, combine the formula decomposition method, the multi-dimensional decomposition method, and similar techniques to break the big problem into small problems that can be analyzed one by one. Then calculate the influence coefficient of each part. Finally, hypothesize boldly and verify carefully, narrowing down the root cause step by step.

For example, suppose sales revenue has declined. After completing the “point” and “line” steps of the “point-line-face” analysis, we can apply a three-step analysis in the “face” step. The main process is as follows:

1) Break the big problem down using the formula decomposition method, the multi-dimensional decomposition method, and so on.

2) Calculate the influence coefficient to locate the main cause: influence coefficient = (segment value this month - segment value last month) / (total this month - total last month)

The larger a segment's influence coefficient, the more that segment is the main factor driving the overall fluctuation.

3) Form hypotheses and verify them against data, narrowing down the root cause step by step.

Suppose that, after calculating the influence coefficients, we tentatively conclude that the revenue decline was caused by new users. Next we need to hypothesize why new users declined. Hypotheses are commonly grouped into internal and external dimensions; the internal dimensions mainly include the channel side, product side, operations side, and technical side. Then verify each hypothesis against data, one by one.

These are a few common dimensions for splitting; an initial split along them roughly narrows down the scope of the problem.
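The influence coefficient from step 2) above can be sketched as a small function. This is a minimal illustration, assuming the segments add up to the total in each period; the function name and the segment figures are hypothetical.

```python
def influence_coefficients(current, previous):
    """Return each segment's share of the total change between two periods.

    `current` and `previous` map segment name -> value. The segments are
    assumed to add up to the total in each period.
    """
    total_change = sum(current.values()) - sum(previous.values())
    if total_change == 0:
        raise ValueError("total did not change; the coefficient is undefined")
    return {
        name: (current[name] - previous[name]) / total_change
        for name in current
    }


# Hypothetical segment values, for illustration only.
last_month = {"segment_a": 60, "segment_b": 40}
this_month = {"segment_a": 45, "segment_b": 35}

print(influence_coefficients(this_month, last_month))
# {'segment_a': 0.75, 'segment_b': 0.25}
```

When the segments add up to the total, the coefficients sum to 1. If one segment moves against the overall trend, its coefficient is negative and another segment's coefficient can exceed 1, so the values are best read as relative contributions rather than percentages bounded by 0 and 1.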

III. A case study in abnormal data analysis

A retail chain's sales had held a steady trend over the past two years, but in June 2021 (the current month) sales revenue suddenly fell by 17% (pictured). The head of sales is anxious and asks you to find the reason for the drop as soon as possible. For someone without experience, a tricky problem like a revenue decline is a real headache: it is easy to feel the pressure and have no idea where to start.

Next, let's work through it using the routine above.

1. Point

First check data accuracy and confirm that this is not a data error.

2. Line

Extend the time window and plot the trend; the drop is not a seasonal fluctuation.


3. Face

1) First split total revenue into new-user revenue and old-user revenue

As shown below:

The data shows that both new-user and old-user revenue declined, to different degrees, so we move to the second step and calculate the influence coefficient of each.

2) Calculate the influence coefficients of new-user and old-user revenue

New user revenue influence coefficient = (33 – 47) / (100-120) = 0.7

Old user revenue influence coefficient = (67 – 73) / (100-120) = 0.3
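The arithmetic in the two formulas above can be reproduced in a few lines. The figures are the ones used in the formulas; the variable names are only illustrative.

```python
# Revenue by user type, taken from the two formulas above.
last_month = {"new": 47, "old": 73}
this_month = {"new": 33, "old": 67}

# Change in total revenue: 100 - 120 = -20.
total_change = sum(this_month.values()) - sum(last_month.values())

# Each segment's change divided by the change in the total.
coefficients = {
    segment: (this_month[segment] - last_month[segment]) / total_change
    for segment in this_month
}

for segment, value in coefficients.items():
    print(f"{segment}: {value:.1f}")
# new: 0.7
# old: 0.3

# Overall decline relative to last month's total.
print(f"{total_change / sum(last_month.values()):.1%}")
# -16.7%
```

The overall decline of about 16.7% is consistent with the 17% drop described at the start of the case.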

The new-user revenue influence coefficient is 0.7, which indicates that the revenue decline is mainly driven by the decline in new-user revenue. With the scope narrowed, the next question is: what makes up new-user revenue?

New-user revenue = number of new users * conversion rate * average order value
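To see which factor in this formula moved, one simple option is to compare each factor's relative change between the two months. The sketch below is only an illustration: the factor values and the 2% threshold are hypothetical, since the article does not publish the underlying numbers, and `average_order_value` stands for the customer price in the formula.

```python
def relative_changes(current, previous):
    """Relative change of each factor between two periods."""
    return {
        name: (current[name] - previous[name]) / previous[name]
        for name in current
    }


# Hypothetical factor values, for illustration only.
last_month = {"new_users": 9400, "conversion_rate": 0.05, "average_order_value": 100}
this_month = {"new_users": 6600, "conversion_rate": 0.05, "average_order_value": 100}

changes = relative_changes(this_month, last_month)

# Flag factors that moved by more than an assumed 2% threshold.
moved = [name for name, change in changes.items() if abs(change) > 0.02]
print(moved)  # ['new_users']
```

Because revenue is a product of the three factors, a factor that stays flat cannot explain the drop, which is why the check looks at each factor separately.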

Pulling the data shows that the new-user conversion rate and average order value are stable, so the problem lies in the number of new users. What makes up the number of new users?

New users = channel 1 + channel 2 + channel 3 + … + channel N, so we split new users by channel source:

By splitting new users by channel source, we found that the number of new users from channel 1 decreased sharply in June 2021, so we located the root cause of the revenue decline: a sharp drop in new users from channel 1. Next, we move to the third step and form hypotheses about why channel 1's new users declined.

3) Form hypotheses about the decline in channel 1's new users

The possible reasons for a decline in channel traffic fall into two broad dimensions. External: changes in the external environment, changes in competition, and so on. Internal: the channel's leads, policies, and so on. At this point, on the one hand, analyze and verify the hypotheses with data; on the other hand, work with the person in charge of channel 1 to pin down the specific cause and then apply a targeted remedy.

This case is purely fictitious; any resemblance to a real one is coincidental. Real business situations are of course much more complicated, with many more factors to consider, and the analysis will take longer. However, the method and process for solving the problem are reusable. I believe that when you face a similar problem in the future, you will have a clear line of analysis and a clear place to start.

Finally, the “point-line-face” analysis method can be summarized in the following picture:

I hope this article offers practical help both for interview preparation and for day-to-day work. If you found it helpful, feel free to like and share it! If you have other ideas, you are welcome to discuss them with me.