Big Data Analysis: What Is Actually Being Analyzed?

The term “big data” is probably familiar to everyone by now. But do you really understand what big data is, and what big data analysis means? To understand big data analysis, you need to dig into business processes and understand how the data is collected. In this article, the author explains big data analysis; let’s take a look.

“Big data” has been a buzzword for years, and “big data analysis” comes up all the time. But can you fully explain what “big data analysis” is? Many people hear the term constantly yet feel it has little to do with their day-to-day work. Today, let’s go through it systematically.

I. How ordinary people understand “big data”

Ordinary people’s understanding of big data is all over the place, for example:

An Excel file of 200 MB: what big data! My country’s population is 1.4 billion: now that is big data! Wow, I just looked at a car and a 4S dealership is already calling me to promote it; they must have collected my big data. …

These odd interpretations all come from not understanding “data” itself. To grasp the true meaning of “big data”, we have to start from where data comes from.

II. Understand “small data” first, then talk about “big data”

The most primitive way of acquiring data is the questionnaire survey: dedicated investigators collect data through on-site interviews, measurements, and so on. This approach has been used for more than 400 years, and classical statistics and management theory were built on it (as shown below).

Is small data useful? It works! It is very useful!

Being able to collect data in a region represents the government’s control over that region. The more data it can collect, the better the central government can grasp the local situation and strengthen management. Data was so important that for a long stretch of history, statistical work belonged to governments, militaries, and intelligence agencies. My country’s first market research company was established in the early 1990s to meet the needs of P&G.

However, the survey approach has three obvious problems:

Very expensive. Interviewers, supervisors, reviewers, data entry, data processing… all of it requires people.

Very time-consuming. Designing the questionnaire, filling it in, and collecting it back all take time.

Low accuracy. Data measured on site is relatively accurate, but most answers to verbal questions are unreliable.

Because of these problems, in the questionnaire era data could only be sampled, never collected in full. This gave rise to dedicated sampling theory and methods. But no matter how much sampling methods improve, in business settings sampling always struggles to convince people. Decision makers will always wonder:

Is the sample too small? Is it representative? Are the people the sample did not cover really the same as those it did?
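To make the “is the sample too small?” worry concrete, here is a minimal Python sketch of how sampling error shrinks as the sample grows. It assumes a simple random sample and a yes/no question, and uses the textbook normal-approximation (Wald) 95% confidence interval; the 30% “true rate” is a made-up value for the simulation, not data from this article.

```python
import math
import random


def proportion_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
    """Estimate a proportion from a simple random sample.

    Returns (estimate, lower, upper) using the normal-approximation (Wald)
    interval; z = 1.96 corresponds to roughly 95% confidence.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


if __name__ == "__main__":
    random.seed(42)
    true_rate = 0.30  # hypothetical share of customers who would answer "yes"
    for n in (100, 1_000, 10_000):
        # Simulate n survey responses drawn at random from the population.
        hits = sum(random.random() < true_rate for _ in range(n))
        est, lo, hi = proportion_ci(hits, n)
        print(f"n={n:>6}: estimate={est:.3f}, 95% CI=({lo:.3f}, {hi:.3f})")
```

The interval narrows roughly in proportion to 1/sqrt(n), so halving the margin of error takes about four times as many respondents. Note that it only quantifies random sampling error; it says nothing about whether the sample is biased, which is the other half of the decision makers’ doubt.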

As long as data is sampled, it will be questioned and met with doubt. This is the root of the “small data” problem, and everything that followed with “big data” actually revolves around solving the problems of “small data”.

1. The first step in data getting bigger: system collection

Data first grew bigger through system collection. For example, when an enterprise expands and builds chain stores everywhere, the first step is to install POS machines to collect transaction data, replacing paper orders and shipping slips. At that point, if you want to know the sales figures, you can simply look at the data the POS collected (as shown below).

Going from sampled data to full data is a major change. With the full data, you can manage each store directly and base decisions directly on the data. That is why most sales analysis, management analysis, and business analysis systems are built on this foundation (as shown below).
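Once every store’s POS feeds a central system, sales analysis becomes a straightforward aggregation over the full data rather than an extrapolation from a sample. The Python sketch below assumes a hypothetical, simplified record layout (store, SKU, quantity, amount); real POS exports vary by vendor. The records deliberately carry no customer field, since a POS records what was sold rather than who bought it.

```python
from collections import defaultdict

# Hypothetical POS line items: (store_id, sku, quantity, amount).
# There is no customer column: POS records what was sold, not who bought it.
TRANSACTIONS = [
    ("store_A", "sku_1", 2, 20.0),
    ("store_A", "sku_2", 1, 15.5),
    ("store_B", "sku_1", 5, 50.0),
]


def sales_by_store(records):
    """Aggregate line items into per-store totals."""
    totals = defaultdict(lambda: {"lines": 0, "units": 0, "revenue": 0.0})
    for store, _sku, qty, amount in records:
        t = totals[store]
        t["lines"] += 1
        t["units"] += qty
        t["revenue"] += amount
    return dict(totals)


if __name__ == "__main__":
    for store, t in sorted(sales_by_store(TRANSACTIONS).items()):
        print(store, t)
```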

However, the limitations of this stage are also obvious: a POS can only record the outcome of a transaction, not the process behind it. Who the buyer was and what happened before the purchase remain unknown.

Analysis at this stage typically knows what happened but not why; most of it can only guess at causes from transaction results. If you only have data from this stage and want deeper analysis, you still have to rely on surveys. For example, traditional enterprises that want to understand the in-store purchase process run store surveys: they observe consumers in the store and ask about their shopping experience.

2. The second step in data getting bigger: active collection

After system collection, the natural next thought is that data beyond transactions could be collected too. The simplest form is to ask users to hand over their ID card and capture the information through image recognition. This captures user data while avoiding manual entry errors.

But here comes the problem: why would I give you my ID card? Traditionally, only industries such as banking, aviation, and telecommunications, backed by state authority and legal requirements, could collect this kind of real identity data with reasonable accuracy.

However, these difficulties have not dampened enterprises’ enthusiasm for collecting information. Common methods include issuing membership cards, sending users birthday gifts, and rewarding users with points for completing their profiles. More aggressive approaches go as far as installing face recognition and eye-tracking equipment in stores to collect data (at a high cost, of course).

Why do companies pursue this data? Because it is genuinely useful. At the very least, the data can be tied to a specific person, so you can identify who is a high-value user and who is a dormant user, which enables refined operations (as shown below).
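Once data is tied to a specific member, even simple rules can separate high-value users from dormant ones. The following Python sketch is a hypothetical rule set, not an industry standard: the 180-day dormancy window and the 5,000 annual-spend threshold are placeholder values that a real business would tune to its own purchase cycle.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Member:
    member_id: str
    last_purchase: date
    spend_last_year: float


def segment(member: Member, today: date,
            high_value_spend: float = 5000.0,  # hypothetical threshold
            dormant_days: int = 180) -> str:   # hypothetical dormancy window
    """Assign a member to 'dormant', 'high_value', or 'regular'."""
    days_since_purchase = (today - member.last_purchase).days
    # Dormancy is checked first: a former big spender who has stopped
    # buying is treated as a win-back target rather than as high value.
    if days_since_purchase > dormant_days:
        return "dormant"
    if member.spend_last_year >= high_value_spend:
        return "high_value"
    return "regular"


if __name__ == "__main__":
    today = date(2021, 6, 1)
    members = [
        Member("m1", date(2021, 5, 20), 8000.0),
        Member("m2", date(2020, 10, 1), 9000.0),
        Member("m3", date(2021, 5, 1), 120.0),
    ]
    for m in members:
        print(m.member_id, segment(m, today))
```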

What truly solved the problem of collecting consumer behavior data at low cost, though, was Internet products.

3. The third step in data getting bigger: behavior data joins in

The biggest advantage of Internet products is that an app, mini program, or H5 page is itself a digital product. Not only can user behavior such as clicks and logins be recorded, but user IDs, mobile phone numbers, and other information can also be merged into a unified ID, far more efficiently than offline processes that depend on paper. You can also tag videos, images, and articles with content labels, and infer user needs from clicks, shares, and browsing. For anyone who has worked with traditional enterprise data, Internet product data is a huge upgrade, like trading a bird gun for a cannon.
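One way to merge device IDs, phone numbers, and member accounts into one unified ID is to treat identifiers as nodes in a graph and find connected components, for example with a union-find (disjoint-set) structure: any two identifiers observed together, such as in the same login event, are assumed to belong to the same person. The Python sketch below shows only that linking step; the identifier formats are hypothetical, and real identity resolution also needs rules for shared devices and conflicting links.

```python
from collections import defaultdict


class UnionFind:
    """Disjoint-set forest with path compression."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the path directly at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def unify_ids(observations):
    """Group identifiers that co-occur (directly or transitively) into one user."""
    uf = UnionFind()
    for ids in observations:
        first = ids[0]
        uf.find(first)  # register identifiers that appear alone
        for other in ids[1:]:
            uf.union(first, other)
    groups = defaultdict(set)
    for identifier in list(uf.parent):
        groups[uf.find(identifier)].add(identifier)
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical login events: identifiers seen together in one session.
    events = [
        ["device:d1", "phone:p1"],
        ["phone:p1", "member:m1"],
        ["device:d2"],
    ]
    for group in unify_ids(events):
        print(sorted(group))
```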

Compared with traditional member profiles and transaction records, user behavior data is especially large. Think about how long it takes you to decide to buy something: you may click and browse hundreds of times before finally placing a single order.

Therefore, a dedicated big data architecture is needed to store and compute this data. In the narrow sense, big data technology refers to the storage and computation of large volumes of user behavior data and unstructured data.
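A core idea behind many big data computing frameworks (the MapReduce model, for example) is to split the data into partitions, compute partial results on each partition in parallel, and then merge them. The Python sketch below imitates that pattern on a single machine with hypothetical click-stream events; a real cluster framework adds distribution, shuffling, and fault tolerance that this toy omits.

```python
from collections import Counter
from functools import reduce


def map_partition(events):
    """Map step: count event types within one partition of the behavior log."""
    return Counter(event_type for _user_id, event_type in events)


def merge_counts(left: Counter, right: Counter) -> Counter:
    """Reduce step: merge two partial counts (addition is associative)."""
    return left + right


if __name__ == "__main__":
    # Hypothetical behavior log already split into partitions,
    # e.g. one file per server or per hour.
    partitions = [
        [("u1", "view"), ("u1", "click")],
        [("u2", "view"), ("u2", "view"), ("u2", "purchase")],
    ]
    partials = [map_partition(p) for p in partitions]  # could run in parallel
    total = reduce(merge_counts, partials, Counter())
    print(dict(total))
```

Because merging counts is associative, partitions can be combined in any grouping, which is what lets the work spread across many machines.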

Only with this data did today’s popular Internet analysis methods become possible, such as funnel analysis (as shown below).
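Funnel analysis counts how many users reach each step of a path such as view, add to cart, then purchase. Below is a minimal Python sketch assuming hypothetical event names and a simple rule: a user counts for a step only after completing the previous steps in time order. Real funnel tools usually add a conversion time window and other options that this sketch leaves out.

```python
from collections import defaultdict


def ordered_funnel(events, steps):
    """Count users reaching each funnel step in order.

    events: iterable of (user_id, timestamp, event_name)
    steps:  ordered list of event names defining the funnel
    """
    by_user = defaultdict(list)
    for user_id, ts, name in events:
        by_user[user_id].append((ts, name))

    reached = [0] * len(steps)
    for user_events in by_user.values():
        idx = 0  # next step this user must complete
        for _ts, name in sorted(user_events):
            if idx < len(steps) and name == steps[idx]:
                idx += 1
        # A user who completed idx steps is counted in each of those steps.
        for i in range(idx):
            reached[i] += 1
    return list(zip(steps, reached))


if __name__ == "__main__":
    steps = ["view", "add_to_cart", "purchase"]
    events = [
        ("u1", 1, "view"), ("u1", 2, "add_to_cart"), ("u1", 3, "purchase"),
        ("u2", 1, "view"), ("u2", 2, "add_to_cart"),
        ("u3", 1, "view"),
        ("u4", 1, "purchase"), ("u4", 2, "view"),  # out of order: only "view" counts
    ]
    result = ordered_funnel(events, steps)
    for (step, n), (_, prev) in zip(result, [result[0]] + result[:-1]):
        rate = n / prev if prev else 0.0
        print(f"{step:<12} users={n}  step conversion={rate:.0%}")
```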

Building on this basic data collection, more data applications become possible, such as:

Models: behavior prediction, recommendation algorithms;

Testing: product A/B testing;

Profiling: user profiles.

Transaction data alone can support these methods too, but its volume is far smaller, and data volume directly determines the accuracy of the results and, in turn, their effect on front-line business. Analytical methods that in the traditional era were exclusive to banks, telecom operators, and airlines have become standard at Internet companies.
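The link between data volume and result accuracy is easy to see in an A/B test. A common way to compare the conversion rates of two variants is the two-proportion z-test; the Python sketch below uses its pooled-variance form with made-up conversion counts, relying on the usual normal approximation, which assumes reasonably large samples.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value) using the pooled proportion under the null
    hypothesis that both variants convert at the same rate.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both variants converted at 0% or both at 100%: no detectable difference.
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # Same hypothetical conversion rates (10% vs 13%) at two sample sizes.
    for scale in (1, 10):
        n = 100 * scale
        z, p = two_proportion_z_test(10 * scale, n, 13 * scale, n)
        print(f"n per variant={n:>5}: z={z:.2f}, p={p:.4f}")
```

With identical 10% vs 13% conversion rates, 100 users per variant gives a p-value around 0.5 (no evidence of a difference), while 1,000 per variant falls below the conventional 0.05 threshold. The observed effect is the same; only the amount of data changed.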

Even so, some problems remain unsolved:

User data is scattered across multiple platforms, so any single platform’s data is incomplete;

User psychology is not directly reflected in data, and impulsive behavior distorts judgments based on normal behavior data;

Information security regulations are becoming stricter and restrictions on data collection and use keep growing. How to keep using big data resources in compliance with laws and regulations remains an important issue today.

III. Under-the-table ways to make data “bigger”

Of course, there are also some gray- and black-market methods for making data bigger.

Buying data directly from “data dealers”! Web crawlers, credential stuffing, forcibly harvesting user data, and siphoning user data through devices.

This is the source of all kinds of harassing phone calls and spam messages. As the state grows stricter about information security protection, the room for these practices keeps shrinking, so we won’t discuss them further.

IV. The ultimate answer to “What is big data good for?”

Looking at the whole journey of data from small to large, it is clear that data is useful. Even the simplest, least accurate data can still inform management questions. That is why decision makers pursue data tirelessly and are never satisfied (as shown below).

So why do so many people still ask, “What is big data good for?”

Because not everyone understands how to use “data”. Never mind big data; many people cannot even make use of small data.

As of 2021, some people still make decisions on a whim and vouch for them by thumping their chests; some are still hooked on “I’ve been in this business for ten years, so I’m right”; some still think big data is a cure-all, where you type some code and banknotes spray out of the computer screen; and some still blindly worship “underlying logic” and “core thinking” and drill them painstakingly.

In short, to make use of data you have to go deep into the business process and understand specifically how the data is collected. Only then can you grasp the business meaning behind the data, translate the concrete problems you face into data problems, and arrive at the right answers.

Collecting and calculating data is a science.

Applying data to generate value is an art.

That is where the difference lies.