Systemic capability and certainty of data visualization services

Editor’s note: If you want to build a product, you need to build systemic capability. After the product goes live, you should keep providing certainty that users can rely on, so that they grow attached to the product and depend on it. This article takes data visualization products as an example and analyzes their systemic capability and certainty. Let’s take a look.

Continuing this series of reflections on Liang Ning’s courses: this lesson is about the systemic capability of a product, and the homework she set after class is:

Pick the product you know best and discuss how it should give users the satisfaction of certainty. Does the product do this? If not, where do you think it falls short? Sustained satisfaction creates dependence, while a feeling of uncertainty does harm. You can also talk about whether you have ever been hurt by a lack of certainty.

Before doing the homework, let’s review how Liang Ning explained the key terms in this lesson:

If you want to build a product, you need to build systemic capability. Certainty matters: life is so uncertain that when you find something very certain, you grow attached to it and depend on it. Continuously providing certainty that users can depend on is the key.

Today we follow Liang Ning’s homework and look at data visualization products (more precisely, “data visualization services”) from two angles: “systemic capability” and “certainty”.

First, how are data services delivered? The systemic capability of data visualization products

The earlier article “How is data visualization implemented?” introduced the system behind data products, which is mainly divided into a data storage layer, a data computation layer, and a data display layer (as shown below). That article only explained the individual layers. How do the layers work together as a whole system to deliver data services?

For business people, a data visualization product is the data service they experience most directly. Taken literally, the whole system consists of two parts: data for business scenarios + the ability to make data charts.

1. Data for business scenarios

In a company’s day-to-day work, data is scattered across different stores: log data, event-tracking data, server data, and so on. Left unprocessed, this data can hardly meet business needs, which is where the data warehouse comes in. In a BI tool, the visualized data comes from a data set (think of it as a data table that meets the business side’s analysis needs and covers all the required indicators and dimensions), and the data set either references tables in the data warehouse directly or is computed by aggregating them.

As for the tables in the data warehouse, you may have heard of warehouse layers such as ODS, DWD, DWS, and ADM. Beyond these technical details, there is one more thing to know about the data warehouse: data warehouse engineers have to build it around the structure of the business. In practice, engineers build the warehouse from a data mapping table (also called DataMapping), which lists the indicators involved in a business (with their definitions) and the dimensions (the angles from which the indicators are viewed). Based on this table they clean and model the data that already exists in the system (some indicators in the mapping table may require complex calculations) and produce data tables that meet the analysis requirements. Together these tables form the data warehouse.
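To make the data mapping table more concrete, here is a minimal Python sketch of how indicators and dimensions could be written down and turned into one aggregated query. All names here (`Indicator`, `Dimension`, `build_dataset_sql`, the `dwd_orders` table and its fields) are hypothetical and chosen only for illustration; a real mapping table is usually a document agreed between the business side and the data team.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Indicator:
    name: str        # column name exposed to analysts
    definition: str  # business definition agreed with the business side
    expression: str  # SQL expression that computes it from source fields


@dataclass(frozen=True)
class Dimension:
    name: str          # column name exposed to analysts
    source_field: str  # field in the source table


def build_dataset_sql(source_table, dimensions, indicators):
    """Render one aggregated SELECT from a data mapping table."""
    dim_cols = [f"{d.source_field} AS {d.name}" for d in dimensions]
    ind_cols = [f"{i.expression} AS {i.name}" for i in indicators]
    sql = f"SELECT {', '.join(dim_cols + ind_cols)} FROM {source_table}"
    if dimensions:
        sql += " GROUP BY " + ", ".join(d.source_field for d in dimensions)
    return sql


# Hypothetical mapping table for an order business.
dimensions = [Dimension("order_date", "dt"), Dimension("city", "city")]
indicators = [
    Indicator("order_cnt", "number of paid orders", "COUNT(*)"),
    Indicator("gmv", "total paid amount", "SUM(amount)"),
]
dataset_sql = build_dataset_sql("dwd_orders", dimensions, indicators)
print(dataset_sql)
```

Keeping the business definition next to the expression is the point of the sketch: the same record answers both “what does this indicator mean?” and “how is it computed?”.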

2. Ability to make data charts

Taking our own data visualization product as an example, this ability has two parts: back-end capability and front-end capability. Anyone who has used Tableau or a similar product will recognize the whole process of creating a dashboard: connect to the data, generate a data set, then create charts and dashboards.

In this process, “data connection” and “data set generation” clearly belong to the back-end capability. Data connection means connecting a database (including the data warehouse) to the data visualization product through JDBC. This is the prerequisite for generating a data set. Generating a data set means creating one or more data tables in the connected database to serve the later visualization needs (typically, their indicators and dimensions match the fields of the data mapping table that the data product manager organized earlier, or are a subset of them). Since a dashboard is created to be viewed regularly, its data must be refreshed periodically over time, which brings us to the scheduling module. Scheduling means refreshing the data set on a timer by running tasks according to their timing settings.
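The back-end steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: an in-memory SQLite database stands in for the JDBC connection to the warehouse, the table names `dwd_orders` and `ds_daily_orders` are made up, and the two calls to `refresh_dataset` stand in for two runs triggered by a scheduler.

```python
import sqlite3


def connect():
    """Stand-in for the data connection (in-memory SQLite instead of JDBC)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dwd_orders (dt TEXT, city TEXT, amount REAL)")
    return conn


def refresh_dataset(conn):
    """Rebuild the data set table from the warehouse table.

    A scheduling module would call this on a timer, e.g. once a day.
    """
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS ds_daily_orders")
        conn.execute(
            "CREATE TABLE ds_daily_orders AS "
            "SELECT dt, city, COUNT(*) AS order_cnt, SUM(amount) AS gmv "
            "FROM dwd_orders GROUP BY dt, city"
        )


def read_dataset(conn):
    """What the front end would read when it draws a chart."""
    return conn.execute(
        "SELECT dt, city, order_cnt, gmv FROM ds_daily_orders ORDER BY dt, city"
    ).fetchall()


conn = connect()
conn.executemany(
    "INSERT INTO dwd_orders VALUES (?, ?, ?)",
    [("2024-01-01", "Beijing", 10.0), ("2024-01-01", "Beijing", 5.0)],
)
refresh_dataset(conn)          # first scheduled run
first_run = read_dataset(conn)

conn.execute("INSERT INTO dwd_orders VALUES ('2024-01-02', 'Shanghai', 7.0)")
refresh_dataset(conn)          # next scheduled run picks up the new day
second_run = read_dataset(conn)
print(first_run)
print(second_run)
```

The dashboard only ever reads `ds_daily_orders`, so if a scheduled run is skipped, users keep seeing the previous run's numbers without any visible error.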

“Creating charts and dashboards” is the front-end capability: the data returned by the back end is displayed as charts. Some large companies and specialized visualization vendors, such as Tableau, develop their own chart components. In China, Baidu (ECharts) and Ant Group (AntV) have open-sourced their chart components, and front-end engineers can build the visual display through secondary development on top of them.

Second, how to deliver better data services? The certainty of data visualization products

1. Loading speed

What sets data products apart from consumer-facing products and back-office systems is the amount of data: a single query may involve a very large number of records. How long does the user have to wait? Does the query execution time exceed the company’s limit? Loading time is therefore constrained by the user’s patience and sometimes by the system itself.

To improve loading speed, the first step is usually to talk to colleagues on the business side and trim unnecessary analysis to reduce the initial amount of data, for example by changing a fine-grained daily query into a coarser aggregated query (in practice, though, this rarely succeeds).

The second step starts from the data side and depends on the query type. Suppose there are 100,000 users. If I need to show the tag information of a single user chosen at random, that is a point query. If I only look at the overall distribution of tags across the 100,000 users, that is a summary query. For summary queries, the data can be stored in ClickHouse, which makes queries faster. For point queries, you can switch to Spark to process the data set; queries are faster, but memory consumption is also higher, so the company may need to buy more computing resources.

The third step starts from the dashboard design. For example, the time filter can limit the date span that users may select, and the number of charts on a dashboard can be reduced to shorten the query time of a single dashboard (the design of a single dashboard may be covered separately later).
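The difference between the two query types, and the date-span limit on the time filter, can be sketched as follows. This is a toy illustration only: SQLite stands in for the real engine (it is neither ClickHouse nor Spark), the `user_tags` table is made up, and `clamp_date_range` with its 31-day default is a hypothetical helper, not a feature of any particular BI tool.

```python
import sqlite3
from datetime import date, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_tags (user_id INTEGER PRIMARY KEY, tag TEXT)")
conn.executemany(
    "INSERT INTO user_tags VALUES (?, ?)",
    [(i, "high_value" if i % 4 == 0 else "regular") for i in range(1, 101)],
)


def point_query(conn, user_id):
    """Point query: fetch the tag of one specific user."""
    row = conn.execute(
        "SELECT tag FROM user_tags WHERE user_id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


def summary_query(conn):
    """Summary query: distribution of tags over all users."""
    rows = conn.execute("SELECT tag, COUNT(*) FROM user_tags GROUP BY tag")
    return dict(rows.fetchall())


def clamp_date_range(start, end, max_days=31):
    """Limit a dashboard date filter to at most max_days, keeping the end date."""
    if end < start:
        raise ValueError("end date is before start date")
    earliest_allowed = end - timedelta(days=max_days - 1)
    return max(start, earliest_allowed), end


print(point_query(conn, 8))
print(summary_query(conn))
print(clamp_date_range(date(2024, 1, 1), date(2024, 3, 31)))
```

The point query touches one row by key, while the summary query scans and aggregates every row, which is why the two are usually served by different storage or compute choices.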

2. Data Accuracy

Anyone who writes data reports regularly knows that what matters more than the conclusion is that the data is accurate. If the data is inaccurate, the conclusion is worthless, and data visualization products must follow the same rule. Data accuracy here means more than whether the numbers are wrong in the narrow sense. It also covers whether the tasks ran normally (if a task fails to produce data, what is shown as yesterday’s data is actually the data from the day before yesterday, and that is also an error), whether the data contains anomalies (such as null values), and whether the data is consistent (the problem also known as data silos: does the same indicator have the same data source and the same calculation rule across business units? If not, the business units cannot communicate well with each other).

Data errors, data anomalies, and interrupted task runs can be caught by the monitoring module in the scheduling system. Data product managers can also give big data engineers some checking rules: for example, if the output is empty, contains nulls, or an indicator differs from the previous run by more than a certain threshold, the responsible data warehouse colleagues are notified to confirm the problem. For user profile tag data, it is also possible to verify whether the tags are accurate.
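The three checking rules just mentioned (empty output, null values, and an unusual change versus the previous run) could look roughly like this. This is a sketch under assumptions: `check_dataset`, the row format, and the 30% default threshold are all hypothetical, and a real monitoring module would send the alerts to the responsible engineers instead of returning them.

```python
def check_dataset(rows, indicator, previous_total, max_change=0.3):
    """Return alert messages for one run of a data set task.

    rows: list of dicts produced by the task
    indicator: name of the numeric field to check
    previous_total: total of the indicator in the previous run
    max_change: allowed relative change versus the previous run
    """
    if not rows:
        return ["empty data: the task produced no rows"]

    alerts = []
    null_count = sum(1 for r in rows if r.get(indicator) is None)
    if null_count:
        alerts.append(f"null values: {null_count} row(s) have no {indicator}")

    total = sum(r[indicator] for r in rows if r.get(indicator) is not None)
    if previous_total:
        change = abs(total - previous_total) / abs(previous_total)
        if change > max_change:
            alerts.append(
                f"{indicator} changed by {change:.0%} versus the previous run"
            )
    return alerts


rows = [{"gmv": 100.0}, {"gmv": None}, {"gmv": 50.0}]
for message in check_dataset(rows, "gmv", previous_total=300.0):
    print(message)
```

The threshold is a business decision: an indicator that normally swings widely needs a looser limit than a stable one, which is why the rule should come from whoever understands the indicator.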

Data consistency requires data warehouse engineers to unify the data of all the company’s business lines, so that the same indicator has the same data source and the same calculation rule everywhere. This also helps an indicator gain adoption across the company.

That is all for today. If you found this helpful, please share or bookmark it. If you have other views, please leave a comment below ~