How to State Data Requirements Properly

Introduction: Data plays an increasingly important role in business today. Without data we cannot tell how the business is progressing or how to make decisions. Companies treat data both as a weather vane and as a source of power. Within a company's internal data flow, the most frequent activity is people in different roles expressing their demands for data. This article shares a paradigm for stating data requirements. With it you can state your own data requirements, or help a requester state theirs, and get the idea across quickly.

01 Thinking about a paradigm

For a functional product, I usually think on three levels: business model, business logic, and user interaction. I use them to understand the business side's product requirements, and to check whether my own product requirements are complete and logically sound.

These three levels map onto the classic MVC framework in software development. From the requirements level to the implementation level, each has a corresponding counterpart. This matters a great deal in software engineering, because it greatly reduces communication and design costs.

This prompted a thought about data requirements. Data developers, data analysts, and we ourselves all use SQL to work with data, which naturally made me wonder whether the syntactic structure of SQL could serve as a paradigm for expressing data requirements.

It turns out that it can.

02 Analysis of SQL Expression

No matter how complex the request, how many tables are involved, or how many operations are needed, as long as the database tables are complete, we can in theory get the data we want. A SQL data query often has a structure like this:




SELECT a,
       COUNT(a) AS i,
       COUNT(DISTINCT b) AS h,
       MAX(c) AS t
FROM tab
WHERE day > 20211101
  AND b = 'x'
GROUP BY a



Three key parts of this statement deserve attention:

FROM and WHERE
GROUP BY
SELECT

Clearly, FROM and WHERE determine the scope of the data we query, GROUP BY gives the basis for grouping it, and SELECT finally produces the data we want.

The grammatical structure of SQL already states the data logic very clearly. All we need to do is follow it to express our data requirements accurately.
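As a rough illustration of how the three parts work together, the query from section 02 can be run against a small in-memory SQLite table. This is only a sketch: the sample rows, the column types, and the added ORDER BY (used to make the output order predictable) are assumptions, not part of the original query.

```python
import sqlite3

# Hypothetical sample rows for the article's table `tab`: (a, b, c, day).
ROWS = [
    ("north", "x", 10, 20211102),
    ("north", "x", 30, 20211103),
    ("south", "x", 20, 20211105),
    ("south", "y", 99, 20211106),  # out of scope: b is not 'x'
    ("north", "x", 50, 20211031),  # out of scope: day is not after 20211101
]

QUERY = """
SELECT a,                          -- dimension
       COUNT(a)          AS i,     -- indicators
       COUNT(DISTINCT b) AS h,
       MAX(c)            AS t
FROM tab                           -- scope: which table
WHERE day > 20211101               -- scope: which rows
  AND b = 'x'
GROUP BY a                         -- dimension: how rows are grouped
ORDER BY a                         -- assumption: added for a stable order
"""


def run_query():
    """Load the sample rows into SQLite and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (a TEXT, b TEXT, c INTEGER, day INTEGER)")
    conn.executemany("INSERT INTO tab VALUES (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = run_query()
for row in result:
    print(row)
```

The two rows marked as out of scope never reach the grouping step, which is why the scope has to be agreed on before dimensions and indicators are discussed.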

03 What SQL reveals

Analyzing the structure of SQL gives us three elements of a requirement paradigm. Together they help ensure that we state data requirements correctly and communicate about data smoothly. The three elements are:

1) Statistical scope

This part corresponds to FROM and WHERE. We need to know which fields the requirement involves and which conditions restrict them.

For example, when counting goods, decide whether to include all goods or only those currently on sale in the store; when counting users, whether to include only paying users; when counting traffic, whether to include only mobile traffic; and so on.

A time range is required. Even when all historical data is wanted, that must be stated explicitly. The time range can also be dynamic, such as the last 7 days or the last month, in which case it depends on when the query is run.

Stating the statistical scope confines the problem within clear limits and reduces the miscommunication caused by inconsistent understandings of the data range.
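A dynamic time range only becomes precise once the run date is fixed. The sketch below shows one way to turn "the last 7 days" plus a few equality filters into a WHERE clause. The function names, the `platform` column, and the choice that the window ends on the day before the run date are all assumptions made for illustration.

```python
from datetime import date, timedelta


def recent_days_scope(run_date, days):
    """Return (start, end) as YYYYMMDD integers for the last `days` full days.

    Assumption: the window ends on the day before `run_date`.
    """
    start = run_date - timedelta(days=days)
    end = run_date - timedelta(days=1)
    return int(start.strftime("%Y%m%d")), int(end.strftime("%Y%m%d"))


def scope_clause(run_date, days, filters):
    """Build a parameterized WHERE clause from a time window and equality filters.

    `filters` maps column names to required values. Column names are inserted
    into the SQL text directly, so they must come from trusted code.
    """
    start, end = recent_days_scope(run_date, days)
    conditions = ["day BETWEEN ? AND ?"]
    params = [start, end]
    for column, value in filters.items():
        conditions.append(f"{column} = ?")
        params.append(value)
    return "WHERE " + " AND ".join(conditions), params


# Hypothetical request: mobile traffic only, last 7 days, run on 2021-11-08.
clause, params = scope_clause(date(2021, 11, 8), 7, {"platform": "mobile"})
print(clause)
print(params)
```

Whether "the last 7 days" includes the run date itself is exactly the kind of detail the requester and the executor should settle up front.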

2) Statistical dimensions

The statistical dimensions correspond to GROUP BY in SQL. GROUP BY groups the detail data, and the data is then evaluated group by group. Whether there is a single grouping or several, all output data is presented along some dimension.

Over a time range, the dimension can be year, month, week, day, hour, and so on, provided the detail data exists at a correspondingly finer granularity.

Any categorical variable in the statistical sense can serve as a dimension. With gender, for example, the data can be analyzed along male and female. Continuous data can also be turned into a dimension by binning.

Making the dimensions clear and explicit is critical to the analysis. Multidimensional analysis is also a common way to explore data. When stating a requirement, clarify whether multidimensional analysis is needed and which combinations of dimensions are required.
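To make binning and multidimensional grouping concrete, here is a minimal sketch that groups users by gender and by an age band derived with a CASE expression. The `users` table, its rows, and the band boundaries are invented for the example; the query relies on SQLite accepting a SELECT alias in GROUP BY.

```python
import sqlite3

# Hypothetical detail data: (gender, age).
USERS = [
    ("female", 17),
    ("female", 25),
    ("female", 29),
    ("male", 16),
    ("male", 34),
    ("male", 41),
]

QUERY = """
SELECT gender,
       CASE                              -- binning a continuous field
           WHEN age < 18 THEN 'under 18'
           WHEN age < 35 THEN '18-34'
           ELSE '35 and over'
       END AS age_band,
       COUNT(*) AS users
FROM users
GROUP BY gender, age_band                -- two dimensions combined
ORDER BY gender, age_band
"""


def users_by_gender_and_age_band():
    """Return user counts per (gender, age band) combination."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (gender TEXT, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", USERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = users_by_gender_and_age_band()
for row in result:
    print(row)
```

The bin edges are part of the requirement: a requester who says "by age group" should also say where each group starts and ends.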

3) Statistical indicators

The statistical indicators are the output of the SELECT part of the SQL statement, that is, the final form of the data. In general, every dimension is output alongside the indicators.

An indicator is defined by its caliber, that is, by the method used to calculate it over the groups formed by the dimensions above. The dimension fields usually take part in the calculation, though sometimes they do not. The calculation is generally an aggregation such as a count, a distinct count, a sum or extreme value, an average, or a standard deviation.

The statistical indicators are the result the requester wants. From scope to dimensions to the final indicator output, the calculation method of each indicator is the most important aspect.
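The calculation method matters because the same scope and dimension can yield quite different numbers. The following sketch computes four indicators over one hypothetical `orders` table; the table, its rows, and the indicator names are assumptions for illustration.

```python
import sqlite3

# Hypothetical detail data: (day, user_id, amount).
ORDERS = [
    ("2021-11-02", "u1", 10.0),
    ("2021-11-02", "u1", 30.0),
    ("2021-11-02", "u2", 20.0),
    ("2021-11-03", "u3", 40.0),
]

QUERY = """
SELECT day,
       COUNT(*)                AS orders,     -- count
       COUNT(DISTINCT user_id) AS buyers,     -- distinct count
       SUM(amount)             AS revenue,    -- sum
       AVG(amount)             AS avg_order   -- average
FROM orders
GROUP BY day
ORDER BY day
"""


def daily_indicators():
    """Return one row of indicators per day."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (day TEXT, user_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", ORDERS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


result = daily_indicators()
for row in result:
    print(row)
```

On the first day there are three orders but only two buyers, so a request for "the number of purchases" is ambiguous until the requester says whether orders or distinct users should be counted.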

Together, these three elements describe a data requirement fairly accurately, so that whoever fulfills it knows exactly what data operations to perform.
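One possible way to capture the three elements is a small structured request that can be rendered as a query skeleton. The `DataRequest` class and its field names below are hypothetical, and the rendering is deliberately naive: it joins trusted strings and does no validation or escaping.

```python
from dataclasses import dataclass


@dataclass
class DataRequest:
    """A data requirement stated as scope, dimensions, and indicators."""

    table: str        # scope: where the data comes from
    conditions: list  # scope: filter conditions, including the time range
    dimensions: list  # dimensions: fields to group by
    indicators: dict  # indicators: output name -> aggregate expression

    def to_sql(self):
        """Render the request as a SELECT / FROM / WHERE / GROUP BY statement."""
        select_items = list(self.dimensions) + [
            f"{expression} AS {name}"
            for name, expression in self.indicators.items()
        ]
        lines = ["SELECT " + ", ".join(select_items), "FROM " + self.table]
        if self.conditions:
            lines.append("WHERE " + " AND ".join(self.conditions))
        if self.dimensions:
            lines.append("GROUP BY " + ", ".join(self.dimensions))
        return "\n".join(lines)


# The example query from section 02, restated as a request.
request = DataRequest(
    table="tab",
    conditions=["day > 20211101", "b = 'x'"],
    dimensions=["a"],
    indicators={"i": "COUNT(a)", "h": "COUNT(DISTINCT b)", "t": "MAX(c)"},
)
sql = request.to_sql()
print(sql)
```

A request that cannot fill in all four fields is usually not ready to be executed, which makes a form like this a handy checklist even if nobody ever generates SQL from it.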

04 Finally

The requester may not understand the concepts above because they lack SQL skills, but we still need this information. Routine guidance and training can help them state requests in this paradigm. If they still cannot grasp the concepts, ask them to draw a sample of the table they want and note the logic behind every column.

In addition, some other information also helps us:

Requester's role and background: helps us judge the motivation behind the request and what the analysis is for;

Background of the request: helps us understand the business context of the data need;

Frequency of the request: if the data is needed on a recurring basis, this can be factored into how the request is fulfilled;

Expected completion time: makes it easier to schedule the request and set its priority.

Author: Li Qinghui (Xinba), data product expert and data product team leader; skilled in data governance, data analysis, and data operations; author of the Python book "deeply shallow PANDAS"; member of the "Data Creator Alliance".

This article was published by @一个数据人的自留地.