How to Master Data Analysis

Building a product is inseparable from data analysis. A product designer must be able to build models based on the actual situation. This article introduces how to master data analysis and is recommended for readers who want to understand data modeling.

As a product designer, data analysis is not just analyzing the data the product team hands you; it means building models based on the actual situation. (For ease of understanding, the code shown in this article does not follow strict conventions. This does not affect its use, so please bear with it.)

Data-driven design process

I. Data-driven decisions – making decisions with data is a process of quantification

Data-driven decision making uses quantitative data to support our decisions, making them more scientific and accurate.

1. Understanding quantification

Early scientists did not accept that experiments could contain errors. They believed every measurement must be exact and blamed any error on a mistake, until people gradually realized that error always exists and cannot be eliminated. Quantification works the same way: its purpose is to reduce uncertainty, estimate risk, and support decisions. A quantification process therefore does not need infinite precision or the complete elimination of uncertainty; it only needs to be good enough to support our decisions.

2. Confidence intervals – a way of quantifying

Because quantification does not have to produce a precise number, and in practice we often face incomplete data or datasets too large to handle easily, we use the statistical concept of a confidence interval to help us make decisions. A confidence interval is a range that contains the correct answer with a specified probability.

Normally we want a fairly narrow interval at a confidence level of about 80%. Pushing the confidence level very high forces the interval to be very wide, and a very wide interval carries little meaning; pushing it very low makes the interval likely to be wrong.

For example, saying with 100% confidence that scores on this exam fall within [0, 100] covers every possible outcome, so it has no reference value. Saying with 5% confidence that scores fall within [95, 100] means we are 95% confident they fall within [0, 95], so [95, 100] is probably wrong. Saying with 80% confidence that scores fall within [85, 100] gives a range that is probably right and reflects the real situation; we could even estimate the class average at about 92.5, the midpoint of that range.

Confidence interval example
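As a rough illustration of computing such an interval from data rather than estimating it by judgment, the sketch below derives an 80% confidence interval for a class's mean score. It assumes independent scores and a sample large enough for a normal (z) approximation; `mean_confidence_interval` and the scores are hypothetical, not taken from the article.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(samples, confidence=0.8):
    """Return (low, high) for the mean using a normal (z) approximation.

    Assumes independent samples and a sample size large enough for the
    z approximation; small samples would call for a t interval instead.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    m = mean(samples)
    se = stdev(samples) / sqrt(n)  # standard error of the mean
    # Two-sided interval: (1 - confidence) / 2 of probability in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return m - z * se, m + z * se


# Hypothetical exam scores, for illustration only.
scores = [88, 95, 91, 86, 97, 90, 93, 89, 94, 92]
low, high = mean_confidence_interval(scores, confidence=0.8)
print(f"80% CI for the class mean: [{low:.1f}, {high:.1f}]")
```

Raising `confidence` to 0.95 widens the interval, which is the trade-off described above: more confidence costs precision.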

II. Data decomposition

1. Define the goal – the goal must be quantifiable

Every project has a goal, so before starting we must understand what that goal is. Sometimes the business or product team states the goal directly, such as improving the retention rate or the conversion rate. In that case the goal is very clear and we can break it down right away. Sometimes, however, the goal is vaguer, such as "improve the user experience". Then we need a clarification chain to make the goal quantifiable.

2. Clarification chain – making the goal quantifiable

A clarification chain is a series of short links that turns something intangible into something tangible. For example, suppose our goal is to improve the user experience. This goal does not satisfy the SMART principles, so we cannot act on it directly; we first need to make it quantifiable. Can we perceive this goal? Through what do we perceive it?

Can these perceived aspects be measured, or do we have to measure them through other data? At this point we should ask: why do we want to improve the user experience? What behavior would change if the experience improved? Perhaps users would be more willing to visit our platform, so we can use metrics such as time spent on the platform and the number of screens viewed to measure whether the experience really improved.

Clarification chain
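Once the chain ends in measurable behavior, the proxies can be computed from session logs. This minimal sketch assumes each session record carries a user ID, a duration, and a screen count; the `Session` type, its field names, the sample data, and the extra "sessions per user" proxy (standing in for "more willing to visit") are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class Session:
    user_id: str
    duration_seconds: float  # time spent on the platform in this session
    screens_viewed: int      # number of screens viewed in this session


def experience_proxies(sessions):
    """Turn raw sessions into measurable proxies for 'user experience'."""
    if not sessions:
        return {"avg_duration_seconds": 0.0, "avg_screens": 0.0, "sessions_per_user": 0.0}
    users = {s.user_id for s in sessions}
    return {
        "avg_duration_seconds": sum(s.duration_seconds for s in sessions) / len(sessions),
        "avg_screens": sum(s.screens_viewed for s in sessions) / len(sessions),
        "sessions_per_user": len(sessions) / len(users),
    }


# Hypothetical logs from before and after a design change.
before = [Session("u1", 60, 3), Session("u2", 90, 4), Session("u1", 30, 2)]
after = [Session("u1", 120, 6), Session("u2", 150, 7), Session("u1", 90, 5), Session("u2", 60, 3)]
print(experience_proxies(before))
print(experience_proxies(after))
```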

3. Goal decomposition – turning business goals into design goals

Once the goal is set, it may still mainly reflect the business or product side and be too abstract to achieve directly through design. We therefore need to decompose it into a combination of data metrics and select the metrics that design can influence in order to achieve the goal.

4. Behavior path analysis – studying user behavior data

Decompose the goal along the user's behavior path (the path can be visualized from users' click and browsing data) and find the steps where design can help achieve the goal.

The difficulty of this method is that it requires deep familiarity with the business and a detailed understanding of every path users take. Usually we can instead focus on the big picture and set aside the small details: organize the users' main path, study it, and ignore the sub-paths for now. For example, if a user must go through A-B-C-D-E-F to complete goal G, we list the UV of each page to find the step with the most severe drop-off.

User behavior path list (example)

User main path list (example)
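The drop-off comparison along the main path can be sketched in a few lines. The page names follow the A-B-C-D-E-F example; the UV numbers and the `funnel_drop_off` helper are made up for illustration.

```python
def funnel_drop_off(path_uv):
    """Given (page, UV) pairs along the main path, return step-to-step drop-off rates."""
    steps = []
    for (page, uv), (next_page, next_uv) in zip(path_uv, path_uv[1:]):
        # Guard against a zero-UV page so we never divide by zero.
        rate = 1 - next_uv / uv if uv else 0.0
        steps.append((f"{page}->{next_page}", rate))
    return steps


# Hypothetical UV for each page on the main path A-B-C-D-E-F.
main_path = [("A", 10000), ("B", 6000), ("C", 5400), ("D", 2700), ("E", 2430), ("F", 2187)]
drops = funnel_drop_off(main_path)
worst_step, worst_rate = max(drops, key=lambda item: item[1])
for step, rate in drops:
    print(f"{step}: {rate:.0%} drop-off")
print("Most severe leak:", worst_step)
```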

5. Formula analysis – a more open method

That is, break the goal down using the formula that calculates the metric. For example, GMV = UV × average order value × conversion rate, so we know we can raise GMV by increasing UV, increasing the average order value, or increasing the conversion rate. Formulas can also be nested: conversion rate = number of ordering users / home page UV, and number of ordering users = page A UV × page A conversion rate × page B conversion rate × … × page N conversion rate.

This method can also be combined with behavior path analysis. When a formula contains ratio metrics, be careful not to simply inflate or shrink the numerator or the denominator on its own; otherwise it is hard to grow the overall metric. This approach works best when the goal is very clear.

Formula analysis example
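A small sketch of the formula method, using the GMV formula and the nested ordering-user formula from the text. The function names and all numbers are hypothetical.

```python
from math import prod


def gmv(uv, avg_order_value, conversion_rate):
    """GMV = UV x average order value x conversion rate."""
    return uv * avg_order_value * conversion_rate


def ordering_users(page_a_uv, step_conversion_rates):
    """Nested formula: page A UV x page A conversion x ... x page N conversion."""
    return page_a_uv * prod(step_conversion_rates)


base = gmv(uv=10000, avg_order_value=80, conversion_rate=0.05)
# Compare the effect of a 10% lift on each factor separately.
lifts = {
    "UV": gmv(11000, 80, 0.05),
    "average order value": gmv(10000, 88, 0.05),
    "conversion rate": gmv(10000, 80, 0.055),
}
print(base, lifts)
print(ordering_users(10000, [0.5, 0.4, 0.25]))
```

Because GMV is a product of its factors, a 10% lift in any single factor lifts GMV by the same 10%; the practical question is which factor design can move most cheaply.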

6. Data layering – a more divergent method

This method suits the search for innovative solutions, or situations where the data system is not yet complete. We classify data by dimension, discover the commonalities and connections among the data, and thus find design opportunities. Note that each layering must use one consistent dimension. The three usual dimensions are user path data, user profile data, and product data:

(1) User path data: how many users on this page do not go on to the next page? Which pages do they go to instead, and in what proportions? Where do they go after those pages, and in what proportions? Trace these paths to find what the users have in common.

User path data example

(2) User profile data: who are the users visiting this page, and what do they have in common? For example, are they women, women aged 18-25, or women aged 18-25 who are graduate students?

User profile data example

(3) Product data: sort and layer the product data. For example, a coupon page has a page UV, a number of coupons claimed, and a number of coupons used. What are the claim rate and the usage rate? What are these ratios for each of the three coupons, and overall? When page UV is 0-1000, how many coupons are claimed and used, and what are the claim rate and usage rate? When UV is 1001-2000, what are the same figures? And so on:

Product data layering example
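The UV-range layering for the coupon page can be sketched with pandas `pd.cut` and `groupby`. The column names, the 1000-UV bucket width, and the data are assumptions for illustration.

```python
import pandas as pd

# Hypothetical daily records for a coupon page.
coupon_pages = pd.DataFrame({
    "page_uv": [800, 950, 1200, 1800, 2500, 2900],
    "coupons_claimed": [400, 500, 540, 720, 900, 1000],
    "coupons_used": [200, 240, 216, 252, 270, 280],
})

# Layer the records by page UV: 0-1000, 1001-2000, 2001-3000.
bins = [0, 1000, 2000, 3000]
labels = ["0-1000", "1001-2000", "2001-3000"]
coupon_pages["uv_layer"] = pd.cut(
    coupon_pages["page_uv"], bins=bins, labels=labels, include_lowest=True
)

# Sum each layer, then compute the claim rate and usage rate per layer.
layered = coupon_pages.groupby("uv_layer", observed=True)[
    ["page_uv", "coupons_claimed", "coupons_used"]
].sum()
layered["claim_rate"] = layered["coupons_claimed"] / layered["page_uv"]
layered["usage_rate"] = layered["coupons_used"] / layered["coupons_claimed"]
print(layered)
```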

Once the data is layered, the layers can be nested. For example, if user profile data is divided into layers A, B, and C, we can study how the behavior data of layer-A users breaks down, then layer B, then layer C. With the data layered, we can look for associations:

Look for associations: pie charts and line charts are recommended. A pie chart shows a distribution and a line chart shows a trend. For example, a pie chart of how many coupons each user claimed shows which group is largest, and a line chart can show the relationship between the number of coupons claimed and usage.

Match against the goal: once the charts are drawn, patterns are easy to spot. For example, we may find that the more coupons users claim, the lower their usage rate. Combining this with the formula usage rate = coupons used / coupons claimed, we can raise the usage rate either by increasing the number of coupons used or by reducing the number of coupons claimed. Reducing the number claimed would raise the usage rate but would not help the business; it is just false prosperity. We should therefore increase the number of coupons used.

Make reasonable guesses: once we find the leverage point, we can brainstorm why the data is not ideal and verify our guesses later through user research. For example, we might guess that users who claim many coupons do not know what the coupons are or what thresholds apply, and simply claim them because they see them.
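To check a pattern like "more coupons claimed, lower usage", one option is to group users by how many coupons they claimed and compute usage rate = coupons used / coupons claimed for each group. This is a minimal pandas sketch on made-up data; the column names are hypothetical.

```python
import pandas as pd

# Hypothetical per-user coupon records.
users = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "coupons_claimed": [1, 1, 3, 3, 6, 6],
    "coupons_used": [1, 1, 2, 1, 1, 1],
})

# Group users by how many coupons they claimed.
grouped = users.groupby("coupons_claimed").agg(
    users=("user_id", "count"),
    used=("coupons_used", "sum"),
)
# Total claimed in each group = claim count x number of users in the group.
grouped["claimed"] = grouped.index.to_series() * grouped["users"]
grouped["usage_rate"] = grouped["used"] / grouped["claimed"]

# A negative correlation supports "more coupons claimed, lower usage rate".
correlation = grouped.index.to_series().corr(grouped["usage_rate"])
print(grouped)
print(f"correlation: {correlation:.2f}")
```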

III. Data analysis

Data analysis consists of three parts: data cleaning, data processing, and opportunity ranking.

1. Data cleaning

Data cleaning includes removing invalid data, duplicate data, and irrelevant data. On one hand this removes junk data so it does not distort the results; on the other hand it reduces noise and improves processing efficiency.
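A minimal pandas sketch of the three cleaning steps, assuming a per-user coupon table. The column names, the validity rules (no negative counts, no more coupons used than claimed), and the `debug_note` column standing in for irrelevant data are hypothetical.

```python
import pandas as pd

raw = pd.DataFrame({
    "user_id": ["u1", "u2", "u2", "u3", None, "u4"],
    "coupons_claimed": [2, 1, 1, -1, 3, 4],
    "coupons_used": [1, 0, 0, 0, 1, 2],
    "debug_note": ["", "", "", "test", "", ""],  # hypothetical column unrelated to the analysis
})


def clean(df, relevant_columns):
    # Irrelevant data: keep only the columns the analysis needs.
    df = df[relevant_columns]
    # Invalid data: drop rows with missing values or impossible counts.
    df = df.dropna()
    df = df[
        (df["coupons_claimed"] >= 0)
        & (df["coupons_used"] >= 0)
        & (df["coupons_used"] <= df["coupons_claimed"])
    ]
    # Duplicate data: drop exact duplicate rows.
    df = df.drop_duplicates()
    return df.reset_index(drop=True)


cleaned = clean(raw, ["user_id", "coupons_claimed", "coupons_used"])
print(cleaned)
```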

2. Data processing

Because the data we want is often not a standard, commonly available metric, we need to process the raw data into the data we need, for example the proportion of all users who claimed one coupon, or the proportion who claimed two coupons.
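For the "share of users who claimed one or two coupons" example, `value_counts(normalize=True)` gives each count's proportion of all users directly. The table below is made up for illustration.

```python
import pandas as pd

# Hypothetical table: one row per user with the number of coupons claimed.
claims = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4", "u5", "u6"],
    "coupons_claimed": [0, 1, 2, 1, 1, 2],
})

# Proportion of all users who claimed exactly k coupons, ordered by k.
share = claims["coupons_claimed"].value_counts(normalize=True).sort_index()
print(share)
```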

3. Python tutorial

Python is recommended: it is easy to learn, processes data efficiently, and the code can be reused.

4. File header (imports)

Every Python script starts with a header that imports the modules it needs. Commonly used ones are Matplotlib, pandas, NumPy, and openpyxl: Matplotlib is used for plotting, pandas and NumPy for data processing, and openpyxl lets pandas read and write Excel (.xlsx) files.
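The original header code is not reproduced in the text; a typical header along those lines might look like this. The aliases `plt`, `np`, and `pd` are common conventions, not requirements.

```python
# Typical header for the analysis scripts in this article.
import matplotlib.pyplot as plt  # plotting
import numpy as np               # numerical arrays
import pandas as pd              # tabular data processing

# openpyxl only needs to be installed (pip install openpyxl): pandas uses it
# as the engine for .xlsx files. Importing it explicitly is harmless.
```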

5. Import raw data

Before processing, we need to import the raw data. In the example, ./newData.xls is the path and full file name of the raw Excel data, and Source_Data is the variable that stores the raw data; you can rename it as you like.
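A minimal sketch of the import step with `pd.read_excel`. The helper name `load_raw_data` is hypothetical. Note that reading a legacy `.xls` file such as `./newData.xls` requires the `xlrd` package, while openpyxl handles `.xlsx`.

```python
import pandas as pd


def load_raw_data(path):
    """Read the first sheet of an Excel workbook into a DataFrame.

    pandas chooses the reader from the file format: .xlsx uses openpyxl,
    while legacy .xls files need the separate xlrd package.
    """
    return pd.read_excel(path)


# Usage, with the file name from the article:
# Source_Data = load_raw_data("./newData.xls")
```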

6. Create an empty table

We also need to create a new empty table to store the processed results. In shape=(0, 3), 0 and 3 are the numbers of rows and columns: the initial row count can be 0 because rows are added later, and the column count should be set to the number of columns we want. title1, title2, and title3 are the column headers, which you can rename as needed.

7. Processing data

Data cleaning: if a record has an empty value, remove that record.

As required, compute ratios by dividing the relevant columns of the table, making sure the denominator is never 0.
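Steps 6 and 7 together might look like the sketch below, assuming a hypothetical source table with a page name, coupons claimed, and coupons used. `title1` to `title3` are the placeholder headers from step 6; the ratio computed here (coupons used / coupons claimed) is only an example.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data standing in for Source_Data.
source_data = pd.DataFrame({
    "page": ["A", "B", "C", "D"],
    "coupons_claimed": [100, 0, 50, None],
    "coupons_used": [40, 0, 10, 5],
})

# Step 6: an empty table, shape=(0, 3): 0 initial rows, 3 named columns.
result = pd.DataFrame(np.empty(shape=(0, 3)), columns=["title1", "title2", "title3"])

# Step 7a: data cleaning, dropping any record that has an empty value.
cleaned = source_data.dropna()

# Step 7b: compute a ratio per record, skipping zero denominators.
rows = []
for record in cleaned.itertuples(index=False):
    if record.coupons_claimed == 0:
        continue  # the ratio is undefined when the denominator is 0
    rows.append([record.page, record.coupons_claimed, record.coupons_used / record.coupons_claimed])

# Build the filled table once, reusing the empty table's column names.
result = pd.DataFrame(rows, columns=result.columns)
print(result)
```

Collecting rows in a list and building the table once is usually faster than appending to a DataFrame row by row.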

8. Data output

Once processing is complete, you can export the processed data to Excel or another format to share with colleagues.
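Exporting is a single `to_excel` call; the sketch wraps it in a hypothetical helper. Writing `.xlsx` requires openpyxl to be installed.

```python
import pandas as pd


def export_result(df, path):
    """Write the processed table to an Excel file (requires openpyxl for .xlsx)."""
    # index=False keeps pandas' row index out of the sheet.
    df.to_excel(path, index=False)


# Usage (hypothetical file name):
# export_result(result, "./result.xlsx")
```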

9. Plotting

If needed, you can plot the data directly to judge the relationships between the different metrics.
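A hedged plotting sketch with Matplotlib: a line chart of one column against another, saved to a file. The helper name, the column names in the usage line, and the non-interactive `Agg` backend are assumptions; in a notebook you would typically display the figure instead of saving it.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display; drop this line in an interactive notebook
import matplotlib.pyplot as plt
import pandas as pd


def plot_relationship(df, x, y, path):
    """Draw a line chart of column y against column x and save it as an image."""
    data = df.sort_values(x)  # sort so the line runs left to right
    fig, ax = plt.subplots()
    ax.plot(data[x], data[y], marker="o")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.set_title(f"{y} vs {x}")
    fig.savefig(path)
    plt.close(fig)  # free the figure's memory
    return path


# Usage (hypothetical columns and file name):
# plot_relationship(grouped_df, "coupons_claimed", "usage_rate", "./usage_rate.png")
```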

10. Complete example

11. Opportunity ranking

After data analysis we may find many opportunities, but they differ in value, so we need to rank them by value. Confidence intervals can be used directly for this: for example, optimizing page A might reduce page A's drop-off rate by [5%, 10%]. We can also compute the value from finer-grained data for more precision, but that takes more effort. After ranking by value, we also need to work with other project members to calculate ROI based on feasibility and the actual resources required, and then choose the most suitable opportunity to pursue.
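One way to turn interval estimates into a ranking, sketched under stated assumptions: each opportunity's impact is an 80% interval (for example a 5% to 10% drop-off reduction for page A), the interval midpoint serves as the point estimate, and ROI is estimated impact divided by a hypothetical cost figure. The other opportunities and all cost numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class Opportunity:
    name: str
    impact_low: float   # lower bound of the 80% interval, e.g. 0.05 for 5%
    impact_high: float  # upper bound of the 80% interval
    cost: float         # hypothetical effort estimate, e.g. person-days

    @property
    def expected_impact(self):
        # Use the interval midpoint as a point estimate, like the 92.5 example.
        return (self.impact_low + self.impact_high) / 2

    @property
    def roi(self):
        return self.expected_impact / self.cost


opportunities = [
    Opportunity("Optimize page A", 0.05, 0.10, cost=10),
    Opportunity("Simplify coupon rules", 0.02, 0.04, cost=2),
    Opportunity("Redesign checkout", 0.08, 0.20, cost=40),
]
by_value = sorted(opportunities, key=lambda o: o.expected_impact, reverse=True)
by_roi = sorted(opportunities, key=lambda o: o.roi, reverse=True)
print([o.name for o in by_value])
print([o.name for o in by_roi])
```

The two orderings can differ: a large but expensive change may top the value list yet fall to the bottom on ROI, which is why the ROI discussion with the team matters.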

IV. Design solution

1. Design research

After the opportunities are identified, some can be implemented directly through design, while for others we have only made reasonable guesses that must first be verified through design research. The research method depends on the circumstances; we use interviews and questionnaire surveys, and the process for the two is largely the same.

2. Define the research goal

Here we decide which guess we are verifying. For example, if we want to verify a guess about our users, the questionnaire questions must revolve around that goal.

3. Selecting users

There are two ways to select users: targeted distribution and targeted selection. Targeted distribution means randomly drawing a certain number of users from the group we need and sending them the questionnaire or calling them. Targeted selection means sending the questionnaire to all users and then filtering the collected results for users who meet our needs. When resources are tight, a small sample can be used; it will generally uncover about 80% of the problems.
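The two selection methods can be sketched as follows. `targeted_distribution` draws a random sample from an eligible pool; `targeted_selection` keeps only eligible respondents after a full survey. The function names, the response fields, and the eligibility rule (women aged 18-25, echoing the user profile example) are hypothetical.

```python
import random


def targeted_distribution(eligible_users, sample_size, seed=None):
    """Randomly draw users from the eligible group to receive the survey."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(eligible_users, min(sample_size, len(eligible_users)))


def targeted_selection(responses, is_eligible):
    """Survey everyone, then keep only responses from users who meet the criteria."""
    return [r for r in responses if is_eligible(r)]


# Usage with hypothetical data.
pool = [f"user{i}" for i in range(1000)]
survey_group = targeted_distribution(pool, 50, seed=7)

responses = [
    {"user": "a", "gender": "F", "age": 22},
    {"user": "b", "gender": "M", "age": 22},
    {"user": "c", "gender": "F", "age": 30},
]
eligible = targeted_selection(responses, lambda r: r["gender"] == "F" and 18 <= r["age"] <= 25)
print(len(survey_group), [r["user"] for r in eligible])
```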

4. Design the questions

The questions should revolve around our goal and progress from simple to difficult. Use multiple-choice questions as much as possible; open-ended questions are better explored in telephone interviews. Note that the questionnaire should ask for the respondent's basic information at the start, so we can confirm that respondents are indeed eligible users.

5. Collect feedback

After collecting the results, clean and process them using the data cleaning and data processing methods described earlier. Be sure to keep the source data. For telephone interviews, keep the call recordings so details can be confirmed later.

6. Design the solution

Once we have a clear goal and the research feedback, we can start designing. At this stage the design must revolve around the goal and the usage scenario, and we use the Fogg Behavior Model to design sensibly.

7. Design verification

At the same time, we need to define the data tracking points. Based on each metric we can judge whether the design is effective and find further opportunities for optimization. It is also best to think ahead and add tracking for data that is not yet collected, to prepare for future analysis.


The content above is dense and may be hard to digest, so I recommend reading it more than once. Some parts, such as design and user research, are not covered in enough detail; I hope to have the chance to share them with you in the future.

Author: Why bother is complex; WeChat official account: Why bother is complex