A comprehensive selection comparison of enterprise data products (data warehouse, reporting, BI, data middle platform, data governance)

In today’s data-driven era, the importance of data should not be underestimated, for individuals and businesses alike. Many companies have therefore strengthened their data work and their data construction. Below, the author lays out a comprehensive selection comparison of enterprise data products, in the hope that it helps.


In an era when everyone from top to bottom is emphasizing digital transformation, more and more companies attach importance to data, and more and more of them have data construction needs.

Whatever data work a company does, it needs a certain foundation in informatization and in data construction, and it cannot do without a data platform, data application tools, data management tools, and so on.

I have worked in enterprise data construction for nearly 7 years, moving from technical work to project management. I have worked as Party B (the vendor side) and also have hands-on experience on the client side, so here I share some selection “insider” knowledge drawn from that experience.

The products covered are: data warehouse, big data platform, reporting, BI, data middle platform, data governance, and so on.

I. Data warehouse

A data warehouse is a solution rather than a single product. It comes in different architectures (traditional data warehouse, data mart, big data platform, etc.), and each architecture has many layers and components. The architect’s ability matters more than the choice of tools, so I will not go into detail on that here.

Selecting a data warehouse mainly involves three things: the data storage scheme, ETL, and front-end applications.
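To make these three parts concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: an in-memory SQLite database stands in for the warehouse storage, and the source rows, the `fact_orders` table, and all function names are hypothetical. A real project would use one of the databases and ETL tools discussed in this section.

```python
import sqlite3

# Hypothetical rows, standing in for an export from a business system.
SOURCE_ROWS = [
    {"order_id": 1, "region": " east ", "amount": "120.50"},
    {"order_id": 2, "region": "WEST", "amount": "80"},
    {"order_id": 3, "region": "East", "amount": None},  # incomplete record
]


def extract():
    """Extract: read raw rows from the (hypothetical) source system."""
    return list(SOURCE_ROWS)


def transform(rows):
    """Transform: drop incomplete rows, normalize region names, cast amounts."""
    cleaned = []
    for row in rows:
        if row["amount"] is None:
            continue
        cleaned.append(
            (row["order_id"], row["region"].strip().lower(), float(row["amount"]))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table (storage layer)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def sales_by_region(conn):
    """Front end: the kind of aggregate query a report or BI tool would issue."""
    cursor = conn.execute(
        "SELECT region, SUM(amount) FROM fact_orders "
        "GROUP BY region ORDER BY region"
    )
    return cursor.fetchall()


def run_pipeline():
    """Run extract, transform, and load, then query the result."""
    conn = sqlite3.connect(":memory:")  # stand-in for the warehouse database
    load(conn, transform(extract()))
    return sales_by_region(conn)
```

Calling `run_pipeline()` walks through all three layers in order. Swapping the storage engine or the ETL tool changes the implementation of each function, but not this overall shape.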

The underlying data warehouse server is usually a relational database system. Commonly used options include Oracle and DB2, as well as dedicated data warehouse solutions such as Greenplum and Teradata.

Traditional relational databases: Oracle, MySQL, DB2. Massively parallel processing (MPP) databases: Vertica, Teradata (commercial), Greenplum (open source).

Teradata is a veteran and is widely used in the banking industry, but it is also genuinely expensive. Most of our current projects use Greenplum, which I consider one of the fastest and most cost-effective high-end data warehouse solutions in the industry. Greenplum is based on PostgreSQL and was open-sourced in 2015.

As far as I know, 3 of China’s four big banks and 4 of the 5 major logistics companies use it, and many companies are migrating from Teradata to Greenplum (GP).

The mainstream big data platform stack is Hadoop + Hive. How widely this combination is used needs no elaboration, and the big data platform vendors mentioned later all design their platform products on top of it.

For ETL tools, Kettle, Talend, and Pentaho are the most widely used.

Talend: Based on Eclipse, it offers good scalability and stability, is customizable (you can develop your own Eclipse plugins), and follows Eclipse conventions (for example, the file directory structure is one programmers are already familiar with).

Talend is easy to embed because it generates Java code, which integrates well with other systems. This does require users to know Java.

Pentaho is a long-established tool whose first version was released in 2001. Kettle is the component of the overall Pentaho solution that handles data integration.

It is also developed in Java, but it does not require users to know Java, because it hides the underlying implementation details. Its main disadvantage is poorer extensibility compared with Talend. Because it is hard to extend, relatively few components are available from the community.

Front-end application tools are mainly reporting, BI, and data mining. The selection of the first two is covered later.

II. Big data platform

By 2013 the Hadoop ecosystem had matured, which meant that big data application scenarios could finally be served. Around that time, some companies with a nose for new trends started building big data platforms. There are many vendors, but three are mainstream: Huawei, Transwarp (星环), and H3C (新华三).

These vendors have also started leaning on the “data middle platform” concept. However, they are traditional software vendors after all, and they do not seem to understand much about the “data middle platform” that internet companies talk about. Their offerings look no different from the big data platforms they already build.

In fact, every big data platform has the basic capabilities of a “data middle platform”. If you put the product architecture diagrams of a data middle platform and a big data platform side by side, the underlying architecture and functions are broadly the same.

For most companies, the middle platform is still best suited to “big spenders” and “data giants” such as internet companies, telecom operators, and banks. A true middle platform puts more emphasis on data as a service, but few ordinary enterprises really have that pain point. They are better off simply doing data management properly.

If your company has a large number of business systems, data volume at the PB level, and needs for massive data storage and computation, choose among these three vendors and compare their proposals.

III. Reporting

There are still very few options on the market. The mainstream domestic products are FanRuan’s FineReport and Raqsoft (润乾). Crystal Reports, which was very popular early on, is seen less often these days, and the open source tool JasperReports also has users.

In terms of selection, an ordinary small company with 1 to 2 data staff is advised to simply purchase a report platform. There is no need to build one in-house.

FineReport is feature-complete, its ecosystem and service are very good, and report engineers for it are relatively easy to find. It is slightly more expensive than other vendors, but the brand and service premium is understandable, since it is the domestic number one. Raqsoft has long been overshadowed by FanRuan and competes on low price. It is said to cost a few thousand yuan per set, presumably priced by concurrency. If service and project implementation are not a concern, it is an option Party B companies can consider.

IV. BI platform

BI is a mature market abroad. BO, Brio, Cognos, MSTR, and others are all long-established BI vendors, and I used their products back when I did purely technical work. They charge by product plus number of users, which is not very cost-effective.

Their architecture is also genuinely complex, and compared with the usage style of internet-era products they are genuinely hard to use.

In the early days the domestic market offered only reporting tools, and the large BI deals were monopolized by foreign vendors. Later, however, the demand for BI became more and more obvious. As with the development of the informatization industry, the needs of small and medium-sized enterprises gradually emerged.

So domestic vendors began developing their own BI products, such as FanRuan’s BI product and BDP. The products do not differ much, the prices are friendlier than those of foreign vendors, and licensing is basically limited by server rather than by number of users, so you can use it however you like.

Later, tool-type products such as Tableau and Power BI took off and gained a large following among individual users. They really are easy to use, but in enterprise scenarios, where performance and concurrency matter, opinions differ.

However, using BI is not a job for 2 or 3 people. You have to build a data warehouse first, and then do the various visualizations, multidimensional analyses, and so on. That is why there are roles such as data warehouse engineer, ETL engineer, and BI engineer.

Of course, if you are an expert, one person can handle all of it, and many organizations do in fact work that way.

A medium-sized company with several business systems is advised to purchase a BI system, which gives you the data warehouse, indicator system, fixed reports, multidimensional analysis, and data visualization all together. You need a few more people during the construction period, but it is very comfortable to use once built, and if the business is stable, keeping two people for maintenance is enough.

V. Data middle platform

The concept of the “middle platform” (Zhongtai) was popularized by Alibaba.

Alibaba learned the approach from Supercell and then carried it forward internally. The “data middle platform” was also promoted at that time, so the main vendors are all companies that came out of Alibaba.

Daishu Cloud (袋鼠云, “Kangaroo Cloud”), Shulan (数澜), and Qidian Cloud (奇点云) are all companies founded by former Alibaba P9s, and their technology is about the same.

Daishu Cloud came out of Alibaba’s DBA team and is rather shrewd: it is firmly tied to Alibaba. Alibaba signs the deals, and Daishu Cloud follows behind to implement them, backed by Alibaba’s products. Its product design is OK, its publicity is very good, and its business is thriving. Qidian Cloud is said to have come out of Alibaba’s data warehouse and Shujia teams (Shujia is Alibaba’s own product), and it does not seem as strong as the other two, at least from my own contact with it.

If your company’s business is complex, the data volume is huge, and, crucially, you have many customer-facing application scenarios where data interaction is inefficient, and you urgently need to mine a large amount of value from customer data, then it is worth studying a middle platform solution.

VI. Final summary

The report platform handles large volumes of batch work such as fixed reports and automated reports, and supports printing and calculation. A company with this need can adopt FanRuan’s platform, and 1 or 2 report engineers can take care of it.

The BI platform adds multidimensional analysis and self-service report queries on top of the report platform. It needs a data warehouse to supply the data, and it needs BI engineers to define the various metrics and dimensions and build multidimensional analysis reports. Fixed reports no longer have to be produced over and over.

The big data platform builds on the BI platform and solves storage, computation, and real-time computation for large data volumes, so users no longer need to worry about those underlying problems. It requires adding big data engineers for cluster maintenance and for the various development work based on the platform.

The data middle platform builds on the big data platform and provides a unified ID, unified models, and unified services, plus features with more of an internet flavor such as a tag factory and user analysis. In terms of staffing, it adds a data manager for the middle platform, while the other work is still done by big data engineers, big data analysts, and so on.

Data governance starts with the BI platform and is continually strengthened in the big data platform and the data middle platform. So data governance is present in all three, and the data middle platform additionally introduces the concepts and capabilities of data assets and billing.
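The rules of thumb scattered through this article can be sketched as a small decision helper. This is only an illustrative reading of the advice, not a formal method: the `CompanyProfile` fields, the function name, and the numeric thresholds (3 or more systems counting as “several”, 1024 TB counting as PB level) are assumptions introduced for the example.

```python
from dataclasses import dataclass

PB_IN_TB = 1024        # assumed cutoff for "PB-level" data volume
SEVERAL_SYSTEMS = 3    # assumed meaning of "several business systems"


@dataclass
class CompanyProfile:
    """Hypothetical description of a company's data situation."""
    business_systems: int
    data_volume_tb: float
    customer_facing_scenarios: bool = False
    urgent_customer_insight: bool = False


def recommend(profile: CompanyProfile) -> str:
    """Map a company profile to the product category suggested in the article."""
    huge_data = profile.data_volume_tb >= PB_IN_TB
    many_systems = profile.business_systems >= SEVERAL_SYSTEMS

    # Huge data plus many urgent customer-facing scenarios: study a middle platform.
    if (huge_data and profile.customer_facing_scenarios
            and profile.urgent_customer_insight):
        return "data middle platform"
    # Many business systems and PB-level data: big data platform.
    if huge_data and many_systems:
        return "big data platform"
    # Several business systems at ordinary scale: BI platform.
    if many_systems:
        return "BI platform"
    # Small company with a handful of data staff: a report platform is enough.
    return "report platform"
```

For example, `recommend(CompanyProfile(business_systems=4, data_volume_tb=20))` falls into the BI platform branch. In practice the boundaries are judgment calls, which is why the thresholds are kept as named constants that can be adjusted.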

Author: Li Qifang, focusing on data analysis and enterprise data management; WeChat official account: 数据分析不是个事儿