Data is a corporate asset, but it is first of all a debt: the costs of acquisition, hardware, software, operation, and talent are very high. Without proper management, we are unlikely to extract value from data effectively. To make big data a success, we need the discipline to manage data as a valuable resource. Data management is much broader than database management. It is a systematic process of capturing, delivering, operating, protecting, enhancing, and disposing of data cost-effectively, and it needs continual reinforcement of plans, policies, programs, and practices.
The ultimate goal of data management is to increase the value of data. It requires serious and careful consideration and should start with a data strategy that defines a roadmap for meeting business needs with a data-driven approach. Every chief data officer should ask themselves the following questions:
- What problem are we trying to solve? What value can big data bring? Big data is hot, so many corporations are embracing it. However, pursuing big data for its own sake is a mistake. Others’ use cases do not have to be yours. To glean value from big data, a deep understanding of your business and the problems to solve is essential.
- Who holds the data, who owns the data, and who can access the data? Data governance is a set of processes that ensures that important data assets are formally managed throughout the enterprise. Through data governance, we expect data stewards and data custodians to exercise positive control over the data. Data custodians are responsible for the safe custody, transport, and storage of the data, while data stewards are responsible for the management of data elements, both the content and the metadata.
- What data do we need? The question may seem obvious, but it is often answered with “I do not know” or “Everything”, which indicates a lack of understanding of business practices. Whenever this happens, we should go back and answer the first question again. How do we acquire the data? Data may be collected from internal systems of record, log files, surveys, or third parties. Transactional systems may be revised to collect the data needed for analytics.
- Where do we store the data, and how long do we keep it? Because of the variety of data, today’s data may be stored in various databases (relational or NoSQL), data warehouses, Hadoop, etc. Database management now goes well beyond relational database administration. Because big data is also fast data, it is impractical to keep all of the data forever. Careful thought is needed to determine the lifespan of data.
- How do we ensure data quality? Garbage in, garbage out. Without ensuring data quality, big data won’t bring any value to the business. With the advent of big data, data quality management is both more important and more challenging than ever.
- How do we analyze and visualize the data? A large number of mathematical models are available for analyzing data, but simply applying them does not necessarily result in actionable insights. Before talking about your mathematical models, understand your business and its problems. Lead the model with your insights (or priors, in machine learning terms) rather than be led by the uninterpretable numbers of black-box models. In addition, visualization is extremely helpful for exploring data and presenting analytic results, as a picture is worth a thousand words.
- How do we manage the complexity? Big data is extremely complicated. To manage the complexity and improve data management practices, we need to develop an accountability framework that encourages desirable behavior and is tailored to the organization’s business strategies, strengths, and priorities.
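To make the questions on data quality and data lifespan more concrete, here is a minimal Python sketch of what such checks might look like. Everything in it is an illustrative assumption rather than a prescribed method: the helper names `check_quality` and `expired`, the record fields (`id`, `email`, `created`), and the 365-day retention window are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class QualityReport:
    total: int        # number of rows inspected
    missing: dict     # field name -> count of empty values
    duplicates: int   # rows whose key repeats an earlier row


def check_quality(rows, required_fields, key_field):
    """Count empty required fields and repeated keys (hypothetical helper)."""
    missing = {field: 0 for field in required_fields}
    seen = set()
    duplicates = 0
    for row in rows:
        for field in required_fields:
            if row.get(field) in (None, ""):
                missing[field] += 1
        key = row.get(key_field)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return QualityReport(len(rows), missing, duplicates)


def expired(rows, date_field, keep_days, today):
    """Return rows older than the assumed retention window."""
    cutoff = today - timedelta(days=keep_days)
    return [row for row in rows if row[date_field] < cutoff]


# Made-up sample records, for illustration only.
rows = [
    {"id": 1, "email": "a@example.com", "created": date(2020, 1, 5)},
    {"id": 2, "email": "", "created": date(2024, 3, 1)},
    {"id": 2, "email": "b@example.com", "created": date(2024, 3, 2)},
]

report = check_quality(rows, ["id", "email"], "id")
old = expired(rows, "created", keep_days=365, today=date(2024, 6, 1))
```

On this sample, the report flags one empty `email` and one duplicate `id`, and only the 2020 record falls outside the retention window. In practice, which fields are required and how long data should live are business decisions, which is why these questions belong in the data strategy rather than in the code.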