Do we still need an Enterprise Data Warehouse?
On the way studying for a Microsoft Data Warehouse Exam, I was asking myself, if today, a traditional enterprise data warehouse is still needed and the time I’m spending with my studies is worth it. I think there is no question that data has become more and more important and is nowadays a strategic asset for companies to transform their businesses and uncover new insights. But does a traditional data warehouse fit into that?
A data warehouse which is categorized as “traditional” and that’s what my studies is about, has the main target to be a central repository for all historical information in a company with the assumption, that the data would be captured now but analyzed later. For this, various data from transactional systems like ERP, CRM and LOB applications are extracted, transformed and loaded (ETL), normally first into a staging area and then cleansed and enriched and afterwards transferred into tables, that means an relational schema, in the data warehouse. The resulting data warehouse becomes the main source of information, a central version of the truth, for report generation, analysis, and presentation through ad hoc reports, portals, and dashboards.
What insiders recognized is, that the data warehouse described ahead is undergoing a transformation. Virtualization and moving resources to the Cloud is one reason. Another reason is, that organizations try to incorporate insights from data that don’t fit the traditional relational database model and that the velocity of how that data is captured, processed and used is increasing. Companies are using now real-time data to change, build, or optimize their businesses as well as to sell, transact, and engage in dynamic, event-driven processes like market trading. The traditional data warehouse simply was not architected to support near real-time transactions or event processing, resulting in decreased performance and slower time-to-value.
A modern Data Warehouse has to support workloads of relational and non-relational data, whether they are on-premise or in the cloud and whether they use on-premise solutions or solutions and services in the cloud. The so called “Logical Data Warehouse” (LDW) or “Modern Data Warehouse” uses repositories, virtualization and distributed processes in combination. Instead of working through a requirements-based model of the traditional data warehouse where the schema and data collected is defined upfront, advanced analytics and data science uses the experimentation approach of exploring answers to ill-formed or nonexistent questions. This requires the examination of data before it is curated into a schema allowing the data to drive insight in itself.
So the recommendation and the answer to the opened question is, that companies should use both approaches and for established data warehouse teams to collaborate with this new breed of data scientists as part of a move towards the logical or modern data warehouse.