Why Architecture Matters for Data and Analytics
The reality is that the data architecture of most organizations is complicated. Data systems are now a complex, ubiquitous and critical component of modern firms, in contrast to the simplistic data models one often finds in the design of enterprise data systems. The most important task of a data architect is to maintain an end-to-end vision of the flow of information within an organization. Data architecture defines what data is important to the organization and how it will be effectively delivered and managed. It is the umbrella term for the standards, metadata and models that keep an organization’s data strategy aligned with its overall strategic decision-making needs.
At the heart of this architectural design are data growth, accumulation speed, complexity and variety. It is not uncommon to find that an organization has no cohesive data asset and no known structure. Transactional and analytical data are stored in legacy systems, desktop applications (spreadsheets/Access) and external systems, vendors or providers. Many of these siloed repositories are invisible to IT, which therefore has no control over them and no information about them. Then there is the duplication of data across multiple business units within the organization, and the variation or conflict in metadata definitions. My favorite is source data being defined separately by every department, or a PowerPoint presentation generated by one business unit becoming the data source for another. To be fair, these disjointed repositories are a direct result of the epic fail on the part of the enterprise architecture: its inability to serve the needs of the organization as a whole. Thus, individual units always develop a new one in its place. This will be covered in greater detail in subsequent posts.
So why does architecture matter? Architecture has a purpose – it defines what you’re building and the purposes it will serve. Data architecture is a combination of technology, process and methods: harness the right technology for the design, determine the process to build, deploy and manage the components, and finally decide how it will be used. The architecture is the umbrella for the standards, metadata and data models that ensure the data system meets the strategic needs of the business users.
In other words, architecture and design are all about fitness for purpose and flexibility. What does all this mean? Very simply, architecture is building a platform flexible enough to take on a new level of complexity over the original: a system built to change rather than to last.
With an insight into the importance of architecture for data and analytics, it is also imperative that we take a quick look at what isn’t architecture. Notably, architecture is not a laundry list of technologies or a vague list of objectives with no path to actualization. The chosen platform or technology cannot succeed in isolation, without the right architecture. Data and analytics architecture is not dictated by the latest technology; it is a deliberate, carefully planned design that is data-driven.
Key Considerations for Data Architecture
Baseline Data Architecture
For information to be valuable, it needs to be delivered in such a way that it is useful to the users. The foundation of a data system with high business impact begins with the successful integration of the business goals or objectives and technology.
Business objectives need to be clearly defined. Where they do not exist, the technical team has an opportunity to be proactive and acquire the missing pieces from business stakeholders and subject matter experts. Keep it simple: this is an avenue for you to get to know more about your audiences and their needs.
Establishing the rules for data governance and master data management is central to the discussion here. This ensures that the systems and applications consuming the data share a uniform understanding of the processed data across the spectrum. At this stage, you gain an understanding of what data is generated and how it is generated, of the triggers that affect its generation within the system and, more importantly, of why it is generated.
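To make the idea of a uniform understanding more concrete, here is a minimal sketch of how governed definitions could be compared against what each consuming system actually uses. Everything in it is an assumption made for illustration: the FieldDefinition type, the GLOSSARY and SYSTEMS contents, and the find_conflicts helper are hypothetical and do not come from any particular governance tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    """One metadata definition: a field name, its data type and its unit."""
    name: str
    dtype: str
    unit: str


# Hypothetical governed glossary: the agreed master definitions.
GLOSSARY = {
    "revenue": FieldDefinition("revenue", "decimal", "USD"),
    "customer_id": FieldDefinition("customer_id", "string", ""),
}

# Hypothetical definitions as each business unit's system declares them.
SYSTEMS = {
    "finance": [FieldDefinition("revenue", "decimal", "USD")],
    "sales": [
        FieldDefinition("revenue", "float", "EUR"),
        FieldDefinition("deal_stage", "string", ""),
    ],
}


def find_conflicts(glossary, system_definitions):
    """Return (system, field, reason) for every field that deviates from the glossary."""
    conflicts = []
    for system, fields in sorted(system_definitions.items()):
        for field in fields:
            governed = glossary.get(field.name)
            if governed is None:
                conflicts.append((system, field.name, "not in glossary"))
            elif governed != field:
                conflicts.append((system, field.name, "definition differs"))
    return conflicts


for system, field_name, reason in find_conflicts(GLOSSARY, SYSTEMS):
    print(f"{system}.{field_name}: {reason}")
```

With this sample data the check flags the sales unit twice: once for a revenue field whose type and unit differ from the governed definition, and once for a field nobody has defined centrally. A real implementation would read these definitions from a catalog or schema registry rather than from hard-coded dictionaries.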
The objective here is to ensure that the business architecture in scope is well defined and has enough flexibility. A thorough understanding of the data, whether at rest or in transit, and of its availability (real-time, historical or stream) is also developed at this point. The granularity required here depends entirely on the level of detail required by the business architecture. Access and security are also essential at this stage of the discussion. The baseline architecture includes artifacts such as the enterprise taxonomy and namespaces, which offer additional value to the organization. The figure below is a graphical depiction of how best to categorize each activity that occurs at this stage. One can think of the Core Dimension as the Kernel and the Extended Dimension as the Technical aspect of the architecture.
Target Data Architecture
Since the baseline view is generally drafted with very little detail (a blueprint, so to speak), the next step in this sequence is to evolve the information previously captured. As already alluded to, data architecture in a real enterprise is extremely complex. Data with critical information is stored everywhere in the organization, so it is important to perform a thorough business analysis and take every possible storage location into account.
Driven by the requirements captured, design decisions such as denormalization, optimizations and platform adoption are applied. The metrics gathered in the previous stage are harnessed and further amplified at this stage to shape the strategy for ingestion, distribution and consumption. The business objectives are further broken down into data requirements that define the databases and data models used by business consumers. This ensures that redundancies are identified and non-optimal points are reduced to a minimum. It also ensures that data integration across multiple systems is feasible.
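As a rough illustration of breaking objectives down into data requirements and surfacing redundancies, consider the sketch below. The objectives, dataset names and source systems are invented for the example, and the two helper functions are hypothetical; this is one possible way to frame the exercise, not a prescribed method.

```python
from collections import defaultdict

# Hypothetical mapping: business objective -> datasets it requires.
REQUIREMENTS = {
    "reduce churn": ["customer_profile", "support_tickets"],
    "forecast revenue": ["orders", "customer_profile"],
}

# Hypothetical mapping: dataset -> places where a copy currently lives.
SOURCES = {
    "customer_profile": ["crm", "marketing_spreadsheet"],
    "support_tickets": ["helpdesk"],
    "orders": ["erp"],
}


def shared_datasets(requirements):
    """Datasets needed by more than one objective: candidates for a shared model."""
    usage = defaultdict(list)
    for objective, datasets in requirements.items():
        for dataset in datasets:
            usage[dataset].append(objective)
    return {d: objs for d, objs in usage.items() if len(objs) > 1}


def redundant_sources(sources):
    """Datasets stored in more than one place: candidates for consolidation."""
    return {d: places for d, places in sources.items() if len(places) > 1}


print(shared_datasets(REQUIREMENTS))
print(redundant_sources(SOURCES))
```

In this sample, customer_profile shows up in both results: two objectives depend on it, and it lives in two places. That is the kind of dataset worth modelling once and integrating carefully.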
The figure below is a high-level graphical depiction of how best to categorize each activity that occurs at this stage.
There are two more aspects to this discussion – Gap Analysis and Impact across the architecture landscape. Next time we will look into how they fit into the grand scheme.