The Advent of Automated Data Modeling! Proceed with Caution
Our swift-moving world is generating huge amounts of data at a dizzying pace. It’s a fact. Some estimates suggest that 90% of today’s data was created in just the past two years. Not surprisingly, more data can mean more value.
For today’s companies, harnessing, analyzing, and sharing data are paramount to staying competitive. But the many moving parts involved in managing these far-flung bits of information can make that difficult. Data demands some measure of structure, and for any structure to be viable, a solid, well-built foundation is a must.
When it comes to data management, data modeling is the foundation. Let’s take a moment to establish the fundamentals.
What is a data model all about?
- A data model is a collection of conceptual tools used to describe data, data relationships, data semantics, and data consistency constraints.
- A data model visually represents the nature of data, the business rules governing data, and how data will be organized in the database.
- A data model provides a way to describe the design of a data repository or database at the physical, logical and view levels.
- Typically, there are three types of data models used when moving from requirement considerations to the actual database being utilized:
- Conceptual: describes WHAT the system contains
- Logical: describes HOW the system will be implemented, regardless of the DBMS
- Physical: describes HOW the system will be implemented using a specific DBMS
- Ultimately, the purpose of a data model is to describe the concepts relevant to a domain, the relationships between those concepts, and information associated with them.
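By way of a quick illustration, the logical level above, concepts, their relationships, and their associated information, can be sketched in plain Python. The Customer and Order entities here are hypothetical and tied to no particular DBMS; the point is only that a logical model names the things and the relationships between them.

```python
from dataclasses import dataclass, field
from typing import List

# A toy logical model: two entity types and one explicit relationship,
# described without reference to any specific DBMS.

@dataclass
class Customer:
    customer_id: int
    name: str

@dataclass
class Order:
    order_id: int
    customer_id: int  # relationship: each Order belongs to one Customer
    total: float

@dataclass
class Catalog:
    customers: List[Customer] = field(default_factory=list)
    orders: List[Order] = field(default_factory=list)

    def orders_for(self, customer: Customer) -> List[Order]:
        # The one-to-many Customer -> Order relationship, made explicit.
        return [o for o in self.orders if o.customer_id == customer.customer_id]

alice = Customer(1, "Alice")
cat = Catalog(customers=[alice],
              orders=[Order(100, 1, 25.0), Order(101, 1, 40.0)])
print(len(cat.orders_for(alice)))  # how many orders relate to Alice
```

A physical model would then decide how those same entities become tables, indexes, and column types in a specific DBMS.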
Though data modeling has existed in some form almost since the advent of computing, a notably original approach was engineered in the mid-1970s by G.M. Nijssen. Originally called Nijssen’s Information Analysis Methodology, its purpose was to represent relationships directly instead of focusing on types of entities as elements in data tables. The emphasis on relationships was entirely novel in those days, and this once ground-breaking idea remains particularly relevant in modern times.
A New Way to Think About Data Modeling
To be sure, the systems, applications, and data constructs of today have shifted far beyond ‘traditional.’ The impact of these new models and tools is on the cusp of creating remarkable advantages for innovative organizations. But there’s a downside. Many tool vendors have boldly suggested that paying attention to data models is no longer required: just point their fabulously smart tool set at the data and it’s good to go. That could work…to a point.
The truth is that data modeling has become easier, and that ease is reshaping the data management landscape. No longer the exclusive province of data scientists and IT specialists, data management is moving toward the place where it’s most needed: the business. But users still benefit greatly from basic statistical knowledge, and may be best served by accepting oversight and assistance from those more adept at the discipline.
What’s Driving this Sizable Shift?
Gartner says, “Predictive analytics vendors are trying to reach a broader audience than traditional statisticians and data scientists by adding more exploration and visualization capabilities for novices and business users.” The overarching value is clear: enable users to focus on the data and its significance. That’s a good thing. The emergence of self-service data modeling is a fundamental aspect of these capabilities, which are mostly augmented by machine learning and some Natural Language Processing (NLP).
The Impact of Machine Learning
Analytics vendors have legitimately reduced the complexity of data modeling, mainly by building more automation into the process. This competency is owed largely to machine learning. Algorithms detect patterns in previously existing data and use those patterns to anticipate future data. Applying this knowledge, users can create new models (designed to address specific business problems or use cases) from previous models. Stated differently, machine learning can base future models on the results of queries and analytics produced by former ones. With these smart new tools, in most cases, the data modeling process runs automatically.
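The core idea, learn a pattern from previously observed data and apply it to future data, can be sketched in a few lines of plain Python. The monthly sales figures below are hypothetical, and a straight-line trend stands in for whatever model a real tool would fit.

```python
# Fit a straight-line trend to historical observations, then project it
# forward -- the essence of "basing future models on previous data".

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var
    b = mean_y - a * mean_x
    return a, b

# Historical (hypothetical) figures for months 1..6
months = [1, 2, 3, 4, 5, 6]
sales  = [10.0, 12.0, 13.5, 16.0, 18.5, 20.0]

a, b = fit_line(months, sales)
forecast_month_7 = a * 7 + b   # apply the learned pattern to future data
print(round(forecast_month_7, 2))
```

A real analytics product automates exactly this loop at scale: choose a model family, fit it to what has been seen, and reuse the fitted model on what comes next.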
Let’s dig a little deeper and examine how machine learning actually assists with automated data modeling.
- Deep Learning: Deep learning is associated with Cognitive Computing and is ideal for sets of big data. Gartner defines deep learning as “an increasingly popular variant of neural networks, with more than the typical two processing layers. The objective of the additional layers is to have higher-level abstractions (that is, features), induced from data that aim at better classification and prediction accuracy.”
- Ensemble Learning: Ensemble learning algorithms aggregate the outputs of several predictive models into one distinct output. An advantage of this approach is that it can combine different types of models and fuse their outcomes.
- Bootstrap Aggregating: Also known as “bagging,” this approach trains the same model on multiple random resamples of the data and averages the results, improving the stability and accuracy of machine learning methods in regression models, as well as other model types.
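To make bootstrap aggregating concrete, here is a minimal sketch in plain Python, with no ML library. The data and the base model (a least-squares line) are assumptions for illustration: each resample is drawn with replacement, a model is fitted to it, and the ensemble’s answer is the average prediction.

```python
import random

# Bootstrap aggregating (bagging): train one simple model per bootstrap
# resample of the data, then average the models' predictions.

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def bagged_predict(xs, ys, x_new, n_models=200, seed=0):
    rng = random.Random(seed)
    preds = []
    for _ in range(n_models):
        # Resample the data with replacement (the "bootstrap" step).
        idx = [rng.randrange(len(xs)) for _ in xs]
        sample_x = [xs[i] for i in idx]
        sample_y = [ys[i] for i in idx]
        if len(set(sample_x)) < 2:   # degenerate resample: can't fit a line
            continue
        a, b = fit_line(sample_x, sample_y)
        preds.append(a * x_new + b)
    # Aggregate: the ensemble's output is the average prediction.
    return sum(preds) / len(preds)

xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 12.0, 13.5, 16.0, 18.5, 20.0]
print(round(bagged_predict(xs, ys, 7), 1))
```

Averaging over resamples is what damps the variance of any single fit, which is why bagging tends to stabilize otherwise noisy models.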
It’s also interesting to note, and downright intriguing, that some analytics solutions can deliver query results via NLP, and can also explain those results with this semantics-based technology. To clarify, NLP enables a computer to interpret and understand human language directly, without the user having to supply a structured query or formula. The option for users to ask data-related questions of the computer in plain language is also on track for the coming years.
What’s the Bottom Line on Efficient, Effective Data Modeling Today?
It’s simple, really. As the exponential explosion of data blazes forward, data modeling techniques, and indeed most aspects of data management, have been forced to evolve at a frantic pace just to keep up. As a result, just about everything associated with data modeling has begun a momentous shift forward, mostly due to the advent of super-smart analytics technologies that offer excellent opportunities to automate processes and reduce complexity.
However, it would be reckless to interpret these advances as license to disregard the discipline of data modeling. This basic exercise should never be an afterthought. If data scientists, database analysts, application developers, and a growing gaggle of novices (i.e., business users, executives, administrators, and the like) intend to use the foundation of data modeling to extract more value from their cache of divergent snips of knowledge, there’s a fundamental set of requirements that absolutely must be sustained. They are:
- Be clear regarding what information exists and what it’s about.
- Be able to extract portions of the information suitable for a particular purpose.
- Be able to efficiently exchange data between organizations and systems.
- Be able to integrate information from various sources, and to distinguish data that describes existing information from data that describes something new.
- Be successful at sharing data between applications and users with different views.
- Be well versed in the overall management of the data, including its history, for the life of the data.
These requirements are the core of outstanding data management and must be met with or without the new highly advanced tooling. Either way, someone (more than one someone) in your organization must have a solid sense of the ‘relationships’ between data.
As the need for purely dimensional data warehouses lessens, and traditional methods of data modeling fall from favor, one thing remains crystal clear: the requirement to understand the contextual implications of data relationships is an ongoing imperative. This essential element won’t change anytime soon. If ever. Our position is unambiguous. Savvy organizations should approach data modeling as a business process benefit, independent of any specific tool. The key word, harkening back to Nijssen’s Information Analysis Methodology, is ‘relationships.’ There will always be data, and, brilliant, glitzy, point-and-shoot tools notwithstanding, there will always be a need to model it.