Estimated reading time: 5 minutes
The answer to why do I need to learn data modelling is that it is a core concept anyone who works with data should understand.
Why someone would read this article
- You have heard the term “data modelling” or “modelling data” but, you are unsure what it is
- From your initial learning about data, you know that “data modelling” is a fundamental concept that you need to understand
- A quick high-level overview of “data modelling” is what you are looking for
What you the reader will get out of this
- After reading this article, you will understand the following:
- What data modelling is
- The importance of data modelling
- When data modelling occurs in the data lifecycle
Firstly, data modelling as a concept is covered by showing the reader an easy to understand example. Then an explanation of each of the critical concepts of data modelling is presented. The different types of data models are then covered, and finally, why data modelling is essential is outlined.
Table of contents
Introduction
What is covered here is a simplified explanation to give you the reader an overview of what data modelling is all about and arm you with the understanding of the key concepts.
Understanding the distinction between a data model and data modelling is important.
A data model is a design or a blueprint of the data components that make up a data solution. The data components are known as entities and the model describes the entities and the relationships between the entities (all are explained shortly). The data model represents the business process, business activity or problem you want to model for the purpose of reporting or exploring insights about the data model.
Data modelling is the process of designing a data model and there are well understood approaches to this.
This can be summed up as a data model is a design of a data solution while data modelling is the process of designing a data model.
Example
Let’s say you are tasked with building a reporting solution to allow an ecommerce company Esales.com to report on their sales. Most if not all ecommerce companies in existence today will use pre canned sales reporting that their ecommerce application provides but we have used this scenario here as it is an easy one to understand. You talk to the head of sales Melanie at Esales.com, and you discover the key aspects that are involved in the entire sales process at Esales.com. You quickly work out that as part of the process there are customers, products and online sales (or sales transactions).
The following diagram shows a data model that captures the key concepts of the sales process at Esales.com. This is known as a conceptual model as it defines the entities and the relationships between the entities.
Key Concepts
- Entities
- Attributes
- Relationships
Entities
An entity is a basic object in a data model. Entities are customers, products, employees, students, sales transactions, payments etc. Entities are represented as single tables in a database. A single entity is known as an instance and an instance becomes a single record in a database table.
Attributes
Attributes are the details of entities such as name, address, email, phone number for a customer entity. These are the columns in a table. There are special types of attributes called keys which allow one to manage relationships.
Relationships
Relationships capture the links between entities. A customer may have one or many sales transactions for example, whereas a sales transaction has only one customer. This type of relationship is known as a one to many relationship. There are many different types of relationships which allow you to model the associations between entities.
Different types of models
There are 3 main types of models which vary in detail and allow you to develop a data model in stages giving yourself the time to address the key aspects at each stage.
Conceptual
A conceptual model is the first model you will produce. You could think of this as what you would draw on a white board or your notepad. The key entities and their relationship to each other are captured. This allows you to ask various questions of the model and explore relationships between the entities.
Logical
A logical model is a natural extension of a conceptual model. A logical model goes into further detail and expands out the conceptual model. Attributes for each entity are defined and keys to manage the relationships are added to the model.
Physical
A physical model is how the data model will be represented in the physical database that will implement the model. Each entity maps to a table in the database and all relationships are managed by primary and foreign key. Details about the data types for the attributes and their sizes are defined. There are tools available that can convert a logical model into a physical model for various relational databases.
Benefits of data modelling & why use a data model
- A data model forces you to think about what problem you are trying to solve upfront. Jumping straight in and creating tables in database will always end up in countless iterations of rework. Taking the time to design the data model and asking yourself probing questions as to how it should be modelled will ensure the model is robust.
- Common language between data professionals – Data models allow data professional to work to a common language. It is very easy for someone to pick up a logical model and convert it to a physical model for example. The notation is standard across all models thus ensuring conformity in understanding.
Conclusion
This is a quick and simple introduction to data modelling. If you are a Business Intelligence professional, data engineer or data scientist having an understanding of data modelling is valuable. There is much more to explore and learn in data modelling, but this quick primer should give you a start.