Understanding EDWs: what is an enterprise data warehouse?By Jon Taylor on February 28, 2021 - 5 Minute Read
Data. Businesses have a lot of it. But the most successful businesses don't just store it. They have a system that helps them collect, organize and analyze it to enable them to improve processes and products.
For larger companies, one of the best tools to do just that is an Enterprise Data Warehouse (EDW). EDWs are essentially giant storage sheds that help to manage data, store it and make it easy for teams to drive value from it.
This article will take an in-depth look into what cloud-based EDWs do and different data warehouse types to help you choose the right one for your organization.
Let’s get started ?
What is an enterprise data warehouse?
An enterprise data warehouse (EDW) is a data storage facility that safeguards customer and business data. Cloud-based EDWs, which will be focus of this article, are capable of collecting data and compiling multiple datasets from various sources. Furthermore, warehouses serve as repositories from which you can analyze data and make it accessible to people in an organization. Cloud-based EDWs can simplify data processes like:
- Collection: compiling data from multiple sources and databases
- Organization: restructuring raw data sources into usable inventories to make information easier to find
- Reporting: streamlining data changes and transactions to improve analysis and querying
In short, a cloud-based enterprise data warehouse is a huge storage facility for a company’s data. Behind the scenes, it organizes and simplifies data so it’s useful for your team when they access it.
It’s important to remember that, like any addition to your tech stack, enterprise data warehouses have pros and cons.
The good (and the bad) of investing in an enterprise data warehouse
Leveraging a cloud-based EDW requires moving data such as customer records and historical financial information about a business and its employees into the cloud. While the move will make data more accessible and organized, companies will likely have years of records from numerous data sources to export into the cloud, with information spread across a number of different internal systems, so the change will take a significant amount of time.
Think about all of the data points that have been stored over the years that have been under-utilized. We’re talking about data from marketing and customer relationship management (CRM) systems, historical sales reports and other databases to name just a few.
As EDWs collect and organize data automatically, it’s easier to spot transactional patterns or monitor consumer behavior without combing through records manually.
However, EDWs aren’t an investment you should set and forget. Your data warehouse needs will change as your business does, so it’s important to reassess what you need from an EDW if your company grows or you’re building a data warehouse for the first time. A startup’s EDW architecture will look entirely different from a business with 20,000 customers.
How metadata helps turn data into actionable information
Metadata is arguably the single biggest incentive for migrating information across to an EDW. Metadata automatically tracks changes and updates to a piece of data and keeps a historical record of it. Think of it as a digital paper trail for every action taken inside your EDW.
There are two main types of metadata your cloud-based EDW will use:
- Business metadata: information about customers, transactions, sales and day-to-day running of the business that your team needs quick access to
- Technical metadata: the bread and butter of the EDW. This data is useful for the people in charge of managing the EDW to ensure it runs smoothly and processes information correctly
So, what does this look like in action? Well, company metadata can store information about sales or customers to keep an accurate record of transactions. For example, a data set in your customer transaction spreadsheet may have something like “02/17/2022. 432. SR908. $45,698.”
But…what does this mean exactly? ?
Your metadata can help untangle the code and present the information so your team can use it. 02/17/2022 is the date, but the metadata can give us a better insight into the other numbers:
? 432: Indicates a customer account number
?? SR908: The identification code for a sales rep on your team, Tim
? $45,698: Represents the transaction amount between your company and the customer
We know Tim closed a $45k deal with customer #432 in February thanks to metadata. Inside an EDW, this data can be kept in bulk spanning years to help keep accurate records and turn every piece of information into knowledge.
What’s the difference between a big data warehouse and an enterprise data warehouse?
You’ll often hear the terms ‘big data warehouse’ and ‘enterprise data warehouse’ mentioned together, but they are fundamentally two different storage facilities. However, in order to understand the difference we first need to understand what big data is!
Big data is a technology to store and manage large amounts of data. It takes structured, non-structured or semi-structured data as an input and manages the processing of that data in a distributed manner, often splitting its tasks into smaller units that can be solved in parallel and then brought back together for a final result.
The underlying technology can be used to handle enormous amounts of data, but you need some significant processing “oomph” to get the most from it.
In an EDW, however, the data is structured and presented in a way that makes it very simple to analyze and gain benefit from. The warehouse itself often lives on a single server and processes all the work within. It’s easy to update and amend existing data tables and it’s easy to use, not requiring as much sophisticated technology as big data.
For the vast majority of businesses, there isn’t the need to leverage big data. However, it’s useful to know the difference between the two as you look to build your own data infrastructure.
What does an enterprise data warehouse architecture look like?
EDWs are essentially a huge storage facility that automatically organizes and analyzes data.
The facility processes and stores data while simultaneously making information easily accessible. But when you dig down into the architecture of a typical cloud-based EDW, it can be daunting:
Example of a typical data warehouse architecture. Image source: Yalantis
Inside an EDW, you’ll find databases, analysis tools and operational systems. Let’s take a more in-depth look at what each of these elements does ?
- External Data: Raw data from various sources like databases and IT systems is funneled into the EDW. It’s held here until it moves on to the next phase of the EDW to be processed.
- Extraction area: Raw data is extracted and loaded into the warehouse. It’s separated into different storage areas for easier access and analysis.
- Transformation area: Data is transformed into appropriate formats in this space of the EDW. The warehouse splits and converts data into suitable files to make it easier to store and access.
Enterprise data warehouse vs data mart: what’s the difference?
A data mart is a subsidiary of an EDW. The reason is that a data mart is:
- Smaller: data marts have much less capacity than an EDW (usually 100 GB or less)
- Restricted: while EDWs store and manage data across an entire business, data marts usually only cover certain areas of a company
- Less extensive: EDWs store raw data from multiple databases and sources. Data marts are less extensive and pull limited data from a few chosen sources
The fundamental difference between EDWs and data marts is their use cases. EDWs are a one-stop shop for an enterprise-sized company to store all of its data, while data marts are used as a stop gap or to manage smaller information databases.
Three examples of enterprise data warehouse architecture
As EDWs have been around for a long time, there are a variety of architectures to choose from, including on-premises and cloud options.
Each architecture has pros and cons regarding capability and complexity to set up and manage. Before investing in an EDW, it’s crucial to figure out what you need the warehouse to do, how much data you need to store and how hands-on your team will be.
Here are the three main EDW architecture types to choose from ?
1. Virtual EDWs
Virtual EDWs are a simple warehouse solution for businesses that want to manage databases.
This EDW collates source databases and requires minimal changes to existing tech infrastructure. Data doesn’t have to be transformed into different formats or moved into a third-party storage facility.
However, the simplicity of virtual EDWs has downsides. Data may be spread across various sources, and as virtual EDWs don’t always transform data into uniform formats, it can be harder to access or analyze it.
If your business has unstandardized data or projects that need in-depth analysis, this EDW isn’t the best fit.
2. On-premises EDW
This EDW is what it says on the tin: a data warehouse that lives inside business premises or a dedicated storage facility.
On-premises EDWs are connected to data using various software and infrastructure in a staging area or a physical warehouse. Although this warehouse was the traditional go-to structure of EDW architecture, the rise of cloud-based EDWs made it a more expensive and less versatile option.
The cons of an on-premises EDW are obvious. You need to invest in data warehouse equipment, which can be very expensive. You also need a dedicated space to store the equipment. On top of that, IT specialists must manage and maintain the equipment — another hefty cost.
Cloud-based enterprise data warehouses
The final architecture (and, in our opinion, the best choice) is the cloud-based EDW that we’ve talked about quite extensively so far.
Cloud EDWs are central storage systems for large amounts of data that are hosted in the cloud, so employees can access it from anywhere. This type of data warehouse is a specialized need, so they’re usually managed and maintained by a third-party company.
The architecture of a cloud-based EDW is simple: you’ll get storage, client services and customer support without having to get into the weeds of maintaining the system. The third party managing the EDW will take care of aspects like node processing, storage layers and data management activities.
This type of hands-on approach to EDWs comes with a cost, but depending on your circumstances, the benefits of a fully managed system can outweigh it.
Is your business ready for an EDW?
EDWs are an invaluable tool for businesses to keep data safe and functional. However, not all EDW architecture is the same. Some warehouses are geared towards simple databases, while others are capable of managing an entire company’s datasets.
The right EWD for your company is the one that matches your business’ specific needs. Do your research on the pros and cons of different EDW architectures and invest in a system that suits your goals.