Data Warehouse: Key Tool for Big Data


Data warehouses are repositories where large amounts of data can be stored and accessed for reporting, business intelligence (BI), analytics, decision support systems (DSS), research, data mining, and other related activities. While they're always associated with large amounts of data, data warehouses are not simply about massive storage capacity. Rather, they're about making disparate types of data from many different sources accessible to support decision-making. This article explains the concept of data warehouses, explores their construction and uses, and explains how they differ from other types of large-scale data storage.

What is a Data Warehouse?

A data warehouse is a storage architecture that supports the retention of, and access to, large amounts of data used for a variety of decision-making purposes. It is optimized to retain and process large amounts of data fed into it from online transactional processing (OLTP) systems and other high-volume sources. OLTP is a type of data processing that executes many concurrent transactions, as in online banking, shopping, or text messaging. Once in the warehouse, this data can be used for reporting, search, and analysis.

Data warehouses are designed to simplify analytics by bringing together data from disparate sources into a central repository where rapid analysis can be carried out. Without one, data scientists and analysts have to extract the data they want from each source separately and bring it into an application for analysis.

A data warehouse can gather data from many different sources, including traditional relational databases, transactional systems, and large swaths of unstructured data. Once gathered, the data can be accessed by BI, analytics, and artificial intelligence (AI) applications for prediction, decision-making, and evaluation.
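The idea of consolidating disparate sources into one queryable repository can be illustrated with a minimal sketch. It is not how any particular warehouse product works: it uses an in-memory SQLite database as a stand-in for the warehouse, and the two sources (`ORDERS_CSV`, `WEB_ORDERS_JSON`), the `sales` table, and all values are hypothetical.

```python
import csv
import io
import json
import sqlite3

# Hypothetical source 1: a CSV export from a transactional system.
ORDERS_CSV = """order_id,region,amount
1,north,120.50
2,south,80.00
3,north,42.25
"""

# Hypothetical source 2: JSON records from a separate web application.
WEB_ORDERS_JSON = '[{"id": 101, "region": "south", "total": 19.75}]'


def build_warehouse():
    """Load both sources into one table in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (source TEXT, order_id INTEGER, region TEXT, amount REAL)"
    )
    # Each source has its own field names, so each is mapped to the shared schema.
    for row in csv.DictReader(io.StringIO(ORDERS_CSV)):
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            ("oltp_csv", int(row["order_id"]), row["region"], float(row["amount"])),
        )
    for rec in json.loads(WEB_ORDERS_JSON):
        conn.execute(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            ("web_json", rec["id"], rec["region"], rec["total"]),
        )
    conn.commit()
    return conn


def sales_by_region(conn):
    """Run one analytical query over data that began in two separate systems."""
    cur = conn.execute(
        "SELECT region, ROUND(SUM(amount), 2) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


if __name__ == "__main__":
    print(sales_by_region(build_warehouse()))
    # [('north', 162.75), ('south', 99.75)]
```

The point of the sketch is that the analyst writes a single query against one schema, rather than pulling from each source and reconciling the field names by hand.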

How are Data Warehouses Constructed?

Data warehouses are optimized to deal with large volumes of data. While most are hosted in the cloud, some still run on mainframe systems and enterprise-class servers. Data from OLTP applications and other sources is extracted into the warehouse, where it can be queried and used by analytical applications.

Data warehouses can be designed to receive and process different types of data, with data volume, frequency, retention periods, and other factors determining the specifics of construction. Business goals and objectives lead the design, which is then focused on collecting, normalizing, and cleaning the relevant data. 
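As a rough illustration of the normalizing and cleaning step, the sketch below standardizes a few fields and drops unusable or duplicate rows before they would be loaded. The records, field names, and the two accepted date formats are assumptions made for this example only; real pipelines apply rules derived from the business goals mentioned above.

```python
from datetime import datetime

# Hypothetical raw records, as they might arrive from different systems.
RAW_RECORDS = [
    {"customer": "  Alice Smith ", "signup": "2023-01-15", "country": "us"},
    {"customer": "alice smith", "signup": "15/01/2023", "country": "US"},
    {"customer": "Bob Jones", "signup": "2023-02-01", "country": "gb"},
    {"customer": "", "signup": "2023-03-01", "country": "US"},
]

# Date formats assumed for this example only.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(text):
    """Try each known format; return an ISO date string, or None if none match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean(records):
    """Normalize fields, drop unusable rows, and remove duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        # Collapse whitespace and unify capitalization of the name.
        name = " ".join(rec["customer"].split()).title()
        signup = parse_date(rec["signup"].strip())
        country = rec["country"].strip().upper()
        if not name or signup is None:
            continue  # unusable row: missing name or unparseable date
        key = (name, signup, country)
        if key in seen:
            continue  # duplicate once the fields are normalized
        seen.add(key)
        cleaned.append({"customer": name, "signup": signup, "country": country})
    return cleaned


if __name__ == "__main__":
    for row in clean(RAW_RECORDS):
        print(row)
```

With the sample input, the two spellings of the same customer collapse into one row and the record with an empty name is discarded, leaving two cleaned records.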

Perhaps the most vital aspect of design is the underlying storage infrastructure. Storage media must be capable of hosting a large quantity of data. If the warehouse is cloud-based, the appropriate storage tier should be chosen to balance cost, capacity, and performance. Flash media offers the highest performance, for example, but at the highest cost. Hard disk drives (HDDs) offer better capacity for the cost, while hybrid flash/HDD solutions can boost performance without breaking the bank, making it possible for analytics systems to move needed data into flash for faster processing.
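One way to picture the hybrid approach is as a placement decision: frequently read data goes to the limited flash tier and everything else stays on HDD. The sketch below uses a simple greedy rule (most reads per gigabyte first), which is an assumption chosen for illustration and not a description of how any vendor's tiering works. The `Dataset` type, the dataset names, and all figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    size_gb: int        # assumed to be greater than zero
    reads_per_day: int


def place_on_flash(datasets, flash_capacity_gb):
    """Greedy placement: the most-read data per GB goes to flash while it fits."""
    ranked = sorted(datasets, key=lambda d: d.reads_per_day / d.size_gb, reverse=True)
    flash, hdd, used = [], [], 0
    for d in ranked:
        if used + d.size_gb <= flash_capacity_gb:
            flash.append(d.name)
            used += d.size_gb
        else:
            hdd.append(d.name)
    return flash, hdd


# Hypothetical figures, chosen only to illustrate the behavior.
DATASETS = [
    Dataset("daily_sales", size_gb=50, reads_per_day=900),
    Dataset("clickstream_archive", size_gb=400, reads_per_day=20),
    Dataset("customer_profiles", size_gb=30, reads_per_day=300),
]

if __name__ == "__main__":
    flash, hdd = place_on_flash(DATASETS, flash_capacity_gb=100)
    print("flash:", flash)  # flash: ['daily_sales', 'customer_profiles']
    print("hdd:", hdd)      # hdd: ['clickstream_archive']
```

With 100 GB of flash, the two small, heavily read datasets fit, and the large, rarely read archive remains on the cheaper HDD tier.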

Some data warehouse architectures are designed primarily to cope with structured data from relational databases. Because most modern data warehouses collect and store data from both the cloud and on-premises systems, they must be set up to handle both structured data and unstructured data such as emails, text messages, and multimedia.
