Lake, warehouse and data mart

Let's consider three types of cloud data storage, their differences and applications.







Data lake



A data lake is a large repository of raw data, both unstructured and semi-structured. Data is collected from various sources and simply stored: it is not modified for a specific purpose or converted to any particular format. Analyzing this data requires lengthy preparation, cleaning, and formatting to make it homogeneous. Data lakes are a good fit for city governments and other organizations that store information about infrastructure disruptions, traffic, crime, or demographics. The data can be used later to adjust budgets or to revise the resources allocated to utilities or emergency services.
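
The preparation step described above can be illustrated with a minimal sketch. The file names, field names, and values below are hypothetical and only stand in for objects stored as-is in a lake; a real pipeline would read them from object storage and handle many more formats.

```python
import csv
import io
import json

# Hypothetical raw objects as they might sit in a lake: each one is kept
# in the format its source produced, with its own field names.
raw_objects = [
    ("traffic_sensor.json",
     '{"ts": "2023-05-01T08:00:00", "district": "North", "vehicles": 412}'),
    ("incidents.csv",
     "date,area,count\n2023-05-01,north,3\n2023-05-01,South,5\n"),
]


def normalize(name, payload):
    """Convert one raw object into rows shaped as (date, district, metric, value)."""
    rows = []
    if name.endswith(".json"):
        rec = json.loads(payload)
        # Keep only the date part of the timestamp and lowercase the district.
        rows.append((rec["ts"][:10], rec["district"].lower(),
                     "vehicles", int(rec["vehicles"])))
    elif name.endswith(".csv"):
        for rec in csv.DictReader(io.StringIO(payload)):
            rows.append((rec["date"], rec["area"].lower(),
                         "incidents", int(rec["count"])))
    return rows


# One homogeneous list of rows, ready for analysis.
homogeneous = [row for name, payload in raw_objects
               for row in normalize(name, payload)]
```

Only after this kind of cleaning do the traffic and incident records share one shape, which is why analysis on top of a lake tends to start with a preparation phase.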



Data warehouse



A data warehouse aggregates data from different sources into a single central repository that unifies it in terms of quality and format. Data scientists can use warehouse data in areas such as data mining, artificial intelligence (AI), machine learning and, of course, business intelligence. Large cities can use data warehouses to collect information about electronic transactions from various departments, including data on speeding tickets, excise taxes, and more. Developers can also use data warehouses to collect the terabytes of data generated by automotive sensors, which helps them make the right decisions when developing technologies for autonomous driving.



Data mart



A data mart is a data warehouse designed for a specific group of users in a company or one of its divisions. The marketing department of a manufacturing company can use a data mart to identify target audiences when developing marketing plans. The manufacturing department can use one to analyze performance and error rates and so create the conditions for continuous process improvement. The datasets in a data mart are often used in real time for analytics and actionable results.
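
As a rough illustration of the marketing example, the sketch below uses an in-memory SQLite database. The table, view, and column names are hypothetical, and a SQL view stands in here for what would normally be a separately stored mart.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical warehouse table shared by the whole company.
CREATE TABLE warehouse_sales (
    region TEXT,
    product TEXT,
    customer_segment TEXT,
    amount REAL,
    defect_count INTEGER
);
INSERT INTO warehouse_sales VALUES
    ('north', 'pump',  'industrial', 1200.0, 2),
    ('north', 'valve', 'retail',      300.0, 0),
    ('south', 'pump',  'retail',      450.0, 1);

-- Hypothetical marketing mart: only the columns that team needs,
-- already aggregated by customer segment.
CREATE VIEW marketing_mart AS
SELECT customer_segment, SUM(amount) AS revenue
FROM warehouse_sales
GROUP BY customer_segment;
""")

# The marketing team queries its mart without touching defect data.
segments = conn.execute(
    "SELECT customer_segment, revenue FROM marketing_mart "
    "ORDER BY customer_segment"
).fetchall()
```

A manufacturing mart could be defined the same way over `defect_count`, so each department sees only the slice of the warehouse that is relevant to it.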



Lake, Warehouse, and Data Mart: Key Differences



All the repositories mentioned are used to store data, but there are significant differences between them. For example, a data warehouse and a data lake are both large repositories, but a lake is usually more cost-effective to implement and maintain because it stores mostly unstructured data.



Data lake architecture has evolved over the past few years and can now support larger data volumes and cloud computing. Large amounts of data flow from different sources into a centralized repository.



A data warehouse can be organized in one of three ways:



  1. As a managed service offered by cloud providers.
  2. , .
  3. , , .


Data in a warehouse is easier to use for different purposes than data in a lake. This is because the data in the warehouse is structured and easier to retrieve and analyze.



A data mart contains a small amount of data compared with a warehouse or a lake, and that data is categorized for use by a specific group of people or a division of the company. A data mart can be organized according to various schemas (star, snowflake, or data vault), each of which is defined by a logical data structure. The data vault format is the most flexible, versatile and scalable.
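
To make the idea of a logical data structure more concrete, here is a minimal sketch of a star layout in an in-memory SQLite database: one central fact table that references small dimension tables. All table names, column names, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables describe the "who/what/when" of each fact.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, day TEXT);

-- The fact table holds the measurements and points at the dimensions.
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    units INTEGER
);

INSERT INTO dim_product VALUES (1, 'pump'), (2, 'valve');
INSERT INTO dim_date VALUES (10, '2023-05-01'), (11, '2023-05-02');
INSERT INTO fact_sales VALUES (1, 10, 5), (1, 11, 7), (2, 10, 3);
""")

# A typical mart query: join the fact table to one dimension and aggregate.
units_by_product = conn.execute("""
    SELECT p.name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.name
    ORDER BY p.name
""").fetchall()
```

In a snowflake layout the dimension tables would themselves be split into further related tables; a data vault uses a different modeling approach that this sketch does not cover.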



There are three types of data marts:



  1. A dependent data mart, which consists of parts of an enterprise data warehouse. It contains subsets of the warehouse's primary data.
  2. , , .
  3. , . .


The choice of the type and structure of the data repository largely depends on the needs and requirements of the company. If flexibility and scalability matter to you, consider hybrid cloud storage, which also supports a more comprehensive, informed approach to problem solving and decision making.



IBM offers a variety of cloud storage and data mining solutions.

Tanmay Sinha, Program Director, Db2 Portfolio

























