Data modeling: why you need it and how to implement it

Data modeling greatly simplifies collaboration between developers, analysts, and marketers, as well as the reporting process itself. For that reason, I translated the IBM Cloud Education article on the value of modeling and added my own notes on how to transform data for modeling.

Data modeling

Learn how data modeling uses abstraction to represent and better understand the nature of data in an enterprise information system.

What is data modeling?

Data modeling is the process of creating a visual representation of an entire information system or part of it. The goal is to illustrate the types of data used and stored in the system, the relationships between those types, how the data is grouped and organized, and its formats and attributes.

Data models are built around business needs. The rules and requirements for a model are defined in advance from business feedback, so they can be built into the design of a new system or adapted to an existing one.

Data can be modeled at various levels of abstraction. The process begins with collecting business requirements from stakeholders and end users. These business rules are then translated into data structures. A data model can be compared to a roadmap, an architect's blueprint, or any formal diagram that gives a deeper understanding of what is being designed.
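
To make the step from business rules to data structures more concrete, here is a minimal sketch in Python. The entities (`Customer`, `Address`) and both rules ("every customer has an email" and "every customer has at least one address") are hypothetical and chosen only for illustration; a real model would take them from stakeholder requirements.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str
    country: str


@dataclass
class Customer:
    customer_id: int
    email: str
    addresses: List[Address] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Hypothetical business rule 1: an email is mandatory.
        if not self.email:
            raise ValueError("a customer must have an email")
        # Hypothetical business rule 2: at least one address is required.
        if not self.addresses:
            raise ValueError("a customer must have at least one address")


customer = Customer(
    customer_id=1,
    email="ada@example.com",
    addresses=[Address("1 Main St", "London", "UK")],
)
```

Writing the rules into the structure itself means that an object violating them cannot be created, which is roughly what constraints do later in a database schema.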





Data modeling uses standardized schemas and formal methods. This provides a consistent and predictable way to define and manage data within the organization and beyond it.

Ideally, data models are living documents that evolve with the needs of the business. They play an important role in supporting business processes and in planning IT architecture and strategy. Data models can be shared with suppliers, partners, and industry peers.

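Data modeling makes it easier for developers, data architects, business analysts, and other stakeholders to see and understand the relationships between data in a database or data warehouse. In addition, it can:

  • Reduce errors in software and database development.

  • Increase consistency in documentation and system design across the enterprise.

  • Improve application and database performance.

  • Ease data mapping throughout the organization.

  • Improve communication between developers and business intelligence teams.

  • Simplify and speed up database design at the conceptual, logical, and physical levels.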





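Like any design process, database and information system design begins at a high level of abstraction and becomes progressively more concrete and specific. Data models can generally be divided into three categories that differ in their degree of abstraction. The process starts with a conceptual model, moves on to a logical model, and ends with a physical model.

  • Conceptual data models. Also called domain models, they give a big-picture view: what the system will contain, how it will be organized, and which business rules are involved. Conceptual models are usually created while gathering the initial project requirements. They typically include entity classes (the kinds of things the business needs to represent in the data model), their characteristics and constraints, the relationships between them, and the relevant security and data integrity requirements. The notation is usually simple.

  • Logical data models. They are less abstract and give more detail about the concepts and relationships in the domain. They follow one of several formal data modeling notations and specify data attributes, such as data types and their lengths, and show the relationships between entities. Logical data models do not specify any technical system requirements. This stage is often skipped in agile or DevOps practices. Logical data models can be useful in highly procedural implementation environments, or for projects that are data-oriented by nature, such as data warehouse design or reporting system development.

  • Physical data models provide a schema for how the data will be physically stored in a database, so they are the least abstract of all. They offer a finalized design that can be implemented as a relational database, including associative tables that show the relationships between entities, along with the primary and foreign keys used to maintain those relationships. Physical data models can include properties specific to a database management system (DBMS), including performance tuning.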





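As a discipline, data modeling invites stakeholders to evaluate data processing and storage in detail. Different modeling techniques follow different conventions for which symbols represent the data, how models are laid out, and how business requirements are conveyed. All of them provide formalized workflows made up of tasks that are performed iteratively. These workflows generally look like this:

  1. Identify the entities. Data modeling begins with identifying the things, events, or concepts that are represented in the data set to be modeled. Each entity should be cohesive and logically distinct from all the others.

  2. Identify the key properties of each entity. Each entity type differs from the others because it has one or more unique properties, called attributes. For example, an entity called "customer" might have attributes such as first name, last name, and telephone number, while an entity called "address" might include a street name and number, city, state, country, and zip code.

  3. Identify the relationships between entities. The earliest draft of a data model specifies the nature of the relationships each entity has with the others. In the example above, each customer "lives at" an address. If the model were extended with an entity called "orders", each order would also be shipped and billed to an address. These relationships are usually documented in Unified Modeling Language (UML).

  4. Map attributes to entities completely. This ensures that the model reflects how the business will use the data. Several formal data modeling patterns are in widespread use. Object-oriented developers often apply analysis patterns or design patterns, while stakeholders from other business domains may turn to other patterns.

  5. Assign keys as needed and choose a degree of normalization that balances reduced redundancy against performance requirements. Normalization is a technique for organizing data models (and the databases they represent) in which numerical identifiers, called keys, are assigned to groups of data to represent relationships between them without repeating the data. For example, if each customer is assigned a key, that key can be linked to both their address and their order history without repeating this information in the table of customer names. Normalization tends to reduce the storage space a database requires, but it can cost query performance.

  6. Finalize and validate the data model. Data modeling is an iterative process that should be repeated and refined as business needs change.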
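
The workflow of identifying entities, attributes, relationships, and keys can be sketched with Python's built-in `sqlite3` module. This is only an illustration under stated assumptions: the `customer`, `address`, and `orders` tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Entities become tables, attributes become columns,
# relationships become foreign keys.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    first_name  TEXT NOT NULL,
    last_name   TEXT NOT NULL,
    phone       TEXT
);
CREATE TABLE address (
    address_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    street      TEXT NOT NULL,
    city        TEXT NOT NULL,
    country     TEXT NOT NULL
);
CREATE TABLE orders (
    order_id        INTEGER PRIMARY KEY,
    customer_id     INTEGER NOT NULL REFERENCES customer(customer_id),
    ship_address_id INTEGER NOT NULL REFERENCES address(address_id),
    total           REAL NOT NULL
);
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ada', 'Lovelace', '555-0100')")
conn.execute("INSERT INTO address VALUES (10, 1, '1 Main St', 'London', 'UK')")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(100, 1, 10, 25.0), (101, 1, 10, 40.0)],
)

# Keys link the tables, so customer and address data is stored only once.
rows = conn.execute("""
    SELECT c.last_name, a.city, COUNT(*) AS order_count, SUM(o.total) AS revenue
    FROM orders AS o
    JOIN customer AS c ON c.customer_id = o.customer_id
    JOIN address AS a ON a.address_id = o.ship_address_id
    GROUP BY c.customer_id, a.address_id
""").fetchall()

# An order that points to a customer who does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (102, 999, 10, 5.0)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```

The join at the end shows the trade-off of normalization: nothing is duplicated in storage, but reading the data back requires combining several tables.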





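Data modeling has evolved alongside database management systems (DBMS), and model types have grown more complex as the data storage needs of businesses have grown.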

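Hierarchical data models represent "one-to-many" relationships in a tree-like format. In this type of model, each record has a single root, or parent, that maps to one or more child tables. This model was implemented in the IBM Information Management System (IMS), introduced in 1966, and quickly found widespread use, especially in banking. Although this approach is less efficient than more recently developed database models, it is still used in Extensible Markup Language (XML) systems and geographic information systems (GIS).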

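Relational data models were first proposed by IBM researcher E. F. Codd in 1970. They are still implemented today in the many relational databases commonly used in enterprise computing. Relational data modeling does not require a detailed understanding of the physical properties of the data storage being used. In it, data segments are joined explicitly through tables, which reduces database complexity.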

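Relational databases often use Structured Query Language (SQL) for data management. These databases work well for maintaining data integrity and minimizing redundancy. They are often used in point-of-sale systems, as well as for other types of transaction processing.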

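ER data models use formal diagrams to represent the relationships between entities in a database. Data architects use ER modeling tools to create visual maps that convey the goals of the database design. Such a map is called an ER diagram (Entity-Relationship diagram).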

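Object-oriented data models gained traction along with object-oriented programming in the mid-1990s. The "objects" involved are abstractions of real-world entities. Objects are grouped into class hierarchies and have associated features. Object-oriented databases can incorporate tables, but they can also support more complex data relationships. This approach is used in multimedia and hypertext databases, among other use cases.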

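Dimensional data models were developed by Ralph Kimball to optimize data retrieval speed for analytics in a data warehouse. While relational and ER models emphasize efficient storage, dimensional models increase redundancy to make it easier to locate information for reporting and retrieval. This type of modeling is typically used in OLAP systems.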

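Two popular dimensional data models are the "star" schema and the "snowflake" schema. In the "star" schema, data is organized into facts (measurable items) and dimensions (reference information), and each fact is surrounded by its associated dimensions in a star-like pattern. The "snowflake" schema resembles the "star" but includes additional layers of associated dimensions, which makes the branching pattern more complex.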
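
A star schema, meaning one fact table surrounded by dimension tables, can be sketched with Python's built-in `sqlite3` module. The tables `fact_sales`, `dim_date`, and `dim_product` and all of their rows are hypothetical and serve only as an illustration.

```python
import sqlite3

star = sqlite3.connect(":memory:")
star.executescript("""
-- Dimensions: reference information
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    month   TEXT NOT NULL
);
CREATE TABLE dim_product (
    product_id INTEGER PRIMARY KEY,
    category   TEXT NOT NULL
);
-- Fact: measurable events, each row pointing at its dimensions
CREATE TABLE fact_sales (
    date_id    INTEGER NOT NULL REFERENCES dim_date(date_id),
    product_id INTEGER NOT NULL REFERENCES dim_product(product_id),
    revenue    REAL NOT NULL
);
INSERT INTO dim_date VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 30.0), (2, 1, 5.0), (2, 1, 15.0);
""")

# A typical reporting query: aggregate the facts, sliced by dimension attributes.
revenue_by_month_and_category = star.execute("""
    SELECT d.month, p.category, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_id = f.date_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
""").fetchall()
```

In a snowflake variant, `dim_product` would itself reference a further table, for example a separate category table, which adds one more join to the same query.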





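Many commercial and open-source CASE solutions are in wide use today, including tools for data modeling, diagramming, and visualization. Here are a few examples:

  • erwin Data Modeler is a data modeling tool based on the IDEF1X data modeling language that now also supports other notations, including the dimensional approach.

  • Enterprise Architect is a visual modeling and design tool that supports modeling of enterprise information systems and architectures, as well as software applications and databases. It is based on object-oriented languages and standards.

  • ER/Studio is database design software that is compatible with several popular database management systems. It supports both relational and dimensional data modeling.

  • Free data modeling tools include open-source solutions such as Open ModelSphere.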





Data can be transformed for modeling in, for example, Google BigQuery, using Scheduled Queries and AppScript.

To manage SQL queries, you can use specialized tools such as dbt and Dataform.

dbt (data build tool) is an open-source framework for running, testing, and documenting SQL queries that brings software engineering practices to data analysis. It streamlines work with SQL: macros and Jinja templates let you avoid repeating the same code snippets over and over.
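
The idea behind macros and templates, writing a SQL snippet once and reusing it with different parameters, can be illustrated without dbt. The sketch below is not dbt or Jinja syntax: it uses Python's standard-library `string.Template` as a stand-in, and the table and column names (`web_orders`, `store_orders`, `created_at`, `sold_at`, `amount`) are hypothetical.

```python
from string import Template

# A reusable snippet: daily revenue for any source table.
DAILY_REVENUE = Template(
    "SELECT DATE($ts_column) AS day, SUM($amount_column) AS revenue\n"
    "FROM $table\n"
    "GROUP BY day"
)


def daily_revenue_sql(table, ts_column="created_at", amount_column="amount"):
    """Render the shared snippet for one table.

    Only pass trusted identifiers: plain text substitution does not
    protect against SQL injection.
    """
    return DAILY_REVENUE.substitute(
        table=table, ts_column=ts_column, amount_column=amount_column
    )


# The same logic is written once and rendered for two different sources.
web_sql = daily_revenue_sql("web_orders")
store_sql = daily_revenue_sql("store_orders", ts_column="sold_at")
```

When the aggregation logic changes, only the template needs to be edited, and every query rendered from it picks up the change.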





The main problem that specialized tools solve is reducing the time spent on maintenance and updates. This is achieved by making queries easier to debug.


