DATA WAREHOUSE ARCHITECTURE
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. In general a Data Warehouse is used on an enterprise level, while Data Mart is used on a business division/department level. Some may have a small number of data sources, while some may have dozens of data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture rather than discussing the specifics of any one system.
Layers in a Data Warehouse
In general, all data warehouse systems have the following layers:
Data Source Layer
Data Extraction Layer
Staging Area
ETL Layer
Data Storage Layer
Data Logic Layer
Data Presentation Layer
Metadata Layer
System Operations Layer
The picture below shows the relationships among the different components of the data warehouse architecture:Pl z find snap
Different data warehousing systems have different structures. Some may have an ODS (operational data store), while some may have multiple data marts. In general a Data Warehouse is used on an enterprise level, while Data Mart is used on a business division/department level. Some may have a small number of data sources, while some may have dozens of data sources. In view of this, it is far more reasonable to present the different layers of a data warehouse architecture rather than discussing the specifics of any one system.
Layers in a Data Warehouse
In general, all data warehouse systems have the following layers:
Data Source Layer
Data Extraction Layer
Staging Area
ETL Layer
Data Storage Layer
Data Logic Layer
Data Presentation Layer
Metadata Layer
System Operations Layer
The picture below shows the relationships among the different components of the data warehouse architecture:Pl z find snap
Each component is discussed individually below:
Data Source Layer
This represents the different data sources that feed data into the data warehouse. The data source can be of any format -- plain text file, relational database, other types of database, Excel file, etc., can all act as a data source
Data Source Layer
This represents the different data sources that feed data into the data warehouse. The data source can be of any format -- plain text file, relational database, other types of database, Excel file, etc., can all act as a data source
Many different types of data can be a data source:
Operations -- such as sales data, HR data, product data, inventory data, marketing data, systems data.
Web server logs with user browsing data.
Internal market research data.
Third-party data, such as census data, demographics data, or survey data.
All these data sources together form the Data Source Layer.
Clearly, the goal of data warehousing is to free the information that is locked up in the operational databases and to mix it with information from other, often external, sources of data. Increasingly, large organizations are acquiring additional data from outside databases. This information includes demographic, econometric, competitive and purchasing trends.
Data Extraction Layer
Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, but there is unlikely any major data transformation.
Staging Area: This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes it easier for subsequent data processing / integration.
ETL Layer
ETL stands for “Extract, Transform and Load”. This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens. The ETL design phase is often the most time-consuming phase in a data warehousing project, and an ETL tool is often used in this layer.
Data Storage Layer
This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.
Data Logic Layer
This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but do affect what the report looks like.
Data Presentation Layer
This refers to the information that reaches the users. This can be in a form of a tabular / graphical report in a browser, an emailed report that gets automatically generated and sent every day, or an alert that warns users of exceptions, among others. Usually an OLAP tool and/or a reporting tool is used in this layer.
Operations -- such as sales data, HR data, product data, inventory data, marketing data, systems data.
Web server logs with user browsing data.
Internal market research data.
Third-party data, such as census data, demographics data, or survey data.
All these data sources together form the Data Source Layer.
Clearly, the goal of data warehousing is to free the information that is locked up in the operational databases and to mix it with information from other, often external, sources of data. Increasingly, large organizations are acquiring additional data from outside databases. This information includes demographic, econometric, competitive and purchasing trends.
Data Extraction Layer
Data gets pulled from the data source into the data warehouse system. There is likely some minimal data cleansing, but there is unlikely any major data transformation.
Staging Area: This is where data sits prior to being scrubbed and transformed into a data warehouse / data mart. Having one common area makes it easier for subsequent data processing / integration.
ETL Layer
ETL stands for “Extract, Transform and Load”. This is where data gains its "intelligence", as logic is applied to transform the data from a transactional nature to an analytical nature. This layer is also where data cleansing happens. The ETL design phase is often the most time-consuming phase in a data warehousing project, and an ETL tool is often used in this layer.
Data Storage Layer
This is where the transformed and cleansed data sit. Based on scope and functionality, 3 types of entities can be found here: data warehouse, data mart, and operational data store (ODS). In any given system, you may have just one of the three, two of the three, or all three types.
Data Logic Layer
This is where business rules are stored. Business rules stored here do not affect the underlying data transformation rules, but do affect what the report looks like.
Data Presentation Layer
This refers to the information that reaches the users. This can be in a form of a tabular / graphical report in a browser, an emailed report that gets automatically generated and sent every day, or an alert that warns users of exceptions, among others. Usually an OLAP tool and/or a reporting tool is used in this layer.
Metadata Layer
This is where information about the data stored in the data warehouse system is stored. Metadata is data about data. A logical data model would be an example of something that's in the metadata layer. A metadata tool is often used to manage metadata. Data warehouse contains huge amount of data. The metadata component contains the information like: (1) description of data warehouse; (2) rules to map, translate and transform data sources to warehouse elements; (3) the navigation paths and rules for browsing in the data warehouse; (4) the data dictionary; (5) the list of pre-designed and built-in queries available to the users etc. Record descriptions in a COBOL program DIMENSION statements in a FORTRAN program, or SQL Create statement fields are examples of metadata.
In order to have a fully functional warehouse, it is necessary to have a variety of meta-data available, data about the end-user views of data and data about the operational databases. Ideally, end-users should be able to access data from the data warehouse (or from the operational databases) without having to know where that data resides or the form in which it is stored.
System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history
This is where information about the data stored in the data warehouse system is stored. Metadata is data about data. A logical data model would be an example of something that's in the metadata layer. A metadata tool is often used to manage metadata. Data warehouse contains huge amount of data. The metadata component contains the information like: (1) description of data warehouse; (2) rules to map, translate and transform data sources to warehouse elements; (3) the navigation paths and rules for browsing in the data warehouse; (4) the data dictionary; (5) the list of pre-designed and built-in queries available to the users etc. Record descriptions in a COBOL program DIMENSION statements in a FORTRAN program, or SQL Create statement fields are examples of metadata.
In order to have a fully functional warehouse, it is necessary to have a variety of meta-data available, data about the end-user views of data and data about the operational databases. Ideally, end-users should be able to access data from the data warehouse (or from the operational databases) without having to know where that data resides or the form in which it is stored.
System Operations Layer
This layer includes information on how the data warehouse system operates, such as ETL job status, system performance, and user access history
No comments:
Post a Comment