Practical approach to selecting data warehouse views using data dependencies
Date
2000-07-01
Abstract
Data warehouses integrate information from heterogeneous sources and enable efficient analysis of that information. Their two main characteristics are the huge volumes of data they store and the requirement for fast access to that data. Because of these volumes, simple search techniques are not sufficient. Materialized views in data warehouses are typically complicated: they are based on many tables and often contain summarized information, but they are very important for improving access to the data. Because data warehouses are expected to contain current information, it is also important that the warehouse and its views can be updated easily and periodically. The set of materialized views changes over time, with new views created and old ones dropped, so the selection of materialized views is crucial. Most research to date has treated this selection as an optimization problem with respect to the cost of view maintenance, the cost of queries, or both. In this paper, we consider practical aspects of data warehousing. We identify problems with the star and snowflake schemas and suggest solutions, which we call the enhanced star schema and the enhanced snowflake schema. We also identify practical problems that may arise during view design and suggest heuristics based on data dependencies. These heuristics can be used to measure whether one set of views is better than another, or to improve a set of views, with respect to speed of access.