Optimizing ETL by a Two-level Data Staging Method

Publication: Research - peer-reviewJournal article – Annual report year: 2016

View graph of relations

In data warehousing, the data from source systems are populated into a central data warehouse (DW) through extraction, transformation and loading (ETL). The standard ETL approach usually uses sequential jobs to process the data with dependencies, such as dimension and fact data. It is a non-trivial task to process the so-called early-/late-arriving data, which arrive out of order. This paper proposes a two-level data staging area method to optimize ETL. The proposed method is an all-in-one solution that supports processing different types of data from operational systems, including early-/late-arriving data, and fast-/slowly-changing data. The introduced additional staging area decouples loading process from data extraction and transformation, which improves ETL flexibility and minimizes intervention to the data warehouse. This paper evaluates the proposed method empirically, which shows that it is more efficient and less intrusive than the standard ETL method.
Original languageEnglish
JournalInternational Journal of Data Warehousing and Mining
Volume12
Issue number3
Pages (from-to)32-50
ISSN1548-3924
StatePublished - 2016
Download as:
Download as PDF
Select render style:
APAAuthorCBE/CSEHarvardMLAStandardVancouverShortLong
PDF
Download as HTML
Select render style:
APAAuthorCBE/CSEHarvardMLAStandardVancouverShortLong
HTML
Download as Word
Select render style:
APAAuthorCBE/CSEHarvardMLAStandardVancouverShortLong
Word

ID: 125314406