Optimizing ETL by a Two-level Data Staging Method

    Research output: Contribution to journal › Journal article › Research › peer-review

    Abstract

    In data warehousing, data from source systems are loaded into a central data warehouse (DW) through extraction, transformation and loading (ETL). The standard ETL approach usually processes data with dependencies, such as dimension and fact data, in sequential jobs. Processing so-called early- or late-arriving data, which arrive out of order, is therefore non-trivial. This paper proposes a two-level data staging area method to optimize ETL. The proposed method is an all-in-one solution that supports processing different types of data from operational systems, including early- and late-arriving data as well as fast- and slowly-changing data. The additional staging area decouples the loading process from data extraction and transformation, which improves ETL flexibility and minimizes intrusion into the data warehouse. An empirical evaluation shows that the proposed method is more efficient and less intrusive than the standard ETL method.
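    The abstract describes the extra staging level only at a high level. As a rough illustration, the following Python sketch shows one way an additional staging level could absorb early-arriving fact rows, i.e. facts whose dimension member has not been loaded yet: such rows are parked and retried on the next load instead of failing the job. Everything here (the `Warehouse` and `TwoLevelStaging` classes, the `key`/`amount` row fields, the in-memory storage) is hypothetical and is not the authors' implementation.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for the data warehouse (not from the paper).
@dataclass
class Warehouse:
    dimension: dict = field(default_factory=dict)  # business key -> surrogate key
    facts: list = field(default_factory=list)      # loaded fact rows
    next_key: int = 1                               # next surrogate key to assign

    def add_member(self, business_key: str) -> int:
        """Insert a dimension member if it is new and return its surrogate key."""
        if business_key not in self.dimension:
            self.dimension[business_key] = self.next_key
            self.next_key += 1
        return self.dimension[business_key]


# Hypothetical loader: batches arrive already extracted and transformed
# (first staging level); `pending` plays the role of the second level.
@dataclass
class TwoLevelStaging:
    dw: Warehouse
    pending: list = field(default_factory=list)  # facts waiting for their dimension member

    def load_batch(self, dim_rows, fact_rows):
        """Load one transformed batch; park facts whose dimension is missing."""
        for row in dim_rows:
            self.dw.add_member(row["key"])
        # Retry previously parked facts together with the new ones.
        candidates, self.pending = self.pending + list(fact_rows), []
        for fact in candidates:
            surrogate = self.dw.dimension.get(fact["key"])
            if surrogate is None:
                self.pending.append(fact)  # early-arriving fact: keep it staged
            else:
                self.dw.facts.append({"dim_sk": surrogate, "amount": fact["amount"]})


if __name__ == "__main__":
    staging = TwoLevelStaging(Warehouse())
    staging.load_batch([], [{"key": "meter-1", "amount": 5.0}])  # dimension not loaded yet
    staging.load_batch([{"key": "meter-1"}], [])                 # dimension arrives later
    print(len(staging.pending), staging.dw.facts)  # prints: 0 [{'dim_sk': 1, 'amount': 5.0}]
```

    Because the parked rows live outside the warehouse tables, the warehouse only ever receives rows whose references resolve. A real design would also need to handle late-arriving dimension changes and slowly-changing dimensions, which this sketch omits.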
    Original language: English
    Journal: International Journal of Data Warehousing and Mining
    Volume: 12
    Issue number: 3
    Pages (from-to): 32-50
    ISSN: 1548-3924
    DOI: 10.4018/IJDWM.2016070103
    Publication status: Published - 2016

    Cite this

    @article{875fadb018d9453b97d90fdb77c88a45,
    title = "Optimizing ETL by a Two-level Data Staging Method",
    author = "Xiufeng Liu and Nadeem Iftikhar and Nielsen, {Per Sieverts}",
    year = "2016",
    doi = "10.4018/IJDWM.2016070103",
    language = "English",
    volume = "12",
    pages = "32--50",
    journal = "International Journal of Data Warehousing and Mining",
    issn = "1548-3924",
    publisher = "IGI Global",
    number = "3",

    }

    Optimizing ETL by a Two-level Data Staging Method. / Liu, Xiufeng; Iftikhar, Nadeem; Nielsen, Per Sieverts.

    In: International Journal of Data Warehousing and Mining, Vol. 12, No. 3, 2016, p. 32-50.

    TY - JOUR

    T1 - Optimizing ETL by a Two-level Data Staging Method

    AU - Liu, Xiufeng

    AU - Iftikhar, Nadeem

    AU - Nielsen, Per Sieverts

    PY - 2016

    Y1 - 2016

    U2 - 10.4018/IJDWM.2016070103

    DO - 10.4018/IJDWM.2016070103

    M3 - Journal article

    VL - 12

    SP - 32

    EP - 50

    JO - International Journal of Data Warehousing and Mining

    JF - International Journal of Data Warehousing and Mining

    SN - 1548-3924

    IS - 3

    ER -