An ETL Optimization Framework Using Partitioning and Parallelization

Xiufeng Liu, Nadeem Iftikhar

Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review

Abstract

Extract-Transform-Load (ETL) handles large amounts of data and manages workload through dataflows. ETL dataflows are widely regarded as complex and expensive operations in terms of time and system resources. To minimize the time and resources required by ETL dataflows, this paper presents an optimization framework using partitioning and parallelization. The framework first partitions an ETL dataflow into multiple execution trees according to the characteristics of ETL constructs; within each execution tree, pipelined parallelism and a shared cache are then used to optimize the partitioned dataflow. Furthermore, multi-threading is used in component-based optimization. The experimental results show that the proposed framework can run 4.7 times faster than ordinary ETL dataflows (without the proposed partitioning and optimization methods), and is comparable to similar ETL tools.
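
As a rough illustration of the pipelined parallelism mentioned in the abstract, the sketch below runs each stage of a small dataflow in its own thread and links the stages with bounded queues, so that a downstream stage can start on early rows while upstream stages are still working. It is not the paper's implementation: the names `run_stage` and `run_pipeline`, the `buffer_size` parameter, and the one-thread-per-stage layout are all assumptions made for this example, and the partitioning into execution trees and the shared cache are not modeled.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream


def run_stage(func, inbox, outbox):
    """Apply func to every row taken from inbox and forward the result."""
    while True:
        row = inbox.get()
        if row is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(func(row))


def run_pipeline(rows, stages, buffer_size=4):
    """Run each stage in its own thread, linked by bounded queues.

    Hypothetical helper: assumes every stage is a row-to-row function
    that does not raise. Row order is preserved because each stage is
    a single FIFO consumer.
    """
    queues = [queue.Queue(maxsize=buffer_size) for _ in range(len(stages) + 1)]
    workers = [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]

    def feed():
        # Feed from a separate thread so the caller can drain the last
        # queue concurrently; otherwise the bounded queues could fill up.
        for row in rows:
            queues[0].put(row)
        queues[0].put(_DONE)

    workers.append(threading.Thread(target=feed))
    for worker in workers:
        worker.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is _DONE:
            break
        results.append(item)

    for worker in workers:
        worker.join()
    return results


if __name__ == "__main__":
    # Two toy transformations standing in for ETL components.
    print(run_pipeline(range(5), [lambda r: r + 1, lambda r: r * 2]))
```

The bounded queues act as small buffers between stages: a fast upstream stage blocks once its outbox is full instead of materializing the whole intermediate result, which is the usual motivation for pipelining a dataflow rather than running its components one after another.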
Original language: English
Title of host publication: Proceedings of the 30th ACM Symposium on Applied Computing (SAC 2015)
Number of pages: 8
Publisher: Association for Computing Machinery
Publication date: 2015
ISBN (Print): 978-1-4503-3196-8
Publication status: Published - 2015
Externally published: Yes
Event: 30th Annual ACM/SIGAPP Symposium on Applied Computing, Salamanca, Spain
Duration: 13 Apr 2015 to 17 Apr 2015
Conference number: 30
https://www.sigapp.org/sac/sac2015/
http://www.acm.org/conferences/sac/sac2015/
