Abstract
Extract-Transform-Load (ETL) handles large amounts of data and
manages workload through dataflows. ETL dataflows are widely
regarded as complex and expensive operations in terms of time
and system resources. In order to minimize the time and the resources
required by ETL dataflows, this paper presents an optimization
framework using partitioning and parallelization. The framework
first partitions an ETL dataflow into multiple execution trees
according to the characteristics of ETL constructs; then, within each
execution tree, pipelined parallelism and a shared cache are used to
optimize the partitioned dataflow. Furthermore, multi-threading is
used in component-based optimization. The experimental results
show that the proposed framework can be 4.7 times faster than
ordinary ETL dataflows (those not using the proposed partitioning
and optimization methods), and that its performance is comparable
to that of similar ETL tools.
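
To make the idea of pipelined parallelism within an execution tree concrete, the following is a minimal sketch, not the paper's implementation. It assumes a linear chain of row-level transformations, each run in its own thread and connected to the next by a bounded in-memory queue. The names `run_stage`, `run_pipeline`, and `queue_size` are hypothetical and chosen only for illustration.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the row stream


def run_stage(func, inbox, outbox):
    """Apply func to every row read from inbox and forward it to outbox."""
    while True:
        row = inbox.get()
        if row is _DONE:
            outbox.put(_DONE)  # propagate end-of-stream downstream
            return
        outbox.put(func(row))


def run_pipeline(rows, stages, queue_size=64):
    """Run each stage in its own thread; stages overlap on different rows."""
    # One queue in front of every stage, plus one for the final output.
    queues = [queue.Queue(maxsize=queue_size) for _ in range(len(stages) + 1)]

    def feed():
        # Source step: push rows into the first queue, then signal the end.
        for row in rows:
            queues[0].put(row)
        queues[0].put(_DONE)

    threads = [threading.Thread(target=feed)]
    threads += [
        threading.Thread(target=run_stage, args=(func, queues[i], queues[i + 1]))
        for i, func in enumerate(stages)
    ]
    for t in threads:
        t.start()

    # Sink step: drain the last queue in the calling thread.
    results = []
    while True:
        row = queues[-1].get()
        if row is _DONE:
            break
        results.append(row)

    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    cleaned = run_pipeline(
        [" a ", " b ", " c "],
        [str.strip, str.upper],  # two hypothetical transformation steps
    )
    print(cleaned)
```

Because every stage is a single thread reading from a FIFO queue, row order is preserved. The bounded queues keep a fast upstream stage from buffering an unbounded number of rows ahead of a slow downstream one.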
Original language | English |
---|---|
Title of host publication | Proceedings of the 30th ACM Symposium on Applied Computing (SAC 2015) |
Number of pages | 8 |
Publisher | Association for Computing Machinery |
Publication date | 2015 |
ISBN (Print) | 978-1-4503-3196-8 |
Publication status | Published - 2015 |
Externally published | Yes |
Event | 30th Annual ACM/SIGAPP Symposium on Applied Computing - Salamanca, Spain. Duration: 13 Apr 2015 → 17 Apr 2015. Conference number: 30. https://www.sigapp.org/sac/sac2015/ http://www.acm.org/conferences/sac/sac2015/ |
Conference
Conference | 30th Annual ACM/SIGAPP Symposium on Applied Computing |
---|---|
Number | 30 |
Country/Territory | Spain |
City | Salamanca |
Period | 13/04/2015 → 17/04/2015 |