TY - JOUR
T1 - Workflows in AiiDA
T2 - Engineering a high-throughput, event-based engine for robust and modular computational workflows
AU - Uhrin, Martin
AU - Huber, Sebastiaan P.
AU - Yu, Jusong
AU - Marzari, Nicola
AU - Pizzi, Giovanni
PY - 2021
Y1 - 2021
AB - Over the last two decades, the field of computational science has seen a dramatic shift towards incorporating high-throughput computation and big-data analysis as fundamental pillars of the scientific discovery process. This has necessitated the development of tools and techniques to deal with the generation, storage and processing of large amounts of data. In this work we present an in-depth look at the workflow engine powering AiiDA, a widely adopted, highly flexible and database-backed informatics infrastructure with an emphasis on data reproducibility. We detail many of the design choices that were made, informed by several important goals: the ability to scale from running on individual laptops up to high-performance supercomputers, managing jobs with runtimes spanning from fractions of a second to weeks and scaling up to thousands of jobs concurrently, and all this while maximising robustness. In short, AiiDA aims to be a Swiss army knife for high-throughput computational science. As well as the architecture, we outline important API design choices made to give workflow writers a great deal of liberty whilst guiding them towards writing robust and modular workflows, ultimately enabling them to encode their scientific knowledge to the benefit of the wider scientific community.
KW - Computational workflows
KW - Data management
KW - Data sharing
KW - Database
KW - Event-based
KW - High-throughput
KW - Provenance
KW - Robust computation
U2 - 10.1016/j.commatsci.2020.110086
DO - 10.1016/j.commatsci.2020.110086
M3 - Journal article
AN - SCOPUS:85096159747
SN - 0927-0256
VL - 187
JO - Computational Materials Science
JF - Computational Materials Science
M1 - 110086
ER -