TY - JOUR
T1 - Perspectives on automated composition of workflows in the life sciences
AU - Lamprecht, Anna-Lena
AU - Palmblad, Magnus
AU - Ison, Jon
AU - Schwämmle, Veit
AU - Al Manir, Mohammad Sadnan
AU - Altintas, Ilkay
AU - Baker, Christopher J.O.
AU - Ben Hadj Amor, Ammar
AU - Capella-Gutierrez, Salvador
AU - Charonyktakis, Paulos
AU - Crusoe, Michael R.
AU - Gil, Yolanda
AU - Goble, Carole
AU - Griffin, Timothy J.
AU - Groth, Paul
AU - Ienasescu, Hans
AU - Jagtap, Pratik
AU - Kalaš, Matúš
AU - Kasalica, Vedran
AU - Khanteymoori, Alireza
AU - Kuhn, Tobias
AU - Mei, Hailiang
AU - Ménager, Hervé
AU - Möller, Steffen
AU - Richardson, Robin A.
AU - Robert, Vincent
AU - Soiland-Reyes, Stian
AU - Stevens, Robert
AU - Szaniszlo, Szoke
AU - Verberne, Suzan
AU - Verhoeven, Aswin
AU - Wolstencroft, Katherine
PY - 2021
Y1 - 2021
N2 - Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the 'big picture' of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
AB - Scientific data analyses often combine several computational tools in automated pipelines, or workflows. Thousands of such workflows have been used in the life sciences, though their composition has remained a cumbersome manual process due to a lack of standards for annotation, assembly, and implementation. Recent technological advances have returned the long-standing vision of automated workflow composition into focus. This article summarizes a recent Lorentz Center workshop dedicated to automated composition of workflows in the life sciences. We survey previous initiatives to automate the composition process, and discuss the current state of the art and future perspectives. We start by drawing the 'big picture' of the scientific workflow development life cycle, before surveying and discussing current methods, technologies and practices for semantic domain modelling, automation in workflow development, and workflow assessment. Finally, we derive a roadmap of individual and community-based actions to work toward the vision of automated workflow development in the forthcoming years. A central outcome of the workshop is a general description of the workflow life cycle in six stages: 1) scientific question or hypothesis, 2) conceptual workflow, 3) abstract workflow, 4) concrete workflow, 5) production workflow, and 6) scientific results. The transitions between stages are facilitated by diverse tools and methods, usually incorporating domain knowledge in some form. Formal semantic domain modelling is hard and often a bottleneck for the application of semantic technologies. However, life science communities have made considerable progress here in recent years and are continuously improving, renewing interest in the application of semantic technologies for workflow exploration, composition and instantiation. Combined with systematic benchmarking with reference data and large-scale deployment of production-stage workflows, such technologies enable a more systematic process of workflow development than we know today. We believe that this can lead to more robust, reusable, and sustainable workflows in the future.
KW - Automated workflow composition
KW - Bioinformatics
KW - Computational pipelines
KW - Life sciences
KW - Scientific workflows
KW - Semantic domain modelling
KW - Workflow benchmarking
U2 - 10.12688/f1000research.54159.1
DO - 10.12688/f1000research.54159.1
M3 - Journal article
C2 - 34804501
SN - 2046-1402
VL - 10
JO - F1000Research
JF - F1000Research
M1 - 897
ER -