Abstract
Open Access (OA) to knowledge is a principle established by the European Commission, and it underlies the H2020 EU Framework Programme for Research and Innovation. OA aims to optimize the impact of publicly funded projects by making information openly available and reusable for everyone in Europe. The Open Data (OD) policy is part of the OA strategy and is widely acknowledged as a fundamental step in supporting a fast track from research to innovation. Although the need for OD is generally acknowledged, a mindset similar to "not in my backyard" holds the scientific and industrial communities back from implementing a joint OD policy. This is partly due to the fear that sensitive and proprietary data could be misused.
To overcome this problem, the European Commission set an important milestone by declaring that data must be at the same time “Findable, Accessible, Interoperable and Reusable (FAIR)” [1] and “as much open as possible, and as closed as necessary”. The Joint Programme on Wind Energy of the European Energy Research Alliance (EERA JPWind) represents the largest public European scientific community in the Wind Energy (WE) sector. EERA JPWind recognises the necessity of implementing an OD plan and has set the goal of creating a data portal. The data portal will a) collect information on data from “cloud distributed” data centers, b) catalogue the collected information and c) provide end-users with tools to find the data they need.
In this report, we focus on the first phase, which lays the basis for implementing a Data Web Portal: the information architecture that makes data “Findable” and “Interoperable”, helping data owners to describe their data and end-users to accurately locate and retrieve the data they need. This task has two components: (i) metadata (data tagging) and (ii) taxonomies for the WE sector topics, the topic-related data and the descriptive types of metadata.
(i) Metadata
To be located accurately, datasets should be tagged with a series of information items, the metadata, using so-called metadata cards. Besides preserving the information on data for future reuse, metadata are used to index datasets and so improve their findability. Metadata are classified into three categories: descriptive, administrative and structural. Descriptive metadata provide information on, e.g., what data were collected (associated topic, type of variables, etc.), where they were collected (external conditions, geographical location, etc.) or how they were collected (instruments, activity type). Administrative metadata provide information on, e.g., who collected the data (data owner), access rights and links to data. Structural metadata provide information on, e.g., the data format. In this task, we use standard metadata defined in the Dublin Core metadata element set [3].
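As an illustration of what a metadata card could look like, the following minimal sketch builds a card from Dublin Core elements and groups its fields into the three categories named above. The 15 element names are those of the Dublin Core metadata element set; the assignment of elements to categories, the helper names `make_card` and `by_category`, and all field values are assumptions made for this example, not the card layout defined in the report.

```python
# The 15 elements of the Dublin Core metadata element set.
DUBLIN_CORE_ELEMENTS = {
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
}

# Hypothetical grouping of some elements into the three metadata
# categories; the report does not prescribe this mapping.
CATEGORY_OF = {
    "title": "descriptive",
    "subject": "descriptive",
    "description": "descriptive",
    "coverage": "descriptive",
    "creator": "administrative",
    "rights": "administrative",
    "identifier": "administrative",
    "format": "structural",
}


def make_card(**fields):
    """Return a metadata card, rejecting names outside Dublin Core."""
    unknown = set(fields) - DUBLIN_CORE_ELEMENTS
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    return dict(fields)


def by_category(card):
    """Group the fields of a card by metadata category."""
    grouped = {"descriptive": {}, "administrative": {}, "structural": {}}
    for element, value in card.items():
        category = CATEGORY_OF.get(element)
        if category is not None:
            grouped[category][element] = value
    return grouped


# Illustrative values only; no real dataset is described here.
card = make_card(
    title="Offshore lidar wind measurements (illustrative)",
    subject=["Siting", "Wind Resources"],
    coverage="Offshore",
    creator="Example Institute",
    rights="restricted; contact data owner",
    identifier="https://example.org/datasets/lidar-001",
    format="text/csv",
)
grouped = by_category(card)
```

Restricting field names to a fixed element set is what lets a catalog index cards from different owners in the same way.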
(ii) Taxonomy
A taxonomy is the descriptive type of metadata whose terms assign textual information to the data. In a broad sense, it is any means of organizing concepts of knowledge; the classification of disciplines into, e.g., Environment, Climate, Agriculture and Engineering is an example. In a narrow sense, a taxonomy is a hierarchical classification or categorization system, as known from, e.g., the classification of species. In this report, taxonomy is used to put data into the correct context by defining and hierarchically classifying the WE research topics and organizing data within those topics.
A good taxonomy enables users to immediately grasp the overall structure of the knowledge domain and the associated data. In practice, taxonomy terms are used as a controlled wind energy vocabulary: by data owners for tagging data in the metadata card, and by end-users as “facets” to filter content progressively via a “faceted search”. Furthermore, the taxonomy ensures Interoperability.
The main deliverable of this task is a set of taxonomies: the taxonomy of the topics distinctive of the WE sector, the taxonomy of the data types relevant to the different topics, and taxonomies of other facets. The first step in creating them was to choose the number of hierarchical levels, with top-topics and sub-topics. To keep the topic taxonomy simple, the development of levels ended as soon as the next narrower level reached the “data” dimension. For example, the topic “Siting” includes, amongst others, “wind mapping” for prospective sites. The “wind mapping” activity needs time series of wind speed and direction, as well as terrain roughness and orography data.
Other taxonomies were created for the facets that describe data: External Conditions, Activities, Instruments, Models, and Materials. For example, to perform an offshore resource assessment in Denmark, wind speed and direction from long-term observations with a wind lidar are needed. The search would be:
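The stopping rule for the topic hierarchy can be pictured with a small sketch: topics form a tree, and the tree ends where the next level would be data, which is then kept in a separate data-type mapping. The nested-dictionary layout and the names `TOPIC_TAXONOMY`, `DATA_TYPES`, `topic_paths` and `data_types_for` are assumptions for this illustration; only the “Siting” and “wind mapping” terms and their data needs come from the example above.

```python
# Topic hierarchy as nested dictionaries: an empty dict marks a topic
# whose next narrower level would already be data, so the tree stops.
TOPIC_TAXONOMY = {
    "Siting": {
        "Wind mapping": {},
    },
}

# Data types attached to the narrowest topics (hypothetical layout).
DATA_TYPES = {
    "Wind mapping": [
        "Wind speed and direction time series",
        "Terrain roughness",
        "Orography",
    ],
}


def topic_paths(tree, prefix=()):
    """Yield every path from a top-topic down through its sub-topics."""
    for term, children in tree.items():
        path = prefix + (term,)
        yield path
        yield from topic_paths(children, path)


def data_types_for(path):
    """Return the data types tagged to the last topic of a path."""
    return DATA_TYPES.get(path[-1], [])


all_paths = list(topic_paths(TOPIC_TAXONOMY))
wind_mapping_data = data_types_for(("Siting", "Wind mapping"))
```

Keeping data types outside the topic tree mirrors the choice to end the topic levels before the “data” dimension.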
Siting (Topic),
└Wind Resources (Subtopic),
└Offshore (External conditions),
└Long-term monitoring (Activity type),
└Wind lidar (Instrument) and
└Wind speed and direction (Data type).
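The progressive filtering in this search can be sketched as follows. This is a minimal in-memory illustration, not the portal's implementation: the catalog records, their identifiers, the helper names `faceted_search` and `facet_counts`, and the “Onshore”, “Met mast” and “Turbulence intensity” values are hypothetical. Only the six facet selections of the query are taken from the example above.

```python
from collections import Counter

# Hypothetical catalog of tagged datasets; only ds-1 carries exactly
# the facet values used in the example search.
CATALOG = [
    {"id": "ds-1", "Topic": "Siting", "Subtopic": "Wind Resources",
     "External conditions": "Offshore",
     "Activity type": "Long-term monitoring",
     "Instrument": "Wind lidar",
     "Data type": "Wind speed and direction"},
    {"id": "ds-2", "Topic": "Siting", "Subtopic": "Wind Resources",
     "External conditions": "Onshore",
     "Activity type": "Long-term monitoring",
     "Instrument": "Met mast",
     "Data type": "Wind speed and direction"},
    {"id": "ds-3", "Topic": "Siting", "Subtopic": "Wind Resources",
     "External conditions": "Offshore",
     "Activity type": "Long-term monitoring",
     "Instrument": "Wind lidar",
     "Data type": "Turbulence intensity"},
]

# The example search, one facet selection per step.
QUERY = [
    ("Topic", "Siting"),
    ("Subtopic", "Wind Resources"),
    ("External conditions", "Offshore"),
    ("Activity type", "Long-term monitoring"),
    ("Instrument", "Wind lidar"),
    ("Data type", "Wind speed and direction"),
]


def faceted_search(records, selections):
    """Narrow the records one facet at a time, in the order given."""
    remaining = list(records)
    for facet, value in selections:
        remaining = [r for r in remaining if r.get(facet) == value]
    return remaining


def facet_counts(records, facet):
    """Count how many records carry each value of a facet."""
    return Counter(r[facet] for r in records if facet in r)


matches = faceted_search(CATALOG, QUERY)
conditions = facet_counts(CATALOG, "External conditions")
```

Facet counts such as `conditions` are what a search interface would typically show next to each filter, so that users can see how many datasets remain before they narrow further.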
Conclusions
With metadata cards describing the data made available by each organization, data can be searched through a data portal containing a metadata catalog that is updated by a web crawler, i.e. a program that continuously harvests metadata cards. The data themselves reside in the data owner's domain, and security and data management issues remain in the hands of the data owner.
A user will access the portal and submit a query containing keywords from the controlled vocabularies established by the taxonomies. The system will return an optimized list of available data. Data can be accessed either directly, via download links provided in the metadata card, or by contacting the data owners.
This approach has a two-fold purpose: data owners will feel more comfortable sharing data because they retain control over data access and use, while end-users will get information on the datasets needed for a specific goal, saving time and funding. Both data owners and end-users will have the opportunity to start or reinforce collaborations.
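A keyword query against the harvested catalog might look like the sketch below. How the portal actually orders its results is not specified here, so ranking by the number of matched keywords is an assumption, as are the card fields (`tags`, `link`, `owner`), the function names and all example values.

```python
# Hypothetical controlled vocabulary drawn from the taxonomies.
VOCABULARY = {
    "Siting", "Wind Resources", "Offshore",
    "Long-term monitoring", "Wind lidar", "Wind speed and direction",
}

# Hypothetical harvested cards; a missing link means the data are
# only available on request from the owner.
CARD_CATALOG = [
    {"id": "card-b", "tags": ["Siting", "Offshore"],
     "link": None, "owner": "Example Institute B"},
    {"id": "card-a", "tags": ["Siting", "Offshore", "Wind lidar"],
     "link": "https://example.org/data/a.csv", "owner": "Example Institute A"},
    {"id": "card-c", "tags": ["Siting"],
     "link": None, "owner": "Example Institute C"},
]


def search(catalog, keywords, vocabulary):
    """Return cards matching any keyword, best matches first.

    Keywords outside the controlled vocabulary are rejected. Ranking
    by number of matched keywords is an assumption of this sketch.
    """
    terms = {k.strip().lower() for k in keywords}
    unknown = terms - {v.lower() for v in vocabulary}
    if unknown:
        raise ValueError(f"not in vocabulary: {sorted(unknown)}")
    hits = []
    for card in catalog:
        score = len(terms & {t.lower() for t in card["tags"]})
        if score > 0:
            hits.append((score, card))
    # list.sort is stable, so cards with equal scores keep catalog order.
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [card for _, card in hits]


def how_to_access(card):
    """Return the download link if present, else whom to contact."""
    if card.get("link"):
        return card["link"]
    return f"contact {card['owner']}"


results = search(CARD_CATALOG, ["Offshore", "Wind lidar"], VOCABULARY)
```

Because only the cards are searched, the datasets never have to leave the owner's domain. A card without a link simply directs the user to the owner.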
Original language | English
---|---
Publisher | European Union
Number of pages | 28
Publication status | Published - 2017