In the recent report of the European Commission’s High Level Expert Group for the European Open Science Cloud, we defined the need for a so-called “Internet of FAIR Data and Services” (IFDS), referring to a virtual space where machines and people can find, access, interoperate with, and thus reuse each other’s research outputs in a trusted, affordable, and sustainable way. The IFDS should develop following the original hourglass model (Figure 1), which underpins the successful and scalable growth of the Internet as we know it.
The hourglass model of the Internet architecture (for details, see reference 6).
Nothing in the envisioned Internet of FAIR Data and Services (IFDS) is likely to be fully identical to the original Internet developments, as the IFDS does not start in a greenfield and will build wherever possible on the current Internet infrastructure. However, there are clear similarities. In the classical hourglass layered systems architecture, TCP/IP is usually placed in the narrow center of the hourglass, also referred to as the “spanning layer”. All items below the spanning layer can be broadly classified as underlying network infrastructure, and all levels above the narrow waist lead to a wide variety of applications, with both sides having maximum freedom to make implementation choices. This is a basic principle to be followed in the IFDS as well: set only the minimal necessary protocols and standards, and support a wide variety of implementation choices for data, tools, and compute elements to participate in the growing IFDS. If we now try to translate the hourglass model to the IFDS, we deal with three distinct, basic elements that must be routed so that they find each other at the right time and place and are maximally used and reused. We have classified these into three broad categories: DATA, TOOLS, and COMPUTE. There are gray areas because, for instance, software code (mainly covered under executable tools) can also be regarded as data, and middleware could be classified as part of the compute infrastructure. We also realize that these boundaries may blur even further as data-driven and computationally assisted science develops exponentially in the decades to come. However, for all practical purposes, we follow these broad definitions, and we basically want to treat all Digital Objects and the associated architecture in the IFDS according to the same principles.
To ensure maximum findability of all digital objects, we here explicitly emphasize the need for sufficiently rich machine-actionable metadata, as elaborated in the FAIR principles and in several follow-up publications. Tools are defined mostly as software-type services that act on data, for instance virtual machines packaged to travel the IFDS for distributed data analytics, but also data repositories. Compute is defined as the actual compute processing elements that are needed for the tools to act on the data in a meaningful way. To keep the distinction between these three classes of “application layer” elements clear for the discussion, we argue that we deal with three classes in, as it were, three merging hourglasses, each with its own specific under-the-hood network and routing infrastructure. Obviously, wherever possible, generic elements of that infrastructure should be reused for all three elements. To relate the IFDS elements (data, tools, and compute) and their underlying infrastructure to the hourglass image, we here propose the propeller image (Figure 2), acknowledging that from a purely architectural perspective this could still be considered a single hourglass.
The merger of three hourglasses (data infrastructure, tools infrastructure, and compute infrastructure) into the image of a propeller with three blades and the underlying infrastructure. The narrow waist of the hourglass (minimal essential standards and protocols) corresponds to the center of this picture.
Intuitively, the IFDS would function most fluently if the infrastructure (where possible, the existing Internet infrastructure) operated on a strong, common, and globally interoperable networking and routing engine that could efficiently route data to tools, tools to data, and both to the needed compute. These three elements will increasingly no longer reside in large centralized storage and HPC facilities but will be more and more distributed all over the Internet. Therefore, additional performance and security issues will have to be addressed, but these are not the focus of this article and will be addressed separately. Here, we mainly focus on the construction and role of rich, machine-readable, and distributed metadata objects that serve as a basis to locate, access, and reuse the digital objects they describe. For a more general description of metadata and the associated technology, we recommend the information provided by the Center for Expanded Data Annotation and Retrieval (CEDAR).
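The matching step that such a routing engine has to perform can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each data set, tool, and compute element carries a small metadata record: a format UPRI for data, the accepted formats and a minimal core count for tools, and an available core count for compute. The record types `DataMeta`, `ToolMeta`, and `ComputeMeta`, the function `match`, and the field names are hypothetical and not part of any IFDS specification; real routing would also weigh location, licensing, performance, and security.

```python
from dataclasses import dataclass

# Hypothetical metadata records; field names are illustrative assumptions only.
@dataclass(frozen=True)
class DataMeta:
    upri: str
    format_upri: str              # UPRI of the format the data are encoded in

@dataclass(frozen=True)
class ToolMeta:
    upri: str
    accepted_formats: frozenset   # format UPRIs this tool can read
    min_cores: int                # minimal compute the tool requires

@dataclass(frozen=True)
class ComputeMeta:
    upri: str
    cores: int                    # processing capacity on offer

def match(data, tools, computes):
    """Return every (data, tool, compute) UPRI triple whose metadata are
    compatible, deciding from metadata alone, without touching the data."""
    return [
        (data.upri, tool.upri, compute.upri)
        for tool in tools
        if data.format_upri in tool.accepted_formats
        for compute in computes
        if compute.cores >= tool.min_cores
    ]
```

For a CSV data set, one tool that reads CSV and needs four cores, one tool that reads only another format, and compute elements offering 2 and 16 cores, `match` returns a single triple pairing the data set with the CSV tool on the 16-core element. The design point is that the whole decision is taken on metadata, which is why rich, machine-readable metadata are the focus here.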
A first very important aspect of our further reasoning is that we adopt the basics of the Digital Objects model and consider each digital object (ranging from a single concept reference, such as an identifier, through a single machine-readable assertion, to an entire database or software package) according to the following simplified scheme (Figure 3).
The first obvious prerequisite for the IFDS is that each digital object is assigned (and findable through) a Unique, Persistent, and Resolvable Identifier (UPRI). The explicit addition of the term resolvable here reflects the need to accept that multiple UPRIs may point to the same concept, so each of them must correctly resolve to its defined meaning. Several initiatives are underway to repair the current undesirable situation in which most data and services do not even fulfill this first criterion for participating in Open Science and the IFDS in general. We should build on these initiatives and, when they become community-adopted, follow them as well as contribute to their development wherever appropriate. For the sake of the argument in this article, we will assume that digital objects as containers have a UPRI.
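The interplay of uniqueness, persistence, and resolvability can be made concrete with a toy resolver, assuming an in-memory registry. Several UPRIs may be registered for the same concept and must all resolve to its defined meaning, while a UPRI, once registered, may never be re-pointed to a different concept. The class `Resolver`, its methods, and the internal concept keys are hypothetical illustrations, not the interface of any existing persistent identifier system.

```python
class Resolver:
    """Toy in-memory resolver: several UPRIs may denote one concept, but a
    registered UPRI can never be re-pointed to a different concept."""

    def __init__(self):
        self._concept_of = {}   # UPRI -> internal concept key
        self._meaning = {}      # internal concept key -> defined meaning

    def define(self, concept, meaning):
        self._meaning[concept] = meaning

    def register(self, upri, concept):
        if concept not in self._meaning:
            raise KeyError(f"undefined concept: {concept}")
        current = self._concept_of.get(upri)
        if current is not None and current != concept:
            # Persistence and uniqueness: refuse to re-point an existing UPRI.
            raise ValueError(f"{upri} already resolves to {current}")
        self._concept_of[upri] = concept

    def resolve(self, upri):
        # Resolvability: every registered UPRI yields its concept's meaning.
        return self._meaning[self._concept_of[upri]]
```

Registering two UPRIs from different providers for the same concept makes both resolve to the same definition, whereas an attempt to re-point either UPRI to another concept raises a `ValueError`.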
The simple Digital Object picture. The smallest conceivable Digital Object is a PID (a digital symbol referring to a particular concept). This concept could be an abstract unit of thought (in itself not a digital object), or it could refer to an actual digital object, such as another PID (which could be a predicate or an object reference) or even an entire database. Each digital object that contains “information” should be adorned with metadata asserting things about the nature of that information. Here, based on many discussions, the original DO architecture, and the context of CEDAR, we distinguish between “intrinsic” metadata and “user-defined” or “expanded” metadata. Although the boundary between the two may sometimes be rather arbitrary, we believe that the distinction is practically meaningful. Typical intrinsic metadata describe the factual information that is “indisputable” about the digital object itself. For instance, if the digital object is a data set, the intrinsic metadata will describe the time of collection, the experiment it was part of, the creator, the equipment used to generate the raw data, the license, etc. However, in a world where Digital Objects (including research objects) will be increasingly and intensively reused by others than their creators, more subjective assertions about the digital object are also very important. These user-expanded metadata can be added by the original creators of the data but may also be added by “reusers”, and include subjective (and traceable/citable) assertions about errors, detected bias, etc. With the introduction of this second class of metadata, it becomes increasingly important to also trace the provenance of the assertions made in the user-defined metadata.
Therefore, intrinsic metadata containers, expanded metadata containers, and the actual containers holding the data elements or the code (in the case of, for instance, a workflow) could also be treated as separate but perma-linked digital objects, each with its own UPRI, thus forming a stack of related containers that hold (machine-readable, FAIR) metadata of different nature, all asserting relevant information about the data container.
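The stack of perma-linked containers can be sketched with plain Python data classes. The sketch assumes that every assertion records the UPRI of the agent who made it, so that user-expanded assertions by reusers remain traceable to their source. The names `Assertion`, `Container`, `metadata_stack`, and `asserted_by`, and the kind labels `"intrinsic"` and `"expanded"`, are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass(frozen=True)
class Assertion:
    predicate: str      # UPRI of the predicate
    value: str          # UPRI (or literal) asserted about the data container
    asserted_by: str    # UPRI of the asserting agent, kept for provenance

@dataclass
class Container:
    upri: str                        # each container has its own UPRI
    kind: str                        # "data", "intrinsic", or "expanded"
    describes: Optional[str] = None  # perma-link to the described data container
    assertions: list = field(default_factory=list)

def metadata_stack(data_upri, containers):
    """Return the metadata containers linked to one data container,
    intrinsic metadata first, then expanded metadata."""
    order = {"intrinsic": 0, "expanded": 1}
    linked = [c for c in containers
              if c.describes == data_upri and c.kind in order]
    return sorted(linked, key=lambda c: order[c.kind])

def asserted_by(containers, agent_upri):
    """Trace the expanded-metadata assertions made by one agent, e.g. a reuser."""
    return [a for c in containers if c.kind == "expanded"
            for a in c.assertions if a.asserted_by == agent_upri]
```

Keeping intrinsic and expanded metadata in separate containers means a reuser can add a bias report without modifying the creator's intrinsic record, and `asserted_by` recovers who said what.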
However, to intelligently route data to tools, tools to data, and both to compute (and, in the future, likely even to mobile compute), we need more than just UPRIs for the containers. We need to describe the data or code containers with sufficiently rich metadata in a machine-readable format, for both machines and humans (with natural-language interface outputs and search capabilities for the latter), to Find, Access, Interoperate with, and thus effectively Reuse these components of the IFDS in a myriad of combinations in near real-time. As said, for each and every concept referred to in the metadata, as well as, where possible, in the data themselves, we need to enforce the use of UPRIs. Still, the use of various UPRIs (even within the same domain) for the same concept is likely to persist for at least the foreseeable future, and belongs to the first degree of freedom to operate away from the center of the hourglass. To enable this critical degree of freedom in the IFDS, which will be even more important when we really want to support interdisciplinary research and innovation, we need very high-quality, robust, and sustainable mapping services between UPRIs and the human-readable terms that denote the same concept in digital objects. These mapping tables are critical infrastructure in the center of the propeller (Figure 2). A major problem is that such services (for example BioPortal in the life sciences, OLS, and FAIR Sharing) are currently built and maintained largely through academic efforts and funded through volatile, few-year cycles of public funding, frequently even in fierce competition with “rocket science”. A very important aspect of the IFDS will be to support coordination within and across implementation, training, and certification networks to minimize reinvention of redundant infrastructure components, including thesauri, domain-specific or generic ontologies, protocols, and other standards-related elements of the IFDS.
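A mapping table of this kind can be sketched as follows, assuming an in-memory store in which several UPRIs from different vocabularies, and several human-readable terms, are attached to one internal concept key. Term lookup is case-insensitive via `str.casefold`. The class `ConceptMap`, its methods, and the concept keys are hypothetical and do not reflect the interfaces of BioPortal, OLS, or any other existing service.

```python
from collections import defaultdict

class ConceptMap:
    """Toy mapping table between UPRIs and human-readable terms that
    denote the same concept (internal concept keys are hypothetical)."""

    def __init__(self):
        self._concept_of = {}                    # UPRI -> concept key
        self._upris_of_term = defaultdict(set)   # case-folded term -> UPRIs

    def add(self, concept, upris, terms):
        for upri in upris:
            self._concept_of[upri] = concept
        for term in terms:
            self._upris_of_term[term.casefold()].update(upris)

    def same_concept(self, upri_a, upri_b):
        """True if both UPRIs are mapped to the same concept."""
        concept = self._concept_of.get(upri_a)
        return concept is not None and concept == self._concept_of.get(upri_b)

    def upris_for(self, term):
        """All UPRIs denoting a term, matched case-insensitively, sorted."""
        return sorted(self._upris_of_term.get(term.casefold(), ()))
```

With two vocabularies each minting their own UPRI for one disease, `same_concept` reports the two as equivalent and `upris_for` returns both for any registered synonym, which is the behavior a machine needs to cross vocabulary and domain boundaries.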
But, as said, we have learned that, classically, domains operate in silos and that even within domains multiple standards, vocabularies, languages, and approaches will continue to emerge. This is not merely a nuisance reflecting a lack of coordination and discipline; it is also an intrinsic part of the creative process that should be supported in order to further our knowledge and drive innovation. This means that mapping tables, libraries to choose from, community standards registries, etc. will continue to be crucial elements of the IFDS support infrastructure.
Obviously, in the ideal IFDS, where machines form the majority of first-line users, data, services, and compute should all be machine-actionable and work seamlessly together, with as little human intervention as possible, so that humans can focus on final interpretation and decision making based on patterns discovered by their machines. Not all research objects are digital (for example, samples in biobanks), and not all digital data need to be entirely machine-actionable to make the IFDS operational. However, every digital object, in order to participate, should at a minimum have FAIR metadata. Therefore, we here discuss the potential use of the existing Knowlet technology to represent the metadata of all objects as concepts in the IFDS, including data sets, workflows, and compute facilities, without discussing the FAIR level of the actual underlying data, code, etc. So, in principle, FAIR metadata can assert that the data they refer to are not (yet) FAIR. It should be emphasized that the “Knowlet” concept is not prescriptive of the format in which its constituent components are expressed, but FAIR Knowlets should obviously be machine-readable and preferably machine-actionable, like any other FAIR digital object.
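A minimal Knowlet-like structure can be sketched as a central concept UPRI plus a set of predicate/object assertions about it. This is an illustration under stated assumptions, not the actual Knowlet implementation: the class `Knowlet`, its methods, and the `fairStatus` predicate are hypothetical. Because the Knowlet concept does not prescribe a format, the sketch keeps the in-memory form separate from two example serializations (N-Triples-style lines and JSON), and uses it to let metadata assert that the described data set is not yet FAIR.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Knowlet:
    """Sketch: one central concept UPRI plus predicate/object assertions
    about it. The in-memory form is independent of any serialization."""
    concept: str
    assertions: set = field(default_factory=set)   # {(predicate, object)}

    def add(self, predicate, obj):
        self.assertions.add((predicate, obj))

    def to_ntriples(self):
        # Assumes every object is an IRI; literals would need quoting.
        return "\n".join(f"<{self.concept}> <{p}> <{o}> ."
                         for p, o in sorted(self.assertions))

    def to_json(self):
        return json.dumps({"concept": self.concept,
                           "assertions": sorted(self.assertions)})
```

For a data set whose Knowlet carries the single assertion that its (hypothetical) `fairStatus` is `NotYetFAIR`, both serializations express exactly that one assertion, illustrating that FAIR metadata can describe data that are not (yet) FAIR.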