Data recording practices play a central role in research and can vary widely from lab to lab and from researcher to researcher. However, anyone who has searched for specific data sets or experimental details in published articles or in their own lab notebook will agree that it is essential to record and document one's research processes diligently and to maintain a proper data infrastructure.
Two important sets of principles guide data management practices: FAIR and ALCOAplus. At first glance they seem similar, but they have different focuses.
FAIR stands for Findable, Accessible, Interoperable and Reusable. The principles were drafted at a Lorentz Center workshop in Leiden, the Netherlands, in 2014 and published in 2016 in Scientific Data, a Nature Research journal (LINK). These principles focus on the infrastructure for data and place strong emphasis on metadata. In this context, metadata are used or generated to describe the actual experimental data (ED). In particular, ED sets generated in the background by software applications can get lost when the data are transferred between applications, and this often happens when file formats are changed. Addressing this issue is one of the core concerns of FAIR.
In contrast, ALCOAplus focuses mainly on data integrity. It was drafted by the WHO and is published in the WHO guidance, Section 9, Good Documentation Practice (LINK). The acronym ALCOA stands for Attributable, Legible, Contemporaneous, Original and Accurate, with the ‘plus’ referring to Complete, Consistent, Enduring and Available. An example of data integrity under ALCOA is the detailed annotation of all data sets by the researcher who produced them.
Both sets of data management principles are important, but they fill slightly different niches. Because ALCOA concentrates on data integrity, it may be more relevant for bench scientists who need to document their experiments properly.
Some more details on the two sets of principles:
The first step in (re)using data is to find them. Metadata and data should be easy to find for both humans and computers. Machine-readable metadata are essential for automatic discovery of datasets and services, so this is an essential component of the FAIRification process.
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. (meta)data are registered or indexed in a searchable resource.
F4. metadata specify the data identifier.
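The four findability principles can be pictured with a small sketch. It is only an illustration under stated assumptions: the field names, the `make_metadata` helper and the in-memory `Registry` class are hypothetical, and a UUID stands in for an identifier such as a DOI, whose persistence would in practice depend on the issuing organisation rather than on the code.

```python
import uuid


def make_metadata(title, keywords, creator):
    """Build a metadata record for one dataset (field names are illustrative)."""
    identifier = f"urn:uuid:{uuid.uuid4()}"  # F1: globally unique identifier
    return {
        "identifier": identifier,  # F4: the metadata specify the data identifier
        "title": title,            # F2: descriptive metadata
        "keywords": sorted(keywords),
        "creator": creator,
    }


class Registry:
    """Toy searchable resource (F3): an in-memory keyword index."""

    def __init__(self):
        self._records = {}
        self._index = {}

    def register(self, record):
        """Store the record and index it under each lower-cased keyword."""
        self._records[record["identifier"]] = record
        for keyword in record["keywords"]:
            self._index.setdefault(keyword.lower(), set()).add(record["identifier"])

    def search(self, keyword):
        """Return all records indexed under the keyword (case-insensitive)."""
        identifiers = self._index.get(keyword.lower(), set())
        return [self._records[i] for i in sorted(identifiers)]
```

A record created with `make_metadata` and passed to `Registry.register` can then be found by either a person or a program through `Registry.search`, which is the point of machine-readable metadata.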
Once users find the required data, they need to know how the data can be accessed, possibly including authentication and authorisation.
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
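As a rough sketch of A1.2 and A2, the hypothetical `DatasetStore` below retrieves data by identifier, applies an optional authorisation check, and keeps the metadata readable after the data themselves have been withdrawn. The class, its method names and the `status` field are assumptions made for illustration; a real repository would sit behind an open protocol such as HTTPS rather than a Python object.

```python
class DatasetStore:
    """Toy repository: data may be withdrawn, metadata stay accessible."""

    def __init__(self):
        self._metadata = {}
        self._data = {}
        self._allowed = {}

    def deposit(self, identifier, metadata, data, allowed_users=None):
        """Store data and metadata; allowed_users=None means open access."""
        self._metadata[identifier] = dict(metadata, status="available")
        self._data[identifier] = data
        self._allowed[identifier] = (
            set(allowed_users) if allowed_users is not None else None
        )

    def get_metadata(self, identifier):
        """A2: metadata are returned whether or not the data still exist."""
        return self._metadata[identifier]

    def get_data(self, identifier, user=None):
        """A1 and A1.2: retrieve by identifier, with an authorisation check."""
        if identifier not in self._data:
            raise LookupError("data are not available; check the metadata")
        allowed = self._allowed[identifier]
        if allowed is not None and user not in allowed:
            raise PermissionError("user is not authorised for this dataset")
        return self._data[identifier]

    def withdraw(self, identifier):
        """Remove the data but keep a metadata record marked as withdrawn."""
        del self._data[identifier]
        self._metadata[identifier]["status"] = "withdrawn"
```

After `withdraw`, a call to `get_metadata` still describes what the dataset was, which lets a reader see that the data existed even though they can no longer be downloaded.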
The data usually need to be integrated with other data. In addition, the data need to interoperate with applications or workflows for analysis, storage, and processing.
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
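One possible way to picture I1 to I3 is a record written in a JSON-LD-like shape that borrows terms from a shared vocabulary and links to another record with a named relation. This is a sketch under assumptions: the `urn:example:` identifiers are placeholders, the choice of Dublin Core terms is only one option among many, and `qualified_references` is a hypothetical helper.

```python
import json

# Illustrative record: the "dcterms" prefix points at the Dublin Core terms
# namespace; the identifiers are placeholders, not real datasets.
record = {
    "@context": {"dcterms": "http://purl.org/dc/terms/"},
    "@id": "urn:example:dataset-42",
    "dcterms:title": "Growth curves, replicate 2",
    # I3: a qualified reference states HOW this record relates to the other one.
    "dcterms:isVersionOf": {"@id": "urn:example:dataset-41"},
}


def qualified_references(rec):
    """Return (relation, target identifier) pairs for links to other records."""
    return [
        (key, value["@id"])
        for key, value in rec.items()
        if isinstance(value, dict) and "@id" in value
    ]


# I1: a formal, shared serialisation that other applications can parse.
serialized = json.dumps(record, sort_keys=True)
```

Because the relation is named rather than left as a bare link, a workflow that reads `serialized` can tell a new version of a dataset apart from, say, a dataset that merely cites it.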
The ultimate goal of FAIR is to optimise the reuse of data. To achieve this, metadata and data should be well-described so that they can be replicated and/or combined in different settings.
R1. (meta)data have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
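A simple pre-publication check can make the three sub-principles concrete. The sketch below assumes hypothetical metadata field names (`license`, `provenance`, `conforms_to`) and reports which sub-principles a record does not yet address; it checks only that the fields are filled in, not that their content is adequate.

```python
# Hypothetical mapping from metadata field to the sub-principle it supports.
REQUIRED_FOR_REUSE = {
    "license": "R1.1",      # clear and accessible data usage license
    "provenance": "R1.2",   # where the data came from and how they were made
    "conforms_to": "R1.3",  # domain-relevant community standard
}


def missing_reuse_fields(metadata):
    """Return the sub-principles whose field is absent or empty, sorted."""
    return sorted(
        code
        for field, code in REQUIRED_FOR_REUSE.items()
        if not metadata.get(field)
    )
```

For example, a record that carries only a license would be reported as still missing R1.2 and R1.3.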
The principles refer to three types of entities: data (or any digital object), metadata (information about that digital object), and infrastructure. For instance, principle F3 defines that both metadata and data are registered or indexed in a searchable resource (the infrastructure component).
ALCOA is an initialism established as an industry standard by the WHO. It has since been expanded to ALCOA+ (currently used by the FDA, WHO, PIC/S and GAMP – Data Integrity) and requires all data to have the following qualities:
- Attributable: Who acquired the data or performed an action, and when?
- Legible: Can you read the data and any entries?
- Contemporaneous: Was it recorded as it happened?
- Original: Is it the first place the data are recorded?
- Accurate: Are all the details correct?
- Complete: Are all data included (including any repeat or reanalysis performed on the sample)?
- Consistent: Are all elements in chronological order?
- Enduring: Are all recordings and notes accessible over an extended period?
- Available: Can the data be accessed for review over the lifetime of the record?
When these attributes are adhered to, the data can be trusted. This becomes both simpler and more complicated once we introduce electronic systems that can manage all these attributes as part of the metadata, because we then have to consider how to retrieve, store and archive data across its entire life cycle.
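To illustrate how an electronic system might carry some of these attributes as metadata, here is a minimal sketch of an append-only audit trail. It is an assumption-laden toy, not a validated or compliant system: the `AuditTrail` class and its fields are hypothetical, values must be JSON-serialisable, and the hash chain only detects later edits to stored entries rather than preventing them.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only log: each entry records who, when and what, plus a hash."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(entry):
        """Hash every field except the stored hash itself."""
        body = {k: v for k, v in entry.items() if k != "hash"}
        text = json.dumps(body, sort_keys=True)
        return hashlib.sha256(text.encode("utf-8")).hexdigest()

    def record(self, user, action, value):
        """Append an entry; corrections are new entries, never overwrites."""
        previous_hash = self._entries[-1]["hash"] if self._entries else ""
        entry = {
            "user": user,                                         # Attributable
            "timestamp": datetime.now(timezone.utc).isoformat(),  # Contemporaneous
            "action": action,
            "value": value,
            "previous_hash": previous_hash,                       # Consistent order
        }
        entry["hash"] = self._digest(entry)
        self._entries.append(entry)
        return entry

    def verify(self):
        """Return True if no stored entry was altered or reordered."""
        previous_hash = ""
        for entry in self._entries:
            if entry["previous_hash"] != previous_hash:
                return False
            if entry["hash"] != self._digest(entry):
                return False
            previous_hash = entry["hash"]
        return True
```

In this design a corrected measurement is logged as a second entry next to the original one, so both the first recording and the later change remain visible for review.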
A summary and overview, with references, is in the attached PDF: