Your data acquisition procedures must be documented in sufficient detail to allow replication by another researcher (see LMU Guidelines for Safeguarding Good Scientific Practice). Reproducible data collection processes build team expertise, reduce errors, and improve data quality and consistency.
State-of-the-art practices for reproducible data acquisition include:
- creating standard operating procedures
- recording metadata as data collection is taking place
- building in automation through programming
Metadata are data about data; they provide context to your data. Metadata such as equipment settings, environmental conditions, software versions, and calibration records should be recorded contemporaneously, not reconstructed afterward. Electronic lab notebooks, instrument logs, and automated logging all help to document your metadata.
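One way to record metadata at the moment of collection is to write a small "sidecar" file next to each data file. The following is a minimal sketch using only the Python standard library. The function name `write_metadata_sidecar`, the `.meta.json` suffix, and the example setting keys are illustrative assumptions, not a prescribed standard.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from pathlib import Path


def write_metadata_sidecar(data_file, instrument_settings, notes=""):
    """Write a JSON sidecar next to data_file describing how it was acquired.

    instrument_settings is a free-form dict; its keys are hypothetical and
    should be replaced by whatever your equipment actually exposes.
    """
    data_file = Path(data_file)
    metadata = {
        "data_file": data_file.name,
        # Timestamp taken when the record is written, not reconstructed later.
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
        "python_version": sys.version.split()[0],
        "operating_system": platform.platform(),
        "instrument_settings": instrument_settings,
        "notes": notes,
    }
    sidecar = data_file.with_name(data_file.name + ".meta.json")
    sidecar.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    return sidecar
```

Calling `write_metadata_sidecar("run01.csv", {"gain": 2.0, "room_temp_c": 21.5})` right after saving `run01.csv` would produce `run01.csv.meta.json` in the same folder, so the context travels with the data.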
- Create standard data acquisition procedures within the team. From step-by-step wet-lab procedures to the settings of measuring devices and scripted, reproducible data pre-processing, all regularly repeated steps should be documented and standardized so that every team member can replicate them precisely.
- Share standard operating procedures through common server space such as LRZ Sync & Share, specialized online tools like protocols.io or electronic lab notebooks, or LRZ GitLab for scripts.
Your protocol should specify materials with identifying details (e.g. lot numbers, versions, sources), equipment settings, step-by-step instructions with timing, and expected outcomes at each stage. What counts as “materials” varies by field: reagent concentrations in wet lab work, scanner parameters in neuroimaging, sampling coordinates in field ecology. But the principle is the same: enough detail that someone else could replicate your procedure exactly.
Write detailed methods and reusable protocols. Write your protocol before you start, including every detail needed for an exact replication (see the ReproducibiliTeach lecture on reusable protocols).
Track deviations in real time. Follow your protocol precisely, and record any deviations as they happen. When you need to adapt, note it immediately. These deviations often explain unexpected results and guide protocol improvements. Electronic lab notebooks (ELNs) make this easier by creating version-controlled, timestamped records automatically, providing an audit trail that paper cannot match.
Publish your protocols. A detailed, tested protocol is a contribution to your field. Publishing establishes priority, enables citation, and makes your methods reusable. Platforms like protocols.io provide version control and DOI assignment.
Document both the instrument and the administration procedure completely. The data dictionary (or ‘codebook’) should record the exact version of the questionnaire used; if copyright allows it, document the item wording itself.
Questionnaires should be objective, reliable, and valid:
- Objective: Results should not depend on who administers or scores the questionnaire.
- Reliable: Responses should be consistent across repeated measurements when the underlying construct is unchanged (i.e., they should have low measurement error).
- Valid: Items should measure the intended construct rather than something else. Typical ways to assess validity include content validity, convergent and divergent construct validity, and criterion-related validity.
For well-validated instruments, published studies have assessed reliability and validity extensively across several populations and contexts. Meeting these quality criteria reduces measurement error, which in turn tends to increase observed effect sizes and statistical power.
When data come from instruments, sensors, or APIs (application programming interfaces), scripting the acquisition creates a reproducible record of exactly what was collected and how. Programming languages like R or Python work well for straightforward pipelines. For more complex multi-step workflows, which are common in fields such as bioinformatics and neuroimaging, workflow managers like Snakemake ensure that steps run in the correct order and can resume after failures.
Structure data correctly from the start. Variables in columns, observations in rows. This makes your data immediately interoperable with analysis tools rather than requiring cleanup later. Scripts can also automate organization, file renaming, and conversion to open formats. See 2.2 Data Management for guidelines.
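As an illustration of "variables in columns, observations in rows", the sketch below reshapes a small wide table, where the measurement day is hidden in the column headers, into a long table with one observation per row. The column names (`subject`, `day1`, `day2`, `score`) and the helper `wide_to_long` are made up for this example.

```python
import csv
import io


def wide_to_long(rows, id_column, variable_name="variable", value_name="value"):
    """Reshape wide rows into long rows: one row per (id, variable) pair."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every long row
            long_rows.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return long_rows


# Hypothetical wide table: "day" is a variable stored in the headers.
wide_csv = "subject,day1,day2\nS01,4.2,4.8\nS02,3.9,4.1\n"
wide_rows = list(csv.DictReader(io.StringIO(wide_csv)))

long_rows = wide_to_long(
    wide_rows, id_column="subject", variable_name="day", value_name="score"
)
for row in long_rows:
    print(row)
```

In the long layout each of the three variables (`subject`, `day`, `score`) has its own column, which is the shape most analysis tools expect. Note that `csv.DictReader` returns all values as strings; converting them to numbers is a separate step.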
Keep records of what ran and when. Include error handling so failures are recorded rather than silently corrupting data. When something fails months later, you need to know what happened. Always test acquisition scripts on sample data before production runs. A bug in your collection pipeline can invalidate an entire dataset.
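The sketch below shows one possible way to combine a timestamped log with error handling, using Python's standard `logging` and `csv` modules. The function `acquire` and the `read_sample` callable are hypothetical stand-ins for your own instrument or API call. The point is only that a failed reading is logged and marked in the output rather than silently dropped.

```python
import csv
import logging


def acquire(read_sample, n_samples, out_path, log_path):
    """Collect n_samples readings, logging every success and failure.

    read_sample is a hypothetical callable taking the sample index and
    returning a value (or raising an exception on failure).
    Returns the number of failed samples.
    """
    logger = logging.getLogger("acquisition")
    logger.setLevel(logging.INFO)
    handler = logging.FileHandler(log_path, encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    failures = 0
    try:
        with open(out_path, "w", newline="", encoding="utf-8") as f:
            writer = csv.writer(f)
            writer.writerow(["sample_index", "value", "status"])
            for i in range(n_samples):
                try:
                    value = read_sample(i)
                    writer.writerow([i, value, "ok"])
                    logger.info("sample %d ok", i)
                except Exception:
                    # Record the failure explicitly, including the traceback.
                    failures += 1
                    writer.writerow([i, "", "failed"])
                    logger.exception("sample %d failed", i)
    finally:
        logger.removeHandler(handler)
        handler.close()
    return failures
```

Testing this on sample data could be as simple as passing a fake `read_sample` that raises for one index and checking that the log and the `status` column both show the failure before connecting the real instrument.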
Version control your code and data. This makes your methods reproducible and shareable. See 2.2.6. Version Control for details.
Field researchers, and sometimes lab researchers, have to work with analogue notebooks first and use temporary storage solutions for images and video recordings.
- Digitize your data soon after data collection, e.g. using a data entry form connected to a relational database such as PostgreSQL
- Create automatic validation checks for data entry so that values outside the expected range, for example, are flagged immediately (see 2.4. Quality control)
- Transfer digital data (e.g. pictures, video recordings) from temporary storage solutions (e.g. camera storage) to permanent storage solutions (see 2.1.1. Storage)
- Check your data entries, e.g. after a day, to correct possible mistakes in transcription
- Keep analogue records so that data-entry errors can be corrected at the end of the season or experiment
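The automatic validation checks mentioned in the list above can start out very simple. The following sketch checks each transcribed record against an expected range per field. The field names and ranges in `RULES` are invented for illustration and would need to come from your own protocol.

```python
def validate_entry(entry, rules):
    """Return a list of human-readable problems for one data-entry record.

    rules maps a field name to its (lowest, highest) plausible value.
    """
    problems = []
    for field, (low, high) in rules.items():
        raw = entry.get(field)
        if raw is None or str(raw).strip() == "":
            problems.append(f"{field}: missing value")
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            problems.append(f"{field}: not a number ({raw!r})")
            continue
        if not low <= value <= high:
            problems.append(f"{field}: {value} outside expected range {low} to {high}")
    return problems


# Hypothetical plausibility ranges for a bird-ringing data sheet.
RULES = {"body_mass_g": (5.0, 40.0), "wing_length_mm": (40.0, 90.0)}

# "185" is a plausible typo for "18.5" made while transcribing a notebook.
entry = {"body_mass_g": "185", "wing_length_mm": "62.5"}
print(validate_entry(entry, RULES))
```

Running such a check at entry time, while the analogue notebook is still at hand, makes it much easier to tell a transcription mistake from a genuine outlier.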
A high-quality systematic review, whether it includes a meta-analysis or not, uses a rigorous, transparent, and reproducible methodology to select the relevant literature and later provide a thorough summary and critical evaluation of research within a given field. Its defining features include:
- clearly defined objectives supported by an explicit and reproducible methodology
- a comprehensive, systematic search designed to identify all studies meeting the eligibility criteria
- a critical appraisal of the validity of included studies, such as through risk-of-bias assessment
- a structured presentation and synthesis of the characteristics and findings of the included studies
Data collection, in this context, consists of developing a judicious, reproducible method for selecting relevant literature.
Use tools such as
Learn more from our colleagues at the Berlin Institute of Health QUEST Center for Responsible Research using their resources for systematic reviews in biomedical research and the material of their systematic review workshop at our previous OSC summer school.