1. Plan & Design

Set the foundation for open & reliable research

Explore & Reuse
Check Legal Frameworks
Write Data Management Plan
Design Study

Checkpoints: Study Plan Presentation & Preregistration Submission

1.1 Explore & Reuse

Any resource that inspires you, or that you want to reuse or adapt, must at minimum be cited using its persistent identifier, e.g. a DOI (Digital Object Identifier), or otherwise its URL with author, date, and time of access. You must also follow the license or usage agreement provided by the authors. A research output (e.g. data, code) without a license or statement granting you permission for reuse cannot legally be reused, even if it appears publicly online.

Review existing literature to come up with a well-founded research question. We recommend using open source, discipline-agnostic registries like OpenAlex, which contains published articles, theses, and preprints (scholarly works that are not (yet) peer-reviewed) from all disciplines, or discipline-specific open source registries such as Europe PubMed Central for life sciences preprints and published articles.
Use a reference manager to keep track of your bibliography. Zotero is open source software that formats your bibliography in any desired style and can be integrated with e.g. Microsoft Word, Google Docs, or RStudio for writing reproducible manuscripts (see 3. Analyze & Collaborate).

LEARN MORE

OSC Tutorial

Introduction to Zotero

Use an open source reference manager. (1h)

TOOLS & RESOURCES

OpenAlex

All the world's research, connected and open.

Europe PubMed Central

Comprehensive access to life sciences literature.

A preregistration typically consists of a hypothesis and predictions, a plan for data collection (when relevant), and a plan for data analysis, that researchers upload onto a registry before starting their projects, often in order to increase the rigor of confirmatory research (see 1.4.1. Pre-analysis planning).

Get insight into projects that are not (yet) published, either currently ongoing or abandoned, by looking for projects that were preregistered. Projects that are left unpublished typically have a note attached to their preregistration. Some registries are discipline-specific while others are discipline-agnostic (see below).

TOOLS & RESOURCES

Open Science Framework

Registry of preregistrations. Widely used across fields.

AsPredicted

Registry of simple preregistrations.

PreclinicalTrials.eu

Registry of preclinical animal study protocols.

AnimalStudyRegistry.org

Registry of animal studies.

ClinicalTrials.gov

Registry of clinical trial protocols.

How to find existing datasets?

Search for discipline-specific repositories on re3data, a central registry of research data repositories.
Explore subject-agnostic repositories such as DataCite, Figshare, Open Science Framework (OSF), or Zenodo.

These platforms either give you access to existing data or provide metadata and explanations on how to request access to the data.

Definition

Metadata are data about your data, such as author, date, measurement device, unit of measurement, context of data collection, etc.

How to reuse a dataset?

Review the license and data use agreement. Make sure you understand what you are allowed to do with the data and under what conditions. Even if the license does not request attribution of the authors, scholarly norms require you to cite the source of the data for any of your work based on it.
Review metadata and documentation. Make sure you know where your data comes from, how the data was collected and processed, and reflect on whether any of it poses problems for your research question.
Check what additional requirements the data sources have. Sometimes, data providers request prospective data users to submit a preregistration prior to giving access to the data (see 1.4. Study Design & Analysis Plan).
Use the metadata to plan your analysis. Review existing data dictionaries (or “codebooks”) and other documentation describing the variables, range of values, etc. If you plan to do a confirmatory analysis, do not look at the data to minimize confirmation or hindsight bias; instead, prepare a pre-analysis plan (see 1.4. Study Design & Analysis Plan).
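
As a hedged illustration of planning from a data dictionary, the sketch below encodes a small codebook and checks records against it. The variable names, types, and ranges are invented for this example and do not come from any real dataset; the point is that such checks can be written from the documentation alone, before looking at the data.

```python
# Hypothetical codebook: variable names and ranges are invented for illustration.
CODEBOOK = {
    "age_years": {"type": int, "min": 18, "max": 99},
    "condition": {"type": str, "allowed": {"control", "treatment"}},
    "score": {"type": float, "min": 0.0, "max": 100.0},
}


def check_record(record, codebook=CODEBOOK):
    """Return a list of human-readable problems found in one record."""
    problems = []
    for name, rule in codebook.items():
        if name not in record:
            problems.append(f"{name}: missing")
            continue
        value = record[name]
        if not isinstance(value, rule["type"]):
            problems.append(f"{name}: expected {rule['type'].__name__}")
            continue
        if "allowed" in rule and value not in rule["allowed"]:
            problems.append(f"{name}: value {value!r} not in codebook")
        if "min" in rule and value < rule["min"]:
            problems.append(f"{name}: {value} below minimum {rule['min']}")
        if "max" in rule and value > rule["max"]:
            problems.append(f"{name}: {value} above maximum {rule['max']}")
    return problems
```

A record that matches the codebook yields an empty list; anything else yields one message per violated rule, which can feed directly into the data exclusion criteria of a pre-analysis plan.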

Definition

Confirmation bias: The tendency to seek, interpret, and remember information that confirms one’s existing beliefs or expectations.

Hindsight bias: Seeing past events as predictable after the outcome is known (“I knew it all along”).

TOOLS & RESOURCES

re3data

Registry of research data repositories.

DataCite Commons

Discovery tool connecting works, people, and organizations.

Figshare

General-purpose repository for data, software, reports.

Open Science Framework

General-purpose repository for data, materials, reports.

Zenodo

General-purpose repository for data, software, reports.

Find code available for reuse archived on Zenodo or Software Heritage, or actively developed on GitHub and other code repositories. Start learning Git version control now, or learn to take advantage of the more collaborative features of the GitHub platform in 3. Analyze & Collaborate.

Important

Code publicly visible on GitHub without a license or equivalent text explicitly stating permission for reuse cannot be legally reused. It is best to ask the authors to add an open license to their repository to explicitly allow reuse (to do this, they can, for instance, add a file called LICENSE.txt with the Apache 2.0 license text - see our code publishing tutorial to learn more about licenses).
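
A first-pass check on a local copy of a repository could look like the sketch below. It only looks for top-level files with common license file names; the list of names is an assumption rather than a standard, and a permission statement may also live elsewhere (for instance in a README), so an empty result means "look closer or ask the authors", not "reuse is forbidden".

```python
from pathlib import Path

# Common license file names; this list is an assumption, not an exhaustive standard.
LICENSE_NAMES = {"license", "license.txt", "license.md", "copying", "copying.txt"}


def find_license_files(repo_dir):
    """Return the names of top-level files that look like a license, sorted."""
    root = Path(repo_dir)
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.name.lower() in LICENSE_NAMES
    )
```

If the function finds a file, you still need to read the license text itself to learn what reuse it permits.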

LEARN MORE

OSC Tutorial

Git Tutorial

Use version control system Git from within RStudio. (2h)

OSC Tutorial

GitHub Tutorial

Collaborative coding with GitHub and RStudio. (1h)

OSC Tutorial

Code Publishing

Add README and license to a reproducible project (2h)

TOOLS & RESOURCES

Zenodo

General-purpose repository for data, software, reports.

Software Heritage

Collects, preserves, and shares software in source code form.

GitHub

Cloud-based platform to collaborate on code.

1.2 Legal Requirements

The LMU Guidelines for Safeguarding Good Scientific Practice are legally binding for all academics, researchers, research support staff, teachers, and students at LMU Munich. Only the original text in German prevails, but we provide an English summary of relevant aspects for this guide:

Appropriate level of documentation and standards to allow reproduction:

Reproducible methods must be used. (§11)
When research software is developed, its source code must be documented. (§12)

Appropriate level of documentation and standards to allow replication:

All information relevant to the production of a research result must be documented comprehensively to enable replication. (§7 and §12)
If specific professional recommendations exist for review and evaluation, the results must be documented in accordance with these respective specifications. (§12)
Individual results that do not support the hypothesis must also be documented; a selection of results is not permitted. (§12)

Public access to research results:

Apart from specific exceptions, all findings should be made public. For this, they must be described in a detailed and comprehensible manner which includes making available the research data, materials and information on which the results are based, as well as the methods used and the software employed (including appropriately licensed self-written software) according to the FAIR principles. (§13)
Data, materials, and software made publicly accessible must be appropriately archived, usually for a period of 10 years. (§17)

In later sections, you will acquire skills in data management and reproducible workflow that will enable you to comply with these guidelines and the FAIR principles.

Definitions

The FAIR principles are defined as:

Findable: metadata should be deposited in a searchable repository and be assigned a permanent identifier
Accessible: the data is either open, or accessible upon some authentication process, or closed, but with open metadata.
Interoperable: the data is described with a standard terminology (so the dataset can be merged with other ones) and saved in a stable file format
Reusable: the data is richly documented (e.g. with a data dictionary) and is accompanied by a data usage license.

See https://www.go-fair.org/fair-principles/ for more information.

Metadata are data about your data, such as author, date, measurement device, unit of measurement, context of data collection, etc.

Reproducibility: The ability of a researcher to re-derive the same results using the same data and methods; also known as computational reproducibility.

Replicability: The ability of an independent researcher to achieve results consistent with the original study by following the same experimental or analytical approach but collecting new data.

TOOLS & RESOURCES

LMU Guidelines for Safeguarding Good Scientific Practice

Implementation of the German Research Foundation's (DFG) Code of Conduct

Check all funders’ open science requirements. Funders may have additional requirements on top of those indicated in the LMU guidelines. For instance, some funding lines request a Research Data Management plan before making their second payment, while others specify the extent and timing of data sharing and provide funds for such activity.
Contact the LMU Research Funding Unit to review your grant proposal and assess if your proposal is meeting your funders’ open science requirements.

Data collection and analyses involving human participants or animal subjects typically require approval from ethics committees to ensure responsible conduct and the protection of data.

Your ethics proposal will typically include information on:

Data storage and retention – outlining how data will be securely stored, backed up, and retained over time. This information can be extracted from a more detailed Research Data Management plan (see 1.3. Research Data Management Plans).
Risks if the data were leaked – identifying potential consequences for participants or the research project if confidentiality is breached.
Data anonymization – describing procedures to remove or obscure personally identifiable information to protect participant privacy (see 2.3.2. Anonymization for options, from simple techniques of anonymization to the creation of synthetic data).
Informed consent form language – ensuring that participants clearly understand the purpose, procedures, and any potential risks of the study. Conditions for sharing their data should be clearly explained here (see 2.3.1. Informed Consent).
Power analysis to justify sample size – providing a statistical rationale for the number of participants, which supports the validity and ethical justification of the study. This, and more detailed information on the statistical plan, can be extracted from your pre-analysis plan (see 1.4.1. Pre-analysis planning and 1.4.3. Power analyses).
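
To make the data anonymization item above more concrete, here is a minimal sketch of two simple techniques: dropping direct identifiers and coarsening a quasi-identifier (exact age into an age band). The field names are hypothetical, and this alone is not sufficient to guarantee anonymity, since combinations of the remaining variables can still identify a person.

```python
# Hypothetical field names, chosen for illustration only.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def age_band(age, width=10):
    """Coarsen an exact age into a band such as '30-39'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deidentify(record):
    """Drop direct identifiers and coarsen age; returns a new dict."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    if "age" in cleaned:
        cleaned["age_band"] = age_band(cleaned.pop("age"))
    return cleaned
```

Describing steps like these in the ethics proposal, together with the remaining re-identification risk, gives the committee something specific to assess.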

For data protection guidance, contact the LMU Data Protection Officer or the Research Data Management team of the University Library.

Tips for research groups to streamline this process

Share templates and example resources amongst team members. For example, include previously approved ethics proposals, approved Data Protection Impact Assessment forms, or Data Management Plans on a common server space.
Create Standard Operating Procedures (SOPs) for the team, covering for example the appropriate anonymization technique for a specific data type and power analysis scripts for common analyses. Define when a data management plan must be updated, who is responsible, and how updates are reviewed, approved, and communicated.

LEARN MORE

OSC Tutorial

Data Management Plans

Overview of components, tips, and tools. (30 min)

OSC Tutorial

TBA: Data Anonymization

Implement data anonymization techniques in R. (X h)

OSC Tutorial

Power Analysis

Data simulations for GLMs, LMEs, and SEMs in R. (6h)

1.3 Research Data Management Plans

A Data Management Plan (DMP) documents how you will handle research data throughout your project. Writing a DMP prompts you to think and document decisions you might otherwise leave implicit.

Decide before data collection whether you will eventually share your data publicly (and where), in order to (i) get ethics approval on the right plan, (ii) design consent forms for participants, (iii) collect appropriate metadata for the target repository, etc.
Start with what you know, and refine the details as your project develops. Your DMP is a living document that you will update to match the reality of your project while ensuring data protection and streamlining collaboration (see 2.2. Data Management, 3.1. Data Processing & Analysis, and 4.1. FAIR Data Sharing).

Your DMP will ask:

What data will you collect or generate (types, formats, volume, sources)? See 2.1. Data Collection.
How will you describe it (metadata standards, documentation practices)? See 2.2. Data Management for these and the next questions.
How will you organize files (naming conventions, folder structure, versioning)?
Where will you store it (locations, backups, access controls)?
How will you ensure quality (validation checks, error-handling)?
How will you share outputs (repositories, licenses, embargo periods)? See our lecture “Why share data openly?” and 4.1. FAIR Data Sharing.
What constraints apply (consent, anonymization, GDPR, data use agreements)? See our lecture “Maintaining privacy with open data”, 1.2.3. Ethics and 2.3. Ethics & Privacy.
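
A naming convention from the questions above is easier to keep when it can be checked automatically. The sketch below assumes a hypothetical convention, project_YYYY-MM-DD_description_vNN.ext, which is one possible choice rather than a recommendation from this guide, and parses file names against it.

```python
import re

# Hypothetical convention: project_YYYY-MM-DD_description_vNN.ext
FILENAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_"
    r"(?P<date>\d{4}-\d{2}-\d{2})_"
    r"(?P<description>[a-z0-9-]+)_"
    r"v(?P<version>\d{2})\."
    r"(?P<ext>[a-z0-9]+)$"
)


def parse_filename(filename):
    """Return the name's components as a dict, or None if it breaks the convention."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else None
```

Running such a check over a project folder flags files that drifted from the convention written down in the DMP.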

The specific questions vary by discipline, data type, and funder requirements. DMP tools like RDMO guide you through the relevant questions with funder-specific templates.

Tips for research groups to streamline this process

Share templates and example DMPs amongst team members on a common server space.
Create Standard Operating Procedures for the team. Define when a data management plan must be updated, who is responsible, and how updates are reviewed/approved and communicated.

LEARN MORE

OSC Lecture

Why share data openly?

An introduction to the what, why, and how to make data open (30 min)

OSC Lecture

Maintaining Privacy with Open Data

How to make data open without revealing sensitive information (1h)

OSC Tutorial

Data Management Plans

Overview of components, tips, and tools. (30 min)

TOOLS & RESOURCES

Supported at LMU

RDMO

Funder-compliant DMP templates (e.g. DFG, ERC).

RIOjournal

Examples of DMPs by discipline.

1.4 Study Design & Analysis Plan

Why should you write your statistical plan before collecting data?

Humans are prone to cognitive biases such as confirmation bias (seeking information that supports existing beliefs) and hindsight bias (believing outcomes were predictable after the fact). In research, these biases can distort findings, especially when researchers make analytic decisions after seeing results. Although statistical testing typically accepts a 5% false positive rate, “researcher degrees of freedom” — choices about data collection, exclusions, transformations, sample size, covariates, etc. — can dramatically inflate false positives when decisions are made post hoc. Practices like increasing sample size until reaching statistical significance, selectively removing outliers, or trying multiple analytic strategies increase the likelihood of false-positive results. See how easy it is to find false “significant” results by using our p-hacking tool.

The core problem is that analyses guided by observed outcomes allow biases to influence decisions, making many reported effects unreliable. A key remedy is transparency and preregistration.
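
The effect of one such practice, increasing the sample size until the result is significant, can be checked with a small simulation. The sketch below is a simplified illustration under stated assumptions: data are drawn from a standard normal distribution with no true effect, the test is a two-sided z-test with a known SD of 1, and the batch sizes and number of simulations are arbitrary choices.

```python
import random
from statistics import NormalDist

Z_CRIT = NormalDist().inv_cdf(0.975)  # two-sided alpha = .05


def significant(sample):
    """Two-sided z-test of mean = 0, assuming a known SD of 1."""
    n = len(sample)
    z = (sum(sample) / n) * n ** 0.5
    return abs(z) > Z_CRIT


def false_positive_rate(peek, n_start=10, n_max=100, step=10, n_sims=2000, seed=1):
    """Share of null simulations declared significant.

    peek=False tests once at n_max; peek=True tests after every `step`
    observations and stops at the first significant result.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0, 1) for _ in range(n_max)]  # H0 is true
        if peek:
            looks = range(n_start, n_max + 1, step)
            hits += any(significant(sample[:n]) for n in looks)
        else:
            hits += significant(sample)
    return hits / n_sims
```

Comparing `false_positive_rate(peek=False)` with `false_positive_rate(peek=True)` shows the first staying close to the nominal 5% while the second rises well above it, even though no effect exists in either case.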

Benefits of preregistration

Preregistration, that is, specifying hypotheses, methods, and analysis plans before data collection or analysis, limits bias in confirmatory testing while still allowing exploratory analyses, clearly distinguishing robust hypothesis tests from hypothesis-generating work. This improves credibility, limits false positives, and often leads to better study design through early methodological feedback.

Preregistration can be beneficial for various types of studies, including:

experimental studies (i.e. studies with a manipulated variable): it defines your confirmatory analysis and strengthens your claims
observational or exploratory studies: it will help you move along the exploratory-confirmatory continuum
qualitative studies: it will provide a way to document e.g. your positionality towards a subject in the course of a project.

What is included in a preregistration?

Several preregistration templates exist. While the standard Open Science Framework (OSF) preregistration template is most commonly used, some are tailored to specific fields or specific methods (e.g. systematic reviews, qualitative work, secondary data analysis).

Your preregistration will define your study’s:

Hypothesis and predictions
Data collection procedures
Sample size and stopping rule
Variables (manipulated, measured, indices)
Statistical method (model, dependent and independent variables, covariables, transformations)
Data exclusion criteria
How to deal with missing data

A great way to create your statistical plan, especially for early career researchers who are still learning statistics and need feedback on their design from supervisors, collaborators, or statisticians, is to simulate data and write the statistical tests you would use to analyze that data (see 1.4.2. Simulation of data and 1.4.3. Power analyses). Including an analysis script (developed on simulated data) with your preregistration is optional but recommended.

To get support with planning your analytical approach, you can book a consultation with the LMU statistical consulting unit (StaBLab).

Publishing process

Once your study plan is finalized:

Submit your preregistration before collecting new data or analyzing existing data. You can do so on discipline-specific registries (see 1.1.2. Preregistrations) or discipline-agnostic repositories such as the OSF.
Embargo your plan if you are concerned about scooping. On the OSF, your preregistration can be kept private for a predetermined amount of time, and for a maximum of 4 years.
Include your preregistration’s DOI in your manuscript. Make your registration public upon the publication of your manuscript.

Creating a preregistration improves transparency and allows for valuable early feedback from collaborators. An even stronger approach is submitting preregistrations directly to journals (then called “Registered Reports”), enabling peer review at a stage where methodological adjustments are still possible.

Registered Reports

Registered Reports are a publication format, now adopted by over 300 journals (see participating journals), where preregistrations are peer-reviewed before data collection. Reviewers evaluate the hypotheses, methods, and planned analyses, allowing methodological improvements. If the plan is approved, the journal grants in-principle acceptance, meaning publication is guaranteed provided researchers follow the protocol.

After completing the study, authors add results and discussion sections, clearly separating preregistered confirmatory analyses from exploratory ones. Final review focuses on adherence to the approved plan and the validity of conclusions, not on whether results are significant. This model shifts incentives toward asking important questions and using rigorous methods rather than chasing striking or ‘novel’ outcomes.

LEARN MORE

OSC Tutorial

TBA: Preregistration tutorial

Step-by-step guide to creating preregistration. (Xh)

TOOLS & RESOURCES

OSC Tool

P-hacking tool

Interactive app to realize how easy it is to find false "significant" results.

Center for Open Science

List of journals offering Registered Reports.

Open Science Framework

Preregistration templates, embargoes, file storage.

In our context, a computer simulation is the generation of artificial data to build up an understanding of real data and the statistical models we use to analyze them. You can simulate data to:

Test your statistical intuition or demonstrate mathematical properties you cannot easily anticipate.
Example: Check whether there are more than 5% significant effects (assuming \(\alpha = .05\)) when random data from \(H_0\) are generated.
Understand sampling theory and probability distributions or test whether you understand the underlying processes of your system.
Example: See whether simulated data drawn from specific distributions is comparable to real data.
Perform power analyses.
Example: Assess whether the sample size (within a simulation repetition) is high enough to detect a simulated effect in more than 80% of the cases. (see 1.4.3. Power analyses)
Prepare a pre-analysis plan.
Example: To strengthen your planned confirmatory analyses before collecting data, consider sharing a simulated dataset with a statistician or mentor. This allows for specific feedback on suitable statistical tests. The resulting analysis code can accompany your preregistration or registered report (see 1.4.1. Pre-analysis planning) so reviewers can clearly see your intended approach. When real data are collected, they can be directly substituted into the code to generate results.
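
As a sketch of the last use case, the code below simulates a two-group dataset with a known effect and runs a planned analysis on it. It is written in Python with the standard library only, whereas the tutorials linked in this guide use R. The design (two independent groups, SD of 1), the function names, and the large-sample normal approximation for the interval are all assumptions made for this illustration.

```python
import random
from statistics import NormalDist, mean, stdev


def simulate_dataset(n_per_group, effect, seed=42):
    """Simulate two groups whose true means differ by `effect` (SD = 1)."""
    rng = random.Random(seed)
    control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
    treatment = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
    return control, treatment


def planned_analysis(control, treatment, alpha=0.05):
    """Mean difference with a large-sample (normal-approximation) interval."""
    diff = mean(treatment) - mean(control)
    se = (stdev(control) ** 2 / len(control)
          + stdev(treatment) ** 2 / len(treatment)) ** 0.5
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return {"difference": diff, "ci_low": diff - z * se, "ci_high": diff + z * se}
```

Because `planned_analysis` only takes two lists of numbers, the same function can later be called on the real data, which is what makes a script developed on simulated data a useful attachment to a preregistration.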

Generating an artificial dataset in R (see our simulation tutorial) is much easier than you might think and is often very helpful, even when you need to make assumptions about variable distribution or when the parameter space is not well known.

LEARN MORE

OSC Tutorial

R Tutorial

Learn R programming. (3h)

OSC Tutorial

Data simulation in R

Easy data simulations in R. (2h)

Power analysis is relevant whether you are designing a project from scratch or running an analysis on already existing data. There are two main types of power analyses:

A priori power analysis

Simulate data to calculate the smallest sample size required to detect the smallest effect of interest. See our advanced power analyses tutorial using R.
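
The logic of a simulation-based a priori power analysis can be sketched as follows. This is a deliberately simplified example: it assumes two independent groups, normally distributed data with a known SD of 1, and a two-sided z-test, and the candidate sample sizes are arbitrary. Real designs usually call for the model-specific simulations covered in the tutorial.

```python
import random
from statistics import NormalDist


def simulated_power(n_per_group, effect, alpha=0.05, n_sims=2000, seed=7):
    """Estimate power of a two-sided z-test on two group means (known SD = 1)."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2 / n_per_group) ** 0.5  # SE of the mean difference when SD = 1
    hits = 0
    for _ in range(n_sims):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
        diff = sum(treatment) / n_per_group - sum(control) / n_per_group
        hits += abs(diff / se) > z_crit
    return hits / n_sims


def smallest_n(effect, target_power=0.80, candidates=range(10, 201, 10)):
    """Return the first candidate n per group whose simulated power meets the target."""
    for n in candidates:
        if simulated_power(n, effect) >= target_power:
            return n
    return None
```

Each call repeats the planned study many times under the assumed effect and counts how often the test is significant; `smallest_n` then walks through candidate sample sizes until the estimated power reaches the target.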

For a very basic power calculation, you can use simple R functions if you know 3 out of 4 of these parameters:

required sample size n (usually the one missing)
desired power 1 - β (default 0.80)
the alpha level α (default 0.05)
the expected effect size (has to be estimated or extracted from the literature in the form of d, f, etc.)
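
The relationship between these four parameters can be illustrated with a closed-form approximation. The sketch below solves for n per group in a two-sided comparison of two independent group means, given Cohen's d, using the normal approximation. It is an assumption-laden shortcut rather than the function used in the tutorial, and it tends to give a slightly smaller n than a t-based calculation.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size_d, power=0.80, alpha=0.05):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Uses the normal approximation n = 2 * ((z_(1-alpha/2) + z_power) / d)^2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)
```

With the defaults above, `n_per_group(0.5)` returns 63 participants per group; halving the effect size roughly quadruples the required sample.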

To get support with pre-analysis planning, you can book a consultation with the LMU statistical consulting unit (StaBLab).

Post-hoc power analysis

Compute post-hoc power when you are not able to control the sample size for your project. Beware: this power computation comes in two flavors. One is legitimate; the other is flawed and not defensible.

The legitimate post-hoc power is computed with your actual n, and the same effect size that you plugged into your a-priori power analysis. This analysis gives you the achieved power to detect your assumed effect.

The flawed version of post-hoc power is called “observed power”: if an analysis yields a non-significant result, some researchers calculate the post-hoc power but plug in the observed effect size. “Observed power”, however, is just a one-to-one function of the p-value (a non-significant p-value returns a low power below 50%, and a just-significant p-value of .05 always yields a power of approximately 50%). Observed power adds no new information beyond the p-value and is essentially meaningless. Do not compute this type of post-hoc power!
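
The difference between the two flavors can be made explicit in code. The sketch below assumes a two-sided z-test comparing two independent groups with a known SD of 1; the function names are chosen for this illustration. Note that `observed_power` takes nothing but the p-value as input, which is exactly why it carries no additional information.

```python
from statistics import NormalDist

Z = NormalDist()


def achieved_power(n_per_group, assumed_d, alpha=0.05):
    """Power of a two-sided, two-sample z-test for an effect assumed a priori."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    ncp = assumed_d * (n_per_group / 2) ** 0.5
    return Z.cdf(ncp - z_crit) + Z.cdf(-ncp - z_crit)


def observed_power(p_value, alpha=0.05):
    """'Observed power' of a two-sided z-test, computed from the p-value alone."""
    z_crit = Z.inv_cdf(1 - alpha / 2)
    z_obs = Z.inv_cdf(1 - p_value / 2)  # |z| implied by the p-value
    return Z.cdf(z_obs - z_crit) + Z.cdf(-z_obs - z_crit)
```

Under these assumptions, `observed_power(0.05)` is approximately 0.5, and any larger p-value maps to a smaller value, whereas `achieved_power` depends on the actual n and the effect size you committed to in advance.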

LEARN MORE

OSC Tutorial

R Tutorial

Learn R programming. (3h)

OSC Tutorial

Power Analyses

Data simulations for GLMs, LMEs, and SEMs in R. (6h)

Plan & Design Checklist