Making code publicly available demonstrates the reproducibility of your results and enables others to understand, verify, and build upon your analytical methods.
So far, your code has been backed up either on the secured LRZ GitLab (e.g. if your data and/or code are sensitive) or on GitHub (see 3. Analyze & Collaborate). Before submitting a manuscript to a journal and/or upon its acceptance, a few additional steps are needed to publish your code:
- verify the structure of your repository, the readability of your scripts, and the completeness of the documentation (see Analyze & Collaborate Checklist)
- make a clean version public, e.g. on GitHub
- add a license
- get a DOI, e.g. through Zenodo
If you work with sensitive data that cannot be anonymized and shared:
- generate a simulated random dataset to allow for the published code to run (which you may have already done if you simulated data in order to prepare a preregistration, see 1.4. Study Design & Analysis Plan), or
- create a synthetic dataset with the same properties as the original dataset to allow others to re-derive an approximation of the original results and conduct further exploratory analyses.
During the course of data analyses, you created scripts and documentation, such as a README and a data dictionary, for yourself and your collaborators. Before sharing them publicly, review them from the perspective of someone who knows nothing about your project.
- Expand your README to include a description of the project, the data involved, the computational requirements and dependencies (i.e. which software and packages, in which versions, need to be installed to run the analyses), and a list of results.
- Document your code from an external perspective, using literate programming (e.g. Quarto) or comments.
- Double-check that no sensitive information remains in your repository (e.g. in code comments, or as sensitive data in the commit history).
See our code publishing tutorial for more information on how to prepare your code repository for sharing.
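The dependency list for the README can be read from the installed environment rather than typed by hand. The following is a minimal sketch, assuming a Python-based analysis. The function name `describe_environment` and the package names in the usage note are illustrative assumptions, not taken from the tutorial.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return README-ready bullet lines with the Python and package versions."""
    lines = [f"- Python {platform.python_version()}"]
    for name in packages:
        try:
            # Version of the package as installed in the current environment.
            lines.append(f"- {name} {metadata.version(name)}")
        except metadata.PackageNotFoundError:
            # Flag packages that are listed but missing, instead of failing.
            lines.append(f"- {name} (not installed)")
    return lines
```

Calling `print("\n".join(describe_environment(["numpy", "pandas"])))` prints one line per dependency that can be pasted into the README. Replace the example package names with the packages your scripts actually import.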
If you work with sensitive data, you must not include the raw or processed data in the version-controlled repository to be shared.
Instead, explicitly exclude the data directory from the start using the .gitignore file. An easy solution (which, however, discards the version history of all files) is to create a new local repository that contains all project files except the data, and to push only that repository to the public platform.
Importantly, if data are removed from an existing repository, they may still remain accessible in the repository’s history, since previous states of the project can be restored. If sensitive data are accidentally committed and pushed, it is possible to rewrite the repository history to remove them retrospectively. However, this process is complex and error-prone, so it is best avoided by ensuring that sensitive data are excluded from version control from the outset.
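The fresh-repository approach described above can be sketched in Python as follows. This is an illustration under assumptions, not a prescribed workflow: the function name `copy_without_data` and the directory name `data` are hypothetical, and the new folder still has to be initialized as a Git repository and pushed by hand.

```python
import shutil
from pathlib import Path


def copy_without_data(project_dir, public_dir, excluded=("data", ".git")):
    """Copy the project to public_dir, leaving out data and the old Git history.

    public_dir must not exist yet. Any file or folder whose name matches an
    entry in `excluded` is skipped at every level of the project tree.
    """
    public_dir = Path(public_dir)
    shutil.copytree(
        Path(project_dir),
        public_dir,
        ignore=shutil.ignore_patterns(*excluded),
    )
    # Keep the data folder ignored in the new repository as well, so that
    # data copied there later for local runs are never committed.
    gitignore = public_dir / ".gitignore"
    entries = gitignore.read_text().splitlines() if gitignore.exists() else []
    if "data/" not in entries:
        entries.append("data/")
        gitignore.write_text("\n".join(entries) + "\n")
    return public_dir
```

Because the old `.git` folder is not copied, the public copy carries no history in which previously committed data could still be found. Inspect the copied files before pushing, since this sketch only filters by name and does not detect sensitive content inside scripts or documentation.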
Sharing your code allows other researchers to see exactly which analytic methods were applied to the data. Ideally, they should also be able to rerun the code and verify the reproducibility of the reported results.
You can provide the data required to run the code in several ways:
- The real (anonymized) data. This can be done either (1) by including the dataset in the repository so that the code can access it locally and run directly, or (2) by configuring the code to retrieve the data from an external source, such as a database, API, or external data repository. A practical workflow is to include “small”/one-shot datasets directly in a combined code/data project (e.g. on GitHub), and to store “large”/reusable datasets (which deserve their own DOI) separately in a repository specialized for research data.
- Simulated random data. This option mainly demonstrates that the code runs without errors. However, the original results cannot be verified and further analyses are not meaningful. See our data simulation tutorial.
- Synthetic data that mimic key properties of the real data. When privacy-sensitive data cannot be shared, synthetic datasets can provide a useful alternative. Compared with purely random simulated data, synthetic data can resemble the structure and characteristics of the original dataset, allowing other researchers to rerun the analyses and assess whether the main results can be reproduced. In addition, openly shared synthetic data enable exploratory analyses that may generate new hypotheses, and in some cases analyses conducted on synthetic data can approximate results obtained from the real data.
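The difference between simulated random data and synthetic data can be illustrated with a single numeric variable. The sketch below is a deliberately simplified assumption: the function names are hypothetical, the synthetic values only preserve the mean and standard deviation of one variable, and dedicated synthetic-data tools additionally model relationships between variables and assess disclosure risk.

```python
import random
import statistics


def simulate_random(n, low, high, seed=1):
    """Simulated data: uniform random values in a plausible range.

    Enough to show that the code runs, but unrelated to the real data.
    """
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    return [rng.uniform(low, high) for _ in range(n)]


def synthesize_like(real_values, n, seed=1):
    """Synthetic data: normal draws matching the real mean and standard deviation."""
    rng = random.Random(seed)
    mean = statistics.mean(real_values)
    sd = statistics.stdev(real_values)
    return [rng.gauss(mean, sd) for _ in range(n)]
```

For example, `simulate_random(100, 18, 65)` yields 100 arbitrary ages between 18 and 65, whereas `synthesize_like(real_ages, 100)` yields ages whose mean and spread approximate those of a real variable `real_ages`, so analyses on them can roughly track the original results.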
A license tells others what they can do with your code. Licensing your code consists of adding a file called LICENSE.txt next to your code that contains the appropriate legal text. Without a license or an equivalent statement, others cannot legally reuse your code, even if it is publicly available, e.g. on GitHub. Common open licenses for code are:
- CC0 (Public Domain) places no restrictions. Anyone can use, modify, and redistribute without attribution. Recommended for generic code where maximum reuse is the goal.
- MIT is a simple license that requires users to credit you when they reuse, modify, and redistribute your code.
- Apache 2.0 is a safer license (covering more legal cases) that requires users to credit you and state the changes they made to the code.
Learn more about open licenses for data and code in our code publishing tutorial or using the tool ‘Choose an open source license’.
To publish your code on Zenodo, we recommend the following steps:
- Push your clean repository to GitHub (see GitHub tutorial).
- Create an account on Zenodo.
- Link your GitHub account to Zenodo. Navigate to your profile > account settings > link external accounts > GitHub > authorize Zenodo to access your GitHub account. A list of your GitHub repositories should appear. Enable the repository by toggling the switch next to it, then refresh the page to check that the visual indicator showing that the repository is connected appears.
- Go to GitHub and create a release. Zenodo will automatically download a .zip archive of each new release and register a DOI.
- Add a DOI badge to your README. After your first release, a DOI badge that you can include in your GitHub README will appear next to your repository in your list of GitHub repositories on Zenodo. This allows others to easily find and cite your archived code.
- Create a new release when needed. Each new release creates a new version on Zenodo under the same overall DOI, with its own ‘sub-DOI’ that identifies that specific version.
Create a Zenodo “community” for the team. After team members connect their repositories from their personal GitHub account to Zenodo, and archive releases with a DOI, these records can be added to a shared Zenodo community. A community provides a central space where all software and other research outputs produced by the group can be collected and displayed, making them easier to find, cite, and showcase at the team level.