Kymberley R. Scroggie ab, Klementine J. Burrell-Sander ab, Peter J. Rutledge ab and Alice Motion *ab
a School of Chemistry, The University of Sydney, NSW, Australia. E-mail: alice.motion@sydney.edu.au
b Drug Discovery Initiative, The University of Sydney, NSW, Australia
First published on 18th July 2023
Electronic laboratory notebooks have expanded the utility of the paper laboratory notebook beyond that of a simple record keeping tool. Open electronic laboratory notebooks offer additional benefits to the scientific community including increased transparency, reproducibility, and integrity. A key element underpinning these benefits is facile and expedient knowledge sharing which aids communication and collaboration. In previous projects, we have used LabTrove and LabArchives as open electronic laboratory notebooks, in partnership with GitHub (an open-source web-based platform originally developed for collaborative coding) for communication and discussion. Here we present our personal experiences using GitHub as the central platform for many aspects of the scientific process, including version-controlled recording of experiments, results and interpretation, data storage, project management, workflows, communication, and collaboration. We report on the utility of GitHub as an open electronic laboratory notebook for chemistry research, and discuss our experiences employing it with the Open Source Mycetoma and Open Source Tuberculosis consortia. By outlining its features and shortcomings through their implementation in our work, we demonstrate how using GitHub as a central platform can aid the real-time sharing of knowledge and collaboration, and further democratise scientific research within both open and traditional research models.
ELNs enable knowledge sharing, facilitating faster transfer of knowledge and collaboration, which in turn expedites future knowledge generation and improves research efficiency.2,3 The digital storage of information further increases efficiency with greater longevity, readability and searchability. Despite these benefits, the shift away from paper to electronic has been an evolutionary process rather than revolutionary and scientists, particularly those in academia, have been slow to accept and adopt ELNs.4
The ability of scientists to move to electronic documentation of their work with minimal disruption has been identified as the key factor for broader acceptance of ELNs in an academic setting.5 However, the highly diverse nature of different disciplines within academia leads to a broad range of specific needs that require highly specialised or custom ELNs to effect a seamless transition. While some commercial ELNs can support many specialised requirements, their licensing and maintenance costs often put them out of reach for individual academic research groups.6,7 Instead, many have made use of generic, freely available platforms such as OneNote,8 EverNote9,10 or Google Docs,11 with others developing their own ELNs to reap the specific benefits they require.12–14
We have successfully used several different ELNs for our own work as part of different open source drug discovery consortia, including Open Source Malaria15 (http://opensourcemalaria.org/), Open Source Mycetoma16 (https://github.com/OpenSourceMycetoma) and Open Source Tuberculosis (http://opensourcetb.org/). Open source drug discovery is a new approach to drug discovery in which all aspects of research are shared publicly and in real-time (i.e. immediately as it is produced) to facilitate collaboration and knowledge sharing.17 These consortia follow the principles of open science, in which scientific knowledge is developed collaboratively and made freely accessible to any interested parties,18 and more specifically Todd's Six Laws of Open Science.19
In line with openly sharing our research, we have hosted ELNs on the open source platform LabTrove20 and the commercial ELN LabArchives (https://www.labarchives.com/) while simultaneously using GitHub (http://www.github.com/) to support discussion and collaboration. To bring together the sharing of knowledge and collaboration into a single open and central location, we have now explored the use of GitHub itself as the ELN (Fig. 1). Using GitHub as both an ELN and a hub for instant communication elevates it to the status of a “collaboratory” as envisioned by Wulf – a “centre without walls, in which the nation's researchers can perform their research without regard to geographical location, interacting with colleagues, accessing instrumentation, sharing data and computational resource, and accessing information in digital libraries”.21 This article draws on the experiences of two of the authors using GitHub as an ELN for various synthetic chemistry projects and provides preliminary findings on its usability. We report on the utility of GitHub as an open ELN, detail its features in this dimension, and discuss its implementation for open source drug discovery. We also share an ELN template GitHub repository for those considering alternative ELNs. While we have used GitHub as an open ELN and repositories are open by default, we note that for projects that require confidentiality or follow a traditional research methodology, information and data can be held within closed repositories with access limited to invited users.
The version control enabled by Git is directly transferable to ELNs. Importantly for the validity and verifiability of scientific research, using Git enables users to keep track of the who, what, when and even why: when saving changes, GitHub offers the option to provide a short description of what was changed and why the change was made. This record-keeping enables greater transparency, making it easy to see if an edit was made to fix typos, add information, or alter data, and is crucial in maintaining integrity and preventing misunderstanding or misuse of data.23 Furthermore, all activities are attributed to the user via their display name, bestowing a level of accountability and responsibility, while also ensuring that contributors receive attribution for their work.
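The record that Git keeps of who changed something, when, and why can also be read back programmatically. As a minimal sketch, assuming the history of an ELN repository has been exported with `git log --pretty=format:"%an|%aI|%s"` (author name, ISO 8601 author date and commit subject), the Python below parses that output into simple records. The sample log lines, author name and commit messages are hypothetical and are not taken from our notebooks.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Change:
    who: str        # author name (%an)
    when: datetime  # author date (%aI, strict ISO 8601)
    why: str        # commit subject line (%s)


def parse_log(text: str) -> list[Change]:
    """Parse the output of: git log --pretty=format:"%an|%aI|%s"."""
    changes = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first two separators only, so that a "|" inside
        # the commit message is preserved.
        who, when, why = line.split("|", 2)
        changes.append(Change(who, datetime.fromisoformat(when), why))
    return changes


# Hypothetical sample output, newest commit first.
SAMPLE_LOG = """\
A. Researcher|2022-03-02T09:15:00+11:00|Fix typo in work-up description
A. Researcher|2022-03-01T16:40:00+11:00|Add NMR data for attempt 2
"""

history = parse_log(SAMPLE_LOG)
for change in history:
    print(f"{change.when:%Y-%m-%d} {change.who}: {change.why}")
```

A listing of this kind makes it straightforward to distinguish a typo fix from a change to the data, provided that contributors write informative commit descriptions.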
A number of user interfaces (UIs) for Git exist, including GitHub, GitLab and Gitea, each offering slightly different user experiences. Each can be used as an ELN as described in this article; however, GitHub is more openly accessible and offers additional UI features (e.g. Discussions) that enable public discussion, making it more suitable for hosting open source and collaborative projects.
GitHub's accessibility is also important to the open science ethos. No account or subscription is required to view work within a public GitHub repository, allowing people to access data without concerns of cost or association with institutions. Through a standard internet browser, anyone can view content as soon as it is published without the researcher needing to “share” their work, or the reader having to access any proprietary products. In contrast, to view content on GitLab it is required to have an account and be signed in, while Gitea is a self-hosted UI.
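The same open access extends to programmatic reading. As a hedged sketch, the Python below builds a request to GitHub's public REST endpoint for listing the issues of a repository and, when run with network access, fetches their titles without any credentials. It uses the template repository named at the end of this article as an example; the function names are our own for illustration, unauthenticated requests are rate limited, and this endpoint also returns pull requests alongside issues.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def issues_url(owner: str, repo: str, state: str = "all", per_page: int = 30) -> str:
    """Build the URL that lists issues for a public repository."""
    query = urlencode({"state": state, "per_page": per_page})
    return f"{API_ROOT}/repos/{owner}/{repo}/issues?{query}"


def fetch_issue_titles(owner: str, repo: str) -> list[str]:
    """Fetch issue titles without credentials (requires network access)."""
    request = Request(
        issues_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urlopen(request, timeout=10) as response:
        issues = json.load(response)
    return [issue["title"] for issue in issues]


# Only the URL is built here; call fetch_issue_titles(...) to go online.
url = issues_url("TheBreakingGoodProject", "ELN-Templates")
print(url)
```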
Not only is the content on GitHub openly accessible, but users can connect to content on GitHub in different ways: from the web-based site, desktop app, or mobile app. The mobile app is available for Android and iOS and is easy to use on a standard smartphone or tablet. Many popular ELNs are primarily laptop-based,23 and while no research has yet specifically examined the use of mobile apps for ELNs, we envision that this mode of access will improve record-keeping in laboratory settings due to the ease of access, portability, and ubiquity of mobile devices. Most, if not all, researchers are able to access the GitHub app on their device to swiftly read through past methods, add details and observations in the moment, or snap a photo for the ELN. A similar sentiment has been expressed by others who suggest that many researchers are likely to prefer mobile-based ELNs for their portability and extra features, like the built-in camera and option to annotate images using a stylus.24,25
We have used GitHub repositories as an ELN for both laboratory-based synthetic projects and computer-based social science projects and describe our experiences using it in the synthetic chemistry laboratory as a case study below.
Repositories can be set up either by an individual or an organisation (e.g. research group) and assigned to individuals. Within Open Source Mycetoma and Open Source Tuberculosis there are topic-specific repositories which support discussion and collaboration, while ELN repositories are created by individuals and linked to the relevant organisation's repositories. This gives researchers the freedom to organise ELN repositories in a way that suits their individual needs. For example, while multiple projects can be contained in a single repository, a researcher may choose to have multiple repositories, one for each project they are involved in. Alternatively, a research group could set up repositories for each project with all researchers working on the project contributing to the single repository. Either way, an overview of all repositories can be viewed on both the individual's and organisation's profile. This interconnectivity of related work and segregation of distinct topics makes GitHub a useful tool, not only as an ELN, but also as a platform for the presentation of research and collaboration.
Along with labels, issues are sorted into projects, consolidating all experimental work relevant to a given branch of investigation in one central location. Each project has its own landing page, accessible from the Projects tab, which contains links to all issues assigned to it as cards. Cards can be further sorted into columns and categorised. The authors favour the division into To do, In Progress, and Done categories. This system makes it easy to keep track of planned, ongoing, and completed experiments and to assess progress. The process can also be automated, so that performing specific actions automatically shifts a card into a new column within its assigned project. For example, one author has a workflow whereby assigning a newly created issue to a project adds the respective card to To do, and closing an issue moves it to Done. As with issues, projects are ‘closed’ and archived once the line of investigation is completed. This capacity for curation is an important workflow tool, as it prevents landing pages from being cluttered with obsolete links or information.
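The column rules described above can be modelled in a few lines. The Python sketch below is not GitHub's API or its automation engine; it is a hypothetical in-memory model of the workflow, in which adding an issue to a project places its card in To do and closing the issue moves it to Done. The event names and the `Board` class are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field

COLUMNS = ("To do", "In Progress", "Done")

# Hypothetical event names mapped to the column a card should land in.
RULES = {
    "added_to_project": "To do",
    "issue_closed": "Done",
}


@dataclass
class Board:
    """In-memory stand-in for one project board."""
    cards: dict[int, str] = field(default_factory=dict)  # issue number -> column

    def handle(self, event: str, issue: int) -> None:
        """Apply an automation rule; events without a rule are ignored."""
        column = RULES.get(event)
        if column is not None:
            self.cards[issue] = column

    def move(self, issue: int, column: str) -> None:
        """Manual move, e.g. dragging a card to In Progress."""
        if column not in COLUMNS:
            raise ValueError(f"unknown column: {column}")
        self.cards[issue] = column

    def in_column(self, column: str) -> list[int]:
        """Return the issue numbers currently in a column, sorted."""
        return sorted(n for n, c in self.cards.items() if c == column)


board = Board()
board.handle("added_to_project", 12)  # new experiment planned
board.handle("added_to_project", 13)
board.move(12, "In Progress")         # experiment 12 is under way
board.handle("issue_closed", 12)      # experiment 12 written up and closed
print(board.in_column("To do"))
print(board.in_column("Done"))
```

Keeping the rules in a single lookup table mirrors the idea that only a small number of actions need to trigger a move, while everything else is left to the researcher.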
This method of automated workflow integrated into the capture of metadata at the source (the initial creation of a new issue) helps reduce the burden of curation.30 Previous work has noted the “blank canvas effect”, whereby researchers fail to add metadata due to unfamiliarity, rather than unwillingness.31 GitHub actively encourages the assignment of metadata through labels and project categorisation, and the capture of metadata at the source. Upon creation of a new issue, GitHub prompts users to add labels. This comparatively strong metadata support and active encouragement may be more effective than expecting users to create and curate their own labels without prompting.32 We suggest that the project, status and label features offered by GitHub facilitate individual project management, thus making researchers more likely to incorporate them into their ELNs for strategic reasons rather than because the system requires it.
Another important aspect of curation which is especially useful for making open-source work accessible to those not already involved in a project is the Wiki tool. This provides a place for a formal presentation of the work contained within the ELN, with pages organised according to topic. In this synthetic chemistry case study, a page in the Wiki has been dedicated to every different reaction, with this reaction landing page housing links to the notebook page for each attempt at the reaction, both successful and unsuccessful, alongside optimised methods, exemplar characterisation data and other relevant notes. These pages provide meta context and information that is often not present within the notebook pages of a typical ELN and are easily cross-linked, so that each page refers to multiple other relevant pages within the Wiki. This makes it easier for other researchers to find useful methods and data, while maintaining a high level of transparency, which improves both research integrity and reproducibility of results.
It is important to note that this method of organising data means that all attempts at a reaction are made publicly available, not just the successful experiments. This not only allows other researchers to come to their own conclusions about the results obtained, it also prevents duplication of unsuccessful efforts, as researchers can easily see what has and hasn't worked. While such complete transparency about the scientific process can appear intimidating to some researchers, it is a valuable asset of open ELNs and a powerful tool to support research integrity.29,33,34
The immediate publication of additions to notebook or wiki pages, along with the ease of sharing these updates via email or social media sites allows new ideas and experiments to be made available to the broader public almost immediately, and avoids the delays typically seen when research is shared through formal channels like research papers and conference presentations. Indeed, the real-time sharing of research through wiki pages and specific issues was particularly useful for work conducted as part of the Open Source Tuberculosis project, which involved using Twitter to seek advice and suggestions on synthetic procedures. Both specific attempts at a reaction and a proposed synthetic scheme could be easily shared online, making it straightforward for interested parties to read more and make informed recommendations based on the experimental work already completed. Sharing this work resulted in a number of useful suggestions on alternative reagents and reaction conditions for experiments, further evidence that openly sharing knowledge in real-time is a powerful collaboration tool.
Unlike other UIs for Git, GitHub also facilitates public collaboration in a forum-like structure through the Discussions tab. Discussions is a relatively new feature, and before its introduction, the authors conducted discussions through dedicated issues. However, as is typical of any online forum, we propose that its major application in collaboration lies in its ease of access for GitHub users. Having separate, dedicated spaces for the ELN and discussion within a single central system will conceivably facilitate conversations which might not be related to a specific experiment, such as conversations about a project's overall direction, organisation of meetings, or general brainstorming and sharing of ideas. Each Discussion can be organised into appropriate categories, so that users can quickly find the conversations they are interested in without having to sift through irrelevant topics, with links to relevant issues and wiki pages as appropriate.
Additionally, the customisability of GitHub allows repositories to be set up in a way that makes it easy for unfamiliar readers to quickly acquaint themselves with both the overall project and any recent updates, aiding the collaborative process. When accessing a repository, visitors are initially directed to the Code tab, making this the ideal place for introducing the ELN's owner(s) or curator(s), and the project(s) it relates to, through a README.md file. Hyperlinks guide visitors to other relevant sites, such as researchers' websites and social media profiles, and other GitHub repositories related to the project.
Fortunately, several factors and workarounds make Markdown a less imposing challenge than it first seems. First, GitHub provides users with a truncated menu of “clickable” formatting options which insert the characters or commands required to render common format styles. In addition, there is a Preview tab available for all text entries, which shows the user how the rendered and formatted text will appear, and so speeds the learning process. Users can start by clicking a desired formatting option, and gradually learn the necessary text entry, much the same as learning a hot-key for a formatting option in Microsoft Office. Second, because Markdown is a commonly used markup language, many guides to using it are available online (e.g. https://www.markdownguide.org/, https://docs.github.com/en/get-started/writing-on-github/getting-started-with-writing-and-formatting-on-github). There are also apps that enable users to write and format as they normally would in programs like Microsoft Word, with the text automatically converted to a Markdown format which can then be copied across to GitHub.
Another way to circumvent the challenges inherent in learning Markdown is to build and share templates. GitHub offers the option of creating issue templates, which facilitate quicker creation of new issues. This works particularly well for synthetic experiments, with the same basic template being followed when writing up most experiments. The chosen template contains the desired formatting with space for researchers to add their own methods, results and data in the appropriate sections. A template may be created by researchers themselves, sourced from other group members, or from online resources – we have created and shared a number of templates to be used in different settings. Templates are already used in other ELNs to expedite the creation of lab entries with similar or near-identical information, as occurs with repeated or parallel reactions and procedures.13,27,41 Although previous work suggests templates are not always widely adopted,20 GitHub's requirement that users work with Markdown to create posts makes them more attractive.
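To illustrate what such a template can look like, the Python sketch below generates the text of a Markdown issue template with the YAML front matter (`name`, `about`, `title`, `labels`) that GitHub reads from files placed in a repository's `.github/ISSUE_TEMPLATE/` folder. The section headings, label and template name are hypothetical placeholders and are not the contents of our shared templates.

```python
def experiment_template(name: str, labels: list[str], sections: list[str]) -> str:
    """Return the text of a Markdown issue template with YAML front matter."""
    front_matter = [
        "---",
        f"name: {name}",
        "about: Record a single synthetic experiment",
        "title: ''",
        f"labels: {', '.join(labels)}",
        "---",
    ]
    body = []
    for heading in sections:
        # Each section gets a heading and a placeholder comment to replace.
        body += ["", f"## {heading}", "", "<!-- fill in -->"]
    return "\n".join(front_matter + body) + "\n"


# Hypothetical headings and label, chosen for illustration only.
template = experiment_template(
    name="Synthetic experiment",
    labels=["experiment"],
    sections=["Reaction scheme", "Procedure", "Observations", "Characterisation data"],
)
print(template)
```

Saving the output as, for example, `.github/ISSUE_TEMPLATE/experiment.md` (a file name assumed here) would offer the template whenever a new issue is created, so that the formatting is in place before any Markdown has to be typed.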
Many discussions of ELNs also envisage an extension of online programs to encompass the broader experimental environment, in so-called ELEs.43 These may include integration of certain workflows into an ELE, so that certain experimental parameters, conditions and results can be automatically updated in and between connected ELNs.41 Future applications in this space could expand capabilities to include functionalities of specific use for researchers; for example, integration of commonly used chemistry programs like molecular structure drawing tools would make it easier to add relevant data to GitHub. More ambitious proposals include incorporating a LIMS or existing browser-based sites like Reaxys and SciFinder, allowing researchers to quickly scan the web for information on specific substances or reactions from within an interlinked ELE. While these converging functionalities are currently beyond our capacity and in some part contingent on GitHub becoming more established as a site for hosting ELNs and other aspects of scientific research, GitHub does currently offer integration with many applications and actions to automate workflows.† Furthermore, GitHub is home to software developers and thus an ideal location to recruit collaborators with the requisite knowledge to develop code-based solutions.
Importantly, it offers version control, encourages and enables the inclusion of metadata and curation, and expedites the sharing of knowledge with real-time updates and very low barriers to discussion and collaboration between interested parties. The fact that GitHub has not been designed with a single field of research in mind also makes it ideal for cross-discipline collaboration, as each discipline can adapt different elements of GitHub's functionality for their own use while maintaining the same core GitHub infrastructure. Instead of having to familiarise themselves with new ELN software and layouts, or swap between multiple ELN providers, researchers working on multidisciplinary projects can use a single, centralised service with consistent controls and familiar structure.
While there are some features which are undeniably more oriented towards coders, such as the Actions tab in which users can set up workflows using code, these features do not detract from GitHub's usefulness as an ELN, which lies mainly in its adaptability and capacity for knowledge-sharing and collaboration. To overcome the potential impediment of using Markdown, we offer guides for those looking to trial GitHub for scientific research, as well as a template repository containing an issue template with appropriate labels, and a wiki template with suggested headings and formatting. Additionally, although this work features GitHub's application in the context of open science, we note that repositories can also be made private. Such repositories are accessible only by invitation, and thus appropriate for settings in which confidentiality is required.
GitHub's practical features and free, open source nature make it an attractive alternative not just to paper-based laboratory notebooks, but also to other ELNs, which can be expensive, inflexible, exclusive, and unsuitable for openly accessible research. We therefore encourage researchers in all disciplines to trial GitHub as an ELN, and to share their experiences in using it for their own projects (https://github.com/TheBreakingGoodProject/ELN-Templates/discussions/2).
Footnote
† As of April 1, 2022. Information obtained from https://github.com/marketplace
This journal is © The Royal Society of Chemistry 2023