The GitHub code repository and collaboration platform is widely used between researchers to publish their work and to collaborate on projects source code.
Even more interestingly a few months ago GitHub announced improved support for researchers making it possible to get a Digital Object Identifier (DOI) for any GitHub repository archive.
With a DOI for your GitHub repository archive your code becomes formally citable in scientific publications.
The latest Nextflow release (0.9.0) seamlessly integrates with GitHub. This feature allows you to manage your code in a more consistent manner, or use other people’s Nextflow pipelines, published through GitHub, in a quick and transparent manner.
The idea is very simple, when you launch a script execution with Nextflow, it will look for
a file with the pipeline name you’ve specified. If that file does not exist,
it will look for a public repository with the same name on GitHub. If it is found, the
repository is automatically downloaded to your computer and the code executed. This repository
is stored in the Nextflow home directory, by default $HOME/.nextflow
, thus it will be reused
for any further execution.
You can try this feature out, having Nextflow (version 0.9.0 or higher) installed in your computer, by simply entering the following command in your shell terminal:
nextflow run nextflow-io/hello
The first time you execute this command Nextflow will download the pipeline
at the following GitHub repository https://github.com/nextflow-io/hello
,
as you don’t already have it in your computer. It will then execute it producing the expected output.
In order for a GitHub repository to be used as a Nextflow project, it must
contain at least one file named main.nf
that defines your Nextflow pipeline script.
Any Git branch, tag or commit ID in the GitHub repository can be used to specify a revision,
that you want to execute, when running your pipeline by adding the -r
option to the run command line.
So for example you could enter:
nextflow run nextflow-io/hello -r mybranch
or
nextflow run nextflow-io/hello -r v1.1
This can be very useful when comparing different versions of your project. It also guarantees consistent results in your pipeline as your source code evolves.
The following commands allows you to perform some basic operations that can be used to manage your pipelines. Anyway Nextflow is not meant to replace functionalities provided by the Git tool, you may still need it to create new repositories or commit changes, etc.
The ls
command allows you to list all the pipelines you have downloaded in
your computer. For example:
nextflow ls
This prints a list similar to the following one:
cbcrg/piper-nf
nextflow-io/hello
By using the info
command you can show information from a downloaded pipeline. For example:
$ nextflow info hello
This command prints:
repo name : nextflow-io/hello
home page : http://github.com/nextflow-io/hello
local path : $HOME/.nextflow/assets/nextflow-io/hello
main script: main.nf
revisions :
* master (default)
mybranch
v1.1 [t]
v1.2 [t]
Starting from the top it shows: 1) the repository name; 2) the project home page; 3) the local folder where the pipeline has been downloaded; 4) the script that is executed
when launched; 5) the list of available revisions i.e. branches + tags. Tags are marked with
a [t]
on the right, the current checked-out revision is marked with a *
on the left.
The pull
command allows you to download a pipeline from a GitHub repository or to update
it if that repository has already been downloaded. For example:
nextflow pull nextflow-io/examples
Downloaded pipelines are stored in the folder $HOME/.nextflow/assets
in your computer.
The clone
command allows you to copy a Nextflow pipeline project to a directory of your choice. For example:
nextflow clone nextflow-io/hello target-dir
If the destination directory is omitted the specified pipeline is cloned to a directory
with the same name as the pipeline base name (e.g. hello
) in the current folder.
The clone command can be used to inspect or modify the source code of a pipeline. You can eventually commit and push back your changes by using the usual Git/GitHub workflow.
Downloaded pipelines can be deleted by using the drop
command, as shown below:
nextflow drop nextflow-io/hello