Loading your raw data


Transferring your data to the Core

The sequencing facility will let you know once your samples are processed. They will send you an email with instructions on how to download your data.

Transfer the raw data directly to the Core's downloads folder on the Volac server. You will likely use the curl ("copy URL") command. Make sure you have a stable connection before starting the transfer.

The command will take some time to run, depending on the amount of data and the transfer speed. You can detach from and reattach to your tmux session at any point; this will not interfere with the execution of the command.

Example

The command to transfer the raw data may look something like this:

curl -sL https://leopard.bios.cf.ac.uk/nextcloud/index.php/s/gFHjWEoIR26OZec/download > /genomics/home/vol-genomics/genome_tools/Core/downloads/2022-04-29.zip

You can choose the name of the downloaded file, in this case 2022-04-29.zip.
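
Because curl is run with -s (silent), a failed download can go unnoticed, for example when the server returns an error page that gets saved under the .zip name. The following is a minimal sketch of a sanity check, assuming the archive is a standard ZIP file (these begin with the two bytes PK). The function name looks_like_zip is hypothetical and not part of the Core.

```shell
# Hypothetical helper: succeed only if the file is non-empty and
# begins with the ZIP signature "PK".
looks_like_zip() {
    [ -s "$1" ] && [ "$(head -c 2 "$1")" = "PK" ]
}

# Usage (path taken from the example above):
# looks_like_zip /genomics/home/vol-genomics/genome_tools/Core/downloads/2022-04-29.zip \
#     && echo "Archive looks OK" || echo "Download may have failed"
```

For a more thorough check, unzip -t tests every file in the archive without extracting it.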


Unpacking your data

The data files are compressed. To extract them, run the following command from the downloads folder, replacing YYYY-MM-DD with the actual file name:

unzip YYYY-MM-DD.zip -d /genomics/home/vol-genomics/genome_tools/Core/downloads/

Information

ZIP is a common file format that compresses one or more files into a single archive. This reduces file size and makes the files easier to transport or store. A recipient can unzip (or extract) a ZIP file after transport and use the files in their original format.


Loading your data into the input folder

Make a new sub-directory in the input directory using:

mkdir ~/genome_tools/Core/input_core/YYYY-MM-DD

Move the raw data files to the Core's input folder using:

mv ~/genome_tools/Core/downloads/YYYY-MM-DD/*_001.fastq.gz ~/genome_tools/Core/input_core/YYYY-MM-DD

Replace YYYY-MM-DD with the actual folder name. You do not need to replace the *; it is a wildcard (glob) character that matches every file name ending in _001.fastq.gz.
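
The two commands above can be wrapped so that the folder name is typed only once. This is a sketch, not part of the Core: the function name load_run and its arguments are hypothetical, and it assumes the archive unpacked into a folder named after the run, as in the mv command above. It uses mkdir -p, which, unlike plain mkdir, does not complain if the folder already exists.

```shell
# Hypothetical helper: create the run's input folder and move the reads into it.
# $1 = path to the Core directory, $2 = run name (the YYYY-MM-DD part)
load_run() {
    core="$1"
    run="$2"
    # Only move the files if the target folder could be created.
    mkdir -p "$core/input_core/$run" &&
        mv "$core/downloads/$run"/*_001.fastq.gz "$core/input_core/$run"/
}

# Usage:
# load_run ~/genome_tools/Core 2022-04-29
```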


Checking transfer is complete

Navigate to the new folder in input_core and check that there are two files for every sequenced isolate.

The two files are named *_R1_001.fastq.gz and *_R2_001.fastq.gz, where * is a unique identification code assigned by the sequencing facility.

Example

Forward: AB_PS_1_S1_R1_001.fastq.gz, and Reverse: AB_PS_1_S1_R2_001.fastq.gz

Forward: AB_PS_2_S2_R1_001.fastq.gz, and Reverse: AB_PS_2_S2_R2_001.fastq.gz

The R1 files contain the 'forward' reads and the R2 files contain the 'reverse' reads.
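
With many isolates, checking the pairs by eye is error-prone. The sketch below lists any file whose partner is missing, assuming the naming pattern described above. The function name check_pairs is hypothetical and not part of the Core.

```shell
# Hypothetical helper: report any R1 file without a matching R2 file,
# and any R2 file without a matching R1 file.
# $1 = folder to check; returns non-zero if any partner is missing.
check_pairs() {
    dir="$1"
    missing=0
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue    # skip the literal pattern if nothing matched
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "Missing reverse file for: $(basename "$r1")"
            missing=1
        fi
    done
    for r2 in "$dir"/*_R2_001.fastq.gz; do
        [ -e "$r2" ] || continue
        r1="${r2%_R2_001.fastq.gz}_R1_001.fastq.gz"
        if [ ! -e "$r1" ]; then
            echo "Missing forward file for: $(basename "$r2")"
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_pairs ~/genome_tools/Core/input_core/YYYY-MM-DD && echo "All pairs complete"
```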

Important

If your samples were processed on an Illumina MiSeq rather than an Illumina NextSeq, the sample names will include an additional _L001. This is important to know when you start the Core.
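
If you are unsure which instrument was used, the file names themselves can tell you. The sketch below assumes the _L001 tag sits directly before _R1 or _R2, as in AB_PS_1_S1_L001_R1_001.fastq.gz (an illustrative name following the usual Illumina convention). The function name has_lane_tag is hypothetical.

```shell
# Hypothetical helper: succeed if any read file in the folder carries
# the _L001 lane tag. $1 = folder to check.
has_lane_tag() {
    dir="$1"
    for f in "$dir"/*_L001_R[12]_001.fastq.gz; do
        if [ -e "$f" ]; then
            return 0
        fi
    done
    return 1
}

# Usage:
# has_lane_tag ~/genome_tools/Core/input_core/YYYY-MM-DD \
#     && echo "File names include _L001" || echo "No _L001 in file names"
```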


Removing the intermediate files

Delete the unzipped files from the downloads folder.

Do NOT delete YYYY-MM-DD.zip.
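
A cautious way to do this is to delete only the unpacked folder, and only when the original archive is still present. The sketch below assumes the archive unpacked into a folder named YYYY-MM-DD inside the downloads folder. The function name remove_unpacked is hypothetical and not part of the Core.

```shell
# Hypothetical helper: delete the unpacked folder but keep the .zip archive.
# $1 = downloads folder, $2 = run name (the YYYY-MM-DD part)
remove_unpacked() {
    downloads="$1"
    run="$2"
    # Refuse to delete anything unless the archive is still there.
    if [ ! -f "$downloads/$run.zip" ]; then
        echo "Archive $run.zip not found; nothing deleted." >&2
        return 1
    fi
    if [ -d "$downloads/$run" ]; then
        rm -r -- "$downloads/$run"
    fi
}

# Usage:
# remove_unpacked ~/genome_tools/Core/downloads 2022-04-29
```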