Loading your raw data
Transferring your data to the Core
The sequencing facility will let you know once your samples are processed. They will send you an email with instructions on how to download your data.
Transfer the raw data directly to the Core's downloads folder on the Volac server. You will likely use the curl (copy URL) command. Make sure you have established a stable connection before transferring the data.
The command will take some time to run, depending on the amount of data and the transfer speed. You can detach from and reconnect to your tmux session at any point. This will not interfere with the execution of the command.
Example
The command to transfer the raw data may look something like this:
curl -sL https://leopard.bios.cf.ac.uk/nextcloud/index.php/s/gFHjWEoIR26OZec/download > /genomics/home/vol-genomics/genome_tools/Core/downloads/2022-04-29.zip
You can choose the name of the downloaded file, in this case 2022-04-29.zip.
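Because the -s flag silences curl's progress and error output, a failed transfer can go unnoticed. The sketch below is one possible sanity check to run before unpacking. looks_like_zip is a hypothetical helper name, not part of the Core. It only confirms that the file is non-empty and starts with the two-byte "PK" signature that ZIP archives begin with, so it does not prove the archive is complete.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Core): quick check that a downloaded
# file is non-empty and begins with the "PK" signature used by ZIP archives.
looks_like_zip() {
    local file="$1"
    # -s is true only when the file exists and has a size greater than zero
    [ -s "$file" ] || return 1
    # Compare the first two bytes of the file with the ZIP signature
    [ "$(head -c 2 "$file")" = "PK" ]
}

# Example call (path taken from the transfer example above):
# looks_like_zip /genomics/home/vol-genomics/genome_tools/Core/downloads/2022-04-29.zip \
#     && echo "looks like a ZIP archive" || echo "download may have failed"
```

If the check fails, the file may be empty or an error page rather than an archive, and repeating the transfer is probably the simplest fix.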
Unpacking your data
The data files are compressed. To extract them, enter the following command from the downloads folder, substituting the actual file name for YYYY-MM-DD:
unzip YYYY-MM-DD.zip -d /genomics/home/vol-genomics/genome_tools/Core/downloads/
Information
ZIP is a common file format that's used to compress one or more files together into a single location. This reduces file size and makes it easier to transport or store. A recipient can unzip (or extract) a ZIP file after transport and use the file in the original format.
Loading your data into the input folder
Make a new sub-directory in the input directory using:
mkdir ~/genome_tools/Core/input_core/YYYY-MM-DD
Move the raw data files to the Core's input folder using:
mv ~/genome_tools/Core/downloads/YYYY-MM-DD/*_001.fastq.gz ~/genome_tools/Core/input_core/YYYY-MM-DD
Replace YYYY-MM-DD with the actual folder name. You do not need to replace the *; it is a glob (wildcard) character that matches every file name ending in _001.fastq.gz.
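The shell expands the * before mv runs, so it can help to preview what the pattern matches first. The following is a minimal sketch; preview_matches is a hypothetical helper name, not part of the Core, and the directory you pass it is assumed to be the unzipped folder in downloads.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Core): list the files that the
# *_001.fastq.gz pattern matches in a directory, then print a count.
preview_matches() {
    local dir="$1" f count=0
    for f in "$dir"/*_001.fastq.gz; do
        # If nothing matches, the pattern is left unexpanded; skip it
        [ -e "$f" ] || continue
        printf '%s\n' "${f##*/}"      # print the file name without its path
        count=$((count + 1))
    done
    printf 'matched %d file(s)\n' "$count"
}

# Example call (replace YYYY-MM-DD with the actual folder name):
# preview_matches ~/genome_tools/Core/downloads/YYYY-MM-DD
```

If the count is zero or lower than expected, check the folder name before running mv.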
Checking transfer is complete
Navigate to the new folder in input_core and check that there are two files for every sequenced isolate.
The two files are named *_R1_001.fastq.gz and *_R2_001.fastq.gz, where * is a unique identification code assigned by the sequencing facility.
Example
Forward: AB_PS_1_S1_R1_001.fastq.gz, and Reverse: AB_PS_1_S1_R2_001.fastq.gz.
Forward: AB_PS_2_S2_R1_001.fastq.gz, and Reverse: AB_PS_2_S2_R2_001.fastq.gz.
The files are called 'forward' (R1) and 'reverse' (R2).
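With many isolates, checking the pairs by eye is error-prone. The sketch below shows one way to automate the check; check_pairs is a hypothetical helper name, not part of the Core. It relies only on the _R1_001.fastq.gz and _R2_001.fastq.gz suffixes described above and reports any file whose partner is missing.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Core): report every forward (R1) or
# reverse (R2) file in a directory that has no matching partner file.
check_pairs() {
    local dir="$1" missing=0 f mate
    for f in "$dir"/*_R1_001.fastq.gz "$dir"/*_R2_001.fastq.gz; do
        # If a pattern matches nothing it is left unexpanded; skip it
        [ -e "$f" ] || continue
        # Build the expected partner name by swapping R1 and R2
        case "$f" in
            *_R1_001.fastq.gz) mate="${f%_R1_001.fastq.gz}_R2_001.fastq.gz" ;;
            *)                 mate="${f%_R2_001.fastq.gz}_R1_001.fastq.gz" ;;
        esac
        if [ ! -e "$mate" ]; then
            echo "missing partner for: ${f##*/}"
            missing=$((missing + 1))
        fi
    done
    echo "files without a partner: $missing"
    # Succeed only when every file has its partner
    [ "$missing" -eq 0 ]
}

# Example call (replace YYYY-MM-DD with the actual folder name):
# check_pairs ~/genome_tools/Core/input_core/YYYY-MM-DD
```

The function returns a non-zero exit status when any partner is missing, so it can also be used in a condition such as `check_pairs DIR && echo "all paired"`.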
Important
If your samples were processed using an Illumina MiSeq rather than an Illumina NextSeq, the sample names will include an additional _L001. This is important to know when you start the Core.
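If you are unsure which instrument was used, the file names themselves can tell you. The sketch below is a hypothetical helper, has_lane_token, which is not part of the Core. It assumes the _L001 token sits immediately before _R1 or _R2 (for example AB_PS_1_S1_L001_R1_001.fastq.gz), which this guide does not state explicitly, so adjust the pattern if your files differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Core): succeed if any FASTQ file in
# the directory carries the _L001 token.
# Assumption: _L001 appears directly before _R1 or _R2 in the file name.
has_lane_token() {
    local dir="$1" f
    for f in "$dir"/*_L001_R[12]_001.fastq.gz; do
        # An unmatched pattern stays unexpanded and fails the -e test
        [ -e "$f" ] && return 0
    done
    return 1
}

# Example call (replace YYYY-MM-DD with the actual folder name):
# if has_lane_token ~/genome_tools/Core/input_core/YYYY-MM-DD; then
#     echo "file names include _L001"
# else
#     echo "file names do not include _L001"
# fi
```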
Removing the intermediate files
Delete the unzipped files from the downloads folder.
Do NOT delete YYYY-MM-DD.zip.
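One cautious way to do the clean-up is sketched below. cleanup_unzipped is a hypothetical helper name, not part of the Core, and it assumes the unzipped files sit in a YYYY-MM-DD folder inside downloads, as the mv command earlier implies. It refuses to delete anything unless the original .zip is still in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Core): delete the unzipped folder
# but only if the original archive is still present next to it.
# Assumption: the unzipped files live in <downloads>/<run>/.
cleanup_unzipped() {
    local downloads="$1" run="$2"
    if [ ! -f "$downloads/$run.zip" ]; then
        echo "refusing to delete: $downloads/$run.zip not found" >&2
        return 1
    fi
    if [ ! -d "$downloads/$run" ]; then
        echo "nothing to delete: $downloads/$run does not exist" >&2
        return 1
    fi
    # Remove only the unzipped folder; the .zip archive is left untouched
    rm -r -- "$downloads/$run"
}

# Example call (replace YYYY-MM-DD with the actual name):
# cleanup_unzipped ~/genome_tools/Core/downloads YYYY-MM-DD
```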