Data transfer to client
# GIGA
# ULG non GIGA
https://sam.uliege.be/get.php?editeurGroupe=giga&id=420309
### Share creation
1. Ask UDI to:
- create a folder in /massstorage/EXT, the corresponding group, and the possibility of a Samba mount
- create a friend user and send the password to the PI
- add the friend user and the bioinfo platform members to the group
2. move the data to /massstorage/EXT/FOLDER/GEN/ and protect it (same permissions as the PLATFORMS folder; see the sketch after this list)
3. Send an email to the PI (and lab members if needed) with the instructions:
https://gitlab.uliege.be/giga-bioinfo/user-guides-wiki/-/wikis/faq/Connection-to-SEGI-storage
4. ask Raphaël to create an empty file with the f userID on the cluster to prevent connection?
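A minimal sketch for step 2, reusing the permission scheme from the Data sharing commands below; the paths and the group name are placeholders to adapt to the share created in step 1:
```bash
# Sketch only: source path, target path and group are placeholders
TargetFolder=/massstorage/EXT/<FOLDER>/GEN/<ProjectName>
mv /path/to/the/run/data ${TargetFolder}                    # placeholder source
chgrp -R "<group created in step 1>" ${TargetFolder}        # hand the tree to the share's group
find ${TargetFolder} -type d -exec chmod 2775 {} +          # setgid dirs: new files inherit the group
find ${TargetFolder} -type f -exec chmod a+r,g+w,o-w {} +   # files readable, group-writable, not world-writable
```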
### Data sharing
Move the data to /massstorage/EXT/FOLDER/GEN/RawData (or Bioinfo) and change the permissions:
```bash
TargetFolder=/massstorage/EXT/GEN/RawData/<ProjectName>
# give the whole tree to the platform group
chgrp -R "giga - ptf_gen_hts" ${TargetFolder}
# directories: setgid + rwx for owner/group, rx for others (new files inherit the group)
find ${TargetFolder} -type d -exec chmod 2775 {} +
# files: readable by everyone, group-writable, not world-writable
find ${TargetFolder} -type f -exec chmod a+r,g+w,o-w {} +
```
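Optionally, a quick sanity check that the permissions came out as expected (a sketch, not part of the original procedure):
```bash
# directories should show drwxrwsr-x and the "giga - ptf_gen_hts" group
ls -ld ${TargetFolder}
ls -lR ${TargetFolder} | head
```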
# Non ULG
## Owncloud
1. move the data to /massstorage/NGS/HTS/Fastq_Shared_external/<PIorGroupName>
   (or /massstorage/PTF/GEN/NGS/HTS/Fastq_SharedExternal/)
2. mount the demultiplexing or analysis folder on OwnCloud
3. ask Ersen to force the indexation of your home on OwnCloud (for Alice and Wouter: overnight indexation => wait at least 1 night)
4. create the share links (for fastq, a direct link to the parent folder, which must not contain any sub-folders!)
5. send the links and the instructions below
```bash
### Procedure to download fastq using wget
# (1) move to the destination folder and download the files by adding /download?path=%2F to the link I sent
# Example:
wget https://cloud.med.uliege.be/index.php/s/r3gbTo4h4scKKAa/download?path=%2F
# (2) Rename the downloaded file named 'download?path=%2F' and add the .zip extension
mv download\?path\=%2F download.zip
# (3) Unzip the archive
unzip download.zip
```
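Because large zip downloads through the cloud have occasionally arrived corrupted (see the Jean-Claude Twizere notes below), it may be worth adding an integrity check to the instructions; a small sketch:
```bash
# test the archive before unzipping; a bad download will report errors here
unzip -t download.zip
# optionally compare a checksum against one we provide with the link
md5sum download.zip
```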
TO DO: write explanation on how to mount and create links
## Diagenode
```bash
# log with f038426 on nasgw1 or nasgw2, then
screen -R diagenode
sftp -P 3022 GIGA@94.140.182.196
# data_transfer
sftp> lcd /massstorage/PTF/GEN/NGS/HTS/HTS/NovaSeqData/210721_A00801_0134_BHGMLVDSX2/Demult/Data/diagenode
sftp> cd data_upload/
sftp> mkdir 210721_A00801_0134_BHGMLVDSX2
sftp> cd 210721_A00801_0134_BHGMLVDSX2
sftp> mkdir Data
sftp> cd Data
sftp> put -pr ./*
# Use reput to resume if the transfer was interrupted (see the sketch after this block)
# wait
sftp> lcd /massstorage/PTF/GEN/NGS/HTS/HTS/NovaSeqData/210721_A00801_0134_BHGMLVDSX2/Demult/QC/diagenode
sftp> cd ../
sftp> mkdir QC
sftp> cd QC
sftp> put -pr ./*
# wait
```
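If a transfer gets interrupted, it should be resumable instead of restarting from scratch; a sketch to confirm against the sftp man page on the gateway (`-a` asks `put` to resume existing partial files, which is what `reput` does):
```bash
# from the same lcd/cd locations as above, inside the sftp session
sftp> put -apr ./*
```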
## Mosameat
```bash
# log with f038426 on nasgw1 or nasgw2, then
screen -R mosameat
SourceFolder=Path/To/Source/ # ex: /massstorage/PTF/GEN/NGS/HTS/Fastq_SharedExternal/Mosameat/*
TargetFolder=Path/To/Target/in/bucket # ex: RawData/ ; if it doesn't exist, it will be created
~/google-cloud-sdk/bin/gsutil cp -r $SourceFolder gs://130721/$TargetFolder
# -r = recursive
# if resuming an interrupted transfer or adding a few files to already transferred ones, use -n (= skip files already in the bucket)
# NB: "gsutil -m cp" allows multithreading, but I don't want to use several CPUs and too much bandwidth on the mass storage! (see https://cloud.google.com/storage/docs/gsutil/addlhelp/TopLevelCommandLineOptions )
```
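To check what actually landed in the bucket, a listing like the following can be used (a sketch, same bucket and target folder as in the example above):
```bash
# -l adds size and date, -r lists recursively (a TOTAL line is printed at the end)
~/google-cloud-sdk/bin/gsutil ls -lr gs://130721/$TargetFolder
```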
### Warnings:
- Currently linked to the alice.mayer.bs@gmail.com Google account
- To write scripts, see https://cloud.google.com/sdk/docs/scripting-gcloud
- The first time, I used the following procedure:
```bash
~/google-cloud-sdk/bin/gcloud init
# copy/paste the link into a browser and log with authorised google account
# copy the code into the terminal
```
- You may need to log in with
```bash
~/google-cloud-sdk/bin/gcloud auth login
# copy/paste the link into a browser and log with authorised google account
# copy the code into the terminal
```
- To see the bucket in a web browser: https://console.cloud.google.com/storage/browser/130721
## Jean-Claude Twizere collaborators
Last time, Jean-Claude said they would try again with OwnCloud + wget and come back to us if it doesn't work. I haven't had any feedback yet (August 2021).
Last year they manually downloaded the fastq files one by one because wget gave too many corrupted files. That was in September 2020, so it may have been before the overnight forced re-indexation of our cloud.
Otherwise they proposed an sftp solution that didn't work at the time, but I could try to solve it with Raphaël. Their bioinformatician asked us to decompress the files before sending (and to compute md5sums), but I will refuse that.
