update FAQ, contact and cluster authored by Bouquieaux Marie-Catherine
@@ -74,7 +74,6 @@ needs to keep (for legal reasons for example, or data that have been fully analy
published yet, in case a reviewer asks to redo part of the analysis with
new software or different options) but do not need to use/access quickly anymore.
**Is there a size limit for sending data to archive?**
- There is no upper limit to the size of an archive. But if you want to send several terabytes of data,
please organize the main folder into subfolders containing data that are likely to be retrieved
@@ -86,7 +85,7 @@ before to archive them.
**Warning about hardlinks**
NB1: If you don't know what a hardlink is, you probably don't have any (it's actually quite rare to have some in data).
NB2: If you made links using the `ln -s` command, you made softlinks (symbolic links), not hardlinks.
If you have some hardlinks in your archive folder and other occurrences of the same file are in your project folder,
be aware that once the file is truncated, it will be truncated in all locations
(everywhere you have a hardlink pointing to that file).
@@ -95,19 +94,18 @@ the file will be restored on disk, which means that
1. you need to have enough space in the folder to store it (or the retrieval will fail)
2. opening it the first time will be very slow
Don't hesitate to ask the [Bioinformatics teams](contacts) if you have any questions or want to
discuss your specific use of hardlinks.
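If you are unsure whether a folder contains hardlinked files, one way to check is to look for regular files whose link count is greater than one. The following is a minimal sketch, assuming a POSIX `find`; the function name `list_hardlinked_files` is hypothetical and not part of any mass storage tooling.

```shell
# List regular files under a folder that have more than one hardlink.
# Usage: list_hardlinked_files /path/to/folder   (the path is a placeholder)
list_hardlinked_files() {
    # -links +1 matches files whose link count is greater than 1
    find "$1" -type f -links +1
}
```

An empty output means no hardlinked files were found in that folder. Running `ls -li` on a reported file shows its inode number, which is shared by every name pointing to the same file.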
**NB:** In some circumstances, some files may be sent offline even if the user didn't ask for it.
See the [automatic archiving](mass-storage/mass-storage-backup-archive#automatic-archiving) section for more information.
## On-demand archiving
### Procedure to send data for archiving
Files/folders should be properly organized before being sent for archiving.
Don't hesitate to contact the [Bioinformatics team](contacts)
if you need help with any of these steps.
The procedure to send files for archiving is:
@@ -116,8 +114,8 @@ your team folder on the mass storage. This need to be done only the very first t
2. Determine which files/folders you want to send for archiving and organize them so that
- All related files that are likely to be retrieved together (if a retrieval is ever needed) are in the same (sub)folder with a meaningful name
- Big files (typically 200 MB and more) are compressed as much as possible as explained [here](mass-storage/mass-storage-compression)
- Numerous small files (typically several thousand files smaller than 4 MB) are grouped into archive files as explained [here](mass-storage/mass-storage-compression)
3. Create a subfolder in the "ARCHIVES" folder with a meaningful name (for example the name of the project, the date and any specific information).
**WARNING: don't use any spaces or special characters in the folder names!**
4. Move into that folder the data you want to send for archiving (organised as described above).
**WARNING:** if your data are on the mass storage, it's very important to move them (using `mv`, not `rsync` or `cp`)!
@@ -132,26 +130,25 @@ Of note, we strongly recommend to have both a tree and file with a description o
as a list of file names might not be enough for you to know exactly what is in each folder.
7. Contact the UDIMED/UDIGIGA by filling in a form at https://sam.med.uliege.be/
The form must contain the following pieces of information:
- path to the folder to archive (or at least the name of the team and the name of the folder to archive)
- the number of years that the data must be kept (default is 5 years)
- name of the PI in charge
8. The UDIMED/UDIGIGA IT specialist will send your data to the archiving system.
Once that's done, you won't be able to enter the archived folder or see your data anymore.
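Steps 3 and 4 above (creating a safely named subfolder and moving data into it with `mv`) could be sketched as the shell helpers below. This is only an illustration: the function names `is_safe_folder_name` and `stage_for_archiving` are hypothetical, and the allowed character set (letters, digits, `.`, `_`, `-`) is an assumed, conservative reading of "no spaces or special characters".

```shell
# Succeed only if the name is non-empty and contains nothing but
# letters, digits, '.', '_' and '-' (assumed safe character set).
is_safe_folder_name() {
    case "$1" in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Create ARCHIVES_DIR/NAME and move the given files/folders into it.
# Usage: stage_for_archiving ARCHIVES_DIR NAME PATH...
stage_for_archiving() {
    archives_dir=$1
    name=$2
    shift 2
    if ! is_safe_folder_name "$name"; then
        echo "unsafe folder name: $name" >&2
        return 1
    fi
    # mv (not cp or rsync) is used, as required for data on the mass storage
    mkdir -p "$archives_dir/$name" &&
    mv -- "$@" "$archives_dir/$name/"
}
```

For example, `stage_for_archiving /path/to/team/ARCHIVES projectX_2024 run1 run2` (all placeholder names) would move `run1` and `run2` into `ARCHIVES/projectX_2024/`, while a name such as `my project` would be rejected before anything is moved.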
**IMPORTANT CONSIDERATIONS**
- To move your data to the "ARCHIVES" folder, it's recommended to log into the mass storage with
`ssh u123456@massstorage.giga.priv` (replace u123456 with your university userID).
Then you should run a `screen` session to prevent any interruption of the transfer
if you lose your connection to the mass storage.
If you don't know how to run a screen session or move files from a terminal, or if you are not sure
which method you should use, please contact the [Bioinformatics team](contacts).
- Moving data to the "ARCHIVES" folder is not enough for them to be truncated.
You have to ask the UDIMED/UDIGIGA IT specialist to archive them.
Data in the "ARCHIVES" folder that haven't been truncated still take up space on disk and are therefore
still taken into account for the billing of your disk usage.
- Each truncated file occupies 4 kB of space on disk. So, if you archive 1 million individual files,
the remaining volume on disk will be about 4 GB. Therefore, if you have hundreds of thousands of small files,
you'll save more space by grouping them into one file as explained [here](mass-storage/mass-storage-compression)
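To get a rough idea of how much space truncated files would still occupy, and how many small files a folder contains, something like the sketch below could be used. It assumes the 4 kB per truncated file stated above, decimal units (1 MB = 1000 kB), and reads "smaller than 4 MB" as smaller than 4194304 bytes; the function names are hypothetical.

```shell
# Estimate the disk space (in MB) still used after truncation,
# assuming 4 kB per truncated file.
# Usage: residual_mb NUMBER_OF_FILES
residual_mb() {
    echo $(( $1 * 4 / 1000 ))
}

# Count regular files smaller than 4194304 bytes (4 MiB) under a folder.
# Usage: count_small_files /path/to/folder   (the path is a placeholder)
count_small_files() {
    find "$1" -type f -size -4194304c | wc -l | tr -d ' '
}
```

With these assumptions, `residual_mb 1000000` prints `4000`, which matches the roughly 4 GB mentioned above for 1 million files. A large count from `count_small_files` suggests grouping those files into an archive file before sending them for archiving.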
### Procedure to retrieve data from archiving
@@ -160,17 +157,14 @@ The procedure to retrieve archived files/folders is:
1. Contact the UDIMED/UDIGIGA by filling in a form at https://sam.med.uliege.be/
The form must contain the following pieces of information:
- the path to the folder you want to retrieve (or at least the name of the team and the name of the folder)
- if you don't want to retrieve the whole folder but only some part of it, don't forget to mention which subfolder
- the name of the PI in charge
2. The UDIMED/UDIGIGA will check that there is enough space left on disk to store your data.
If this is the case, they will retrieve your data and give you access to them.
**Note**: This operation may take several days. This means that you need to **anticipate** your need for those data.
## Automatic archiving
Once the storage reaches 80% of its maximum capacity, the oldest data will
@@ -188,7 +182,4 @@ slow process, as these data will first need to migrate from tape to disk before
being accessible again. The time required will depend on the ongoing backing up
of other files. In optimal conditions, it shouldn't take more than 1h to access a 1 TB file.
# [Contacts](contacts)
\ No newline at end of file
# [Contacts](mass-storage/mass-storage-contacts)
\ No newline at end of file