... | ... | @@ -74,7 +74,6 @@ needs to keep (for legal reasons for example, or data that have been fully analy |
|
|
published yet, in case a reviewer is asking to redo part of the analysis with a
|
|
|
new software or different options) but do not need to use/access quickly anymore.
|
|
|
|
|
|
|
|
|
**Is there a size limit for sending data to archive?**
|
|
|
- There is no upper limit to the size of an archive. But if you want to send several Terabytes of data,
|
|
|
please organize the main folder into subfolders containing data that are likely to be retrieved
|
... | ... | @@ -86,7 +85,7 @@ before to archive them. |
|
|
|
|
|
**Warning about hardlinks**
|
|
|
NB1: If you don't know what a hardlink is, you probably don't have any (it's actually quite rare to have some in data).
|
|
|
NB2: If you made links using `ln -s` command, you made a softlinks and not a hardlink.
|
|
|
NB2: If you made links using `ln -s` command, you made a softlink and not a hardlink.
|
|
|
If you have some hardlinks in your archive folder and if other occurrences of the same file are in your project folder,
|
|
|
be aware that once the file is truncated, it is truncated in all locations
|
|
|
(everywhere where you have a hardlink pointing to that file).
|
... | ... | @@ -95,19 +94,18 @@ the file will be restored on disk, which means that |
|
|
1. you need to have enough space in the folder to store it (or the retrieve will fail)
|
|
|
2. opening it the first time will be very slow
|
|
|
|
|
|
Don't hesitate to ask the bioinformatic teams if you have any question or want to
|
|
|
Don't hesitate to ask the [Bioinformatic teams](contacts) if you have any questions or want to
|
|
|
discuss your specific utilisation of hardlinks.
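If you are unsure whether a folder contains hardlinks, a quick check along these lines may help. This is only a minimal sketch: the function name `list_hardlinked_files` and the example path are hypothetical, and it relies on the standard `find` option `-links`, which reports files whose link count is greater than 1.

```shell
# Hypothetical helper (not part of the mass storage tooling):
# list regular files under a folder that have more than one hard link.
list_hardlinked_files() {
    # -type f  : regular files only (softlinks made with `ln -s` are ignored)
    # -links +1: link count strictly greater than 1, i.e. at least one hardlink
    find "$1" -type f -links +1
}
```

For example, `list_hardlinked_files /path/to/your/ARCHIVES/subfolder` (placeholder path) prints nothing when the folder contains no hardlinked files.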
|
|
|
|
|
|
**NB:** In some circumstances, some files may be sent offline even if the user didn't ask for it.
|
|
|
See the [automatic archiving](mass-storage/mass-storage-backup-archive#automatic-archiving) for more information.
|
|
|
|
|
|
|
|
|
## On demand archiving
|
|
|
|
|
|
### Procedure to send data for archiving
|
|
|
|
|
|
Files/folders should be properly organized before being sent for archiving.
|
|
|
Don't hesitate to contact the [Bioinformatic team](mass-storage/mass-storage-contacts)
|
|
|
Don't hesitate to contact the [Bioinformatics team](contacts)
|
|
|
if you need help with any of these steps.
|
|
|
|
|
|
The procedure to send files for archiving is:
|
... | ... | @@ -116,8 +114,8 @@ your team folder on the mass storage. This need to be done only the very first t |
|
|
2. Determine which files/folders you want to send for archiving and organize them so that
|
|
|
- All related files that are likely to be retrieved together (if a retrieval is ever needed) are in the same (sub)folder with a meaningful name
|
|
|
- Big files (typically 200 MB or more) are compressed as much as possible as explained [here](mass-storage/mass-storage-compression)
|
|
|
- Numerous small files (typically thousands of files smaller than 4Mb) are grouped into archive files as explained [here](mass-storage/mass-storage-compression)
|
|
|
3. Create a subfolder in the "ARCHIVE" folder with a meaningful name (for example the name of the project, the date and any specific information).
|
|
|
- Numerous small files (typically several thousands of files smaller than 4 MB) are grouped into archive files as explained [here](mass-storage/mass-storage-compression)
|
|
|
3. Create a subfolder in the "ARCHIVES" folder with a meaningful name (for example the name of the project, the date and any specific information).
|
|
|
**WARNING: don't use spaces or special characters in the folder names!**
|
|
|
4. Move the data you want to send for archiving into that folder (organised as described above)
|
|
|
**WARNING:** if your data are on the mass storage, it's very important to move them (using `mv`, not `rsync` or `cp`)!
|
... | ... | @@ -132,26 +130,25 @@ Of note, we strongly recommend to have both a tree and file with a description o |
|
|
as a list of file names might not be enough for you to exactly know what is in each folder.
|
|
|
7. Contact the UDIMED/UDIGIGA by filling in the form at https://sam.med.uliege.be/
|
|
|
The form must contain the following pieces of information:
|
|
|
- path to the data to archive (or the name of the team and the name of the folder to archive)
|
|
|
- path to the folder to archive (or at least the name of the team and the name of the folder to archive)
|
|
|
- the number of years that the data must be kept (default is 5 years)
|
|
|
- name of the PI in charge
|
|
|
8. The UDIMED/UDIGIGA will send your data to the archiving system.
|
|
|
8. The UDIMED/UDIGIGA IT specialist will send your data to the archiving system.
|
|
|
Once that's done, you won't be able to enter the archived folder or see your data anymore.
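Before filling in the form, the organisation rules from step 2 and the naming warning from step 3 can be checked from a terminal. The following is a rough sketch under stated assumptions: the function names are hypothetical, "special character" is assumed to mean anything other than letters, digits, `.`, `_` and `-`, and the 200 MB and 4 MB thresholds are interpreted as 200 × 1024 × 1024 and 4 × 1024 × 1024 bytes.

```shell
# Hypothetical helpers to review a folder before asking for it to be archived.

# List entries whose names contain a space or a "special" character
# (assumption: anything outside letters, digits, '.', '_' and '-').
list_problem_names() {
    LC_ALL=C find "$1" -name '*[!A-Za-z0-9._-]*'
}

# List files larger than 200 MB (assumed to be 200 * 1024 * 1024 bytes),
# which are candidates for compression.
list_big_files() {
    limit=$((200 * 1024 * 1024))
    find "$1" -type f -size "+${limit}c"
}

# Count files smaller than 4 MB (assumed to be 4 * 1024 * 1024 bytes),
# which are candidates for grouping into an archive file.
count_small_files() {
    limit=$((4 * 1024 * 1024))
    n=$(find "$1" -type f -size "-${limit}c" | wc -l)
    echo "$((n))"
}
```

If `list_problem_names` prints anything, rename those entries first. If `count_small_files` reports several thousand files, grouping them is probably worthwhile.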
|
|
|
|
|
|
|
|
|
**IMPORTANT CONSIDERATIONS**
|
|
|
- To move your data to the "ARCHIVES" folder, it's recommended to log into the mass storage with
|
|
|
`ssh u123456@massstorage.giga.priv` (replace u123456 by your university userID).
|
|
|
Then you should run a `screen` session to prevent any interruption of the transfer
|
|
|
if you lose your connection to the mass storage.
|
|
|
If you don't know how to run a screen session or move files from a terminal or if you are not sure
|
|
|
of the method you should use, please contact the bioinformatic team.
|
|
|
of the method you should use, please contact the [Bioinformatic team](contacts).
|
|
|
- Moving data to the "ARCHIVES" folder is not enough for them to be truncated.
|
|
|
You have to ask the UDIMED/UDIGIGA to archive them.
|
|
|
Data in the "ARCHIVE" folder that haven't been truncated still takes up space on disk and are therefore
|
|
|
You have to ask the UDIMED/UDIGIGA IT specialist to archive them.
|
|
|
Data in the "ARCHIVES" folder that haven't been truncated still take up space on disk and are therefore
|
|
|
still taken into account for the billing of your disk usage.
|
|
|
- Each truncated file occupies 4 KB of space on disk. So, if you archive 1 million individual files,
|
|
|
the remaining volume on disk will be 4Gb (ish). However, if you have hundreds of millions of small files,
|
|
|
the remaining volume on disk will be about 4 GB. However, if you have hundreds of thousands of small files,
|
|
|
you'll save more space by grouping them into one file as explained [here](mass-storage/mass-storage-compression)
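The space left on disk after truncation can be estimated from the number of files. This is a minimal sketch: the function name `truncated_footprint_kb` is hypothetical, it assumes the 4 KB per truncated file stated above, and it miscounts file names that contain newlines.

```shell
# Hypothetical helper: estimate the disk space (in KB) that a folder will
# still occupy once all its files are truncated, assuming 4 KB per file.
truncated_footprint_kb() {
    # Count regular files (one line per file; names with newlines are miscounted)
    n=$(find "$1" -type f | wc -l)
    echo "$((n * 4))"
}
```

With 1 million files, the result is 4,000,000 KB, which is roughly the 4 GB mentioned above. Grouping the same files into a single archive file would leave a single 4 KB stub.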
|
|
|
|
|
|
### Procedure to retrieve data from archiving
|
... | ... | @@ -160,17 +157,14 @@ The procedure to retrieve archived files/folders is: |
|
|
|
|
|
1. Contact the UDIMED/UDIGIGA by filling in the form at https://sam.med.uliege.be/
|
|
|
The form must contain the following pieces of information:
|
|
|
- the path to folder you want to retrieve (or the name of the team and the name of the folder)
|
|
|
- the path to the folder you want to retrieve (or at least the name of the team and the name of the folder)
|
|
|
- if you don't want to retrieve the whole folder but only some part of it, don't forget to mention which subfolder
|
|
|
- the name of the PI in charge
|
|
|
2. The UDIMED/UDIGIGA will check there is enough space left on disk to store your data.
|
|
|
If this is the case, they will retrieve your data and give you access to them.
|
|
|
|
|
|
|
|
|
**Note**: This operation may take several days. This means that you need to **anticipate** the need for those data.
|
|
|
|
|
|
|
|
|
|
|
|
## Automatic archiving
|
|
|
|
|
|
Once the storage reaches 80% of its maximum capacity, the oldest data will
|
... | ... | @@ -188,7 +182,4 @@ slow process, as these data will first need to migrate from tape to disk before |
|
|
to be accessible again. The time required will depend on the ongoing backup
|
|
|
of other files. In optimal conditions, it shouldn't take more than 1h to access a 1 TB file.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
# [Contacts](mass-storage/mass-storage-contacts) |
|
|
\ No newline at end of file |
|
|
# [Contacts](contacts) |
|
|
\ No newline at end of file |