**Which types of files can be sent to the offline archiving system?**

Tape libraries are designed for long-term data storage. They are more stable and much cheaper than disk space. However, they are also slower at writing and reading data, so retrieving archived data can take several days.

Therefore, the offline archiving system is meant for files that you need to keep (for example for legal reasons, or data that have been fully analyzed but not published yet, in case a reviewer asks you to redo part of the analysis with new software or different options) but no longer need to use or access quickly.

please organize the main folder into subfolders containing data that are likely to be retrieved together, and keep a record of the tree structure, so that we don't need to retrieve the whole archive if you only need some of the files.

- Given that archiving and restoring data is quite laborious, the minimum size you can send for archiving is 500 GB. If your experiments typically generate less than 500 GB of data, you can wait until you have several experiments (possibly in separate subfolders) before archiving them.

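The two recommendations above (keeping a record of the tree structure and reaching the 500 GB minimum) can be checked from a terminal. The following is a minimal POSIX shell sketch, not a tool provided on the mass storage: the function names `archive_size_ok` and `write_manifest` are hypothetical, and 500 GB is assumed to mean 500 × 10^9 bytes. `du -sk` reports disk usage in KiB, which may differ slightly from the sum of the file sizes.

```sh
#!/bin/sh
# Hypothetical helpers (not provided on the mass storage) to prepare a
# folder before asking for it to be archived.

# Assumed minimum: 500 GB = 500 * 10^9 bytes, expressed in KiB because
# `du -sk` reports sizes in KiB (500000000000 / 1024 = 488281250).
MIN_ARCHIVE_KIB=488281250

# Print the disk usage of each subfolder and of the whole folder, then
# return 0 if the total reaches MIN_KIB (default MIN_ARCHIVE_KIB), 1 otherwise.
archive_size_ok() {
  dir=$1
  min_kib=${2:-$MIN_ARCHIVE_KIB}
  # Per-subfolder usage: shows which parts could be restored separately
  for sub in "$dir"/*/; do
    [ -d "$sub" ] && du -sk "$sub"
  done
  total_kib=$(du -sk "$dir" | awk '{print $1}')
  echo "total: $total_kib KiB (minimum: $min_kib KiB)"
  [ "$total_kib" -ge "$min_kib" ]
}

# Write a plain-text record of the folder tree (one path per line, sorted).
write_manifest() {
  dir=$1
  out=$2
  {
    echo "# Content of $dir, recorded on $(date)"
    find "$dir" | sort
  } > "$out"
}
```

For example, with placeholder paths, `archive_size_ok /path/to/ARCHIVE/my_project` prints the size of each subfolder and returns a non-zero status below the minimum, and `write_manifest /path/to/ARCHIVE/my_project ~/my_project_content.txt` writes the tree record outside the archived folder, so that it stays readable once the data have been truncated.
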

**Warning about hardlinks**

NB1: If you don't know what a hardlink is, you probably don't have any (they are actually quite rare in data folders).

NB2: If you created links with the `ln -s` command, you made softlinks (symbolic links), not hardlinks.

If your archive folder contains hardlinks and other occurrences of the same file are in your project folder, be aware that once the file is truncated, it will be truncated in all locations (everywhere you have a hardlink pointing to that file).

The side effect is that if you open the copy in your project folder, the file will be restored on disk, which means that:

1. you need enough space in the folder to store it (otherwise the retrieval will fail);
2. opening it the first time will be very slow.

Don't hesitate to ask the bioinformatics team if you have any questions or want to discuss your specific use of hardlinks.

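To check a folder for hardlinks before moving it to the archive, you can rely on the link count that `find` reports for each file. The minimal sketch below (hypothetical function names, POSIX `find`) lists the regular files under a folder whose link count is greater than one, i.e. files whose data can also be reached from another path, possibly outside that folder. Symbolic links created with `ln -s` are not reported.

```sh
#!/bin/sh
# List the regular files under DIR that have more than one hard link.
# Symbolic links are not reported: find does not follow them by default,
# and `-type f` does not match a symbolic link itself.
find_hardlinks() {
  find "$1" -type f -links +1
}

# Same selection, but print each file with `ls -li` (inode number first,
# link count third), so that paths sharing the same inode can be matched up.
find_hardlinks_detailed() {
  find "$1" -type f -links +1 -exec ls -li {} +
}
```

If GNU `find` is available, `find /path/to/project -samefile /path/to/ARCHIVE/my_project/file.bam` (placeholder paths) lists the other paths pointing to the same data.
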

**NB:** In some circumstances, files may be sent offline even if the user didn't request it. See [automatic archiving](mass-storage/mass-storage-backup-archive#automatic-archiving) for more information.

- Numerous small files (typically thousands of files smaller than 4 MB) are grouped into archive files, as explained [here](mass-storage/mass-storage-compression).

3. Create a subfolder in the "ARCHIVE" folder with a meaningful name (for example the name of the project, the date and any other relevant information).

   **WARNING: don't use spaces or special characters in folder names!**

4. Move the data you want to send for archiving (organised as described above) into that folder.

   **WARNING:** if your data are on the mass storage, it's very important to move them (using `mv`, not `rsync` or `cp`)! This move should be done directly on the mass storage (see the important considerations below), not from the cluster.

   If you want to archive data that are currently on another disk (for example gallia or the CECI cluster), you need to transfer them using `rsync` or `cp`/`scp`.

5. Wait until you have at least 500 GB of data to send for archiving (possibly grouping separate projects in separate subfolders).

6. Keep a record of what you have sent for archiving, for example in a text file explaining what's in each folder.

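Two of the checks above can also be sketched in POSIX shell: listing folder names that contain spaces or special characters, and counting the small files that would be grouped into archive files. The function names are hypothetical, the allowed characters (letters, digits, `.`, `_` and `-`) are an assumption about what counts as a special character, and the 4 MB threshold is taken here as 4 MiB (4 194 304 bytes).

```sh
#!/bin/sh
# Print the folders under DIR (DIR included) whose name contains a character
# other than letters, digits, '.', '_' and '-' (assumed allowed set).
# LC_ALL=C keeps the A-Z / a-z ranges limited to plain ASCII letters.
check_folder_names() {
  LC_ALL=C find "$1" -type d -name '*[!A-Za-z0-9._-]*'
}

# Count the regular files under DIR strictly smaller than 4194304 bytes
# (4 MiB, assumed threshold); `-size -Nc` means "fewer than N bytes".
count_small_files() {
  find "$1" -type f -size -4194304c | wc -l | tr -d ' '
}
```

An empty output from `check_folder_names /path/to/ARCHIVE/my_project` (placeholder path) means that no folder name needs to be changed.
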

**IMPORTANT CONSIDERATIONS**

- To move your data to the "ARCHIVES" folder, it's recommended to log into the mass storage with `ssh u123456@massstorage.giga.priv` (replace u123456 with your university user ID). Then start a `screen` session, so that the transfer is not interrupted if you lose your connection to the mass storage. If you don't know how to run a `screen` session or move files from a terminal, or if you are not sure which method to use, please contact the bioinformatics team.

- Moving data to the "ARCHIVES" folder is not enough for them to be truncated: you have to ask the UDIMED/UDIGIGA to archive them. Data in the "ARCHIVE" folder that haven't been truncated still take up disk space and are therefore still counted in the billing of your disk usage.

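About moving data with `mv`: within a single file system, `mv` only renames directory entries, whereas `rsync` or `cp` write a second copy of the data (and across file systems, `mv` itself falls back to copying and then deleting). The hypothetical helper below is a sketch, not an official tool: it refuses to move data when the source and destination are not on the same mount point according to `df -P`, and suggests copying instead.

```sh
#!/bin/sh
# Move SRC into DEST_DIR with mv, but only if both are on the same mount
# point, so that the move is a rename and no second copy is written.
move_to_archive() {
  src=$1
  dest_dir=$2
  if [ ! -e "$src" ] || [ ! -d "$dest_dir" ]; then
    echo "usage: move_to_archive SRC DEST_DIR" >&2
    return 1
  fi
  # With `df -P`, the last field of the second line is the mount point
  src_fs=$(df -P "$src" | awk 'NR == 2 {print $NF}')
  dest_fs=$(df -P "$dest_dir" | awk 'NR == 2 {print $NF}')
  if [ "$src_fs" != "$dest_fs" ]; then
    echo "$src and $dest_dir are on different file systems: copy with rsync or cp instead" >&2
    return 1
  fi
  mv -- "$src" "$dest_dir"/
}
```

On the mass storage, you could for example start a named session with `screen -S archive`, define the function (by pasting it or sourcing a file that contains it), run `move_to_archive /path/to/data /path/to/ARCHIVE/my_project` (placeholder paths), detach with `Ctrl-a d`, and later reattach with `screen -r archive`.
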