modification ms-backup-archive, authored Dec 09, 2020 by Bouquieaux Marie-Catherine
mass-storage/mass-storage-backup-archive.md
View page @ 54c62ed2
...
@@ -68,7 +68,7 @@ library will be moved there, so that the 2 copies will be in different locations
**Which type of files can be sent to the offline archiving system?**
Tape libraries are designed for long-term storage of data. They are more stable
and a lot cheaper than disk space. However, they are also slower in terms of writing/reading
capacity, so retrieving archived data can take up to a week.
capacity, so retrieving archived data can take several days.
Therefore, files sent to the offline archiving system should be files that the user
needs to keep (for legal reasons for example, or data that have been fully analyzed but not
published yet, in case a reviewer asks to redo part of the analysis with a
...
@@ -80,11 +80,24 @@ new software or different options) but do not need to use/access quickly anymore
please organize the main folder into subfolders containing data that are likely to be retrieved
together and keep a record of the tree structure, so that we don't need to retrieve
the whole archive if you need only some of the files.
- Given that the process to archive and restore data is quite laborious, the minimum
size you can send to archive is 500 GB. If your experiments typically generate less
- The minimum size you can send to archive is 500 GB. If your experiments typically generate less
than 500 GB of data, you can wait until you have several experiments (possibly in separate subfolders)
before archiving them.
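As a rough illustration of the size threshold above, the following sketch checks whether a folder has reached the minimum before you request archiving. It assumes GNU `du` (for the `-b` option) and reads "500Gb" as 500 gigabytes, i.e. 500 × 10^9 bytes; the function name `big_enough` is hypothetical and not part of any tool on the mass storage.

```shell
# Assumption: the threshold is 500 GB counted as 500 * 10^9 bytes.
MIN_BYTES=$((500 * 1000 * 1000 * 1000))

# Hypothetical helper: succeeds if the folder holds at least the given
# number of bytes (default: MIN_BYTES).
# Relies on GNU du: -s gives one total, -b reports apparent size in bytes.
big_enough() {
    be_dir=$1
    be_min=${2:-$MIN_BYTES}
    be_size=$(du -sb -- "$be_dir" | cut -f1)
    [ "$be_size" -ge "$be_min" ]
}
```

For example, `big_enough /path/to/your/subfolder && echo "ready to archive"` prints the message only once the folder has reached the threshold (the path is a placeholder).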
**Warning about hardlinks**
NB1: If you don't know what a hardlink is, you probably don't have any (hardlinks are quite rare in data).
NB2: If you made links using the `ln -s` command, you made softlinks (symbolic links), not hardlinks.
If you have hardlinks in your archive folder and other occurrences of the same file are in your project folder,
be aware that once the file is truncated, it will be truncated in all locations
(everywhere you have a hardlink pointing to that file).
The side effect is that if you open the copy in your project folder,
the file will be restored on disk, which means that
1. you need to have enough space in the folder to store it (or the retrieval will fail)
2. opening it the first time will be very slow
Don't hesitate to ask the bioinformatics team if you have any questions or want to
discuss your specific use of hardlinks.
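If you are unsure whether a folder contains hardlinks, a minimal sketch like the one below can list candidates. It uses the standard `find` tests `-type f` (regular files only, so softlinks created with `ln -s` are ignored) and `-links +1` (more than one name pointing to the same file). The function name `list_hardlinked` is only illustrative.

```shell
# Hypothetical helper: print every regular file under the given folder
# that has more than one hardlink (link count greater than 1).
list_hardlinked() {
    find "$1" -type f -links +1
}
```

An empty output means no hardlinked files were found in that folder. Note that this only shows the names located inside the folder you scan; other names of the same file may live elsewhere, for example in your project folder.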
**NB:** In some circumstances, some files may be sent offline even if the user didn't ask for it.
See the [automatic archiving](mass-storage/mass-storage-backup-archive#automatic-archiving) section for more information.
...
@@ -107,6 +120,10 @@ your team folder on the mass storage. This need to be done only the very first t
3. Create a subfolder in the "ARCHIVE" folder with a meaningful name (for example the name of the project, the date and any specific information).
**WARNING: don't use any space or special character in the folder names!!!!**
4. Move into that folder the data you want to send for archiving (organised as described above).
**WARNING:** if your data are on the mass storage, it's very important to move them (using `mv` and not rsync or cp)!!!
This move should be done directly on the mass storage (see important considerations below) and not from the cluster.
If you want to archive data that are currently on another disk (for example gallia or CECI cluster), you need to
transfer them using rsync or cp/scp.
5. Wait until you have at least 500 GB of data to send for archiving
(possibly grouping separate projects in separate sub-folders).
6. Keep a record of what you have sent for archiving, for example in a text file explaining what's in each folder.
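The warning in step 3 can be checked before creating a subfolder. The sketch below is only one possible interpretation: the page does not define "special character", so the allowed set here (ASCII letters, digits, dot, underscore and hyphen) is an assumption, and `is_safe_name` is a hypothetical name.

```shell
# Hypothetical check: succeed only if the name is non-empty and made of
# ASCII letters, digits, '.', '_' and '-' (assumed "safe" character set).
# The function body runs in a subshell so LC_ALL=C does not leak out.
is_safe_name() (
    LC_ALL=C
    case $1 in
        '' | *[!A-Za-z0-9._-]*) false ;;
        *) true ;;
    esac
)
```

For instance, `is_safe_name "projectX_2020-12-09"` succeeds, while `is_safe_name "my project"` fails because of the space.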
...
@@ -123,7 +140,13 @@ Once that's done, you won't be able to enter the archived folder and to see your
**IMPORTANT CONSIDERATIONS**
- Moving data to the "ARCHIVE" folder is not enough for them to be truncated.
- To move your data to the "ARCHIVES" folder, it's recommended to log into the mass storage with
`ssh u123456@massstorage.giga.priv` (replace u123456 with your university userID).
Then you should run a `screen` session to prevent any interruption of the transfer
if you lose your connection to the mass storage.
If you don't know how to run a screen session or move files from a terminal, or if you are not sure
which method you should use, please contact the bioinformatics team.
- Moving data to the "ARCHIVES" folder is not enough for them to be truncated.
You have to ask the UDIMED/UDIGIGA to archive them.
Data in the "ARCHIVE" folder that haven't been truncated still take up space on disk and are therefore
still taken into account for the billing of your disk usage.
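The login, `screen` and `mv` steps described above could look roughly like the sketch below. The session name `archive_move` and the helper `move_to_archive` are hypothetical, and the folder paths are left to you; only `ssh`, `screen` and `mv` themselves come from the page. The helper adds two safety checks that are not required by the procedure: it refuses to run if the destination folder is missing or if an entry with the same name already exists there.

```shell
# Typical session (shown as comments; run these by hand):
#   ssh u123456@massstorage.giga.priv   # replace u123456 with your userID
#   screen -S archive_move              # start a named screen session
#   ...run the move; detach with Ctrl-a d if needed...
#   screen -r archive_move              # reattach after a lost connection

# Hypothetical helper: move a folder into the archive folder with mv,
# refusing to overwrite an existing entry of the same name.
move_to_archive() {
    mta_src=$1
    mta_dest=$2
    [ -e "$mta_src" ] || { echo "missing source: $mta_src" >&2; return 1; }
    [ -d "$mta_dest" ] || { echo "missing destination: $mta_dest" >&2; return 1; }
    mta_target=$mta_dest/$(basename -- "$mta_src")
    [ ! -e "$mta_target" ] || { echo "already exists: $mta_target" >&2; return 1; }
    mv -- "$mta_src" "$mta_dest"/
}
```

Remember that the move alone does not archive anything: the request to UDIMED/UDIGIGA is still needed afterwards.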
...