# Backup procedure

Personal spaces (home) and team folders are backed up regularly to a tape library situated in a different location: the mass storage disks are located at the GIGA, B35, while the tape library is situated at the SEGI, B26. In February 2021, a second tape library will be added at the CHU datacenter, B35.

This is an automated procedure: any file that has been modified and then left unchanged for at least 2 hours enters a “backup queue” and is backed up as soon as technically possible. In most cases, this means that newly modified files are backed up after 2 hours of inactivity, but the delay can be longer if large files are currently being backed up, if a large number of files have been modified recently, or if the system is temporarily down for maintenance. In these exceptional cases, it can take several hours before a file is actually backed up.
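
As a rough, unofficial sketch, the 2-hour inactivity rule can be checked from a terminal with `find`. The hypothetical helper below lists files that were modified within the last 24 hours but have since been left unchanged for more than 2 hours, i.e. files that should already be eligible for backup. The 24-hour window is an arbitrary assumption, and the real backup queue is managed internally by the storage system, so this only approximates its behaviour.

```shell
# Hypothetical helper: list files modified within the last 24 h that have
# since been left unchanged for more than 2 h (120 min), i.e. files that
# should already satisfy the 2h-of-inactivity rule. The 24 h window is an
# arbitrary choice for illustration.
list_backup_eligible() {
    local dir="$1"
    # -mmin +120  : last modification more than 120 minutes ago
    # -mmin -1440 : last modification less than 24 hours ago
    find "$dir" -type f -mmin +120 -mmin -1440
}
```

For example, `list_backup_eligible ~/myproject` (path hypothetical) run a few hours after editing files should list them.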

The system keeps a maximum of 25 versions of each file, for a maximum of 28 days. These are the 25 most recent versions, so if a file is backed up 12 times a day, the oldest recoverable version will be only 2 days old. Previous versions of a file that were saved more than 28 days ago are deleted from the system.
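
The relation between backup frequency and the age of the oldest recoverable version can be worked out with a small calculation. The hypothetical helper below assumes evenly spaced backups: 25 versions cover 24 backup intervals, so with N backups per day the oldest version is about 24/N days old, capped at the 28-day retention limit.

```shell
# Hypothetical helper: approximate age, in days, of the oldest recoverable
# version when a file is backed up N times per day (N given as $1).
# Assumes evenly spaced backups: 25 versions span 24 intervals of 1/N day,
# and nothing older than 28 days is kept.
oldest_version_age_days() {
    awk -v n="$1" 'BEGIN {
        age = 24 / n              # 24 intervals between 25 versions
        if (age > 28) age = 28    # 28-day retention limit
        printf "%.1f\n", age
    }'
}
```

`oldest_version_age_days 12` prints `2.0`, matching the 2-day example above; a file backed up once a day (`1`) keeps about 24 days of history, and one backed up every other day (`0.5`) hits the 28-day cap.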

In other words, it is possible to recover any previous version of a file if:

- that version has been backed up (i.e. it stayed unchanged for at least 2 hours after having been modified and saved)

To recover the last backed-up version of a file or, when possible, one of its previous versions, users have to ask the UDI GIGA-MED IT specialist.

**IMPORTANT**

As the tape system can only handle a limited amount of data per hour, any action affecting 1 TB of data or more (for example, copying a large dataset or downloading large files) must be reported to the UDI GIGA-MED IT department **before** it is performed, so that such large changes do not disrupt normal operations. This rule also applies when a large number of small files are created or modified (e.g. several thousand files of less than 1 MB).
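
Before starting a large copy or download, a quick check like the hypothetical function below can tell whether the operation crosses the thresholds above. The exact thresholds are assumptions, not official values: 1 TB is taken as 10^12 bytes, a "small" file as under 1 MB (10^6 bytes), and "several thousand" as 5,000 files. GNU `du` and `find` are assumed.

```shell
# Hypothetical thresholds (assumptions, not official values):
TB_BYTES=1000000000000        # 1 TB read as 10^12 bytes
SMALL_FILE_BYTES=1000000      # "small" file: under 1 MB (10^6 bytes)
SMALL_FILE_LIMIT=5000         # "several thousand" read as 5,000 files

# Prints "notify" if copying $1 should be reported to IT first, else "ok".
# Requires GNU du (-b) and find.
check_transfer_size() {
    local src="$1" bytes small
    # Total apparent size of the source, in bytes
    bytes=$(du -sb "$src" | cut -f1)
    # Number of regular files strictly smaller than SMALL_FILE_BYTES
    # (the "c" suffix counts bytes, so no rounding to larger units)
    small=$(find "$src" -type f -size -"${SMALL_FILE_BYTES}c" | wc -l)
    if [ "$bytes" -ge "$TB_BYTES" ] || [ "$small" -ge "$SMALL_FILE_LIMIT" ]; then
        echo "notify"
    else
        echo "ok"
    fi
}
```

Run it on the source folder, e.g. `check_transfer_size /path/to/dataset` (path hypothetical); if it prints `notify`, contact the UDI GIGA-MED IT department before transferring.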

### nobackup folders

For files that do not need to be backed up, such as temporary files, a specific folder can be created. This folder must be called "nobackup", written exactly like this: without spaces and in lowercase. Otherwise, it will still be backed up.
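
Because a misspelled folder is silently backed up, it can be worth listing directories whose name looks like "nobackup" but is not written exactly that way. A minimal sketch, assuming a `find` that supports `-iname` (GNU and BSD both do); the name pattern is only a heuristic.

```shell
# Hypothetical helper: list directories under $1 whose name resembles
# "nobackup" (any case, spaces, extra characters) but is not exactly
# "nobackup". Such folders are still backed up.
find_misnamed_nobackup() {
    # -iname: case-insensitive pattern; ! -name: exclude the exact name
    find "$1" -type d -iname '*no*backup*' ! -name 'nobackup'
}
```

A correctly named folder can be created with `mkdir nobackup` inside your project folder.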

# Offline data archiving

**Why is it useful to send data to the offline archiving system?**

Since disk space on the mass storage is finite and expensive, and the amount of data we produce grows exponentially, users are encouraged to send files that they need to keep but no longer need to access to our offline archiving system, in order to free up space on disk.

**How does it work?**

As explained in the backup section of this page, each file present on disk is also copied to 2 tape libraries. Once a file is sent to the offline archiving system, the copy on disk is truncated, but there are still 2 copies on the 2 tape libraries.

Archived files are not directly accessible to users, but they can be restored to disk if needed. Note that an archive can only be restored if there is enough space on disk to store it.

**Where will the archived data be stored?**

Currently, both tape libraries are located in the same robot at the SEGI. However, in February 2021, we will add a new robot at the CHU datacenter and move one of the tape libraries there, so that the 2 copies will be in different locations.

**Which type of files can be sent to the offline archiving system?**

Tape libraries are designed for long-term storage of data. They are more stable and much cheaper than disk space. However, they are also slower for writing and reading, so retrieving archived data can take several days. Therefore, files sent to the offline archiving system should be files that the user needs to keep (for legal reasons, for example, or data that have been fully analyzed but not yet published, in case a reviewer asks to redo part of the analysis with new software or different options) but no longer needs to use or access quickly.

**Is there a size limit for sending data to archive?**

- There is no upper limit to the size of an archive. However, if you want to send several terabytes of data, please organize the main folder into subfolders containing data that are likely to be retrieved together, and keep a record of the tree structure, so that we don't need to retrieve the whole archive if you only need some of the files.
- The minimum size you can send to archive is 500 GB. If your experiments typically generate less than 500 GB of data, you can wait until you have several experiments (possibly in separate subfolders) before archiving them.
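
To check whether a folder prepared for archiving has reached the 500 GB minimum, and to see how its subfolders are balanced, something like the hypothetical function below can help. It assumes GNU `du` and reads 500 GB as 500 × 10^9 bytes.

```shell
# Hypothetical threshold: 500 GB read as 500 * 10^9 bytes.
MIN_ARCHIVE_BYTES=500000000000

# Show the apparent size of each subfolder of $1, then whether the total
# reaches the minimum archive size. Requires GNU du.
archive_size_report() {
    local dir="$1" total
    # Per-subfolder sizes, human-readable; a folder without subfolders is
    # not treated as an error here
    du -sh --apparent-size "$dir"/*/ 2>/dev/null || true
    total=$(du -sb "$dir" | cut -f1)
    if [ "$total" -ge "$MIN_ARCHIVE_BYTES" ]; then
        echo "total: $total bytes (enough to send for archiving)"
    else
        echo "total: $total bytes (below the minimum, wait for more data)"
    fi
}
```

The per-subfolder lines can also serve as a first record of the tree structure mentioned above.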

**Warning about hardlinks**

NB1: If you don't know what a hardlink is, you probably don't have any (they are actually quite rare in data).

NB2: If you made links using the `ln -s` command, you made softlinks (symbolic links), not hardlinks.

If your archive folder contains hardlinks and other occurrences of the same file are in your project folder, be aware that once the file is truncated, it will be truncated in all locations (everywhere a hardlink points to that file). As a side effect, if you open the copy in your project folder, the file will be restored on disk, which means that:

1. you need to have enough space in the folder to store it (or the retrieval will fail);
2. opening it the first time will be very slow.

Don't hesitate to ask the [Bioinformatics team](contacts) if you have any questions or want to discuss your specific use of hardlinks.
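
To find out whether a folder you plan to archive contains hardlinks, and where the other names of those files are, you could use a sketch like the one below. It assumes GNU `find` (for `-samefile` and `-printf`); symbolic links created with `ln -s` are not reported, since `-type f` does not match them.

```shell
# Hypothetical helper. $1: folder you plan to archive; $2: folder to search
# for other names of the same files (e.g. your team folder).
# Requires GNU find (-samefile, -printf).
report_hardlinks() {
    local archive_dir="$1" search_root="$2" f
    # Regular files with a link count above 1 have at least one other name
    find "$archive_dir" -type f -links +1 | while IFS= read -r f; do
        echo "hardlinked: $f"
        # Every path under search_root that refers to the same inode
        find "$search_root" -samefile "$f" -printf '  same file: %p\n'
    done
}
```

For example, `report_hardlinks ARCHIVES/myproject /path/to/team_folder` (paths hypothetical). The search root is scanned once per hardlinked file, which can be slow on a large team folder, so narrow it down if you can.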

**NB:** In some circumstances, files may be sent offline even if the user didn't ask for it. See the [automatic archiving](mass-storage/mass-storage-backup-archive#automatic-archiving) section for more information.

## On demand archiving

### Procedure to send data for archiving

Files/folders should be properly organized before being sent for archiving. Don't hesitate to contact the [Bioinformatics team](contacts) if you need help with any of these steps.

The procedure to send files for archiving is:

1. If not already done, ask the UDIMED/UDIGIGA (https://sam.med.uliege.be/) to create an "ARCHIVES" folder in your team folder on the mass storage. This only needs to be done the very first time.
2. Determine which files/folders you want to send for archiving and organize them so that:
   - all related files that are likely to be retrieved together (if a retrieval is ever needed) are in the same (sub)folder with a meaningful name;
   - big files (typically 200 MB and more) are compressed as much as possible, as explained [here](mass-storage/mass-storage-compression);
   - numerous small files (typically several thousand files smaller than 4 MB) are grouped into archive files, as explained [here](mass-storage/mass-storage-compression).
3. Create a subfolder in the "ARCHIVES" folder with a meaningful name (for example, the name of the project, the date and any other specific information).
   **WARNING: don't use any spaces or special characters in the folder names!**
4. Move the data you want to send for archiving (organized as described above) into that folder.
   **WARNING:** if your data are on the mass storage, it's very important to move them (using `mv`, not `rsync` or `cp`)!
   This move should be done directly on the mass storage (see the important considerations below), not from the cluster.
   If you want to archive data that are currently on another disk (for example gallia or a CECI cluster), you need to transfer them using `rsync` or `cp`/`scp`.
5. Wait until you have at least 500 GB of data to send for archiving (possibly grouping separate projects in separate subfolders).
6. Keep a record of what you have sent for archiving, for example in a text file explaining what's in each folder.
   You can also make a list of the files using the Linux `tree` command (with the help of the bioinformatics platforms if needed).
   We strongly recommend keeping both a tree and a file describing what's in the archive, as a list of file names might not be enough for you to know exactly what is in each folder.
7. Contact the UDIMED/UDIGIGA by filling in a form at https://sam.med.uliege.be/.
   The form must contain the following information:
   - the path to the folder to archive (or at least the name of the team and the name of the folder to archive);
   - the number of years the data must be kept (default is 5 years);
   - the name of the PI in charge.
8. The UDIMED/UDIGIGA IT specialist will send your data to the archiving system.
   Once that's done, you won't be able to enter the archived folder or see your data anymore.
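
Steps 3 and 6 above can be partly automated. The hypothetical helpers below check folder names and write a listing of a folder to archive. The set of allowed characters (letters, digits, `.`, `_` and `-`) is an assumption about what "special character" means; the listing uses `tree` when it is installed and falls back to `find` otherwise. Keep the listing, and your description file, outside the folder being archived, since you won't be able to enter that folder once it is archived.

```shell
# Assumption: "no special character" = only letters, digits, '.', '_', '-'.

# Step 3: print every directory under $1 whose name contains any other
# character (spaces included); return 1 if any is found.
check_names() {
    local bad
    # LC_ALL=C makes the A-Z / a-z ranges plain ASCII
    bad=$(LC_ALL=C find "$1" -type d -name '*[!A-Za-z0-9._-]*')
    if [ -n "$bad" ]; then
        printf '%s\n' "$bad"
        return 1
    fi
}

# Step 6: write a listing of folder $1 into file $2 (keep $2 outside the
# folder being archived). Uses `tree` if installed, a sorted `find` otherwise.
write_manifest() {
    local dir="$1" out="$2"
    if command -v tree >/dev/null 2>&1; then
        tree "$dir" > "$out"
    else
        (cd "$dir" && find . | sort) > "$out"
    fi
}
```

For example, `check_names ARCHIVES/2021_myproject && write_manifest ARCHIVES/2021_myproject ~/archive_records/2021_myproject.txt` (paths hypothetical; the target folder must exist).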

**IMPORTANT CONSIDERATIONS**

- To move your data to the "ARCHIVES" folder, it's recommended to log into the mass storage with `ssh u123456@massstorage.giga.priv` (replace u123456 with your university userID).
  Then run a `screen` session, so that the transfer is not interrupted if you lose your connection to the mass storage.
  If you don't know how to run a screen session or move files from a terminal, or if you are not sure which method to use, please contact the [Bioinformatics team](contacts).
- **Moving data to the "ARCHIVES" folder is not enough for them to be truncated.**
  You have to ask the UDIMED/UDIGIGA IT specialist to archive them.
  Data in the "ARCHIVES" folder that haven't been truncated still take up space on disk!
- Each truncated file still occupies 4 kB of space on disk. So, if you archive 1 million individual files, the remaining volume on disk will be about 4 GB. However, if you have hundreds of thousands of small files, you'll save more space by grouping them into one file, as explained [here](mass-storage/mass-storage-compression).
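
The 4 kB-per-file figure makes it easy to estimate what an archived folder will still occupy on disk. The sketch below assumes that 4 kB means 4,096 bytes per truncated file, and shows `tar` as one common way of grouping many small files into a single file; the compression page linked above may recommend a different method.

```shell
# Hypothetical helper: estimated space left on disk once every regular file
# under $1 is truncated, assuming 4 kB = 4096 bytes per truncated file.
residual_bytes() {
    local n
    n=$(find "$1" -type f | wc -l)
    echo $(( n * 4096 ))
}

# One common way to group many small files into a single file: a gzipped
# tar archive of folder $1 written to $2. (Not necessarily the method of
# the compression page.)
group_into_tar() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
```

For 1 million files, the estimate is 1,000,000 × 4,096 = 4,096,000,000 bytes, i.e. about 4 GB, whereas a single tar file leaves only one 4 kB truncated file behind.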

### Procedure to retrieve data from archiving

The procedure to retrieve archived files/folders is:

1. Contact the UDIMED/UDIGIGA by filling in a form at https://sam.med.uliege.be/.
   The form must contain the following information:
   - the path to the folder you want to retrieve (or at least the name of the team and the name of the folder);
   - if you only want to retrieve part of the folder rather than the whole folder, which subfolder(s) you need;
   - the name of the PI in charge.
2. The UDIMED/UDIGIGA will check that there is enough space left on disk to store your data.
   If this is the case, they will retrieve your data and give you access to them.

**Note**: This operation may take several days, which means that you need to **anticipate** when you will need these data.

## Automatic archiving

Once the storage reaches 80% of its maximum capacity, the oldest data will automatically be truncated from the disk in order to save space. Once that happens, there will still be 2 copies of the file on tape, but only the beginning of the file and its metadata will stay on disk. Therefore, the file name will still be visible in the tree view, but the file content itself will be on tape.

This procedure only affects files larger than 4 MB and starts with files that haven’t been modified or opened for at least 270 days. If that’s not enough, “younger” files may be affected too.
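
To get an idea of which of your files could be truncated first when the storage reaches 80%, you can list files matching the criteria above. A minimal sketch, assuming GNU `find` and reading 4 MB as 4 MiB; `-atime` is only as accurate as the file system's access-time updates (e.g. `relatime` or `noatime` mount options), and the actual selection is made by the storage system, not by this command.

```shell
# Hypothetical helper: list files under $1 matching the criteria above for
# automatic truncation: larger than 4 MiB (4 MB read as 4 MiB) and neither
# modified nor accessed for more than 270 days. Requires GNU find.
list_truncation_candidates() {
    # -size +4096k : more than 4096 KiB (sizes are rounded up to whole KiB)
    # -mtime +270  : last modified more than 270 days ago
    # -atime +270  : last accessed more than 270 days ago
    find "$1" -type f -size +4096k -mtime +270 -atime +270
}
```

Files listed by this command are the ones most likely to become slow to open.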

This means that opening data that haven’t been used for a long time will become a slow process, as these data first need to migrate from tape to disk before they are accessible again. The time required depends on the ongoing backup of other files. In optimal conditions, it shouldn't take more than 1 hour to access a 1 TB file.
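
Before opening a large file that hasn't been used for a long time, a heuristic like the one below can hint whether it has been truncated: a truncated file keeps only its beginning and metadata on disk, so very little space is allocated compared with its apparent size. This is a sketch assuming GNU `stat`; the 1% threshold is an arbitrary assumption, and sparse files are flagged as well, so treat the result as a hint only.

```shell
# Hypothetical heuristic: print "truncated" when less than 1% of the
# apparent size of file $1 is actually allocated on disk, "on-disk"
# otherwise. Sparse files are flagged too. Requires GNU stat.
likely_truncated() {
    local size blocks block_size allocated
    size=$(stat -c '%s' "$1")        # apparent size in bytes
    blocks=$(stat -c '%b' "$1")      # number of allocated blocks
    block_size=$(stat -c '%B' "$1")  # size in bytes of each such block
    allocated=$(( blocks * block_size ))
    if [ "$size" -gt 0 ] && [ $(( allocated * 100 )) -lt "$size" ]; then
        echo "truncated"
    else
        echo "on-disk"
    fi
}
```

If a file is reported as `truncated`, expect the first access to take a while, as it must first be migrated back from tape.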

# [Contacts](contacts)