---
title: Configuring a standard experiment for NIC5
---

Thanks to GitLab CI/CD features and to [Jacamar CI](https://ecp-ci.gitlab.io/docs/admin/jacamar/introduction.html), you can configure a standard BAMHBI experiment that runs on a cluster and that you can start from ULiège GitLab with a single click. For the time being, all MAST CI/CD experiments run on the NIC5 cluster, notably to benefit from the many MAST resources already stored there.

This tutorial first details the requirements you should meet beforehand, regarding both security and storage, then explains how to write a suitable CI/CD configuration to create your experiment.

... | @@ -28,9 +29,9 @@ In the next lines, the repository where the BAMHBI code you want to use is locat |
### Storage concerns

There are two ways to retrieve the products of a CI/CD experiment running on a cluster such as NIC5. On the one hand, you can store them directly on the cluster at a selected location, provided you have enough disk space. On the other hand, you can download them as _job artifacts_, i.e., files that essentially amount to the outputs of a CI/CD job. [You can learn more about job artifacts here.](https://docs.gitlab.com/ci/jobs/job_artifacts/)

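As a minimal sketch of the second option, a job can declare its artifacts with the `artifacts` keyword. The `run_experiment.sh` script and the `outputs/` directory below are hypothetical placeholders; `paths` and `expire_in` are standard GitLab CI/CD keywords, and setting an expiry is one way to limit the disk space used on GitLab.

```yaml
# Illustrative only: the script name and paths are placeholders,
# not taken from an actual MAST pipeline.
my_nic5_job:
  script:
    - ./run_experiment.sh      # placeholder for your actual commands
  artifacts:
    paths:
      - outputs/               # only this directory is uploaded to GitLab
    expire_in: 1 week          # artifacts are deleted automatically after one week
```
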
The SEGI (SErvice Général d'Informatique), which manages the ULiège network and its services, including GitLab, also expects us to make good use of the disk space provided for GitLab projects. In other words, **you should be careful with the size of your job artifacts**, if you use any.

1) You should **be able to quantify the volume of data you want to produce and use**. By default, job artifacts are restricted to 100 MB, but SEGI can raise this limit for specific repositories if needed. Knowing in advance how much data you want to get from your CI/CD pipeline will help in configuring such a limit.

... | @@ -102,4 +103,60 @@ my_nic5_job: |
After step 4 in the previous section, the job should normally be ready to run on NIC5. However, it's **strongly recommended** to add a few keywords to get better control of your job.

* Make sure to **include ``when: manual`` in the very first stage of your pipeline**. This ensures your pipeline only starts when you ask (manually), and not at every commit. This matters especially at MAST, as there may eventually be multiple pre-configured standard experiments, and we obviously don't want to re-run all of them at each commit.
* **To mitigate random crashes that may happen at job start-up, use the ``retry`` keyword** to tell your job to retry a few times. You can also restrict retries to specific exit codes. Random crashes at start-up typically return exit code 1, so the following job excerpt retries the job up to two times when that exit code occurs.

```
  retry:
    max: 2          # retry at most twice
    exit_codes: 1   # only when the job fails with exit code 1
```

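Put together, the two recommendations above could look like the following sketch. The stage name and the `echo` command are assumptions made for illustration, and the NIC5-specific keywords of the job are left out; `when`, `retry:max` and `retry:exit_codes` are standard GitLab CI/CD keywords (the last one requires a recent enough GitLab version).

```yaml
stages:
  - run                 # assumed to be the first (and only) stage here

my_nic5_job:
  stage: run
  when: manual          # the job waits until you start it from the GitLab interface
  retry:
    max: 2              # retry at most twice...
    exit_codes: 1       # ...and only when the job fails with exit code 1
  script:
    - echo "Running on NIC5"   # placeholder for your actual commands
```
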
### Technical considerations

The ``script`` part of your job can be virtually identical to a script you would submit on a cluster yourself. You nevertheless have to pay attention to, among other things, where you get your forcings from and where you write your outputs (e.g., on the cluster directly or as job artifacts). Here is some advice.

* **Consider writing the more complex instructions in Bash scripts that you call from your CI/CD job** (i.e., nested scripts). In addition to not being suitable for long sequences of instructions, the YAML syntax (i.e., the syntax of the CI/CD configuration) has some limits. For instance, characters such as `|` or `:` have their own meaning in YAML, so you may struggle to run your typical ``mpirun`` command (among others). A simple workaround is to write such a command in a short script (which may also contain your ``module load`` instructions) called from your YAML script. [You can find an example of such a nested script here.](https://gitlab.uliege.be/especes/mast/nemo4.2.0-bamhbi/-/blob/main/test_cases/lr_cluster/scripts/run_nemo_with_mpi.sh)

* **Before calling ``mpirun``, use ``ulimit -l unlimited``.** The exact reason why this instruction is needed is unclear at the moment, as it's not needed when you submit a script yourself on NIC5. This may be fixed by the CÉCI later.

* **Do not hesitate to store side files (e.g., namelist files, nested scripts) on the cluster or on the repository.** In particular, with the former possibility, you will let other users know how you have configured your experiment, which will help them reproduce your settings or results. [An example of this practice can be found on the Nemo4.2.0-Bamhbi repository.](https://gitlab.uliege.be/especes/mast/nemo4.2.0-bamhbi/-/tree/main/test_cases/lr_cluster)

* **Store large files (e.g., atmospheric forcings) in specific directories on the cluster.** GitLab repositories are not suitable for storing large files. [Zenodo.org](https://zenodo.org/) is a more suitable option, though the limited network capabilities (small bandwidth, no DNS look-up) of NIC5 compute nodes prevent downloading archives from zenodo.org. This is why it's recommended, for now, to store forcings at a known location, then copy (or alias) them where your job runs.

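The advice above can be sketched as the following ``script`` section. All paths and the nested script name are hypothetical placeholders; only ``ulimit -l unlimited`` comes directly from the recommendations.

```yaml
my_nic5_job:
  script:
    # Hypothetical location of forcings stored beforehand on the cluster
    - export FORCINGS_DIR=/path/to/forcings
    # Link the forcings into the working directory instead of downloading them
    - ln -s ${FORCINGS_DIR}/* .
    # Lift the locked-memory limit before any call to mpirun
    - ulimit -l unlimited
    # Delegate module loading and the mpirun command to a nested Bash script
    - bash scripts/run_model.sh
```

Keeping the ``mpirun`` command inside the nested script means that none of its special characters have to be escaped in the YAML file.
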
### More elaborate pipelines

As with regular GitLab CI/CD jobs, you can split your standard experiment into several stages. For instance, you can run a first job that compiles the model, then perform the actual simulation in a subsequent job. In all cases, do not forget to use the ``needs`` keyword, which instructs GitLab to run a job only once an earlier job (or several jobs) has completed successfully; see the example below (from the [Nemo4.2.0-Bamhbi CI/CD configuration](https://gitlab.uliege.be/especes/mast/nemo4.2.0-bamhbi/-/blob/main/.gitlab-ci.yml)).

```
...

lr_nic5_run:
  stage: run
  id_tokens:
    SITE_ID_TOKEN:
      aud: https://gitlab.uliege.be/especes/mast/nemo4.2.0-bamhbi
  tags:
    - nic5
    - compute
    - slurm
  variables:
    SCHEDULER_PARAMETERS: "--ntasks=114 --cpus-per-task=1 --mem-per-cpu=1024 --time=2:00:00"
  needs:
    - lr_nic5_compile
  script:
    - mkdir -p /scratch/ulg/mast/mast/bamhbi_cicd/LR
    - export CICD_HOME=/scratch/ulg/mast/mast/bamhbi_cicd/LR
    - export ZENODO_MIRROR=/scratch/ulg/mast/mast/zenodo
    - mv LR-runnable.tar.gz ${CICD_HOME}
    ...
```

Finally, it's worth noting that you can absolutely mix NIC5 jobs with jobs running on GitLab. You can again find an example in the [Nemo4.2.0-Bamhbi CI/CD configuration](https://gitlab.uliege.be/especes/mast/nemo4.2.0-bamhbi/-/blob/main/.gitlab-ci.yml), where the outputs from the NIC5 job running the simulation are sent to a containerized job running Python code to produce various figures to assess the model. The obvious constraint of this approach is that the inputs of the GitLab job must be entirely contained in the artifacts of the NIC5 job, as the GitLab job will not have access to the NIC5 environment.

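A minimal sketch of such a mixed pipeline is given below. The post-processing job, its image, the scripts and the paths are assumptions made for illustration, not the actual Nemo4.2.0-Bamhbi configuration, and the NIC5-specific keywords (``id_tokens``, ``SCHEDULER_PARAMETERS``) are left out for brevity. The point is that the NIC5 job exposes its outputs as artifacts, which the containerized job then receives through ``needs``.

```yaml
stages:
  - run
  - assess

lr_nic5_run:
  stage: run
  tags:
    - nic5
    - compute
    - slurm
  script:
    - ./run_simulation.sh               # hypothetical script writing to outputs/
  artifacts:
    paths:
      - outputs/                        # everything the next job needs must be here

make_figures:                           # hypothetical job running on a GitLab runner
  stage: assess
  image: python:3.12
  needs:
    - job: lr_nic5_run
      artifacts: true                   # download the artifacts of the NIC5 job
  script:
    - python plot_results.py outputs/   # hypothetical plotting script
```
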
## And then, we push the button

Once you have configured your CI/CD pipeline for NIC5, and if you used the ``when: manual`` keyword as recommended, all you have to do next is literally push the button in the pipeline display, which you can reach from the root of the repository (by clicking the CI/CD icon) or by opening the latest pipeline in **Build > Pipelines**. Do not hesitate to take inspiration from the [Nemo4.2.0-Bamhbi CI/CD configuration](https://gitlab.uliege.be/especes/mast/nemo4.2.0-bamhbi/-/blob/main/.gitlab-ci.yml) to write your own for NIC5!

