diff --git a/TraiNMT_abridged.ipynb b/TraiNMT_abridged.ipynb new file mode 100644 index 0000000000000000000000000000000000000000..7e7aa4b845db2a1b0fe07b47dd97a67fd2fb8ae4 --- /dev/null +++ b/TraiNMT_abridged.ipynb @@ -0,0 +1,762 @@ +{ + "nbformat": 4, + "nbformat_minor": 0, + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "name": "python3", + "display_name": "Python 3" + }, + "language_info": { + "name": "python" + }, + "accelerator": "GPU", + "gpuClass": "standard" + }, + "cells": [ + { + "cell_type": "markdown", + "source": [ + "# Training your own neural machine translation system\n", + "\n", + "## Or, how to constrain an MT tool to literary fiction\n", + "\n", + "<br>\n", + "\n", + "Abridged version of the workbook prepared for the colloquium:\n", + "\n", + "*Traduction littéraire et intelligence artificielle : théorie, pratique, création*\n", + "\n", + "21 October 2022 — Paris, France\n", + "\n", + "<br>\n", + "\n", + "Damien Hansen\n", + "\n", + "Centre Interdisciplinaire de Recherche en Traduction et en Interprétation (Université de Liège, Belgium)\n", + "\n", + "Laboratoire d'Informatique de Grenoble (Université Grenoble Alpes, France)\n", + "\n", + "<br>\n", + "\n", + "This content is distributed under the CC BY-SA 4.0 license.\n", + "\n", + "As long as the original work is duly credited and any shared work is distributed under the same license, you are free to:\n", + "- share — copy, distribute, and transmit the material in any medium and format;\n", + "- adapt — remix, transform, and build upon the material\n", + "for any purpose, even commercially."
+ ], + "metadata": { + "id": "X03ZBj1mAJAd" + } + }, + { + "cell_type": "markdown", + "source": [ + "## Step 1 — Installation" + ], + "metadata": { + "id": "TfL4sFfOAP8V" + } + }, + { + "cell_type": "markdown", + "source": [ + "Installing the required tools:\n", + "- `OpusFilter` (corpus download);\n", + "- `fast-mosestokenizer` (tokenization);\n", + "- `SentencePiece` (subword segmentation);\n", + "- `OpenNMT-py` (system training);\n", + "- `sacreBLEU` (evaluation)." + ], + "metadata": { + "id": "AHoIaZ01orNz" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "KAmADpz1OVD0" + }, + "outputs": [], + "source": [ + "%%bash\n", + "rm -rf sample_data\n", + "pip install opusfilter opennmt-py==2.3.0\n", + "mkdir datasets subword tools output output/{log,models,tensor,translations,vocab}" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Manual installation of the fast-mosestokenizer and SentencePiece modules for use in Colab:" + ], + "metadata": { + "id": "XB-k15MJoJVx" + } + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "apt-get install -y libgoogle-perftools-dev\n", + "apt-get install -y libglib2.0-dev\n", + "cd tools\n", + "git clone https://github.com/google/sentencepiece.git\n", + "cd sentencepiece\n", + "mkdir build\n", + "cd build\n", + "cmake ../\n", + "make -j $(nproc)\n", + "make install\n", + "ldconfig -v\n", + "cd ../../\n", + "git clone https://github.com/mingruimingrui/fast-mosestokenizer.git\n", + "cd fast-mosestokenizer\n", + "git clone https://code.googlesource.com/re2\n", + "cd re2\n", + "make\n", + "make install\n", + "cd ../\n", + "mkdir build\n", + "cd build\n", + "cmake ../\n", + "make install\n", + "cd ../../../" + ], + "metadata": { + "id": "7hE6jcjz8Yu-" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Downloading a few corpora commonly used for the English-French pair with the [OpusFilter](https://github.com/Helsinki-NLP/OpusFilter) module (~15–20 minutes):\n", + "\n", + "<details>\n", + "<summary>Note</summary>\n", + "The module throws an error for the raw version of the <i>GlobalVoices</i> and <i>News-Commentary</i> corpora, so we use a different version here, which takes slightly longer to download.\n", + "</details>" + ], + "metadata": { + "id": "UjKo3gBAqY_V" + } + }, + { + "cell_type": "code", + "source": [ + "with open('corpus.yaml', 'w', encoding='utf-8') as config:\n", + " config.write('''common:\n", + "\n", + " output_directory: datasets\n", + "\n", + "steps:\n", + "\n", + " - type: opus_read\n", + " parameters:\n", + " corpus_name: Books\n", + " source_language: en\n", + " target_language: fr\n", + " release: v1\n", + " preprocessing: raw\n", + " src_output: books_raw.en\n", + " tgt_output: books_raw.fr\n", + " suppress_prompts: true\n", + "\n", + " - type: opus_read\n", + " parameters:\n", + " corpus_name: Europarl\n", + " source_language: en\n", + " target_language: fr\n", + " release: v8\n", + " preprocessing: raw\n", + " src_output: europarl_raw.en\n", + " tgt_output: europarl_raw.fr\n", + " suppress_prompts: true\n", + "\n", + " - type: opus_read\n", + " parameters:\n", + " corpus_name: GlobalVoices\n", + " source_language: en\n", + " target_language: fr\n", + " release: v2018q4\n", + " preprocessing: xml\n", + " src_output: globalvoices_raw.en\n", + " tgt_output: globalvoices_raw.fr\n", + " suppress_prompts: true\n", + "\n", + " - type: opus_read\n", + " parameters:\n", + " corpus_name: News-Commentary\n", + " source_language: en\n", + " target_language: fr\n", + " release: v16\n", + " preprocessing: xml\n", + " src_output: news_raw.en\n", + " tgt_output: news_raw.fr\n", + " suppress_prompts: true\n", + "\n", + " - type: opus_read\n", + " parameters:\n", + " corpus_name: TED2020\n", + " source_language: en\n", + " target_language: fr\n", + " release: v1\n", + " preprocessing: raw\n", + " src_output: ted_raw.en\n", + " 
tgt_output: ted_raw.fr\n", + " suppress_prompts: true''')\n", + "config.close()\n", + "\n", + "!opusfilter corpus.yaml\n", + "!rm ./datasets/*.gz ./datasets/*.zip" + ], + "metadata": { + "id": "-JIzslVIrtad" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Step 2 — Data preparation" + ], + "metadata": { + "id": "SduJfP1m5-xS" + } + }, + { + "cell_type": "markdown", + "source": [ + "Tokenization and normalization with the [fast-mosestokenizer](https://github.com/mingruimingrui/fast-mosestokenizer) module:" + ], + "metadata": { + "id": "3khJ7E3QquaF" + } + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "for file in datasets/*_raw.en ;\\\n", + "do filename=$(basename $file _raw.en) ;\\\n", + "mosestokenizer -N en < datasets/${filename}_raw.en > datasets/${filename}.en ;\\\n", + "done\n", + "\n", + "for file in datasets/*_raw.fr ;\\\n", + "do filename=$(basename $file _raw.fr) ;\\\n", + "mosestokenizer -N fr < datasets/${filename}_raw.fr > datasets/${filename}.fr ;\\\n", + "done\n", + "\n", + "rm datasets/*_raw.{en,fr}" + ], + "metadata": { + "id": "5D5wO_JeqbV_" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Creating a subword segmentation model with [SentencePiece](https://github.com/google/sentencepiece) (~15 minutes):" + ], + "metadata": { + "id": "OKRpaCPrr5Jr" + } + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "spm_train \\\n", + " --input=datasets/books.en,datasets/europarl.en,datasets/globalvoices.en,datasets/news.en,datasets/ted.en\\\n", + " --model_prefix=./subword/unigram_en\\\n", + " --vocab_size=16000\\\n", + " --character_coverage=1.0\\\n", + " --model_type=unigram\n", + "\n", + "spm_train \\\n", + " --input=datasets/books.fr,datasets/europarl.fr,datasets/globalvoices.fr,datasets/news.fr,datasets/ted.fr\\\n", + " --model_prefix=./subword/unigram_fr\\\n", + " --vocab_size=16000\\\n", + " --character_coverage=1.0\\\n", + " 
--model_type=unigram" + ], + "metadata": { + "id": "oDwBTeqo83lL" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Creating the training, validation, and test sub-corpora:" + ], + "metadata": { + "id": "a29V4dAqwh-Z" + } + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "head -n -11000 datasets/books.en > datasets/trn.en; truncate -s -1 datasets/trn.en\n", + "head -n -11000 datasets/books.fr > datasets/trn.fr; truncate -s -1 datasets/trn.fr\n", + "\n", + "tail -n 11000 datasets/books.en > datasets/val.en ; truncate -s -1 datasets/val.en\n", + "tail -n 11000 datasets/books.fr > datasets/val.fr ; truncate -s -1 datasets/val.fr\n", + "\n", + "tail -n 1000 datasets/val.en > datasets/tra.en ; head -n -1000 datasets/val.en > temp.txt ; mv temp.txt datasets/val.en\n", + "tail -n 1000 datasets/val.fr > datasets/tra.fr ; head -n -1000 datasets/val.fr > temp.txt ; mv temp.txt datasets/val.fr" + ], + "metadata": { + "id": "QPBW3Ushw1vM" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Step 3 — Training" + ], + "metadata": { + "id": "amt6QopgzP6F" + } + }, + { + "cell_type": "markdown", + "source": [ + "Configuration file for the generic MT model:" + ], + "metadata": { + "id": "FAZgO6wLzRhF" + } + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "rd4eo4ZY2vnZ" + }, + "outputs": [], + "source": [ + "with open('config_train.yaml', 'w', encoding='utf-8') as config:\n", + " config.write('''# Data output:\n", + "overwrite: false\n", + "save_data: ./output/vocab/voc\n", + "src_vocab: ./output/vocab/voc.vocab.src\n", + "tgt_vocab: ./output/vocab/voc.vocab.tgt\n", + "\n", + "# Training corpora:\n", + "data:\n", + " europarl:\n", + " path_src: ./datasets/europarl.en\n", + " path_tgt: ./datasets/europarl.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 4\n", + " globalvoices:\n", + " path_src: 
./datasets/globalvoices.en\n", + " path_tgt: ./datasets/globalvoices.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 1\n", + " news:\n", + " path_src: ./datasets/news.en\n", + " path_tgt: ./datasets/news.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 1\n", + " ted:\n", + " path_src: ./datasets/ted.en\n", + " path_tgt: ./datasets/ted.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 2\n", + " valid:\n", + " path_src: ./datasets/val.en\n", + " path_tgt: ./datasets/val.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + "src_seq_length: 200\n", + "tgt_seq_length: 200\n", + "skip_empty_level: silent\n", + "src_subword_model: ./subword/unigram_en.model\n", + "tgt_subword_model: ./subword/unigram_fr.model\n", + "src_subword_vocab: ./subword/unigram_en.vocab\n", + "tgt_subword_vocab: ./subword/unigram_fr.vocab\n", + "src_subword_alpha: 0.5\n", + "tgt_subword_alpha: 0.5\n", + "\n", + "# Training parameters:\n", + "batch_type: \"tokens\"\n", + "batch_size: 4096\n", + "valid_batch_size: 16\n", + "batch_size_multiple: 1\n", + "max_generator_batches: 0\n", + "accum_count: [3]\n", + "accum_steps: [0]\n", + "train_steps: 100000\n", + "valid_steps: 10000\n", + "report_every: 500\n", + "save_checkpoint_steps: 10000\n", + "queue_size: 10000\n", + "bucket_size: 32768\n", + "\n", + "# Optimization\n", + "model_dtype: \"fp32\"\n", + "optim: \"adam\"\n", + "learning_rate: 2\n", + "warmup_steps: 8000\n", + "decay_method: \"noam\"\n", + "average_decay: 0.0005\n", + "adam_beta2: 0.998\n", + "max_grad_norm: 0\n", + "label_smoothing: 0.1\n", + "param_init: 0\n", + "param_init_glorot: true\n", + "normalization: \"tokens\"\n", + "\n", + "# Model\n", + "encoder_type: transformer\n", + "decoder_type: transformer\n", + "enc_layers: 6\n", + "dec_layers: 6\n", + "heads: 8\n", + "rnn_size: 512\n", + "word_vec_size: 512\n", + "transformer_ff: 2048\n", + "dropout_steps: [0]\n", + "dropout: [0.1]\n", + "attention_dropout: 
[0.1]\n", + "position_encoding: true\n", + "\n", + "# Model output:\n", + "save_model: ./output/models/train\n", + "\n", + "# Logs:\n", + "log_file: ./output/log/train\n", + "tensorboard: true\n", + "tensorboard_log_dir: ./output/tensor/train\n", + "\n", + "# GPU settings:\n", + "world_size: 1\n", + "gpu_ranks: [0]\n", + "\n", + "# Reproducibility:\n", + "seed: 123''')\n", + "config.close()" + ] + }, + { + "cell_type": "markdown", + "source": [ + "Configuration file for the fine-tuned MT model:" + ], + "metadata": { + "id": "wgn6piww1vns" + } + }, + { + "cell_type": "code", + "source": [ + "with open('config_tuned.yaml', 'w', encoding='utf-8') as config:\n", + " config.write('''# Data output:\n", + "overwrite: false\n", + "save_data: ./output/vocab/voc\n", + "src_vocab: ./output/vocab/voc.vocab.src\n", + "tgt_vocab: ./output/vocab/voc.vocab.tgt\n", + "\n", + "# Training corpora:\n", + "data:\n", + " europarl:\n", + " path_src: ./datasets/europarl.en\n", + " path_tgt: ./datasets/europarl.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 1\n", + " globalvoices:\n", + " path_src: ./datasets/globalvoices.en\n", + " path_tgt: ./datasets/globalvoices.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 1\n", + " news:\n", + " path_src: ./datasets/news.en\n", + " path_tgt: ./datasets/news.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 1\n", + " ted:\n", + " path_src: ./datasets/ted.en\n", + " path_tgt: ./datasets/ted.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 1\n", + " books:\n", + " path_src: ./datasets/trn.en\n", + " path_tgt: ./datasets/trn.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + " weight: 5\n", + " valid:\n", + " path_src: ./datasets/val.en\n", + " path_tgt: ./datasets/val.fr\n", + " transforms: [filtertoolong, sentencepiece]\n", + "src_seq_length: 200\n", + "tgt_seq_length: 200\n", + "skip_empty_level: silent\n", + "src_subword_model: 
./subword/unigram_en.model\n", + "tgt_subword_model: ./subword/unigram_fr.model\n", + "src_subword_vocab: ./subword/unigram_en.vocab\n", + "tgt_subword_vocab: ./subword/unigram_fr.vocab\n", + "src_subword_alpha: 0.5\n", + "tgt_subword_alpha: 0.5\n", + "\n", + "# Training parameters:\n", + "batch_type: \"tokens\"\n", + "batch_size: 4096\n", + "valid_batch_size: 16\n", + "batch_size_multiple: 1\n", + "max_generator_batches: 0\n", + "accum_count: [3]\n", + "accum_steps: [0]\n", + "train_steps: 150000\n", + "valid_steps: 5000\n", + "report_every: 100\n", + "save_checkpoint_steps: 5000\n", + "queue_size: 10000\n", + "bucket_size: 32768\n", + "train_from: ./output/models/train_step_100000.pt\n", + "\n", + "# Optimization\n", + "model_dtype: \"fp32\"\n", + "optim: \"adam\"\n", + "learning_rate: 2\n", + "warmup_steps: 8000\n", + "decay_method: \"noam\"\n", + "average_decay: 0.0005\n", + "adam_beta2: 0.998\n", + "max_grad_norm: 0\n", + "label_smoothing: 0.1\n", + "param_init: 0\n", + "param_init_glorot: true\n", + "normalization: \"tokens\"\n", + "\n", + "# Model\n", + "encoder_type: transformer\n", + "decoder_type: transformer\n", + "enc_layers: 6\n", + "dec_layers: 6\n", + "heads: 8\n", + "rnn_size: 512\n", + "word_vec_size: 512\n", + "transformer_ff: 2048\n", + "dropout_steps: [0]\n", + "dropout: [0.1]\n", + "attention_dropout: [0.1]\n", + "position_encoding: true\n", + "\n", + "# Model output:\n", + "save_model: ./output/models/tuned\n", + "\n", + "# Logs:\n", + "log_file: ./output/log/tuned\n", + "tensorboard: true\n", + "tensorboard_log_dir: ./output/tensor/tuned\n", + "\n", + "# GPU settings:\n", + "world_size: 1\n", + "gpu_ranks: [0]\n", + "\n", + "# Reproducibility:\n", + "seed: 123''')\n", + "config.close()" + ], + "metadata": { + "id": "HupO64dt2vMj" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Creating the vocabulary files used by [OpenNMT](https://github.com/OpenNMT/OpenNMT-py):" + ], 
"metadata": { + "id": "27pl8wJQ3wJu" + } + }, + { + "cell_type": "code", + "source": [ + "!onmt_build_vocab --config config_tuned.yaml --n_sample -1" + ], + "metadata": { + "id": "Zhx5JWK_324G" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Training the generic system:" + ], + "metadata": { + "id": "cpnxR0YS-a7_" + } + }, + { + "cell_type": "code", + "source": [ + "!onmt_train --config config_train.yaml" + ], + "metadata": { + "id": "IrfWAbIt5S0e" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Training the fine-tuned system:" + ], + "metadata": { + "id": "aLRi6ZaAAHLj" + } + }, + { + "cell_type": "code", + "source": [ + "!onmt_train --config config_tuned.yaml" + ], + "metadata": { + "id": "XGPaZbuF5SIB" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "## Step 4 — Translation and evaluation" + ], + "metadata": { + "id": "dlyT9dWGAud3" + } + }, + { + "cell_type": "markdown", + "source": [ + "Subword segmentation of the text submitted to the system:" + ], + "metadata": { + "id": "rcprLVCa2AzX" + } + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "spm_encode \\\n", + " --model=subword/unigram_en.model \\\n", + " --output_format=piece \\\n", + " < datasets/tra.en \\\n", + " > datasets/tra_sub.en" + ], + "metadata": { + "id": "DujCQvZM1XJH" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Translating with each saved checkpoint of our two systems:" + ], + "metadata": { + "id": "OcCRM_Lk2P-o" + } + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "for checkpoint in output/models/*.pt ;\\\n", + "do filename=$(basename $checkpoint .pt) ;\\\n", + "echo \"# Translating checkpoint\" ${filename} ;\\\n", + "onmt_translate \\\n", + " --verbose \\\n", + " --model $checkpoint \\\n", + " --src datasets/tra_sub.en \\\n", + " --output output/translations/${filename}_sub.txt 
;\\\n", + "done" + ], + "metadata": { + "id": "id3FeyUE2mq1" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Detokenizing the translations produced by our systems:" + ], + "metadata": { + "id": "uph7ub_H3kff" + } + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "for file in output/translations/*_sub.txt ;\\\n", + "do filename=$(basename $file _sub.txt) ;\\\n", + "spm_decode \\\n", + "\t--model=subword/unigram_fr.model \\\n", + "\t--input_format=piece \\\n", + "\t< output/translations/${filename%.*}_sub.txt \\\n", + "\t> output/translations/${filename%.*}_tok.txt ;\\\n", + "done\n", + "\n", + "for file in output/translations/*_tok.txt ;\\\n", + "do filename=$(basename $file _tok.txt) ;\\\n", + "mosestokenizer -D fr \\\n", + "\t< output/translations/${filename%.*}_tok.txt \\\n", + "\t> output/translations/${filename%.*}.txt ;\\\n", + "done\n", + "\n", + "rm output/translations/*sub.txt\n", + "rm output/translations/*tok.txt" + ], + "metadata": { + "id": "3d5z-018BiyX" + }, + "execution_count": null, + "outputs": [] + }, + { + "cell_type": "markdown", + "source": [ + "Evaluation and selection of the best model with [sacreBLEU](https://github.com/mjpost/sacrebleu):" + ], + "metadata": { + "id": "lBwMRO4j3325" + } + }, + { + "cell_type": "code", + "source": [ + "%%bash\n", + "sacrebleu datasets/tra.fr \\\n", + "\t--input output/translations/*.txt \\\n", + "\t--language-pair en-fr \\\n", + "\t--metrics bleu chrf ter \\\n", + "\t--chrf-word-order 2 \\\n", + "\t--tokenize 13a \\\n", + "\t--width 2 \\\n", + "\t--format text" + ], + "metadata": { + "id": "efbvsvZwBPBA" + }, + "execution_count": null, + "outputs": [] + } + ] +} \ No newline at end of file
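
The `head`/`tail` arithmetic used in Step 2 to carve the Books corpus into training, validation, and test sets can be sketched in Python as follows. This is a minimal illustration only; `split_books` is a hypothetical helper written for this note, not part of the notebook, and it works on in-memory sentence pairs rather than files.

```python
def split_books(pairs):
    """Reproduce the Step 2 split: the last 11,000 pairs are held out;
    of those, the final 1,000 become the test set ("tra") and the
    remaining 10,000 the validation set ("val")."""
    trn = pairs[:-11000]       # head -n -11000
    held_out = pairs[-11000:]  # tail -n 11000
    val = held_out[:-1000]     # head -n -1000
    tra = held_out[-1000:]     # tail -n 1000
    return trn, val, tra

# Synthetic example with 20,000 sentence pairs.
pairs = [(f"en {i}", f"fr {i}") for i in range(20000)]
trn, val, tra = split_books(pairs)
print(len(trn), len(val), len(tra))  # prints: 9000 10000 1000
```

The validation and test sets come from the literary (Books) corpus on purpose: the fine-tuned model is meant to be evaluated on the fiction domain it is being adapted to.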