|
|
|
|
|
Files should be named and organised in a way that indicates their content and specifies their relationship to other files. File names must describe, at a glance, what the document is about, so that files can be browsed effectively and efficiently.
|
|
|
|
|
|
The directory tree must be clear, with explicit (meaningful) and unique folder names. Numbers can help organise the tree, but they should be padded with leading zeros so that files and folders are listed in numerical order (examples: ``01-Folder_x``, ``02-Folder_y``).
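As a small illustration of why the leading zeros matter, the sketch below sorts folder names the way a plain lexicographic listing would. The folder names and the helper ``padded_name`` are hypothetical and only serve the example.

```python
# Hypothetical folder names, used only for illustration.
unpadded = ["1-Folder_x", "2-Folder_y", "10-Folder_z"]
padded = ["01-Folder_x", "02-Folder_y", "10-Folder_z"]


def listing_order(names):
    """Return the names in plain lexicographic (character by character) order."""
    return sorted(names)


def padded_name(index, label, width=2):
    """Build a folder name whose number is padded with leading zeros."""
    return f"{index:0{width}d}-{label}"


# Without padding, folder 10 is listed before folder 2.
print(listing_order(unpadded))  # ['1-Folder_x', '10-Folder_z', '2-Folder_y']
# With padding, the listing follows the numerical order.
print(listing_order(padded))    # ['01-Folder_x', '02-Folder_y', '10-Folder_z']
print(padded_name(3, "Folder_w"))  # 03-Folder_w
```

A width of two digits is enough for up to 99 folders; a larger tree would need a larger ``width``.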
|
|
|
|
|
|
Folders holding valuable data needed for publication and folders containing temporary files should be clearly distinguished, to facilitate data management and avoid losing important files.
|
|
|
|
|
|
All data must be properly organised and annotated so that they are accessible to current and future members working on the corresponding project.
|
|
|
|
|
|
For this purpose, it is useful to provide in each project or experiment folder an additional file describing the organisation and/or content of the data files. For these metadata, text-only files are recommended rather than binary formats such as Word, Excel or PDF. For example, if you use ``.csv``/``.tsv`` files ([comma](https://en.wikipedia.org/wiki/Comma-separated_values)/[tab](https://en.wikipedia.org/wiki/Tab-separated_values) separated values) for tables instead of an Excel file, you can still edit them with Excel as usual and bioinformatics tools can read them too, whereas Excel files are readable by only a few programs such as Microsoft Office and OpenOffice. For free text, you can also use ``.txt`` or ``.md`` ([markdown](https://en.wikipedia.org/wiki/Markdown)) files, and ``.json`` ([JavaScript Object Notation](https://en.wikipedia.org/wiki/JSON)) files for structured data and key/value pairs.
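The following is a minimal sketch of how such text-only metadata could be written and read back with the Python standard library. The file names ``samples.tsv`` and ``README.json``, the function names and the column names are assumptions made for the example, not a prescribed layout.

```python
import csv
import json
from pathlib import Path


def write_metadata(folder, samples, description):
    """Write a tab-separated sample table and a JSON description into folder.

    The file names used here are hypothetical.
    """
    folder = Path(folder)
    folder.mkdir(parents=True, exist_ok=True)

    # Table of samples as .tsv: editable in a spreadsheet, readable by scripts.
    table_path = folder / "samples.tsv"
    with table_path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(samples[0]), delimiter="\t")
        writer.writeheader()
        writer.writerows(samples)

    # Key/value description of the folder as .json.
    info_path = folder / "README.json"
    info_path.write_text(json.dumps(description, indent=2), encoding="utf-8")
    return table_path, info_path


def read_samples(table_path):
    """Read the tab-separated table back as a list of dictionaries."""
    with Path(table_path).open(newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))
```

A call such as ``write_metadata("01-Folder_x", [{"sample": "S1", "condition": "control"}], {"project": "demo"})`` would create both files in that folder. All values come back as text when the table is read again.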
|
|
|
|
|
|
|
|
|
|
|
|
# File naming
|
|
|
|
|
|
Because the mass storage is a Linux-based infrastructure, users should use Linux-friendly names for files and directories. This means avoiding non-English characters such as accents and symbols, as well as spaces and tabs. This matters because some analysis tools WILL NOT accept non-English characters, and because it facilitates data management by users and system administrators. Files and folders whose names do not follow these rules may cause problems in system maintenance and in backup procedures, so that no backed-up version of the file exists and data are lost if the main drive fails.
|
|
|
|
|
|
Of note, all file names are case sensitive, so ``test.txt``, ``Test.txt`` and ``TEST.txt`` are three different files.
|
|
|
|
|
|
NB: This also means that it is dangerous to rely on alphabetical order, because case ('a' vs 'A') may not be handled in the same way across systems and software. Files created on Windows may therefore not be listed in the same order on Linux.
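To make the ordering issue concrete, the sketch below compares a case-sensitive sort with a case-insensitive one, and looks for names that differ only by case. It only illustrates the two conventions and does not reproduce the exact behaviour of any particular file browser; the function name ``case_collisions`` is hypothetical.

```python
from collections import defaultdict

names = ["b.txt", "A.txt", "a.txt", "B.txt"]

# Case-sensitive order: all uppercase letters sort before lowercase ones.
case_sensitive = sorted(names)
# Case-insensitive order: 'a' and 'A' are treated as the same letter.
case_insensitive = sorted(names, key=str.casefold)

print(case_sensitive)    # ['A.txt', 'B.txt', 'a.txt', 'b.txt']
print(case_insensitive)  # ['A.txt', 'a.txt', 'b.txt', 'B.txt']


def case_collisions(filenames):
    """Group file names that become identical once case is ignored."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name.casefold()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


print(case_collisions(["test.txt", "Test.txt", "TEST.txt", "notes.md"]))
# [['TEST.txt', 'Test.txt', 'test.txt']]
```

Names reported by ``case_collisions`` are distinct files on a case-sensitive file system but would clash on a case-insensitive one, so they are best avoided.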
|
|
|
|
They must **never** contain:
|
|
* Japanese, Chinese, Korean, Greek, Hebrew, Arabic or other non-English characters and ideograms
|
|
|
* Diacritics such as accents, umlauts, tildes or cedillas (é, è, ê, ä, á, í, ñ, õ, ç, etc.)
|
|
|
* special characters such as , ; . : ! ? / \ “ # [ ] > < % * = & $ | ^ ( ) { }
|
|
|
* system reserved keywords (device names reserved by Windows) such as CON, PRN, AUX, CLOCK$, NUL, COM1, COM2, COM3, COM4, COM5, COM6, COM7, COM8, COM9, LPT1, LPT2, LPT3, LPT4, LPT5, LPT6, LPT7, LPT8, and LPT9
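The rules above can be checked automatically. The sketch below is one possible reading of them, not an official validator: it assumes that only unaccented letters, digits, ``_`` and ``-`` are acceptable, and that the dot separating the extension is tolerated. The function name ``name_problems`` is hypothetical.

```python
# Reserved names taken from the list above.
RESERVED = {
    "CON", "PRN", "AUX", "CLOCK$", "NUL",
    *(f"COM{i}" for i in range(1, 10)),
    *(f"LPT{i}" for i in range(1, 10)),
}


def name_problems(filename):
    """Return a list of rule violations found in a file or folder name."""
    # Assumption: everything after the first dot is treated as the extension.
    stem, _, extension = filename.partition(".")
    problems = []

    # Ideograms, non-Latin alphabets and diacritics are all non-ASCII.
    if not filename.isascii():
        problems.append("non-ASCII character")

    if any(ch.isspace() for ch in filename):
        problems.append("space or tab")

    # Remaining ASCII characters other than letters, digits, '_' and '-'.
    checked = stem + extension.replace(".", "")
    specials = sorted({
        ch for ch in checked
        if ch.isascii() and not ch.isspace()
        and not (ch.isalnum() or ch in "_-")
    })
    if specials:
        problems.append("special character: " + " ".join(specials))

    if stem.upper() in RESERVED:
        problems.append("reserved name")

    return problems


print(name_problems("01-Folder_x"))  # []
print(name_problems("my file.txt"))  # ['space or tab']
print(name_problems("a&b.csv"))      # ['special character: &']
print(name_problems("con.txt"))      # ['reserved name']
```

An empty list means that none of the checks above was triggered. It does not guarantee that the name is meaningful or short enough.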
|
|
|
|
|
|
|
|
|
## Recommendations
|
|
|
|
|
|
* Keep file names short, meaningful and easily understandable to others. Limit them to 25 characters if possible.
|
|
|
* Do not use “empty words” like "le", "la", "les", "un", "une", "des", "et", "ou", "the", "a", "an", "and", "or", etc.
|
|
|
* Dates should always follow the format YYYYMMDD (e.g. 20190625), the ISO 8601 basic format. Start the file name with the date if it is important to store or sort files in chronological order.
|
|
|
* Avoid unnecessary repetition and redundancy in file names and paths.
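As an illustration of the date and length recommendations, the sketch below builds date-prefixed names and reads the date back from them. The underscore separator, the labels and the function names are assumptions made for the example.

```python
from datetime import date, datetime

MAX_LENGTH = 25  # recommended upper limit from the list above


def dated_name(day, label, extension):
    """Build a file name starting with the date in YYYYMMDD format."""
    return f"{day.strftime('%Y%m%d')}_{label}.{extension}"


def leading_date(filename):
    """Return the date encoded in the first 8 characters, or None."""
    prefix = filename[:8]
    if len(prefix) != 8 or not prefix.isdigit():
        return None
    try:
        return datetime.strptime(prefix, "%Y%m%d").date()
    except ValueError:  # eight digits that do not form a real date
        return None


files = [
    dated_name(date(2019, 6, 25), "assay", "csv"),
    dated_name(date(2018, 12, 3), "assay", "csv"),
    dated_name(date(2019, 1, 14), "assay", "csv"),
]

# With the YYYYMMDD prefix, alphabetical order is also chronological order.
print(sorted(files))
# ['20181203_assay.csv', '20190114_assay.csv', '20190625_assay.csv']
print(all(len(name) <= MAX_LENGTH for name in files))  # True
```

This works only because year, month and day are written from largest to smallest unit with fixed widths; a format such as DDMMYYYY would not sort chronologically.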
|