This project's own code is MIT-licensed (see the top-level package.json and README.md).
## datasets/ (gitignored)

| Path | Source | Licence | What we use it for |
|---|---|---|---|
| datasets/ai4privacy-400k/ | ai4privacy/pii-masking-400k on HuggingFace | Proprietary: "Academic Use" only, non-commercial, no redistribution, no derivative works, watermarked | Local-only evaluation. Aggregate metrics may be reported (e.g. in our README); individual samples must not. |
| datasets/nir_validate/ | betagouv/nir_validate, © SGMAP | MIT | Reference implementation for NIR mod-97 validation; source of fixtures embedded in src/detectors.test.ts. |
| datasets/french-ssn/ | aymericbouzy/french-ssn, © Aymeric Bouzy | MIT (declared in package.json; no top-level LICENSE file, but intent is explicit) | Cross-validation of NIR checksum convention; source of extreme-key fixtures in src/detectors.test.ts. |
The AI4Privacy licence is unusually strict. Read it in full at
datasets/ai4privacy-400k/LICENSE. The relevant constraints:
> Access to this dataset is granted exclusively for academic research and non-commercial purposes, under the stipulation that AI4Privacy is acknowledged in any scholarly output that leverages this dataset. To utilize this dataset beyond these conditions, including any form of redistribution, uploading to databases, sharing through any medium, or the creation and dissemination of derivative works, an explicit written licence must be obtained from AI4Privacy.
Practical rules we follow:
- Never copy output from eval/inspect-tel-fp.ts (or any other diagnostic that prints raw text) into a tracked file.

The dataset is watermarked: leaking even one sample is detectable.
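One way to honour the aggregate-only rule is to reduce per-sample results to counts before anything is written out. The following is a minimal sketch, not the project's actual evaluation code: the `SampleResult` shape and the `aggregate` function are hypothetical names introduced for illustration.

```typescript
// Hypothetical per-sample result shape; the project's eval scripts may differ.
interface SampleResult {
  text: string; // raw dataset text: must never reach a tracked file
  truePositives: number;
  falsePositives: number;
  falseNegatives: number;
}

// Only counts and ratios appear here, so this object is safe to report.
interface AggregateMetrics {
  samples: number;
  precision: number;
  recall: number;
}

function aggregate(results: SampleResult[]): AggregateMetrics {
  let tp = 0;
  let fp = 0;
  let fn = 0;
  for (const r of results) {
    // Sum the counts; the raw text field is deliberately never read.
    tp += r.truePositives;
    fp += r.falsePositives;
    fn += r.falseNegatives;
  }
  return {
    samples: results.length,
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}
```

Because the returned object has no text field, serialising it (for example with `JSON.stringify`) cannot leak a sample.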
The src/detectors.test.ts header carries an attribution:

```typescript
// NIR fixtures derived from MIT-licensed reference validators:
// - betagouv/nir_validate (© SGMAP, MIT)
// - aymericbouzy/french-ssn (© Aymeric Bouzy, MIT)
```

The actual values copied are:

| Value | Source | Purpose |
|---|---|---|
| 2550814168025 + key 38 | nir_validate | Calvados nominal case |
| 255082A168025 + key 97 | nir_validate | Corsica 2A |
| 2890478342163 + key 49 | french-ssn | nominal, mid-range key |
| 289042A342163 + key 90 | french-ssn | Corsica 2A |
| 289042B342163 + key 20 | french-ssn | Corsica 2B |
| 2890478342212 + key 97 | french-ssn | extreme key (wraps at 97) |
| 2890478342211 + key 01 | french-ssn | extreme key (wraps at 01) |
| 2890478342210 + key 02 | french-ssn | extreme key |

MIT permits this use freely. The attribution comment in the test file is sufficient under the licence.
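For readers unfamiliar with the mod-97 convention these fixtures exercise, here is a minimal sketch of the key computation. It assumes the commonly described rule (key = 97 minus the 13-digit body modulo 97, with Corsican department codes 2A and 2B replaced by 19 and 18 before the division). `nirKey` is a hypothetical helper written for this illustration, not the code in src/detectors.ts or in either reference validator.

```typescript
// Hypothetical helper: computes the 2-digit control key for a 13-character NIR body.
function nirKey(body: string): number {
  // 5 digits, then a department code (2 digits, 2A, or 2B), then 6 digits.
  if (!/^\d{5}(\d{2}|2A|2B)\d{6}$/i.test(body)) {
    throw new Error("expected a 13-character NIR body");
  }
  const dept = body.slice(5, 7).toUpperCase();
  // Corsica: substitute numeric stand-ins so the body can be divided.
  const deptDigits = dept === "2A" ? "19" : dept === "2B" ? "18" : dept;
  // 13 digits fit well inside Number.MAX_SAFE_INTEGER, so plain numbers suffice.
  const numeric = Number(body.slice(0, 5) + deptDigits + body.slice(7));
  // The key ranges from 1 to 97: a remainder of 0 yields 97, a remainder of 96 yields 1.
  return 97 - (numeric % 97);
}
```

Checked against the fixtures listed above, `nirKey("2550814168025")` gives 38 and `nirKey("255082A168025")` gives 97.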