DIN 91379

Unicode subset for Europe


The DIN standard DIN 91379: "Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM"[1] defines a normative subset of Unicode Latin characters, sequences of base characters and diacritic signs, and special characters for use in names of persons, legal entities, products, addresses etc. The standard defines a normative mapping of Latin letters to base letters A-Z as an extension of the recommendations of ICAO.[2]

In the informative part of the standard, a set of extended characters is defined, which includes Greek and Cyrillic letters as well as other special characters for names of legal entities and product names.

Languages and scripts supported

The subset supports all official languages of European Union countries, the official languages of Iceland, Liechtenstein, Norway, and Switzerland, and the German minority languages. It also provides the diacritic signs needed to transliterate names from other writing systems into the Latin script according to the relevant ISO standards, although this transliteration support is not complete.

The standard supports the characters required for entries in the civil status register, which has replaced the civil status book. According to the Law on the Convention of September 13, 1973 on the indication of family names and first names in the civil status books,[3] information in Latin characters is to be reproduced true to the letter, with all diacritic marks, and information in other characters is to be reproduced by transliteration, if possible in accordance with ISO standards.

In addition to the normative characters, the standard defines subsets of extended characters that contain modern Greek letters for Greece and Cyprus, Cyrillic letters for Bulgaria, and special characters for names of products and legal entities.

Conforming applications may support additional characters. For interface agreements or registers, however, it may be appropriate to support only a fixed subset of characters and sequences based on this standard.[4]

The text of the predecessor, DIN SPEC 91379,[5] together with explanations and lists of characters and sequences as Excel and XML files, is available from the Koordinierungsstelle für IT-Standards (KoSIT).[4] This reference also contains an XML schema file with patterns for checking whether text conforms to the subsets defined in this standard. Lists of the characters and sequences of DIN SPEC 91379 and DIN 91379 are available as plain text files on GitHub in DIN 91379 Characters and Sequences.[6] Compared with DIN SPEC 91379, DIN 91379 contains a few additional characters and sequences.[6][1]

Application of the standard

Compliance with this standard is mandatory for German authorities and organisations in the exchange of data between authorities or with citizens and businesses from November 1, 2024.[7]

The July 2022 version of the architecture guideline for German federal IT requires the use of the predecessor, DIN SPEC 91379.[8]

Continuous text and historic letters are outside the scope of this standard.[1]

Structure of the standard

The DIN standard consists of a normative[9] and an informative[9] part.

The requirements in the normative part are binding for all compliant systems. In the normative part, the letters for processing names with basic Latin letters and diacritics are specified. All compliant systems must support these letters. Furthermore, a mapping of the normative letters to the basic Latin letters A-Z is defined.

A compliant system may support further letters in addition to the normative ones.

The recommendations in the informative part are not binding for compliant systems. The informative part specifies a Unicode subset of extended letters, e.g. for legal entities, product names and data exchange in the EU. In addition, the informative part defines data types that can be used for checking data fields.

Normative part

Compliance

To comply with this standard, a system must

  • support all normative letters and sequences at all processing stages,
  • use the encoding UTF-8 at interfaces, and
  • normalize the characters according to Unicode normalization form C (NFC).[1]
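The last two requirements can be illustrated with a minimal Python sketch that uses only the standard library. The function names `to_interface_bytes` and `from_interface_bytes` are hypothetical and do not come from the standard; the sketch assumes that a receiver re-normalizes incoming text because a sender may not have done so.

```python
import unicodedata


def to_interface_bytes(text: str) -> bytes:
    """Normalize to NFC, then encode as UTF-8 for transmission."""
    return unicodedata.normalize("NFC", text).encode("utf-8")


def from_interface_bytes(data: bytes) -> str:
    """Decode UTF-8 strictly, then normalize to NFC again (assumed policy)."""
    return unicodedata.normalize("NFC", data.decode("utf-8", errors="strict"))


# "Z" followed by COMBINING CARON (U+030C) composes to U+017D under NFC.
decomposed = "Z\u030Cofia"
wire = to_interface_bytes(decomposed)
received = from_interface_bytes(wire)
```

A base letter followed by a combining mark for which Unicode has no precomposed code point remains a two-code-point sequence under NFC, which is why a character list alone is not enough and defined sequences are needed as well.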

Normative letters

Any conforming IT system must be able to process the normative letters in all name fields. This includes the collection, storage, transmission, display, and printout.

The normative character groups are given below. The associated characters can also be found in DIN 91379 Characters and Sequences for machine processing.[6] The following tables of characters were generated from the XML file chars.xml in the DIN appendix.

Latin letters (bll)

These letters must be supported to represent names, especially personal names.

Non-letters N1 (bnlreq)

These characters must be supported to represent names, especially personal names.

Non-letters N2 (bnl)

These characters must be supported to represent names in a broader sense, e.g. place names, street names, house numbers, legal entity names, and product names. They are not suitable for personal names.

Non-letters N3 (bnlopt)

These characters are included for backwards compatibility with the standard "Lateinische Zeichen in Unicode" (Latin characters in Unicode), version 1.1.1.[10]

They are not relevant for personal names or other names, only for legal entity names and product names.

Non-letters N4 (bnlnot)

These whitespace characters are unsuitable for representing names, but they must be processed.

The character NO-BREAK SPACE is necessary to prevent a line break in special names that could change the meaning. The other characters are included for backwards compatibility with the standard "Lateinische Zeichen in Unicode" (Latin characters in Unicode), version 1.1.1.[10]

Deprecated letters

Existing documents and register entries contain deprecated letters that are no longer used today. These letters must be supported by compliant IT systems. When creating new entries, deprecated letters should not be used.
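The rule above (accept deprecated letters when reading, avoid them when writing new entries) might be implemented roughly as follows. This is a sketch under stated assumptions: the replacement table of the standard is not reproduced here, and the single entry shown is taken from Unicode itself, where U+0149 is a deprecated character whose compatibility decomposition is U+02BC U+006E. The names `EXAMPLE_REPLACEMENTS`, `contains_deprecated` and `modernize` are hypothetical.

```python
# Illustrative entry only; NOT the replacement table of the standard.
# U+0149 (LATIN SMALL LETTER N PRECEDED BY APOSTROPHE) is deprecated in
# Unicode and decomposes (compatibility) to U+02BC U+006E.
EXAMPLE_REPLACEMENTS = {"\u0149": "\u02BCn"}


def contains_deprecated(text: str, table: dict = EXAMPLE_REPLACEMENTS) -> bool:
    """Report whether text uses any character listed as deprecated."""
    return any(ch in table for ch in text)


def modernize(text: str, table: dict = EXAMPLE_REPLACEMENTS) -> str:
    """Replace deprecated characters, e.g. before creating a new entry."""
    return "".join(table.get(ch, ch) for ch in text)
```

Existing entries would be stored and displayed unchanged, while `contains_deprecated` could be used to warn at data entry and `modernize` to propose a replacement.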

Normative mapping of Latin letters to basic letters (search form)

A normative mapping of all normative letters to the basic Latin letters A–Z is given below. This mapping is required, for example, for the machine-readable zone of passports. Another application is the creation of search forms, so that names can be found even if they are spelled differently or without specifying the diacritics.

The following table is based on table 9 of DIN 91379 and chapter 6, table A of the ICAO specifications for machine-readable travel documents.[2] The table was created with the information from the XML file chars.xml in the DIN 91379 appendix.

Entries that appear both in the ICAO specification and in table 9 of DIN 91379 are marked with ICAO in the Mapping column; additional entries that appear only in table 9 of DIN 91379 are marked with EXT. In the Type column, ID is specified for entries that describe an identity mapping, and MAP for all other mappings.
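How such a mapping can be used to build a search form is sketched below in Python. The table `EXAMPLE_MAP` is a small illustrative excerpt in the style of the ICAO transliterations, not the normative table, and where ICAO allows several options only one is shown. The fallback that strips combining marks is an assumption made for this sketch and is not taken from the standard; a real implementation would look up every letter in the full table.

```python
import unicodedata

# Illustrative excerpt only; the normative table is far larger.
EXAMPLE_MAP = {
    "Ä": "AE", "Ö": "OE", "Ü": "UE",
    "Æ": "AE", "Ø": "OE", "Å": "AA", "Þ": "TH",
}


def search_form(name: str, table: dict = EXAMPLE_MAP) -> str:
    """Map a name to basic letters A-Z; unmapped characters pass through."""
    out = []
    for ch in unicodedata.normalize("NFC", name).upper():
        if ch in table:
            out.append(table[ch])  # explicit entry (type MAP)
        else:
            # Assumed fallback: decompose and drop combining marks,
            # so that "É" becomes "E"; plain A-Z is left as is (type ID).
            parts = unicodedata.normalize("NFD", ch)
            out.append("".join(c for c in parts if not unicodedata.combining(c)))
    return "".join(out)
```

With this sketch, `search_form("Müller")` and `search_form("Mueller")` both yield `MUELLER`, so either spelling finds the same record. Python's `str.upper()` already expands "ß" to "SS", which is why the excerpt has no entry for it.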

Informative part

Extended letters

Each conforming IT system should be able to handle the extended letters for all name fields. This includes the collection, storage, transmission, display, and printout.

Greek letters (gl)

For cross-border data exchange, every IT system should support Greek letters in name fields.

Cyrillic letters (cl)

For cross-border data exchange, every IT system should support Cyrillic letters in name fields for Bulgarian names.

Non-letters E1 (enl)

These characters should be supported for legal entity names and product names.

Technical data types (informative)

For information, technical data types are defined as subsets of the letters defined in the standard. These can be used for interface agreements, for technical checks or as a basis for creating your own data types. An implementation as an XML schema type is included in the din-91379-datatypes.xsd file attached to the standard. This implementation is also freely available under the CC BY-ND license as part of the XOEV library.[11]
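The idea of a data type as a pattern over allowed characters and sequences can be sketched outside XML Schema as well. The following Python example is a hypothetical miniature: `ALLOWED_CHARS` and `ALLOWED_SEQUENCES` are invented placeholders, not the lists of the standard, and `conforms` is not a function of any published library.

```python
import re
import unicodedata

# Hypothetical miniature data type; the real lists are much longer.
ALLOWED_CHARS = "A-Za-zÄÖÜäöüßŽž '\\-"
ALLOWED_SEQUENCES = ["M\u0302", "m\u0302"]  # base letter + combining mark

# Sequences are listed first so that a combining mark is only accepted
# together with its base letter, never on its own.
_PATTERN = re.compile(
    "(?:" + "|".join(map(re.escape, ALLOWED_SEQUENCES)) + f"|[{ALLOWED_CHARS}])*"
)


def conforms(value: str) -> bool:
    """True if the NFC form of value uses only allowed items."""
    normalized = unicodedata.normalize("NFC", value)
    return _PATTERN.fullmatch(normalized) is not None
```

Normalizing before matching means that a decomposed spelling of an allowed precomposed letter is accepted, while a combining mark attached to a base letter outside the sequence list is rejected.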

Added letters

Compared to DIN SPEC 91379, some additional letters have been included; only two of these letters are not deprecated.

Current state

The standardization process has so far produced the specification DIN SPEC 91379 (March 2019) and the final DIN standard (August 2022). Efforts are being made to develop it further into a European CEN standard.[4]

Open-source software supporting DIN 91379

  • Free Java library for creating and editing PDF supporting DIN 91379:
  • Free converter from XSL formatting objects to PDF
  • Free Fonts for DIN 91379
    • Arimo[16][17]
    • Noto Latin, Greek, Cyrillic,[18] see also issue "Combining comma above right" at wrong position[19]
    • Sudo coding font[20]

References

  1. "DIN 91379:2022-08: Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM" (in German). Beuth Verlag (now DIN Media GmbH). August 2022.
  2. "Gesetz zu dem Übereinkommen vom 13. September 1973 über die Angabe von Familiennamen und Vornamen in den Personenstandsbüchern" [Law on the Convention of September 13, 1973 on the indication of family names and first names in civil status books] (PDF). Bundesgesetzblatt 1976 No. 48 (in German). Bundesanzeiger Verlag. 1976-09-03. Retrieved 2024-04-10.
  3. Koordinierungsstelle für IT-Standards (KoSIT). "String.Latin+ 1.2: eine kommentierte und erweiterte Fassung der DIN SPEC 91379. Inklusive einer umfangreichen Liste häufig gestellter Fragen. Herausgegeben von der Fachgruppe String.Latin. (zip, 1.7 MB)" [String.Latin+ 1.2: Commented and extended version of DIN SPEC 91379.] (in German). Retrieved 2022-03-19.
  4. "DIN 91379 Characters and Sequences". 19 August 2022. Retrieved 2022-08-19 via GitHub.
  5. IT-Planungsrat (2022-11-10). "Beschluss 2022/51 – String.Latin" [Decision 2022/51 – String.Latin] (in German). Retrieved 2022-12-22.
  6. Der Beauftragte der Bundesregierung für Informationstechnik. "Architekturrichtlinie für die IT des Bundes – Technische Spezifikationen zur Architekturrichtlinie –" [Architecture guideline for federal IT – Technical specifications for the architecture guideline –] (PDF) (in German). Retrieved 2022-10-08.
  7. "Lateinische Zeichen in Unicode. Version 1.1.1" (PDF). Koordinierungsstelle für IT-Standards (KoSIT). 2012-01-27. Retrieved 2024-04-23.
  8. "din-norm-91379-datatypes.xsd". XOEV-Bibliothek. Koordinierungsstelle für IT-Standards (KoSIT). 2022-10-14. Retrieved 2023-04-30.
  9. "Accents, DIN 91379, non Latin scripts". May 10, 2022 via GitHub.
  10. "Mirror of Apache FOP". Feb 9, 2023 via GitHub.
  11. "Arimo". Google Fonts.
  12. "Noto Latin, Greek, Cyrillic". Feb 9, 2023 via GitHub.
  13. "Sudo Coding Font". Aug 30, 2023.

This article uses material from the Wikipedia article DIN_91379, and is written by contributors. Text is available under a CC BY-SA 4.0 International License; additional terms may apply. Images, videos and audio are available under their respective licenses.