Collating Sequences [ MPE XL Native Language Programmer's Guide ] MPE/iX 5.0 Documentation
MPE XL Native Language Programmer's Guide
Appendix B Collating Sequences
Collating is defined as arranging character strings into order (usually
alphabetic). To do this, a mechanism must be available that, given two
character strings, decides which one comes first. In Native Language
Support (NLS) this mechanism is the NLCOLLATE intrinsic.
NOTE This appendix deals with collating (lexical ordering) only; it does
not cover matching. For matching purposes, A and a are generally treated
as different characters.
Consider the full ROMAN8 character set: any of its characters can appear
in text written in any European language. Even if a character does not
exist in a language, it can still show up in names and addresses; a
letter sent from Germany to Spain, for example, should still carry a
correctly spelled Spanish address. Therefore, the full ROMAN8 character
set is considered to be used in all languages, and a collating sequence
covering every ROMAN8 character has been defined for each language that
NLS supports. Table B-2 lists the collating sequence for
American-English, Canadian-French, Danish, Dutch, English, Finnish,
French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish.
All characters in the same alphabetic or numeric group collate the
same; A and a, for example, belong to one group. Characters within a
group usually differ only in uppercase versus lowercase priority or in
accent priority. (Refer to Table B-2 for the collating sequence.) When
strings are sorted, characters of the same group are initially treated
as equal. Only if this comparison does not determine which string comes
first are the priorities of the characters used to decide the order.
Refer to Table B-1 for examples of collating sequence priority.
Table B-1. Collating Sequence Priority
--------------------------------------------------------------------------------
| Example Sorted | Priority Explanation                                        |
| Strings        |                                                             |
--------------------------------------------------------------------------------
| aEb, aEc       | The third character in each string is different, so the "b" |
|                | precedes the "c".                                           |
--------------------------------------------------------------------------------
| aeb, aEb       | The characters in the two strings collate the same, so      |
|                | accent priority determines the order. The "e" precedes the  |
|                | "E".                                                        |
--------------------------------------------------------------------------------
| abc, Abd       | The last characters in the strings are different, so the    |
|                | "c" precedes the "d". The uppercase "A" in "Abd" does not   |
|                | affect the order.                                           |
--------------------------------------------------------------------------------
| aBc, abc       | The characters in the two strings collate the same, so      |
|                | uppercase priority determines the order. The "B" precedes   |
|                | the "b".                                                    |
--------------------------------------------------------------------------------
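The two-pass comparison described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the NLCOLLATE intrinsic: the tiny GROUPS table, its in-group order (uppercase before lowercase, taken from the "aBc, abc" row), and the helper names collation_key and collate are hypothetical, and only unaccented a through e are covered. Each character gets a group number and a priority; the group numbers of the whole string are compared first, and the priorities only break a tie.

```python
# Hypothetical, heavily reduced collating table (NOT the real ROMAN8 table
# used by NLCOLLATE). Each string is one group of characters that collate
# the same; a character's position in its string is its tie-break priority.
# Uppercase before lowercase follows the "aBc, abc" row of Table B-1.
GROUPS = ["Aa", "Bb", "Cc", "Dd", "Ee"]

# Map each character to (group number, priority within the group).
WEIGHTS = {
    ch: (group, priority)
    for group, members in enumerate(GROUPS)
    for priority, ch in enumerate(members)
}


def collation_key(s: str) -> tuple:
    """Build a two-level key: all group numbers first, then all priorities.

    Comparing keys as tuples means priorities are consulted only when the
    group sequences are identical. Scanning priorities left to right is an
    assumption of this sketch.
    """
    weights = [WEIGHTS[ch] for ch in s]
    groups = tuple(g for g, _ in weights)
    priorities = tuple(p for _, p in weights)
    return (groups, priorities)


def collate(a: str, b: str) -> int:
    """Return -1, 0, or 1 as a sorts before, with, or after b."""
    ka, kb = collation_key(a), collation_key(b)
    return (ka > kb) - (ka < kb)
```

For instance, `sorted(["Abd", "abc"], key=collation_key)` yields `["abc", "Abd"]`, whereas a plain `sorted` puts "Abd" first because "A" has a lower code value than "a".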
Table B-2 displays the collating sequence in three ways:
* The graphic representation of the character.
* The decimal equivalent of the character's binary value.
* A description of the character.
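The decimal column is the character's ROMAN8 code. As a sketch, Python's standard `hp_roman8` codec can report that value; `roman8_decimal` is a hypothetical helper name. The example also shows why ordering strings by code value is not the same as collating them.

```python
def roman8_decimal(ch: str) -> int:
    """Return the decimal ROMAN8 code of a single character.

    Relies on the standard-library "hp_roman8" codec; raises
    UnicodeEncodeError if the character is not in ROMAN8.
    """
    encoded = ch.encode("hp_roman8")
    if len(encoded) != 1:
        raise ValueError("expected exactly one character")
    return encoded[0]


letters = ["b", "Z", "Æ", "a"]
# Ordering by raw code value is not collating: every uppercase ASCII letter
# (65-90) sorts before every lowercase one (97-122), and Æ, whose code is
# above 127, sorts after all of them.
by_code = sorted(letters, key=roman8_decimal)
```

A collating sequence instead places Æ with the A characters, regardless of its numeric code.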
Table B-2. Collating Sequence
NOTE The Æ (uppercase AE ligature) and æ (lowercase ae ligature) are
expanded for collating purposes to AE and ae, respectively, and collate
as:
ad AE Ae aE ae AF
The ß (sharp s) is expanded for collating purposes to ss and
collates according to the German standard as:
sr ss st
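The expansion rule in the NOTE can be sketched as a preprocessing step applied before strings are compared. In this minimal sketch the EXPANSIONS table and the helpers expand and primary_key are hypothetical. Only group (primary) ordering is modeled, with case ignored, so the in-group priorities that separate AE, Ae, aE, and ae are not reproduced.

```python
# Hypothetical expansion table: only the three characters named in the NOTE.
EXPANSIONS = {"Æ": "AE", "æ": "ae", "ß": "ss"}


def expand(s: str) -> str:
    """Replace each ligature or sharp s with its two-letter expansion."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in s)


def primary_key(s: str) -> str:
    """Expand first, then ignore case so that A and a fall in one group."""
    return expand(s).lower()


# Python's sort is stable, so strings whose keys tie keep their input order;
# this sketch does not model the in-group priorities that would break the tie.
german = sorted(["st", "ß", "sr"], key=primary_key)    # ['sr', 'ß', 'st']
ligature = sorted(["AF", "Æ", "ad"], key=primary_key)  # ['ad', 'Æ', 'AF']
```

Expanding before comparing is what places ß between sr and st, and Æ between ad and AF.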
Table B-3 through Table B-6 show the language-dependent variations to the
collating sequence.