MPE XL Native Language Programmer's Guide: 900 Series HP 3000 Computer Systems

Appendix B Collating Sequences

Collating is defined as arranging character strings into order (usually alphabetic). To do this, a mechanism must be available that, given two character strings, decides which one comes first. In Native Language Support (NLS) this mechanism is the NLCOLLATE intrinsic.

NOTE: This appendix deals with collating or lexical ordering and does not include matching. For matching purposes, there is generally a difference between A and a.

Look at the full ROMAN8 character set and consider that all these characters can appear in every European language. Even if a character does not exist in a language, it can still show up in names and/or addresses. It is quite useful to address a letter to Spain correctly, even if it originates in Germany. Therefore, the full ROMAN8 character set is considered to be used in all languages, and a collating sequence has been defined for all characters in the ROMAN8 character set for the languages it supports. Table B-1 “Collating Sequence Priority” lists the collating sequence for American-English, Canadian-French, Danish, Dutch, English, Finnish, French, German, Italian, Norwegian, Portuguese, Spanish, and Swedish.

All characters in an alpha or numeric group collate the same; they usually differ only in uppercase versus lowercase priority or in accent priority. (Refer to Table B-2 “Collating Sequence” for the collating sequence.) In sorting, the characters of a group are initially considered the same. If this first comparison does not determine which string comes first, the priorities of the characters are then used to determine the order. Refer to Table B-1 “Collating Sequence Priority” for examples of collating sequence priority.

Table B-1 Collating Sequence Priority

Example Sorted Strings   Priority Explanation
aEb, aEc                 The third character in each string is different. The "b" precedes the "c".
aéb, aEb                 The characters in the two strings collate the same, so accent priority determines the order. The "é" precedes the "E".
abc, Abd                 The last characters in the strings are different. The "c" precedes the "d".
aBc, abc                 The characters in the two strings collate the same, so the uppercase priority determines the order. The "B" precedes the "b".
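
The two-step comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the NLCOLLATE intrinsic: the names `GROUPS`, `WEIGHTS`, `collate_key`, and `compare` are hypothetical, the table covers only a few unaccented letters, and the priority of a character is assumed to be its position within its group in Table B-2.

```python
# Hypothetical, abbreviated collating table (unaccented letters A-E only).
# Each string in GROUPS is one collating group, listed in Table B-2 order.
GROUPS = ["Aa", "Bb", "Cc", "Dd", "Ee"]

# Map every character to (group number, priority within the group).
# ASSUMPTION: priority is the character's position inside its group.
WEIGHTS = {
    ch: (group_no, priority)
    for group_no, group in enumerate(GROUPS)
    for priority, ch in enumerate(group)
}


def collate_key(text):
    """Return a sort key: all group numbers first, then all priorities."""
    groups = tuple(WEIGHTS[ch][0] for ch in text)
    priorities = tuple(WEIGHTS[ch][1] for ch in text)
    return (groups, priorities)


def compare(a, b):
    """Return -1, 0, or 1 depending on whether a collates before, with, or after b."""
    key_a, key_b = collate_key(a), collate_key(b)
    return (key_a > key_b) - (key_a < key_b)


# Three of the pairs from Table B-1; each left string should collate first.
pairs = [("aEb", "aEc"), ("abc", "Abd"), ("aBc", "abc")]
results = [compare(a, b) for a, b in pairs]
print(results)  # [-1, -1, -1]
```

Because the group numbers form the first element of the key, priorities are consulted only when every group in the two strings matches, which is the behavior the paragraph before Table B-1 describes.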

 

Table B-2 “Collating Sequence” displays the collating sequence in three ways:

  • The graphic representation of the character.

  • The decimal equivalent of the character's binary value.

  • A description of the character.
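
As a rough cross-check of the decimal column, the value of a character can be looked up with Python's built-in `hp_roman8` codec. This is a sketch that assumes the codec's mapping agrees with the ROMAN8 table used here; the helper name `roman8_decimal` is hypothetical.

```python
def roman8_decimal(ch):
    """Return the decimal value of a single character in HP Roman8."""
    encoded = ch.encode("hp_roman8")  # raises UnicodeEncodeError if unmapped
    return encoded[0]


for ch in ("A", "é", "Á", "Ñ"):
    print(ch, roman8_decimal(ch))
```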

Table B-2 Collating Sequence

Character  Decimal Equivalent  Description
           32                  Space
           160                 Do not use
0          48                  Zero
1          49                  One
2          50                  Two
3          51                  Three
4          52                  Four
5          53                  Five
6          54                  Six
7          55                  Seven
8          56                  Eight
9          57                  Nine
A          65                  Uppercase A
a          97                  Lowercase a
Á          224                 Uppercase A acute
á          196                 Lowercase a acute
À          161                 Uppercase A grave
à          200                 Lowercase a grave
Â          162                 Uppercase A circumflex
â          192                 Lowercase a circumflex
Ä          216                 Uppercase A umlaut/diaeresis
ä          204                 Lowercase a umlaut/diaeresis
Å          208                 Uppercase A degree
å          212                 Lowercase a degree
Ã          225                 Uppercase A tilde
ã          226                 Lowercase a tilde
B          66                  Uppercase B
b          98                  Lowercase b
C          67                  Uppercase C
c          99                  Lowercase c
Ç          180                 Uppercase C cedilla
ç          181                 Lowercase c cedilla
D          68                  Uppercase D
d          100                 Lowercase d
Đ          227                 Uppercase D stroke
đ          228                 Lowercase d stroke
E          69                  Uppercase E
e          101                 Lowercase e
É          220                 Uppercase E acute
é          197                 Lowercase e acute
È          163                 Uppercase E grave
è          201                 Lowercase e grave
Ê          164                 Uppercase E circumflex
ê          193                 Lowercase e circumflex
Ë          165                 Uppercase E umlaut/diaeresis
ë          205                 Lowercase e umlaut/diaeresis
F          70                  Uppercase F
f          102                 Lowercase f
G          71                  Uppercase G
g          103                 Lowercase g
H          72                  Uppercase H
h          104                 Lowercase h
I          73                  Uppercase I
i          105                 Lowercase i
Í          229                 Uppercase I acute
í          213                 Lowercase i acute
Ì          230                 Uppercase I grave
ì          217                 Lowercase i grave
Î          166                 Uppercase I circumflex
î          209                 Lowercase i circumflex
Ï          167                 Uppercase I umlaut/diaeresis
ï          221                 Lowercase i umlaut/diaeresis
J          74                  Uppercase J
j          106                 Lowercase j
K          75                  Uppercase K
k          107                 Lowercase k
L          76                  Uppercase L
l          108                 Lowercase l
M          77                  Uppercase M
m          109                 Lowercase m
N          78                  Uppercase N
n          110                 Lowercase n
Ñ          182                 Uppercase N tilde
ñ          183                 Lowercase n tilde
O          79                  Uppercase O
o          111                 Lowercase o
Ó          231                 Uppercase O acute
ó          198                 Lowercase o acute
Ò          232                 Uppercase O grave
ò          202                 Lowercase o grave
Ô          223                 Uppercase O circumflex
ô          194                 Lowercase o circumflex
Ö          218                 Uppercase O umlaut/diaeresis
ö          206                 Lowercase o umlaut/diaeresis
Õ          233                 Uppercase O tilde
õ          234                 Lowercase o tilde
Ø          210                 Uppercase O crossbar
ø          214                 Lowercase o crossbar
P          80                  Uppercase P
p          112                 Lowercase p
Q          81                  Uppercase Q
q          113                 Lowercase q
R          82                  Uppercase R
r          114                 Lowercase r
S          83                  Uppercase S
s          115                 Lowercase s
Š          235                 Uppercase S caron
š          236                 Lowercase s caron
T          84                  Uppercase T
t          116                 Lowercase t
U          85                  Uppercase U
u          117                 Lowercase u
Ú          237                 Uppercase U acute
ú          199                 Lowercase u acute
Ù          173                 Uppercase U grave
ù          203                 Lowercase u grave
Û          174                 Uppercase U circumflex
û          195                 Lowercase u circumflex
Ü          219                 Uppercase U umlaut/diaeresis
ü          207                 Lowercase u umlaut/diaeresis
V          86                  Uppercase V
v          118                 Lowercase v
W          87                  Uppercase W
w          119                 Lowercase w
X          88                  Uppercase X
x          120                 Lowercase x
Y          89                  Uppercase Y
y          121                 Lowercase y
Ÿ          238                 Uppercase Y umlaut/diaeresis
ÿ          239                 Lowercase y umlaut/diaeresis
Z          90                  Uppercase Z
z          122                 Lowercase z
Þ          240                 Uppercase thorn
þ          241                 Lowercase thorn
           177-178             Currently undefined
           242-245             Currently undefined
(          40                  Left parenthesis
)          41                  Right parenthesis
[          91                  Left bracket
]          93                  Right bracket
{          123                 Left brace
}          125                 Right brace
«          251                 Left guillemets
»          253                 Right guillemets
<          60                  Less than sign
>          62                  Greater than sign
=          61                  Equal sign
+          43                  Plus
-          45                  Minus
±          254                 Plus/Minus
¼          247                 One quarter
½          248                 One half
°          179                 Degree (ring)
%          37                  Percent sign
*          42                  Asterisk
.          46                  Period (point)
,          44                  Comma
;          59                  Semicolon
:          58                  Colon
¿          185                 Inverse question mark
?          63                  Question mark
¡          184                 Inverse exclamation point
!          33                  Exclamation point
/          47                  Slant
\          92                  Reverse slant
|          124                 Vertical bar
@          64                  Commercial at
&          38                  Ampersand
#          35                  Number sign (hash)
§          189                 Section
$          36                  U.S. dollar sign
¢          191                 U.S. cent sign
£          187                 British pound sign
₤          175                 Italian lira sign
¥          188                 Japanese yen sign
ƒ          190                 Dutch guilder sign
¤          186                 General currency sign
"          34                  Double quote
`          96                  Opening single quote
'          39                  Closing single quote
^          94                  Caret
~          126                 Tilde
´          168                 Accent acute
`          169                 Accent grave
ˆ          170                 Accent circumflex
¨          171                 Umlaut/Diaeresis
˜          172                 Tilde accent
_          95                  Underscore
           246                 Long dash
¯          176                 Overline
ª          249                 Feminine ordinal sign
º          250                 Masculine ordinal sign
■          252                 Solid
           0-31                Control codes
           127                 DEL
           128-159             Undefined control codes
           255                 Do not use

 

NOTE: The Æ (uppercase AE ligature) and æ (lowercase ae ligature) are expanded for collating purposes to AE and ae, and collate as:
ad AE Ae aE ae AF
The ß (sharp s) is expanded for collating purposes to ss and collates according to the German standard as:
sr ss st
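
The expansion described in this note can be illustrated with a small Python sketch. It is not the NLS implementation: `EXPANSIONS`, `weight`, `expand`, and `collate_key` are hypothetical names, only unaccented ASCII letters plus the three expanded characters are handled, and uppercase is assumed to take priority over lowercase as in Table B-2.

```python
# Expansions named in the note; every other character maps to itself.
EXPANSIONS = {"Æ": "AE", "æ": "ae", "ß": "ss"}


def weight(ch):
    """Return (group, priority) for an unaccented ASCII letter.

    ASSUMPTION: the group is the letter regardless of case, and uppercase
    (priority 0) precedes lowercase (priority 1).
    """
    return (ch.upper(), 0 if ch.isupper() else 1)


def expand(text):
    """Replace each ligature or sharp s with its two-letter expansion."""
    return "".join(EXPANSIONS.get(ch, ch) for ch in text)


def collate_key(text):
    """Sort key: groups of the expanded string first, then priorities."""
    weights = [weight(ch) for ch in expand(text)]
    return (tuple(g for g, _ in weights), tuple(p for _, p in weights))


ae_order = sorted(["AF", "ae", "aE", "Ae", "Æ", "ad"], key=collate_key)
ss_order = sorted(["st", "ß", "sr"], key=collate_key)
print(ae_order)  # ['ad', 'Æ', 'Ae', 'aE', 'ae', 'AF']
print(ss_order)  # ['sr', 'ß', 'st']
```

Expanding before the weights are looked up is what places Æ where AE would fall and ß between sr and st.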

Table B-3 “Spanish Language-Dependent Variations” through Table B-6 “Finnish Language-Dependent Variations” show the language-dependent variations to the collating sequence.
