| ||
|
free Oracle DBA tutorial
Oracle Jobs Ask A Question SQL Statement Tuning Backup and Recovery Concepts Oracle 11g New Features Oracle E Suite & Others Oracle Data Guard Oracle DBA FAQ |
Oracle provides several different types of sort, and can achieve a linguistically correct sort as well as the new multilingual ISO standard (14651), which is designed to handle many languages at the same time.
When characters used in other languages are present, a binary sort generally does not produce reasonable results. For example, an ascending ORDER BY query would return the character strings ABC, ABZ, BCD, ÄBC, in the sequence, when the Ä has a higher numeric value than B in the character encoding scheme. For languages using Chinese characters, a binary sort is not usually linguistically meaningful.
Oracle offers two kinds of linguistic sorts:
Oracle makes two passes when comparing strings in monolingual sorts. The first pass is to compare the major value of entire string from the major table and the second pass is to compare the minor value from the minor table. Each major table entry contains the Unicode codepoint and major value. Usually, letters with the same appearance will have the same major value.
Multilingual Linguistic Sorts
Oracle9i provides multilingual linguistic sorts so that Oracle DBA can sort more than one language as part of one sort. This is useful for certain regions or languages that have complex sorting rules or global multilingual databases. Additionally, Oracle9i still supports all the sort orders defined by previous releases.
For Asian language data or multilingual data, Oracle provides a sorting mechanism based on an ISO standard (ISO14651) and the Unicode 3.0 standard. Multilingual linguistic sorts also work for Asian language sorts ordered by the number of strokes, PinYin, or radicals. In addition, they can handle canonical equivalence and surrogate codepoint pairs. Oracle DBA can define up to 1.1 million codepoints in one sort.
For example, in Oracle9i, a French sort is supported, but the new multilingual linguistic sort for French can also be applied by changing the sort order from French to French_M, where _M represents the new ISO standard (ISO 14651) for multilingual sorting. By doing so, the sorting order will be based on the GENERIC_M sorting order (ISO standard) and will be able to sort at the secondary level from right to left. Oracle Corporation recommends using a multilingual linguistic sort if the tables contain multilingual data. If the tables contain only pure French, for memory usage concern, a French monolingual sort may get better performance. You have to make a trade-off between extensibility and performance.
To use a multilingual linguistic sort, Oracle DBA can specify the default sort order by setting the environment variable NLS_SORT or using the NLSSORT function.
% setenv NLS_SORT='French_M'
or
NLSSORT('','NLS_SORT=French_M')
Oracle DBA can create new linguistic indexes based on multilingual linguistic sorts just as Oracle DBA did for monolingual linguistic sorts. However, if you do not want to change every index hint of your SQL statements, but you do need to use those new multilinguistic sorts. Here are the steps:
DROP INDEX index_table1;
A primary level sort distinguishes between base characters, such as the difference between characters a and b. It is up to individual locales to define if a is before b, b is before a, or they are equal. The binary representation of the characters is completely irrelevant. If a character is an ignorable character, it is assigned a primary level weight (or order) of zero, which means it is ignored at the primary level. Ignorable characters on all other levels are also defined by the use of weight zero. At the primary level, the following words are not distinguished between uppercase and lowercase words and can appear in different orders.
Bat
Secondary Level Sorts
A secondary level sort distinguishes between base characters (the primary level sort), plus it distinguishes the different diacritical marks on a given base character. For example, the character Ä differs from the character A only because it has a diacritical mark. Thus, Ä and A differ on the secondary level but they are the same on the primary level because they are derived from the same base character A.
SELECT words FROM rdictionary; --with a secondary level sort
A tertiary level sort distinguishes between base characters (primary level sort), diacritical marks (secondary level sort), and the different cases of characters. It can also include difference of special characters such as +, -, and *. For example, characters a and A are different on the tertiary level and equal on the primary and secondary levels because they only have a case difference. Another example is that characters ä and A are equal on the primary level and different on secondary and tertiary levels. The final example is that the primary order for the dash character - is 0. That is, it is ignored on the primary and secondary levels. If a dash is compared with another character whose primary level order is nonzero, for example, the character u, then no result for the primary level is available because u is not compared with anything. In this case, Oracle finds a difference between - and u, but only at the tertiary level. For example:
SELECT words FROM rdictionary; --with a tertiary level sort
When sorting, some characters can or should be ignored. For example, a dash in multi-lingual could be treated the same for sorting purposes as multilingual. These characters are called ignorable characters. There are two kinds of ignorable characters: accents and punctuation.
Examples of ignorable accent characters:
kaa -- kaa and ka-- are the same
Canonical Equivalence
One Unicode code point may be equivalent to a sequence of base character code points plus combining characters (accent marks) code points, regardless of locales in use. This is called the Unicode canonical equivalence. For example, ä equals its base letter a and a combining character diaeresis. A linguistic flag, Canonical_equivalence=TRUE, which you can set in the definition file, indicates that all canonical equivalence rules defined in Unicode 3.0 need to be applied. You can change this flag to FALSE to speed up the comparison and ordering functions if all the data is in its composed form.
Surrogate Characters
You can extend UTF-16 and UTF-8 to encode more than 1 million characters. These extended characters are called surrogate pairs. Multilingual linguistic sorts can support up to one million surrogate pairs. However, surrogate characters cannot be defined as contracting characters, expanding characters, or context-sensitive characters.
Reverse Secondary Sorting
In French, the sorting of strings with accented characters compares base characters from left to right, but compares accented characters from right to left. For example, by default, an accented character is placed after its unaccented variant. Then the two strings Èdit and Edít are in proper French order. They are equal on the primary level, and the secondary order is determined by examining accented characters from right to left. Individual locales can request that the accented characters and related diacritical marks be sorted with the right-to-left rule. This is specified in a locale specification file by using a linguistic flag.
Character Rearrangement for Thai/Lao Characters
In Thai and Lao, some characters must first be swapped with their following character before sorting. Normally, these types of character are symbols representing vowel sounds, and the next character will be a consonant. Consonants and vowels must be swapped before sorting. A linguistic flag is used to enable you to specify all the characters to be swapped.
Base Letters
Base letters are defined in a base letter table, which maps each letter to its base letter. For example, a, A, ä, and Ä all map to a, which is the base letter. This concept is particularly relevant for working with Oracle9i Text.
Special Letters
Special letters is the term used in monolingual sorts. They are called expanding characters in multilingual sorts.
Special Combination Letters
Special combination letters is the term used in monolingual sorts. They are called contracting letters in multilingual sorts.
Special Uppercase Letters
One lowercase letter may map to multiple uppercase letters. For example, in traditional German, the uppercase letters for ß are SS.
Special cases like these are also handled when converting uppercase characters to lowercase, and vice versa. Such case-conversion issues are handled by the NLS_UPPER, NLS_LOWER, and NLS_INITCAP functions, according to the conventions established by the linguistic sort sequence. (The standard functions UPPER, LOWER, and INITCAP cannot handle these special cases.)
Special Lowercase Letters
Oracle supports special lowercase letters, so one letter may map to multiple base letters. An example is the Turkish uppercase I becoming a small, dotless i.
You can create a function-based index that uses languages other than English. The index does not change the linguistic sort order determined by NLS_SORT. The index simply improves the performance. An example is:
CREATE INDEX nls_index ON my_table (NLSSORT(name, 'NLS_SORT = German'));
So
SELECT * FROM my_table WHERE NLSSORT(name) IS NOT NULL
returns the result much faster than without an index.
If you want to use a single linguistic index or multiple linguistic indexes, some requirements must be met for the linguistic index. The first requirement is that the QUERY_REWRITE_ENABLED initialization parameter must be set to TRUE. This is not a specific requirement for linguistic indexes, but for all function-based indexes. Here is an example of setting QUERY_REWRITE_ENABLED:
ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;
The second requirement, which is specific to linguistic indexes, is that NLS_COMP needs to be set to ANSI. There are various ways to set NLS_COMP. For example:
ALTER SESSION SET NLS_COMP = ANSI;
The third requirement is that NLS_SORT needs to indicate the linguistic definition you want to use for the linguistic sort. If you want a French linguistic sort order, NLS_SORT needs to be set to FRENCH. If you want a German linguistic sort order, NLS_SORT needs to beset to GERMAN.
There are various ways to set NLS_SORT. Although the following example uses an ALTER SESSION statement, it is probably better for you to set NLS_SORT as a client environment variable so that you can use the same SQL statements for all languages and different linguistic indexes can be used, based on NLS_SORT being set in the client environment. The following is an example of setting NLS_SORT:
ALTER SESSION SET NLS_SORT='FRENCH';
The fourth requirement is that you need to use the cost-based optimizer with the optimizer mode set to FIRST_ROWS, because linguistic indexes are not recognized by the rule-based optimizer. The following is an example of setting the optimizer mode:
ALTER SESSION SET OPTIMIZER_MODE = FIRST_ROWS;
The last thing is that you need to set WHERE NLSSORT(column_name) to IS NOT NULL when you want to use ORDER BY column_name, where column_name is the column with the linguistic index. This is necessary only when you use an ORDER BY clause.
The following example shows how to set up a French linguistic index. For NLS_SORT, you may want to set it in the client environment variable instead of with the ALTER SESSION statement.
ALTER SESSION SET QUERY_REWRITE_ENABLED=TRUE;
Case-Insensitive Searching
You can create a function-based index that improves the performance of case-insensitive searches. For example:
CREATE INDEX case_insensitive_ind ON my_table(NLS_UPPER(empname));
Customizing Linguistic Sorts
You can customize sorting with the Locale Builder Utility.
Previous Chapter | Next Chapter
More Tutorials on Oracle dba ... Want to share or request Oracle Tutorial articles to become a Oracle DBA. Direct your requests to webmaster@oracleonline.info |
|