Foreign Language Issues Guide

Right to Left Languages

It appears that to get “Arabic” style typing you must set the component orientation to ComponentOrientation.RIGHT_TO_LEFT (setting the locale alone seems to do nothing) and type in a font that supports a right to left script. Typing in that font doesn’t seem to be enough, at least not via the Alt+numeric keypad method, but pasting right to left text from the clipboard does work.

Control+Shift+O toggles orientation in Swing text components. But to truly type RTL you still must use characters (Arabic or Hebrew, for example) that have an RTL orientation. In other words, the direction in which the text runs depends on the characters you are typing, not on the orientation of the component; the component orientation only switches between left and right justification.
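The point that direction comes from the characters themselves can be checked with java.text.Bidi (listed in the references). The following is a minimal sketch; the class name DirectionDemo and the method direction() are made up for illustration.

```java
import java.text.Bidi;

// Sketch: classify the direction of a string from its characters alone.
// DirectionDemo and direction() are hypothetical names, not part of any API.
final class DirectionDemo {

    /** Returns "LTR", "RTL" or "MIXED" for the given text. */
    static String direction(String text) {
        // DIRECTION_DEFAULT_LEFT_TO_RIGHT derives the base direction from the
        // first strongly directional character and falls back to LTR.
        Bidi bidi = new Bidi(text, Bidi.DIRECTION_DEFAULT_LEFT_TO_RIGHT);
        if (bidi.isLeftToRight()) {
            return "LTR";
        }
        if (bidi.isRightToLeft()) {
            return "RTL";
        }
        return "MIXED";
    }

    public static void main(String[] args) {
        System.out.println(direction("hello"));
        // Hebrew letters shin, lamed, vav, final mem
        System.out.println(direction("\u05E9\u05DC\u05D5\u05DD"));
        // Latin text followed by Hebrew text
        System.out.println(direction("abc \u05E9\u05DC\u05D5\u05DD"));
    }
}
```

No component orientation is involved here: the same string reports the same direction whatever component it is later displayed in.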

Copy/Cut & Paste between applications is unreliable. Java appears to get the right to left character order correct but MS Word doesn’t.

Collation (sorting)

Sorting rules are language dependent. Java handles this with a java.text.Collator rather than String.equals() or String.compareTo(), which only compare raw character values:

Collator collate = Collator.getInstance(locale);

Then use:

if ( collate.compare(string1,string2) > 0 ) ….

A Java collection would be sorted with:

Collections.sort(list,collate);
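Putting the pieces together, here is a small self-contained sketch that contrasts the natural String order with a collated order. The class name, method names and word list are illustrative only.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch: natural String ordering versus Collator ordering.
final class CollatorSortDemo {

    /** Sorts a copy of the list with String.compareTo(). */
    static List<String> naturalOrder(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy);
        return copy;
    }

    /** Sorts a copy of the list with the collator for the given locale. */
    static List<String> collatedOrder(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        Collections.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("banana", "Cherry", "apple");
        // compareTo() places every uppercase letter before every lowercase one
        System.out.println(naturalOrder(words));
        // the collator orders by letter first and by case only as a tie-breaker
        System.out.println(collatedOrder(words, Locale.ENGLISH));
    }
}
```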

A collator can be further refined by calling setStrength() to change the “strength” from the default of Collator.TERTIARY to another setting (Collator.PRIMARY, Collator.SECONDARY or Collator.IDENTICAL).
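As a rough illustration of what the strength setting changes, the sketch below compares the same pairs of strings at different strengths using the English collator. The helper compareAt() is a made-up name, and which differences count at which level is locale dependent.

```java
import java.text.Collator;
import java.util.Locale;

// Sketch: the same comparison at different collator strengths.
final class CollatorStrengthDemo {

    /** Compares a and b with an English collator at the given strength. */
    static int compareAt(int strength, String a, String b) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        return collator.compare(a, b);
    }

    public static void main(String[] args) {
        String plain = "resume";
        String accented = "r\u00E9sum\u00E9"; // "resume" with acute accents on both e's
        String upper = "RESUME";

        // PRIMARY: only base letters matter, accents and case are ignored
        System.out.println(compareAt(Collator.PRIMARY, plain, accented) == 0);
        // SECONDARY: accents matter, case is still ignored
        System.out.println(compareAt(Collator.SECONDARY, plain, accented) == 0);
        System.out.println(compareAt(Collator.SECONDARY, plain, upper) == 0);
        // TERTIARY (the default): case matters too
        System.out.println(compareAt(Collator.TERTIARY, plain, upper) == 0);
    }
}
```

Collator.PRIMARY is a common choice for accent-insensitive and case-insensitive matching.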

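Java’s Collator can also handle “decomposition”. Decomposition is the difference between á written as a single precomposed character (U+00E1) and á written as the letter a followed by a combining acute accent (U+0061 U+0301). Both appear as á, but one is a single character and the other is a character plus a combining mark. A collator strength of IDENTICAL (with decomposition left at NO_DECOMPOSITION) considers these as different; all other strengths will not see the combining mark as a separate letter. Calling setDecomposition(Collator.CANONICAL_DECOMPOSITION) makes the collator treat the two forms as equivalent.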
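A minimal sketch of the two settings discussed above, comparing precomposed á with a plus a combining acute accent. The class and method names are illustrative only.

```java
import java.text.Collator;
import java.util.Locale;

// Sketch: precomposed versus decomposed accented characters in a Collator.
final class DecompositionDemo {

    static final String PRECOMPOSED = "\u00E1"; // a-acute as one character
    static final String DECOMPOSED = "a\u0301"; // a + COMBINING ACUTE ACCENT

    /** Compares the two forms with the given strength and decomposition mode. */
    static int compareForms(int strength, int decomposition) {
        Collator collator = Collator.getInstance(Locale.ENGLISH);
        collator.setStrength(strength);
        collator.setDecomposition(decomposition);
        return collator.compare(PRECOMPOSED, DECOMPOSED);
    }

    public static void main(String[] args) {
        // Canonical decomposition: both forms reduce to the same sequence
        System.out.println(
            compareForms(Collator.TERTIARY, Collator.CANONICAL_DECOMPOSITION) == 0);
        // IDENTICAL without decomposition: the two forms are told apart
        System.out.println(
            compareForms(Collator.IDENTICAL, Collator.NO_DECOMPOSITION) == 0);
    }
}
```

Decomposition makes comparisons slower, so it is usually enabled only when the input may contain combining marks.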

Java 6 adds java.text.Normalizer, which converts strings to a standard Unicode normalization form so that character sequences that look the same and mean the same end up with the same representation. An example is: “the Unicode character 'Ç' (LATIN CAPITAL LETTER C WITH CEDILLA) has the Unicode character value U+00C7. The Unicode character sequence U+0043 U+0327 also creates the 'Ç' character. The sequence contains the character values for LATIN CAPITAL LETTER C followed by the COMBINING CEDILLA. The single character and the character sequence are canonically equivalent because they are visually indistinguishable and mean exactly the same for the purposes of text comparison and rendering.”
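The 'Ç' example can be reproduced in a few lines. This is only a sketch; canonicallyEqual() is a hypothetical helper, and NFC is just one reasonable choice of target form.

```java
import java.text.Normalizer;

// Sketch: comparing strings after Unicode normalization.
final class NormalizerDemo {

    /** True if a and b are equal once both are normalized to NFC. */
    static boolean canonicallyEqual(String a, String b) {
        String na = Normalizer.normalize(a, Normalizer.Form.NFC);
        String nb = Normalizer.normalize(b, Normalizer.Form.NFC);
        return na.equals(nb);
    }

    public static void main(String[] args) {
        String single = "\u00C7";    // LATIN CAPITAL LETTER C WITH CEDILLA
        String sequence = "C\u0327"; // C followed by COMBINING CEDILLA

        System.out.println(single.equals(sequence));           // raw comparison
        System.out.println(canonicallyEqual(single, sequence)); // normalized
        // NFD goes the other way and splits the single character in two
        System.out.println(
            Normalizer.normalize(single, Normalizer.Form.NFD).length());
    }
}
```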

Chinese characters are sorted by the order in which the strokes are drawn and the total number of strokes. [Page 22]

Latin alphabets – in German, “Ä” and “A” are the same letter but have a different sound (that in itself seems strange since the purpose of a letter is essentially to record a sound!), whereas in Swedish “Ä” is a completely different letter! [Page 39] So when sorting, Swedish ä will come at the end of the alphabet, whereas German ä will sort alongside a at the beginning.

Note: Collator will perform a modern Spanish sort. If you want traditional Spanish sorting (where “ch” is treated as one letter between c and d, and “ll” is treated as one letter between l and m) then you must customize your collator. [Page 183] Other languages have similar special cases, such as “combining characters” (German ü is equivalent to ue, etc). [Page 185]
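One way to customize the collator is to append extra rules to the rules of the stock Spanish collator and build a RuleBasedCollator from the result. The sketch below assumes that Collator.getInstance() returns a RuleBasedCollator, which is the usual case in the JDK but is not guaranteed; the class name, method name and rule string are illustrative and not taken from the book.

```java
import java.text.Collator;
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.Locale;

// Sketch: traditional Spanish sorting via additional collation rules.
final class TraditionalSpanishDemo {

    /** Builds a collator in which "ch" sorts after c and "ll" sorts after l. */
    static RuleBasedCollator traditionalSpanish() {
        // Assumption: the JDK hands back a RuleBasedCollator for Spanish.
        RuleBasedCollator modern =
            (RuleBasedCollator) Collator.getInstance(Locale.forLanguageTag("es"));
        // "& C < ch" places the two-letter sequence ch directly after c/C as a
        // letter in its own right; the same is done for ll after l/L.
        String addOn = "& C < ch, cH, Ch, CH & L < ll, lL, Ll, LL";
        try {
            return new RuleBasedCollator(modern.getRules() + addOn);
        } catch (ParseException e) {
            throw new IllegalStateException("Bad collation rules", e);
        }
    }

    public static void main(String[] args) {
        Collator modern = Collator.getInstance(Locale.forLanguageTag("es"));
        Collator traditional = traditionalSpanish();
        // Sign of the result: negative means "chico" sorts before "cuna"
        System.out.println(Integer.signum(modern.compare("chico", "cuna")));
        System.out.println(Integer.signum(traditional.compare("chico", "cuna")));
    }
}
```

With the extra rules, every word starting with “ch” sorts after all other words starting with c but before words starting with d.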

Searching

Searching has the same issues as sorting, but Java lacks a built-in solution. How to write a custom “indexOf” is described on page 190.
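For a rough idea of what such a method can look like, here is a deliberately naive sketch. It is not the book's implementation: it slides a window of the pattern's length over the text and compares each window with a Collator, so it misses matches whose length differs from the pattern (for example decomposed accents). The names collatedIndexOf() and looseCollator() are made up.

```java
import java.text.Collator;
import java.util.Locale;

// Sketch: a naive locale-aware indexOf built on Collator.compare().
final class CollatedSearchDemo {

    /** A collator that ignores accent and case differences. */
    static Collator looseCollator(Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        return collator;
    }

    /**
     * Returns the index of the first window of pattern.length() characters
     * that the collator considers equal to the pattern, or -1 if none.
     */
    static int collatedIndexOf(String text, String pattern, Collator collator) {
        int n = pattern.length();
        for (int start = 0; start + n <= text.length(); start++) {
            String window = text.substring(start, start + n);
            if (collator.compare(window, pattern) == 0) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Collator loose = looseCollator(Locale.FRENCH);
        String text = "Le Caf\u00E9 est ouvert";
        System.out.println(text.indexOf("cafe"));                 // plain search
        System.out.println(collatedIndexOf(text, "cafe", loose)); // collated search
    }
}
```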

Text Boundaries

Some languages don’t use “spaces” to separate words like English does. Parsing such text properly requires the java.text.BreakIterator class.
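A minimal sketch of word extraction with BreakIterator follows. The example text is English so the result is easy to check by eye; the same code takes any locale. The class and method names are illustrative.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Sketch: splitting text into words with BreakIterator instead of split(" ").
final class WordBoundaryDemo {

    /** Returns the words of the text, skipping punctuation and whitespace. */
    static List<String> words(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);

        List<String> result = new ArrayList<>();
        int start = boundaries.first();
        for (int end = boundaries.next();
             end != BreakIterator.DONE;
             start = end, end = boundaries.next()) {
            String token = text.substring(start, end);
            // Boundaries also delimit spaces and punctuation; keep only the
            // tokens that begin with a letter or a digit.
            if (Character.isLetterOrDigit(token.codePointAt(0))) {
                result.add(token);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! How are you?", Locale.ENGLISH));
    }
}
```

BreakIterator also offers getCharacterInstance(), getSentenceInstance() and getLineInstance() for the other kinds of boundaries.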

Fonts

The “Bitstream Cyberbit” font (12.5 MB) supports 30,000 glyphs and is available from: ftp://ftp.netscape.com/pub/communicator/extras/fonts/windows


References:

Unless otherwise noted, page references refer to “Java Internationalization” by Andrew Deitsch & David Czarnecki (March 2001).

http://scripts.sil.org/cms/scripts/page.php?site_id=nrsi&item_id=WSI_Guidelines_Sec_7 – provides good information about keyboards and data entry methods.

http://mindprod.com/jgloss/hebrew.html - comments from one person who tested Hebrew in Java.

http://java.sun.com/j2se/1.4.2/docs/api/java/text/Bidi.html - Sun class that tracks right to left and left to right orientation of a string.

http://www.oreilly.com/catalog/javaint/ - book on Java Internationalization

 

http://forum.java.sun.com/thread.jspa?threadID=181875&messageID=2090007

javax.swing.JTextPane
To justify a paragraph, use the StyleConstants.Alignment attribute with a value of StyleConstants.ALIGN_JUSTIFY for the paragraph.
Here is an article which explains how to use AttributeSets and Styles:
http://java.sun.com/products/jfc/tsc/articles/text/attributes/
The actual code will look something like this:

SimpleAttributeSet sa = new SimpleAttributeSet();
StyleConstants.setAlignment(sa, StyleConstants.ALIGN_JUSTIFY);
StyledDocument doc = textPane.getStyledDocument();
doc.setParagraphAttributes(0, doc.getLength(), sa, false);