International Components for Unicode
International Components for Unicode (ICU) is a mature, open source set of C/C++ and Java libraries for Unicode support, software internationalization and software globalization. ICU is portable across many operating systems and environments, and it gives applications the same results on all platforms and between C/C++ and Java software. The project is sponsored, supported and used by IBM and many other companies.
ICU provides the following services, among others:
- Text: Unicode text handling, full character properties and character set conversions
- Analysis: Unicode regular expressions; full Unicode sets; character, word and line boundaries
- Comparison: Language-sensitive collation and searching
- Transformations: Normalization, upper/lowercase conversion, script transliterations
- Locales: Comprehensive locale data and resource bundle architecture
- Complex Text Layout: Arabic, Hebrew, Indic and Thai
- Time: Multi-calendar and time zone calculations
- Formatting and Parsing: Dates, times, numbers, currencies, messages and rule-based formats
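To make a few of these services concrete, here is a minimal sketch in Java. It deliberately uses the JDK's own java.text classes rather than ICU4J so that it runs without extra dependencies; ICU4J offers counterparts such as Collator and BreakIterator in its com.ibm.icu.text package, and their behavior and locale data may differ from what the JDK provides. The class name I18nServicesSketch and its helper methods are hypothetical, chosen only for illustration.

```java
import java.text.BreakIterator;
import java.text.Collator;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of ICU or the JDK.
final class I18nServicesSketch {

    // Comparison: sort with language-sensitive collation instead of raw code point order.
    static List<String> sortForLocale(List<String> words, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator);
        return sorted;
    }

    // Comparison: at PRIMARY strength, case and accent differences are ignored.
    static boolean sameBaseLetters(String a, String b, Locale locale) {
        Collator collator = Collator.getInstance(locale);
        collator.setStrength(Collator.PRIMARY);
        // Decompose precomposed characters so accents are compared consistently.
        collator.setDecomposition(Collator.CANONICAL_DECOMPOSITION);
        return collator.compare(a, b) == 0;
    }

    // Transformations: normalize text to Unicode Normalization Form C (composed).
    static String toNfc(String text) {
        return Normalizer.normalize(text, Normalizer.Form.NFC);
    }

    // Analysis: count words by walking word boundaries and skipping
    // segments that start with punctuation or whitespace.
    static int countWords(String text, Locale locale) {
        BreakIterator boundaries = BreakIterator.getWordInstance(locale);
        boundaries.setText(text);
        int count = 0;
        int start = boundaries.first();
        for (int end = boundaries.next(); end != BreakIterator.DONE;
                start = end, end = boundaries.next()) {
            if (Character.isLetterOrDigit(text.codePointAt(start))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("cherry", "Banana", "apple");
        System.out.println(sortForLocale(words, Locale.ENGLISH));
        System.out.println(sameBaseLetters("resume", "R\u00e9sum\u00e9", Locale.ENGLISH));
        System.out.println(toNfc("e\u0301").length());
        System.out.println(countWords("Hello, world!", Locale.ENGLISH));
    }
}
```

A plain String sort would place "Banana" before "apple" because uppercase letters have lower code points; the collator orders by letter first and treats case as a minor difference, which is what readers of most languages expect.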
Origin and Development
The initial work for ICU came from a company called Taligent, which later became a part of IBM. This software project grew into the JDK 1.1 internationalization APIs, which were contributed to Sun Microsystems by the ICU team. A large portion of this work still exists as the java.text.* package. ICU was released as an open source project in 1999 under the name "IBM Classes for Unicode". It was later renamed to "International Components for Unicode".
Originally, ICU was written completely in Java. Its functionality was later rewritten and extended in C and C++ to fill in the internationalization shortcomings inherent in those languages. An operating system usually provides this kind of functionality, but support for internationalization APIs is often inconsistent across operating systems or missing altogether.
The Java version exists today as ICU4J, and the C/C++ version exists today as ICU4C. Both sub-projects remain under active development, with the aim of providing up-to-date Unicode and internationalization (i18n) support.