|
RFC 603
Title: Add Unicode as a String Constant
Submitted by: Curtis Parks

Description of Problem

Adopting the UTF-8 encoding of ISO/IEC 10646-1, together with the new Global, would provide full internationalization for IGES while also providing full backward compatibility with file systems, parsers, and other software that rely on US-ASCII values.

While XML, for one, specifies 10646 in UTF-8 or UTF-16, it is not clear whether UTF-16 will be used much, given the clear advantage of the UTF-8 encoding. An Internet Society RFC presents their acceptance of this encoding:

"Character values from 0000 0000 to 0000 007F (US ASCII repertoire) correspond to octets 00 to 7F (7 bit US-ASCII values). A direct consequence is that a plain ASCII string is also a valid UTF-8 string."

The above was quoted from ftp://ftp.isi.edu/in-notes/rfc2279.txt

Note: The Proposed Solution implements a proposal originated by Ed A. Reid on 4/24/00.
Proposed Solution

Add to References:

[IETF98] F. Yergeau, UTF-8, a Transformation Format of ISO 10646, Internet Engineering Task Force (IETF) RFC 2279, URL ftp://ftp.isi.edu/in-notes/rfc2279.txt, January 1998 (URL valid November 2000).

Replace the 2.2 title to read: "Section 2.2 File Formats"

Add into 2.2.2.3 String Data Type:

Add to Table 1:

Add Section 2.2.4.3.27 Character Set Identifier Flag. This "required, default" field specifies the character set used in string data types. The default is 1, which is interpreted as the ASCII character set. A value of 2 specifies the Unicode (ISO/IEC 10646-1) multi-octet character set, encoded using the UTF-8 character encoding scheme [IETF98]. Note that the universal character set (UCS) encoded in UTF-8 (UCS transformation formats) is the 8-bit encoding which also preserves backward compatibility with the full US-ASCII repertoire.

Change sentence in figures 2 and 3 to read:

Posted for comment 9/20/00 |
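
The backward-compatibility argument and the proposed flag values can be illustrated with a minimal Python sketch. It is not part of the RFC text: the function name `decode_string_data` and the table `CHARSET_CODECS` are hypothetical, and only the two flag values described above (1 for ASCII, 2 for UTF-8) are assumed. How a real IGES processor reads the flag from the Global Section is not shown.

```python
# Hypothetical mapping from the proposed Character Set Identifier Flag
# to a Python codec name (1 = ASCII, 2 = ISO/IEC 10646-1 in UTF-8).
CHARSET_CODECS = {1: "ascii", 2: "utf-8"}


def decode_string_data(raw: bytes, charset_flag: int = 1) -> str:
    """Decode the octets of a string constant according to the flag.

    The flag defaults to 1 (ASCII), mirroring the proposed default.
    """
    if charset_flag not in CHARSET_CODECS:
        raise ValueError(f"unsupported character set flag: {charset_flag}")
    return raw.decode(CHARSET_CODECS[charset_flag])


# A plain ASCII string has identical octets under both encodings,
# so data written by an ASCII-only system is also valid UTF-8.
plain = "IGES PART 42"
assert plain.encode("ascii") == plain.encode("utf-8")
assert decode_string_data(plain.encode("ascii"), charset_flag=2) == plain

# Characters outside US-ASCII become multi-octet sequences in UTF-8.
text = "Größe 10 µm"
octets = text.encode("utf-8")
assert decode_string_data(octets, charset_flag=2) == text
assert len(octets) > len(text)
```

Under these assumptions, a reader that honors flag value 2 can process legacy ASCII files unchanged, while a reader limited to flag value 1 rejects any octet above 7F instead of misinterpreting it.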