Oracle: determining whether data is UTF-8




Subject: Determining if the data is valid UTF-8
Doc ID: Note:162608.1
Type: PROBLEM
Last Revision Date: 18-NOV-2004
Status: REVIEWED

Problem Description
-------------------
JDBC programs often get exceptions when converting character data from the database, even though the same data can be viewed in SQL*Plus. For example:

    java.sql.SQLException: Fail to convert between UTF8 and UCS2: failUTF8Conv

Solution Description
--------------------
Verify that the data consists of valid UTF-8 characters.

    SQL> SELECT dump(utf8_column) FROM utf8_table;

    DUMP(UTF8_COLUMN)
    --------------------------------------------------------------------------------
    Typ=1 Len=17: 80,228,101,112,112,101,114,32,69,110,101,114,103,105,32,65,66

In this example, the second byte is 228 (an 'a umlaut' in WE8ISO8859P1). If this byte is to be the start of a UTF-8 character, the following two bytes must be greater than 127 (see the rules below). Here they are 101 and 112, so the sequence is not valid UTF-8.

In this case, while the data can be viewed in SQL*Plus, it is not valid UTF-8 and cannot be converted to UCS2. The data must be scrubbed to work in Java. This is not a failure in Java. It is the forgiving nature of Oracle that makes the data appear OK in SQL*Plus, even though the stored data is not UTF-8.

You can also use the "dump" command with format 1017, as shown below. It displays as characters only the bytes that are valid characters in the database character set; in this output the invalid byte appears in hexadecimal (e4):

    SQL> SELECT dump(UTF8_COLUMN, 1017 ) FROM utf8;

    DUMP(UTF8_COLUMN,1017)
    ----------------------------------------------------------------------
    Typ=1 Len=17 CharacterSet=UTF8: P,e4,e,p,p,e,r, ,E,n,e,r,g,i, ,A,B

If this reports a character set other than UTF8 or US7ASCII, Java may have problems converting the data.

Explanation
-----------
The data in the database is invalid. The usual reason is that an OCI program (JDBC OCI8, OCI, Precompiler, SQL*Loader) loaded the data with an improper character set. It displays properly in SQL*Plus because 228 is a valid character in the host character set. It fails in Java because all character data in Java is UCS2, so the data must be converted, and the conversion from UTF-8 to UCS2 follows fixed rules.
Oracle will apply the rules and try to convert the characters. If the conversion fails, the data is passed as is (Garbage In! Garbage Out!).

The rules are as follows. When the first byte of the multi-byte character is:

    Decimal   Binary     Total number of bytes   Subsequent bytes
    <128      0xxxxxxx   1                       N/A
    >=192     110xxxxx   2                       10xxxxxx
    >=224     1110xxxx   3                       10xxxxxx
    >=240     11110xxx   4                       10xxxxxx
    >=248     111110xx   5                       10xxxxxx
    >=252     1111110x   6                       10xxxxxx

Note that all subsequent bytes, bytes 2 through 6 as needed, will be greater than 127 and less than 192. This is done to prevent collisions in decoding strings.
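The rules above can be applied mechanically to the decimal byte list that DUMP prints. The following Python sketch is one way to do that outside the database; it is an illustration, not part of the original note. The function names (parse_dump, expected_length, first_invalid_offset) are hypothetical, the parser assumes the plain "Typ=... Len=...: b1,b2,..." output format shown in this note, and the treatment of lead bytes 254 and 255 as invalid is an assumption, since the table does not list them. It checks only the byte-pattern rules from the table and nothing else.

```python
def parse_dump(dump_text):
    """Extract decimal byte values from DUMP() output such as
    'Typ=1 Len=3: 80,228,101' (assumed format: bytes follow the first colon)."""
    _, _, payload = dump_text.partition(":")
    return [int(tok) for tok in payload.split(",") if tok.strip()]


def expected_length(lead):
    """Total number of bytes in a character whose first byte is `lead`,
    following the table in this note. Returns 0 if `lead` cannot start
    a character."""
    if lead < 128:
        return 1
    if lead < 192:
        return 0  # 10xxxxxx is only valid as a subsequent byte
    if lead < 224:
        return 2
    if lead < 240:
        return 3
    if lead < 248:
        return 4
    if lead < 252:
        return 5
    if lead < 254:
        return 6
    return 0  # assumption: 254 and 255 are not covered by the table


def first_invalid_offset(data):
    """Return the 0-based offset of the first character that breaks the
    rules, or -1 if every byte sequence follows them."""
    i = 0
    while i < len(data):
        n = expected_length(data[i])
        if n == 0:
            return i
        tail = data[i + 1:i + n]
        # Every subsequent byte must be greater than 127 and less than 192.
        if len(tail) != n - 1 or any(not 128 <= b < 192 for b in tail):
            return i
        i += n
    return -1


sample = "Typ=1 Len=17: 80,228,101,112,112,101,114,32,69,110,101,114,103,105,32,65,66"
print(first_invalid_offset(parse_dump(sample)))  # prints 1 (the byte 228)
```

For the row dumped in this note the sketch reports offset 1, the byte 228: it announces a 3-byte character, but the next two bytes (101 and 112) are below 128.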