Title: perl (Windows 2003) Couldn't open encmap gb2312.enc
liyihongcug, posted 2008-1-22 15:11

I am calling an XML parser from Perl and get this error:

Couldn't open encmap gb2312.enc:
No such file or directory
at C:/Perl/lib/XML/Parser.pm line 187


liyihongcug, posted 2008-1-22 15:13
Does Perl's XML::Parser support the gbk or gb2312 charset?
By default only big5 is available.

To read an XML file that contains Chinese, I wrote this:

<?xml version="1.0" encoding="big5"?>

<ETLConfig>
    <database S="test" U="测试" P="测试"/>
    <database S="aaa" U="aaa" P="aaa"/>
</ETLConfig>

The output is garbled.
use XML::Simple;
use Data::Dumper;

my $config = XMLin('c:/server.xml');          # load the file

print Dumper($config);
use bytes;

print $config->{database}[0]->{U};
no bytes;



apile, #2, posted 2003-6-18 08:41
Re: Chinese characters in XML files


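It probably has nothing to do with encode. I have not used XML::Simple,
but it was written by a non-Chinese author, so it cannot
default to big5.
The default should be utf8, or something else.
Try changing the declaration line to
<?xml version="1.0" encoding="gb2312"?>
and see what happens.
There is a book called Perl and XML. I found the print too small and never read it;
I gave up after two pages.
See whether you can find an e-book version to look through.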



lgjut, #3, posted 2003-6-18 15:20


I retraced the path someone took before me and got the same result he did. Has nobody ever got this route to work?

The XML::Parser installed from CPAN does not come with a
GB2312 encoding support. However, I was not able to add
the support as instructed by the XML::Encoding package.

To add this support, I did the following:

1. Download GB2312.TXT from ftp.unicode.org
2. Download the XML::Encoding 1.01 and get two binaries:
   make_encmap and compile_encoding
3. run make_encmap as follows:
   make_encmap GB2312 GB2312.TXT > GB2312.encmap
4. Add expat='yes' to the first line of GB2312.encmap
5. run compile_encoding:
   compile_encoding -o GB2312.enc GB2312.encmap
6. copy GB2312.enc to
   /usr/lib/perl5/site_perl/5.005/i386-linux/XML/Parser/Encodings

Then I made the following perl script:
---------------
#!/usr/bin/perl
use XML::Parser;

my $xmlfile = $ARGV[0];
my $parser = XML::Parser->new();
my $doc = $parser->parsefile($xmlfile);
---------------

I ran this script on a well-formed XML file whose first line is:
<?xml version="1.0" encoding="GB2312"?>

Following error occurs:

unknown encoding at line 1, column 30, byte 30 at /usr/lib/perl5/site_perl/5.005/i386-linux/XML/Parser.pm line 185

Changing the encoding to one of the other supported ones seems to work without error.
I'm wondering if there is something I'm missing in the process.

Thanks for any suggestions!



apile, #4, posted 2003-6-19 08:45


Could it be a case problem? Try renaming the file to
gb2312.enc
I just checked the files in this directory:
/usr/local/lib/perl5/site_perl/5.6.1/aix/XML/Parser/Encodings
-r--r--r--   1 root     system      4821 Feb 14 2000  Japanese_Encodings.msg
-r--r--r--   1 root     system      1946 Feb 14 2000  README
-r--r--r--   1 root     system     40706 May 10 2000  big5.enc
-r--r--r--   1 root     system     45802 May 10 2000  euc-kr.enc
-r--r--r--   1 root     system      1072 May 10 2000  iso-8859-2.enc
-r--r--r--   1 root     system      1072 May 10 2000  iso-8859-3.enc
-r--r--r--   1 root     system      1072 May 10 2000  iso-8859-4.enc
-r--r--r--   1 root     system      1072 May 10 2000  iso-8859-5.enc
-r--r--r--   1 root     system      1072 May 10 2000  iso-8859-7.enc
-r--r--r--   1 root     system      1072 May 10 2000  iso-8859-8.enc
-r--r--r--   1 root     system      1072 May 10 2000  iso-8859-9.enc
-r--r--r--   1 root     system      1072 May 10 2000  windows-1250.enc
-r--r--r--   1 root     system     37890 May 10 2000  x-euc-jp-jisx0221.enc
-r--r--r--   1 root     system     37890 May 10 2000  x-euc-jp-unicode.enc
-r--r--r--   1 root     system     20368 May 10 2000  x-sjis-cp932.enc
-r--r--r--   1 root     system     18202 May 10 2000  x-sjis-jdk117.enc
-r--r--r--   1 root     system     18202 May 10 2000  x-sjis-jisx0221.enc
-r--r--r--   1 root     system     18202 May 10 2000  x-sjis-unicode.enc

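They are all lower case.
Also, you can add this at the top of your .pl file:
use Carp ();
local $SIG{__WARN__} = \&Carp::cluck;

That gives you a stack trace, so you can see which line actually fails.
Line 185 is the parse call, and it calls other functions after that.
Otherwise I really cannot tell where it goes wrong.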



lgjut, #5, posted 2003-6-19 18:00


I changed the upper case to lower case, but I still get the same error.
GB2312.TXT is at http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/GB/
Could you take a look?



apile, #6, posted 2003-6-19 19:12


http://aspn.activestate.com/ASPN/Mail/Message/perl-xml/937990
http://lists.xml.org/archives/xml-dev/200208/msg01661.html
http://www.xml.com/pub/a/1999/09/expat/index.html

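Have a look at these.
It seems the cause is that expat does not support gb2312,
so you may have to convert the file to utf8 first,
or use the other parser mentioned in those links,
or modify XML::Parser so that it supports gb2312.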
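
The first option, converting to UTF-8 before parsing, might look roughly like the sketch below. It assumes the whole document is already in memory as raw bytes and that the core Encode module knows the name gb2312 (it normally maps it to euc-cn). The helper name gb2312_xml_to_utf8 is made up for this example and is not part of any module.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes the raw bytes of a GB2312 XML document and
# returns UTF-8 bytes with the XML declaration rewritten to match.
sub gb2312_xml_to_utf8 {
    my ($bytes) = @_;

    # Strict decode: croak on any byte sequence that is not valid GB2312.
    my $text = decode('gb2312', $bytes, Encode::FB_CROAK);

    # Rewrite encoding="..." in the XML declaration, if there is one.
    $text =~ s/\A(<\?xml[^>]*?encoding=)(["'])[^"']*\2/${1}${2}UTF-8${2}/;

    return encode('UTF-8', $text, Encode::FB_CROAK);
}

# Two Chinese characters as GB2312 bytes, written in hex so the script stays ASCII.
my $gb   = qq{<?xml version="1.0" encoding="gb2312"?>\n<u>\xBA\xCD\xC6\xBD</u>\n};
my $utf8 = gb2312_xml_to_utf8($gb);

binmode(STDOUT);    # the result is bytes, so write it unmodified
print $utf8;
```

The returned bytes can then be handed to the parser as a string instead of a file name, which avoids the need for a gb2312.enc map altogether.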



lgjut, #7, posted 2003-6-20 22:26


The latest version of expat has changed a lot, and I don't know how Perl calls it.

www.sourceforge.net/projects/expat

Expat is a library, written in C, for parsing XML documents. It's the underlying XML parser for the open source Mozilla project, Perl's XML::Parser, Python's xml.parsers.expat, and other open-source XML parsers.



apile, #8, posted 2003-6-22 11:37


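I tried tracing through the XML::Parser module,
but ran into quite a lot that I could not follow.
It uses the XML::Parser::Expat module,
and XML::Parser::Expat has a function called
load_encoding, which is called automatically when expat
cannot find the encoding it needs.
Following the documentation, I tried loading
gb2312.enc myself. LoadEncoding returns the correct name,
GB2312, but the matching entry never appears in %Encoding_Table,
which is why the gb2312 encoding is not supported.

Code: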
local(*ENC);
open(ENC, $file) or croak("Couldn't open encmap $file:\n$!\n");
binmode(ENC);
my $data;
my $br = sysread(ENC, $data, -s $file);
croak("Trouble reading $file:\n$!\n")
   unless defined($br);
close(ENC);

my $name = LoadEncoding($data, $br);
#print "$name\n";
croak("$file isn't an encmap file")
   unless defined($name);
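The problem is now narrowed down to the LoadEncoding($data, $br)
line: it reads the GB2312 encoding correctly, but the result is not stored in
%Encoding_Table.
I have not yet found where the LoadEncoding function is defined.
Anyone who is interested can follow up from here.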



lgjut, #9, posted 2003-6-22 11:59


The LoadEncoding function is C code.
It is in Expat.c in XML-Parser-2.31.



apile, #10, posted 2003-6-23 08:41


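It should be the XML_LoadEncoding function in Expat/Expat.xs.
My C is not that good, so we may need to find a C expert to look at it.
The build produces an Expat.so shared object under the auto/XML/Parser/Expat directory,
so you can overwrite the one in that directory directly, or reinstall XML::Parser.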


liyihongcug, posted 2008-1-22 15:30
not well-formed (invalid token) at line 2, column 8, byte 55 at C:/Perl/lib/XML/Parser.pm line 187

???


liyihongcug, posted 2008-1-22 15:31
svn: Malformed XML: not well-formed (invalid token) at line 2
Your Problem
You are using the Subversion revision control system. You call an svn command such as svn cleanup . or svn st and get the following error message:
> svn cleanup .
svn: Malformed XML: not well-formed (invalid token) at line 2

This problem occurred on my system after changing from SuSE 9.0 to SuSE 9.1.
The Reason
The reason is a corrupted XML file in the .svn subdirectory of your current directory or one of your subdirectories. The name of the file is log. It seems to contain some kind of log or journal information. The error is that the first part of the file is simply missing resulting in a malformed XML file.
The Solution
First find the right subdirectory: run svn cleanup . in various subdirectories and find the deepest one where the error above occurs.
Then change into the (hidden) subdirectory .svn and open the file named log:
> cd .svn
> ls
dir-prop-base  entries  log        README.txt  wcprops
dir-props      format   prop-base  text-base
empty-file     lock     props      tmp

Now check with head the beginning of that file. A damaged file begins like this:
> head log
ame="Kapitel-06.tex"
   text-time="2004-04-11T16:41:44.000000Z"
   committed-date="2004-04-11T21:36:46.947546Z"
   checksum="982a672d96ab994030e864a9ce38befb"
   last-author="mk"
   kind="file"/>
<entry
   committed-rev="5"
   name="Kapitel-07.tex"
   text-time="2004-04-12T15:31:10.000000Z"

A good file should start with a valid XML tag, which starts with an opening bracket < like this:
> head log
<modify-entry
   committed-rev="6"
   name=""/>
<modify-entry
   name=""
   committed-date="2004-04-12T15:45:32.289974Z"/>
<modify-entry
   name=""
   last-author="mk"/>
<modify-entry

If your file looks damaged, rename or delete it:
> mv log log.damaged

Now change back to the directory where your error occurred and everything should work again.


liyihongcug, posted 2008-1-22 15:41
Object Mix > Programming Languages > Perl: could XML::Simple handling chinese character?
  #1    06-16-2007, 11:55 PM  
havel.zhang     

could XML::Simple handling chinese character?

--------------------------------------------------------------------------------
hi everyone:

I found XML::Simple can not handling chinese character. for example:
part1.xml:
<?xml version="1.0" encoding="utf-8"?>
<config>
<user>和平</user>
<passwd>longNails</passwd>
<books>
<book author="Steinbeck" title="Cannery Row"/>
<book author="Faulkner" title="Soldier's Pay"/>
<book author="Steinbeck" title="East of Eden"/>
</books>
</config>

----------------------------------------
my program:

#!/usr/bin/perl -w
use strict;
use XML::Simple;
use Data::Dumper;
print Dumper (XML::Simple->new()->XMLin('part1.xml',ForceArray =>
1,KeepRoot => 1));
----------------------------------------
then the result is:
>not well-formed (invalid token) at line 2, column 8, byte 17 at C:/Perl/site/lib/XML/Parser.pm line 187

so it's just because of chinese character.

anyone can help me? thank you

havel


  

  #2    06-17-2007, 12:35 AM  
asimsuter--at--hotmail.com     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------

On Jun 16, 8:55 pm, "havel.zhang" <havel.zh...@> wrote:
> hi everyone:
>
> I found XML::Simple can not handling chinese character. for example:
> part1.xml:
> <?xml version="1.0" encoding="utf-8"?>
> <config>
> <user>和平</user>
> <passwd>longNails</passwd>
> <books>
> <book author="Steinbeck" title="Cannery Row"/>
> <book author="Faulkner" title="Soldier's Pay"/>
> <book author="Steinbeck" title="East of Eden"/>
> </books>
> </config>
>
> ----------------------------------------
> my program:
>
> #!/usr/bin/perl -w
> use strict;
> use XML::Simple;
> use Data::Dumper;
> print Dumper (XML::Simple->new()->XMLin('part1.xml',ForceArray =>
> 1,KeepRoot => 1));
> ----------------------------------------
> then the result is:
>
> >not well-formed (invalid token) at line 2, column 8, byte 17 at C:/Perl/site/lib/XML/Parser.pm line 187
>
> so it's just because of chinese character.
>
> anyone can help me? thank you
>
> havel


Try XML::Parser

from http://search.cpan.org/~msergeant/XML-Parser/Parser.pm

================================================== =========================

XML documents may be encoded in character sets other than Unicode as
long as they may be mapped into the Unicode character set. Expat has
further restrictions on encodings. Read the xmlparse.h header file in
the expat distribution to see details on these restrictions.

Expat has built-in encodings for: UTF-8, ISO-8859-1, UTF-16, and US-
ASCII. Encodings are set either through the XML declaration encoding
attribute or through the ProtocolEncoding option to XML::Parser or
XML::Parser::Expat.

For encodings other than the built-ins, expat calls the function
load_encoding in the Expat package with the encoding name. This
function looks for a file in the path list
@XML::Parser::Expat::Encoding_Path, that matches the lower-cased name
with a '.enc' extension. The first one it finds, it loads.

If you wish to build your own encoding maps, check out the
XML::Encoding module from CPAN.
AUTHORS

================================================== =========================
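
To see whether a gb2312.enc file is actually reachable, the lookup rule quoted above (lower-cased name plus '.enc', first match in the path list wins) can be imitated in a few lines. This is only a sketch of that rule, not the code XML::Parser::Expat itself runs. The helper name find_encmap is invented here, and the script assumes XML::Parser may or may not be installed.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper mirroring the documented rule: lower-case the encoding
# name, append ".enc", and return the first match in the given directories.
sub find_encmap {
    my ($encoding, @dirs) = @_;
    my $file = lc($encoding) . '.enc';
    for my $dir (@dirs) {
        my $path = File::Spec->catfile($dir, $file);
        return $path if -f $path;
    }
    return;    # nothing found
}

# Assumption: if XML::Parser is not installed there is no path list to search.
my @dirs;
if (eval { require XML::Parser::Expat; 1 }) {
    no warnings 'once';
    @dirs = @XML::Parser::Expat::Encoding_Path;
}

my $hit = find_encmap('GB2312', @dirs);
print defined $hit ? "found $hit\n" : "no gb2312.enc on the encoding path\n";
```

If this reports nothing, the "Couldn't open encmap gb2312.enc" error at the top of the thread is the expected outcome, whatever the XML file contains.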

Regards.

Asim Suter
asimsuter--at--hotmail.com



  

  #3    06-17-2007, 01:02 AM  
mirod     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------

havel.zhang wrote:
> hi everyone:
>
> I found XML::Simple can not handling chinese character. for example:
> part1.xml:
> <?xml version="1.0" encoding="utf-8"?>
> <config>
> <user>和平</user>
> </config>

> #!/usr/bin/perl -w
> use strict;
> use XML::Simple;
> use Data::Dumper;
> print Dumper (XML::Simple->new()->XMLin('part1.xml',ForceArray =>
> 1,KeepRoot => 1));
> ----------------------------------------
> then the result is:
>> not well-formed (invalid token) at line 2, column 8, byte 17 at C:/Perl/site/lib/XML/Parser.pm line 187
>
> so it's just because of chinese character.

Actually the example works perfectly on my machine. There must be
something either in the format of your file (but I copied it as is, so I
can't see what could cause a problem there) or something in your
environment. What versions of perl, XML::Simple, but also the parser
(XML::Parser in your case, but if you installed XML::LibXML it would be
used instead) are you using?

--
mirod

  

  #4    06-17-2007, 02:11 AM  
havel.zhang     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------

On June 17, 1:01 PM, mirod <m...@xmltwig.com> wrote:
> havel.zhang wrote:
> > hi everyone:
>
> > I found XML::Simple can not handling chinese character. for example:
> > part1.xml:
> > <?xml version="1.0" encoding="utf-8"?>
> > <config>
> > <user>和平</user>
> > </config>
> > #!/usr/bin/perl -w
> > use strict;
> > use XML::Simple;
> > use Data::Dumper;
> > print Dumper (XML::Simple->new()->XMLin('part1.xml',ForceArray =>
> > 1,KeepRoot => 1));
> > ----------------------------------------
> > then the result is:
> >> not well-formed (invalid token) at line 2, column 8, byte 17 at C:/Perl/site/lib/XML/Parser.pm line 187
>
> > so it's just because of chinese character.
>
> Actually the example works perfectly on my machine. There must be
> something either in the format of your file (but I copied it as is, so I
> can't see what could cause a problem there) or something in your
> environment. What versions of perl, XML::Simple, but also the parser
> (XML::Parser in your case, but if you installed XML::LibXML it would be
> used instead) are you using?
>
> --
> mirod

hi mirod:
when i changed chinese character with english word, it works fine.
my versions of perl is 5.8.8 .

havel


  

  #5    06-17-2007, 04:10 AM  
Mumia W.     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------

On 06/17/2007 01:10 AM, havel.zhang wrote:
>
> hi mirod:
> when i changed chinese character with english word, it works fine.
> my versions of perl is 5.8.8 .
>
> havel
>

I also ran your program without problems on Perl 5.8.4 / Linux. You
should enable a utf8 locale on your computer and tell Perl to use that
encoding when reading from the file.

When I tested your program, I first saved part1.xml to a file in utf8
format; then I copied your script to a file in utf8 format. I also added
the "encoding" pragma to tell Perl that the script was written in utf8.
And my locale is currently set to utf8.

So there's no way for Perl to be unprepared to deal with utf8 encoded
data on my system right now, and Chinese characters should be stored in
either utf8 or gb2312 files.

I suspect your problem is encoding confusion. Either you don't have a
suitable locale installed (e.g. utf8), or you stored the file in one
encoding (e.g. gb2312), but you're trying to read it in another encoding
(utf8 ?).


  

  #6    06-17-2007, 07:32 AM  
Adrian Ulrich     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------


You told XML::Something that this XML-File will be utf8 encoded
> <?xml version="1.0" encoding="utf-8"?>
> <user>和平</user>

...so is '和平' UTF-8 encoded? I'd recommend a real unicode editor like
yudit (http://www.yudit.org) to edit/create utf8 files.

> when i changed chinese character with english word, it works fine.

UTF-8 is a superset of ASCII. A normal ASCII string will always be valid UTF-8.

Regards,
Adrian


  

  #7    06-17-2007, 01:03 PM  
Peter J. Holzer     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------

On 2007-06-17 08:09, Mumia W. <paduille.4061.mumia.w+nospam--at--earthlink.net> wrote:
> On 06/17/2007 01:10 AM, havel.zhang wrote:
>> hi mirod:
>> when i changed chinese character with english word, it works fine.
>> my versions of perl is 5.8.8 .
>
> I also ran your program without problems on Perl 5.8.4 / Linux. You
> should enable a utf8 locale on your computer and tell Perl to use that
> encoding when reading from the file.

No, you should not (well, using a utf8 locale may be a good idea anyway,
but it doesn't have anything to do with his problem). Telling perl to
use a specific encoding when reading XML files is at best ineffectual,
or it may cause problems.


> When I tested your program, I first saved part1.xml to a file in utf8
> format;

This is obviously necessary as the XML file starts with

<?xml version="1.0" encoding="utf-8"?>


> then I copied your script to a file in utf8 format.

The script doesn't contain any non-ASCII characters so there is no
difference between ASCII format, Latin-1 format, UTF-8 format, etc.


> I also added the "encoding" pragma to tell Perl that the script was
> written in utf8.

The script is pure ASCII. Of course that means it's UTF-8, too, but it's
also a dozen other charsets which are supersets of ASCII.


> And my locale is currently set to utf8.

Irrelevant. XML files contain their own encoding. They *must* *not* be
read differently depending on the locale. If the XML declaration
contains encoding="utf-8", the file must be parsed as UTF-8, regardless
of the charset of the current locale. Since you can't know the encoding
of an XML file before parsing it, it is the responsibility of the XML
parser to determine the encoding.


> So there's no way for Perl to be unprepared to deal with utf8 encoded
> data on my system right now,

Nothing you described above "prepared your system to deal with utf8
encoded" XML files.

> and Chinese characters should be stored in either utf8 or gb2312
> files.

Or GB18030 or EUC-CN or whatever contains the necessary characters. It
is only necessary that the XML declaration matches the contents of the
file.
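
A rough way to test whether a declaration matches the file contents is to attempt a strict decode of the bytes in the declared encoding. The sketch below assumes an ASCII-compatible encoding (so the declaration itself can be read with a plain regex) and falls back to UTF-8 when no encoding is declared. The function name matches_declared_encoding is made up for this example. A clean decode is only a necessary condition: Latin-1, for instance, accepts any byte sequence.

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical check: do the bytes decode cleanly in the encoding named by
# the XML declaration? Falls back to UTF-8 when no encoding is declared.
sub matches_declared_encoding {
    my ($bytes) = @_;
    my ($name) = $bytes =~ /\A<\?xml[^>]*?encoding=["']([^"']+)["']/;
    $name = 'UTF-8' unless defined $name;
    return 0 unless find_encoding($name);    # Encode does not know this name
    my $copy = $bytes;                       # a strict decode may modify its input
    return eval { decode($name, $copy, Encode::FB_CROAK); 1 } ? 1 : 0;
}

# The situation suspected in this thread: declared utf-8, stored as GB2312.
my $sample = qq{<?xml version="1.0" encoding="utf-8"?>\n<user>\xBA\xCD\xC6\xBD</user>\n};
print matches_declared_encoding($sample)
    ? "declaration and bytes are consistent\n"
    : "declaration and bytes disagree\n";
```

For this sample the check fails, because 0xBA cannot start a UTF-8 sequence, which is consistent with the "not well-formed (invalid token)" error reported by the parser.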

> I suspect your problem is encoding confusion. Either you don't have a
> suitable locale installed (e.g. utf8),

I don't think you can install perl 5.8.8 without support for UTF-8,
regardless of any system-specific locales.

> or you stored the file in one encoding (e.g. gb2312), but you're
> trying to read it in another encoding (utf8 ?).

The parser must read it in UTF-8 encoding since that's what the file
says it is. Your suspicion that the file really is in some other
encoding seems likely (especially since Havel posted in gb2312).
It's also possible that the parser used by XML::Simple is broken, but
judging from the error message it is XML::Parser which in turn uses
expat, so I think that's unlikely.

hp

--
_ | Peter J. Holzer | I know I'd be respectful of a pirate
|_|_) | Sysadmin WSR | with an emu on his shoulder.
| | | hjp--at--hjp.at |
__/ | http://www.hjp.at/ | -- Sam in "Freefall"

  

  #8    06-17-2007, 05:01 PM  
xhoster--at--gmail.com     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------

"havel.zhang" <havel.zhang@> wrote:
> hi everyone:
>
> I found XML::Simple can not handling chinese character. for example:
> part1.xml:
> <?xml version=3D"1.0" encoding=3D"utf-8"?>
> <config>
> <user>=BA=CD=C6=BD</user>
> <passwd>longNails</passwd>
> <books>
> <book author=3D"Steinbeck" title=3D"Cannery Row"/>
> <book author=3D"Faulkner" title=3D"Soldier's Pay"/>
> <book author=3D"Steinbeck" title=3D"East of Eden"/>
> </books>
> </config>

Hi Havel,

I'm not sure that the Chinese characters in your post survived their
trip through usenet, so I can't use the above to serve as a realistic test.
Can you post a bit of Perl code (using chr(), for example) which is coded
in ASCII but would, when run, properly create the characters you are trying
to express?
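
One possible answer to this request, as a sketch: the two characters in the user element are taken here to be U+548C and U+5E73, and the script prints the bytes they become in each encoding. The helper name hex_bytes is invented for this example; the source is pure ASCII.

```perl
use strict;
use warnings;
use Encode qw(encode);

# The two characters from the <user> element, given as code points so that
# this script contains nothing but ASCII.
my $word = chr(0x548C) . chr(0x5E73);

# Hypothetical helper: encode a string and return its bytes as upper-case hex.
sub hex_bytes {
    my ($encoding, $string) = @_;
    return join ' ', map { sprintf '%02X', ord($_) } split //, encode($encoding, $string);
}

print "gb2312: ", hex_bytes('gb2312', $word), "\n";    # BA CD C6 BD
print "UTF-8:  ", hex_bytes('UTF-8',  $word), "\n";    # E5 92 8C E5 B9 B3
```

The gb2312 bytes are the same ones that appear as =BA=CD=C6=BD in the quoted-printable text above, so a file holding those four bytes is not valid UTF-8.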

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB

  

  #9    06-18-2007, 06:03 AM  
Ian Wilson     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------

xhoster wrote:
> havel zhang wrote:

>> <?xml version=3D"1.0" encoding=3D"utf-8"?>
>> <user>=BA=CD=C6=BD</user>
>
>
> I'm not sure that the Chinese characters in your post survived their
> trip through usenet, so I can't use the above to serve as a realistic test.

The two chinese characters displayed OK in my newsreader.

The OP's posting had this header
Content-Type: text/plain; charset="gb2312"

Could it be that your newsreader doesn't support GB2312 encoding?


As others have said, it seems likely that the OP's XML file is actually
encoded in GB2312, not in UTF8 as specified in its XML declaration.


> Can you post a bit of Perl code (using chr(), for example) which is coded
> in ASCII but would, when run, properly create the characters you are trying
> to express?

  

  #10    06-18-2007, 05:46 PM  
Bart Lateur     

Re: could XML::Simple handling chinese character?

--------------------------------------------------------------------------------

Ian Wilson wrote:

>As others have said, it seems likely that the OP's XML file is actually
>encoded in GB2312, not in UTF8 as specified in its XML declaration.


liyihongcug, posted 2008-1-24 11:08
use Encode qw(encode decode);
# Convert the UTF-8 strings to GB2312 bytes before output
$strlbls = encode("gb2312", decode("utf8", $strlbls));
$strdats = encode("gb2312", decode("utf8", $strdats));


Copyright 1999-2006 itpub.net. All Rights Reserved.