files (created by Microsoft Word): if the file
contains HTML built-in entity references (for example,
&nbsp;), the node value may contain unknown characters.
For example:
source html:
<DIV>
<P class=MsoNormal style="MARGIN: 0cm 0cm 0pt
18pt"><SPAN lang=EN-US style="mso-bidi-font-size:
10.5pt"><FONT face="Times New Roman"><FONT
size=3>-rw-r--r--<SPAN style="mso-spacerun:
yes"> </SPAN>1 root<SPAN
style="mso-spacerun: yes">
</SPAN>root<SPAN style="mso-spacerun:
yes">
</SPAN>50 Jan 21 16:12
_1e.f6<o:p></o:p></FONT></FONT></SPAN></P>
</DIV>
after parsing html:
-rw-r--r--��?1 root���� root���������� 50 Jan 21 16:12
_1e.f6
How can I avoid it?
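
One possible workaround, sketched here under assumptions: Word typically fills its mso-spacerun spans with &nbsp; entities, which a parser decodes to the non-breaking space character U+00A0. If the output is then displayed in an encoding that does not match, that character can show up as unknown symbols. The sketch below uses Python's standard html module only to illustrate the idea; the helper name normalize_node_text is hypothetical and is not part of the parser discussed in this thread.

```python
import html


def normalize_node_text(text: str) -> str:
    """Decode entity references, then map non-breaking spaces to plain spaces.

    Hypothetical helper for illustration; apply the same idea to whatever
    string your parser returns as the node value.
    """
    decoded = html.unescape(text)           # "&nbsp;" becomes "\u00a0"
    return decoded.replace("\u00a0", " ")   # U+00A0 becomes an ordinary space


# Assumed sample: the Word markup above with its entities still intact.
sample = "-rw-r--r--&nbsp; 1 root&nbsp;&nbsp;&nbsp; root"

decoded = html.unescape(sample)        # still contains U+00A0 characters
cleaned = normalize_node_text(sample)  # contains only ASCII spaces

print(repr(decoded))
print(repr(cleaned))
```

If the parser already hands back decoded text, only the replace step is needed. It may also be worth checking that the encoding used to write the result matches the one used to read it.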
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]