Hello,
JTidy is a very good HTML parser, but it is not optimal for HTML pages
created with Microsoft Office products such as Word, because it
returns Microsoft-specific HTML tags instead of only the text.
Or should I handle HTML pages whose source begins like this
"
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:dt="uuid:C2F41010-65B3-11d1-A29F-00AA00C14882"
xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=Content-Type content="text/html; charset=windows-1252">
<link rel=File-List href="index-Dateien/filelist.xml">
"
like XML files, using an XML parser instead of an HTML parser?
I think it is meant to be an HTML page because of
"<meta http-equiv=Content-Type content="text/html; charset=windows-1252">"
I would be glad for any kind of help.
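
To make concrete what I mean by "only text", here is a minimal sketch of
the result I am after. It does not use JTidy at all, only java.util.regex,
and the class name OfficeHtmlText and the method toText are hypothetical
names made up for this illustration. It assumes that comments (including
the Office conditional comments), the head/style/script/xml blocks, and
all remaining tags, namespaced ones like <o:p> included, can simply be
dropped. It is a rough approximation, not a real HTML parser: it will
break on attribute values containing ">" and it only decodes &nbsp;.

```java
import java.util.regex.Pattern;

final class OfficeHtmlText {
    // Ordinary comments and Office conditional comments (<!--[if gte mso 9]> ... <![endif]-->).
    private static final Pattern COMMENTS = Pattern.compile("(?s)<!--.*?-->");
    // Blocks whose content is not visible text: head, style, script and Office <xml> islands.
    private static final Pattern NON_TEXT_BLOCKS =
            Pattern.compile("(?is)<(head|style|script|xml)\\b.*?</\\1\\s*>");
    // Any remaining tag, including namespaced ones such as <o:p> or <v:shape>.
    private static final Pattern TAGS = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns an approximation of the visible text of a Word-generated HTML page. */
    static String toText(String html) {
        String s = COMMENTS.matcher(html).replaceAll(" ");
        s = NON_TEXT_BLOCKS.matcher(s).replaceAll(" ");
        s = TAGS.matcher(s).replaceAll(" ");
        s = s.replace("&nbsp;", " ");          // only this one entity is handled
        return WHITESPACE.matcher(s).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String html = "<html xmlns:o=\"urn:schemas-microsoft-com:office:office\">"
                + "<head><title>ignored</title></head>"
                + "<body><p class=MsoNormal>Hello <b>World</b><o:p></o:p></p></body></html>";
        System.out.println(toText(html));
    }
}
```

What I would like to know is whether JTidy itself can be configured to
give a similar result, so that I do not need such a workaround.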
Greetings
Gaston