When parsing an XML response (declared as UTF-8), I get a parsing error.

response =>
"<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<rss xmlns:opensearch=
\"http://a9.com/-/spec/opensearch/1.1/\" xmlns:dc=\"http://purl.org/dc/
elements/1.1/\" version=\"2.0\">\n <channel>\n <title>link:http://
lvh.me:3000 - Google Recherche de blogs</title>\n <link>http://
www.google.com/search?q=link:http://lvh.me:3000&amp;tbm=blg</link>\n
<description>Aucun document ne correspond aux termes de recherche sp
\xE9cifi\xE9s (&lt;b&gt;link:http://lvh.me:3000&lt;/b&gt;).</
description>\n <opensearch:totalResults>0</opensearch:totalResults>
\n <opensearch:startIndex>1</opensearch:startIndex>\n
<opensearch:itemsPerPage>10</opensearch:itemsPerPage>\n </channel>\n</
rss>"


parse_rss(response)

def parse_rss(body)
  xml = REXML::Document.new(body)
end

REXML::ParseException Exception: #<REXML::ParseException:
#<ArgumentError: invalid byte sequence in UTF-8>

which seems to be raised by the <description> tag, which contains French text
with accented characters, like sp\xE9cifi\xE9s

Is it a REXML bug? (In that case I may switch to Nokogiri...)
Or did I miss a mandatory parameter in my request?

Thanks for your feedback.

--
You received this message because you are subscribed to the Google Groups "Ruby on Rails: Talk" group.
To post to this group, send email to rubyonrails-talk@googlegroups.com.
To unsubscribe from this group, send email to rubyonrails-talk+unsubscribe@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/rubyonrails-talk?hl=en.

  • Erwin at Jun 11, 2012 at 9:55 am
    [SOLVED] Found the answer here:
    http://yehudakatz.com/2010/05/05/ruby-1-9-encodings-a-primer-and-the-solution-for-rails/
    I forgot to mention that I am using Ruby 1.9.3.

    The response body is really ISO-8859-1 (\xE9 is "é" in that encoding), even
    though the XML declaration says UTF-8, so

    xml = REXML::Document.new(body.force_encoding("ISO-8859-1").encode("UTF-8"))

    is the right way to handle the response.
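
For anyone hitting the same error, here is a minimal sketch of that fix with a guard so that a body which is already valid UTF-8 is left alone. The helper name `to_utf8` and the shortened `raw` string are made up for illustration, and the code assumes that any body which is not valid UTF-8 is really ISO-8859-1, which may not hold for other servers.

```ruby
require "rexml/document"

# Hypothetical helper: return a valid UTF-8 copy of the response body.
# Assumption: a body that is not valid UTF-8 is really ISO-8859-1.
def to_utf8(body)
  utf8 = body.dup.force_encoding("UTF-8")
  return utf8 if utf8.valid_encoding?

  # Relabel the raw bytes as ISO-8859-1, then transcode them to UTF-8.
  body.dup.force_encoding("ISO-8859-1").encode("UTF-8")
end

# Shortened stand-in for the response above: the declaration says UTF-8,
# but \xE9 is the ISO-8859-1 byte for "é" and is not valid UTF-8 here.
raw = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n" \
      "<rss version=\"2.0\"><channel>" \
      "<description>sp\xE9cifi\xE9s</description>" \
      "</channel></rss>"

xml = REXML::Document.new(to_utf8(raw))
description = xml.root.elements["channel/description"].text
puts description
```

The `valid_encoding?` check is there to avoid double-encoding a response that really is UTF-8. `force_encoding` only changes the label on the bytes, while `encode` actually converts them.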

Posted Jun 11, '12 at 9:27a
Active Jun 11, '12 at 9:55a