Hi again, gophers,

While working on an upcoming web crawler library (gocrawl; it's not ready yet
and I'll post when it is, but you can follow development on GitHub at
https://github.com/PuerkitoBio/gocrawl), I had to step back because something
was missing. So here is a new library, purell, to normalize URLs:

https://github.com/PuerkitoBio/purell

Basically, you feed it a URL (either as a string or as a *url.URL) along with
a bitwise-OR of flags, and it applies the normalizations you requested. It is
heavily based on Wikipedia's article on URL normalization
(http://en.wikipedia.org/wiki/URL_normalization). The README contains some
examples and a link to the full godoc reference, and there is a rather
comprehensive test suite.

Hopefully it will be useful to some of you. It will certainly be useful in
gocrawl.

As a side note, the more I use Go, the more I love it. So a big thank you
to all the people behind Go.

Thanks,
Martin Angers

--


  • Shavedmyhead at Oct 2, 2012 at 1:58 pm
    I'm working on my crawler too. Thank you for sharing.
    I will use the URL normalisation package and some of the ideas in your code.
    (In reply to Martin Angers's post of Wednesday, September 19, 2012, 10:59:41 PM UTC+3.)

Discussion Overview
group: golang-nuts
categories: go
posted: Sep 19, '12 at 7:59p
active: Oct 2, '12 at 1:58p
posts: 2
users: 2
website: golang.org

2 users in discussion
Martin Angers: 1 post
Shavedmyhead: 1 post
