Hi again, gophers,
While working on an upcoming web crawler library (gocrawl, which is not
ready yet; I'll post when it is, but you can follow development on GitHub
at https://github.com/PuerkitoBio/gocrawl), I had to step back because a
piece was missing: URL normalization. So here is a new library, purell,
to normalize URLs:
https://github.com/PuerkitoBio/purell
Basically, you feed it a URL (as either a string or a *url.URL), bitwise-or
together the flags you want, and it applies the normalizations you
requested. It is heavily based on Wikipedia's article on URL
normalization (http://en.wikipedia.org/wiki/URL_normalization). The README
contains some examples and a link to the full godoc reference, and there is
a rather comprehensive test suite.
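
To give a feel for the "bitwise-or some flags" idea, here is a minimal
standalone sketch that uses only the standard library. It is not purell's
code or its API: the NormFlags type, the four flag names and the Normalize
function are all hypothetical, made up for illustration, and they cover
only a handful of the normalizations described in the Wikipedia article.
See the README and godoc for the real names.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// NormFlags is a hypothetical bit set of normalization options.
// These names are illustrative only; they are NOT purell's identifiers.
type NormFlags uint

const (
	LowercaseHost       NormFlags = 1 << iota // WWW.Example.com -> www.example.com
	RemoveDefaultPort                         // http://host:80 -> http://host
	RemoveTrailingSlash                       // /a/b/ -> /a/b
	RemoveFragment                            // drop everything after '#'
)

// Normalize parses raw and applies only the manipulations whose bits
// are set in f. It is a sketch, not a complete normalizer.
func Normalize(raw string, f NormFlags) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if f&LowercaseHost != 0 {
		u.Host = strings.ToLower(u.Host)
	}
	if f&RemoveDefaultPort != 0 && u.Scheme == "http" {
		u.Host = strings.TrimSuffix(u.Host, ":80")
	}
	if f&RemoveTrailingSlash != 0 && len(u.Path) > 1 {
		u.Path = strings.TrimSuffix(u.Path, "/")
	}
	if f&RemoveFragment != 0 {
		u.Fragment = ""
	}
	return u.String(), nil
}

func main() {
	// Combine the wanted manipulations with bitwise-or.
	flags := LowercaseHost | RemoveDefaultPort | RemoveTrailingSlash | RemoveFragment
	s, err := Normalize("http://WWW.Example.com:80/a/b/#top", flags)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(s) // prints "http://www.example.com/a/b"
}
```

Because every option is a single bit, callers can pick any combination
without the function needing a long parameter list, and a zero value
simply means "parse and re-serialize, change nothing else".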
I hope it will be useful to some of you. It will certainly be useful in
gocrawl.
As a side note, the more I use Go, the more I love it. So a big thank you
to all the people behind Go.
Thanks,
Martin Angers