I forgot to add: if you only care about extracting the <script>,
<style> and <meta> elements of an HTML document, then the HTML
tokenizer might be a better match than the HTML parser. The tokenizer
is lower level: it returns a stream of tokens instead of a complete
parse tree. In exchange, it gives you []byte instead of string, which
means it makes far fewer allocations. Note that you are responsible
for copying the tokenizer's []byte somewhere if you want it to last
past the life of the current token, since the slice's contents may
change on the next call to Next. See
https://godoc.org/golang.org/x/net/html

Group: golang-nuts
Posted: May 1, '15 at 12:37p
Active: May 4, '15 at 7:31a


