publicsuffix.org maintains a list of public suffixes, used to e.g.
scope HTTP cookies.

The golang.org/x/net/publicsuffix package contains a 'compiled'
version of the plain text list [0]. That compiled version is more
efficient at runtime, but not human-readable.
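For readers unfamiliar with the list format, here is a minimal sketch of how its rules decide a public suffix. It assumes a tiny hypothetical rule set held in a Go map and implements the list's documented matching rules (the exception rule wins, otherwise the matching rule with the most labels wins, "*." is a wildcard, and "*" is the implicit default). It is not how the compiled x/net table works internally; the real package answers these questions through functions such as publicsuffix.PublicSuffix and publicsuffix.EffectiveTLDPlusOne.

```go
package main

import (
	"fmt"
	"strings"
)

// rules is a tiny hypothetical excerpt in the list's own syntax:
// plain rules, "*." wildcard rules and "!" exception rules.
var rules = map[string]bool{
	"com":     true,
	"uk":      true,
	"co.uk":   true,
	"*.ck":    true,
	"!www.ck": true,
}

// publicSuffix applies the matching rules: an exception rule wins and
// drops its leftmost label; otherwise the matching rule with the most
// labels wins; if nothing matches, the rightmost label is the suffix.
func publicSuffix(domain string) string {
	labels := strings.Split(strings.ToLower(domain), ".")
	best := 1 // implicit "*" rule
	for i := range labels {
		suffix := strings.Join(labels[i:], ".")
		if rules["!"+suffix] {
			return strings.Join(labels[i+1:], ".")
		}
		n := len(labels) - i // labels in this candidate suffix
		wildcard := i+1 < len(labels) && rules["*."+strings.Join(labels[i+1:], ".")]
		if (rules[suffix] || wildcard) && n > best {
			best = n
		}
	}
	return strings.Join(labels[len(labels)-best:], ".")
}

// etldPlusOne returns the public suffix plus one more label, the
// "registrable domain" boundary that cookie scoping relies on.
func etldPlusOne(domain string) (string, bool) {
	domain = strings.ToLower(domain)
	ps := publicSuffix(domain)
	if domain == ps {
		return "", false // the name is itself a public suffix
	}
	labels := strings.Split(strings.TrimSuffix(domain, "."+ps), ".")
	return labels[len(labels)-1] + "." + ps, true
}

func main() {
	for _, d := range []string{"www.example.co.uk", "foo.bar.ck", "www.ck", "example.org"} {
		e, _ := etldPlusOne(d)
		fmt.Printf("%-18s suffix=%-7s eTLD+1=%s\n", d, publicSuffix(d), e)
	}
}
```

A cookie set for "example.co.uk" may be shared by its subdomains, but one set for "co.uk" must be rejected, which is the kind of decision the list exists to support.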

The canonical upstream list gets ad hoc updates roughly once a week in
recent times [1]. We (usually me, sometimes Brad or Volker)
occasionally update the Go package, maybe once every few months [2]. I
hesitate to do so more frequently because even a one-line change to
the upstream list changes pretty much the entire generated table ([3]
is a typical diff), and I don't want to bloat the golang.org/x/net git
repo with lots of essentially binary changes. I'm really only
hand-waving and guessing here, as I don't really know how git works
under the hood, but for example, the generated table.go file currently
weighs 528K. At [4], Brad (who's away for some weeks) said that each
publicsuffix commit grows the x/net repo by 0.1MB, which isn't huge,
but it would add up over time.
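One plausible reason a one-line upstream change touches nearly the whole table can be sketched as follows, assuming (as Volker describes later in this thread) that labels are packed into one large string and that table entries refer to them by offset. The packing below (labels sorted and concatenated, each node word holding offset<<8 | length) is a hypothetical simplification, not the x/net format.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// encode sorts the labels, concatenates them into one string and gives each
// label a hypothetical node word packing its offset and length
// (offset<<8 | length). This mimics the general shape of a generated
// table, not the actual x/net encoding.
func encode(labels []string) (string, map[string]uint32) {
	sorted := append([]string(nil), labels...)
	sort.Strings(sorted)
	var b strings.Builder
	nodes := make(map[string]uint32, len(sorted))
	for _, l := range sorted {
		nodes[l] = uint32(b.Len())<<8 | uint32(len(l))
		b.WriteString(l)
	}
	return b.String(), nodes
}

// shifted counts how many labels that existed before an update end up
// with a different node word after it.
func shifted(before, after []string) int {
	_, oldNodes := encode(before)
	_, newNodes := encode(after)
	n := 0
	for label, w := range oldNodes {
		if newNodes[label] != w {
			n++
		}
	}
	return n
}

func main() {
	before := []string{"ac", "co", "com", "de", "jp", "net", "org", "uk"}
	text, _ := encode(before)
	fmt.Printf("packed labels: %q\n", text)
	// A new rule that sorts near the start moves almost every offset...
	fmt.Println("changed after adding ca:", shifted(before, append([]string{"ca"}, before...)))
	// ...while one that sorts last leaves the existing entries untouched.
	fmt.Println("changed after adding zz:", shifted(before, append([]string{"zz"}, before...)))
}
```

Here adding "ca" changes 7 of the 8 existing node words, while adding "zz" changes none: the size of the diff depends on where the new label lands, not on how small the upstream change was.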

On the other hand, we've had a number of CLs sent in over the years
because the checked-in, generated version becomes stale, and these CLs
have arrived more frequently in recent months. It may be time to
update the generated form more frequently, if not automatically. Any
thoughts or experience out there with automatic code gen and "git
codereview mail"?

Automatic or not, it might then make sense to move the package to its
own dedicated git repo, instead of filling golang.org/x/net with noisy
churn. If so, any bikeshedding opinions between
golang.org/x/publicsuffix (gerrit + codereview) or
github.com/golang/publicsuffix (vanilla git) or something else?

Any other thoughts, golang-dev?



golang.org/issue/15518 also has some discussion of building the list
at runtime, from a public_suffix_list.dat file supplied by the user,
instead of at "go generate" time. However, it is a nice property that
the package is currently usable out of the box with "go get", without
having to supply a separate list. Also, if you need a process to
update that separate list, you might as well have a process to update
the Go package, modulo the repo size issue, if the Go package were
updated more frequently.

It is also trivial for third parties like Chromium or Let's Encrypt to
fork the package and run "go generate" at whatever cadence they want,
although all that might achieve is N replicated problems instead of 1
centralized problem.

[0] https://publicsuffix.org/list/public_suffix_list.dat
[1] https://github.com/publicsuffix/list/commits/master
[2] https://go.googlesource.com/net/+log/master/publicsuffix
[3] https://go.googlesource.com/net/+/d58ca6618b994150e624f6888d871f4709db51a0%5E%21/#F0
[4] https://github.com/letsencrypt/boulder/issues/1374#issuecomment-182429297

--
You received this message because you are subscribed to the Google Groups "golang-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to golang-dev+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


  • Jsha at Jun 8, 2016 at 7:15 pm
    A lot of the increased pull request problem probably results from the fact
    that Let's Encrypt groups its rate limits based on the Public Suffix List.
    This has led to a number of free DNS providers that hadn't previously
    known about the PSL suddenly getting pressure from their customers to get
    added to the list. My apologies that this has had the unexpected knock-on
    effect of increasing pull request volume on this package.

    Simone Carletti, a PSL maintainer, has written an alternate implementation
    at https://github.com/weppos/publicsuffix-go that can read directly from a
    public_suffix_list.dat, and is planning to send a Boulder pull request to
    incorporate that library:
    https://github.com/letsencrypt/boulder/issues/1479#issuecomment-224543735.
    Hopefully that will reduce the pressure on golang.org/x/net/publicsuffix to
    update frequently.

  • Nigel Tao at Jun 9, 2016 at 5:25 am

    On Thu, Jun 9, 2016 at 4:45 AM, wrote:
    > Simone Carletti, a PSL maintainer, has written an alternate implementation
    > at https://github.com/weppos/publicsuffix-go that can read directly from a
    > public_suffix_list.dat, and is planning to send a Boulder pull request to
    > incorporate that library:
    > https://github.com/letsencrypt/boulder/issues/1479#issuecomment-224543735.
    > Hopefully that will reduce the pressure on golang.org/x/net/publicsuffix to
    > update frequently.

    That might relieve the update pressure from letsencrypt on
    golang.org/x/net/publicsuffix, but the most recent CL I saw
    (https://go-review.googlesource.com/#/c/23832/) came from a Chromium
    developer, in response to a Chromium bug
    (https://github.com/chromium/hstspreload/issues/79).

    In any case, I've just submitted
    https://go-review.googlesource.com/#/c/23930/ which discards the
    for-debugging comments, shrinking the generated table.go file from
    538K to 139K.

  • Andrew Gerrand at Jun 9, 2016 at 1:19 am

    On 8 June 2016 at 18:45, Nigel Tao wrote:

    > golang.org/x/publicsuffix (gerrit + codereview)

    If we're going for a new repo, this is what I recommend.

  • Dave MacFarlane at Jun 9, 2016 at 3:27 pm

    On Wed, Jun 8, 2016 at 4:45 AM, Nigel Tao wrote:

    > [...] I'm really only hand waving and guessing here, as I don't
    > really know how git works under the hood, but for example, the
    > generated table.go file currently weighs 528K. At [4], Brad (who's
    > away for some weeks) said that each publicsuffix commit grows the
    > x/net repo by 0.1MB, which isn't huge, but it would add up over time.

    Internally, git stores every version of every tracked file as a
    zlib-compressed object under .git/objects, named after the SHA-1 hash
    of the file contents (plus a short type-and-length header). Because
    the name is a hash of the contents, identical contents are not
    duplicated, even if different files or commits contain them.

    Occasionally, either when you run "git gc" or when git decides to on
    its own, it takes loose objects and compresses them into a pack file,
    which is the same format used to clone or fetch a repo over the wire.
    A pack file contains a bunch of objects, each stored either as its
    whole (zlib-compressed) contents or as a (zlib-compressed) binary
    delta against another object, which is referred to either by its
    name or by its position in the pack file.

    So if your "essentially binary" changes are localized to one part of
    the file (or get repetitive inside the file itself in a way that
    compresses well), then it will eventually get stored in a
    disk-space-efficient way, but you don't have any control over when
    that will happen in other people's local repos. Otherwise, it'll
    continue to grow at the size that you're seeing, forever. (Either
    way, git uses the packed form to be network-efficient when cloning:
    the client and server first negotiate which objects they have in
    common, to minimize bandwidth.)
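To see why localized edits pack well and pervasive ones do not, here is a toy copy/insert delta. It is a simplified stand-in for git's real delta encoder, not a reimplementation of it: it indexes every 8-byte window of the old version and rebuilds the new version from "copy" instructions plus literal runs. The "0x%08x," table layout and the numbers are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
)

// table renders n values in a hypothetical "0x%08x," one-per-line layout,
// loosely resembling a generated Go table.
func table(n int, value func(i int) int) []byte {
	var b bytes.Buffer
	for i := 0; i < n; i++ {
		fmt.Fprintf(&b, "0x%08x,\n", value(i))
	}
	return b.Bytes()
}

// deltaOps counts the instructions a toy copy/insert delta needs to rebuild
// target from base: every 8-byte window of base is indexed, and at each
// target position the longest match found through that index becomes one
// copy; bytes with no match are grouped into literal runs.
func deltaOps(base, target []byte) (copies, literalRuns int) {
	const window = 8
	index := make(map[string][]int)
	for i := 0; i+window <= len(base); i++ {
		k := string(base[i : i+window])
		index[k] = append(index[k], i)
	}
	inLiteral := false
	for t := 0; t < len(target); {
		best := 0
		if t+window <= len(target) {
			for _, b := range index[string(target[t:t+window])] {
				n := window
				for b+n < len(base) && t+n < len(target) && base[b+n] == target[t+n] {
					n++
				}
				if n > best {
					best = n
				}
			}
		}
		if best > 0 {
			copies++
			inLiteral = false
			t += best
			continue
		}
		if !inLiteral {
			literalRuns++
			inLiteral = true
		}
		t++
	}
	return copies, literalRuns
}

func main() {
	base := table(200, func(i int) int { return 7 * i })
	// Localized edit: a single value changes.
	local := table(200, func(i int) int {
		if i == 100 {
			return 7*i + 1
		}
		return 7 * i
	})
	// Pervasive edit: every value moves, as offsets do after an insertion.
	shifted := table(200, func(i int) int { return 7*i + 2 })
	fmt.Println("localized (copies, literal runs):", fmt.Sprint(deltaOps(base, local)))
	fmt.Println("pervasive (copies, literal runs):", fmt.Sprint(deltaOps(base, shifted)))
}
```

Changing one value needs only a handful of instructions, while shifting every value needs on the order of one instruction per entry, so the delta grows toward the size of the file itself.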

    (At least, that's my understanding from having tried to write a
    pure-Go git client and giving up before getting it to a usable state,
    due to the complexity of git's command line.)

    > Automatic or not, it might then make sense to move the package to its
    > own dedicated git repo, instead of filling golang.org/x/net with noisy
    > churn. If so, any bikeshedding opinions between
    > golang.org/x/publicsuffix (gerrit + codereview) or
    > github.com/golang/publicsuffix (vanilla git) or something else?
    >
    > Any other thoughts, golang-dev?

    Speaking as a Go user, the fact that some things are golang.org/x/ and
    some things are github.com/golang/x has always been weird and
    inconsistent to me. golang.org/x looks more "official", while I
    actually initially thought that github.com/golang was just a mirror.

    - Dave

  • Travis Johnson at Jun 12, 2016 at 11:09 am

    > So if your "essentially binary" changes are localized to one part of
    > the file (or get repetitive inside the file itself in a way that
    > compresses well)

    They're not. The data being committed is basically already
    "compressed" in a sense, as I understand it, and because the line
    breaks are based on output length rather than on content boundaries
    (think "gzip --rsyncable"), you end up with diffs that don't work well
    with git's (frankly black-magic-esque) delta compression. It does
    still get zlib'd, though, as you say.

    The removal of the comments seems to have shrunk the file pretty
    dramatically, but if this is going to be updated regularly it may still be
    a concern. I wonder if using a different format to store the data may help?
    Like storing one node/literal per line (bigger/longer file, but equivalent
    results and better diffs?), though I don't fully understand the method
    being used in this so I can't speak as to other potential options.
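The line-break effect, and the one-entry-per-line alternative, can be compared with a small experiment. Everything here is hypothetical: 100 fixed-width items stand in for table entries, one new item is inserted near the start, and changedLines is a rough proxy for the "+" lines of a line diff, not git's diff algorithm.

```go
package main

import (
	"fmt"
	"strings"
)

// items returns n distinct fixed-width items standing in for table entries.
func items(n int) []string {
	out := make([]string, n)
	for i := range out {
		out[i] = fmt.Sprintf("n%03d", i)
	}
	return out
}

// insertAt returns a copy of xs with x inserted before index i.
func insertAt(xs []string, i int, x string) []string {
	out := append([]string(nil), xs[:i]...)
	out = append(out, x)
	return append(out, xs[i:]...)
}

// wrapped joins the items and cuts the result every width bytes, so line
// breaks depend on output length rather than on item boundaries.
func wrapped(xs []string, width int) []string {
	s := strings.Join(xs, ",")
	var lines []string
	for len(s) > width {
		lines = append(lines, s[:width])
		s = s[width:]
	}
	return append(lines, s)
}

// changedLines counts lines of b with no matching line left in a, a rough
// stand-in for the "+" lines of a line-based diff.
func changedLines(a, b []string) int {
	left := make(map[string]int)
	for _, l := range a {
		left[l]++
	}
	n := 0
	for _, l := range b {
		if left[l] > 0 {
			left[l]--
		} else {
			n++
		}
	}
	return n
}

func main() {
	before := items(100)
	after := insertAt(before, 10, "n999") // one new entry near the start
	fmt.Println("fixed-width lines changed:", changedLines(wrapped(before, 40), wrapped(after, 40)))
	fmt.Println("one-per-line lines changed:", changedLines(before, after))
}
```

With 40-byte lines, every line after the insertion point shifts and counts as changed (12 of 13 lines here), while the one-per-line layout changes a single line. The price, as noted above, is a longer file.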

  • Volker Dobler at Jun 12, 2016 at 5:53 pm

    On Sun, Jun 12, 2016 at 1:09 PM, Travis Johnson wrote:

    > > So if your "essentially binary" changes are localized to one part of
    > > the file (or get repetitive inside the file itself in a way that
    > > compresses well)
    >
    > They're not, the data being committed is basically already "compressed"
    > in a sense, as I understand it, and because the linebreaks are
    > result-length based and not content-length based (think
    > "gzip --rsyncable") you end up with diffs that don't work well with
    > git's (frankly black-magic-esque) diff compression. Though it does
    > still get zlib'd as you say.
    >
    > The removal of the comments seems to have shrunk the file pretty
    > dramatically, but if this is going to be updated regularly it may still
    > be a concern. I wonder if using a different format to store the data
    > may help? Like storing one node/literal per line (bigger/longer file,
    > but equivalent results and better diffs?), though I don't fully
    > understand the method being used in this so I can't speak as to other
    > potential options.

    The labels are compressed (in the sense that overlapping labels are
    combined into one large string). The actual table is stored in a form
    that allows fast lookup; it is not "compressed", it is the direct
    memory representation of the tree used to look up the public suffix
    of a domain.
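Combining overlapping labels into one large string is essentially a shortest-common-superstring problem. A minimal sketch using the usual greedy heuristic (drop labels contained in others, then repeatedly merge the pair with the longest suffix/prefix overlap) follows; the heuristic and the sample labels are assumptions for illustration, not necessarily what the x/net generator does.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// overlap returns the length of the longest suffix of a that is also a prefix of b.
func overlap(a, b string) int {
	k := len(a)
	if len(b) < k {
		k = len(b)
	}
	for ; k > 0; k-- {
		if strings.HasSuffix(a, b[:k]) {
			return k
		}
	}
	return 0
}

// combine packs labels into one string, sharing bytes where labels overlap,
// and returns each label's offset in it. It drops labels already contained
// in another label, then greedily merges the pair with the largest overlap.
func combine(labels []string) (string, map[string]int) {
	set := make(map[string]bool)
	for _, l := range labels {
		set[l] = true
	}
	var uniq []string
	for l := range set {
		uniq = append(uniq, l)
	}
	sort.Strings(uniq) // deterministic order
	var parts []string
	for _, a := range uniq {
		contained := false
		for _, b := range uniq {
			if a != b && strings.Contains(b, a) {
				contained = true
				break
			}
		}
		if !contained {
			parts = append(parts, a)
		}
	}
	for len(parts) > 1 {
		bi, bj, bo := 0, 1, -1
		for i := range parts {
			for j := range parts {
				if i != j {
					if o := overlap(parts[i], parts[j]); o > bo {
						bi, bj, bo = i, j, o
					}
				}
			}
		}
		next := []string{parts[bi] + parts[bj][bo:]}
		for k, p := range parts {
			if k != bi && k != bj {
				next = append(next, p)
			}
		}
		parts = next
	}
	text := strings.Join(parts, "") // empty if there were no labels
	offsets := make(map[string]int, len(set))
	for l := range set {
		offsets[l] = strings.Index(text, l)
	}
	return text, offsets
}

func main() {
	labels := []string{"com", "co", "om", "mobi", "biz", "zone"}
	text, offsets := combine(labels)
	fmt.Printf("%q (%d bytes)\n", text, len(text))
	for _, l := range labels {
		fmt.Printf("%-4s -> offset %d, length %d\n", l, offsets[l], len(l))
	}
}
```

Each label then becomes an (offset, length) pair into the shared string: "co" and "om" need no bytes of their own, and the 18 bytes of input fit in 10. The greedy merge gives a short string, but not necessarily the shortest one.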

    The source was never meant to be optimised; only the memory footprint
    of the list at runtime (and the time spent on a lookup) were. A more
    efficient, more diffable or more compressible source code
    representation would probably require building up the internal
    representation of the list from the source-code-efficient data, either
    during init or on first use. I would consider this worse than the
    bytes wasted because git cannot compress the source well.
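For comparison, the init-or-first-use alternative could look roughly like this. The one-rule-per-line source, the names and the plain map are hypothetical simplifications; the point is only that sync.Once moves the construction cost from "go generate" time to the first lookup at runtime, which is the cost described above as worse than the wasted repo bytes.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// source is a hypothetical diff-friendly representation: one rule per line.
const source = `com
uk
co.uk
*.ck
!www.ck`

var (
	once   sync.Once
	table  map[string]bool
	builds int // counts how often the table is built, for illustration
)

// lookupTable builds the in-memory structure from source on first use;
// later calls reuse it.
func lookupTable() map[string]bool {
	once.Do(func() {
		builds++
		table = make(map[string]bool)
		for _, line := range strings.Split(source, "\n") {
			if line = strings.TrimSpace(line); line != "" {
				table[line] = true
			}
		}
	})
	return table
}

func main() {
	fmt.Println(len(lookupTable()), lookupTable()["co.uk"], "builds:", builds)
}
```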

    V.


Discussion Overview
group: golang-dev
categories: go
posted: Jun 8, '16 at 8:45 am
active: Jun 12, '16 at 5:53 pm
posts: 7
users: 6
website: golang.org
