Hi,

I found some time to polish my implementation of the CookieJar
interface in net/http and ran into the following stumbling blocks:
handling of public suffixes, handling of IP addresses, handling
of internationalized domain names (IDN), storage size limitations,
exported methods (other than SetCookies and Cookies) and
internal storage of the cookies.
I'd like to discuss this before providing a CL for review.

- Handling of public suffixes [1] is not complicated, but
requires large tables: There are more than 6000 rules in the
table maintained under [1]. This table might be useless for
lots of use cases.

- Browsers handle cookies received from an IP address
differently than RFC 6265 requires. And different browsers
disagree on the corner cases.

- IDNs require cookie domains to be punycoded. go-idn [2]
seems natural here. Would go-idn qualify to be incorporated
into the standard library?

- Cookie jar implementations in browsers tend to limit the
number of cookies stored by dropping infrequently used
cookies.

- Which extra methods beside SetCookies and Cookies
should a useful cookie jar export?

- Depending on the number of cookies kept in a jar,
different data structures are more efficient. Keeping 10
cookies from 2 domains is a different story than keeping
tens of thousands of cookies from hundreds of domains.

I thought about providing a CL containing a cookie jar
which addresses these issues as follows:
- Ignore public suffixes (allow domain cookies for .com or co.uk).
- Be strict with IP addresses and stick to RFC 6265.
- Ignore IDNs and mark this as a BUG.
- Do not impose any limits or automatic cleanup actions on the jar.
- Provide the following additional methods:
    func NewJar() *Jar // set up empty jar
    func (jar *Jar) All() []Cookie // returns copy of jar's content
    func (jar *Jar) Add(cookies []Cookie) // populate/update jar
    func (jar *Jar) Remove(domain, path, name string) bool // delete cookie

Pros, cons and rationale:
- Ignoring public suffixes seems okay as long as no one writes
a browser in Go. Ignoring is fast and doesn't require a list of
known public suffixes which gets outdated and consumes RAM.
- Not properly punycoding the domain name is clearly a bug.
The jar will not handle cookies retrieved from an IDN properly.
- I assume (!) that the typical use case requires only a few
cookies from a handful of domains (some session or authentication
cookies). Automatic cleanup or limitations seem unnecessary
on this setup.
- The exported methods allow serializing the jar with minimal
effort if someone wants to persist it to disk or share it over the
network. Remove is a pure convenience method, as it could
be simulated by a call to SetCookies, but it seems natural to
have Remove if you have Add (and deleting a cookie with
SetCookies is awkward). All is "inefficient" in the sense of
returning a copy of all cookies. As this might not be the most
common method to call on a jar, it seems okay to me.
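
A minimal sketch of how the proposed methods could support serialization. The Cookie fields, the map keyed by (domain, path, name), the mutex and the roundTrip helper are assumptions made for illustration; the post only gives the four signatures.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sync"
)

// Cookie is a hypothetical, trimmed-down jar entry. A real jar would
// carry more fields (expiry, flags, timestamps).
type Cookie struct {
	Domain, Path, Name, Value string
}

// key identifies a cookie by (domain, path, name), the same triple
// that the proposed Remove takes.
type key struct{ domain, path, name string }

// Jar is a minimal in-memory sketch of the proposed type.
type Jar struct {
	mu      sync.Mutex
	cookies map[key]Cookie
}

// NewJar sets up an empty jar.
func NewJar() *Jar { return &Jar{cookies: make(map[key]Cookie)} }

// All returns a copy of the jar's content.
func (jar *Jar) All() []Cookie {
	jar.mu.Lock()
	defer jar.mu.Unlock()
	all := make([]Cookie, 0, len(jar.cookies))
	for _, c := range jar.cookies {
		all = append(all, c)
	}
	return all
}

// Add populates or updates the jar.
func (jar *Jar) Add(cookies []Cookie) {
	jar.mu.Lock()
	defer jar.mu.Unlock()
	for _, c := range cookies {
		jar.cookies[key{c.Domain, c.Path, c.Name}] = c
	}
}

// Remove deletes the cookie and reports whether it was present.
func (jar *Jar) Remove(domain, path, name string) bool {
	jar.mu.Lock()
	defer jar.mu.Unlock()
	k := key{domain, path, name}
	_, ok := jar.cookies[k]
	delete(jar.cookies, k)
	return ok
}

// roundTrip serializes src via All and restores it into a new jar via Add.
func roundTrip(src *Jar) (*Jar, error) {
	data, err := json.Marshal(src.All())
	if err != nil {
		return nil, err
	}
	var cookies []Cookie
	if err := json.Unmarshal(data, &cookies); err != nil {
		return nil, err
	}
	dst := NewJar()
	dst.Add(cookies)
	return dst, nil
}

func main() {
	jar := NewJar()
	jar.Add([]Cookie{
		{Domain: "example.com", Path: "/", Name: "sid", Value: "abc"},
		{Domain: "example.com", Path: "/app", Name: "lang", Value: "en"},
	})
	restored, err := roundTrip(jar)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(restored.All()), restored.Remove("example.com", "/", "sid"))
}
```

In this sketch All copies every cookie on each call, which is the cost mentioned above; it is acceptable only if persisting the jar is rare.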

Any comments welcome!

Volker


[1] http://publicsuffix.org/
[2] http://code.google.com/p/go-idn/

--


  • Nigel Tao at Oct 24, 2012 at 1:25 am

    On 22 October 2012 23:53, Volker Dobler wrote:
    > - Provide the following additional methods:
    > func NewJar() *Jar // set up empty jar
    > func (jar *Jar) All() []Cookie // returns copy of jar's content
    > func (jar *Jar) Add(cookies []Cookie) // populate/update jar
    > func (jar *Jar) Remove(domain, path, name string) bool // delete cookie
    All returns a copy of the entire jar's content instead of providing an
    iterator; this could be expensive. Add takes Cookies but Remove
    doesn't; that looks odd. How do you envision these extra methods (over
    and above http.CookieJar) being used? Is it just for marshaling to and
    unmarshaling from disk? Is something like a database-backed cookie jar
    out of scope of net/http/cookiejar? If so, should package cookiejar
    also provide exported functions to help implement a DB-backed jar,
    instead of only providing a single, blessed jar implementation? Should
    such functions live in net/http instead?

    > - Ignoring public suffixes seems okay as long as no one writes
    > a browser in Go. Ignoring is fast and doesn't require a list of
    > known public suffixes which gets outdated and consumes RAM.
    This might be OK for a third party library (e.g. sharing your code on
    github.com or code.google.com), but it seems like a showstopper to me,
    for something proposed for the standard library. Yes, public suffix
    lists can become outdated (and so should not be hard-coded) and can
    consume space, but a standard library cookiejar should give the
    programmer the option of supporting this.

    --
  • Volker Dobler at Oct 24, 2012 at 8:11 am

    On Wednesday, 24 October 2012 03:25:55 UTC+2, Nigel Tao wrote:
    > On 22 October 2012 23:53, Volker Dobler wrote:
    >> - Provide the following additional methods:
    >> func NewJar() *Jar // set up empty jar
    >> func (jar *Jar) All() []Cookie // returns copy of jar's content
    >> func (jar *Jar) Add(cookies []Cookie) // populate/update jar
    >> func (jar *Jar) Remove(domain, path, name string) bool // delete cookie
    > All returns a copy of the entire jar's content instead of providing an
    > iterator; this could be expensive.
    > Add takes Cookies but Remove doesn't; that looks odd.
    Yes, that is asymmetric. But the proposed signature of Remove
    makes it clear that it will remove the cookie identified by
    name, path and domain while ignoring all other fields in Cookie.
    But maybe a
        func (jar *Jar) Remove(cookie Cookie) bool
    would do as well if the documentation states clearly that only
    Name, Path and Domain of cookie are used to look up the
    cookie to delete.

    > How do you envision these extra methods (over
    > and above http.CookieJar) being used? Is it just for marshaling to and
    > unmarshaling from disk?
    For All and Add: yes, basically just (un)marshalling from/to disk.
    Add and Remove are not really necessary as they can be
    simulated by SetCookies. As using SetCookies to add or
    remove a cookie is not obvious or straightforward, adding an
    example here might be needed.
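
A hedged illustration of adding and removing a cookie through SetCookies alone. It uses the net/http/cookiejar package as it later shipped in the standard library, not the CL discussed in this thread; the URL and cookie name are placeholders, and the addThenDelete helper is made up for the example.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
)

// addThenDelete stores one cookie through SetCookies, then removes it by
// sending the same name with a negative MaxAge. It returns the number of
// cookies visible for the URL after each step.
func addThenDelete() (afterAdd, afterDelete int, err error) {
	jar, err := cookiejar.New(nil) // nil options: no public suffix list
	if err != nil {
		return 0, 0, err
	}
	u, err := url.Parse("http://example.com/")
	if err != nil {
		return 0, 0, err
	}

	// "Add": behaves as if the server had sent "Set-Cookie: sid=abc".
	jar.SetCookies(u, []*http.Cookie{{Name: "sid", Value: "abc"}})
	afterAdd = len(jar.Cookies(u))

	// "Remove": same name, domain and path, with MaxAge < 0 meaning
	// "delete this cookie now".
	jar.SetCookies(u, []*http.Cookie{{Name: "sid", MaxAge: -1}})
	afterDelete = len(jar.Cookies(u))
	return afterAdd, afterDelete, nil
}

func main() {
	added, deleted, err := addThenDelete()
	if err != nil {
		panic(err)
	}
	fmt.Println("after add:", added, "after delete:", deleted)
}
```

The deletion only works if the second cookie resolves to the same domain, path and name as the stored one, which is why a dedicated Remove reads more clearly.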

    > Is something like a database-backed cookie jar
    > out of scope of net/http/cookiejar? If so, should package cookiejar
    > also provide exported functions to help implement a DB-backed jar,
    > instead of only providing a single, blessed jar implementation? Should
    > such functions live in net/http instead?
    I understand issue 1960: "net/http/cookiejar: implement basic
    in-memory Jar" (https://code.google.com/p/go/issues/detail?id=1960)
    to be a really simple, all in-memory jar.

    I am unsure what use case a DB-backed jar would be good for
    (except the obvious browser use case).
    An in-memory jar could provide hooks which signal the addition,
    modification, deletion, expiration and access of cookies to the
    outside world, and a DB backend could consume these.
    Adding something like
        package cookiejar
        type ChangeNotification struct { ... }
        type Jar struct {
            // provide this channel if interested in add, mod, del, ...
            Listener chan ChangeNotification
        }
    in the future seems not too complicated.
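
One way the Listener idea could look, as a sketch only. The Action type, the fields of ChangeNotification, the string-keyed map and the set/remove/record helpers are all assumptions; the post leaves the struct body open.

```go
package main

import "fmt"

// Action is a hypothetical enumeration of the changes a jar could report.
type Action int

const (
	Added Action = iota
	Modified
	Deleted
)

// ChangeNotification's fields are assumed for this sketch.
type ChangeNotification struct {
	Action             Action
	Domain, Path, Name string
}

// Jar is a toy in-memory jar that reports changes on Listener.
type Jar struct {
	// Listener receives one notification per change if it is non-nil.
	Listener chan ChangeNotification
	cookies  map[string]string // key: domain;path;name
}

// notify sends a notification if someone is listening.
func (jar *Jar) notify(a Action, domain, path, name string) {
	if jar.Listener != nil {
		jar.Listener <- ChangeNotification{a, domain, path, name}
	}
}

// set stores a value and reports it as an addition or a modification.
func (jar *Jar) set(domain, path, name, value string) {
	k := domain + ";" + path + ";" + name
	_, existed := jar.cookies[k]
	jar.cookies[k] = value
	if existed {
		jar.notify(Modified, domain, path, name)
	} else {
		jar.notify(Added, domain, path, name)
	}
}

// remove deletes a cookie and reports the deletion if it existed.
func (jar *Jar) remove(domain, path, name string) {
	k := domain + ";" + path + ";" + name
	if _, ok := jar.cookies[k]; ok {
		delete(jar.cookies, k)
		jar.notify(Deleted, domain, path, name)
	}
}

// record runs a few operations and returns the actions a backend would see.
func record() []Action {
	jar := &Jar{
		Listener: make(chan ChangeNotification, 8), // buffered so sends do not block here
		cookies:  make(map[string]string),
	}
	jar.set("example.com", "/", "sid", "abc")
	jar.set("example.com", "/", "sid", "def")
	jar.remove("example.com", "/", "sid")
	close(jar.Listener)

	var seen []Action
	for n := range jar.Listener {
		seen = append(seen, n.Action)
	}
	return seen
}

func main() {
	fmt.Println(record())
}
```

With a blocking send, a slow consumer would stall the jar; whether to buffer, drop or block is a design decision this sketch does not settle.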

    >> - Ignoring public suffixes seems okay as long as no one writes
    >> a browser in Go. Ignoring is fast and doesn't require a list of
    >> known public suffixes which gets outdated and consumes RAM.
    > This might be OK for a third party library (e.g. sharing your code on
    > github.com or code.google.com), but it seems like a showstopper to me,
    > for something proposed for the standard library. Yes, public suffix
    > lists can become outdated (and so should not be hard-coded) and can
    > consume space, but a standard library cookiejar should give the
    > programmer the option of supporting this.
    Actually, my original cookiejar, which this CL is based on
    (hosted on http://code.google.com/p/cookiejar/),
    does handle public suffixes properly. I am still experimenting
    with the data structure to store the rules efficiently.
    Should the list of public suffixes be a global used by
    all Jars or a field of Jar so that different Jars can use
    different lists or even none? The second approach might
    be the more flexible one. Something like:

        package cookiejar
        type SuffixList struct { ... }
        var PublicSuffixes SuffixList = ... builtin list ...
        type Jar struct {
            PublicSuffixes *SuffixList // optional list of public suffixes
            ...
        }

    I could include this right away, but the already too big CL would
    grow even further.

    [General issues from codereview comments copied here to
    keep overall/design discussion here.]
    > One bigger picture issue is that you define a cookiejar.Cookie
    > type that has a lot of overlap with the existing http.Cookie type.
    > Should the two types be merged?
    cookiejar.Cookie needs at least one additional field, Created,
    to properly sort cookies whose Paths have the same length. At least
    this field would have to be added to http.Cookie.

    The host vs. domain cookie flag _could_ be encoded as a
    leading dot "." in the Domain field. I think this is a bit
    ugly, so maybe one additional field like HostCookie bool or
    DomainCookie bool has to be added to http.Cookie.

    If cookiejar should provide features for a DB backed
    storage the LastAccess field seems necessary for housekeeping.
    So this would be a third field added to http.Cookie.

    Basically I am feeling comfortable with merging the additional
    fields into http.Cookie and using the latter one in the jar.
    Is this the way to go?
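
To show why a Created field matters, here is a small sketch of the ordering RFC 6265 (section 5.4) recommends for the Cookie header: longer paths first and, among paths of equal length, earlier creation times first. The Cookie struct is a hypothetical subset of the merged type, and demoOrder is an illustrative helper.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Cookie is a hypothetical subset of the fields under discussion.
type Cookie struct {
	Name    string
	Path    string
	Created time.Time
}

// sortForHeader orders cookies with longer paths first; cookies whose
// paths have the same length are ordered by earlier creation time.
func sortForHeader(cs []Cookie) {
	sort.SliceStable(cs, func(i, j int) bool {
		if len(cs[i].Path) != len(cs[j].Path) {
			return len(cs[i].Path) > len(cs[j].Path)
		}
		return cs[i].Created.Before(cs[j].Created)
	})
}

// demoOrder sorts three sample cookies and returns their names in order.
func demoOrder() []string {
	t0 := time.Date(2012, 10, 24, 0, 0, 0, 0, time.UTC)
	cs := []Cookie{
		{Name: "a", Path: "/", Created: t0},
		{Name: "b", Path: "/x", Created: t0.Add(2 * time.Second)},
		{Name: "c", Path: "/y", Created: t0.Add(1 * time.Second)},
	}
	sortForHeader(cs)
	names := make([]string, len(cs))
	for i, c := range cs {
		names[i] = c.Name
	}
	return names
}

func main() {
	fmt.Println(demoOrder())
}
```

Without Created, the two cookies with paths of length 2 would have no defined relative order.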
    > Overall, this change is huge, and I think it is too large
    > to review comfortably. It's all very well to have a complete
    > implementation to prove the concept, but finer-grained code
    > reviews are easier to understand and manage. It may be
    > best to leave this change as is, but not for submission, and
    > commit a separate series of smaller changes.
    Okay. How about:
    1. Additional fields in http.Cookie.
    2. New package cookiejar which contains only
       type Jar and the exported methods with
       appropriate documentation.
    3. The internal helper functions and their tests, even if it
       might be unclear what they will be used for.
    4. A stub for the in-memory storage (currently type flat).
    5. The cookie jar logic and the tests.
    6. The type SuffixList and a built-in list.
    7. The implementation of the public suffix stuff including
       tests.
    8. More efficient data structure for internal storage.
    9. More efficient data structures for public suffixes.
    Step 5 might still be big. 6 will be big -- at least in LOC.
    > For example, the domainAndType method doesn't seem
    > to depend on the receiving Jar at all;
    That is true. I kept it as a method as some natural configuration
    options of Jar - e.g. allow a host cookie to be set on an IP
    address like browsers do, or the public suffix stuff - would
    be handled in domainAndType. But you are right,
    this could go into step 3 as a function and be made a
    method on jar in step 6.


    --
  • Nigel Tao at Oct 25, 2012 at 4:18 am

    On 24 October 2012 19:11, Volker Dobler wrote:
    > I understand issue 1960: "net/http/cookiejar: implement basic
    > in-memory Jar" (https://code.google.com/p/go/issues/detail?id=1960)
    > to be a really simple, all in-memory jar.
    >
    > I am unsure what use case a DB-backed jar would be good for
    > (except the obvious browser use case).
    Well, eventually someone will want the obvious browser use case cookie
    jar. It would be great if they could re-use at least some of this
    cookie code (whether in net/http or net/http/cookiejar), especially
    with parsing.

    > Should the list of public suffixes be a global used by
    > all Jars or a field of Jar so that different Jars can use
    > different lists or even none?
    Without having tried it, I would expect the public suffix list to be
    an interface that programmers can configure a cookie jar with; passing
    nil means to ignore public suffixes. An interface allows for various
    implementations: some hard-coded, some file-backed, etc.
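
For illustration, the net/http/cookiejar package that later shipped in the standard library takes this shape: Options carries a PublicSuffixList interface, and nil disables the check. The sketch below plugs in a deliberately tiny hard-coded list; tinyList and leaks are made-up names, and the list is far too small for real use.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/cookiejar"
	"net/url"
	"strings"
)

// tinyList is a hypothetical hard-coded list that knows one multi-label
// suffix ("co.uk") and otherwise falls back to the last label.
type tinyList struct{}

// PublicSuffix returns the public suffix of domain according to tinyList.
func (tinyList) PublicSuffix(domain string) string {
	if domain == "co.uk" || strings.HasSuffix(domain, ".co.uk") {
		return "co.uk"
	}
	return domain[strings.LastIndex(domain, ".")+1:]
}

// String describes the source of the list, as the interface requires.
func (tinyList) String() string { return "tiny hard-coded list (illustration only)" }

// leaks reports whether a cookie that foo.co.uk sets for Domain=co.uk
// becomes visible to bar.co.uk when the jar uses the given list.
func leaks(list cookiejar.PublicSuffixList) bool {
	jar, err := cookiejar.New(&cookiejar.Options{PublicSuffixList: list})
	if err != nil {
		panic(err)
	}
	foo, err := url.Parse("http://foo.co.uk/")
	if err != nil {
		panic(err)
	}
	bar, err := url.Parse("http://bar.co.uk/")
	if err != nil {
		panic(err)
	}
	jar.SetCookies(foo, []*http.Cookie{{Name: "sid", Value: "abc", Domain: "co.uk"}})
	return len(jar.Cookies(bar)) > 0
}

func main() {
	fmt.Println("nil list leaks:", leaks(nil))
	fmt.Println("tiny list leaks:", leaks(tinyList{}))
}
```

A file-backed or generated list would implement the same two methods, so the jar itself does not need to know where the rules come from.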

    > Basically I am feeling comfortable with merging the additional
    > fields into http.Cookie and using the latter one in the jar.
    > Is this the way to go?
    Yeah, probably. I'd like to know what bradfitz and rsc think, but
    bradfitz is traveling this week and rsc is super-busy and reads
    golang-nuts in batches, so it might take a few days for a response.

    --
  • Volker Dobler at Oct 31, 2012 at 12:08 am

    On 25 Oct., 05:18, Nigel Tao wrote:
    > [...] I'd like to know what bradfitz and rsc think, but
    > bradfitz is traveling this week and rsc is super-busy and reads
    > golang-nuts in batches, so it might take a few days for a response.
    Any ideas, comments, suggestions or preferences on
    this topic yet?

    --
  • Nigel Tao at Oct 31, 2012 at 3:48 am

    On Wed, Oct 31, 2012 at 11:08 AM, Volker Dobler wrote:
    > Any ideas, comments, suggestions or preferences on
    > this topic yet?
    I had a brief chat with bradfitz and rsc. The big-picture plan, being
    the sum of many small changes, looks good. A couple of comments arose:

    Ideally, persistence would be an interface, and we'll only provide an
    in-memory implementation. But all the validation to decide what to
    ignore and what to persist should be shared and in the standard
    library. That is, if I want to write a cookie jar persistence layer
    based on leveldb-go, I shouldn't have to know anything about web
    security: just how to read and write some things from leveldb.
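
A rough sketch of that split, under stated assumptions: Storage, memStorage, the Cookie fields and the simplified domainMatches check are all hypothetical, and the real validation rules of RFC 6265 are much more involved than the single check shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// Cookie is a hypothetical, trimmed-down jar entry.
type Cookie struct{ Domain, Path, Name, Value string }

// Storage is a hypothetical persistence interface. A backend only reads
// and writes entries; it never decides whether a cookie is acceptable.
type Storage interface {
	Put(key string, c Cookie)
	Get(key string) (Cookie, bool)
	Delete(key string)
}

// memStorage is the in-memory implementation of Storage.
type memStorage map[string]Cookie

func (m memStorage) Put(key string, c Cookie) { m[key] = c }
func (m memStorage) Get(key string) (Cookie, bool) {
	c, ok := m[key]
	return c, ok
}
func (m memStorage) Delete(key string) { delete(m, key) }

// Jar owns the policy and delegates persistence to a Storage.
type Jar struct{ store Storage }

// domainMatches is a simplified stand-in for the real validation:
// host must equal domain or be a subdomain of it.
func domainMatches(host, domain string) bool {
	return host == domain || strings.HasSuffix(host, "."+domain)
}

// Set validates the cookie against the request host and persists it only
// if it passes. It reports whether the cookie was stored.
func (j *Jar) Set(host string, c Cookie) bool {
	if c.Domain == "" {
		c.Domain = host
	}
	if !domainMatches(host, c.Domain) {
		return false
	}
	j.store.Put(c.Domain+";"+c.Path+";"+c.Name, c)
	return true
}

func main() {
	jar := &Jar{store: memStorage{}}
	fmt.Println(jar.Set("www.example.com", Cookie{Domain: "example.com", Path: "/", Name: "sid", Value: "abc"}))
	fmt.Println(jar.Set("www.example.com", Cookie{Domain: "other.org", Path: "/", Name: "sid", Value: "xyz"}))
}
```

A leveldb-backed Storage would replace memStorage without touching Set, which is the separation described above.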

    Dealing with IDNs and Punycode probably doesn't belong in the cookiejar
    layer. It should either be handled higher (in the application), or
    lower (in e.g. the net/url package). I don't know what the right
    answer is yet.

    --

Discussion Overview
group: golang-nuts
categories: go
posted: Oct 22, '12 at 12:53p
active: Oct 31, '12 at 3:48a
posts: 6
users: 2
website: golang.org
