tl;dr: non-unique distribution names are annoying and create a
security hole on rt.cpan.org. Fixing it may not be trivial.

## Terminology and context ##

By "distribution", I generally mean the unique path of a CPAN
distribution in the authors/id/X/XY directory. I may occasionally
refer to this as a "distfile" for specific clarity. The "distribution
name" is the portion of the basename without version or suffix.

distribution: DAGOLDEN/Foo-Bar-1.23.tar.gz
author: DAGOLDEN
distribution name: Foo-Bar
version: 1.23
suffix: .tar.gz
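
As a rough illustration of the terminology, the split of a distfile into its parts can be sketched in Python. The regular expression, the list of suffixes, and the names `DistInfo` and `parse_distfile` are assumptions made for this example; they are not PAUSE's actual parsing rules.

```python
import re
from typing import NamedTuple


class DistInfo(NamedTuple):
    author: str
    name: str
    version: str
    suffix: str


# Assumed shape: AUTHOR/Name-Version.suffix (a simplification of real CPAN paths).
_DISTFILE = re.compile(
    r"^(?P<author>[A-Z][A-Z0-9-]*)/"
    r"(?P<name>.+)-(?P<version>v?\d[^-]*?)"
    r"(?P<suffix>\.tar\.gz|\.tgz|\.tar\.bz2|\.zip)$"
)


def parse_distfile(path: str) -> DistInfo:
    """Split a distfile such as 'DAGOLDEN/Foo-Bar-1.23.tar.gz' into its parts."""
    match = _DISTFILE.match(path)
    if match is None:
        raise ValueError(f"unrecognized distfile: {path!r}")
    return DistInfo(**match.groupdict())


info = parse_distfile("DAGOLDEN/Foo-Bar-1.23.tar.gz")
print(info.author, info.name, info.version, info.suffix)
```

Note that the distfile is unique, while `info.name` on its own is not.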

Distributions contain modules (.pm files), which contain packages
(namespaces declared by "package NAME"). PAUSE indexes packages and
associates them with a source distribution. PAUSE has a system of
permissions for packages and ensures that distributions are unique.

## Background ##

Yesterday, I conducted a test of the CPAN ecosystem by uploading two
distributions:

* DAGOLDEN/Acme-CPAN-Testers-UNKNOWN-0.03.tar.gz
* DAGOLDEN/Acme-CPAN-Testers-FAIL-0.02.tar.gz

These were intentionally constructed to have the same "distribution
name" as these existing distributions:

* BINGOS/Acme-CPAN-Testers-UNKNOWN-0.02.tar.gz
* BINGOS/Acme-CPAN-Testers-FAIL-0.02.tar.gz

Note that one has the same version number and one does not. And thank
you to BinGOs for volunteering his Acme modules.

While my distributions had the same "name", they contained completely
different, unindexed packages. BinGOs and I do not share co-maint on
any of the packages involved.

I observed the following after PAUSE accepted the distributions and
indexed the packages:

(1) metacpan.org and search.cpan.org incorrectly linked my
distributions to BinGOs'. E.g. they both believe the latest
"Acme-CPAN-Testers-UNKNOWN" is mine, though their contents and primary
maintainers are completely different.

(2) rt.cpan.org treated both distributions as having the same RT
queue. I gained administrative access to BinGOs' existing queues.

(3) cpantesters.org treated both distributions as one for the purpose
of aggregating test reports[1].


## Implications ##

The first observation is probably annoying but not outright dangerous.
I don't think any installers use latest release data to guess which
tarball to install. However, one could, for instance, upload a
distribution with the same name as a popular one but with a
much higher version number, and it could effectively mask the original
for some types of queries or web requests.

The second observation is a security hole. Anyone can gain
administrative rights to an RT queue simply by uploading a
distribution of the same name. (I don't know if RT rejects
distributions containing unauthorized packages, but it really doesn't
matter as far as threat vectors go.)

The third observation is also annoying but not dangerous. Someone
could intentionally or unintentionally upload a duplicate distribution
name and pollute existing test results.

Like many things on CPAN, if we sort of trust everyone to act
decently, we can probably ignore this, just like we ignore all the
*.PL files that run arbitrary code on installation.

However, all of these point to the same underlying flaw: using a
non-unique data element as a unique key. This creates a common point
of failure.
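
To make the flaw concrete, here is a minimal Python sketch of an index keyed only by distribution name. It is not how any of the sites above is implemented; the function names and the simple dotted-number version comparison are assumptions for illustration. The release data comes from the test uploads described earlier.

```python
from collections import defaultdict

# (author, distribution name, version) for the two colliding uploads.
releases = [
    ("BINGOS", "Acme-CPAN-Testers-UNKNOWN", "0.02"),
    ("DAGOLDEN", "Acme-CPAN-Testers-UNKNOWN", "0.03"),
]


def version_key(version):
    # Simplified comparison: assumes plain dotted numeric versions.
    return tuple(int(part) for part in version.split("."))


def latest_by_name(rows):
    """Pick the 'latest' release per distribution name, ignoring the author."""
    latest = {}
    for author, name, version in rows:
        current = latest.get(name)
        if current is None or version_key(version) > version_key(current[1]):
            latest[name] = (author, version)
    return latest


def authors_by_name(rows):
    """Collect every author who uploaded something under a given name."""
    authors = defaultdict(set)
    for author, name, _version in rows:
        authors[name].add(author)
    return dict(authors)


print(latest_by_name(releases))
print(authors_by_name(releases))
```

Because the key is the name alone, the higher-versioned upload masks the original, and both authors end up attached to the same entry.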

## Solutions ##

Here's where I start brainstorming. If we can get some good
discussion on this list, then maybe we could finalize a plan at the QA
hackathon, which will have a number of the relevant
maintainers/administrators attending.

(a) We could do nothing; we've lived with it, can continue to live
with it, and will police any incidents on a one-off basis.

(b) We could extend PAUSE's permission system to distribution names as
well, so that distribution names would have primary/co-maint rights
just as packages do. This would not fix any existing duplicates, but
would prevent future infractions. It means changing a lot of PAUSE
code, but would allow RT, search sites and CPAN Testers (CT) to pretty
much remain as is.

(c) We could restrict PAUSE to allow only "well formed" distribution
names[2] -- ones matching a module name inside containing a package of
a corresponding name. E.g. "Foo-Bar", containing "Foo/Bar.pm" with
package "Foo::Bar". The existing package permissions system becomes
the chokepoint to restrict abuse. Existing distributions with
non-conforming names (e.g. libwww-perl) either change for their next
release or get grandfathered somehow.

(d) RT, search sites and CT stop using distribution name as a key and
revert either to package names or to distfile in some fashion. This
is not a trivial amount of code change and -- in the case of RT --
might make RT much more complicated and less useful.

(e) We could develop a new, unique way to identify collections of
related packages. This could be based on some combination of
distribution name and the name of an authorized package it contains,
or perhaps just on the name of a "primary" package. RT and the search
sites would need to migrate to a new data model and probably change
their HTTP routes to match.

(f) Something else
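
As an illustration of option (c), a "well formed" check could look roughly like the Python sketch below. The function names and the shape of the `packages_by_file` argument (a mapping from file path to the packages declared in that file) are hypothetical; this is not PAUSE code.

```python
def expected_package(dist_name):
    # "Foo-Bar" -> "Foo::Bar"
    return dist_name.replace("-", "::")


def expected_module_path(dist_name):
    # "Foo-Bar" -> "Foo/Bar.pm"
    return dist_name.replace("-", "/") + ".pm"


def is_well_formed(dist_name, packages_by_file):
    """True if some file ending in Foo/Bar.pm declares package Foo::Bar."""
    wanted_path = expected_module_path(dist_name)
    wanted_package = expected_package(dist_name)
    for path, packages in packages_by_file.items():
        if path == wanted_path or path.endswith("/" + wanted_path):
            if wanted_package in packages:
                return True
    return False


print(is_well_formed("Foo-Bar", {"lib/Foo/Bar.pm": ["Foo::Bar"]}))
print(is_well_formed("libwww-perl", {"lib/LWP.pm": ["LWP"]}))
```

With a rule like this, permission to upload "Foo-Bar" would follow from the existing package permission on "Foo::Bar", and a name like libwww-perl would need the grandfathering mentioned in (c).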

I welcome some thoughts and discussion. Even if the ultimate
conclusion is (a), I think it would be best to select that
intentionally, not default to it through apathy.

-- David

Notes:

[1] The Metabase backend for CT correctly distinguishes reports by
distfile, but this is not yet reflected downstream in reporting.

[2] See http://www.dagolden.com/index.php/308/packages-modules-and-distributions/

--
David Golden <xdg@xdg.me>
Take back your inbox! → http://www.bunchmail.com/
Twitter/IRC: @xdg


  • Michael G. Schwern at Mar 13, 2013 at 10:12 pm

    On 3/13/13 2:31 PM, David Golden wrote:
    tl;dr: non-unique distribution names are annoying and create a
    security hole on rt.cpan.org. Fixing it may not be trivial.
    +1

    "Distributions", releases of a single project, are largely informal
    entities yet they're basic CPAN structure. It would be good to
    normalize and formalize them.

    ## Terminology and context ##

    By "distribution", I generally mean the unique path of a CPAN
    distribution in the authors/id/X/XY directory. I may occasionally
    refer to this as a "distfile" for specific clarity. The "distribution
    name" is the portion of the basename without version or suffix.

    distribution: DAGOLDEN/Foo-Bar-1.23.tar.gz
    author: DAGOLDEN
    distribution name: Foo-Bar
    version: 1.23
    suffix: .tar.gz

    Distributions contain modules (.pm files), which contain packages
    (namespaces declared by "package NAME"). PAUSE indexes packages and
    associates them with a source distribution. PAUSE has a system of
    permissions for packages and ensures that distributions are unique.
    I'd suggest that what you describe is a *release* of a distribution.

    Here's how BackPAN::Index does it...

    A distribution has a name and a list of releases.

    distribution:
      id: Foo-Bar
      releases:
        - DAGOLDEN/Foo-Bar-1.23.tar.gz
        - DAGOLDEN/Foo-Bar-1.22.tar.gz
        - MORBO/Foo-Bar-1.00.tar.gz

    It is effectively "the project" and may make more sense to call it
    "project" to avoid ambiguity over "distribution".

    Releases have a file (which is the same as the identifier, but does not
    have to be), an author (really "releaser"), a version and a
    distribution. They have other stuff, but this is enough to get the
    basic release vs distribution relationship.

    release:
      id: DAGOLDEN/Foo-Bar-1.23.tar.gz
      releaser: DAGOLDEN
      version: 1.23
      distribution: Foo-Bar
      provides:
        "Foo::Bar": 1.23
        "Foo::Bar::Baz": 1.23

    Currently the release contains most of the meta information about the
    distribution such as mailing list, stability, contact info and version
    control. It may make sense to move the formal information of project
    meta data into the distribution, but keep the mechanism for updating it
    to include it with the latest release. Effectively, most of the project
    meta data is aliased to the latest release.
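
The release/distribution relationship described above can be sketched with two small Python classes. This is only an illustration of the model, not BackPAN::Index's actual schema; the `metadata` field and the assumption that releases are stored in upload order are made up for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Release:
    id: str                      # distfile, e.g. "DAGOLDEN/Foo-Bar-1.23.tar.gz"
    releaser: str
    version: str
    distribution: str            # name of the owning distribution ("project")
    provides: Dict[str, str] = field(default_factory=dict)
    metadata: Dict[str, str] = field(default_factory=dict)  # hypothetical field


@dataclass
class Distribution:
    id: str
    releases: List[Release] = field(default_factory=list)

    def latest(self) -> Optional[Release]:
        # Assumes releases are appended in upload order.
        return self.releases[-1] if self.releases else None

    @property
    def metadata(self) -> Dict[str, str]:
        # Project meta data is aliased to the latest release.
        latest = self.latest()
        return latest.metadata if latest else {}


dist = Distribution(id="Foo-Bar")
dist.releases.append(Release(
    id="DAGOLDEN/Foo-Bar-1.23.tar.gz", releaser="DAGOLDEN", version="1.23",
    distribution="Foo-Bar", provides={"Foo::Bar": "1.23"},
    metadata={"mailing_list": "foo-bar@example.org"},
))
print(dist.latest().id, dist.metadata)
```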

    I observed the following after PAUSE accepted the distributions and
    indexed the packages:

    (1) metacpan.org and search.cpan.org incorrectly linked my
    distributions to BinGOs'. E.g. they both believe the latest
    "Acme-CPAN-Testers-UNKNOWN" is mine, though their contents and primary
    maintainers are completely different.

    (2) rt.cpan.org treated both distributions as having the same RT
    queue. I gained administrative access to BinGOs' existing queues.

    (3) cpantesters.org treated both distributions as one for the purpose
    of aggregating test reports[1]
    These are good observations about how (not) easy it is to get permission
    information out of CPAN/PAUSE.

    ## Solutions ##

    Here's where I start brainstorming. If we can get some good
    discussion on this list, then maybe we could finalize a plan at the QA
    hackathon, which will have a number of the relevant
    maintainers/administrators attending.

    (a) We could do nothing; we've lived with it and can continue to live
    with it and will police any incidents on a one-off basis
    In several projects (Gitpan and BackPAN::Index being two) I've found
    that piecing together what a "distribution" is turns out to be very
    difficult to get correct; otherwise you live with a large number of
    broken distribution lists. It would make working with CPAN much easier
    if discovering a distribution and its releases were easy and correct.

    Which is to say, this is not just a security problem. The cost of our
    messy concept of distributions is a barrier to doing interesting things
    with CPAN.

    (b) We could extend PAUSE's permission system to distribution names as
    well, so that distribution names would have primary/co-maint rights
    just as packages do. This would not fix any existing duplicates, but
    would prevent future infractions. It means changing a lot of PAUSE
    code, but would allow RT, search sites and CPAN Testers (CT) to pretty
    much remain as is.
    +1

    IMO this is a necessary piece of missing CPAN meta data which everyone
    else has to piece together again and again.

    We could also retroactively fix duplicates as they are reported once and
    for all.

    (c) We could restrict PAUSE to allow only "well formed" distribution
    names[2] -- ones matching a module name inside containing a package of
    a corresponding name. E.g. "Foo-Bar", containing "Foo/Bar.pm" with
    package "Foo::Bar". The existing package permissions system becomes
    the chokepoint to restrict abuse.
    -1 I don't think this is necessary if B is in place, and B is a much
    better solution.

    We've always had a policy of being very liberal with what we allow.
    Not everything is a Perl library, and when something isn't, PAUSE will
    not try to index it. I'm ok with that.

    If you have something which falls outside the normal structure, say
    Foo-Bar-X.YZ.tar.gz doesn't have a lib/Foo/Bar.pm for some reason, the
    meta data would be trusted. If it says it's release X.YZ of the Foo-Bar
    distribution, then it is. The permissions system in B protects the rest
    and a permissions/distribution API lets external sites query it.

    Existing distributions with
    non-conforming names (e.g. libwww-perl) either change for their next
    release or get grandfathered somehow.
    I'm happy to grandfather in existing packages, especially major
    ones like libwww-perl where people have learned to look for
    libwww-perl-X.YZ.tar.gz and not LWP-X.YZ.tar.gz.

    (d) RT, search sites and CT stop using distribution name as a key and
    revert either to package names or to distfile in some fashion. This
    is not a trivial amount of code change and -- in the case of RT --
    might make RT much more complicated and less useful.
    -1

    The distribution name is still a good identifier and I'd rather see the
    distribution meta problem solved.

    (e) We could develop a new, unique way to identify collections of
    related packages. This could be based on some combination of
    distribution name and the name of an authorized package it contains,
    or perhaps just on the name of a "primary" package. RT and the search
    sites would need to migrate to a new data model and probably change
    their HTTP routes to match.
    -1

    Sounds complicated and unnecessary. It's hard for humans to express.
    The set of authorized packages and who did the release changes from
    release to release. Colons give some filesystems (OS X) indigestion.
  • Thomas Sibley at Mar 19, 2013 at 12:50 am

    On 03/13/2013 03:11 PM, Michael G. Schwern wrote:
    Here's how BackPAN::Index does it...

    A distribution has a name and a list of releases.

    [snip]
    This is the model MetaCPAN uses as well. See
    <https://github.com/CPAN-API/cpan-api/tree/master/lib/MetaCPAN/Document>.
    It may make sense to move the formal information of project
    meta data into the distribution, but keep the mechanism for updating it
    to include it with the latest release. Effectively, most of the project
    meta data is aliased to the latest release.
    +1
    (b) We could extend PAUSE's permission system to distribution names as
    well, so that distribution names would have primary/co-maint rights
    just as packages do. This would not fix any existing duplicates, but
    would prevent future infractions. It means changing a lot of PAUSE
    code, but would allow RT, search sites and CPAN Testers (CT) to pretty
    much remain as is.
    +1

    IMO this is a necessary piece of missing CPAN meta data which everyone
    else has to piece together again and again.
    Ditto. +1.

    If we attempt this solution, I suggest considering case-insensitive
    distribution names for purposes of general clarity (and portability).

    A semi-related discussion regarding case-insensitive module/package
    names petered out last year:
    http://www.nntp.perl.org/group/perl.cpan.workers/2012/03/msg997.html
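
One way to picture case-insensitive distribution names is to compare names by a folded key. The Python sketch below is only an illustration; `name_key` and `find_case_conflicts` are hypothetical helpers, not part of PAUSE.

```python
def name_key(dist_name):
    # Fold case so "Foo-Bar" and "foo-bar" compare equal.
    return dist_name.casefold()


def find_case_conflicts(names):
    """Return (first_seen, later) pairs that differ only by case."""
    seen = {}
    conflicts = []
    for name in names:
        key = name_key(name)
        if key in seen and seen[key] != name:
            conflicts.append((seen[key], name))
        else:
            seen.setdefault(key, name)
    return conflicts


print(find_case_conflicts(["Foo-Bar", "foo-bar", "Baz"]))
```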
  • David Golden at Mar 19, 2013 at 1:01 am

    On Mon, Mar 18, 2013 at 8:50 PM, Thomas Sibley wrote:
    If we attempt this solution, I suggest considering case-insensitive
    distribution names for purposes of general clarity (and portability).
    +1
    A semi-related discussion regarding case-insensitive module/package
    names petered out last year:
    http://www.nntp.perl.org/group/perl.cpan.workers/2012/03/msg997.html
    Rik and I have done the code work for PAUSE. Just the pull request to
    Andreas is pending.

    David


  • Michael G. Schwern at Mar 20, 2013 at 12:47 am

    On 3/18/13 5:50 PM, Thomas Sibley wrote:
    If we attempt this solution, I suggest considering case-insensitive
    distribution names for purposes of general clarity (and portability).

    A semi-related discussion regarding case-insensitive module/package
    names petered out last year:
    http://www.nntp.perl.org/group/perl.cpan.workers/2012/03/msg997.html
    +1 to case-insensitive distribution names.

    -1 to case-insensitive package names, because case matters to the language.

    I'd also like to amplify changing "distribution" to "project" which
    clarifies the release vs distribution problem and also that's the
    vocabulary most other people/systems use.
  • David Golden at Mar 20, 2013 at 1:04 am

    On Tue, Mar 19, 2013 at 8:47 PM, Michael G. Schwern wrote:
    -1 to case-insensitive package names, because case matters to the language.
    Sadly, they really need to be case-insensitive, because of how Perl
    maps "Foo::Bar" to "Foo/Bar.pm".

    We don't want someone installing "foo::bar" (intentionally or by
    accident in a dependency chain) to overwrite an existing "Foo/Bar.pm"
    on a case-insensitive filesystem.
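
The mapping in question is simple enough to sketch in Python. `module_path` mirrors how Perl turns a package name into a relative file path; `collides_on_case_insensitive_fs` is a hypothetical helper that treats a lowercase comparison as a stand-in for a case-insensitive filesystem.

```python
def module_path(package):
    # "Foo::Bar" -> "Foo/Bar.pm"
    return package.replace("::", "/") + ".pm"


def collides_on_case_insensitive_fs(package_a, package_b):
    """True if two distinct packages would map to the same file on such a filesystem."""
    return (package_a != package_b
            and module_path(package_a).lower() == module_path(package_b).lower())


print(module_path("Foo::Bar"))
print(collides_on_case_insensitive_fs("foo::bar", "Foo::Bar"))
```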

    David

  • Michael G. Schwern at Mar 20, 2013 at 1:14 am

    On 3/19/13 6:03 PM, David Golden wrote:
    On Tue, Mar 19, 2013 at 8:47 PM, Michael G. Schwern wrote:
    -1 to case-insensitive package names, because case matters to the language.
    Sadly, they really need to be case-insensitive, because of how Perl
    maps "Foo::Bar" to "Foo/Bar.pm".

    We don't want someone installing "foo::bar" (intentionally or by
    accident in a dependency chain) to overwrite an existing "Foo/Bar.pm"
    I thought you had a typo there, but I see what you mean.

    Yeah, put that way I agree, package permissions on CPAN should be
    case-insensitive. Existing conflicts grandfathered in as needed.
  • David E. Wheeler at Mar 20, 2013 at 3:13 am

    On Mar 19, 2013, at 6:14 PM, Michael G. Schwern wrote:

    I thought you had a typo there, but I see what you mean.

    Yeah, put that way I agree, package permissions on CPAN should be
    case-insensitive. Existing conflicts grandfathered in as needed.
    I made distribution and extension names case-insensitive on PGXN to avoid these sorts of problems. +1

    Best,

    David
  • Eric Wilhelm at Mar 13, 2013 at 10:25 pm
    # from David Golden on Wednesday 13 March 2013:
    I observed the following after PAUSE accepted the distributions and
    indexed the packages:
    ...
    (4) OS packaging went sideways for Debian, Red Hat, etc.

    I'm not sure exactly how this sort of thing would play through there,
    but have observed some pain with dist/package disconnect in my recent
    packaging work. It might be an issue to consider in your solution
    (sorry I don't have anything else to contribute to that at the moment.)

    --Eric
    --
    ---------------------------------------------------
    http://scratchcomputing.com
    ---------------------------------------------------
  • Kenichi Ishigaki at Mar 14, 2013 at 2:45 am
    The following are lists of the duplicated distributions found in
    uploads.db (as of 3/13 JST)

    1) duplicated cpan distributions uploaded by different authors (80 dists)
    https://gist.github.com/charsbar/a5c2452128b5fd6e5b69

    2) duplicated cpan/backpan distributions uploaded by different authors
    (317 dists)
    https://gist.github.com/charsbar/e10df3c150f4db9bd2a6

    3) duplicated cpan/backpan distributions uploaded by different
    authors, or uploaded with different file extensions (668 dists)
    https://gist.github.com/charsbar/8db90370168d8c28a504

    Some of them were uploaded by an unauthorized author (probably by
    mistake), but quite a few were uploaded by different but authorized
    authors who probably forgot to update the version. It may be useful to
    check their release dates, but I'm not sure that always works.
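
For readers who want to reproduce a list like the first one, the core of the check is a grouping by distribution name. The Python sketch below works on an in-memory list of (author, distribution name) pairs; it does not read uploads.db, whose schema is not shown in this thread, and the function name is hypothetical.

```python
from collections import defaultdict


def duplicated_names(uploads):
    """Map each distribution name uploaded by more than one author to its authors."""
    authors = defaultdict(set)
    for author, dist_name in uploads:
        authors[dist_name].add(author)
    return {name: sorted(who) for name, who in authors.items() if len(who) > 1}


uploads = [
    ("BINGOS", "Acme-CPAN-Testers-FAIL"),
    ("DAGOLDEN", "Acme-CPAN-Testers-FAIL"),
    ("DAGOLDEN", "Foo-Bar"),
]
print(duplicated_names(uploads))
```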


  • David Golden at Mar 14, 2013 at 1:12 pm
    Thank you. We can ignore all the "-withoutworldwriteables" ones
    because PAUSE generated those.

    Last time I did this sort of thing (for figuring out how to map CT
    reports before Metabase), the mapping wasn't too bad and there were
    only a handful of cases where I had to email authors and find out what
    distribution was what.

    (Here is my "override" file that took precedence over other
    heuristics: https://gist.github.com/dagolden/5161146 )

    Usually, they were actual duplicates of the same code, possibly
    originally unauthorized. I don't think I found any that had the same
    name and truly different code.

    Which is why (a) do nothing could still be an option. No one has done
    anything malicious to date (that we know of).

    David
    On Wed, Mar 13, 2013 at 10:45 PM, kenichi ishigaki wrote:
    The following are lists of the duplicated distributions found in
    uploads.db (as of 3/13 JST)

    1) duplicated cpan distributions uploaded by different authors (80 dists)
    https://gist.github.com/charsbar/a5c2452128b5fd6e5b69

    2) duplicated cpan/backpan distributions uploaded by different authors
    (317 dists)
    https://gist.github.com/charsbar/e10df3c150f4db9bd2a6

    3) duplicated cpan/backpan distributions uploaded by different
    authors, or uploaded with different file extensions (668 dists)
    https://gist.github.com/charsbar/8db90370168d8c28a504

    Some of them were uploaded by an unauthorized author (probably by
    mistake), but not a few were uploaded by different but authorized
    authors who probably forgot to update the version. It may be useful to
    check their release date, but not sure if it always works.


    2013/3/14 David Golden <xdg@xdg.me>:
    tl;dr: non-unique distribution names are annoying and create a
    security hole on rt.cpan.org. Fixing it may not be trivial.

    ## Terminology and context ##

    By "distribution", I generally mean the unique path of a CPAN
    distribution in the authors/id/X/XY directory. I may occassionally
    refer to this as a "distfile" for specific clarity. The "distribution
    name" is the portion of the basename without version or suffix.

    distribution: DAGOLDEN/Foo-Bar-1.23.tar.gz
    author: DAGOLDEN
    distribution name: Foo-Bar
    version: 1.23
    suffix: .tar.gz

    Distributions contain modules (.pm files), which contain packages
    (namespaces declared by "package NAME"). PAUSE indexes packages and
    associates them with a source distribution. PAUSE has a system of
    permissions for packages and ensures that distributions are unique.

    ## Background ##

    Yesterday, I conducted a test of the CPAN ecosystem by uploading two
    distributions:

    * DAGOLDEN/Acme-CPAN-Testers-UNKNOWN-0.03.tar.gz
    * DAGOLDEN/Acme-CPAN-Testers-FAIL-0.02.tar.gz

    These were intentionally constructed to have the same "distribution
    name" as these existing distributions:

    * BINGOS/Acme-CPAN-Testers-UNKNOWN-0.02.tar.gz
    * BINGOS/Acme-CPAN-Testers-FAIL-0.02.tar.gz

    Note that one has the same version number and one does not. And thank
    you to BinGOs for volunteering his Acme modules.

    While my distributions had the same "name", they contained completely
    different, unindexed packages. BinGOs and I do not share co-maint on
    any of the packages involved.

    I observed the following after PAUSE accepted the distributions and
    indexed the packages:

    (1) metacpan.org and search.cpan.org incorrectly linked my
    distributions to BinGOs'. E.g. they both believe the latest
    "Acme-CPAN-Testers-UNKNOWN" is mine, though their contents and primary
    maintainers are completely different.

    (2) rt.cpan.org treated both distributions as having the same RT
    queue. I gained administrative access to BinGOs' existing queues.

    (3) cpantesters.org treated both distributions as one for the purpose
    of aggregating test reports[1]


    ## Implications ##

    The first observation is probably annoying but not outright dangerous.
    I don't think any installers use latest release data to guess which
    tarball to install. However, one could, for instance upload a
    distribution with a duplicate name to a popular package but with a
    much higher version number and it could effectively mask the original
    for some types of queries or web requests.

    The second observation is a security hole. Anyone can gain
    administrative rights to an RT queue simply by uploading a
    distribution of the same name. (I don't know if RT rejects
    distributions containing unauthorized package, but it really doesn't
    matter as far as threat vectors go.)

    The third observation is also annoying but not dangerous. Someone
    could intentionally or unintentionally upload a duplicate distribution
    name and pollute existing
    test results.

    Like many things on CPAN, if we sort of trust everyone to act
    decently, we can probably ignore this, just like we ignore all the
    *.PL files that run arbitrary code on installation.

    However, all of these point to the same underlying flaw: using a
    non-unique data element as a unique key. This creates a common point
    of failure.

    ## Solutions ##

    Here's where I start brainstorming. If we can get some good
    discussion on this list, then maybe we could finalize a plan at the QA
    hackathon, which will have a number of the relevant
    maintainers/administrators attending.

    (a) We could do nothing; we've lived with it and can continue to live
    with it and will police any incidents on a one-off basis

    (b) We could extend PAUSE's permission system distribution names as
    well, so that distribution names would have primary/co-maint rights
    just as packages do. This would not fix any existing duplicates, but
    would prevent future infractions. It means changing a lot of PAUSE
    code, but would allow RT, search sites and CPAN Testers (CT) to pretty
    much remain as is.

    (c) We could restrict PAUSE to allow only "well formed" distribution
    names[2] -- ones matching a module name inside containing a package of
    a corresponding name. E.g. "Foo-Bar", containing "Foo/Bar.pm" with
    package "Foo::Bar". The existing package permissions system becomes
    the chokepoint to restrict abuse. Existing distributions with
    non-conforming names (e.g. libwww-perl) either change for their next
    release or get grandfathered somehow.

    (d) RT, search sites and CT stop using distribution name as a key and
    revert either to package names or to distfile in some fashion. This
    is not a trivial amount of code change and -- in the case of RT --
    might make RT much more complicated and less useful.

    (e) We could develop a new, unique way to identify collections of
    related packages. This could be based on some combination of
    distribution name and the name of an authorized packages it contains,
    or perhaps just on the name of a "primary" package. RT and the search
    sites would need to migrate to a new data model and probably change
    their HTTP routes to match.

    (f) Something else

    I welcome some thoughts and discussion. Even if the ultimate
    conclusion is (a), I think it would be best to select that
    intentionally, not default to it through apathy.

    -- David

    Notes:

    [1] The Metabase backend for CT correctly distinguishes reports by
    distfile, but this is not yet reflected downstream in reporting.

    [2] See http://www.dagolden.com/index.php/308/packages-modules-and-distributions/

    --
    David Golden <xdg@xdg.me>
    Take back your inbox! → http://www.bunchmail.com/
    Twitter/IRC: @xdg


  • Ruslan Zakirov at Mar 14, 2013 at 12:15 pm

    On Thu, Mar 14, 2013 at 1:31 AM, David Golden wrote:
    > (2) rt.cpan.org treated both distributions as having the same RT
    > queue. I gained administrative access to BinGOs' existing queues.

    Just to make it clear: rt.cpan.org uses the distribution name as the
    queue identifier and expects it to uniquely identify a "project". For
    sync it uses files available on every CPAN mirror.

    Here is what happened. You uploaded a (fake) distribution X containing
    module FOO, and since that module is not in the original X, you got
    permissions on FOO. Permissions are per module, not per distribution.
    CPAN2RT just maps permissions to modules and modules to distributions,
    then grants the collected set of authors maintainership of the
    distribution's queue.
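
The aggregation described above can be sketched roughly as follows. This is not CPAN2RT's actual code; the function name, the data layout and the package name "Acme::Hypothetical::Other" are assumptions made up for the example. It only shows why keying on the bare distribution name merges two unrelated sets of maintainers.

```python
from collections import defaultdict
from typing import Dict, Set


def queue_maintainers(module_perms: Dict[str, Set[str]],
                      module_dist: Dict[str, str]) -> Dict[str, Set[str]]:
    """Collect, per distribution name, every author holding permissions
    on any module mapped to that name.

    module_perms: module name -> authors with permissions on it
    module_dist:  module name -> distribution *name* (no author, no version)
    """
    queues: Dict[str, Set[str]] = defaultdict(set)
    for module, dist_name in module_dist.items():
        queues[dist_name] |= module_perms.get(module, set())
    return dict(queues)


# Two unrelated uploads that happen to share a distribution name.
module_perms = {
    "Acme::CPAN::Testers::UNKNOWN": {"BINGOS"},
    "Acme::Hypothetical::Other": {"DAGOLDEN"},  # hypothetical package
}
module_dist = {
    "Acme::CPAN::Testers::UNKNOWN": "Acme-CPAN-Testers-UNKNOWN",
    "Acme::Hypothetical::Other": "Acme-CPAN-Testers-UNKNOWN",
}

queues = queue_maintainers(module_perms, module_dist)
```

Because both modules map to the same distribution name, both authors end up in the maintainer set for the single "Acme-CPAN-Testers-UNKNOWN" queue, even though neither has permissions on the other's package.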

    rt.cpan.org would certainly benefit from a new meta-information DB
    published to CPAN mirrors: for example, a meta.sqlite.db that contains
    distribution info, release info (versions), author info, maintainers
    per distribution, and modules per distribution. I think that's all
    rt.cpan.org needs at the moment.
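
One possible shape for such a database, as a sketch only: no such file is specified anywhere in this thread, so every table and column name below is an assumption. The schema also assumes that distribution names have already been made unique by some other mechanism, since it uses the name as a primary key.

```python
import sqlite3

# Hypothetical schema covering the items listed above: distributions,
# releases (versions), authors, maintainers per distribution, and
# modules per distribution.
SCHEMA = """
CREATE TABLE authors (
    pause_id TEXT PRIMARY KEY,
    name     TEXT
);
CREATE TABLE distributions (
    dist_name TEXT PRIMARY KEY
);
CREATE TABLE releases (
    distfile  TEXT PRIMARY KEY,  -- e.g. 'DAGOLDEN/Foo-Bar-1.23.tar.gz'
    dist_name TEXT NOT NULL REFERENCES distributions(dist_name),
    version   TEXT NOT NULL
);
CREATE TABLE maintainers (
    dist_name TEXT NOT NULL REFERENCES distributions(dist_name),
    pause_id  TEXT NOT NULL REFERENCES authors(pause_id),
    PRIMARY KEY (dist_name, pause_id)
);
CREATE TABLE modules (
    module    TEXT PRIMARY KEY,
    dist_name TEXT NOT NULL REFERENCES distributions(dist_name)
);
"""


def build_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the sketch schema in a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def maintainers_of(conn: sqlite3.Connection, dist_name: str) -> list:
    """Return the PAUSE IDs recorded as maintainers of a distribution."""
    rows = conn.execute(
        "SELECT pause_id FROM maintainers WHERE dist_name = ? "
        "ORDER BY pause_id",
        (dist_name,),
    )
    return [row[0] for row in rows]
```

With a table like `maintainers`, a consumer such as rt.cpan.org could read queue owners directly instead of deriving them from per-module permissions.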

    --
    Best regards, Ruslan.
  • David Golden at Mar 14, 2013 at 1:00 pm

    On Thu, Mar 14, 2013 at 8:14 AM, Ruslan Zakirov wrote:
    > On Thu, Mar 14, 2013 at 1:31 AM, David Golden wrote:
    >> (2) rt.cpan.org treated both distributions as having the same RT
    >> queue. I gained administrative access to BinGOs' existing queues.
    >
    > Just to make it clear: rt.cpan.org uses the distribution name as the
    > queue identifier and expects it to uniquely identify a "project".

    I understand. That expectation is incorrect. That's what we're talking about.

    > rt.cpan.org would certainly benefit from a new meta-information DB
    > published to CPAN mirrors.

    Thinking about a mechanism for distributing the information is a
    secondary concern. The primary concern is deciding what is the
    correct unit of aggregation and how to ensure it remains unique.

    It could just as easily be "primary package" as "distribution name" or
    "project name" or whatever. If that were to be the decision, then
    rt.cpan.org would need to migrate its queue structure accordingly.

    Most of the options involve a fair amount of work and risk and we
    collectively need to figure out the best course of action.

    Other than (a) do nothing, I think option (c), restricting PAUSE to
    well-formed distributions, is probably the least amount of work
    and risk (I could probably patch PAUSE in a few hours to do so), but
    we might regret the number of edge cases involved (non-module
    distributions) and might take a lot of flak for the new, stricter
    rules.

    I like Schwern's data model conceptually, but I cringe at the amount
    of work required to make it happen in PAUSE (including a permissions
    system around it) and PAUSE surgery feels like high risk to me.

    Seems clear to me that we're going to wind up with a "least worst" option. :-)

    David

    --
    David Golden <xdg@xdg.me>
    Take back your inbox! → http://www.bunchmail.com/
    Twitter/IRC: @xdg
  • David Cantrell at Mar 14, 2013 at 1:33 pm

    On Wed, Mar 13, 2013 at 05:31:11PM -0400, David Golden wrote:

    > (b) We could extend PAUSE's permission system to distribution names as
    > well, so that distribution names would have primary/co-maint rights
    > just as packages do. This would not fix any existing duplicates, but
    > would prevent future infractions. It means changing a lot of PAUSE
    > code, but would allow RT, search sites and CPAN Testers (CT) to pretty
    > much remain as is.

    This seems like the most sensible way to go. It fixes the problem in
    one place with no significant impact on anyone else. I imagine that
    because you've only just noticed this there aren't that many duplicates
    anyway, and I would expect at least some of those to be mistakes that
    people would be happy to have deleted.
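
For illustration, option (b) might look something like the following first-come registry of distribution names. This is a sketch under assumptions, not PAUSE's permission code: the class, its methods and the first-come rule are hypothetical, modelled loosely on how the thread describes package permissions (primary plus co-maint).

```python
from typing import Dict, Set


class DistNamePermissions:
    """Hypothetical registry granting primary/co-maint rights on names."""

    def __init__(self) -> None:
        self._primary: Dict[str, str] = {}        # dist name -> primary
        self._comaint: Dict[str, Set[str]] = {}   # dist name -> co-maints

    def may_upload(self, author: str, dist_name: str) -> bool:
        """Unclaimed names are open; claimed names need a permission."""
        primary = self._primary.get(dist_name)
        if primary is None:
            return True
        return author == primary or author in self._comaint.get(dist_name, set())

    def register_upload(self, author: str, dist_name: str) -> bool:
        """Accept or reject an upload; the first uploader becomes primary."""
        if not self.may_upload(author, dist_name):
            return False
        self._primary.setdefault(dist_name, author)
        return True

    def grant_comaint(self, granter: str, dist_name: str, grantee: str) -> bool:
        """Only the primary maintainer may add a co-maintainer."""
        if self._primary.get(dist_name) != granter:
            return False
        self._comaint.setdefault(dist_name, set()).add(grantee)
        return True
```

As noted in (b), a check like this only blocks future collisions; names that are already duplicated would still need to be resolved by hand.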

    > (d) RT, search sites and CT stop using distribution name as a key and
    > revert either to package names or to distfile in some fashion. This
    > is not a trivial amount of code change and -- in the case of RT --
    > might make RT much more complicated and less useful.

    RT is a commercial tool, and rt.cpan exists because Best Practical
    graciously let us use it for free. Let's not ask them to do any work,
    unless there's an actual bug in RT - which I don't think there is.

    --
    David Cantrell | Godless Liberal Elitist

    More people are driven insane through religious hysteria than
    by drinking alcohol. -- W C Fields

Discussion Overview
group: cpan-workers @
categories: perl
posted: Mar 13, '13 at 9:31p
active: Mar 20, '13 at 3:13a
posts: 14
users: 8
website: cpan.org
