First, to set a baseline, I love Catalyst and am working on 3 different
projects with it.

Now, I am arguing that the Regex argument (not snippet) handling is
conceptually broken and should be changed.

http://host.xyz/this/is/a/path
http://host.xyz/?these=1;are=1;arguments=1

If I have a Regex ('whatever/(STOP-HERE)$'), I really mean the '$'. I
want it to stop.
I don't want to slurp up more path into arguments because it's not
mapping to anything the application does.

http://host.xyz/whatever/STOP-HERE/meaningless/misleading/nonsense/should/404

A path is, conceptually, a pointer to a real thing. Not an arbitrary
set of parameters which may or may not map to results. This is
different from query string arguments which are conceptually asking a
question for which there may be no answer. E.g.:
http://google.com/search?q=%22no+resultz+possible+for+query_string%22

Even in the rare cases where paths can be arbitrary, like
http://answers.com/perl%20catalyst
They are typically limited to 1 argument.

So, I'm in the position where I'm about to start adding:
my ( $self, $c, $path_args ) = @_;
die 404 if $path_args;
into most all my Regex controllers.
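
For reference, a minimal sketch of that guard in plain Perl. The helper name status_for_args is hypothetical and not part of Catalyst; it only models the decision an action would make. It counts the trailing arguments rather than testing the first one for truth, since a lone trailing segment of "0" would slip past `if $path_args`.

```perl
use strict;
use warnings;

# Hypothetical helper (not a Catalyst API): pick a status for an action
# that is not supposed to accept any trailing path arguments.
sub status_for_args {
    my @path_args = @_;

    # An array in boolean context is true when it has any elements,
    # so a trailing segment such as "0" is still rejected.
    return @path_args ? 404 : 200;
}

print status_for_args(), "\n";
print status_for_args(qw( meaningless misleading nonsense )), "\n";
print status_for_args('0'), "\n";
```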

The solution, if anyone agrees, is to allow '$', '\Z', '\z' to truly
end what the Regex controller will accept. If there is more, it should
be kicked down to the default(s) where several (or all) can be handled
at once instead of per controller sub.
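
A rough sketch of the behaviour being proposed, in plain Perl. This is an illustration of the idea only, not how Catalyst works; matches_whole_path is a made-up name. The pattern is tried once against the entire request path, so an end anchor really does end the match.

```perl
use strict;
use warnings;

# Hypothetical model of the proposal: try the pattern against the whole
# request path (minus the leading slash) and nothing else.
sub matches_whole_path {
    my ( $regex, $request_path ) = @_;
    ( my $path = $request_path ) =~ s{^/}{};
    return $path =~ $regex ? 1 : 0;
}

my $regex = qr{whatever/(STOP-HERE)$};

for my $path (
    '/whatever/STOP-HERE',
    '/whatever/STOP-HERE/meaningless/misleading/nonsense',
) {
    # Anything the pattern rejects would fall through to a default action,
    # which is modelled here as a plain 404.
    my $status = matches_whole_path( $regex, $path ) ? 200 : 404;
    print "$status $path\n";
}
```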

Yes, sensible? No, crazy talk?


Thanks for looking!
-Ashley


  • Aristotle Pagaltzis at Jan 5, 2006 at 11:49 pm
    Hi Ashley,

    * apv [2006-01-05 23:25]:
    A path is, conceptually, a pointer to a real thing. Not an
    arbitrary set of parameters which may or may not map to
    results. This is different from query string arguments which
    are conceptually asking a question for which there may be no
    answer.
    +1

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
  • Adam Jacob at Jan 6, 2006 at 5:35 am

    On Jan 5, 2006, at 2:18 PM, apv wrote:
    The solution, if anyone agrees, is to allow '$', '\Z', '\z' to
    truly end what the Regex controller will accept. If there is more,
    it should be kicked down to the default(s) where several (or all)
    can be handled at once instead of per controller sub.

    Yes, sensible? No, crazy talk?
    Sounds sensible to me.

    Adam
  • Bill Moseley at Jan 6, 2006 at 7:32 am

    On Thu, Jan 05, 2006 at 02:18:44PM -0800, apv wrote:
    Now, I am arguing that the Regex argument (not snippet) handling is
    conceptually broken and should be changed.

    http://host.xyz/this/is/a/path
    http://host.xyz/?these=1;are=1;arguments=1
    Those are parameters. Arguments are what follow the action.
    If I have a Regex ('whatever/(STOP-HERE)$'), I really mean the '$'. I
    want it to stop.
    I don't want to slurp up more path into arguments because it's not
    mapping to anything the application does.

    http://host.xyz/whatever/STOP-HERE/meaningless/misleading/nonsense/should/404
    I'm not sure I'm clear what you are saying. Why not just ignore the
    arguments?

    It's how the CGI standard works and all the other actions work. Why
    should the Regex matching be different?
    So, I'm in the position where I'm about to start adding:
    my ( $self, $c, $path_args ) = @_;
    die 404 if $path_args;
    into most all my Regex controllers.
    But, you wouldn't do that for Local and Path actions?

    The solution, if anyone agrees, is to allow '$', '\Z', '\z' to truly
    end what the Regex controller will accept. If there is more, it
    should be kicked down to the default(s) where several (or all) can
    be handled at once instead of per controller sub.
    But the $ defines where the action ends and the arguments begin, right?

    If $ instead matches the end of the path then someone that wants
    arguments cannot as easily get them. As it is now if you really don't
    want to accept requests with extra PATH_INFO you can just check
    $path_args, as in your example above.
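
    The behaviour described here can be modelled roughly as follows. This is a simplified sketch of longest-match dispatch written for illustration, not Catalyst's dispatcher code, and split_action_and_args is a made-up name. It tries the pattern against the full path first, then moves one trailing segment at a time into the argument list until the pattern matches.

    ```perl
    use strict;
    use warnings;

    # Simplified, hypothetical model of longest-match dispatch.
    # Returns the matched action path followed by the trailing arguments,
    # or an empty list when nothing matches.
    sub split_action_and_args {
        my ( $regex, $request_path ) = @_;
        my @segments = grep { length } split m{/}, $request_path;
        my @args;

        while (@segments) {
            my $candidate = join '/', @segments;

            # The pattern only ever sees the remaining prefix, so '$'
            # anchors the end of the action, not the end of the request.
            return ( $candidate, @args ) if $candidate =~ $regex;

            # No match at this length: the last segment becomes an argument.
            unshift @args, pop @segments;
        }
        return;
    }

    my ( $action, @args ) = split_action_and_args(
        qr{whatever/(STOP-HERE)$},
        '/whatever/STOP-HERE/meaningless/misleading/nonsense',
    );
    print "action: $action\n";
    print "args:   @args\n";
    ```

    Under this model the action portion is whatever/STOP-HERE and the three remaining segments arrive as arguments, which is why the action has to inspect them itself if it wants to refuse the request.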




    --
    Bill Moseley
    moseley@hank.org
  • Ashley Pond V at Jan 6, 2006 at 8:33 am

    On Thursday, January 5, 2006, at 10:39 PM, Bill Moseley wrote:
    On Thu, Jan 05, 2006 at 02:18:44PM -0800, apv wrote:
    Now, I am arguing that the Regex argument (not snippet) handling is
    conceptually broken and should be changed.

    http://host.xyz/this/is/a/path
    http://host.xyz/?these=1;are=1;arguments=1
    Those are parameters. Arguments are what follow the action.
    Yes, as far as Catalyst is concerned. A parsed query string becomes the
    arguments for standard CGI. And a good standard for URIs is
    that what appears to be a literal "/this/is/something" should
    really map to something directly and with full meaning; no matter
    what is happening in the background (file system, dispatch, 302s, ...).
    If I have a Regex ('whatever/(STOP-HERE)$'), I really mean the '$'. I
    want it to stop.
    I don't want to slurp up more path into arguments because it's not
    mapping to anything the application does.

    http://host.xyz/whatever/STOP-HERE/meaningless/misleading/nonsense/should/404
    I'm not sure I'm clear what you are saying. Why not just ignore the
    arguments?
    Would we ignore the middle argument? Or the first? Of course not.
    Ignoring trailing ones is possible, because we happen to read left
    to right, but I think it goes against a best-practice mindset.

    And *all* input should be dealt with by solid code. The more of that the
    framework takes off the hacker, the better the framework is.
    It's how the CGI standard works and all the other actions work. Why
    should the Regex matching be different?
    So, I'm in the position where I'm about to start adding:
    my ( $self, $c, $path_args ) = @_;
    die 404 if $path_args;
    into most all my Regex controllers.
    But, you wouldn't do that for Local and Path actions?
    In those cases I am often going to check @path_args for size and
    content and die if they don't meet expectations. In those cases
    it's not an extra chore to do so; DRY. Though it would be nice to
    have a sub xyz : Literal {} to also circumvent needing to do that
    when it's not already needed.

    I'll add that to the wish/fix/discussion list: a "Literal" dispatch
    type that accepts no path arguments. Would also work as
    sub index : Literal {} to catch on the local "/."

    So, I'm really glad you brought this aspect up b/c I wasn't thinking
    broadly enough. There are other problems beyond conceptual with
    silently ignoring/okaying args.

    No matter how small a percentage of log entries it would be,
    it makes reliable/meaningful log parsing all but impossible. I
    really don't want logs full of this kind of stuff
    /_vti_bin/owssvr.dll?...
    /cgi-bin/formmail.cgi?...
    /favicon.ico
    to map to a real page b/c the app gave a 200 for the detritus.
    It might even encourage more hacking against an application.
    Whoo-hoo! It's accepting my spam! I'll send 5 million more through
    there right now.
    If $ instead matches the end of the path then someone that wants
    arguments cannot as easily get them. As it is now if you really don't
    want to accept requests with extra PATH_INFO you can just check
    $path_args, as in your example above.
    If you want more arguments you could easily get them by leaving off the
    '$' and signifying the end of your desired string with other regex
    techniques.

    I know I can check the args but as I see it, the args are wrong to be
    taken up when they're to be ignored in many cases, and any framework
    which requires code to circumvent default behavior a good chunk of the
    time is broken, as I'm seeing this.

    It's not like it's a huge headache to work around but I do believe it
    would benefit from the tweaks I'm suggesting. And of course,
    as Catalyst is gaining steam, there may not be a chance to
    revisit conceptual design choices like this a year from now.


    -Ashley

  • Matt S Trout at Jan 6, 2006 at 4:07 pm

    On Thu, Jan 05, 2006 at 11:39:51PM -0800, apv wrote:
    On Thursday, January 5, 2006, at 10:39 PM, Bill Moseley wrote:
    On Thu, Jan 05, 2006 at 02:18:44PM -0800, apv wrote:

    Now, I am arguing that the Regex argument (not snippet) handling is
    conceptually broken and should be changed.

    http://host.xyz/this/is/a/path
    http://host.xyz/?these=1;are=1;arguments=1
    Those are parameters. Arguments are what follow the action.
    Yes, as far as Catalyst is concerned. A parsed query string becomes the
    arguments for standard CGI. And a good standard for URIs is
    that what appears to be a literal "/this/is/something" should
    really map to something directly and with full meaning; no matter
    what is happening in the background (file system, dispatch, 302s, ...).
    Which is why, for example, one might have a URI like

    /<category>/item/<id>

    However, in Catalyst you'd do this by registering a Regex of ^/(.*)/item$
    and grabbing the id out the args. If the object/record/whatever corresponding
    to that id doesn't exist, sure, 404 it - but Catalyst's default actions
    explicitly provide longest-match dispatch, and for a majority of situations
    that's incredibly useful.
    If I have a Regex ('whatever/(STOP-HERE)$'), I really mean the '$'. I
    want it to stop.
    I don't want to slurp up more path into arguments because it's not
    mapping to anything the application does.

    http://host.xyz/whatever/STOP-HERE/meaningless/misleading/nonsense/should/404
    I'm not sure I'm clear what you are saying. Why not just ignore the
    arguments?
    Would we ignore the middle argument? Or the first? Of course not.
    Ignoring trailing ones is possible, because we happen to read left
    to right, but I think it goes against a best-practice mindset.

    And *all* input should be dealt with by solid code. The more of that
    the framework takes off the hacker, the better the framework is.
    It's how the CGI standard works and all the other actions work. Why
    should the Regex matching be different?

    So, I'm in the position where I'm about to start adding:
    my ( $self, $c, $path_args ) = @_;
    die 404 if $path_args;
    into most all my Regex controllers.
    But, you wouldn't do that for Local and Path actions?
    In those cases I am often going to check @path_args for size and
    content and die if they don't meet expectations. In those cases
    it's not an extra chore to do so; DRY. Though it would be nice to
    have a sub xyz : Literal {} to also circumvent needing to do that
    when it's not already needed.
    Look at the DispatchType system and the way actions are registered; it
    should be easy enough to implement.
    I'll add that to the wish/fix/discussion list: a "Literal" dispatch
    type that accepts no path arguments. Would also work as
    sub index : Literal {} to catch on the local "/."
    That's how index already works.
    If $ instead matches the end of the path then someone that wants
    arguments cannot as easily get them. As it is now if you really don't
    want to accept requests with extra PATH_INFO you can just check
    $path_args, as in your example above.
    If you want more arguments you could easily get them by leaving off the
    '$' and signifying the end of your desired string with other regex
    techniques.
    However that would be a substantial backwards-compatibility breakage because
    $ already means something else currently.
    I know I can check the args but as I see it, the args are wrong to be
    taken up when they're to be ignored in many cases, and any framework
    which requires code to circumvent default behavior a good chunk of the
    time is broken, as I'm seeing this.

    It's not like it's a huge headache to work around but I do believe it
    would benefit from the tweaks I'm suggesting. And of course,
    as Catalyst is gaining steam, there may not be a chance to
    revisit conceptual design choices like this a year from now.
    Read 'sub register_actions' in Catalyst::Base, then look at
    setup_actions and prepare_action in Catalyst::Dispatcher and how the
    Catalyst::DispatchType::* and Catalyst::Action objects interact; I'd
    be very interested to see an implementation of :Literal done this way;
    much of my work on the 5.5 dispatcher was to make the sort of extensions
    you're talking about easy to do.

    --
    Matt S Trout, Technical Director, Shadowcat Systems Ltd.
    Offering custom development, consultancy and support contracts for
    Catalyst, DBIx::Class and BAST. Contact mst (at) shadowcatsystems.co.uk
    for more information

    + Help us build a better perl ORM: http://dbix-class.shadowcatsystems.co.uk/ +
  • Ashley Pond V at Jan 6, 2006 at 8:24 pm

    On Friday, January 6, 2006, at 07:24 AM, Matt S Trout wrote:
    Yes, as far as Catalyst is concerned. A parsed query string becomes
    the arguments for standard CGI. And a good standard for URIs is
    that what appears to be a literal "/this/is/something" should
    really map to something directly and with full meaning; no matter
    what is happening in the background (file system, dispatch, 302s,
    ...).
    Which is why, for example, one might have a URI like

    /<category>/item/<id>

    However, in Catalyst you'd do this by registering a Regex of
    ^/(.*)/item$ and grabbing the id out the args. If the
    object/record/whatever corresponding to that id doesn't exist, sure,
    404 it - but Catalyst's default actions explicitly provide
    longest-match dispatch, and for a majority of situations that's
    incredibly useful.
    I probably wouldn't write a regex that way. Dot-star is almost always a
    mistake; even when it works it's probably going to be less efficient
    than a pattern that won't necessitate backtracking (pretty sure but I'm
    not a regex guru).

    I would also write it to capture the id to snippets b/c you should
    write a check for it anyway, why not in the Regex().
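
    A sketch of what such a pattern might look like, in plain Perl outside of any Catalyst action. The pattern and the name parse_item_path are assumptions made for illustration: the category is taken to be a single non-slash segment and the id to be purely numeric, neither of which the thread specifies.

    ```perl
    use strict;
    use warnings;

    # Hypothetical pattern for <category>/item/<id>. The negated character
    # class avoids dot-star, and \z refuses even a trailing newline.
    my $item_path = qr{^([^/]+)/item/(\d+)\z};

    # Returns ( $category, $id ) on a match and an empty list otherwise.
    sub parse_item_path {
        my ($path) = @_;
        my ( $category, $id ) = $path =~ $item_path
            or return;
        return ( $category, $id );
    }

    for my $path ( 'books/item/42', 'books/item/42/extra', 'books/item/abc' ) {
        my @parts = parse_item_path($path);
        print @parts ? "matched: @parts\n" : "no match: $path\n";
    }
    ```

    Capturing the id this way means the format check happens in the pattern itself, so a malformed id never reaches the action body.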

    I do agree that for the majority of intended usage situations it's
    great, but the unintended usage situations, however unlikely, are
    virtually infinite. So "majority" is contextual.
    Look at the DispatchType system and the way actions are registered; it
    should be easy enough to implement.
    Eek. I'll take a stab. It's so much easier to tell other people to do
    it, though, can't I just keep doing that? :)
    I'll add that to the wish/fix/discussion list: a "Literal" dispatch
    type that accepts no path arguments. Would also work as
    sub index : Literal {} to catch on the local "/."
    That's how index already works.
    Sorry, I was being a dummy.
    If $ instead matches the end of the path then someone that wants
    arguments cannot as easily get them. As it is now if you really don't
    want to accept requests with extra PATH_INFO you can just check
    $path_args, as in your example above.
    If you want more arguments you could easily get them by leaving off
    the '$' and signifying the end of your desired string with other
    regex techniques.
    However that would be a substantial backwards-compatibility breakage
    because $ already means something else currently.
    Yes. Now is the time though. If Catalyst's features are already locked
    down, it should probably be announced. :) Slightly painful/irritating
    changes are possible now for long term sensibility.

    The proposition would result in a few hackers changing a few lines of
    code today but would let a thousand hackers each avoid writing dozens
    of lines of code down the road (assuming they do full argument checking
    which is the best practices baseline for my side of the discussion).

    As a tangent, I was thinking last night that this sort of fudginess with
    argument handling is the kind of thing that a detractor will latch onto.
    "Oh, Catalyst? Not only is it written in _Perl_ but they don't even
    check args let alone type them."
    Read 'sub register_actions' in Catalyst::Base, then look at
    setup_actions and prepare_action in Catalyst::Dispatcher and how the
    Catalyst::DispatchType::* and Catalyst::Action objects interact; I'd
    be very interested to see an implementation of :Literal done this way;
    much of my work on the 5.5 dispatcher was to make the sort of
    extensions you're talking about easy to do.
    I really appreciate the pointers. I have not looked at the guts of
    Catalyst yet. I'll try to dive in this weekend.


    -Ashley
  • Aristotle Pagaltzis at Jan 6, 2006 at 2:37 pm
    Hi Bill,

    * Bill Moseley [2006-01-06 07:45]:
    I'm not sure I'm clear what you are saying. Why not just ignore
    the arguments?

    It's how the CGI standard works and all the other actions work.
    Why should the Regex matching be different?
    No, it's not. They're path segments, not parameters. There is an
    RFC that defines how URIs are interpreted; you should look at it
    sometime.

    Among other things, returning 200 for URLs with random
    discardable bits means that bots such as search engine
    spiders may go on a wild goose chase, fetching
    different-looking-but-not-actually-different URLs all day long
    without knowing any better.

    If you want such URLs to produce a result rather than just 404,
    then they should redirect to the canonical URL. I've written
    about the considerations in

    Transparent opaque changeable permanent URLs
    <http://plasmasturm.org/log/358/>

    I'd consider adherence to the design of HTTP the absolute most
    basic requirement for something that calls itself a "web
    application."

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
  • Bill Moseley at Jan 6, 2006 at 7:52 pm

    On Fri, Jan 06, 2006 at 02:43:52PM +0100, A. Pagaltzis wrote:
    * Bill Moseley [2006-01-06 07:45]:
    I'm not sure I'm clear what you are saying. Why not just ignore
    the arguments?

    It's how the CGI standard works and all the other actions work.
    Why should the Regex matching be different?
    No, it's not. They're path segments, not parameters. There is an
    RFC that defines how URIs are interpreted; you should look at it
    sometime.
    Oh, I should, yes.

    In the context of CGI the path segments after the script are placed in
    PATH_INFO. In the context of Catalyst, the script is the "action" and
    the PATH_INFO becomes the arguments.

    Among other things, returning 200 for URLs with random
    discardable bits means that bots such as search engine
    spiders may go on a wild goose chase, fetching different-
    looking-but-not-actually-different URLs all day long without
    knowing any better.
    Spider's don't make up URLs to follow, they follow links. So that's
    only going to happen if you are putting up invalid links.

    If you don't want to allow extra segments after the action then it's
    easy to check @args and deal with it as you like.

    If you want such URLs to produce a result rather than just 404,
    then they should redirect to the canonical URL. I've written
    about the considerations in

    Transparent opaque changeable permanent URLs
    <http://plasmasturm.org/log/358/>
    Then what should a request for an invalid article do?

    $ HEAD http://plasmasturm.org/log/3583329393/
    200 OK


    --
    Bill Moseley
    moseley@hank.org
  • Ashley Pond V at Jan 6, 2006 at 8:44 pm

    On Friday, January 6, 2006, at 10:59 AM, Bill Moseley wrote:
    Among other things, returning 200 for URLs with random
    discardable bits means that bots such as search engine
    spiders may go on a wild goose chase, fetching different-
    looking-but-not-actually-different URLs all day long without
    knowing any better.
    Spider's don't make up URLs to follow, they follow links. So that's
    only going to happen if you are putting up invalid links.
    Which would be completely safe if one never made mistakes
    and no one else in the entire Internet was allowed to link to
    your site. Once the bad URIs are in the indexes, they can
    stay and even propagate depending on the engine and the site
    plan.

    Bad spiders, hackers, and gateway spammers do make up
    URLs for your site.
    If you don't want to allow extra segments after the action then it's
    easy to check @args and deal with it as you like.
    The main point -- my imitation of broken record ends here :) --
    is that best practices dictate you *always* check all input to an
    application. Therefore, the default (or easily settable) should
    be to check/limit them automatically.
    If you want such URLs to produce a result rather than just 404,
    then they should redirect to the canonical URL. I've written
    about the considerations in

    Transparent opaque changeable permanent URLs
    <http://plasmasturm.org/log/358/>
    Then what should a request for an invalid article do?

    $ HEAD http://plasmasturm.org/log/3583329393/
    200 OK
    It should probably 404. Pointing out that server doesn't currently
    do so doesn't change the fitness of the ideas in the article.

    -Ashley
  • Bill Moseley at Jan 6, 2006 at 9:28 pm

    On Fri, Jan 06, 2006 at 11:51:15AM -0800, apv wrote:
    Then what should a request for an invalid article do?

    $ HEAD http://plasmasturm.org/log/3583329393/
    200 OK
    It should probably 404. Pointing out that server doesn't currently
    do so doesn't change the fitness of the ideas in the article.
    No, you are right, it doesn't.

    By the way:

    Your point 2 says:

    Search engines also give huge priority to words found in the URL,
    which makes good slugs a very good idea if you care about your
    ranking.

    Do you have a source for that statement?



    --
    Bill Moseley
    moseley@hank.org
  • Aristotle Pagaltzis at Jan 6, 2006 at 10:01 pm
    Hi Bill,

    * Bill Moseley [2006-01-06 21:40]:
    Your point 2 says:
    That was my article, not Ashley's. :-)
    Search engines also give huge priority to words found in the
    URL, which makes good slugs a very good idea if you care
    about your ranking.

    Do you have a source for that statement?
    I don't, I'm afraid. I read it somewhere a while back, but heck
    if I can remember where. It empirically seems true, though; and
    it makes sense, considering how limited the URL real estate is.
    I'm pretty sure I could find a reference on some SEO forum or
    magazine site, but I haven't looked.

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
  • Aristotle Pagaltzis at Jan 6, 2006 at 9:43 pm
    Hi Bill,

    * Bill Moseley [2006-01-06 20:05]:
    In the context of CGI the path segments after the script are
    placed in PATH_INFO. In the context of Catalyst, the script is
    the "action" and the PATH_INFO becomes the arguments.
    Those are server-side processing minutiae. They have nothing to
    do with how a client sees things.
    Spider's don't make up URLs to follow, they follow links. So
    that's only going to happen if you are putting up invalid links.
    *insert bob-the-angry-flower on apostrophes* :-)

    Anyway, no, they don't make up URLs, but people do, and people
    sometimes post links. Or maybe your code has a bug, and /foo/1
    links not to /bar/2, but to /foo/1/bar/2. Or has a self-link on
    /foo/1 that points to /foo/1/1. Or you restructure your site, and
    outdated links from other sites now return something that has
    nothing to do with where the link used to point to, but with a
    status 200 rather than either 410 or (as they should) a redirect.
    Or the site is a wiki and contains lots of not-yet-live internal
    links.

    They aren't very common cases, but they happen, and they should
    not return 200.
    Then what should a request for an invalid article do?

    $ HEAD http://plasmasturm.org/log/3583329393/
    200 OK
    Yes, I know; it's a bug, and it annoys me greatly. The reason is
    that the log permalink pages are served dynamically, because I
    haven't figured out a good way to build them statically. I use
    make to build the site, and it's a bit hard to coerce into doing
    what I need, because there's no 1:1 correspondence between source
    and target files.

    I've been looking at other build systems, but so far I haven't
    found anything as no-fuss as make yet. S/Cons: too specialised.
    The CPAN has nothing viable. Maybe Rake is an option.

    There are also a number of other warts; I'd like to set the mtime
    on my newsfeed according to the updated-time of the newest entry,
    f.ex.

    Maybe I should just write a script to build the site in Perl?

    Anyway, this is way off-topic. Just to say that just because the
    site has bugs doesn't mean I don't care.

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
