[development] Can .htaccess discard part of a path?
I am getting lots of requests like this:

http://www.example.com/index.php?q=cgi-bin/printOriginal.pl&file=/alpha/beta/gamma/rage_prevention.shtml

The file argument is a valid page on our old site and is itself redirected
with a RewriteRule in .htaccess. However, cgi-bin/printOriginal.pl does not
exist and I have no idea what it was supposed to do (my guess: print the
page). We get lots of these requests for different pages. I have tried a
simple rewrite rule and a URL alias to prevent the 404 processing, but
neither has fixed it.

Is it possible to design a RewriteRule that essentially discards the
"cgi-bin/printOriginal.pl" and just serves up the requested page (well,
after its own rewrite rule has worked)? So this would become

http://www.example.com/index.php/alpha/beta/gamma/rage_prevention.shtml





Nancy E. Wichmann, PMP

Injustice anywhere is a threat to justice everywhere. -- Dr. Martin L. King,
Jr.




  • Seth Freach at Nov 10, 2009 at 4:26 pm
    Nancy,

    I'm assuming this is a leftover from the moms team site? These requests
    are probably coming in because Google appears to still have lots of
    these URLs in its index, and other sites still link to them.

    Instead of a rewrite, I'd suggest a redirect with a 301 response code.
    This will be more Google friendly.

    Look in the default .htaccess file for the (commented out by default)
    lines that deal with www. redirection (i.e., you always want people to
    see "www" or never do, regardless of how they access the site). Using
    those patterns should help show you how to redirect to the same content
    but without the "cgi-bin/printOriginal.pl&file=/".

    Seth


  • Nancy Wichmann at Nov 10, 2009 at 7:55 pm
    Wow, how did you know about MomsTeam (now YouthSportsParents)?



    I put this in there already:

    RewriteRule ^cgi-bin/printOriginal.pl/$ http://www.youthsportsparents.com [R=301,L]

    And I am still seeing these come through to the Drupal log.

    There might be a clue in

    RewriteRule ^alpha/sports/(.*) http://www.youthsportsparents.com/sports/$1 [R=301,L]

    if I really understood regular [sic] expressions.



  • Jennifer Hodgdon at Nov 10, 2009 at 8:32 pm

    Nancy Wichmann wrote:
    I put this in there already
    RewriteRule ^cgi-bin/printOriginal.pl/$ http://www.youthsportsparents.com [R=301,L]
    And I am still seeing these come through to the Drupal log.

    You said the URLs that were problems looked like this:

    http://www.example.com/index.php?q=cgi-bin/printOriginal.pl&file=/alpha/beta/gamma/rage_prevention.shtml

    The regular expression above ends in $, which is the regexp special
    character meaning "end of the string/line". So it would only match a
    URL that ended with "printOriginal.pl/". You need something after that
    to match the rest of the URL... Something like:

    ^cgi-bin/printOriginal.pl/.*

    Might work a bit better... (Caveat: I'm not an expert on Apache
    .htaccess redirects either.)
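
    One thing that may matter here: in .htaccess, a RewriteRule pattern is
    compared with the path only, without the leading slash and without
    anything after the "?". So a pattern with a literal "/" after
    "printOriginal.pl" needs that slash to be present in the request, and
    the "file=..." part is never visible to the pattern at all. A rough,
    untested sketch follows. The redirect target is the one from Nancy's
    rule; the escaped dot and the trailing "?" on the target are
    assumptions added for illustration.

    ```apache
    # Untested sketch. Assumes this sits in the site's root .htaccess,
    # above Drupal's own rewrite rules.
    RewriteEngine on

    # "\." matches a literal dot; "$" directly after "pl" means the path
    # must end there, so no trailing slash is required.
    # The "?" at the end of the target stops the old query string
    # (file=...) from being appended to the redirect.
    RewriteRule ^cgi-bin/printOriginal\.pl$ http://www.youthsportsparents.com/? [R=301,L]
    ```

    This sends every such request to the front page; it does not try to
    work out which page was originally asked for.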

    -- Jennifer

    --
    Jennifer Hodgdon * Poplar ProductivityWare
    www.poplarware.com
    Drupal, WordPress, and custom Web programming
  • Jamie Holly at Nov 10, 2009 at 8:57 pm
    If there is no real way to figure out the new page from the old string
    then you could redirect it to a generic 404 page, or an internal Drupal
    page (or anything really):

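    RewriteEngine on
    RewriteBase /
    # Match any request whose query string starts with q=cgi-bin
    RewriteCond %{QUERY_STRING} ^q=cgi-bin(.*)$
    # The trailing ? drops the old query string from the redirect
    RewriteRule .* {put your new URL here - keep the space between the * and URL}? [R=301,L]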

    That would redirect any query that has q=cgi-bin at the beginning to the
    new page (static 404, the front page, etc.).

    If there is a way to work out the new content from the old URL, then a
    simple module would come into play here. Check for $_GET['q'] equaling
    the cgi-bin line and for $_GET['file']. Do it in something like
    hook_init, then have some code work out the post from $_GET['file'] and
    do a drupal_goto based on the result. If nothing is found then just
    return a 404.


    Jamie Holly
    http://www.intoxination.net
    http://www.hollyit.net



  • Seth Freach at Nov 10, 2009 at 9:14 pm
    A module-based solution is possible too, but I'd point out a couple of
    caveats:
    - A module to handle this will require a Drupal bootstrap to take
    place. Depending on your site load and resources, the cost may or may
    not be negligible, and it might not be desirable if it can be avoided.
    This might be a non-issue for small or even medium traffic sites...
    - drupal_goto will return a 302 HTTP response code by default. Be sure
    to specify 301 as the 4th arg to drupal_goto to tell it that this
    resource has moved permanently, not temporarily.

    Seth

  • Seth Freach at Nov 10, 2009 at 9:05 pm
    Hi Nancy.

    I haven't tested this, but try:

    RewriteCond %{REQUEST_URI} ^/index.php?q=cgi-bin/printOriginal\.pl&file=.*$ [NC]
    RewriteRule ^/index.php?q=cgi-bin/printOriginal\.pl&file=/(.*)$ /$1 [L,R=301]

    And see if that can give you a place to start. The above assumes that
    clean URLs will translate it to 'index.php?q=' later. This is so that
    the 301 redirect (which google will remember) will be to a Clean URL.
    If not desired to function like this, you can change the last "/$1" in
    the above example to: "/index.php?q=$1" .
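
    One caveat with the lines above: going by the mod_rewrite
    documentation, %{REQUEST_URI} and the RewriteRule pattern hold only the
    path, so neither of them ever contains "?q=...". The query string has
    to be tested through %{QUERY_STRING} instead. A minimal, untested
    sketch of the same idea, assuming the request really arrives as
    index.php?q=cgi-bin/printOriginal.pl&file=/... and that the file value
    is not percent-encoded:

    ```apache
    # Untested sketch; place above Drupal's own rewrite rules.
    RewriteEngine on

    # Capture everything after "file=/" up to the next "&" as %1.
    RewriteCond %{QUERY_STRING} ^q=cgi-bin/printOriginal\.pl&file=/([^&]+) [NC]
    # In .htaccess the pattern has no leading slash. The trailing "?"
    # discards the old query string from the redirect target.
    RewriteRule ^index\.php$ /%1? [R=301,L]
    ```

    Under those assumptions the example request should be redirected to
    /alpha/beta/gamma/rage_prevention.shtml, where the existing rule for
    the old page can take over.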

    You can throw some "RewriteCond %{HTTP_HOST}" lines in there too and
    change those also if you want to preserve the SEO value of links to old
    domains as well, but that's probably a topic for another list, I'd guess.

    Seth

    (A google search for "printOriginal.pl" turned up a few momsteam.com links.)

  • Jeffry Graham at Nov 11, 2009 at 6:17 am
    Hi Nancy,

    Your existing rewrite rules do nothing to match your QUERY_STRING. You
    need a combination of matching the REQUEST_URI and QUERY_STRING.

    I would suggest the following as a starting point *before* the
    standard drupal rewrite rules.

    RewriteCond %{REQUEST_URI} ^/cgi-bin/printOriginal.pl$
    RewriteCond %{QUERY_STRING} ^file=(.*)$
    RewriteRule ^(.*)$ %1? [R=301,L]

    You may need to adjust the RewriteRule line to

    RewriteRule ^(.*)$ /PATH/TO/LEGACY/FILESDIR/%1? [R=301,L]

    That way if a user requests:
    http://www.example.com/cgi-bin/printOriginal.pl?file=/foo/bar.shtml

    They should be redirected with a 301 (permanently moved) to:
    http://www.example.com/foo/bar.shtml

    The key here is that filepath and files are matched via REQUEST_URI,
    but any parameters passed must be matched via QUERY_STRING. Also, the
    QUERY_STRING regex may need to be adjusted appropriately based on your
    incoming requests. (eg. if more than 'file' appears in the parameter
    list)
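
    If other parameters can appear before or after "file", the condition
    could be loosened along these lines. This is an untested sketch that
    assumes the file value is not percent-encoded; the optional-slash
    handling is an assumption added for illustration, not something from
    the original requests.

    ```apache
    # Untested sketch; place above Drupal's own rewrite rules.
    RewriteCond %{REQUEST_URI} ^/cgi-bin/printOriginal\.pl$
    # Find "file=" at the start of the query string or after an "&", skip
    # one optional leading slash, and capture the value up to the next "&".
    RewriteCond %{QUERY_STRING} (?:^|&)file=/?([^&]+)
    # "^" matches any path; the real tests are the conditions above.
    # %1 is the captured value; the trailing "?" drops the old query string.
    RewriteRule ^ /%1? [R=301,L]
    ```

    Because the leading slash is written into the target, the redirect
    stays an absolute path whether or not the incoming value started with
    one.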

    I suggest using wget or similar to test your 301s as you write them,
    as it will report the 301 if you trigger one and show you the
    rewritten URL client side. This is useful for debugging the
    corresponding regexes.

    I didn't test any of the above so I hope that helps get you started,

    Jeff

    PS. the 'regular' in regular expression is a reference to regular
    languages: http://en.wikipedia.org/wiki/Regular_language

  • David Metzler at Nov 11, 2009 at 4:55 am
    I know this might sound crazy, but you might consider a custom module
    that responds to cgi-bin and decides what to do from there. Or one
    that responds to cgi-bin/printOriginal.pl. Then you don't have to
    work out funky rewrite logic, and you can decide what to do from there.

    I've done some pretty crazy custom modules.... even wrote one once
    that generated dynamic css :)

Discussion Overview
group: development @
categories: drupal
posted: Nov 10, '09 at 4:08p
active: Nov 11, '09 at 6:17a
posts: 9
users: 6
website: drupal.org
irc: #drupal
