Hi folks,

I'm working on a simple CMS (actually started it for learning Catalyst,
but the goal is to be able to maintain a few websites with it). Each
page is stored in DB and it can have file attachments, also stored in DB
(content is BLOB).

When serving an attachment, instead of always retrieving it from DB I
want to save them in a cache directory. So the first time a file is
requested, it will be fetched from DB, saved in cache, then I want the
Web server to do the job as if it were a static file. With plain
ModPerl, I would do it like this:

$r->filename($path_to_saved_file);
# plus some more hacks if it runs after PerlMapToStorageHandler
return Apache2::Const::DECLINED;

I expect something similar will work with Catalyst using the ModPerl2
engine, but I was wondering if I can do something that would work well
with the development server and FastCGI as well.

Thanks for any hints.
-Mihai

  • Kieren Diment at Jun 6, 2009 at 11:40 am

    On 06/06/2009, at 9:17 PM, Mihai Bazon wrote:

    > Hi folks,
    >
    > I'm working on a simple CMS (actually started it for learning
    > Catalyst, but the goal is to be able to maintain a few websites with
    > it). Each page is stored in DB and it can have file attachments, also
    > stored in DB (content is BLOB).
    >
    > When serving an attachment, instead of always retrieving it from DB I
    > want to save them in a cache directory. So the first time a file is
    > requested, it will be fetched from DB, saved in cache, then I want the
    > Web server to do the job as if it were a static file. With plain
    > ModPerl, I would do it like this:
    >
    > $r->filename($path_to_saved_file);
    > # plus some more hacks if it runs after PerlMapToStorageHandler
    > return Apache2::Const::DECLINED;

    my $file = $c->path_to($something);
    if (!-e $file) {
        $c->model('DB::Files')->get_file(@args);
    }
    $c->serve_static_file($file); # part of Catalyst::Plugin::Static::Simple


    or have a read of this...

    http://dev.catalystframework.org/wiki/adventcalendararticles/2007/11-making_your_catalyst_app_cache-friendly
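
The "fetch on first request, then serve from disk" idea can be sketched in core Perl as below. This is only an illustration: `ensure_cached` and the `$fetch` callback are hypothetical names standing in for the DB lookup, and the temp-file-plus-rename step is an addition not discussed in the thread, included so that a concurrent request never sees a half-written file.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Temp qw(tempfile);

# Return $cache_file, creating it first if it is missing.
# $fetch is a hypothetical callback that returns the attachment bytes.
sub ensure_cached {
    my ($cache_file, $fetch) = @_;
    return $cache_file if -e $cache_file;

    my $dir = dirname($cache_file);
    make_path($dir) unless -d $dir;

    # Write to a temp file in the same directory, then rename it into
    # place; rename within one filesystem replaces the name atomically.
    my ($fh, $tmp) = tempfile('.cacheXXXXXX', DIR => $dir);
    binmode $fh;
    print {$fh} $fetch->() or die "write $tmp: $!";
    close $fh or die "close $tmp: $!";
    chmod 0644, $tmp;    # tempfile() creates 0600; let the web server read it
    rename $tmp, $cache_file or die "rename $tmp: $!";
    return $cache_file;
}
```

In a Catalyst action, the returned path could then be handed to whatever serves the file, for example the Static::Simple plugin mentioned above.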
  • Mihai Bazon at Jun 6, 2009 at 12:07 pm

    Kieren Diment wrote:

    > > When serving an attachment, instead of always retrieving it from DB I
    > > want to save them in a cache directory. So the first time a file is
    > > requested, it will be fetched from DB, saved in cache, then I want the
    > > Web server to do the job as if it were a static file. With plain
    > > ModPerl, I would do it like this:
    > >
    > > $r->filename($path_to_saved_file);
    > > # plus some more hacks if it runs after PerlMapToStorageHandler
    > > return Apache2::Const::DECLINED;
    >
    > my $file = $c->path_to($something);
    > if (!-e $file) {
    >     $c->model('DB::Files')->get_file(@args);
    > }
    > $c->serve_static_file($file); # part of Catalyst::Plugin::Static::Simple

    This seems close to what I need, thanks!

    -Mihai
  • Ian Docherty at Jun 6, 2009 at 11:41 am
    Mihai

    Mihai Bazon wrote:
    > Hi folks,
    >
    > I'm working on a simple CMS (actually started it for learning Catalyst,
    > but the goal is to be able to maintain a few websites with it). Each
    > page is stored in DB and it can have file attachments, also stored in DB
    > (content is BLOB).

    I may get shot down in flames for this, but I would not personally put
    page data or attachments into the DB in the first place. I would put the
    page into the filesystem and use the DB to reference the file contents.
    This would also satisfy your cache problems since you can retrieve the
    static data directly from the filesystem.

    > When serving an attachment, instead of always retrieving it from DB I
    > want to save them in a cache directory. So the first time a file is
    > requested, it will be fetched from DB, saved in cache, then I want the
    > Web server to do the job as if it were a static file. With plain
    > ModPerl, I would do it like this:
    >
    > $r->filename($path_to_saved_file);
    > # plus some more hacks if it runs after PerlMapToStorageHandler
    > return Apache2::Const::DECLINED;
    >
    > I expect something similar will work with Catalyst using the ModPerl2
    > engine, but I was wondering if I can do something that would work well
    > with the development server and FastCGI as well.
    >
    > Thanks for any hints.
    > -Mihai
    _______________________________________________
    List: Catalyst@lists.scsys.co.uk
    Listinfo: http://lists.scsys.co.uk/cgi-bin/mailman/listinfo/catalyst
    Searchable archive: http://www.mail-archive.com/catalyst@lists.scsys.co.uk/
    Dev site: http://dev.catalyst.perl.org/

  • John Romkey at Jun 6, 2009 at 12:51 pm

    On Jun 6, 2009, at 7:41 AM, Ian Docherty wrote:
    > Mihai Bazon wrote:
    > > Hi folks,
    > >
    > > I'm working on a simple CMS (actually started it for learning
    > > Catalyst, but the goal is to be able to maintain a few websites with
    > > it). Each page is stored in DB and it can have file attachments, also
    > > stored in DB (content is BLOB).
    >
    > I may get shot down in flames for this, but I would not personally
    > put page data or attachments into the DB in the first place. I would
    > put the page into the filesystem and use the DB to reference the
    > file contents. This would also satisfy your cache problems since you
    > can retrieve the static data directly from the filesystem.

    I agree whole-heartedly.

    There are very few drawbacks to serving directly from the filesystem
    (only one I can think of, which has to do with access control to the
    content), while there are several when serving from the DB, mostly
    performance-related. Apache is very good at serving a static file
    quickly - when you have to pull a large object out of the database,
    you're dramatically increasing the instruction path and you're also
    increasing the number of times the data has to be copied.
    Add in the overhead of then passing it through Catalyst and you've
    probably increased your overhead by orders of magnitude.

    If it's okay to pull it from the database and cache it in the
    filesystem, why not just leave it in the filesystem in the first place
    and not bother with the database?

    The only reason I would consider storing a file in a database would be
    if I couldn't satisfy my security model any other way.
    - john romkey
    http://www.romkey.com/
  • Octavian Rasnita at Jun 6, 2009 at 1:34 pm
    From: "John Romkey" <romkey@apocalypse.org>
    > I agree whole-heartedly.
    >
    > There are very few drawbacks to serving directly from the filesystem
    > (only one I can think of, which has to do with access control to the
    > content),

    Sometimes the static content shouldn't be accessible at a permanent URL for
    security reasons.

    Although it wouldn't be portable, I think it could be helpful to be able to
    create a directory with the public content outside of the web space, and
    when an authorized user requests a certain file, the application creates a
    randomly named link to the wanted file, places that link in the public web
    space, then redirects to the URL for that link.

    The problem is that this link should be deleted after the user has finished
    downloading the file, and I don't know if this is possible, because the user
    accesses the file directly, not through the application.

    It would be nice to have a trigger that deletes the static file (the link to
    that file) after it has been downloaded.

    Octavian
  • Ash Berlin at Jun 6, 2009 at 1:49 pm

    On 6 Jun 2009, at 14:34, Octavian Râsnita wrote:

    > From: "John Romkey" <romkey@apocalypse.org>
    > > I agree whole-heartedly.
    > >
    > > There are very few drawbacks to serving directly from the
    > > filesystem (only one I can think of, which has to do with access
    > > control to the content),
    >
    > Sometimes the static content shouldn't be accessible at a permanent
    > URL for security reasons.

    Agreed. Lighttpd has an X-Sendfile header that is well suited to cases
    like this. nginx has similar behaviour under a different header name
    (X-Accel-Redirect), and there is an Apache module (mod_xsendfile) that
    gives X-Sendfile to Apache users.
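
A minimal sketch of the X-Sendfile approach, written as a bare PSGI application so that it needs no modules. That framing is an assumption: the thread predates PSGI-based Catalyst, where the same header would be set on `$c->response` instead. `$PRIVATE_DIR` and `is_authorized` are hypothetical placeholders, and the front-end server must be configured to honour the header.

```perl
use strict;
use warnings;

# Hypothetical location of the private files, outside the web root.
my $PRIVATE_DIR = '/var/cache/myapp/attachments';

# Hypothetical access check; a real app would consult its session or user.
sub is_authorized {
    my ($env) = @_;
    return defined $env->{REMOTE_USER};
}

# The app decides *whether* to serve; the front-end server does the
# actual file transfer when it sees the X-Sendfile header.
my $app = sub {
    my ($env) = @_;

    return [403, ['Content-Type' => 'text/plain'], ['Forbidden']]
        unless is_authorized($env);

    # Accept only a plain file name that starts with a word character,
    # so "../" or ".." cannot escape the directory.
    my ($name) = ($env->{PATH_INFO} // '') =~ m{\A/(\w[\w.-]*)\z}
        or return [404, ['Content-Type' => 'text/plain'], ['Not found']];

    return [
        200,
        [
            'Content-Type' => 'application/octet-stream',
            # lighttpd and Apache mod_xsendfile; nginx instead expects
            # X-Accel-Redirect carrying an internal URI.
            'X-Sendfile'   => "$PRIVATE_DIR/$name",
        ],
        [],
    ];
};
```

Because the file never gets a public URL of its own, there is no link to clean up after the download.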
  • Mihai Bazon at Jun 6, 2009 at 6:58 pm

    John Romkey wrote:
    > On Jun 6, 2009, at 7:41 AM, Ian Docherty wrote:
    > > Mihai Bazon wrote:
    > > > Hi folks,
    > > >
    > > > I'm working on a simple CMS (actually started it for learning
    > > > Catalyst, but the goal is to be able to maintain a few websites
    > > > with it). Each page is stored in DB and it can have file
    > > > attachments, also stored in DB (content is BLOB).
    > >
    > > I may get shot down in flames for this, but I would not personally
    > > put page data or attachments into the DB in the first place. I would
    > > put the page into the filesystem and use the DB to reference the
    > > file contents. This would also satisfy your cache problems since you
    > > can retrieve the static data directly from the filesystem.
    >
    > I agree whole-heartedly.
    >
    > There are very few drawbacks to serving directly from the filesystem
    > (only one I can think of, which has to do with access control to the
    > content), while there are several when serving from the DB, mostly
    > performance-related.

    I am aware of the performance quirks, but I'm still a fan of keeping all
    the content in the DB. In fact, the bigger plan is to store templates
    and CSS/JS files in the DB as well. MySQL blobs are pretty fast, btw.

    Performance should be (almost) the same as for static files if I
    implement a handler that (1) updates the static file from DB when it's
    out of date and (2) DECLINE-s the request so that Apache^W the web
    server itself can further serve the file.

    Catalyst::Plugin::Static::Simple is doing the job (not strictly what I
    asked though), but I'm not talking about a million requests per day. I
    looked at the source--it sets $c->response->body($filehandle). It's
    elegant, but probably slow. When the need comes, I'll figure out
    something faster -- but for now all I can think of is "early
    optimization is the root of all evil". ;-)

    Cheers,
    -Mihai
  • Cosimo Streppone at Jun 6, 2009 at 7:25 pm

    Mihai Bazon wrote:

    > John Romkey wrote:
    > > On Jun 6, 2009, at 7:41 AM, Ian Docherty wrote:
    > > > Mihai Bazon wrote:
    > > > > Hi folks,
    >
    > I am aware of the performance quirks

    Good.

    > MySQL blobs are pretty fast, btw.

    Ah :)

    I'm interested in this. Do you have numbers?

    > Performance should be (almost) the same as for static files if I
    > implement a handler that (1) updates the static file from DB when it's
    > out of date and (2) DECLINE-s the request so that Apache^W the web
    > server itself can further serve the file.

    Performance can be almost the same. Scalability won't.
    But of course you are the only person that can evaluate that
    depending on your needs, requirements, etc...

    However:

    > When the need comes, I'll figure out something faster
    > -- but for now all I can think of is "early
    > optimization is the root of all evil". ;-)

    I know a team that thought exactly the same, did exactly
    the same as you are planning to do (even pictures in the db),
    served from the db and scaled on the fly by CGI processes.

    (and don't underestimate backend<->db network traffic)

    That became my team some time ago, and we spent
    _months_ dismantling that monster so that the
    content is served as completely static by lightweight httpd servers.

    I'm not saying you should change anything, but think
    about the poor souls who shall maintain the system
    in the hypothetical future where you get, as we do,
    millions of hits/day.

    --
    Cosimo
  • Mihai Bazon at Jun 7, 2009 at 12:01 pm
    Look, I think I didn't properly explain what I need. Or, maybe you
    didn't read all my email and just noticed that I intend to keep file
    content in a BLOB. :-)

    I do want to keep files in the DB, BUT serve them as static files. The
    backend<->db traffic is unimportant, as it will happen only when the
    file is updated.

    I wrote some tests, since you asked about numbers. You can download them
    here and run them yourself:

    http://mihai.bazon.net/Static-VS-DBI.tar.bz2

    There are 3 tests: (1) serving a static file, (2) fetching the content
    from DB at each request (Dropme::handle_dynamic) and (3) redirect Apache
    to a file on disk which is updated with the content from DB
    (Dropme::handle_dynamic_cached). The Dropme package is defined in
    modperl.pl

    Here's how to run it (assumes a mod_perl2-enabled Apache2):

    cd ~
    tar jxf Static-VS-DBI.tar.bz2
    chmod 777 Static-VS-DBI/cache # Apache needs to write here
    ln -s ~/Static-VS-DBI /tmp/testblob
    cd /etc/apache2/sites-enabled # or wherever you keep vhosts
    ln -s /tmp/testblob/dropme.conf
    sudo /etc/init.d/apache2 restart # or whatever for your distro
    mysqladmin -u root -p create dropme
    mysql -u root -p dropme

    # run the following in MySQL console:

    grant all privileges on dropme.* to dropme@localhost identified by 'dropme';
    ^D (exit MySQL shell)

    cd ~/Static-VS-DBI
    ./createdb.pl

    The createdb.pl script will put each files/* in a record in the "Files"
    table in the DB. Then, you can use the following URL-s to access them:

    1. http://localhost:54321/image.jpg (served statically by Apache)
    2. http://localhost:54321/mp1/image.jpg (through handle_dynamic)
    3. http://localhost:54321/mp2/image.jpg (through handle_dynamic_cached)

    To stress-test, use "ab", e.g.:

    ab -c 2 -n 100 http://localhost:54321/mp2/image.jpg

    My conclusions (numbers in requests per second):

    file                   | static | mp1   | mp2
    -----------------------+--------+-------+------
    lgpl.txt (25K)         | 65.05  | 58.87 | 53.06
    image-small.jpg (110K) | 61.49  | 57.21 | 52.45
    image.jpg (1.8M)       | 53.76  | 35.43 | 49.33
    10MB.bin (10M)         | 38.77  | 11.01 | 31.73

    As expected, static wins in all cases. For small files, mp1 is better
    than mp2 but not by a long shot. For large files, mp1 is a lot slower,
    while static and mp2 are comparable (even when you increase -c, the
    number of concurrent requests). "mp2" can probably be optimized; I wrote
    some ugly code to check if the cached file is out of date. It can also
    be installed as a MapToStorageHandler rather than a ResponseHandler,
    since what it does is map a URL to a file:

    1. is the file cached?
       - If not, retrieve from DB then save it on disk.
       - If yes, is its mtime older than what's in DB?
         - if yes, retrieve from DB then save it on disk.
    2. $r->filename($cached_file) and return DECLINED

    So the file is actually served by Apache itself, and the BLOB is hit
    only once. For most requests, the Perl handler steps in only to check
    that the cached file is up-to-date. Moreover, the cache is outside the
    document_root, which is often convenient.

    Cheers,
    -Mihai
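
The two-step cache check Mihai lists can be sketched in core Perl as follows. This is not the code from his tarball: `cache_is_stale`, `map_to_cached_file`, the `$fetch` callback and the epoch-seconds `$db_mtime` are all hypothetical, since the thread does not show the handler or the table schema.

```perl
use strict;
use warnings;

# Step 1: decide whether the cached copy must be (re)written.
# $db_mtime is the row's last-modified time as epoch seconds (assumed).
sub cache_is_stale {
    my ($cache_file, $db_mtime) = @_;
    my @st = stat $cache_file
        or return 1;              # not cached yet
    return $st[9] < $db_mtime;    # $st[9] is the file's mtime
}

# Refresh the cache if needed and return the path to hand to the server.
# $fetch is a hypothetical callback returning the BLOB content.
sub map_to_cached_file {
    my ($cache_file, $db_mtime, $fetch) = @_;
    if (cache_is_stale($cache_file, $db_mtime)) {
        open my $fh, '>', $cache_file or die "open $cache_file: $!";
        binmode $fh;
        print {$fh} $fetch->();
        close $fh or die "close $cache_file: $!";
        # Stamp the file with the DB time so the next comparison is exact.
        utime $db_mtime, $db_mtime, $cache_file;
    }
    return $cache_file;
}
```

Under mod_perl, step 2 would pass the returned path to `$r->filename` and return DECLINED, as in the original post. The sketch still needs `$db_mtime` on every request, so each hit costs at least one small query.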
  • Bill Moseley at Jun 7, 2009 at 3:21 pm

    On Sun, Jun 07, 2009 at 03:01:58PM +0300, Mihai Bazon wrote:
    > I do want to keep files in the DB, BUT serve them as static files. The
    > backend<->db traffic is unimportant, as it will happen only when the
    > file is updated.

    Of all the parts of a web app, the database is probably the hardest to
    scale. Keeping traffic off the database when possible is good.

    Of course, when it's time to worry about that, and if it's the
    "static" content in the db that's a problem, you can move the static
    content off the db at that point. For low traffic single-server sites
    I've found the db very handy for uploaded content.

    You probably want a central store for the uploaded static content
    anyway. But, if you have or plan to have multiple non-sticky app
    servers then you don't want to be stuck with your files on the app
    server's file system.

    > I wrote some tests, since you ask me about numbers. You can download it
    > here to run it yourself:
    >
    > ab -c 2 -n 100 http://localhost:54321/mp2/image.jpg

    Don't you want "-n 1" there? If it's static content don't you only
    serve it once from the app server? Serve it with expire headers way
    into the future and then let the front-end proxy cache it (or, better,
    use a content delivery network).

    As mentioned, if you need auth for these files then look into the
    sendfile and reproxy features of Lighty and Perlbal. Use a dedicated
    static server for all static content. Then you just need to get
    uploads redirected there.

    --
    Bill Moseley.
    moseley@hank.org
    Sent from my iMutt
  • Cosimo Streppone at Jun 7, 2009 at 6:43 pm

    Mihai Bazon wrote:

    > I wrote some tests, since you ask me about numbers. You can download it
    > here to run it yourself:
    >
    > http://mihai.bazon.net/Static-VS-DBI.tar.bz2

    Excellent! Thanks for this.
    Will look into it.

    > 1. is the file cached?
    > - If not, retrieve from DB then save it on disk.
    > - If yes, is its mtime older than what's in DB?
    > [...]
    > For most requests, the Perl handler steps in only to check
    > that the cached file is up-to-date.

    So this is the critical step. The one that will
    force you to hit the db for every static file request.

    Again, if it's fine for you, everyone's happy.
    And probably that db table will be mostly read-only and
    cached in memory...

    > Moreover, the cache is outside the
    > document_root, which is many times convenient.

    I don't know what you mean here. Can you explain?

    --
    Cosimo
  • Aristotle Pagaltzis at Jun 14, 2009 at 10:30 am

    * John Romkey [2009-06-06 14:55]:
    > There are very few drawbacks to serving directly from the
    > filesystem (only one I can think of, which has to do with
    > access control to the content), while there are several when
    > serving from the DB, mostly performance-related. Apache is very
    > good at serving a static file quickly

    Varnish is even better at serving cached objects. I wouldn't
    deploy an app without a reverse proxy in front any more. For
    static content in the DB, use some form of content addressing
    scheme (use some hash of the content as the key for the file),
    then you can set the expiration date for those URIs to 20 years
    in the future and let the frontend cache sort them out.

    > - when you have to pull a large object out of the database,
    > you're dramatically increasing the instruction path and you're
    > also increasing the number of copies of the data that it's
    > necessary to do. Add in the overhead of then passing it through
    > Catalyst and you've probably increased your overhead by orders
    > of magnitude.

    Who cares? The frontend reverse proxy will keep that object
    cached for the next 3 weeks if it's hot for that long. Those
    requests are never going to punch through to the Catalyst app
    server so how slow or fast things are on the Catalyst end is
    just a pointless microoptimisation.

    And the backend gets that much easier to set up, because you no
    longer need to worry about which processes running where have
    what sort of access to what part of the filesystem, and is the
    app server configured to put the files where Apache will map them
    to the right URI?

    What a headache.

    Just stick it in the database, have the app serve it, and let
    HTTP worry about optimising GETs. That's what it was *designed*
    for, and it's *awesome* at that. Learn it, live it, love it.

    Regards,
    --
    Aristotle Pagaltzis // <http://plasmasturm.org/>
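
The content addressing scheme Aristotle describes might look like the sketch below. SHA-1 is just one possible hash, and the `/static/` prefix, the function names and the choice of `Cache-Control` over an `Expires` date are assumptions made for illustration.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Build a URI path that changes whenever the content changes, so the
# response for any given URI never goes out of date.
# The /static/ prefix is a hypothetical layout.
sub content_uri {
    my ($bytes, $ext) = @_;
    return '/static/' . sha1_hex($bytes) . ".$ext";
}

# Headers for such an immutable URI: 20 years, expressed in seconds.
sub far_future_headers {
    my $max_age = 20 * 365 * 24 * 60 * 60;
    return ('Cache-Control' => "public, max-age=$max_age");
}
```

An edited file gets a new hash and therefore a new URI, so the frontend cache never has to be told that an old entry is stale.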
  • Jason Galea at Jun 7, 2009 at 12:17 am
    The way I (and I assume many others) implement image thumbnails could be
    the way to go.. (unless you need the access control)

    The server handles things as per any normal static content with a 404
    handler dealing with missing files. The handler creates and serves the
    file so any future requests for the same file are handled by the static
    server. To regenerate the file, simply delete it and it will be created
    the next time it's requested.

    cheers,

    J

Discussion Overview
group: catalyst @
categories: catalyst, perl
posted: Jun 6, '09 at 11:17a
active: Jun 14, '09 at 10:30a
posts: 14
users: 10
website: catalystframework.org
irc: #catalyst