One of the things I have been helping companies with for the past couple
of years is sorting through the complexities of deploying PHP code with
the least possible interruption to the running site.

With APC you can achieve atomic deploys without a server restart and
without clearing the opcode cache through careful use of the
realpath/stat cache and a clearstatcache() call in the front-controller.
The logic behind it is a little complicated, but it goes something like
this:

- Request 1 starts before the deploy and loads script A, B
- Deploy to a separate directory and the docroot symlink now points to here
- Request 2 starts and loads A, B, C
- Request 1 was a bit slow and gets to load C now

This is the scenario that trips up most deploy systems: request 1 would
load a version of C that doesn't match the A and B it has already
loaded, so the deploy is not atomic even though all the files were
deployed atomically.
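For concreteness, the symlink flip in step 2 of the list above can itself be
made atomic. A sketch, with illustrative paths and a made-up helper name:

```php
<?php
// Hypothetical deploy step showing what "deployed atomically" means
// here: repoint the docroot symlink in a single rename. The paths and
// the function name are illustrative, not from the original post.
function switch_docroot($release, $current)
{
    $tmp = $current . '.tmp';
    if (is_link($tmp) || file_exists($tmp)) {
        unlink($tmp);
    }
    symlink($release, $tmp);  // build the new link next to the old one
    rename($tmp, $current);   // rename(2) over a symlink is atomic on POSIX
}
```

Requests never observe a missing docroot: the link points at the old release
until the rename, and at the new one afterwards.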

With the realpath/stat cache and APC's use of inodes as cache keys,
request 1 will get the inode from the previous version of C, so it will
not be out of sync with the previously loaded A and B. For request 2 we
put a clearstatcache() call in the front-controller, triggered usually
by comparing the version baked into the front-controller with a version
number written to shared memory. By detecting in the front-controller,
at the start of a request, that a more recent version of the code is
available, we can make sure that all new requests will see the new code,
while requests that were executing when the deploy happened will
continue to use the previous version until they are done.
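A minimal sketch of that front-controller check, in APC-era terms;
DEPLOY_VERSION and the 'deploy_version' shared-memory key are illustrative
names invented for this example, not from any actual deploy tool:

```php
<?php
// DEPLOY_VERSION is assumed to be baked into this file at deploy time,
// while the deploy script writes the new version number to shared
// memory under the (made-up) key 'deploy_version'.
define('DEPLOY_VERSION', 20130323);

function is_stale($baked, $shared)
{
    // Shared memory advertises a newer deploy than the version this
    // front-controller was compiled with.
    return $shared !== false && $shared > $baked;
}

if (function_exists('apc_fetch')
        && is_stale(DEPLOY_VERSION, apc_fetch('deploy_version'))) {
    clearstatcache(true); // drop this worker's realpath/stat cache
}
```

Workers that started before the deploy keep their cached realpaths (and thus
the old inodes); workers that see the newer version clear the cache and
resolve the new files.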

Now, with PHP 5.5 and the new OPcache, things are a bit different.
OPcache is not inode-based, so we can't use the same trick. Since we are
focusing on a single cache implementation, I think we should document a
preferred approach to this common scenario. I see a couple of approaches:

1. Turn off validate_timestamps and always do a graceful server restart
on a deploy
+ effective
- slow and annoying when you deploy a lot, especially for companies that
do a lot of A/B testing and feature-based development, with potentially
hundreds of small code and config deploys to ramp features up/down
throughout the day. Being able to invalidate a single cache entry might
mean you could avoid doing the full restart on a simple config-file
deploy, but currently OPcache can't do that (*)
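For reference, approach 1 maps onto one real OPcache directive plus a
restart on every deploy; the restart command in the comment is only an
example, since it depends on your server setup:

```ini
; Approach 1: never stat files; cached code changes only on restart.
opcache.validate_timestamps=0
; Each deploy then does e.g. "apachectl graceful" or a php-fpm reload.
```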

2. Do something interesting with revalidate_freq. If we always knew that
the file stat happened at :00 of the minute and we deployed at :01, then
perhaps we could get away with not doing anything else
+ no server restarts and no cache clears
- scripts that take longer than 59 seconds to complete would be a
problem, and the code currently can't guarantee timestamp checks at
regular intervals like this

3. Add some magic to OPcache that gives it the concept of a server
request. Almost like a DB transaction. Currently on a cache reset,
OPcache lets currently executing entries complete, but this is on a
per-entry basis. A web request is made up of many of these entries so
unless they are somehow bracketed it doesn't help us. So something like
opcache_request_begin()/opcache_request_done() might work.
+ no server restarts and no cache clears
- this might get way too complex, especially since userspace may never
call opcache_request_done(), which means we would need some sort of
timeout mechanism as well
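Bracketing a request with the proposed functions might look like the sketch
below. Neither function exists; these are only the names proposed above, and
the shutdown handler is one way to soften, though not eliminate, the
missing-done() problem:

```php
<?php
// Purely hypothetical -- opcache_request_begin()/opcache_request_done()
// do not exist. This only illustrates how a front-controller might
// bracket a request so all its compiles come from one code version.
opcache_request_begin();
register_shutdown_function(function () {
    // Runs even after exit() or a fatal error, which narrows (but does
    // not close) the window where a timeout would still be needed.
    opcache_request_done();
});
// ... dispatch the request ...
```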

(*) for single-file deploys, such as a config change to ramp a feature
up or down, you could blacklist the config file and use apcu/yac or some
other user-cache mechanism to speed things up.
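The footnote's trick might be sketched like this. It assumes the config file
is listed in the file named by the (real) opcache.blacklist_filename
directive so OPcache never compiles it; the key scheme and TTL are made up,
and the APCu calls are guarded so the sketch also works without the
extension:

```php
<?php
// $file is a PHP file that returns an array, e.g.
//   <?php return array('feature_x' => 1);
// and is assumed to be OPcache-blacklisted so each deploy of it is
// picked up without touching the opcode cache.
function load_config($file)
{
    // mtime in the key invalidates the user-cache entry on deploy.
    $key = 'cfg:' . $file . ':' . filemtime($file);
    if (function_exists('apcu_fetch')) {
        $hit = apcu_fetch($key);
        if ($hit !== false) {
            return $hit;
        }
    }
    $cfg = include $file; // compiled per-request, thanks to the blacklist
    if (function_exists('apcu_store')) {
        apcu_store($key, $cfg, 60); // short illustrative TTL
    }
    return $cfg;
}
```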

None of these approaches sound ideal to me, and that includes the
existing inode-caching APC approach. Too brittle and complicated. Any
other ideas?

-Rasmus


  • Ferenc Kovacs at Mar 23, 2013 at 10:01 pm

    On Sat, Mar 23, 2013 at 7:57 PM, Rasmus Lerdorf wrote:

    <snip>

    realpath the document root (which is a symlink to the actual release
    directory) from your index.php/bootstrap file and use that as a base path
    for making absolute paths everywhere? That way the requests started
    before the symlink switch will continue with the old version, but
    requests started after the switch will use the files from the new
    revision.
    Of course, you can still have issues, like an ajax request from the old
    version getting served by the new version, and if you have more than one
    server, sooner or later you will have to sacrifice something from the
    CAP trio.
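A sketch of this suggestion; the DOCUMENT_ROOT fallback and the APP_BASE
name are illustrative:

```php
<?php
// Resolve the docroot symlink once, early in the request, and build
// every absolute path from the result so a mid-request symlink switch
// cannot mix files from two releases.
$docroot = isset($_SERVER['DOCUMENT_ROOT'])
    ? $_SERVER['DOCUMENT_ROOT'] : getcwd();
define('APP_BASE', realpath($docroot)); // pinned release directory
// All later includes use the pinned base, e.g.:
// require APP_BASE . '/lib/C.php';
```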

    --
    Ferenc Kovács
    @Tyr43l - http://tyrael.hu
  • Rasmus Lerdorf at Mar 23, 2013 at 10:26 pm

    On 03/23/2013 03:01 PM, Ferenc Kovacs wrote:
    realpath the document root(which is a symlink to the actual release
    directory) from your index.php/bootstrap file and use that as a base
    path for making absolute paths everywhere?
    that way the requests started before the symlink switch will continue
    with the old version but requests started after the switch will use the
    files from the new revision.
    ofc. you can still have issues like an ajax request from the old version
    gets served by the new version, and if you have more than one server
    sooner or later you will/have to sacrifice something from the CAP trio.
    Well, solving the multi-request/multi-server ajax scenario is a bit of a
    different problem. You'd need to version those requests to handle that.
    The scope I am concerned with here is per-server deploy atomicity.

    But yes, some way to have a 2-docroot scenario, where all requests
    started on one docroot via the symlink stay on that one, would be a good
    approach, but it would take a lot of discipline at the userspace level
    to enforce that across a large and diverse codebase with autoloaders and
    actual realpath calls all over the place.

    -Rasmus
  • David Muir at Mar 24, 2013 at 10:52 am

    On 24/03/13 09:26, Rasmus Lerdorf wrote:
    <snip>

    Are you saying that to allow atomic deploys with O+ you need to make
    sure that all files are either autoloaded with a full realpath, or
    manually included/required by the realpath?

    You mentioned that O+ does not use inodes as cache keys like APC, but
    what does it use instead? Just the file path?

    Cheers,
    David
  • Terry Ellison at Mar 24, 2013 at 4:37 pm
    Rasmus,
    <snip>

    - Request 1 starts before the deploy and loads script A, B
    - Deploy to a separate directory and the docroot symlink now points to here
    - Request 2 starts and loads A, B, C
    - Request 1 was a bit slow and gets to load C now
    The issues that you raise about introducing atomic versioning in the
    script namespace do need to be addressed to avoid material service
    disruption during an application version upgrade. However, surely
    another facet of the O+ architecture also frustrates this deployment
    model.

    My reading is that O+ processes each new (cache-miss) compile
    request by first sizing the memory requirements for the compiled source
    and then allocating a single brick from (one of) the SMA at its high
    water mark. Stale cache entries are marked as corrupt and their storage
    is then allocated to wasted_shared_memory with no attempt to reuse it.
    SMA exhaustion or the % wastage exceeding a threshold ultimately
    triggers a process shutdown cascade. This strategy is lean and fast
    but as far as I understand this, it ultimately uses a process death
    cascade and population rebirth to implement garbage collection.

    Wouldn't your non-stop models require a more stable reuse architecture,
    one that recycles wasted memory without the death cascade? Perhaps one
    of the Zend team could correct my inference if I've got it wrong
    again :-(

    Regards
    Terry

Discussion Overview
group: php-internals
categories: php
posted: Mar 23, '13 at 6:57p
active: Mar 24, '13 at 4:37p
posts: 5
users: 4
website: php.net
