HBase dev mailing list, July 2012
Wondering whether we should retain the VersionedProtocol now that we have protobuf implementations for most (all?) of the protocols. I think we still need the version checks, and should do them when we need to. Take this case:
1. Protocol Foo has FooMethod(FooMethodRequest) as one of its methods.
2. Protocol Foo evolves over time, and FooMethod(FooMethodRequest) now has a better implementation called FooMethod_improved(FooMethodRequest).
3. HBase installations exist with both protocol implementations.
4. Clients should be able to talk to both old and new servers (and invoke the newer implementation of FooMethod if the server's protocol implements it).

(4) is possible when getProtocolVersion is implemented by the protocol on the server. The client could check the version of the protocol (assuming VersionedProtocol semantics, where the protocol version number is bumped for such significant changes) and invoke the appropriate method depending on that.

Having to map protocol version numbers to the supported methods is probably arcane IMO, but it works.
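A minimal sketch of the version-check approach, for illustration only. FooProtocol, fooMethodImproved, VersionDispatchingClient, and the assumption that protocol version 2 introduced the improved method are all hypothetical; none of them is an HBase API.

```java
// Hypothetical protocol mirroring the FooMethod example; not a real HBase interface.
interface FooProtocol {
    long getProtocolVersion();

    String fooMethod(String request);

    // Assumed to exist only on servers whose protocol version is >= 2.
    String fooMethodImproved(String request);
}

class VersionDispatchingClient {
    // Assumption: protocol version 2 is the first to ship fooMethodImproved.
    static final long IMPROVED_SINCE = 2L;

    private final FooProtocol server;
    private final long serverVersion;

    VersionDispatchingClient(FooProtocol server) {
        this.server = server;
        // Ask once when the proxy is set up and remember the answer.
        this.serverVersion = server.getProtocolVersion();
    }

    String foo(String request) {
        // The version -> method mapping lives on the client and must be maintained by hand.
        return serverVersion >= IMPROVED_SINCE
                ? server.fooMethodImproved(request)
                : server.fooMethod(request);
    }
}
```

The IMPROVED_SINCE constant is exactly the "arcane" mapping: every new method needs another such constant kept in sync with server releases.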

The other approach (which wouldn't require the version number) is something like this: on the client side, get the protocol methods supported by the server (and cache them), then look this map up whenever needed to decide which method to invoke.
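A sketch of the second approach, assuming a hypothetical server call getSupportedMethods() that lists method names; that call, MethodListingServer and SupportedMethodsCache are illustrative names, not existing APIs.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical server call returning the names of the methods its protocol supports.
interface MethodListingServer {
    String address();

    Set<String> getSupportedMethods();
}

class SupportedMethodsCache {
    // Client-wide cache: server address -> method names that server advertised.
    private final Map<String, Set<String>> byServer = new ConcurrentHashMap<>();

    boolean supports(MethodListingServer server, String method) {
        // Only the first lookup for a given server costs a round trip.
        Set<String> methods = byServer.computeIfAbsent(server.address(), a -> server.getSupportedMethods());
        return methods.contains(method);
    }

    String chooseFooMethod(MethodListingServer server) {
        return supports(server, "FooMethod_improved") ? "FooMethod_improved" : "FooMethod";
    }
}
```

Unlike the version-number sketch, the client holds no hand-maintained table; it only needs to know which method names it prefers.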

Any thoughts on whether we should invest time in the second approach yet?

Thanks,
Devaraj.


  • Jimmy Xiang at Jul 31, 2012 at 3:10 am
    Another approach is to try the new call first. If we get an
    exception like "unknown method", then fall back to the old method.
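A minimal sketch of the try-then-fall-back idea. UnknownMethodException is a hypothetical stand-in for whatever "unknown method" error the RPC layer actually raises; Fallback and callWithFallback are illustrative names.

```java
import java.util.function.Function;

// Hypothetical stand-in for the RPC layer's "unknown method" error.
class UnknownMethodException extends RuntimeException {
    UnknownMethodException(String method) {
        super("Unknown method: " + method);
    }
}

final class Fallback {
    private Fallback() {}

    // Try the newer call first; if the server does not know it, use the older one.
    static <Q, R> R callWithFallback(Q request, Function<Q, R> newCall, Function<Q, R> oldCall) {
        try {
            return newCall.apply(request);
        } catch (UnknownMethodException e) {
            return oldCall.apply(request);
        }
    }
}
```

Only the "unknown method" case triggers the fallback; any other failure still propagates to the caller.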

    Thanks,
    Jimmy
  • Ted Yu at Jul 31, 2012 at 3:12 am
    If v3 of the method emerges, we might need to retry twice, right?
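To make the cost concrete, a sketch that generalizes the fallback to an ordered list of variants, newest first. Every "unknown method" answer is one wasted round trip, so an old server facing a client that knows v3 costs two extra calls. All names here, including UnknownMethodException, are hypothetical.

```java
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for the RPC layer's "unknown method" error.
class UnknownMethodException extends RuntimeException {
    UnknownMethodException(String method) {
        super("Unknown method: " + method);
    }
}

final class CascadingFallback {
    // Calls attempted by the most recent invoke(), kept for illustration.
    int lastAttempts;

    <Q, R> R invoke(Q request, List<Function<Q, R>> newestFirst) {
        lastAttempts = 0;
        UnknownMethodException last = null;
        for (Function<Q, R> call : newestFirst) {
            lastAttempts++;
            try {
                return call.apply(request);
            } catch (UnknownMethodException e) {
                last = e; // Server predates this variant; try the next older one.
            }
        }
        throw last != null ? last : new IllegalArgumentException("no candidate calls");
    }
}
```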

    Cheers
  • Ted Yu at Jul 31, 2012 at 9:50 pm
    I looked at TestMultipleProtocolServer.java from Hadoop trunk.
    It illustrates how VersionedProtocol is used for a client to talk to servers
    running various versioned protocols.

    FYI
  • Michael Stack at Aug 1, 2012 at 8:41 am

    On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das wrote:
    > Any thoughts on whether we should invest time in the second approach yet?
    The VersionedProtocol w/ client being able to interrogate what methods
    a server supports strikes me as a facility that will be rarely used, if
    at all. Bringing it along and keeping up the directory of supported
    methods will take a load of work on our part that we'll do less than
    perfectly, so should it ever be needed, it won't work because we let it
    go stale.

    What do you reckon?

    The scenario painted above is also a little on the exotic side. We can
    do something like Jimmy suggests in those rare cases where we need to add a
    new method because there is insufficient wiggle-room w/i the
    particular PB method call. (If we get into the issue Ted raises, where
    we'd have to go back to the server twice because there is a third new
    method call, we're doing our API wrong.)

    The protocol needs a version though. Will we still be sending that
    'hrpc' long in the header preamble? Should we add a version long
    after the 'hrpc' long?
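A sketch of what "a version after the 'hrpc' magic" could look like. The four "hrpc" bytes come from the thread; the 8-byte big-endian version field that follows, and the Preamble class itself, are hypothetical and do not describe HBase's actual connection header.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class Preamble {
    // "hrpc" magic from the thread; the version field after it is a hypothetical layout.
    static final byte[] MAGIC = "hrpc".getBytes(StandardCharsets.US_ASCII);

    private Preamble() {}

    static byte[] write(long protocolVersion) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.write(MAGIC);
            out.writeLong(protocolVersion); // Assumed: 8-byte big-endian version right after the magic.
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static long readVersion(byte[] preamble) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(preamble));
            byte[] magic = new byte[MAGIC.length];
            in.readFully(magic);
            if (!Arrays.equals(magic, MAGIC)) {
                throw new IllegalArgumentException("Not an hrpc preamble");
            }
            return in.readLong();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // Preamble was truncated.
        }
    }
}
```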

    As to a directory of supported methods, do we need this in the
    protocol at all? Can't this be knowledge kept outside of the
    on-the-wire back and forth?

    St.Ack
  • Todd Lipcon at Aug 1, 2012 at 6:05 pm
    One possibility:

    During the IPC handshake, we could send the full version string /
    source checksum. Then, have a client-wide map which caches which
    methods have been found to be supported or not supported for an
    individual version. So, we don't need to maintain the mapping
    ourselves, but we also wouldn't need to do the full retry every time.
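A sketch of Todd's first idea, assuming the handshake has already delivered the server's full version string. The cache learns per build whether the new method exists, so only the first server of a given build pays for the failed probe. VersionKeyedMethodCache and UnknownMethodException are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for the RPC layer's "unknown method" error.
class UnknownMethodException extends RuntimeException {
    UnknownMethodException(String method) {
        super("Unknown method: " + method);
    }
}

final class VersionKeyedMethodCache {
    // Client-wide: "<server version string>#<method>" -> whether that build supports the method.
    private final Map<String, Boolean> supported = new ConcurrentHashMap<>();

    <R> R call(String serverVersion, String newMethod, Supplier<R> newCall, Supplier<R> oldCall) {
        String key = serverVersion + "#" + newMethod;
        if (Boolean.FALSE.equals(supported.get(key))) {
            return oldCall.get(); // Already learned that this build lacks the method: skip the probe.
        }
        try {
            R result = newCall.get();
            supported.put(key, true);
            return result;
        } catch (UnknownMethodException e) {
            supported.put(key, false);
            return oldCall.get();
        }
    }
}
```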

    A different idea would be to introduce a call like
    "getServerCapabilities()" which returns a bitmap, and define a new bit each
    time we add a new feature.
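A sketch of the capability bitmap using java.util.BitSet. The Feature list and the Capabilities class are hypothetical; the only idea taken from the thread is "one fixed bit per feature". Bits are assigned explicitly rather than via enum ordinals so that reordering the enum cannot silently renumber them.

```java
import java.util.BitSet;

// Hypothetical feature list: each feature keeps a fixed bit that is never reused.
enum Feature {
    FOO_METHOD_IMPROVED(0),
    ITERATIVE_LISTING(1),
    BULK_FOO(2);

    final int bit;

    Feature(int bit) {
        this.bit = bit;
    }
}

final class Capabilities {
    private final BitSet bits;

    Capabilities(BitSet bits) {
        this.bits = (BitSet) bits.clone();
    }

    // What a server might assemble when answering a getServerCapabilities()-style call.
    static Capabilities of(Feature... features) {
        BitSet b = new BitSet();
        for (Feature f : features) {
            b.set(f.bit);
        }
        return new Capabilities(b);
    }

    boolean has(Feature f) {
        return bits.get(f.bit);
    }

    // Compact form for the wire, e.g. inside a bytes field of a response message.
    byte[] toBytes() {
        return bits.toByteArray();
    }

    static Capabilities fromBytes(byte[] raw) {
        return new Capabilities(BitSet.valueOf(raw));
    }
}
```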

    The advantage of these approaches vs a single increasing version
    number is that we sometimes want to backport a new IPC to an older
    version, but not backport all of the intervening IPCs. Having a bitmap
    allows us to "pick and choose" on backports without having to pull in
    a bunch of things we didn't necessarily want.
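A small illustration of the backport argument, under a hypothetical scenario: features 1, 2 and 3 arrived in protocol versions 1, 2 and 3, and only feature 3 was backported to a version-1 server. A "version >= N" check wrongly reports the feature missing; the bitmap reports it correctly.

```java
// Hypothetical scenario: feature N was added in protocol version N.
final class BackportCheck {
    static final long FEATURE_1 = 1L << 0;
    static final long FEATURE_2 = 1L << 1;
    static final long FEATURE_3 = 1L << 2;

    private BackportCheck() {}

    // Version-number reasoning: "feature N exists iff version >= N".
    static boolean hasByVersion(long serverVersion, int featureAddedInVersion) {
        return serverVersion >= featureAddedInVersion;
    }

    // Bitmap reasoning: the server states exactly which features it carries.
    static boolean hasByBitmap(long serverCapabilities, long featureBit) {
        return (serverCapabilities & featureBit) != 0;
    }
}
```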

    -Todd
    --
    Todd Lipcon
    Software Engineer, Cloudera
  • Andrew Purtell at Aug 1, 2012 at 7:39 pm
    I like the idea of "getServerCapabilities()" as a bitset.

    - Andy
    --
    Best regards,

    - Andy

    Problems worthy of attack prove their worth by hitting back. - Piet
    Hein (via Tom White)
  • Devaraj Das at Aug 3, 2012 at 6:41 pm
    Responses inline..
    On Wed, Aug 1, 2012 at 11:04 AM, Todd Lipcon wrote:
    > One possibility:
    >
    > During the IPC handshake, we could send the full version string /
    > source checksum. Then, have a client-wide map which caches which
    > methods have been found to be supported or not supported for an
    > individual version. So, we don't need to maintain the mapping
    > ourselves, but we also wouldn't need to do the full retry every time.
    Yeah, this is what I was thinking of as the alternative to the current approach of using VersionedProtocol.
    > A different idea would be to introduce a call like
    > "getServerCapabilities()" which returns a bitmap, and define a bit per
    > time that we add a new feature.
    >
    > The advantage of these approaches vs a single increasing version
    > number is that we sometimes want to backport a new IPC to an older
    > version, but not backport all of the intervening IPCs. Having a bitmap
    > allows us to "pick and choose" on backports without having to pull in
    > a bunch of things we didn't necessarily want.
    Good point.
    > On Wed, Aug 1, 2012 at 1:41 AM, Stack wrote:
    >> On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das wrote:
    >>> Any thoughts on whether we should invest time in the second approach yet?
    >> The VersionedProtocol w/ client being able to interrogate what methods
    >> a server supports strikes me as a facility that will be rarely used if
    >> at all and bringing it along, keeping up the directory of supported
    >> methods, will take a load of work on our part that we'll do less than
    >> perfectly so should it ever be needed, it won't work because we let it
    >> go stale.
    Yeah, this won't be a common case. It'd (hopefully) be rare. The directory of methods would be the methods in the protocol interface on the server, which could be figured out via reflection (so the staleness issue shouldn't arise).
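A sketch of deriving the method directory by reflection, so it cannot drift from the code. FooProtocol and MethodDirectory are hypothetical; Class.getMethods() on an interface returns its public methods (including inherited interface methods) without the implicit Object methods. Only names are collected, so overloads collapse into one entry.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical protocol interface as the server would see it.
interface FooProtocol {
    String fooMethod(String request);

    String fooMethodImproved(String request);
}

final class MethodDirectory {
    private MethodDirectory() {}

    // Names only: overloaded methods collapse into a single entry.
    static Set<String> methodNames(Class<?> protocol) {
        Set<String> names = new TreeSet<>();
        for (Method m : protocol.getMethods()) {
            names.add(m.getName());
        }
        return names;
    }
}
```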
    >> What do you reckon?
    >>
    >> The above painted scenario too is a little on the exotic side. We can
    >> do something like Jimmy suggests in those rare cases we need to add a
    >> new method because there is insufficient wiggle-room w/i the
    >> particular PB method call (If we get into the issue Ted raises where
    >> we'd have to go back to the server twice because there is a third new
    >> method call, we're doing our API wrong).
    Agreed that the exception-handling hack can be used here. In general, having some solution around this might be really helpful *if* we get some API wrong (e.g., the API semantics have an indirect impact on memory) and we need to fix it without breaking compatibility. In HDFS, listFile proved to be a memory killer for extremely large directories, and people implemented an iterator version of it.
    >> The protocol needs a version though. We'll be still sending that
    >> 'hrpc' long in the header preamble? Should we add a version long
    >> after the 'hrpc' long?
    The version in "hrpc" is the RPC version (as opposed to the protocol version). I think that's orthogonal to this discussion.
    >> As to a directory of supported methods, do we need this in the
    >> protocol at all? Can't this be knowledge kept outside of the
    >> on-the-wire back and forth?
    >>
    >> St.Ack
    As I answered above, and as Todd also says, it probably makes sense to have a client-wide cache of protocol<->supported-methods, and to look up the cache when and if the client needs to decide between different versions of a method, or whether to pick a new method, based on the server it is talking to.
  • Michael Stack at Dec 27, 2012 at 8:05 pm
    So, picking up this thread again because I'm working on
    https://issues.apache.org/jira/browse/HBASE-6521 "Address the
    handling of multiple versions of a protocol". The original question was
    two-fold as I read it.

    1. Should we keep VersionedProtocol?
    2. How does a client figure if a server supports a particular capability?

    On question 1:

    VersionedProtocol [1] does two things. It returns the server version of
    the protocol and, separately, a "ProtocolSignature" Writable which allows
    you to get a 'hash' of the server's protocol method signatures. There is an
    implication that the server will give out different versions of the
    protocol depending on what version the client volunteers (not the case), and
    it is implied that the client does something with these method hash
    signatures. It doesn't.

    So VP is a Writable that returns Writables we don't make use of, implying
    functionality that was never realized.

    That's how I read it. Objections? [3]

    It sounds like at least ProtocolSignature can go. If we did want to go the
    route ProtocolSignature implies, we should probably do the native protobuf
    thing and make use of ServiceDescriptors, protobuf descriptions of what a
    protobuf Service exposes [2].

    That leaves VP's return of the server protocol version as all that
    remains 'useful'.

    But is it? Is version going to be useful going forward? If we lean on
    version, clients will have to keep a registry of versions to available
    methods. Or ask the server what it has and somehow sort through the return
    to figure what it can and cannot make sense of by method. Sounds like a
    bunch of work.

    At a minimum, VP will have to be protobuf'd so it is going to have to
    change. And we should probably add a bit more info to the return since we
    are going to the trouble of an RPC anyways.

    This serves as a lead in to question 2:

    Protobuf as is helps in the case where an ipc takes an extra parameter or
    adds extra info to the return; the majority of the evolutions that will be
    happening in the ipc interface. But what to do about the scenario Devaraj
    outlines at the head of the thread where we have shipped a method that
    causes the server to OOME in production or we add a method to the server
    that runs ten times faster than the old one? Or probably more likely, the
    server has a whole new 'feature' (as Todd calls it) orthogonal to the set
    the protocol version implies? How does the client figure the new feature
    is available?

    We could have the client try the invocation (as Jimmy suggests) and, if
    it fails, register the failure in a client-wide map so we avoid retrying on
    each invocation (we should just do this anyways). The client could go back
    to the server and do the above-suggested query of server capabilities and
    then adjust the call accordingly, or, since we are doing an ipc setup call
    anyways, we could have the server return the list of capabilities at that
    time. The client could cache what is available or not and just ask the
    server when convenient for it.
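A sketch of the client-wide failure registry keyed by server. The re-check interval is an added assumption, not something the thread proposes: during a rolling upgrade the same host:port may gain the method, so a remembered failure is allowed to expire. FailedCallRegistry and its parameters are hypothetical; the clock is injected so the logic can be exercised without sleeping.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Client-wide record of "server X rejected method M", so the probe is not repeated on every call.
final class FailedCallRegistry {
    private final Map<String, Long> failedAtMillis = new ConcurrentHashMap<>();
    private final LongSupplier clockMillis;
    // Assumption: forget a failure after this long, in case the server was upgraded in place.
    private final long recheckAfterMillis;

    FailedCallRegistry(LongSupplier clockMillis, long recheckAfterMillis) {
        this.clockMillis = clockMillis;
        this.recheckAfterMillis = recheckAfterMillis;
    }

    void recordFailure(String server, String method) {
        failedAtMillis.put(server + "#" + method, clockMillis.getAsLong());
    }

    // True if the client should attempt the method on this server.
    boolean worthTrying(String server, String method) {
        Long failedAt = failedAtMillis.get(server + "#" + method);
        return failedAt == null || clockMillis.getAsLong() - failedAt >= recheckAfterMillis;
    }
}
```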

    Using the bitmap shorthand describing what is available seems like it would
    be less work to do than implementing protobuf service
    description/interrogation and then dynamically composing method calls.

    Proposal:

    + Remove VersionedProtocol and ProtocolSignature
    + Instead of VP, add a new Interface called VersionedService or, probably
    better, ProtocolDescriptor, that all RPC Protocols implement. It has
    methods (getDescriptor) to return a pb Message that has the server version
    of the protocol and a bitmap of the features the server implements. This is
    the call we will make when we set up the ipc proxy. Clients can cache the
    result. Every time we change a Service/Protocol, we set a particular bit
    in the Service/Protocol bitmap. This new Interface might also return the
    long-form pb ServiceDescriptors (the pb getDescriptorForType from the Service
    Interface). It could be useful for debugging.
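A plain-Java sketch of the proposal. ProtocolDescriptor and getDescriptor are the names from the proposal; the Descriptor class stands in for the pb Message, and its fields, the per-server DescriptorCache, and the use of a long as the bitmap are assumptions, not a settled design.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Stand-in for the proposed pb Message: server protocol version plus feature bitmap.
final class Descriptor {
    final long protocolVersion;
    final long featureBits;

    Descriptor(long protocolVersion, long featureBits) {
        this.protocolVersion = protocolVersion;
        this.featureBits = featureBits;
    }

    boolean hasFeature(int bit) {
        return (featureBits & (1L << bit)) != 0;
    }
}

// The interface every RPC protocol would implement (name from the proposal, shape assumed).
interface ProtocolDescriptor {
    Descriptor getDescriptor();
}

// Client side: fetch once per server when the ipc proxy is set up, then answer from cache.
final class DescriptorCache {
    private final Map<String, Descriptor> byServer = new ConcurrentHashMap<>();

    Descriptor onProxySetup(String server, ProtocolDescriptor proxy) {
        return byServer.computeIfAbsent(server, s -> proxy.getDescriptor());
    }

    Descriptor cached(String server) {
        return byServer.get(server);
    }
}
```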

    What do you lot think?

    St.Ack

    1.
    http://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/VersionedProtocol.java?view=markup
    2.
    https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Service
    3. We have VP and PS because, as I understand it, we once thought we would
    support choosing between protocols and protocol versions, and that we'd
    support both protobufs and Writables. This is no longer wanted.
    On Fri, Aug 3, 2012 at 11:40 AM, Devaraj Das wrote:

    Responses inline..
    On Wed, Aug 1, 2012 at 11:04 AM, Todd Lipcon wrote:
    One possibility:

    During the IPC handshake, we could send the full version string /
    source checksum. Then, have a client-wide map which caches which
    methods have been found to be supported or not supported for an
    individual version. So, we don't need to maintain the mapping
    ourselves, but we also wouldn't need to do the full retry every time.
    Yeah this is what I was thinking as the alternate to the current approach
    of using VersionedProtocol.
    A different idea would be to introduce a call like
    "getServerCapabilities()" which returns a bitmap, and define a bit per
    time that we add a new feature.

    The advantage of these approaches vs a single increasing version
    number is that we sometimes want to backport a new IPC to an older
    version, but not backport all of the intervening IPCs. Having a bitmap
    allows us to "pick and choose" on backports without having to pull in
    a bunch of things we didn't necessarily want.
    Good point.
    On Wed, Aug 1, 2012 at 1:41 AM, Stack wrote:
    On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das wrote:
    Wondering whether we should retain the VersionedProtocol now that we
    have protobuf implementation for most (all?) of the protocols. I think we
    still need the version checks and do them when we need to. Take this case:
    1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
    2. Protocol Foo evolves over time, and the
    FooMethod(FooMethodRequest) now has a better implementation called
    FooMethod_improved(FooMethodRequest).
    3. HBase installations have happened with both the protocol
    implementations.
    4. Clients should be able to talk to both old and new servers (and
    invoke the newer implementation of FooMethod if the protocol implements it).
    (4) is possible when the getProtocolVersion is implemented by the
    protocol at the server. The client could check what the version of the
    protocol was (assuming VersionedProtocol semantics where the protocol
    version number is upgraded for such significant changes) and depending on
    that invoke the appropriate method...
    Having to map version-numbers of protocols to the methods-supported
    is probably arcane IMO but works..
    The other approach (that wouldn't require the version#) is to do
    something like - On the client side, get the protocol methods supported at
    the server (and cache it) and then look this map up whenever needed to
    decide which method to invoke.
    Any thoughts on whether we should invest time in the second approach
    yet?
    The VersionedProtocol w/ client being able to interrogate what methods
    a server supports strikes me as a facility that will be rarely used if
    at all and bringing it along, keeping up the directory of supported
    methods, will take a load of work on our part that we'll do less than
    perfectly so should it ever be needed, it won't work because we let it
    go stale.
    Yeah, this won't be a common case. It'd (hopefully) be rare. The directory
    of methods would be the methods in the protocol-interface at the server
    that could be figured by invoking reflection (and hence staleness issue
    shouldn't happen).
    What do you reckon?

    The above painted scenario too is a little on the exotic side. We can
    do something like Jimmy suggests in those rare cases we need to add a
    new method because there is insufficient wiggle-room w/i the
    particular PB method call (If we get into the issue Ted raises where
    we'd have to go back to the server twice because there is a third new
    method call, we're doing our API wrong).
    Agree that the exception handling hack can be played here.. In general,
    having some solution around this might be really helpful *if* we get some
    API wrong (for e.g., indirect implication on memory by the API semantics)
    and we need to fix it without breaking compatibility.. In HDFS, listFile
    proved to be a memory killer for extremely large directories and people
    implemented the iterator version of the same.
    The protocol needs a version though. We'll be still sending that
    'hrpc' long in the header preamble? Should we add a version long
    after the 'hrpc' long?
    The version in "hrpc" is the RPC version (as opposed to protocol version).
    I think that's orthogonal to this discussion..
    As to a directory of supported methods, do we need this in the
    protocol at all? Can't this be knowledge kept outside of the
    on-the-wire back and forth?

    St.Ack
    As I answered above, and as Todd also says, it probably makes sense to
    have a client-wide cache for protocol<->supported-methods .. and look up
    the cache when and if the client needs to decide between different versions
    of a method, or picking a new method, based on the server it is talking
    to...
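    A minimal sketch of the client-wide protocol<->supported-methods cache described above, assuming the server derives its method directory by reflecting over the protocol interface and the client fetches it once per server. `FooProtocol`, `MethodDirectory`, `SupportedMethodsCache`, and the server-address keys are hypothetical names for illustration, not HBase classes; the remote fetch is stubbed out as a `Supplier`.

```java
import java.lang.reflect.Method;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical protocol interface as deployed on an upgraded server.
interface FooProtocol {
  String fooMethod(String request);

  String fooMethodImproved(String request); // added in a later release
}

// Server side: derive the directory from the interface itself, so it
// cannot drift out of sync with what the server actually implements.
final class MethodDirectory {
  private MethodDirectory() {}

  static Set<String> of(Class<?> protocol) {
    Set<String> names = new TreeSet<>();
    for (Method m : protocol.getMethods()) {
      names.add(m.getName());
    }
    return names;
  }
}

// Client side: one directory per server, fetched once and shared client-wide.
final class SupportedMethodsCache {
  private final Map<String, Set<String>> byServer = new ConcurrentHashMap<>();

  Set<String> directoryFor(String server, Supplier<Set<String>> fetchFromServer) {
    // computeIfAbsent runs the (remote) fetch only on the first lookup per server.
    return byServer.computeIfAbsent(server, s -> fetchFromServer.get());
  }

  // Prefer the improved method when the server advertises it.
  String chooseFooMethod(String server, Supplier<Set<String>> fetchFromServer) {
    return directoryFor(server, fetchFromServer).contains("fooMethodImproved")
        ? "fooMethodImproved"
        : "fooMethod";
  }
}
```

    An old server would simply advertise a directory without `fooMethodImproved`, and the same lookup would then pick `fooMethod`.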
  • Jimmy Xiang at Dec 27, 2012 at 9:13 pm
    +1 for removing VersionedProtocol and ProtocolSignature
    +0 for VersionedService/ProtocolDescriptor

    If we do have VersionedService/ProtocolDescriptor, it will most likely be
    used in some mixed environment (most likely, new client and mixed versions
    of HBase servers, since an old client doesn't know about any new feature,
    and an old client doesn't assume an existing feature will be gone in the
    future either).

    With PB, I think we are going to support a rolling-upgrade path. That
    means some mixed versions of HBase servers can be compatible. For
    enterprise, I think it is not that hard to maintain compatible HBase
    clusters. So I don't think it is absolutely needed.

    Thanks,
    Jimmy
    On Thu, Dec 27, 2012 at 12:05 PM, Stack wrote:

    So, picking up this thread again because I'm working on
    https://issues.apache.org/jira/browse/HBASE-6521 "Address the handling of
    multiple versions of a protocol", the original question was two-fold as I
    read it.

    1. Should we keep VersionedProtocol.
    2. How does a client figure if a server supports a particular capability

    On question 1:

    VersionedProtocol [1] does two things. It returns the server version of
    the protocol and, separately, a "ProtocolSignature" Writable which allows
    you to get a 'hash' of the server's protocol method signatures. There is an
    implication that the server will give out different versions of the
    protocol depending on what version the client volunteers (not the case),
    and it is implied that the client does something with these method hash
    signatures. It doesn't.

    So, VP is a Writable that returns Writables we don't make use of,
    implying functionality that was never realized.

    That's how I read it. Objections? [3]

    It sounds like at least ProtocolSignature can go. If we did want to go the
    route ProtocolSignature implies, we should probably do the native protobuf
    thing and make use of ServiceDescriptors, protobuf descriptions of what a
    protobuf Service exposes [2].

    That leaves VP's return of the server protocol version as all that
    remains 'useful'.

    But is it? Is version going to be useful going forward? If we lean on
    version, clients will have to keep a registry of versions to available
    methods, or ask the server what it has and somehow sort through the return
    to figure out what it can and cannot make sense of by method. Sounds like a
    bunch of work.

    At a minimum, VP will have to be protobuf'd so it is going to have to
    change. And we should probably add a bit more info to the return since we
    are going to the trouble of an RPC anyways.

    This serves as a lead in to question 2:

    Protobuf as is helps in the case where an ipc takes an extra parameter or
    adds extra info to the return; the majority of the evolutions that will be
    happening in the ipc interface. But what to do about the scenario Devaraj
    outlines at the head of the thread where we have shipped a method that
    causes the server to OOME in production or we add a method to the server
    that runs ten times faster than the old one? Or probably more likely, the
    server has a whole new 'feature' (as Todd calls it) orthogonal to the set
    the protocol version implies? How does the client figure the new feature
    is available?

    We could have the client try the invocation -- as Jimmy suggests -- and if
    it fails, register the failure in a client-wide map so we avoid retrying on
    each invocation (we should just do this anyways). The client could go back
    to the server and do the above suggested query of server capabilities and
    then adjust the call accordingly, or, since we are doing an ipc setup call
    anyways, we could have the server return the list of capabilities at that
    time. The client could cache what is available or not and just ask the
    server when convenient for it.
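    The try-then-fall-back idea, with the miss recorded client-wide so later invocations skip the doomed round trip, might look roughly like this. It assumes an old server signals a missing method with some distinguishable exception; `UnknownMethodException` and `FallbackInvoker` are hypothetical stand-ins, not HBase's actual RPC classes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical exception an old server's RPC layer raises for a method it lacks.
class UnknownMethodException extends RuntimeException {
  UnknownMethodException(String method) {
    super("Unknown method: " + method);
  }
}

// Hypothetical client-wide record of (server, method) pairs known to be missing.
final class FallbackInvoker {
  private final Set<String> unsupported = ConcurrentHashMap.newKeySet();

  // Tries newCall unless this server already rejected newMethod; on
  // UnknownMethodException, records the miss and runs oldCall instead.
  <T> T invoke(String server, String newMethod, Supplier<T> newCall, Supplier<T> oldCall) {
    String key = server + "#" + newMethod;
    if (unsupported.contains(key)) {
      return oldCall.get(); // known miss: skip the wasted round trip
    }
    try {
      return newCall.get();
    } catch (UnknownMethodException e) {
      unsupported.add(key); // remember so later calls go straight to oldCall
      return oldCall.get();
    }
  }
}
```

    The cost is one wasted round trip per (server, method) pair; in exchange, nobody has to maintain a version-to-methods table.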

    Using the bitmap shorthand describing what is available seems like it would
    be less work to do than implementing protobuf service
    description/interrogation and then dynamically composing method calls.

    Proposal:

    + Remove VersionedProtocol and ProtocolSignature
    + Instead of VP, add a new Interface called VersionedService or, probably
    better, ProtocolDescriptor, that all RPC Protocols implement. It has
    methods (getDescriptor) to return a pb Message that has the server version
    of the protocol and a bitmap of features the server implements. This is
    the call we will make when we set up the ipc proxy. Clients can cache the
    result. Every time we change a Service/Protocol, we set a particular bit
    in the Service/Protocol bitmap. This new Interface might also return the
    long form pb ServiceDescriptors (the pb getDescriptorForType from Service
    Interface). It could be useful for debugging.
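    A rough sketch of the proposed ProtocolDescriptor shape, with `java.util.BitSet` standing in for the pb Message and a plain map standing in for the client's cache. The `Descriptor` class, its fields, and `DescriptorCache` are assumptions made for illustration; the proposal does not pin down the message layout.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Stand-in for the pb Message: server protocol version plus feature bitmap.
final class Descriptor {
  private final long protocolVersion;
  private final BitSet features;

  Descriptor(long protocolVersion, BitSet features) {
    this.protocolVersion = protocolVersion;
    this.features = (BitSet) features.clone(); // defensive copy
  }

  long protocolVersion() {
    return protocolVersion;
  }

  // True if the server set the bit for this feature.
  boolean supports(int featureBit) {
    return features.get(featureBit);
  }
}

// Hypothetical interface every RPC protocol would implement.
interface ProtocolDescriptor {
  Descriptor getDescriptor();
}

// Hypothetical client-side cache, filled when the ipc proxy is set up.
final class DescriptorCache {
  private final Map<String, Descriptor> byServer = new ConcurrentHashMap<>();

  // One getDescriptor() call per server; later lookups are served from the map.
  Descriptor onProxySetup(String server, Function<String, ProtocolDescriptor> connect) {
    return byServer.computeIfAbsent(server, s -> connect.apply(s).getDescriptor());
  }
}
```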

    What you lot think?

    St.Ack

    1. http://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/VersionedProtocol.java?view=markup
    2. https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Service
    3. We have VP and PS because, as I understand it, we once thought we would
    support choosing between protocols and protocol versions and that we'd
    support both protobufs and Writables. This is no longer wanted.

  • Enis Söztutar at Dec 28, 2012 at 1:38 am
    I think what Devaraj describes is a valid use case, and I am sure we will
    need it a few times. However, I suspect each of these might be unique, and
    we will have to handle backwards-forwards compat from the client
    differently in each case (imagine META moving to zk after 0.96). So we
    cannot easily generalize, and we may still have to drop support for
    features gradually.

    If we still keep the version, do we bump it every time a parameter is added
    to a method, or only when a new method is added? It does not sound very
    maintainable.

    Not knowing much about the recent changes, why don't we go full PB, and
    define actual rpc methods as services? (as in
    https://developers.google.com/protocol-buffers/docs/proto#services)

  • Michael Stack at Dec 28, 2012 at 3:12 am

    On Thu, Dec 27, 2012 at 5:37 PM, Enis Söztutar wrote:

    I think what Devaraj describes is a valid use case, and I am sure we will
    need it a few times. However, I suspect each of these might be unique, and
    we have to deal with how to handle backwards-forwards compat from the
    client differently (imagine META moving to zk, after 0.96). So we cannot
    easily generalize, and we may still have to drop support for features
    gradually.
    I agree. Just trying to make sure we have some facility in place to help
    us over some of the humps.

    If we still keep the version, do we bump it every time a parameter is added
    to a method, or only when a new method is added? It does not sound very
    maintainable.
    Version alone won't work.

    The 0.94 branch might be version 100.

    The 0.96 branch might be 105.

    If we want to backport the method that cuts CO2 emissions by 25% but only
    this method, what version do we give 0.94's protocol? We could make it 101
    but maybe 0.96.3 was 101? We could give it a version that has not been
    seen before but then it gets a little awkward to manage and understand.
    Regardless, client would have to keep a dictionary of methods per version
    number, a pain.

    The suggestion above was that the server gives off a list of features
    written in shorthand, a bitmap, where bits are set when a feature is added.
    This way a client can look at the bitmap and see if the CO2 saving feature
    is available in the 0.94 server and if so, use that method.
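    The backport problem can be made concrete. Suppose the 0.94 backport is labelled 101 and 0.96.3 was also 101, as in the example above; a version-only client check then credits the 0.94 server with a 0.96-only feature it lacks, while a per-feature bit answers exactly the question asked. The bit assignments and class names below are hypothetical.

```java
import java.util.BitSet;

final class BackportExample {
  static final int CO2_SAVING = 0;     // hypothetical bit: the backported method
  static final int OTHER_096_ONLY = 1; // hypothetical bit: a 0.96 feature NOT backported

  // A server as the client sees it: a protocol version plus a feature bitmap.
  static final class ServerInfo {
    final long version;
    final BitSet features = new BitSet();

    ServerInfo(long version, int... featureBits) {
      this.version = version;
      for (int bit : featureBits) {
        features.set(bit);
      }
    }
  }

  // Version-only reasoning: "0.96.3 was 101 and had the 0.96-only feature".
  static boolean otherFeatureByVersion(ServerInfo s) {
    return s.version >= 101;
  }

  // Bitmap reasoning: ask about exactly the feature we want.
  static boolean hasFeature(ServerInfo s, int bit) {
    return s.features.get(bit);
  }
}
```

    Here `otherFeatureByVersion(v094)` returns true although the backport server lacks that feature, while `hasFeature(v094, OTHER_096_ONLY)` correctly returns false.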


    Not knowing much about the recent changes, why don't we go full PB, and
    define actual rpc methods as services? (as in
    https://developers.google.com/protocol-buffers/docs/proto#services)
    I thought about it. It has some nice facility that comes for free. For
    example, you can get an aforementioned pb'd description of the "protocol"
    and actually use the return to compose an invocation against the server.
    Nice. Our 'protocols' actually already implement Service.Interface from
    pb (actually Service.BlockingInterface). I'm not sure why, as it looks to
    complicate things, going by a quick examination today (I started stripping
    it out to see what would break). So it would not take too much to get a
    Stub on the client side and have servers implement the Service. We could
    try shoehorning our RPC so it implemented the necessary RpcController,
    etc. Interfaces.

    But it would seem Service has been deprecated for a good while now [1], and
    folks are encouraged to do otherwise because, as is, the generated code
    makes for too much "indirection" [1].

    I could try playing around some more w/ using Service to learn more about
    this 'indirection'. We could use the long-hand service descriptor in place
    of the above suggested bitmap figuring what the server provides.

    St.Ack

    1.
    https://developers.google.com/protocol-buffers/docs/reference/java-generated#service
  • Devaraj Das at Dec 28, 2012 at 9:32 am
    Now thinking more about it, if a server implements a method more
    efficiently, we probably could have new fields in the method argument to
    indicate the client is willing to accept the new semantics. A new server
    could detect that (by checking for existence of such a field), and an old
    server would simply ignore that field. The new server could do a different
    processing of the request, and the response, although the same message
    type, might have new fields to capture the response under the new semantics.
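    The opt-in field idea relies on protobuf parsers skipping field numbers they do not know. The sketch below simulates that behaviour with a plain field-number-to-value map rather than generated protobuf classes, so it runs without the protobuf library; `OptInFieldExample` and the field numbers are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation only: a message modeled as protobuf-style field numbers -> values.
// The old server reads only the field numbers it was built with, mimicking how
// a protobuf parser skips fields it does not know.
final class OptInFieldExample {
  static final int FIELD_ROW = 1;                  // present since the first release
  static final int FIELD_ACCEPT_NEW_SEMANTICS = 2; // hypothetical opt-in field added later

  // Old server: knows only field 1, so the opt-in field is silently ignored.
  static Map<Integer, Object> oldServer(Map<Integer, Object> request) {
    Map<Integer, Object> response = new HashMap<>();
    response.put(1, "old:" + request.get(FIELD_ROW));
    return response;
  }

  // New server: switches behaviour only if the client set the opt-in field.
  static Map<Integer, Object> newServer(Map<Integer, Object> request) {
    Map<Integer, Object> response = new HashMap<>();
    boolean optIn = Boolean.TRUE.equals(request.get(FIELD_ACCEPT_NEW_SEMANTICS));
    if (optIn) {
      response.put(1, "new:" + request.get(FIELD_ROW));
      response.put(2, "extra info only new clients read"); // new response field
    } else {
      response.put(1, "old:" + request.get(FIELD_ROW));
    }
    return response;
  }
}
```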

    Over time, the method code might evolve, and might become unmaintainable
    ... that's the worry. It might make sense to just break up the method into
    multiple implementations..

    I am +1 for getting a PB'ed description of the protocol, the client caching
    it, and then deciding which method to invoke based on what's supported in
    the server. This will also address the orthogonal case of the server
    letting the client know all its capabilities.

    Thoughts?

  • Michael Stack at Dec 28, 2012 at 4:59 pm

    On Fri, Dec 28, 2012 at 1:31 AM, Devaraj Das wrote:

    Now thinking more about it, if a server implements a method more
    efficiently, we probably could have new fields in the method argument to
    indicate the client is willing to accept the new semantics. A new server
    could detect that (by checking for existence of such a field), and an old
    server would simply ignore that field. The new server could do a different
    processing of the request, and the response, although the same message
    type, might have new fields to capture the response under the new
    semantics.

    Over time, the method code might evolve, and might become unmaintainable
    ... that's the worry. It might make sense to just break up the method into
    multiple implementations..
    Yes. Protobufs gives us wiggle-room.


    I am +1 for getting a PB'ed description of the protocol, the client caching
    it, and then deciding which method to invoke based on what's supported in
    the server. This will also address the orthogonal case of the server
    letting the client know all its capabilities.

    This is how a client would learn of completely new functionality that has
    been added to the server?

    On client setup of the proxy, as the first request, instead of asking the
    server for the version of the protocol it is serving, the client could ask
    the server for the pb'd description of the protocol [1] and look at this
    to see if the server supported new functionality?

    The returned descriptor would be much fatter than a bitmap.

    St.Ack

    1.
    https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Descriptors.ServiceDescriptor
  • Devaraj Das at Dec 28, 2012 at 5:22 pm

    On Fri, Dec 28, 2012 at 8:59 AM, Stack wrote:
    The returned descriptor would be much fatter than a bitmap.
    Bitmap is fine as well if the PB'ed representation is too verbose.
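
    For comparison, a sketch of the bitmap alternative using java.util.BitSet. The feature numbering (Features) and the CapabilityBitmap helper are hypothetical. The point is size: one long covers 64 features, far less than a full descriptor, at the cost of both sides having to agree on what each bit means.

```java
import java.util.Arrays;
import java.util.BitSet;

class BitmapDemo {
    public static void main(String[] args) {
        long[] fromOld = CapabilityBitmap.advertise(Features.FOO_METHOD);
        long[] fromNew = CapabilityBitmap.advertise(Features.FOO_METHOD, Features.FOO_METHOD_IMPROVED);
        System.out.println(Arrays.toString(fromNew)); // [3]: one long covers the first 64 features
        System.out.println(CapabilityBitmap.supports(fromOld, Features.FOO_METHOD_IMPROVED)); // false
        System.out.println(CapabilityBitmap.supports(fromNew, Features.FOO_METHOD_IMPROVED)); // true
    }
}

// Hypothetical feature numbering: each bit position is assigned once and never reused.
final class Features {
    static final int FOO_METHOD = 0;
    static final int FOO_METHOD_IMPROVED = 1;
    private Features() {}
}

final class CapabilityBitmap {
    // Server side: set one bit per implemented feature and ship the words as longs.
    static long[] advertise(int... features) {
        BitSet bits = new BitSet();
        for (int f : features) {
            bits.set(f);
        }
        return bits.toLongArray();
    }

    // Client side: bits beyond what this client knows about are simply never queried,
    // so an older client is unaffected by features added later.
    static boolean supports(long[] wire, int feature) {
        return BitSet.valueOf(wire).get(feature);
    }
}
```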
  • Michael Stack at Jan 2, 2013 at 12:44 am

    On Thu, Dec 27, 2012 at 7:11 PM, Stack wrote:

    Not knowing much about the recent changes, why don't we go full PB, and
    define actual rpc methods as services? (as in
    https://developers.google.com/protocol-buffers/docs/proto#services)
    I thought about it. It has some nice facilities that come for free. For
    example, you can get the aforementioned pb'd description of the "protocol"
    and actually use the return to compose an invocation against the server.
    Nice. Our 'protocols' actually already implement Service.Interface from
    pb (actually Service.BlockingInterface). I'm not sure why, since going by a
    quick examination today it looks to complicate things (I started stripping
    it out to see what would break). So it would not take too much to get a
    Stub on the client side and have servers implement the Service. We could
    try shoehorning our RPC so it implemented the necessary RpcController,
    etc. interfaces.

    But it would seem Service has been deprecated for a good while now [1], and
    folks are encouraged to do otherwise because, as is, the generated code
    makes for too much "indirection" [1].

    I could try playing around some more w/ using Service to learn more about
    this 'indirection'. We could use the long-hand service descriptor in place
    of the bitmap suggested above for figuring out what the server provides.
    I experimented with hooking up protobuf Service to our RPC. I put up a patch
    over on https://issues.apache.org/jira/browse/HBASE-6521 along w/ some
    notes made while messing around.

    The main 'pro' is that our rpc would get a much-needed spring cleaning.
    The main 'con' is that we would be changing code (smile). The main TODO is
    making sure there is no performance degradation (there should be none
    server-side; need to make sure the same is true client-side).

    This experiment has made me change my opinion regarding 'versioning'. Above
    I suggested we remove VersionedProtocol and add in its place a protobuf
    ProtocolDescriptor that would have a 'version' as well as a short- and
    long-form description of server 'features'. Now I think we should just punt
    on version/descriptors altogether. Let's just go the route where a method
    is supported or not. That methods take a protobuf request and return a
    protobuf response, as has been said already, gives us some wriggle room to
    evolve methods as time goes by. For protocol migrations that require more
    than this 'vocabulary', let's deal w/ them on a case-by-case basis (as per
    Enis above).
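
    A sketch of the "a method is supported or not" route, assuming the server answers an unknown method name with a distinguishable error. MethodTable, UnknownMethodException and FallbackClient are hypothetical names. With no version to consult, the client tries FooMethod_improved first, falls back to FooMethod when the server reports it unknown, and remembers the outcome so it pays for the failed call only once per server.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class FallbackDemo {
    public static void main(String[] args) {
        MethodTable oldServer = new MethodTable()
                .register("FooMethod", r -> "old:" + r);
        MethodTable newServer = new MethodTable()
                .register("FooMethod", r -> "old:" + r)
                .register("FooMethod_improved", r -> "new:" + r);
        System.out.println(new FallbackClient(oldServer).foo("x")); // old:x
        System.out.println(new FallbackClient(newServer).foo("x")); // new:x
    }
}

// Hypothetical error a server reports for a method it does not implement.
final class UnknownMethodException extends RuntimeException {
    UnknownMethodException(String method) {
        super("Unknown method: " + method);
    }
}

// Server side: a method is either registered or it is not; there is no version number.
final class MethodTable {
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    MethodTable register(String name, Function<String, String> handler) {
        handlers.put(name, handler);
        return this;
    }

    String call(String name, String request) {
        Function<String, String> handler = handlers.get(name);
        if (handler == null) {
            throw new UnknownMethodException(name);
        }
        return handler.apply(request);
    }
}

// Client side: try the newer method, fall back if the server lacks it,
// and remember the answer so later calls skip the failed attempt.
final class FallbackClient {
    private final MethodTable server;
    private Boolean improvedSupported; // null until the first call settles it
    int improvedAttempts = 0; // for illustration: how often the newer method was tried

    FallbackClient(MethodTable server) {
        this.server = server;
    }

    String foo(String request) {
        if (!Boolean.FALSE.equals(improvedSupported)) {
            improvedAttempts++;
            try {
                String result = server.call("FooMethod_improved", request);
                improvedSupported = Boolean.TRUE;
                return result;
            } catch (UnknownMethodException e) {
                improvedSupported = Boolean.FALSE;
            }
        }
        return server.call("FooMethod", request);
    }
}
```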

    St.Ack
  • Elliott Clark at Jan 2, 2013 at 10:35 pm
    Removing the versioning altogether seems good. That leads to much less
    coupling between the client and the server.

    I would vote to use BlockingInterface (to replace our versioned protocol
    class) everywhere and just write our own rpc/ipc. Stack walked me through
    some of the code that is needed for using all of the Protobuf Service and
    Protobuf Blocking Channels; that route seems to have lots of its own
    cruft. So if we're going to have a cleanup, we shouldn't start out with
    something we know will end up crufty.

    Additionally we should move the exception responses into either the header
    or the body. As it currently stands having to conditionally cast the next
    message into either a response or an error just seems like we're
    re-implementing protobuf's optional.
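
    A sketch of carrying the exception in the response header, assuming a header with optional exception fields. ResponseHeader, Response, RemoteCallException and ResponseReader are hypothetical names, and null fields stand in for unset protobuf optionals. The client always reads the same header type and branches on one field, rather than conditionally casting the next message to either a response or an error.

```java
class ResponseDemo {
    public static void main(String[] args) {
        Response ok = new Response(new ResponseHeader(1, null, null), "value");
        Response failed = new Response(
                new ResponseHeader(2, "java.io.IOException", "region offline"), null);
        try {
            System.out.println(ResponseReader.unwrap(ok)); // value
            ResponseReader.unwrap(failed);
        } catch (RemoteCallException e) {
            System.out.println(e.getMessage()); // java.io.IOException: region offline
        }
    }
}

// Hypothetical response header; a null field stands in for an unset protobuf optional.
final class ResponseHeader {
    final int callId;
    final String exceptionClassName; // set only when the call failed
    final String exceptionMessage;

    ResponseHeader(int callId, String exceptionClassName, String exceptionMessage) {
        this.callId = callId;
        this.exceptionClassName = exceptionClassName;
        this.exceptionMessage = exceptionMessage;
    }

    boolean hasException() {
        return exceptionClassName != null;
    }
}

// Every reply has the same shape: a header, plus a body only on success.
final class Response {
    final ResponseHeader header;
    final String body; // null when the header carries an exception

    Response(ResponseHeader header, String body) {
        this.header = header;
        this.body = body;
    }
}

final class RemoteCallException extends Exception {
    RemoteCallException(String className, String message) {
        super(className + ": " + message);
    }
}

final class ResponseReader {
    // Parse the header first and branch on one optional field,
    // instead of guessing whether the next message is a result or an error.
    static String unwrap(Response response) throws RemoteCallException {
        if (response.header.hasException()) {
            throw new RemoteCallException(response.header.exceptionClassName,
                    response.header.exceptionMessage);
        }
        return response.body;
    }
}
```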



  • Michael Stack at Jan 7, 2013 at 9:15 pm

    On Wed, Jan 2, 2013 at 2:34 PM, Elliott Clark wrote:

    Removing the versioning altogether seems good. That leads to much less
    coupling between the client and the server.

    I would vote to use BlockingInterface (to replace our versioned protocol
    class) everywhere and just write our own rpc/ipc. Stack walked me through
    some of the code that is needed for using all of the Protobuf Service and
    Protobuf Blocking Channels; that route seems to have lots of its own
    cruft. So if we're going to have a cleanup, we shouldn't start out with
    something we know will end up crufty.
    Let me try doing the above (removing versioning and not going the pb
    Service route). We can't use BlockingInterface to replace
    VersionedProtocol, though; the BIs do not have a common ancestor. Let me
    play around... I'll be back.

    Additionally we should move the exception responses into either the header
    or the body. As it currently stands having to conditionally cast the next
    message into either a response or an error just seems like we're
    re-implementing protobuf's optional.
    I think this is a good idea. Will try this too.

    Thanks E,
    St.Ack

Discussion Overview
group: dev @
categories: hbase, hadoop
posted: Jul 31, '12 at 12:47a
active: Jan 7, '13 at 9:15p
posts: 18
users: 8
website: hbase.apache.org


site design / logo © 2022 Grokbase