So, picking up this thread again because I'm working on
https://issues.apache.org/jira/browse/HBASE-6521 "Address the handling of
multiple versions of a protocol". The original question was two-fold as I
read it.
1. Should we keep VersionedProtocol?
2. How does a client figure if a server supports a particular capability?
On question 1:
VersionedProtocol [1] does two things. It returns the server version of
the protocol and separately, a "ProtocolSignature" Writable which allows
you to get a 'hash' of the server's protocol method signatures. There is an
implication that the server will give out different versions of the
protocol dependent on what version the client volunteers (not the case) and
it is implied that the client does something with these method hash
signatures. It doesn't.
So, VP is a Writable that returns Writables we don't make use of, implying
a functionality that is unrealized.
That's how I read it. Objections? [3]
It sounds like at least ProtocolSignature can go. If we did want to go the
route ProtocolSignature implies, we should probably do the native protobuf
thing and make use of ServiceDescriptors, protobuf descriptions of what a
protobuf Service exposes [2].
That leaves the VP's return of the server protocol version as all that
remains 'useful'.
But is it? Is version going to be useful going forward? If we lean on
version, clients will have to keep a registry of versions to available
methods. Or ask the server what it has and somehow sort through the return
to figure what it can and cannot make sense of by method. Sounds like a
bunch of work.
At a minimum, VP will have to be protobuf'd so it is going to have to
change. And we should probably add a bit more info to the return since we
are going to the trouble of an RPC anyways.
This serves as a lead in to question 2:
Protobuf as is helps in the case where an ipc takes an extra parameter or
adds extra info to the return; the majority of the evolutions that will be
happening in the ipc interface. But what to do about the scenario Devaraj
outlines at the head of the thread where we have shipped a method that
causes the server to OOME in production or we add a method to the server
that runs ten times faster than the old one? Or probably more likely, the
server has a whole new 'feature' (as Todd calls it) orthogonal to the set
the protocol version implies? How does the client figure the new feature
is available?
We could have the client try the invocation -- as Jimmy suggests -- and if
it fails, register the fail in a client-wide map so we avoid retrying on
each invocation (We should just do this anyways). The client could go back
to the server and do the above suggested query of server capabilities and
then adjust the call accordingly or since we are doing an ipc setup call
anyways, we could have the server return the list of capabilities at this
time. The client could cache what is available or not and just ask the
server when convenient for it.
Using the bitmap shorthand describing what is available seems like it would
be less work to do than implementing protobuf service
description/interrogation and then dynamically composing method calls.
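As a rough illustration of the try-it-and-remember approach described above, here is a minimal Java sketch. Everything in it is hypothetical: UnsupportedMethodException, UnsupportedMethodCache, and the "server/method" key format are made up for the example and are not existing HBase classes. It assumes the client can tell an "unknown method" failure apart from other errors.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

/** Hypothetical marker for "the server does not know this method". */
class UnsupportedMethodException extends RuntimeException {
  UnsupportedMethodException(String method) {
    super("Server does not support " + method);
  }
}

/** Client-wide record of methods that a given server has already rejected. */
class UnsupportedMethodCache {
  // Keys are "server/method"; a concurrent set so all client threads share it.
  private final Set<String> unsupported = ConcurrentHashMap.newKeySet();

  <T> T callWithFallback(String server, String method,
      Supplier<T> preferred, Supplier<T> fallback) {
    String key = server + "/" + method;
    if (unsupported.contains(key)) {
      return fallback.get();      // known to fail: skip the wasted round trip
    }
    try {
      return preferred.get();
    } catch (UnsupportedMethodException e) {
      unsupported.add(key);       // remember the failure for later calls
      return fallback.get();
    }
  }

  boolean isKnownUnsupported(String server, String method) {
    return unsupported.contains(server + "/" + method);
  }
}
```

The new method is attempted at most once per server; every later call goes straight to the old method. The entry would need clearing if the server is upgraded, which this sketch does not handle.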
Proposal:
+ Remove VersionedProtocol and ProtocolSignature
+ Instead of VP, add a new Interface called VersionedService or probably
better, ProtocolDescriptor, that all RPC Protocols implement. It has
methods (getDescriptor) to return a pb Message that has the server version
of the protocol and a bitmap of features the server implements. This is
the call we will make when we set up the ipc proxy. Clients can cache the
result. Every time we change a Service/Protocol, we set a particular bit
in the Service/Protocol bitmap. This new Interface might also return the
long form pb ServiceDescriptors (the pb getDescriptorForType from Service
Interface). It could be useful debugging.
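To make the proposal a little more concrete, here is a sketch of what the descriptor might carry. It is plain Java standing in for the pb Message; the class name ProtocolDescriptorSketch, the Feature enum, and its bit positions are all invented for illustration and are not part of any existing API.

```java
import java.util.BitSet;

/** Hypothetical feature bits; names and positions are made up for illustration. */
enum Feature {
  ITERATING_LIST(0),
  FOO_METHOD_IMPROVED(1),
  BULK_DELETE(2);

  final int bit;
  Feature(int bit) { this.bit = bit; }
}

/** Plain-Java stand-in for the pb Message that getDescriptor might return. */
final class ProtocolDescriptorSketch {
  private final long version;
  private final BitSet features = new BitSet();

  ProtocolDescriptorSketch(long version, Feature... supported) {
    this.version = version;
    for (Feature f : supported) {
      features.set(f.bit);
    }
  }

  long getVersion() { return version; }

  boolean supports(Feature f) { return features.get(f.bit); }

  /** Compact form of the bitmap, e.g. for a pb bytes field. */
  byte[] toWireBytes() { return features.toByteArray(); }

  /** Rebuilds the descriptor on the client from the version and bitmap bytes. */
  static ProtocolDescriptorSketch fromWire(long version, byte[] bits) {
    ProtocolDescriptorSketch d = new ProtocolDescriptorSketch(version);
    d.features.or(BitSet.valueOf(bits));
    return d;
  }
}
```

A client would fetch this once at proxy setup, cache it, and check supports(...) before picking a method. Because each feature has its own bit, a server can advertise BULK_DELETE without FOO_METHOD_IMPROVED, which a single increasing version number cannot express.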
What you lot think?
St.Ack
1. http://svn.apache.org/viewvc/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/ipc/VersionedProtocol.java?view=markup
2. https://developers.google.com/protocol-buffers/docs/reference/java/com/google/protobuf/Service
3. We have VP and PS because, as I understand it, we once thought we would
support choosing between protocols and protocol versions and that we'd
support both protobufs and Writables. This is no longer wanted.
On Fri, Aug 3, 2012 at 11:40 AM, Devaraj Das wrote:
Responses inline..
On Wed, Aug 1, 2012 at 11:04 AM, Todd Lipcon wrote:
One possibility:
During the IPC handshake, we could send the full version string /
source checksum. Then, have a client-wide map which caches which
methods have been found to be supported or not supported for an
individual version. So, we don't need to maintain the mapping
ourselves, but we also wouldn't need to do the full retry every time.
Yeah this is what I was thinking as the alternate to the current approach
of using VersionedProtocol.
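A sketch of the version-keyed cache idea, under the assumption that the handshake hands the client a version string for the server. VersionMethodCache and the probe callback are hypothetical names for this example only; the probe stands in for whatever the client does to find out whether a call works.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/** Caches, per server version string, whether each method turned out to be supported. */
class VersionMethodCache {
  // version string -> (method name -> supported?)
  private final Map<String, Map<String, Boolean>> byVersion = new ConcurrentHashMap<>();

  /**
   * Returns the cached answer for (version, method), running the probe only
   * the first time that pair is seen. The probe is a hypothetical callback
   * that attempts the call and reports whether it succeeded.
   */
  boolean isSupported(String serverVersion, String method, Predicate<String> probe) {
    return byVersion
        .computeIfAbsent(serverVersion, v -> new ConcurrentHashMap<>())
        .computeIfAbsent(method, probe::test);
  }
}
```

Keying on the version string rather than the server address means one probe answers the question for every server running the same build, and nobody has to maintain the version-to-methods mapping by hand.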
A different idea would be to introduce a call like
"getServerCapabilities()" which returns a bitmap, and define a bit each
time that we add a new feature.
The advantage of these approaches vs a single increasing version
number is that we sometimes want to backport a new IPC to an older
version, but not backport all of the intervening IPCs. Having a bitmap
allows us to "pick and choose" on backports without having to pull in
a bunch of things we didn't necessarily want.
Good point.
On Wed, Aug 1, 2012 at 1:41 AM, Stack wrote:
On Tue, Jul 31, 2012 at 1:47 AM, Devaraj Das wrote:
Wondering whether we should retain the VersionedProtocol now that we
have protobuf implementation for most (all?) of the protocols. I think we
still need the version checks and do them when we need to. Take this case:
1. Protocol Foo has as one of the methods FooMethod(FooMethodRequest).
2. Protocol Foo evolves over time, and the
FooMethod(FooMethodRequest) now has a better implementation called
FooMethod_improved(FooMethodRequest).
3. HBase installations have happened with both the protocol
implementations.
4. Clients should be able to talk to both old and new servers (and
invoke the newer implementation of FooMethod if the protocol implements it).
(4) is possible when the getProtocolVersion is implemented by the
protocol at the server. The client could check what the version of the
protocol was (assuming VersionedProtocol semantics where the protocol
version number is upgraded for such significant changes) and depending on
that invoke the appropriate method...
Having to map version-numbers of protocols to the methods-supported
is probably arcane IMO but works..
The other approach (that wouldn't require the version#) is to do
something like - On the client side, get the protocol methods supported at
the server (and cache it) and then look this map up whenever needed to
decide which method to invoke.
Any thoughts on whether we should invest time in the second approach
yet?
The VersionedProtocol w/ client being able to interrogate what methods
a server supports strikes me as a facility that will be rarely used if
at all and bringing it along, keeping up the directory of supported
methods, will take a load of work on our part that we'll do less than
perfectly so should it ever be needed, it won't work because we let it
go stale.
Yeah, this won't be a common case. It'd (hopefully) be rare. The directory
of methods would be the methods in the protocol-interface at the server
that could be figured by invoking reflection (and hence staleness issue
shouldn't happen).
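For what it's worth, deriving the directory by reflection might look roughly like the following. FooProtocol is a hypothetical interface modelled on the Foo example earlier in the thread, and ProtocolMethods is an invented helper; only java.lang.reflect is real here.

```java
import java.lang.reflect.Method;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical protocol interface, standing in for a real RPC protocol. */
interface FooProtocol {
  String fooMethod(String request);
  String fooMethodImproved(String request);
}

class ProtocolMethods {
  /** Returns the sorted names of the public methods of the protocol interface. */
  static Set<String> methodNames(Class<?> protocol) {
    Set<String> names = new TreeSet<>();
    // For an interface, getMethods() covers declared and inherited
    // interface methods, and does not include methods from Object.
    for (Method m : protocol.getMethods()) {
      names.add(m.getName());
    }
    return names;
  }
}
```

Since the list is computed from the interface the server actually loads, it cannot drift out of date. Note that collecting names alone collapses overloads; a real directory would probably need parameter types too.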
What do you reckon?
The above painted scenario too is a little on the exotic side. We can
do something like Jimmy suggests in those rare cases we need to add a
new method because there is insufficient wiggle-room w/i the
particular PB method call (If we get into the issue Ted raises where
we'd have to go back to the server twice because there is a third new
method call, we're doing our API wrong).
Agree that the exception handling hack can be played here.. In general,
having some solution around this might be really helpful *if* we get some
API wrong (for e.g., indirect implication on memory by the API semantics)
and we need to fix it without breaking compatibility.. In HDFS, listFile
proved to be a memory killer for extremely large directories and people
implemented the iterator version of the same.
The protocol needs a version though. We'll be still sending that
'hrpc' long in the header preamble? Should we add a version long
after the 'hrpc' long?
The version in "hrpc" is the RPC version (as opposed to protocol version).
I think that's orthogonal to this discussion..
As to a directory of supported methods, do we need this in the
protocol at all? Can't this be knowledge kept outside of the
on-the-wire back and forth?
St.Ack
As I answered above, and as Todd also says, it probably makes sense to
have a client wide cache for protocol<->supported-methods .. and look up
the cache when and if the client needs to decide between different versions
of a method, or picking a new method, based on the server it is talking
to...