I'd endorse Konstantin's suggestion that the only "downgrade" that will be
supported is "rollback" (and rollback works regardless of version).

There should be a time and version rule for allowed upgrades. "Upgrades to
1.* will be supported from 0.x for x > 17. Upgrades to X.* will be supported
from (X-1).*, and also from (X-2).* if (X-1).0 is less than one year old."
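To make the proposed rule concrete, here is a rough sketch of the check it
implies (illustrative only; none of these names exist in Hadoop, and the
one-year clause would need a table of release dates):

    // Illustrative sketch of the proposed upgrade rule; not an existing API.
    public class UpgradePolicy {
      // fromMajor.fromMinor is the running release; toMajor.* is the target.
      // prevMajorDotZeroAgeDays is the age of the (toMajor-1).0 release.
      static boolean upgradeSupported(int fromMajor, int fromMinor,
                                      int toMajor, long prevMajorDotZeroAgeDays) {
        if (toMajor == 1) {
          return fromMajor == 0 && fromMinor > 17;   // 0.x -> 1.* only for x > 17
        }
        if (fromMajor == toMajor - 1) {
          return true;                               // (X-1).* -> X.* always supported
        }
        if (fromMajor == toMajor - 2) {
          return prevMajorDotZeroAgeDays < 365;      // (X-2).* only while (X-1).0 is young
        }
        return false;
      }
    }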

For interoperation between versions, I might conspicuously deprecate
HFTP/HTTP access to files in 1.0 while making a strong demand for
interoperability. "Client applications may read/write data between any two
versions not less than 1.0 that permit upgrade. 1.* need only support
HTTP/HFTP for sufficiently relaxed security regimes. Support for HTTP/HFTP
may be withdrawn in 2.0."
On 20/10/08 18:50, "Sanjay Radia" wrote:

The Hadoop 1.0 wiki has a section on compatibility.

Since the wiki is awkward for discussions, I am continuing the
discussion here.
I or someone will update the wiki when agreements are reached.

Here is the current list of compatibility requirements on the Hadoop
1.0 Wiki for the convenience of this email thread.
What does Hadoop 1.0 mean?
* Standard release numbering: Only bug fixes in 1.x.y releases
and new features in 1.x.0 releases.
* No need for client recompilation when upgrading from 1.x to
1.y, where x <= y
o Can't remove deprecated classes or methods until 2.0
* Old 1.x clients can connect to new 1.y servers, where x <= y
* New FileSystem clients must be able to call old methods when
talking to old servers. This generally will be done by having old
methods continue to use old RPC methods. However, it is legal to have
new implementations of old methods call new RPC methods, as long as
the library transparently handles the fallback case for old servers
(a sketch of this fallback pattern follows below).
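A rough sketch of that client-side fallback, with deliberately made-up
interface and method names (they do not correspond to real Hadoop RPCs):

    import java.io.IOException;

    // Hypothetical RPC interface: getListingV2 exists only on new servers.
    interface NamenodeRpc {
      String[] getListing(String path) throws IOException;     // old RPC
      String[] getListingV2(String path) throws IOException;   // new RPC
    }

    class CompatibleClient {
      private final NamenodeRpc namenode;
      CompatibleClient(NamenodeRpc namenode) { this.namenode = namenode; }

      // Old public method keeps working against both old and new servers.
      String[] listPaths(String path) throws IOException {
        try {
          return namenode.getListingV2(path);    // prefer the new RPC
        } catch (UnsupportedOperationException e) {
          return namenode.getListing(path);      // transparent fallback for old servers
        }
      }
    }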

A couple of additional compatibility requirements:

* HDFS metadata and data are preserved across release changes, both
major and minor. That is, whenever a release is upgraded, the HDFS
metadata from the old release will be converted automatically as
needed.

The above has been followed so far in Hadoop; I am just documenting it
in the 1.0 requirements list.
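For reference, the mechanism behind this today (and behind the rollback
mentioned at the top of this thread) is the HDFS upgrade/rollback support;
roughly, with exact invocations varying by release:

    # start the new release, converting old metadata as needed
    bin/start-dfs.sh -upgrade
    # once satisfied, make the upgrade permanent
    bin/hadoop dfsadmin -finalizeUpgrade
    # or return to the previous release and its pre-upgrade metadata
    bin/start-dfs.sh -rollback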

* In a major release transition [i.e. from a release x.y to a
release (x+1).0], a user should be able to read data from the cluster
running the old version. (Or shall we generalize this to: from x.y to
(x+i).z?)

The motivation: data copying across clusters is a common operation for
many customers (for example, this is routinely done at Yahoo). Today,
http (or hftp) provides a guaranteed compatible way of copying data
across versions. Clearly one cannot force a customer to simultaneously
upgrade all of its Hadoop clusters to a new major release. The above
documents this requirement; we can satisfy it via the http/hftp
mechanism or some other mechanism.
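Concretely, a cross-version copy today looks something like this, with the
read side going through the version-independent hftp interface (hostnames
and ports below are placeholders):

    # run on the destination (newer) cluster; hftp is read-only
    hadoop distcp hftp://old-cluster-nn:50070/user/foo hdfs://new-cluster-nn:8020/user/foo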

Question: is one willing to break applications that operate across
clusters (i.e. an application that accesses data across clusters that
cross a major release boundary)? I asked the operations team at Yahoo
that runs our Hadoop clusters. We currently do not have any applications
that access data across clusters as part of an MR job, the reason
being that Hadoop routinely breaks wire compatibility across releases
and so such apps would be very unreliable. However, the copying of
data across clusters is crucial and needs to be supported.

Shall we add a stronger requirement for 1.0: wire compatibility
across major versions? This can be supported by class loading or other
games. Note that we can wait to provide this until 2.0 happens. If Hadoop
provided this guarantee, then it would allow customers to partition
their data across clusters without risking apps breaking across major
releases due to wire incompatibility issues.
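As a strawman for the class-loading approach: a client could ship one jar of
RPC stubs per major version and load the one matching the remote cluster.
Purely illustrative; no such jars or classes exist:

    import java.net.URL;
    import java.net.URLClassLoader;

    class VersionedClientLoader {
      // Load an HDFS client implementation matching the remote cluster's
      // major version from a version-specific jar (hypothetical layout).
      static Object clientFor(int remoteMajor) throws Exception {
        URL jar = new URL("file:lib/hdfs-client-" + remoteMajor + ".jar");
        ClassLoader cl = new URLClassLoader(new URL[] { jar },
            VersionedClientLoader.class.getClassLoader());
        return cl.loadClass("org.example.HdfsClientImpl").newInstance();
      }
    }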
