FAQ
Editorial: "rollback" should be restricted to exactly the function of
returning to the state before an upgrade was started. It's OK to discuss
other desirable features along with descriptive names, but "easy rollback,"
"partial rollback," "version rollback" and the like are all confusing.

Substance: In speaking of version numbers, don't confuse desired behavior
(what client can connect with which server) with the details of
implementation (disk formats did or did not change). We want to avoid
getting squeezed by the argument that some feature must wait a year because
it modifies the disk format but it is too early to change the major version
number.

On 27 10 08 13:50, "Sanjay Radia" wrote:

I have merged the various Hadoop 1.0 Compatibility items that have
been discussed in this thread and categorized and listed them below.



Hadoop 1.0 Compatibility
==================

Standard release numbering:
- Only bug fixes in dot releases: m.x.y
- no changes to APIs, disk formats, protocols, config, etc.
- new features in major (m.0.0) and minor (m.x.0) releases
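The numbering rules above can be read off directly by comparing version tuples. A minimal sketch in Python (illustrative only; the function name is mine, not a Hadoop API):

```python
def release_type(old, new):
    """Classify an upgrade from version tuple `old` to `new`.

    Versions are (major, minor, patch) tuples, e.g. (1, 2, 0).
    Returns "dot" (bug fixes only), "minor" (new features, compatible),
    or "major" (incompatible changes allowed).
    """
    if new[0] != old[0]:
        return "major"
    if new[1] != old[1]:
        return "minor"
    return "dot"

print(release_type((1, 2, 0), (1, 2, 3)))  # a dot release: bug fixes only
print(release_type((1, 2, 3), (1, 3, 0)))  # a minor release: new features
print(release_type((1, 3, 0), (2, 0, 0)))  # a major release
```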


1. API Compatibility
-------------------------
No need for client recompilation when upgrading across minor releases
(i.e. from m.x to m.y, where x <= y).
Classes or methods deprecated in m.x can be removed in (m+1).0.
Note that this is stronger than what we have been doing in Hadoop 0.x
releases.
Motivation: These are the industry-standard compatibility rules for
major and minor releases.
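The deprecation rule stated above reduces to a simple predicate. A sketch in Python (the function name is invented for illustration):

```python
def removal_allowed(deprecated_in, removed_in):
    """True if an API deprecated in release `deprecated_in` may be
    removed in release `removed_in`, per the rule that anything
    deprecated in m.x can be removed in (m+1).0 or later.

    Versions are (major, minor) tuples.
    """
    return removed_in[0] >= deprecated_in[0] + 1

assert removal_allowed((1, 2), (2, 0))      # deprecated in 1.2, removed in 2.0: OK
assert not removal_allowed((1, 2), (1, 5))  # still a 1.x release: must keep it
```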

2. Data Compatibility
--------------------------
2.a HDFS metadata and data can change across minor or major releases,
but such changes are transparent to user applications. That is, a
release upgrade must automatically convert the metadata and data as
needed. Further, a release upgrade must allow a cluster to roll back
to the older version and its older disk format.
Motivation: Users expect file systems to preserve data transparently
across releases.
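The upgrade-then-rollback requirement amounts to snapshotting the pre-upgrade image before converting. A toy sketch, using a dict as a stand-in for on-disk metadata (this is not HDFS code; all names are mine):

```python
class Cluster:
    """Toy model: upgrade converts metadata but keeps a snapshot so
    rollback can return to the exact pre-upgrade state."""

    def __init__(self, version, metadata):
        self.version = version
        self.metadata = metadata   # stand-in for the on-disk format
        self.previous = None       # snapshot taken at upgrade time

    def upgrade(self, new_version, convert):
        # Snapshot the old state first, then convert in place.
        self.previous = (self.version, dict(self.metadata))
        self.metadata = convert(self.metadata)
        self.version = new_version

    def rollback(self):
        # Return to the state before the upgrade was started.
        assert self.previous is not None, "no upgrade to roll back"
        self.version, self.metadata = self.previous
        self.previous = None

c = Cluster((1, 0), {"layout": 1})
c.upgrade((1, 1), lambda md: {"layout": 2})
c.rollback()
assert c.version == (1, 0) and c.metadata == {"layout": 1}
```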

2.a-Stronger
HDFS metadata and data can change across minor or major releases, but
such changes are transparent to user applications. That is, a release
upgrade must automatically convert the metadata and data as needed.
During *minor* releases, disk format changes have to be backward and
forward compatible; i.e. an older version of Hadoop can be started on
a newer version of the disk format. Hence rollback is simple: just
restart the older version of Hadoop. Major releases allow more
significant changes to the disk format and need only be backward
compatible; however, a major release upgrade must allow a cluster to
roll back to the older version and its older disk format.
Motivation: Minor releases are very easy to roll back for an admin.


2.a-WeakerAutomaticConversion
Automatic conversion is supported across a small number of releases.
If a user wants to jump across multiple releases, he may be forced to
go through a few intermediate releases to get to the final desired
release.
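If automatic conversion only spans a bounded number of consecutive releases, the path through intermediates is easy to compute. A sketch, where releases are plain integers and the `max_hop` conversion window is an invented parameter:

```python
def upgrade_path(current, target, max_hop):
    """Plan intermediate upgrades when automatic conversion only
    covers `max_hop` consecutive releases. Releases are integers
    here for simplicity (e.g. minor version numbers)."""
    if target < current:
        raise ValueError("downgrades are not planned here")
    path = []
    while current < target:
        # Jump as far as the conversion window allows.
        current = min(current + max_hop, target)
        path.append(current)
    return path

print(upgrade_path(14, 20, max_hop=2))  # [16, 18, 20]
```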

3. Wire Protocol Compatibility
----------------------------------------
We offer no wire compatibility in our 0.x releases today.
The motivation *isn't* to make our protocols public. Applications
will not call the protocols directly but through a library (in our
case the FileSystem class and its implementations). Instead, the
motivation is that customers run multiple clusters and have apps that
access data across clusters. Customers cannot be expected to update
all clusters simultaneously.


3.a Old m.x clients can connect to new m.y servers, where x <= y, but
the old clients might get reduced functionality or performance. m.x
clients might not be able to connect to (m+1).z servers.

3.b. New m.y clients must be able to connect to old m.x servers, where
x < y, but only for old m.x functionality.
Comment: Generally, old API methods continue to use old RPC methods.
However, it is legal to have new implementations of old API methods
call new RPC methods, as long as the library transparently handles
the fallback case for old servers.
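The fallback pattern described in the comment can be sketched as follows. These are hypothetical method names, not real Hadoop RPCs; an old server is modeled as simply lacking the new method:

```python
class OldServer:
    """Implements only the old RPC."""
    def list_files(self, path):
        return ["a", "b"]

class NewServer(OldServer):
    """Also implements a richer replacement RPC."""
    def list_files_v2(self, path):
        return [{"name": "a"}, {"name": "b"}]

class Client:
    """New client library: prefers the new RPC, but falls back
    transparently when talking to an old server."""
    def __init__(self, server):
        self.server = server

    def list_files(self, path):
        try:
            # Try the new RPC first; richer results, same old API shape.
            return [e["name"] for e in self.server.list_files_v2(path)]
        except AttributeError:
            # Old server has no such method: fall back to the old RPC.
            return self.server.list_files(path)

# Same client code works against either server version.
assert Client(OldServer()).list_files("/") == ["a", "b"]
assert Client(NewServer()).list_files("/") == ["a", "b"]
```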

3.c. At any major release transition [i.e. from a release m.x to a
release (m+1).0], a user should be able to read data from the cluster
running the old version. (Or shall we generalize this to: from m.x to
(m+i).z?)

Motivation: Data copying across clusters is a common operation for
many customers. For example, this is routinely done at Yahoo; another
use case is HADOOP-4058. Today, http (or hftp) provides a guaranteed
compatible way of copying data across versions. Clearly one cannot
force a customer to simultaneously update all its Hadoop clusters to
a new major release. The above documents this requirement; we can
satisfy it via the http/hftp mechanism or some other mechanism.

3.c-Stronger
Shall we add a stronger requirement for 1.0: wire compatibility
across major versions? That is, not just for reading but for all
operations. This can be supported by class loading or other games.
Note we can wait to provide this when 2.0 happens. If Hadoop
provided this guarantee, then it would allow customers to partition
their data across clusters without risking apps breaking across major
releases due to wire incompatibility issues.

Motivation: Data copying is a compromise. Customers really want to
run apps across clusters running different versions. (See item 2.)


4. Intra Hadoop Service Compatibility
--------------------------------------------------
The HDFS service has multiple components (NN, DN, Balancer) that
communicate amongst themselves. Similarly, the MapReduce service has
components (JT and TT) that communicate amongst themselves.
Currently we require that all the components of a service have the
same build version and hence talk the same wire protocols.
This build-version checking prevents rolling upgrades. It has the
benefit that the admin can ensure that the entire cluster has exactly
the same build version.

4.a HDFS and MapReduce require that their respective sub-components
have the same build version in order to form a cluster.
[ie. Maintain the current mechanism.]

4.a-Stronger: Intra-service wire-protocol compatibility
[I am listing this here to document it, but I don't think we are
ready to take this on for Hadoop 1.0. Alternatively, we could require
intra-service wire compatibility but check for build version till we
are ready for rolling upgrades.]

Wire protocols between internal Hadoop components are compatible across
minor versions.
Examples are NN-DN, DN-DN and NN-Balancer, etc.
Old m.x components can talk to new m.y components (x <= y).
Wire compatibility can break across major versions.
Motivation: Allow rolling upgrades.
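The difference between 4.a and 4.a-Stronger is what the component handshake checks: an exact build version versus a compatible wire-protocol version. A sketch (version fields and function name are invented for illustration):

```python
def compatible(a, b, strict_build):
    """Decide whether two intra-service components may talk.

    `a` and `b` are (major, minor, build) tuples.
    strict_build=True is the current rule: identical builds only.
    strict_build=False is the 4.a-Stronger rule: wire protocols are
    compatible within a major version, so a rolling upgrade across
    minor versions works.
    """
    if strict_build:
        return a == b
    return a[0] == b[0]  # same major version: wire protocols compatible

old_dn = (1, 2, 100)   # DataNode still on 1.2
new_nn = (1, 3, 205)   # NameNode already upgraded to 1.3
assert not compatible(old_dn, new_nn, strict_build=True)  # today: rejected
assert compatible(old_dn, new_nn, strict_build=False)     # rolling upgrade OK
```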

Discussion Overview
group: common-dev @ hadoop
posted: Oct 21, '08 at 1:50a
active: Oct 30, '08 at 5:17p
posts: 14
users: 4
website: hadoop.apache.org...
irc: #hadoop
