Grokbase Groups Avro user May 2013
FAQ

On 5/22/13 10:34 AM, "Mason" wrote:
dear list,

I have what I imagine is a standard setup: a web application generates
data in MySQL, which I want to analyze in Hadoop; I run a nightly
process to extract tables of interest, Avroize, and dump into HDFS.

This has worked great so far because the tools I'm using make it easy to
load a directory tree of Avros with the same schema.

The issue is what to do when schema changes occur in the SQL database. I
believe column additions and deletions are handled automatically by the
Avro loaders I'm using, but I need to deal with a column rename.

My thinking is: I could bake the table schemas at time of ETL into the
Avros, for historical record, but then manually copy that schema out as
a "master" schema and apply it to all Avros for which it's appropriate;
then when a column rename occurs, go back and edit the master schema.

I've never used an external schema before, so please correct if I
misunderstand how they work.

Anyone have wisdom to share on this topic? I'd love to hear from anyone
who has done this, or has a better solution.
The first thing that comes to mind is the alias feature for field names:
http://avro.apache.org/docs/current/spec.html#Aliases

If you bare using Avro data files, these contain the schemas at the time
of writing for "historical record".

The trick is being able to distinguish between someone who renamed a
column from "foo" to "fubar" and a case where "foo" was removed and
"foobar" added. To do this, one has to have knowledge from the SQL
database DDL changes.

Once you have this, you can choose your reader schema appropriately --
likely by using the 'latest' schema decorated with field aliases where
appropriate, but there are other options.

-Mason

Search Discussions

Discussion Posts

Previous

Related Discussions

Discussion Navigation
viewthread | post
posts ‹ prev | 2 of 2 | next ›
Discussion Overview
groupuser @
categoriesavro
postedMay 22, '13 at 5:34p
activeMay 23, '13 at 7:16p
posts2
users2
websiteavro.apache.org
irc#avro

2 users in discussion

Scott Carey: 1 post Mason: 1 post

People

Translate

site design / logo © 2021 Grokbase