Grokbase Groups Avro user May 2013

On 5/22/13 2:26 PM, "Gregory (Grisha) Trubetskoy" wrote:
Hello!

I have a test.json file that looks like this:

{"first":"John", "last":"Doe", "middle":"C"}
{"first":"John", "last":"Doe"}

(Second line does NOT have a "middle" element).

And I have a test.schema file that looks like this:

{"name":"test",
"type":"record",
"fields": [
{"name":"first", "type":"string"},
{"name":"middle", "type":"string", "default":""},
{"name":"last", "type":"string"}
]}

I then try to use fromjson, as follows, and it chokes on the second line:

$ java -jar avro-tools-1.7.4.jar fromjson --schema-file test.schema test.json > test.avro
Exception in thread "main" org.apache.avro.AvroTypeException: Expected field name not found: middle
    at org.apache.avro.io.JsonDecoder.doAction(JsonDecoder.java:477)
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
    at org.apache.avro.io.JsonDecoder.advance(JsonDecoder.java:139)
    at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:219)
    at org.apache.avro.io.JsonDecoder.readString(JsonDecoder.java:214)
    at org.apache.avro.io.ValidatingDecoder.readString(ValidatingDecoder.java:107)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:348)
    at org.apache.avro.generic.GenericDatumReader.readString(GenericDatumReader.java:341)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:154)
    at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:177)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:148)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:139)
    at org.apache.avro.tool.DataFileWriteTool.run(DataFileWriteTool.java:105)
    at org.apache.avro.tool.Main.run(Main.java:80)
    at org.apache.avro.tool.Main.main(Main.java:69)


The short story is - I need to convert a bunch of JSON where an element
may not be present sometimes, in which case I'd want it to default to
something sensible, e.g. blank or null.

According to the Schema Resolution "if the reader's record schema has a
field that contains a default value, and writer's schema does not have a
field with the same name, then the reader should use the default value
from its field."

I'm clearly missing something obvious, any help would be appreciated!

Two things seem to be missing here:
  1. The fromjson tool uses the schema you provided as both the "writer's
schema" and the reader's schema, so there is nothing to resolve between.
Avro expects every JSON fragment you give it to conform to that one schema.
  2. The tool will not work for arbitrary JSON; it expects JSON in the
format that the Avro JSON Encoder writes. That format differs from plain
JSON in a few ways, primarily in how union types are disambiguated and how
maps are distinguished from records.
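
To illustrate the second point: in Avro's JSON encoding, a non-null union
value is wrapped in a single-key object naming its type, while null is
written bare. The sketch below assumes a hypothetical variant of the schema
in which "middle" is declared as ["null", "string"] (the schema in the
question uses a plain string). The helper name to_avro_json and its
union_fields argument are illustrative only and are not part of Avro.

```python
import json

def to_avro_json(record, union_fields):
    """Rewrite plain JSON values into the form Avro's JSON encoding uses for unions.

    union_fields maps a field name to the non-null branch name of its
    ["null", <type>] union. This is a hypothetical helper, not an Avro API.
    """
    out = dict(record)
    for name, branch in union_fields.items():
        value = record.get(name)
        # null stays bare; any other value is wrapped as {"<type>": value}
        out[name] = None if value is None else {branch: value}
    return out

line = '{"first":"John", "last":"Doe", "middle":"C"}'
print(json.dumps(to_avro_json(json.loads(line), {"middle": "string"})))
```

With such a union schema, a record that omits "middle" would have to carry
an explicit "middle": null, which is what the helper emits for a missing key.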

To perform schema evolution while reading, you may need to separate the
JSON fragments missing "middle" from those that have it, and run the tool
twice, with the corresponding schema for each case.
Alternatively, the tool could be modified to handle schema resolution, or
to deal with different JSON encodings as well
(tools/src/main/java/org/apache/avro/tool/DataFileWriteTool).
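
The separation step could be sketched as follows. This is only one way to
do it, assuming one JSON object per line as in test.json; the function name
split_by_field is made up for this example.

```python
import json

def split_by_field(lines, field):
    """Partition JSON-lines text into (with_field, without_field) lists of raw lines."""
    with_field, without_field = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Route the raw line by whether the top-level object has the key
        target = with_field if field in json.loads(line) else without_field
        target.append(line)
    return with_field, without_field

sample = [
    '{"first":"John", "last":"Doe", "middle":"C"}',
    '{"first":"John", "last":"Doe"}',
]
has_middle, no_middle = split_by_field(sample, "middle")
print(len(has_middle), len(no_middle))
```

Each list would then be written to its own file and passed to fromjson with
the matching schema: the original one for the first group, and a copy
without the "middle" field for the second.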

Alternatively, you can avoid schema resolution at conversion time: separate
the records and write two files, one per schema. You can then handle schema
resolution in a later pass over the data with other tools (e.g. data file
reader + writer), or resolve lazily, at read time, into the schema you
want.
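
If a single output file is the goal, another possible workaround is to apply
the default rule to the JSON itself before running fromjson. To be clear,
this is a pre-processing step and not Avro's schema resolution, and it is
not something the tool does for you. The sketch assumes the schema from the
question, with plain (non-union) field types; fill_defaults is a
hypothetical helper name.

```python
import json

def fill_defaults(record, schema):
    """Return a copy of record with every schema field present, in schema order.

    A missing field takes the field's "default" from the schema; a missing
    field with no default raises an error instead of being guessed.
    """
    filled = {}
    for field in schema["fields"]:
        name = field["name"]
        if name in record:
            filled[name] = record[name]
        elif "default" in field:
            filled[name] = field["default"]
        else:
            raise ValueError(f"field {name!r} is missing and has no default")
    return filled

schema = json.loads("""
{"name":"test", "type":"record", "fields": [
  {"name":"first", "type":"string"},
  {"name":"middle", "type":"string", "default":""},
  {"name":"last", "type":"string"}
]}
""")

for line in ['{"first":"John", "last":"Doe", "middle":"C"}',
             '{"first":"John", "last":"Doe"}']:
    print(json.dumps(fill_defaults(json.loads(line), schema)))
```

After this step every line carries all three fields, so the lines should all
conform to the one schema that fromjson is given.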


Grisha

