I have directories in HDFS that are populated by a nightly ETL job, and we have Hive external tables defined on top of them. The problem arises when the table schema changes in such a way that new columns are inserted between existing columns.
Is there a straightforward way to make the older data files return NULLs for the new columns?
My understanding is that Hive external tables read the data positionally, column by column, so for the older files the values would end up in the wrong columns.
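To illustrate what I mean, here is a minimal Python sketch of the two behaviours. It is not Hive code, and it assumes comma-delimited text files; the column names (`id`, `region`, `name`, `amount`) and both helper functions are made up for the example. The first function mimics the positional mapping I believe I am getting, the second shows the name-based mapping I would like to have.

```python
# Hypothetical illustration only: plain Python, not Hive.
old_schema = ["id", "name", "amount"]            # layout of the older files
new_schema = ["id", "region", "name", "amount"]  # "region" inserted in the middle

old_row = "1,alice,9.5".split(",")  # one record from an older file


def read_positional(fields, table_schema):
    """Assign values to columns strictly by position; pad missing trailing ones with None."""
    padded = fields + [None] * (len(table_schema) - len(fields))
    return dict(zip(table_schema, padded))


def read_by_name(fields, file_schema, table_schema):
    """Assign values using the schema the file was written with; unknown columns become None."""
    named = dict(zip(file_schema, fields))
    return {col: named.get(col) for col in table_schema}


positional = read_positional(old_row, new_schema)
by_name = read_by_name(old_row, old_schema, new_schema)

print(positional)  # values shifted into the wrong columns
print(by_name)     # "region" is None, the rest line up
```

With positional mapping, `alice` lands in `region` and `9.5` lands in `name`; with name-based mapping, only `region` is empty.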
Is there a way to load the data while specifying which value belongs to which column?
Would it be better if the tables were not external to begin with?