Nov 15, 2012 at 9:53 pm
So in this case we don't have JDBC access, and the insert statements are
just big by nature: we're creating location geometries à la "MULTIPOINT((3 4),
(-180 90), (0 0))" for 375 million species records. Fortunately, the most
common species has only 1.6 million records, so at most 1.6 million x/y pairs plus metadata.
That said, your message made me realize that we could add the geometry
separately from the metadata fields, which would bring the size down
dramatically. We'd still be dealing with some large strings, though, so a rough
idea of the maximum tuple size would be helpful.
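To illustrate writing the geometry on its own, here is a minimal sketch in plain Clojure (no Cascalog), assuming each species' points arrive as [x y] pairs. `multipoint-wkt` and `multipoint-chunks` are hypothetical helpers, not part of any library, and splitting one species into several MULTIPOINT strings assumes the database side can store or merge several geometry rows per species.

```clojure
(require '[clojure.string :as string])

;; Hypothetical helper: build a WKT MULTIPOINT string from [x y] pairs,
;; e.g. [[3 4] [-180 90] [0 0]] becomes "MULTIPOINT((3 4), (-180 90), (0 0))".
(defn multipoint-wkt
  "Returns a WKT MULTIPOINT string for the given [x y] pairs."
  [points]
  (str "MULTIPOINT("
       (string/join ", " (map (fn [[x y]] (str "(" x " " y ")")) points))
       ")"))

;; Hypothetical helper: split one species' points into several geometries of
;; at most max-points points each, so no single string grows without bound.
(defn multipoint-chunks
  [points max-points]
  (map multipoint-wkt (partition-all max-points points)))
```

Bounding by point count rather than by byte length keeps the sketch simple; the byte size of each string still varies with coordinate precision.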
I'll report back here with what we find, but I'm still hopeful about hints from
On Thursday, November 15, 2012 1:18:08 PM UTC-8, Sam Ritchie wrote:
Robin, a defbufferop can emit multiple tuples, so why not chunk those huge
statements into several smaller insert statements?
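A minimal sketch of that chunking in plain Clojure, assuming rows arrive as vectors of already-escaped SQL literals. `chunked-inserts` and `row->sql` are hypothetical names, and real code would need proper escaping or parameterized statements. Inside a defbufferop, returning something like `(map vector (chunked-inserts ...))` should emit each statement as its own output tuple, so no single tuple covers more than chunk-size rows.

```clojure
(require '[clojure.string :as string])

;; Hypothetical: render one row (a vector of values) as a SQL tuple literal.
;; Values are assumed to be numbers or already-quoted, already-escaped strings.
(defn row->sql [row]
  (str "(" (string/join ", " row) ")"))

;; Split rows into groups of at most chunk-size and build one INSERT per
;; group, instead of one statement covering every row in the group.
(defn chunked-inserts [table columns rows chunk-size]
  (for [chunk (partition-all chunk-size rows)]
    (str "INSERT INTO " table " (" (string/join ", " columns) ") VALUES "
         (string/join ", " (map row->sql chunk)) ";")))
```

For example, three rows with a chunk size of 2 yield two statements, the second containing only the last row.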
Better yet, you could use the JDBC tap in Maple and sink directly into the SQL
database instead of going through this intermediate step: https://github.com/Cascading/maple/blob/develop/src/jvm/com/twitter/maple/jdbc/JDBCTap.java
I have a defbufferop that transforms input tuples into a SQL insert statement
(as a string) that will be used elsewhere. For one group of tuples, however,
there are 1.6 million records, so the resulting insert statement would take up
about 125 MB of disk space. When we try to run this, the reduce step fails
with a Java heap space OutOfMemoryError. There's an issue about it here: https://github.com/MapofLife/fossa/issues/24
This is obviously an extreme case, but I wonder if anyone has a rule of
thumb for how large an output tuple can be. I've experimented with the
simple case below and was able to write out a 9 MB tuple (10 million "a"
characters), but not a 45 MB tuple (50 million).
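A back-of-the-envelope chunk size, assuming statement size grows linearly with record count and reading MB as 10^6 bytes: the 125 MB statement works out to about 78 bytes per record, so a 9 MB tuple (the size that succeeded here) corresponds to roughly 115,000 records. The real ceiling depends on the reducer's configured heap, so this is only a starting point.

```latex
\frac{125 \times 10^{6}\ \text{bytes}}{1.6 \times 10^{6}\ \text{records}} \approx 78\ \text{bytes/record},
\qquad
\frac{9 \times 10^{6}\ \text{bytes}}{78.125\ \text{bytes/record}} \approx 1.15 \times 10^{5}\ \text{records}
```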
Whatever the limit is, we'll probably have to process the groups above it
separately, and it would be nice if there were as few of those as possible
(i.e., we want to get close to the limit).
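(use 'cascalog.api) ; provides ?<- and hfs-seqfile
(let [n 10000000 ; a single tuple holding a string of n "a" characters
      src [[(apply str (repeat n "a"))]]]
  (?<- (hfs-seqfile "/tmp/yo" :sinkmode :replace)
       [?a]
       (src ?a)))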
Sam Ritchie, Twitter Inc