Compare effectively TerraBytes of Records with another Using Hadoop (MapReduce)
I have had this doubt for quite a long time; it may even be absurd, but I
need a solution. How do we efficiently compare two files, each containing
terabytes of records? This could be related to external sorting as well,
but I could not find an efficient solution to it. Can somebody please help
me understand how to proceed?
--
View this message in context: http://old.nabble.com/Compare-effectively-TerraBytesofRecords-with-another-Using-Hadoop-%28MapReduce%29--tp32503928p32503928.html
Sent from the Hadoop core-user mailing list archive at Nabble.com.
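The external-sorting idea mentioned in the question comes down to: externally sort each file, then stream both sorted outputs through a single merge pass and report mismatches. A minimal sketch of that merge pass in plain Java (class and method names are illustrative; for terabyte files the lists would be replaced by streaming readers, but the compare-and-advance logic is the same):

```java
import java.util.*;

public class SortedMergeDiff {
    // One sequential merge pass over two already-sorted record lists:
    // the final step of an external-sort based comparison.
    public static List<String> diff(List<String> sortedA, List<String> sortedB) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < sortedA.size() && j < sortedB.size()) {
            int cmp = sortedA.get(i).compareTo(sortedB.get(j));
            if (cmp == 0) { i++; j++; }                        // record present in both files
            else if (cmp < 0) out.add("only in A: " + sortedA.get(i++));
            else out.add("only in B: " + sortedB.get(j++));
        }
        // Drain whichever file still has records left.
        while (i < sortedA.size()) out.add("only in A: " + sortedA.get(i++));
        while (j < sortedB.size()) out.add("only in B: " + sortedB.get(j++));
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("r1", "r2", "r4");
        List<String> b = Arrays.asList("r2", "r3", "r4");
        System.out.println(diff(a, b)); // [only in A: r1, only in B: r3]
    }
}
```

Each record is read exactly once, so the pass is I/O-bound and needs only constant memory; the expensive part is the external sort itself, which is exactly what a MapReduce shuffle gives you for free.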


  • Prashant at Sep 26, 2011 at 6:48 am

    On 09/26/2011 11:58 AM, Sharan34140 wrote:
    I have had this doubt for quite a long time; it may even be absurd, but I
    need a solution. How do we efficiently compare two files, each containing
    terabytes of records? This could be related to external sorting as well,
    but I could not find an efficient solution to it. Can somebody please help
    me understand how to proceed?
    Before proceeding, can you provide us with more details? Does the
    comparison involve comparing the files line by line and displaying the
    diff, or is it record by record? In either case, one might have to
    override FileInputFormat so that it accepts the two files in question and
    processes them line by line or record by record. Then, in the map phase,
    we can emit the record number as the key and the diff as the value. I
    have not tried this; it would be interesting if someone with experience
    could shed some light.

    Thanks
    Prashant
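The flow Prashant describes — key each line by its record number, then compare the two lines that share a key — can be sketched in plain Java without the Hadoop API, so it runs standalone. The class and method names are illustrative, the `TreeMap` stands in for the shuffle/group-by, and in a real job the two loops would be the mappers and the final loop the reducer:

```java
import java.util.*;

public class TagAndGroupDiff {
    // Simulates the proposed MapReduce diff: "map" keys each line by its
    // record number (slot 0 = file A, slot 1 = file B), the TreeMap plays
    // the role of the shuffle, and "reduce" compares the pair per key.
    public static Map<Integer, String> diff(List<String> fileA, List<String> fileB) {
        // Map phase: emit (recordNumber, line) tagged with the source file.
        Map<Integer, String[]> grouped = new TreeMap<>();
        for (int i = 0; i < fileA.size(); i++)
            grouped.computeIfAbsent(i, k -> new String[2])[0] = fileA.get(i);
        for (int i = 0; i < fileB.size(); i++)
            grouped.computeIfAbsent(i, k -> new String[2])[1] = fileB.get(i);

        // Reduce phase: for each record number, compare the line from A
        // with the line from B and emit a diff entry when they differ.
        Map<Integer, String> result = new TreeMap<>();
        for (Map.Entry<Integer, String[]> e : grouped.entrySet()) {
            String a = e.getValue()[0], b = e.getValue()[1];
            if (!Objects.equals(a, b))
                result.put(e.getKey(), a + " != " + b);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(diff(Arrays.asList("r1", "r2"), Arrays.asList("r1", "rX")));
        // {1=r2 != rX}
    }
}
```

One caveat with the record-number approach: the default FileInputFormat splits give each mapper byte offsets, not global line numbers, so a real job would either need a preprocessing pass to assign record numbers or records that carry their own key.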

Discussion Overview
group: common-user
categories: hadoop
posted: Sep 26, '11 at 6:28a
active: Sep 26, '11 at 6:48a
posts: 2
users: 2
website: hadoop.apache.org...
irc: #hadoop

2 users in discussion

Sharan34140: 1 post Prashant: 1 post
