Hi,
I want to use an input file containing lines of RNA sequences, where each line (one RNA sequence) is passed to the mapper (an executable program that determines the secondary structure of that sequence). I am also using a reducer that concatenates the output lines from the mapper. My problem is that the final output is not sorted in the same order as the input sequences (RNA-1, RNA-2, RNA-3, ...).

STDIN INPUT FILE:
RNA-1
RNA-2
RNA-3
.....

MAPPER OUTPUT:
MAP1 <RNA-2><STRUCTURE-2>
MAP2 <RNA-1><STRUCTURE-1>
MAP3 <RNA-3><STRUCTURE-3>

REDUCER OUTPUT:
<RNA-2><RNA-1><RNA-3>\t<STRUCTURE-1><STRUCTURE-2><STRUCTURE-3>\n
OR
<RNA-3><RNA-2><RNA-1>\t<STRUCTURE-1><STRUCTURE-2><STRUCTURE-3>\n

What I am looking for is to reduce in the following order:
<RNA-1><RNA-2><RNA-3>\t<STRUCTURE-1><STRUCTURE-2><STRUCTURE-3>\n

Looking forward to your input.

Regards,

Daniel T. Yehdego
Computational Science Program
University of Texas at El Paso, UTEP
dtyehdego@miners.utep.edu

  • Mehmet Tepedelenlioglu at Sep 8, 2011 at 5:35 pm
    If you have a set of key-value pairs that you want to end up in the same reducer, label them with a common index key, like so:

    <1,RNA1-STRUCT1>
    <1,RNA2-STRUCT2>
    <1,RNA3-STRUCT3>

    In this case RNA1, RNA2 and RNA3, with their corresponding structures, will end up in the same reducer. So your mappers won't use RNAi as the key, but a separate grouping key.
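
A shared key only guarantees that the records meet in one reducer; it does not by itself restore the input order. The following is a minimal sketch of one way to do that, not something proposed in the thread. It assumes each input line carries its position (a hypothetical `index` field, for example a line number added before the job runs), that the mapper passes it through, and that one group fits in the reducer's memory. The function names `map_line` and `reduce_records` are illustrative only.

```python
from itertools import groupby


def map_line(index, rna, structure, group_key="1"):
    """Build one mapper output record: shared grouping key, input position, data."""
    return f"{group_key}\t{index}\t{rna}\t{structure}"


def reduce_records(lines):
    """Concatenate sequences and structures per grouping key, in input order."""
    records = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    # Sort by grouping key, then by the numeric input position.
    records.sort(key=lambda r: (r[0], int(r[1])))
    out = []
    for _, group in groupby(records, key=lambda r: r[0]):
        group = list(group)
        rnas = "".join(r[2] for r in group)
        structs = "".join(r[3] for r in group)
        out.append(f"{rnas}\t{structs}")
    return out


# Mapper output arriving at the reducer out of order, as in the question.
shuffled = [
    map_line(2, "<RNA-2>", "<STRUCTURE-2>"),
    map_line(1, "<RNA-1>", "<STRUCTURE-1>"),
    map_line(3, "<RNA-3>", "<STRUCTURE-3>"),
]
result = reduce_records(shuffled)
print(result[0])  # <RNA-1><RNA-2><RNA-3>\t<STRUCTURE-1><STRUCTURE-2><STRUCTURE-3> (with a real tab)
```

In an actual Hadoop Streaming job the reducer would read these records from standard input instead of a list; the in-memory sort is the part that puts them back in input order.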
  • Robert at Sep 11, 2011 at 1:27 pm
    I downloaded the latest version of Ganglia, then compiled and installed
    it on my Hadoop cluster and configured it according to the documented
    procedures. The latest stable version of Ganglia is 3.2, and I am
    using hadoop-0.20.2-cdh31

    I just copied the gmond.conf from the distribution to the nodes. It
    has what look like default values 239.2.11.71 for mcast_join and port
    8649 throughout.

    The core (non-Hadoop) Ganglia reporting works fine, but Ganglia is not
    communicating with Hadoop in any reproducible way. I got reporting on
    one node once, got a *different* node reported from telnet localhost
    8649 once, but more generally get no reporting of hadoop metrics at
    all! When I bounce the cluster and/or gmond I may or may not get any
    difference in behavior. It is frustrating because the behavior seems
    to be random and not reproducible.

    I wonder if there is a problem with version compatibility? If there
    were release notes indicating a compatibility issue I didn't see them
    on the ganglia site. At this point, I'm tempted to give up on Ganglia
    for hadoop metrics and look for alternatives.

    Any ideas?
  • Robert at Sep 11, 2011 at 2:09 pm
    Sorry to follow up my own post but I thought I would give it one more
    shot this morning and change to dfs.servers=239.2.11.71:8649 (the
    multicast address).

    Though I am sure I tried that before, it works this time.
    Perhaps the Ganglia system was in some unusual state before.





Discussion Overview
group: common-user @
category: hadoop
posted: Sep 8, '11 at 5:08p
active: Sep 11, '11 at 2:09p
posts: 4
users: 3
website: hadoop.apache.org...
irc: #hadoop
