21 discussions - 106 posts

  • Hi, this has probably been discussed on this list before, but I couldn't find it. We are implementing log analysis tools for some web sites that have high traffic. We have millions of logs of a web site in a ...
    Gökhan Çapan
    Dec 24, 2009 at 9:17 am
    Dec 25, 2009 at 1:17 pm
  • I am not sure if this is a bug, or something more subtle, but here is the problem that I am having. When I LOAD a dataset, change it with an ORDER, LIMIT it, then CROSS it with itself, the results ...
    Corry Haines
    Dec 15, 2009 at 12:03 am
    Dec 24, 2009 at 6:28 pm
  • Hi: We tried to get top N results after a groupby and sort, and got different results with or without storing the full sorted results. Here is a skeleton of our pig script. raw_data = Load ...
    Chuang liu
    Dec 18, 2009 at 12:46 am
    Dec 22, 2009 at 6:37 pm
  • I followed the instructions on the Pig Setup page, but I can't seem to attach to my HDFS cluster. Is there a configuration file or an environment variable that I'm missing?
    Aryeh Berkowitz
    Dec 23, 2009 at 4:42 pm
    Dec 23, 2009 at 6:15 pm
  • Just a quick question before I go doing this myself: has anyone written a StoreFunc (or, even better, a reversible one that does both load and store) for JSON? Basically I have a relation ...
    Zaki rahaman
    Dec 4, 2009 at 6:04 pm
    Dec 12, 2009 at 10:19 pm
  • I am trying to figure out a way to identify the potential bottlenecks in my pig script by putting some timestamps in the log before storing my output. Thinking of using the following : --debug tmp = ...
    Prasenjit mukherjee
    Dec 18, 2009 at 9:48 am
    Jan 12, 2010 at 4:05 pm
  • Hi, I'm pretty sure the answer to my question is no, but I have to ask. Is it possible within Pig to store different groups of data into different output files where the grouping is dynamic (i.e. not ...
    Bill Graham
    Dec 15, 2009 at 8:00 pm
    Dec 28, 2009 at 2:19 am
  • Hi all, Thanks for sharing PigMix with us. I do have a related question to this thread. The generate_data.sh references test.jar in datagenjar=$PIG_HOME/build/test/classes/test.jar. This jar is never ...
    Iman E
    Dec 11, 2009 at 6:57 pm
    Dec 15, 2009 at 5:38 am
  • Hi, I'm trying to use FileLocalizer in a UDF to check whether a path passed in as a parameter is a file or a directory. I saw something like this in some of the Pig internal code: PigContext pc = ...
    Tamir Kamara
    Dec 3, 2009 at 6:50 am
    Dec 3, 2009 at 1:25 pm
  • pigServer.setJobName() does not work for me. Now it always shows something like Job5377610087230523458.jar for Hadoop jobs. Is there any way to give them more meaningful names? Thanks, Yonggang
    Yonggang Qiao
    Dec 17, 2009 at 10:42 pm
    Dec 18, 2009 at 12:17 am
  • Hi: Is there an easy way to get top N records based on a field value in pig script? Thanks. bp
    Buping du
    Dec 17, 2009 at 8:15 pm
    Dec 17, 2009 at 10:32 pm
  • Hi all, I realized a week or two ago that PigStorage(',') wasn't adequate for parsing files that have commas embedded in properly CSV-quoted fields. I went ahead and built a CSV parser for Pig 0.3 that ...
    James Kebinger
    Dec 8, 2009 at 11:12 pm
    Dec 9, 2009 at 7:51 am
  • Please forgive my ignorance, but is there a comment character in Pig scripts? It occurs to me I've never seen an example with a comment in it, and leading # or ; characters don't appear to work as ...
    James Kebinger
    Dec 4, 2009 at 11:57 pm
    Dec 5, 2009 at 12:48 am
  • I found this edge-case issue: the ORDER statement assumes non-empty partitions to operate on. A simplified example is below. in = LOAD 'a.gz' AS (label:int); sel = DISTINCT in PARALLEL <X>; ord = ORDER ...
    Skepticus Smith
    Dec 24, 2009 at 7:37 pm
    Jan 13, 2010 at 4:00 am
  • Hi everyone, is there any way I can set up job-specific properties without changing pig.properties every time? Can the Grunt "set" command do that, or do I need to wrap it in Java and use setProperty() to ...
    Xiaomeng Wan
    Dec 18, 2009 at 9:22 pm
    Dec 19, 2009 at 2:51 am
  • Is it possible to provide params to pig, on the command line or in properties, that get passed through to the hadoop jobs pig runs? Specifically, -D args to get picked up by the options parser on the ...
    Derek Brown
    Dec 9, 2009 at 3:07 am
    Dec 10, 2009 at 8:54 pm
  • Thanks guys! When I think about it, it may be good enough to do this at the CLI level as that is probably the most common use case for this (in most of the other "API" style modes the apps can ...
    Vijay
    Dec 30, 2009 at 8:04 pm
    Jan 4, 2010 at 3:06 pm
  • It looks like the way to use multi-query from Java is as follows: 1. pigServer.setBatchOn(); 2. register your queries with pigServer; 3. List<ExecJob> jobs = pigServer.executeBatch(); 4. for (ExecJob ...
    Dmitriy Ryaboy
    Dec 15, 2009 at 8:45 pm
    Dec 15, 2009 at 9:40 pm
  • Are there set operations on bags beyond the DIFF operator? I'd like to compare bags to find elements in both of them (intersection). I can imagine union and set addition and subtraction being useful ...
    James Kebinger
    Dec 13, 2009 at 1:11 am
    Dec 14, 2009 at 5:15 pm
  • Greetings, Due to the holiday season, the Hadoop/HBase/Etc. Meetup is not going to happen. If anyone wants to get together for casual coffee or drinks, though, let me know! We'll be back on schedule ...
    Bradford Stephens
    Dec 29, 2009 at 11:30 pm
    Dec 29, 2009 at 11:30 pm
  • Hadoop Fans, it's been a few weeks since we've hosted public training sessions, and now we're happy to announce three sessions in three cities over the next three months. These sessions are all ...
    Christophe Bisciglia
    Dec 21, 2009 at 9:07 pm
    Dec 21, 2009 at 9:07 pm
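The CSV thread above points out that PigStorage(',') splits on every comma, including commas inside quoted fields. The following is a minimal sketch, in plain Java with no Pig dependency, of the quote-aware splitting such a parser needs. It is not the parser posted to the list (that code is not shown here). The class name QuotedCsvSplitter is hypothetical, and the sketch assumes RFC 4180-style quoting (fields wrapped in double quotes, a doubled quote as an escape). It does not handle line breaks inside quoted fields.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits one CSV line into fields, keeping commas
// that appear inside double-quoted fields as literal characters.
public class QuotedCsvSplitter {
    public static List<String> split(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // "" inside quotes is an escaped quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote of the field
                    }
                } else {
                    current.append(c);       // commas land here while quoted
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote of the field
            } else if (c == ',') {
                fields.add(current.toString()); // unquoted comma ends a field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field (possibly empty)
        return fields;
    }

    public static void main(String[] args) {
        // prints [1, Smith, John, say "hi", ]  (four fields; the last is empty)
        System.out.println(split("1,\"Smith, John\",\"say \"\"hi\"\"\","));
    }
}
```

In a custom loader, logic along these lines would replace a plain split on every comma; the per-line approach is why embedded newlines are out of scope for this sketch.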
Group Overview
Period: December 2009
Categories: pig, hadoop
Discussions: 21
Posts: 106
Users: 32
Website: pig.apache.org

32 users for December 2009

Dmitriy Ryaboy: 12 posts Gökhan Çapan: 9 posts Corry Haines: 8 posts Jeff Zhang: 8 posts Mridul Muralidharan: 6 posts Richard Ding: 6 posts Zaki Rahaman: 6 posts Rekha Joshi: 5 posts James Kebinger: 4 posts Alan Gates: 3 posts Aryeh Berkowitz: 3 posts Bill Graham: 3 posts Buping du: 3 posts Chuang liu: 3 posts Derek Brown: 3 posts Iman E: 3 posts Yonggang Qiao: 3 posts Jianyong Dai: 2 posts Rob Stewart: 2 posts Tamir Kamara: 2 posts