• Stephen Weiss at May 14, 2012 at 3:42 am
I'm new to node.js so forgive what's probably a very newbie question...

I've been trying to use write streams in a few different use cases - one as
an HTTP request stream and one just writing to a plain file - and no matter
what, I always observe the same behavior: nothing actually gets written to
the stream until program execution is over.

So, for example, if I write this code:
var fs = require('fs');

var ws = fs.createWriteStream("/tmp/out", {
  flags: 'w+'
});

var i = 0;

while (i < 100000) {
  console.log("writing");
  ws.write("random text\n");
  i++;
}

ws.end();


I see all the "writing" lines printed, and only after every "writing" line
has been output to my terminal do any of the "random text" lines get sent
to my file. I could have set it to write 10 lines, or a billion lines, and
I see the same behavior.

My problem is, I'm trying to write a routine that generates JSON for
several million objects and writes those JSON objects to elasticsearch, via
the elasticsearchclient module, which sends data to elasticsearch via an
HTTP request (which is also a writestream). However, my routine always
fails, because node.js runs out of memory before any data actually gets
written to the stream. My routine works great if I only try to index 10
documents - once program execution ends, it sends all 10 documents over at
once, and they are indexed - but when I try to index the entire database,
it fails, even though I send the writes along 1000 at a time, and it has
ample time to start sending at least some of the documents. It would all
work great if it could just start sending the data as soon as it's
buffered, but nothing I've found gets it to do that. What I really need is
a "flush" command or something but there isn't one listed in the
documentation. The documentation would seem to indicate that this should
happen automatically, but it just doesn't.

Nothing in the documentation seems to indicate that write streams should
work this way, so I find this very baffling and frustrating. Is there any
way to flush the write stream, to force it to start writing before it runs
out of memory? It seems like a pretty obvious thing, and I've never had a
problem like this in other programming languages, but I'm just not finding
the documentation that explains this. Usually you write to a stream and it
outputs the data as quickly as it can - it doesn't buffer everything until
your program is done executing. I tried listening for the "drain" event,
but it never fires. The stream's writable property is false right from the
start - the kernel buffer seems to be full right away. Nothing really
seems to work the way it's documented...

I'm running node 0.6.17. I'm pretty sure I'm missing something very
obvious here, but I've scoured the documentation and the forums for hours
and I can't find anything that helps me solve my problem. If anyone can
please help, I'd really appreciate it. Thanks.

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nodejs@googlegroups.com
To unsubscribe from this group, send email to
nodejs+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en


  • Ben Noordhuis at May 14, 2012 at 3:56 am

    On Mon, May 14, 2012 at 12:50 AM, Stephen Weiss wrote:
    [original message quoted in full; snipped]
    In your example, you're doing all the work in a single "tick" of the
    event loop, effectively queuing up 100K write requests.

    If you slice up the requests like below, you give node.js the
    opportunity to process them concurrently:

    var i = 0;
    function work() {
      while (i < 100000) {
        console.log("writing");
        ws.write("random text\n");
        if (++i % 1000 == 0) return process.nextTick(work);
      }
    }
    work();
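
    On Node 0.10 and later, the same slicing idea can be written with
    setImmediate, which yields to any pending I/O between batches; a sketch
    along those lines (not from the original thread), with ws.end() added
    once the loop finishes:

    ```javascript
    // Sketch (assumes Node >= 0.10 for setImmediate): chunked writes that
    // yield to the event loop between batches so file I/O can proceed.
    var fs = require('fs');
    var ws = fs.createWriteStream('/tmp/out');

    var i = 0;
    function work() {
      while (i < 100000) {
        ws.write('random text\n');
        if (++i % 1000 === 0) return setImmediate(work); // let I/O run
      }
      ws.end(); // close the stream once everything has been queued
    }
    work();
    ```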

  • Jimb Esser at May 14, 2012 at 4:03 am
    Because (almost) all I/O is asynchronous in node, the event loop
    needs to have a chance to run. I've done very little with streams,
    but looking at the docs, write() will return false if the buffer is
    full, and that won't be flushed (or perhaps even start to write, if it's
    waiting for an asynchronous file open or something) until the event
    loop has had a chance to run (after control returns to node from your
    main .js file in this case). Restructuring your loop to wait for the
    'drain' event whenever a write indicates the buffer is full should fix
    your issue:

    var fs = require('fs');
    var ws = fs.createWriteStream("/tmp/out", {
      flags: 'w+'
    });

    var i = 0;
    function writesome() {
      while (i < 1000000) {
        i++;
        console.log("writing");
        if (!ws.write("random text\n")) {
          // buffer is full, don't write any more until we're notified
          ws.once('drain', writesome);
          return;
        }
      }
      ws.end();
    }
    writesome();

    Generally, with node, even if you're doing something fairly
    straightforward, you need to think in an async/event-driven manner.
    This is a bit annoying when writing the simple things, but is
    wonderful when it's embraced and you're doing anything more complex.
    If you wanted to, for example, open 4 streams and write to whichever
    isn't full/busy, that becomes trivial with the code above that's
    operating on events.
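
    The four-stream idea might be sketched like this (nothing from the
    thread itself; the busy flag and the writeLine helper are made-up
    names, and the filenames are illustrative):

    ```javascript
    // Sketch: round-robin writes across several streams, skipping any
    // whose buffer is full until its 'drain' event fires.
    var fs = require('fs');

    var streams = [1, 2, 3, 4].map(function (n) {
      var ws = fs.createWriteStream('/tmp/out' + n);
      ws.busy = false; // our own flag, not a stream API property
      ws.on('drain', function () { ws.busy = false; });
      return ws;
    });

    function writeLine(line) {
      for (var i = 0; i < streams.length; i++) {
        var ws = streams[i];
        if (ws.busy) continue;               // buffer full, try the next one
        if (!ws.write(line)) ws.busy = true; // mark full until 'drain'
        return true;
      }
      return false; // every stream is saturated; caller should wait
    }
    ```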

    - jimb
    On May 13, 3:50 pm, Stephen Weiss wrote:
    [original message quoted in full; snipped]

Discussion Overview
group: nodejs
category: nodejs
posted: May 14, 2012 at 3:42 am
active: May 14, 2012 at 4:03 am
posts: 3
users: 3
website: nodejs.org
irc: #node.js
