Dynamic content scrape with Node.js
Hey guys. I tried to scrape data from a website using the PHP cURL library,
but I failed because cURL can only fetch static content, and the content I
want to scrape changes via JavaScript (AJAX). I have heard that this kind of
thing can be done with Node. Basically, I need my Node app to run the page's
JavaScript, wait until the AJAX calls are done, and then pass the result to
PHP. Is that possible with Node.js? I don't know Node and have to start from
scratch, so I'm hoping you can point me to the right Node framework for
getting the result I described.

--
Job Board: http://jobs.nodejs.org/
Posting guidelines: https://github.com/joyent/node/wiki/Mailing-List-Posting-Guidelines
You received this message because you are subscribed to the Google
Groups "nodejs" group.
To post to this group, send email to nodejs@googlegroups.com
To unsubscribe from this group, send email to
nodejs+unsubscribe@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/nodejs?hl=en?hl=en


  • Rektide at Oct 6, 2012 at 10:04 pm
    Only just picked it up last week, but it worked well enough-- node.io. It exposes a
    jQuery-esque interface for querying scraped pages. Extremely high level, "just works"
    scraping module, in my book!

    It also has a fairly sizable task-processing system built in, which I have not used.

    Good luck:
    https://github.com/chriso/node.io

    -rektide
  • Dave Kuhn at Oct 7, 2012 at 3:46 am
    Good suggestions so far, though I highly recommend you check out phantomjs.org. Phantom is a headless version of WebKit, which is the rendering engine behind Chrome & Safari. In my book it's the most comprehensive solution for handling AJAX content when scraping, since it's technically the same as interacting with a page loaded by your browser.

    --
    Dave Kuhn
    Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

  • Stephan Bardubitzki at Oct 7, 2012 at 11:14 pm
    Another option would be

    https://github.com/MatthewMueller/cheerio

    Tutorial:

    http://vimeo.com/31950192

  • Chad Engler at Oct 8, 2012 at 5:18 pm
    This is probably the same person who asked this question on
    StackOverflow:

    http://stackoverflow.com/questions/12630891/scrape-data-generated-by-javascript-on-server-side-from-webpages-aspx

    Where I have already answered his question, he just didn't like it:

    http://stackoverflow.com/questions/12630891/scrape-data-generated-by-javascript-on-server-side-from-webpages-aspx#comment17032399_12630891

    -Chad
  • Greelgorke at Oct 9, 2012 at 7:53 am
    Why so complicated? Just find out the URL of the AJAX request and make
    that request yourself with whatever lib you want...
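
One way to read the suggestion above: open the browser's network tab, note the URL the page's AJAX call hits, and request that URL directly from Node. The sketch below assumes a hypothetical JSON endpoint and made-up parameter names (`page`, `q`); `buildAjaxUrl` and `fetchJson` are illustrative helpers, not part of any library, and only Node's built-in `http` and `https` modules are used.

```javascript
'use strict';
const http = require('http');
const https = require('https');

// Build the endpoint URL with query parameters. The path and parameter
// names are hypothetical; copy the real ones from the browser's network tab.
function buildAjaxUrl(base, params) {
  const url = new URL(base);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Fetch a URL and resolve with the parsed JSON body.
function fetchJson(url) {
  const client = url.startsWith('https:') ? https : http;
  const options = {
    headers: {
      // Some servers check this header to recognise AJAX calls (assumption:
      // the target site may or may not need it).
      'X-Requested-With': 'XMLHttpRequest',
      'Accept': 'application/json',
    },
  };
  return new Promise((resolve, reject) => {
    client.get(url, options, (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the body so the socket is released
        reject(new Error('Unexpected status ' + res.statusCode));
        return;
      }
      let body = '';
      res.setEncoding('utf8');
      res.on('data', (chunk) => { body += chunk; });
      res.on('end', () => {
        try {
          resolve(JSON.parse(body));
        } catch (err) {
          reject(err);
        }
      });
    }).on('error', reject);
  });
}
```

Something like `fetchJson(buildAjaxUrl('http://example.com/data.json', { page: 1 })).then(console.log)` would then print the parsed response. This only works when the endpoint answers without state the browser had built up first (cookies, tokens), which is where the heavier tools come in.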

  • Dave Kuhn at Oct 9, 2012 at 4:08 pm
    True, you can get pretty far doing that, but it gets difficult when crucial bits of information are hidden inside script tags and the like. Not to mention that managing cookies for ASP.NET pages, among others, is a pain in the butt. You can avoid all that hassle with the fully resolved DOM and automatic cookie support that PhantomJS gives you.

    --
    Dave Kuhn
    Sent with Sparrow (http://www.sparrowmailapp.com/?sig)
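
To give a feel for the manual work described above, here is a rough sketch of the two chores when no headless browser is involved: digging a data literal out of an inline script tag, and carrying cookies from one response to the next request. The helper names (`inlineScripts`, `embeddedJson`, `cookieHeader`) and the `pageData` variable are hypothetical, and the regex approach is an assumption that only holds when the embedded literal is plain JSON ending in `};`.

```javascript
'use strict';

// Return the text of every non-empty inline <script> block in an HTML string.
function inlineScripts(html) {
  const scripts = [];
  const pattern = /<script\b[^>]*>([\s\S]*?)<\/script>/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    if (match[1].trim() !== '') scripts.push(match[1]);
  }
  return scripts;
}

// Find `<name> = {...};` in the inline scripts and parse the literal as JSON.
// `name` must be a plain identifier. Returns null when the variable is not
// found or the literal is not valid JSON (e.g. it uses single quotes).
function embeddedJson(html, name) {
  const pattern = new RegExp('\\b' + name + '\\s*=\\s*(\\{[\\s\\S]*?\\});');
  for (const script of inlineScripts(html)) {
    const match = pattern.exec(script);
    if (!match) continue;
    try {
      return JSON.parse(match[1]);
    } catch (err) {
      return null;
    }
  }
  return null;
}

// Turn the Set-Cookie headers of one response (an array in Node, under
// res.headers['set-cookie']) into a Cookie header value for the next
// request. Expiry, path and domain rules are ignored in this sketch.
function cookieHeader(setCookieHeaders) {
  return (setCookieHeaders || [])
    .map((line) => line.split(';')[0].trim())
    .join('; ');
}
```

Each special case the page adds (a literal that is not strict JSON, a value containing `};`, cookies scoped to a path) needs more code here, which is the hassle a browser engine with a resolved DOM handles for you.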

  • Mark Hahn at Oct 6, 2012 at 10:12 pm
    1) You should consider using the Node `request` module to scrape instead of cURL.

    2) Any scraping is only going to return what you request, and that is
    only the initially provided static content. You are getting this from
    the server, not the client, so nothing generated on the client side is
    included.

    3) You will have to simulate the client and run the JS inside of your app.
    The easiest way to do this is to use a "headless" client. I suggest you
    use Zombie at http://zombie.labnotes.org
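
The idea in point 3 can be illustrated in miniature with Node's built-in `vm` module: the page's script is executed inside the app against a stand-in `document`, and the app reads back what the script wrote. This is not Zombie's API and nowhere near a real headless client; `runPageScript` and the stub `document` are hypothetical, and a real page needs far more of the DOM than `getElementById`.

```javascript
'use strict';
const vm = require('vm');

// Run a page script against a tiny stand-in for the browser DOM and return
// the elements it touched, keyed by id.
function runPageScript(scriptSource) {
  const elements = {};
  const sandbox = {
    document: {
      // Create elements on demand so the script can write into them.
      getElementById(id) {
        if (!elements[id]) elements[id] = { id: id, textContent: '' };
        return elements[id];
      },
    },
  };
  // vm gives the script its own globals but is not a security boundary;
  // the timeout only guards against runaway synchronous loops.
  vm.runInNewContext(scriptSource, sandbox, { timeout: 1000 });
  return elements;
}

// Example: content that exists only after the script has run.
const result = runPageScript(
  "document.getElementById('price').textContent = 'EUR ' + (40 + 2);"
);
console.log(result.price.textContent);
```

A headless client does the same thing with a complete DOM, timers and network requests, which is why it is the easier route for real pages.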


Discussion Overview
group: nodejs @
categories: nodejs
posted: Oct 6, '12 at 9:02p
active: Oct 9, '12 at 4:08p
posts: 8
users: 7
website: nodejs.org
irc: #node.js


site design / logo © 2022 Grokbase