I am new to Storm and am in the process of setting up a standalone Storm
cluster to begin with, but I have a few questions.
What I would like to do is set up a real-time metrics display web page by
collecting URL info from my Apache logs.
I have three backend Apache web servers that log traffic, and I assume
that Storm can help me set up a real-time statistics display system so
that I can show which documents are being accessed most frequently on my
system.
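
To make the goal concrete, here is a rough sketch of the per-line work I
have in mind, written as plain Python rather than as a Storm bolt. It
assumes my logs are in the standard Common/Combined Log Format; the
function names and the sample lines are made up for illustration only.

```python
import re
from collections import Counter

# Matches the quoted request part of a Common/Combined Log Format line,
# e.g. "GET /docs/a.html HTTP/1.1", and captures the requested URL.
REQUEST_RE = re.compile(
    r'"(?:GET|POST|HEAD|PUT|DELETE|OPTIONS|PATCH) (\S+) [^"]*"'
)


def extract_path(line):
    """Return the requested path without its query string, or None."""
    match = REQUEST_RE.search(line)
    if match is None:
        return None
    return match.group(1).split("?", 1)[0]


def count_documents(lines):
    """Count how often each document path appears in the given log lines."""
    counts = Counter()
    for line in lines:
        path = extract_path(line)
        if path is not None:
            counts[path] += 1
    return counts


# Made-up sample lines, not real traffic.
sample = [
    '10.0.0.1 - - [10/Oct/2013:13:55:36 -0700] "GET /docs/a.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2013:13:55:37 -0700] "GET /docs/b.html HTTP/1.1" 200 512',
    '10.0.0.3 - - [10/Oct/2013:13:55:38 -0700] "GET /docs/a.html?ref=home HTTP/1.1" 200 2326',
]
top_documents = count_documents(sample).most_common(2)
```

My guess is that in a topology the parsing and the counting would be
separate steps, but that is part of what I am asking about.
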
I was wondering if there are any examples or documentation out there on
how to go about the initial setup of getting my Apache logs into the
Storm cluster. My logs are rotated every hour. I assume I will have to
copy them into the Storm cluster via some cron script, and once they are
deposited the Storm topology will pick them up, do its thing, and then
move them to a processed queue.
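
The pick-up-and-move step I am picturing looks roughly like the sketch
below. This is plain Python and not Storm code; the directory layout, the
"*.log" file pattern, and the function name are my own assumptions, and I
am treating the "processed queue" as simply a second directory.

```python
import shutil
from pathlib import Path


def process_rotated_logs(incoming, processed, handle_line):
    """Feed every line of each rotated log to handle_line, then move the file.

    incoming and processed are hypothetical directories; handle_line is
    whatever should receive each raw log line.
    """
    incoming = Path(incoming)
    processed = Path(processed)
    processed.mkdir(parents=True, exist_ok=True)

    moved = []
    # Sort by name so hourly files are handled in a stable order.
    for log_file in sorted(incoming.glob("*.log")):
        with log_file.open("r", errors="replace") as handle:
            for line in handle:
                handle_line(line.rstrip("\n"))
        # Only move the file once all of its lines have been handed over.
        target = processed / log_file.name
        shutil.move(str(log_file), str(target))
        moved.append(target)
    return moved
```

Whether this belongs inside the topology or in a script outside the
cluster is one of the things I am unsure about.
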
These are all assumptions at this point, but any help and guidance in the
right direction will be appreciated.