Grokbase
Topics Posts Groups | in
x
[ help ]

[Catalyst] advise on data processing in Cat/DBIC/Model

View PostFlat  Thread  Threaded | Next >
Rainer Clasen [Catalyst] advise on data processing in Cat/DBIC/Model
| +1 vote
[ Profile | Reply to group ] [ Flat  Thread  Threaded ]
Hello,

within my current project, some value is collected up to once a day:

CREATE TABLE a_value {
  day date PRIMARY KEY,
  other_values integer NOT NULL,
  value integer
  another_value integer
);

Data comes in a bit sporadic - so I cannot rely each day having an entry.
Actually there also be longer periods (weeks/month/??) without data.

I'm currently a bit at a loss on how to "properly" cook up this data to
easily display it in fixed time steps. I'm thinking of a list of *all*
days/weeks/month/... in a certain timerange. Such a list would allow the
view easy access to present the data (say as html table with one row per
time step or as input for GD::Graph).

This means there are basically two tasks:
- aggregate the data for each time step: No-brainer with DBIx::Class.
- get NULL entries for time steps without data: The intersting part.

I can come up the following solutions to generate the NULL entries:

- use a SQL stored procedure or temp table with the start-dates of the
  desired time-steps, do an outer join and stuff this in a DBIC
  result_source as described in the DBIC cookbook under "arbitrary SQL".

  example query for ->name():
SELECT
  d.id,
  steps AS day,
  d.value,
  COALESCE( d.other_value, $4 ) AS other_value
FROM
  timeseries( $1, $2, $3) AS steps
  LEFT JOIN ( SELECT * FROM data WHERE other_value = $4 ) d
   ON ( d.day >= $2 AND d.day + $1 < $3;
  $1 = time steps. eg. '1 day'
  $2 = start date. eg. '2007-11-1'
  $3 = end date. eg '2007-11-30'
  $4 = other_value to filter on.
  timeseries(step,start,end) = stored procedure that returns the 
start-dates of the time-steps within the specified time-range.

  pro: 
  - also usable without catalyst / DBIC.
  - easy data aggregation
  - same interface as non-cooked data
  con: 
  - Outer joins tend to be slow.
  - the NULL entries have to be transfered from DB->application
  - the "arbitrary SQL" example makes me feel a bit sick.
  - the "WHERE other_value..." subquery needs to be different in certain
    cases.

- Catalyst Model, that creates a DBIC resultset and feeds it to a plain
  perl "aggregation module" (which returns the aggregated data in the
  desired time steps).

  pro:
  - aggregation module is quite easy to write
  -  "            "    is also usable without catalyst / DBIC
  - allows easy tweaking of the resultset
  con:
  - still a bit tricky to teach the DB about the time-steps for data
    aggregation (think of non trivial steps like '4 days').  
  - alternatively the aggregation has to be done "manually"
  - same applies for selecting the actual columns I want to
    retrieve/aggregate.

Which do you consider the more elegant way? Am I missing the "obvious" and
therefore trivial solution (as usual)? Am I completely nuts?

Rainer

--
KeyID=759975BD fingerprint=887A 4BE3 6AB7 EE3C 4AE0 B0E1 0556 E25A 7599 75BD

Thread : [Catalyst] advise on data processing in Cat/DBIC/Model
1)
Rainer Clasen Hello, within my current project, some value is collected up to once a day: CREATE TABLE a_value {...
2)
Matt S Trout I tend to do -sort- of this. Except that instead of using a function like timeseries() I'll create...
3)
Rainer Clasen Ah, that's similar to what I had before. Well, yours shows a bit more experience ;) I haven't...
4)
Toby Corkindale I believe that's the method Joe Celko recommends as well. You can extend it later with public...
spacer
View PostFlat  Thread  Threaded | Next >