On Tue, 13 Oct 2009 12:59:47 -0700, Paul Rubin wrote:

> sturlamolden <sturlamolden at yahoo.no> writes:
>
>> The obvious way to compute a running median involves a tree structure
>> so you can quickly insert and delete elements, and find the median.
>> That would be asymptotically O(n log n) but messy to implement.
>>
>> QuickSelect will find the median in O(log n) time.
>
> That makes no sense, you can't even examine the input in O(log n) time.

If that were a relevant argument, no algorithms would ever be described
as running in O(log n).

Obviously, to run in O(log n) you must have already built the tree. If
you include the time required to build the tree, then O(n) is the lower
bound (because you have to see each element to put it in the tree), but
that applies to any data structure. The point is, once you have that
tree, you can perform repeated calculations in O(log n) time (or so it
has been claimed).
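A minimal illustration of the idea (my own sketch, not code from this
thread): keep the elements in sorted order, and the median becomes a
direct index lookup. With a plain Python list, bisect.insort does the
O(log n) search but the insertion itself is O(n) because of element
shifting; a balanced tree would make the insert O(log n) as well.

```python
import bisect

# Keep the data sorted so the median is an index lookup.
# bisect.insort locates the position in O(log n); the list insertion
# itself is O(n), which a balanced tree structure would avoid.
data = []
for x in [5, 2, 9, 1, 7]:
    bisect.insort(data, x)

n = len(data)
if n % 2 == 1:
    median = data[n // 2]
else:
    median = (data[n // 2 - 1] + data[n // 2]) / 2

print(median)  # data is [1, 2, 5, 7, 9], so the median is 5
```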

For example, if you want the m running medians of m data points, you
might do the following:

find the median of element 1
find the median of elements 1 and 2
find the median of elements 1 through 3
find the median of elements 1 through 4
...
find the median of elements 1 through m
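The sequence of steps above can be written directly (a naive sketch of
my own: statistics.median re-sorts each prefix from scratch, so this is
the quadratic baseline, not the tree approach):

```python
from statistics import median

points = [3, 1, 4, 1, 5]

# Running medians: the median of points[:1], then points[:2], and so on.
# Each call sorts its prefix anew, so this does O(m**2 log m) work overall.
running = [median(points[:i]) for i in range(1, len(points) + 1)]

print(running)  # [3, 2.0, 3, 2.0, 3]
```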

Each median calculation may take O(log n), where n varies from 1 to m,
giving a total on the order of:

log 1 + log 2 + log 3 + ... + log m
=> O(log m!)

steps, which is much better than:

1 + 2 + 3 + ... + m = 1/2*m*(m+1)
=> O(m**2)
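A quick numeric check of the two totals (my own illustration): summing
log k for k = 1..m is exactly log(m!), and even for modest m it is tiny
next to the quadratic sum 1/2*m*(m+1).

```python
import math

m = 1000

# log 1 + log 2 + ... + log m, i.e. log(m!); O(log m!) is O(m log m).
log_sum = sum(math.log(k) for k in range(1, m + 1))

# The quadratic total from recomputing each median from scratch.
quadratic = m * (m + 1) / 2

print(round(log_sum))  # about 5912
print(int(quadratic))  # 500500
```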

Anyway, the problem isn't to find the median of n elements. It's to
find n different medians of n different sets. You might be able to get
close to linear time, though, depending on how the data acts.

But since the data sets aren't independent, a tree structure seems like
the way to go to me. Inserting and deleting elements should be fast,
and assuming the claim of calculating the median in O(log n) is
correct, that's quite fast too.
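For what it's worth, the tree can also be replaced by a pair of heaps
(my own sketch, not something proposed in this thread): a max-heap for
the lower half and a min-heap for the upper half give O(log n)
insertion and O(1) median lookup, though unlike a tree they don't
support arbitrary deletion cheaply.

```python
import heapq

def running_medians(points):
    """Yield the median after each insertion, using two balanced heaps."""
    lo = []  # max-heap (values stored negated) holding the smaller half
    hi = []  # min-heap holding the larger half
    for x in points:
        # Push through the lower heap, then rebalance so that
        # len(lo) == len(hi) or len(lo) == len(hi) + 1.
        heapq.heappush(lo, -x)
        heapq.heappush(hi, -heapq.heappop(lo))
        if len(hi) > len(lo):
            heapq.heappush(lo, -heapq.heappop(hi))
        if len(lo) > len(hi):
            yield -lo[0]               # odd count: top of the lower heap
        else:
            yield (-lo[0] + hi[0]) / 2  # even count: mean of the two tops

print(list(running_medians([3, 1, 4, 1, 5])))  # [3, 2.0, 3, 2.0, 3]
```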

--

Steven