Hi folks,

I know that the density function will give an estimated density for a given
dataset. Now from that I want to have a percentage estimate for a
certain range. For example:
y = density(c(-20,rep(0,98),20))
plot(y, xlim=c(-4,4))
Now I want to know the percentage of data lying in (-20,2). Basically
it should be the area under the curve from -20 to 2. Does anybody know a
simple function to do it?

Thanks,

D.

at Jan 27, 2012 at 11:09 pm

On 28/01/12 11:44, Duke wrote:
Hi folks,

I know that the density function will give an estimated density for a given
dataset. Now from that I want to have a percentage estimate for a
certain range. For example:
y = density(c(-20,rep(0,98),20))
plot(y, xlim=c(-4,4))
Now I want to know the percentage of data lying in (-20,2).
Basically it should be the area under the curve from -20 to 2. Does
anybody know a simple function to do it?

You could try:

foo <- with(y,splinefun(x,y))
integrate(foo,lower=-20,upper=2)

Note that

integrate(foo,lower=min(y$x),upper=max(y$x))

yields "1.000951 with absolute error < 0.00011", rather than giving
exactly 1, so there's a bit of slop in the system.
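
One possible way to absorb that slop, offered only as a sketch and not as part of the suggestion above, is to divide the area over the range by the area over the whole grid. The names v, area_range, area_total and p are made up for illustration.

```r
# Kernel density estimate for the example data
v <- c(-20, rep(0, 98), 20)
y <- density(v)

# Spline interpolant through the (x, y) grid returned by density()
foo <- with(y, splinefun(x, y))

# Area over the range of interest and over the whole grid
area_range <- integrate(foo, lower = -20, upper = 2)$value
area_total <- integrate(foo, lower = min(y$x), upper = max(y$x))$value

# Rescale so that the whole curve integrates to 1
p <- area_range / area_total
p
```

The rescaling only corrects the total mass; it does not change the shape of the estimate, so the result still depends on the bandwidth that density() picked.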

cheers,

Rolf Turner
at Jan 29, 2012 at 4:11 am
If you use logspline estimation (logspline package) instead of kernel density estimation then this is simple as there are cumulative area functions for logspline fits.

If you need to do this with kernel density estimates then you can just find the area over your region for the kernel centered at each data point and average those values together to get the area under the entire density estimate.
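
The kernel-averaging idea can be sketched as follows, assuming the default Gaussian kernel of density(), for which the bandwidth bw is the standard deviation of each kernel. The names a, b, per_point and p_kernel are illustrative only.

```r
# Example data and the bandwidth that density() selects by default
v  <- c(-20, rep(0, 98), 20)
bw <- density(v)$bw

# Range of interest
a <- -20
b <- 2

# Mass that the Gaussian kernel centred at each data point puts on (a, b)
per_point <- pnorm(b, mean = v, sd = bw) - pnorm(a, mean = v, sd = bw)

# Average over the data points: area under the density estimate on (a, b)
p_kernel <- mean(per_point)
p_kernel
```

This avoids numerical integration altogether, but for a non-Gaussian kernel pnorm would have to be replaced by that kernel's own cumulative distribution function.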

at Jan 29, 2012 at 9:03 pm
If v is your original data,
v <- c(-20, rep(0,98), 20)
why not use
mean( -20 < v & v < 2)
as your estimate of the probability that v is in (-20,2)?

Estimating a density is like taking the derivative
of a smooth of the empirical distribution function,
so why not eliminate the middleman instead of integrating
the estimated density? Any difference between the two
would come from the smoothing rather than from
the data involved. (Not that I am any sort of expert
in this matter.)
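
As a small sketch of the empirical route, the same proportion can also be read off the empirical distribution function with ecdf(). The names p_open, Fn and p_half_open are illustrative; note that the ecdf difference covers the half-open interval (-20, 2], which matters only when a data point sits exactly on the upper endpoint.

```r
v <- c(-20, rep(0, 98), 20)

# Proportion of points strictly inside (-20, 2)
p_open <- mean(-20 < v & v < 2)

# Same idea via the empirical distribution function: P(-20 < V <= 2)
Fn <- ecdf(v)
p_half_open <- Fn(2) - Fn(-20)

c(p_open, p_half_open)
```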

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

at Jan 30, 2012 at 2:52 pm
Great suggestions and comments, Bill, Greg and Rolf. You provided me
some valuable ways to deal with the data I am working with. Thank you
all so much!

Bests,

D.

