Sorry to beat this into the ground, but the more I read, the more questions I have.
Wolfgang's paper mentions Oracle's calculation of density (page 6 of "Histograms: Myths and Facts"):
* Without a histogram: density = 1/NDV
* With a height-balanced histogram: density = Σ cnt^2 / ( num_rows~ * Σ cnt )
* With a frequency histogram: density = 1/( 2 * num_rows~ )
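To make the three cases concrete, here is a minimal sketch of how I read those formulas. It is not Oracle's code. It assumes that cnt is the row count of each non-popular value, that num_rows~ can be treated as the plain row count of a column with no NULLs, and that the set of popular values is handed in rather than derived from real histogram buckets. All function names are made up for illustration.

```python
from collections import Counter

def density_no_histogram(values):
    # 1/NDV: every distinct value is assumed equally likely.
    return 1.0 / len(set(values))

def density_height_balanced(values, popular):
    # Assumption: cnt is the per-value row count of the NON-popular values,
    # and num_rows is simply len(values) (no NULL handling here).
    counts = Counter(v for v in values if v not in popular)
    sum_cnt = sum(counts.values())
    if sum_cnt == 0:
        return 0.0  # every value is popular; the formula is undefined
    sum_cnt_sq = sum(c * c for c in counts.values())
    return sum_cnt_sq / (len(values) * sum_cnt)

def density_frequency(values):
    # 1 / (2 * num_rows)
    return 1.0 / (2 * len(values))

# Hypothetical column: 90 rows of 'A', 5 of 'B', 5 of 'C'; 'A' is treated as popular.
col = ["A"] * 90 + ["B"] * 5 + ["C"] * 5
print(density_no_histogram(col))            # 1/3
print(density_height_balanced(col, {"A"}))  # (25 + 25) / (100 * 10)
print(density_frequency(col))               # 1/200
```

For this made-up column the height-balanced density is far below 1/NDV, because the one popular value no longer inflates the estimate for the remaining values.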
I am not exactly sure how close density is supposed to be to the mathematical (or rather, statistical) definition of skew. My biggest problem with statistical skew is that a set with a symmetric distribution is not considered skewed at all, yet in terms of Oracle histograms that same set could have one or more popular values which would benefit from a histogram bucket.
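A small sketch of that objection, using invented data: a column whose values are perfectly symmetric around the mean has a statistical skewness of zero, yet two of its values are far more common than a uniform 1/NDV estimate would suggest. The skewness here is the ordinary population moment coefficient (third central moment over the standard deviation cubed), which is my assumption about what "statistical skew" means in this context. It says nothing about how Oracle decides.

```python
def skewness(value_counts):
    # Population moment coefficient of skewness: m3 / m2**1.5,
    # computed from a {value: row_count} mapping.
    n = sum(value_counts.values())
    mean = sum(v * c for v, c in value_counts.items()) / n
    m2 = sum(c * (v - mean) ** 2 for v, c in value_counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in value_counts.items()) / n
    return m3 / m2 ** 1.5

# Hypothetical column: values 1 and 10 occur 1000 times each,
# values 2 through 9 occur 10 times each (symmetric around 5.5).
value_counts = {1: 1000, 10: 1000}
value_counts.update({v: 10 for v in range(2, 10)})

num_rows = sum(value_counts.values())
ndv = len(value_counts)

actual_selectivity = value_counts[1] / num_rows  # fraction of rows where col = 1
uniform_estimate = 1.0 / ndv                     # what 1/NDV would predict

print(skewness(value_counts))
print(actual_selectivity, uniform_estimate)
```

The skewness comes out at zero while the actual selectivity of the value 1 is several times the 1/NDV estimate, which is exactly the kind of column where a histogram should help.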
Also, I was reading Jonathan Lewis's book ("Cost-Based Oracle Fundamentals"), chapter 7 - wow, using SKEW as a column name for an arbitrary table sure makes my research more confusing! *grin*
So how does Oracle define skew? If method_opt with SIZE SKEWONLY is not working 100% (Wolfgang mentions this for 10.2.0.1; I have not verified it with 10.2.0.2 yet), does anyone know exactly why? I am very curious what this procedure is doing.
Just to be clear, I am trying to learn more about what exactly "skew" is, as Oracle defines it. In an ideal world, Oracle would not be limited to 254 histogram buckets, and the statistics would be able to describe all the data precisely. As a DBA, I am trying to fill in the gaps between Utopia and reality.
aits - adsd
university of illinois