Aggregation Size Estimate Dependency on Fact Data

Topics: Resolved
Apr 9, 2010 at 2:44 PM


On top of the aggregation manager, there's a line with information about the estimated size of an aggregation. This estimate is given in percent of the fact data. There's a brief explanation about this estimate on the website:

The instantaneous aggregation size estimate (which shows at the top of the screen) does as good a job as is possible estimating the size based upon the EstimatedCount of dimension attributes, the EstimatedRows of the measure group, and the number of dimension keys included in the aggregation. (If the EstimatedCount/EstimatedRows properties are not set or are not up to date, then use the Update Estimated Counts feature.) When it shows a range of estimated sizes, the uncertainty is due to the unknown sparsity when you crossjoin multiple dimensions. For exact aggregation sizes, deploy the aggregation design, run ProcessIndex, then use the Physical Aggregation Sizes screen which can be opened via the context menu of an individual partition in Agg Manager.

I would really like to know what exactly that number means, and why it is given relative to the fact data size. Is the space requirement for an aggregation not clear just by the number of measures in the cube, and the number of members per included attribute? Isn't it independent of the amount of records in the fact table? Couldn't it be calculated precisely in advance instead of just giving an estimate? Maybe I am misunderstanding something here. Could someone clarify a bit how this number is calculated and why it has to be like it is?



Apr 10, 2010 at 1:35 PM

It's definitely showing an estimate. The reason is that one of the guiding principles of SSAS is that non-data shouldn't take up any space on disk. So let's say you've got a cube with a Store dimension with 100 members and a Date dimension with 365 days. If you built an agg at the Store/Day grain, it could theoretically be 36,500 rows. But not every store has data every day. And SSAS doesn't store a row that has no data. That's what the word "sparsity" means when applied to cubes. So simplying multiplying the estimated counts of dimension attributes will greatly overstate the real agg rowcount.

Anyway, we couldn't know the exact agg rowcount without running a SQL query or MDX query. And that wouldn't be desirable.

If you want to see the real agg rowcount, expand down to a partition and look at the Physical Aggregation Sizes menu option. Unfortunately that screen only shows the rowcount differences, and it doesn't factor into the percentage that the agg might have dimension keys.

Just to be clear, it helps me to think that an agg is worthwhile only because it is smaller than the fact table. And it can be smaller than the fact table because it has fewer rows, or fewer columns (i.e. dimension keys), or both.