Friday, October 13, 2006

Combining Dimensions [Reed]


The way AS2005 manages attributes in a dimension changes the design dynamic a bit: Which attributes should be combined into a single dimension? I've run into this issue from two sides.

On the one hand, we had a Project dimension with a lot of changing attributes. For example, the Project Manager would change frequently. To avoid dealing with creating lots of Type II instances of the Project, we decided to pull the Project Manager off as a separate key in the fact, making it a "Rapidly Changing Dimension". In 2000, this was often a very effective strategy. It's simple and easy to implement, but in 2005, we found that query performance was very bad. Even thougy the PM to Project relationship is very sparse, each crosstab query had to do a cartesian product and find the empty intersections. When we moved the PM back into the Project dimension, AutoExist suddenly made it really fast to find the projects for a given manager, or the managers for a given project.

Conversely, I had a client with lots of customers and lots of products, but where the relationship between customers and products was very stable and very sparse. It seemed bizarre, but we explored putting both customers and products into a single dimension, with the root key as the Customer/SKU combination. Again, the query performance gain was substantial, especially when looking for which customers buy which products, etc. In 2000, I would never have considered putting Customer and Product into a single dimension, but given the high cardinality, stable relationship, and inherent sparsity, it seems to be a really good idea in 2005.

- Reed

No comments: