Thursday, July 27, 2006

MDX and NON EMPTY


A few times over the past couple of months I have run into a situation where removing NON EMPTY from a slow MDX query radically improved the response time. The symptom was clear, but not predictable. Earlier this week, we were able to pin down at least one specific scenario that definitely triggered the NON EMPTY slowness.


We were using Excel (2003) as the client, and had a large set (customers) on rows. Excel's default NON EMPTY behavior was filtering out the customers with no data. Everything worked well until we added a second measure. The query went from about 1 second to more than 10 minutes. Both measures were calculated members that directly referenced base measures, and each had its NonEmptyBehavior property set to its base measure.
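For reference, a calculated measure of the kind described here might be defined in the cube's MDX script roughly as follows. This is only a sketch: [Sales Amount], [Order Quantity], [Calc Sales], and [Calc Quantity] are made-up names standing in for the real base and calculated measures.

```mdx
-- Hypothetical names: [Sales Amount] and [Order Quantity] stand in for the base
-- measures, [Calc Sales] and [Calc Quantity] for the calculated measures.

-- Each calculated measure is a direct reference to one base measure, and
-- NON_EMPTY_BEHAVIOR names that same base measure, telling the engine that the
-- calculation is empty wherever the base measure is empty.
CREATE MEMBER CURRENTCUBE.[Measures].[Calc Sales]
  AS [Measures].[Sales Amount],
  NON_EMPTY_BEHAVIOR = { [Measures].[Sales Amount] },
  VISIBLE = 1;

CREATE MEMBER CURRENTCUBE.[Measures].[Calc Quantity]
  AS [Measures].[Order Quantity],
  NON_EMPTY_BEHAVIOR = { [Measures].[Order Quantity] },
  VISIBLE = 1;
```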


After some troubleshooting, we observed that the speed hit happened only when the measures were on the Rows axis, alongside the large set. If we put the measures on the Columns axis (i.e., a different axis from the large set), it went back to being really fast. Adding a second measure slowed the query down because with one measure, Excel puts the measure into the WHERE clause, but with two, it defaults to putting the "Data" field on rows.
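The three query shapes involved look roughly like this. These are simplified sketches, not the literal text Excel 2003 sends, and the cube, hierarchy, and measure names are all made up. Each query would be run on its own. Where a query has only one axis, MDX requires it to be axis 0 (COLUMNS), so read that axis as the pivot table's row area.

```mdx
-- Hypothetical names: [Sales] cube, [Customer].[Customer] attribute hierarchy,
-- calculated measures [Calc Sales] and [Calc Quantity].

-- (1) One measure: the measure sits in the WHERE clause. Fast.
SELECT
  NON EMPTY [Customer].[Customer].[Customer].MEMBERS ON COLUMNS
FROM [Sales]
WHERE ( [Measures].[Calc Sales] )

-- (2) Two measures on the same axis as the large set. This is the slow shape.
SELECT
  NON EMPTY
    CROSSJOIN(
      [Customer].[Customer].[Customer].MEMBERS,
      { [Measures].[Calc Sales], [Measures].[Calc Quantity] }
    ) ON COLUMNS
FROM [Sales]

-- (3) Two measures on their own axis, large set on the other. Fast again.
SELECT
  { [Measures].[Calc Sales], [Measures].[Calc Quantity] } ON COLUMNS,
  NON EMPTY [Customer].[Customer].[Customer].MEMBERS ON ROWS
FROM [Sales]
```

Shapes (2) and (3) ask for the same cells. The only difference is which axis carries the measures.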


So the solution (aka workaround) is to simply add the multiple measures to the report before adding the large dimension. With two measures on the report, you can move the measures to columns and then add the large dimension. It's a bit of a pain, but the difference between 10 minutes and 1 second is worth it.


My speculation is that Analysis Services 2005 is trying to optimize NON EMPTY so that it gets speed benefits similar to those of NonEmptyCrossjoin. If it gets the optimization right (which happens when the measures are on a different axis), it's blazingly fast. If it gets the optimization wrong (which happens when the measures are on the same axis), then the calculated member is evaluated for each member of the large set and it's distressingly slow.
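For comparison, here is a sketch of the explicit NonEmptyCrossjoin form that this speculation has in mind. The names are again hypothetical. The filter uses the base measure [Sales Amount] rather than a calculated member because, as far as I know, NonEmptyCrossjoin decides emptiness from fact data and does not take calculated members into account.

```mdx
-- Hypothetical names: [Sales] cube, [Customer].[Customer] hierarchy,
-- base measure [Sales Amount], calculated measures [Calc Sales], [Calc Quantity].

-- The final argument (1) says to return only the first set (customers); the
-- second set is used only to decide which customers have data.
SELECT
  { [Measures].[Calc Sales], [Measures].[Calc Quantity] } ON COLUMNS,
  NonEmptyCrossjoin(
    [Customer].[Customer].[Customer].MEMBERS,
    { [Measures].[Sales Amount] },
    1
  ) ON ROWS
FROM [Sales]
```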


- Reed Jacobson

