Friday, July 28, 2006

New Existing operator


I just found out about the Existing operator. You put it in front of a set and it filters the set by using all the current members in the current context. This simplifies using Attributes without User Hierarchies.


For example, suppose you need to do the equivalent of the LastChild aggregation function, but with the Standard Edition: Use the Existing keyword to find the last period. The issue is that without a multi-level time hierarchy, how do you know which is the "last" month in the current filtered space. So Existing Time.Month.Members is essentially equivalent to Descendants(Time.CurrentMember,Time.Month), except that it doesn't require a User Hierarchy. Thus Tail(Existing Time.Month.Members) give the last month in the current filter set. (Note that this doesn't give the Last non-empty member, just the Last Child.)


- Reed Jacobson


Visual Totals and Row Filters


I was troubleshooting with a client about some slow queries using Excel (2003) as a client for AS2005.


We had a large (customer) dimension on rows, and measures on columns. We added a second customer attribute (region) to rows. The pivot table returned very fast. As soon as we filtered by a region member (one or more), the report slowed to a crawl. Putting the region into the slice (filter) area and then selecting one or more items was fine, but with region on rows, it was slow. The problem is that users want to see the region next to the customer name.


Turning off Visual Totals gave fast results as well, but that allowed unwanted customers to make it through the filter, and the grand total was not what we wanted. In effect, we wanted the meaning of Visual Totals (with Region included on rows), with the performance of Region in Slicer (or no Visual Totals).


The best workaround we could come up with was to create two instances of the Region attribute, so there's a regular Region attribute (which goes on rows) and a Filter Region attribute (which goes into slicer). Filtering by the one in the Slicer area allows the report to be fast, even with Visual Totals, but still lets the user see the regions in the body of the report. We had to create similar parallel attributes for any customer attribute the user might want to show on rows--while simultaneously using as a filter.


My speculation as to the reason for this is that if the region filter is in the Slicer, it's the core cube "pyramid" engine that does the operations, and it's fast. Whether Visual Totals is specified or not is irrelevant as long as the Region is in the Slicer, because no "total" is being re-calculated by MDX. If the Region filter is in rows, then the All member of the Region attribute (which is implicitly there for every customer) has to be recalculated as a Visual Total by MDX, and that's really slow. Any time the MDX has to calculate for each member of the Customer hierarchy (whether it eventually gets filtered away by Non Empty or not), the overall report is slow. If an MDX expression only has to calculate for the rows that end up in the final report, it's really fast. The moral for me is to keep the high-cardinality operations in the Pyramid engine and restrict the Calculation engine to relatively small scope.


- Reed Jacobson


 


Thursday, July 27, 2006

MDX and NON EMPTY


A few times over the past couple of months I have run into a situation where removing NON EMPTY from a slow MDX query radically improved the response time. The symptom was clear, but not predictable. Earlier this week, we were able to pin down at least one specific scenario that definitely triggered the NON EMPTY slowness.


We were using Excel (2003) as the client, and had a large set (customers) on rows. Excel's default NON EMPTY behavior was filtering away the uninvolved customers. Everything worked well until we added a second measure. The query went from 1 second to 10+ minutes (or so). The measures were both calculated members that did a direct reference to base measures, and they did have the NonEmptyBehavior property set to the base measure.


After some troubleshooting, we observed that the speed hit happened only when the measures were on the Rows axis. If we put the measures on the Columns axis (i.e., a different axis from the large set), it went back to being really fast. The reason adding a second measure slowed it down was because with one measure, Excel puts the measure into the Where clause, but with two, it defaults to put the "Data" on rows.


So the solution (aka workaround) is to simply add the multiple measures to the report before adding the large dimension. With two measures on the report, you can move the measures to columns and then add the large dimension. It's a bit of a pain, but the difference between 10 minutes and 1 second is worth it.


My speculation is that 2005 is trying to optimize NON EMPTY behavior so that it's similar to the speed benefits of NonEmptyCrossjoin. If it gets the optimization right (which happens when the measures are on a different axis), it's blazingly fast. If it gets the optimization wrong (which happens when the measures are on the same axis), then the calculated member is evaluated for each member of the large set and it's distressingly slow.


- Reed Jacobson