Thursday, February 24, 2005

Hybrid MOLAP/ROLAP dimensions

Some dimensions—if you push them to their physical extreme, something we never did in 2000—get very, very large at the bottom. As a simple example, you could have 50M customers. In 32-bit AS2K you would never dream of making that a dimension—or at least not a MOLAP dimension—and you wouldn’t want to make it into a ROLAP dimension either, because that would eliminate any MOLAP data storage anywhere in the cube. In 2005, I can easily imagine a situation where I would want the most aggregated attributes (Country, State, City) of a user hierarchy to be MOLAP—so that they can support good MOLAP aggregations—and the bottom attributes (such as Customer, or even Order ID) as ROLAP, so they don’t require MOLAP processing.

The developers imagined it as well, but apparently couldn’t get it to work in the Yukon time frame. So the whole dimension has to be either MOLAP or ROLAP. In the first place, my experience with large ROLAP dimensions used with a MOLAP cube is very limited (which is to say, not yet existent), so it may not be all that much of an issue.

If it turns out that a truly “hybrid” dimension is desirable, there are a couple of ways I can think of for creating the hybrid. One is to split the dimension in two, with the higher-level (MOLAP) one a reference dimension by way of the lower-level (ROLAP) one. Another is to create a many-to-many dimension to connect the two. It will be interesting to experiment with different alternatives in specific scenarios.

-- Reed

Wednesday, February 23, 2005

Drill Through and Fact Dimensions

Fact Dimensions did not work in Beta 2. At the time I assumed that they were just “degenerate” dimensions, so that you could create dimensions directly from low-cardinality values stored directly in the fact table—such as whether a row was part of a promotion or something like that. I also assumed (without ever really looking at it closely) that Drill Through was basically the same as in AS 2000—a SQL Query retrieval of the rows that contribute to a specific cell.

In the Dec CTP, they fixed fact dimensions—and it turns out that they are completely different from what I had thought. A fact dimension is essentially using the fact table itself as a dimension—with multiple possible attributes—and it is the key to getting Drill Through to work.

Suppose that you had a PO number as a field in the fact table, but didn’t want to include that in any dimension. In 2000, you could retrieve the PO directly from the fact table via Drill Through and a SQL Query. In 2005, all Drill-Through requests go to the UDM (which actually seems internally consistent with the concept of the UDM). But that means that any column from the fact table that you want returned in the Drill Through result has to be part of a dimension. Hence the Fact dimension. First add a primary key to the fact table. (You can, in principle, use a combination of columns to get a unique key, but I think it’s good practice—and I’ve had better success—just adding an AutoNumber primary key column.) You then create a dimension using the fact table primary key as the key, and add any other columns from the fact table (such as the PO number) as attributes. Then make the dimension ROLAP storage. Once you add the dimension to the cube (and probably hide it from the users), you can include any of its attributes in the Drill Through Action definition. It actually worked quite well for us. Good luck.


Tuesday, February 22, 2005

Proactive Caching

Another question that came up at the Issaquah Ascend class was related to Proactive Caching. Specifically, where is it stored? In case you haven’t heard about it yet, this is one of the interesting new features of Analysis Services 2005 that really changes maintenance strategies from the way you currently keep cubes up-to-date in AS2000. It’s a huge step towards real-time OLAP without the traditional issues associated with ROLAP storage. With proactive caching, you have MOLAP storage just as you always did with AS2000, but now you also have relational storage. (So unlike AS2000, where in theory you can toss your source database and run solely off of the cube – assuming you never wanted to update it again – you must maintain a relationship to the relational source in 2005 in order for proactive caching to work).

As changes to the relational source occur, updates to the MOLAP cache also occur—eventually (based on how you configure triggering events and frequency). However, as with AS2000, you still need to provide a query for the partition that correctly identifies new rows in the relational source. In theory, you could have two versions of the MOLAP cache concurrently - the cache responding to queries and the cache that is being rebuilt to conform to the changes in the relational source. When the rebuild is complete, the obsolete cache goes away.

- Stacia

Tuesday, February 1, 2005

Gaining Perspective on Perspectives

One of the questions that came up in the Ascend class held in (rainy) Issaquah in January:  What happens if you have the same perspective name across multiple cubes in the same database? It seems logical that it should not be possible to create multiple perspectives with the same name inside the same cube, but would that also be true if the perspectives were in different cubes in the same database? We didn’t take the time to actually try this out in class, but I surmised that it wasn’t possible. Afterwards, I tested this out once in the BI Development Studio (BIDS). I also tested by running the Deployment Utility against the .asdatabase file after I modified the file manually to add two perspectives with the same name in different cubes. Whether using BIDS or the utility, I was politely informed that the perspective name that I tried to reuse was already in use and stopped from continuing.

This behavior seems reasonable and shouldn’t surprise anyone. But perhaps a more interesting question that stems from the original question: why would you want multiple cubes in the same database? In AS2000, it was not uncommon to have multiple cubes in the same database because there was a one-to-one relationship between a fact table and a cube. Now in AS2005, you can have one cube with multiple fact tables (using measure groups) and now the one-to-one relationship is between a fact table and a measure group. The vision is now to have a “one cube fits all” solution – something I would never have believed I would find myself saying, having reeducated a lot of clients about the perils of having too much in a single cube.  To be honest, I haven’t pondered all of the ramifications of this new design approach, but as of yet can’t come up with an argument against having a single cube. I’m looking forward to following the certain debate to come regarding the pros and cons of One Cube.

So with everything in one cube, it can sure get confusing for an end user to navigate that cube. Enter perspectives to save the day. To my front-end tools of choice, the default perspective of the cube (which includes everything) appears side-by-side with any additional perspectives that were added to the cube. The idea of perspectives is to filter a cube to show only dimensions with measure groups that make sense together – whether oriented around subject matter or end user roles. It’s important to note that there is no security on perspectives - cube and dimension level security will be honored, but you can’t limit certain users/groups to certain perspectives. BOL explicitly states that perspectives are not intended to be part of the security architecture. By pointing this out, I provoked a stimulating conversation on the subject of security in the class. The bottom line is that the analogy between virtual cubes and perspectives falls apart when it comes to security. Be sure to consider the impact of this constraint when you plan the migration of your AS2000 cubes.

-- Stacia