Friday, July 15, 2005

MDX Questions and Answers


Q. When someone creates a subcube in an MDX statement, where does AS store the subcube internally? Is there something like the relational tempdb for AS?


A. The subcube subquery is a logical construct, so the data itself is not really "stored" anywhere. Of course, when you execute the query against the subcube, the values are stored in server memory, the same as any query. The definition of the subcube subquery in effect filters the data retrieved by the query. It is not at all like making an extract into a tempdb. One of the big differences between SQL and MDX is that SQL needs a rowset to group. MDX reports create a structure and then retrieve the value for each cell independently (more or less). [Reed Jacobson]
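[Editor's note] To make the distinction concrete, here is a rough sketch of a subcube subquery. The cube, hierarchy, and member names are only illustrative (borrowed from an Adventure Works-style sample); the inner SELECT just scopes what the outer query can see, and nothing gets materialized into a temporary store:

SELECT [Measures].[Internet Sales Amount] ON COLUMNS,
       [Product].[Category].MEMBERS ON ROWS
FROM ( SELECT { [Date].[Calendar Year].[2004] } ON COLUMNS
       FROM [Adventure Works] )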


Q. When querying a subcube, is AS able to access and benefit from the underlying cube's aggregations?


A. Absolutely, because it is querying the underlying cube. [Reed Jacobson]


Q. Is there a way to analyze the MDX query execution plan and optimize it, like there is for relational queries? I think the answer is no; it is probably too hard to represent, since so much depends on how aggregations happened to be created vs. pre-set indexes on the relational side.


A. It's a totally different process from SQL. You can get clues about performance problems by using System Monitor or Profiler, but the whole approach is backwards from SQL. [Reed Jacobson]


Q. One student has encountered difficulty writing an MDX year-to-date comparison against the same year-to-date period in the previous year. For example, they know how to write a current year-to-date query that takes them from the beginning of the year through July 6th. However, is there an easy way to query for the previous year's information through July 6th of the previous year?

A. It is a little tricky to write the formula. I typically don't like to use a flag for "the current day" because that doesn't let a user look at the number "as of last Monday". But a combination of ParallelPeriod and PeriodsToDate is sufficient. Roughly, the formula is Sum(PeriodsToDate([Year], ParallelPeriod([Year], 1, [Date].CurrentMember))). [I'm doing this from memory and don't swear by the syntax.] 
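[Editor's note] Here is one way the whole query might look once the syntax is cleaned up (a sketch only, with all dimension, hierarchy, and measure names invented, so adjust them to whatever the cube actually uses):

WITH
MEMBER [Measures].[Year to Date] AS
   Sum( PeriodsToDate( [Date].[Calendar].[Calendar Year],
                       [Date].[Calendar].CurrentMember ),
        [Measures].[Sales Amount] )
MEMBER [Measures].[Prior Year to Date] AS
   Sum( PeriodsToDate( [Date].[Calendar].[Calendar Year],
           ParallelPeriod( [Date].[Calendar].[Calendar Year], 1,
                           [Date].[Calendar].CurrentMember ) ),
        [Measures].[Sales Amount] )
SELECT { [Measures].[Year to Date], [Measures].[Prior Year to Date] } ON COLUMNS
FROM [Sales]
WHERE ( [Date].[Calendar].[Date].[July 6, 2005] )

With the Where clause set to July 6, 2005, ParallelPeriod lands on July 6, 2004, and PeriodsToDate then sums from the start of 2004 through that date.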


My position is that MDX Queries are relatively simple (compared to SQL queries), but that MDX Formulas can be quite difficult. If anything, Yukon is likely to make formulas both more powerful and more difficult :-). [Reed Jacobson]


Wednesday, July 13, 2005

Drivers


Someone in the class was told that drivers for databases other than SQL Server (e.g., Oracle, DB2) would not be supported at RTM. Someone in the class from MSFT said this is not true. Do you know the answer off-hand?


[Stacia Misner] It depends on the context, I think. For example, in Reporting Services, there are certain data sources supported out of the box – SQL Server, Oracle, ODBC, and Analysis Services. Report Builder, though, will only support SQL Server and Analysis Services. That leaves Integration Services – each build since Beta 2 seemed to have a different collection of adapters, so I didn’t bother memorizing which ones; I wanted to see it stabilize first. Traditionally, DTS supported Oracle natively – I don’t recall whether SSIS tries to. At the very least, you can use OLE DB or ODBC, or write your own.


 


[Reed Jacobson] Oracle can be used as a data source from virtually anywhere. (I don't know anything about Report Builder, so if it will only access SQL Server, that's news to me--but it would still be easy to create an AS UDM that models the Oracle source--even as pure ROLAP--and then Report Builder should be able to get to it.) The only place I know you can't use Oracle is for the metadata databases for Reporting Services--those have to be stored in a SQL Server database.


Monday, July 11, 2005

AS Settings


Are there any AS settings that affect whether cubes handle query requests or cube processing using multiple processors? If not, how does AS make use of multiple processors?


 [Stacia Misner] The first place I’d look would be the server properties in SSMS. Right-click on the server name in the Object Explorer and take a look at the properties there.



[Reed Jacobson] For processing, it's an option in the Process dialog box (the one you get when you use the explicit Process command, either from BIDS or from SSMS). You can choose whether to go parallel or sequential. I believe that if you choose parallel, it makes intelligent use of the number of available processors. Regarding queries, my guess would be the same as Stacia's.


Monday, June 27, 2005

Questions and Answers


As classes have completed, we've had more questions rolling in. We'll attempt over the next few posts to provide some answers from both Reed and Stacia to give a couple points of view.


 


Question: In the SSIS demos we associate a config file with our package for the one time we are running it. What is the mechanism for permanently associating the config with a package? I am guessing it is something in SQL Server Agent, but I could not get my Agent service to start.


[Stacia Misner] When you use DTEXECUI, you can associate a config file by selecting it in a dialog box. You can then generate a command line with arguments for DTEXEC that also references that config file. You basically copy/paste that string into a scheduler like SQL Server Agent. I think the final demo in Module 2 should step you through at least the DTEXECUI piece. So there is no such thing as permanently associating a config file with a package in my mind. It’s what you choose to specify at the time of execution – whether through BIDS, DTEXEC, or DTEXECUI. The beauty is being able to use the same package with different config files as needed.



[Reed Jacobson] What Stacia says about adding an argument to specify the config file is definitely true. But I suspect (though I haven't tested) that if you put a config file in the same folder, enable configurations, and don't include a run-time argument, then it will use the config file. This would be worth confirming. Also, in Beta 2 (when we wrote the course) DTEXECUI would crash on almost anything, so unless Stacia added a lot to the demo scripts, the demo may not go as far as it can now. In one class, we built the argument string (using DTEXECUI) and then opened a command window and ran DTEXEC, pasting in the argument string from DTEXECUI. It ran beautifully. The next step would be to put the DTEXEC + arguments into an Agent job. The only thing you have to watch out for then is which "user" is running the job and with what credentials.


Friday, June 24, 2005

New Questions Part 2


1. On our training slide we say the OWC is not likely to be continued. Is this still the case? Or can people safely create OWC reports and not worry about the immediate future?



I wouldn't worry about the immediate future. It's still the best (& cheapest) option for many embedded applications--and they use it in BI Dev Studio. You probably do want to figure out a way to implement Create Subcube for filtering (to avoid strange behavior around the Where clause), but that's not hard--just grab the connection from the OWC and execute a Create Subcube command. That's what the filter at the top of the OWC in BIDS does to con the OWC into working properly.
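[Editor's note] For reference, the command you would execute on the OWC's connection looks roughly like this (cube, hierarchy, and member names are placeholders). Every query the control sends on that session afterwards sees only the filtered slice, and DROP SUBCUBE removes the filter:

CREATE SUBCUBE [Adventure Works] AS
   SELECT { [Geography].[Country].[United States] } ON COLUMNS
   FROM [Adventure Works]

-- later, when the filter is no longer wanted:
DROP SUBCUBE [Adventure Works]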


 


2. Why is the OWC not a thin client? Do people need to have Excel on their machine to use it?



It's not thin because it needs a) an ActiveX control and b) PTS (the PivotTable Service). That's about 4-5 MB of download. Users don't need Excel. If they don't have an Office 2003 license, they can drill and slice, but not create new views.


Wednesday, June 22, 2005

Here are some questions I was recently asked


1. Can you describe an example of when a calculated cell would be used?  What about an example having multiple passes?


In 2005, scripts basically replace calculated cells (although calc cells are still supported). They are alternate mechanisms for accomplishing the same internal behavior. In 2000, one simple way in which I used calc cells was to have an IIF where the value-if-true was a number and the value-if-false was a string—I just made a calc cell for the subcube that needed the string and applied the string as the value. (In 2005, you can have a different type for value-if-true than for value-if-false, so this is no longer necessary, but you could definitely use a calc cell (or script command) to apply a special format (e.g., a background color) to a specific subcube (e.g., members of a heading level).)
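[Editor's note] In 2005 script syntax, that formatting trick becomes a scoped assignment to cell properties. A minimal sketch, with hierarchy and level names made up for illustration (this would go in the cube's MDX script):

SCOPE ( [Account].[Accounts].[Heading].MEMBERS );
    Format_String ( This ) = "#,##0;(#,##0)";
    Back_Color ( This ) = 65535;   -- yellow
END SCOPE;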



More typically scripts are intended for financial calculations—one example would be if you need a different formula for net income in the current year (where taxes are not known) than in the prior years where they are finalized. A script command could also apply a security layer, since scripts are applied before security. So, for example, you could use a script command to replace the actual salary at the leaf level with the average salary of the group (without affecting higher level totals), and end users would not be able to see the detail.
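[Editor's note] As a sketch of the net income case (measure names and the year member are invented, and this assumes Net Income is a regular measure rather than a calculated member): scope the assignment to the year that is still open and leave the finalized years alone:

SCOPE ( [Measures].[Net Income] );
    SCOPE ( [Date].[Calendar Year].[2005] );
        This = [Measures].[Operating Income] * ( 1 - 0.35 );   -- estimated tax rate
    END SCOPE;
END SCOPE;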


A classic example of multiple passes is when you need to calculate a tax amount based on net income—which is itself net of tax. So you have a circularity that you need to iterate until the remaining difference is negligible.


2. What does the FLATTENED keyword in DMX do? Why does DM data need to be flattened?



I don’t know. I don’t write DMX, because the DTS Data Mining Query task can create it for you. It happens to put FLATTENED in, so I let it—in the same way that it puts in PREDICTION JOIN. I don’t have to understand DMX syntax in order to use Data Mining. And to me that’s a good thing. (But you have to put up with my soapbox that to use Data Mining effectively, you do have to understand at least enough statistics to know what would make a good source data set.)


3. What is the meaning of native vs. managed code?  How is that relevant to SSIS work?



Managed code is .Net and uses the Common Language Runtime (CLR). It’s “managed” in the sense that code that uses the CLR exclusively can be marked as “safe”, which means it cannot damage other processes. Older code—VB6, C, C++, etc.—is compiled directly to the operating system machine code (at least indirectly), so it is “Native”. SSIS (and its libraries) was written in C++, which means the code generated is “native” or “not .Net”. In order to access the libraries from a .Net language, a “wrapper” is needed—and provided. In principle, “native” code is “faster” than “managed” code—because there’s a layer of overhead missing, and that’s the ostensible reason SSIS is written using native code. But Microsoft encourages managed code (partly for ease of writing and partly for security), and even if you write a custom transformation, you can do it using C# or VB.Net and it will be “almost” as fast as if you used native code.


My own opinion is that it’s easier to use the SSIS libraries if you keep in mind that they are not fundamentally .Net code, so you can set aside some expectations of how a typical .Net object model works when using the SSIS libraries. Most .Net developers I’ve talked with find the SSIS library functions confusing (more confusing, for example, than Analysis Services’ AMO libraries).


4. Why do you say that there is no SSIS server?  What runs the packages?



When you write a SSIS custom transformation and are testing it in debug mode, the SSIS designer runs the package. You can debug an SSIS package without having the service running. But you can’t attach a DLL (your custom transformation) until the package is running. That’s why you need to put a breakpoint into the package and get it started before you can attach your DLL to the debug process. This is different from SSAS, where even during design time, you can’t browse data until you deploy (perhaps to a test server), and it’s the msmdsrv process that serves the data, so on the one hand, you can’t debug your SSAS User Defined Function DLL without starting the SSAS Service, but on the other hand, you can attach your DLL to the msmdsrv process without having to pre-set a breakpoint in the cube somewhere.


It was my understanding (in the Beta 2 timeframe) that the SSIS Server service did not need to be running in order to run the package (using DTEXEC or DTEXECUI), and that the service allowed you to a) kill running jobs if necessary and b) manage storage of packages either in MSDB or a “relative” file system. But back then, I had a hard time keeping DTEXECUI from crashing anyway. In the more recent CTP builds, where DTEXEC actually works, it appears that the service does need to be running in order to execute a package, but I’m not 100% sure of that.


5. In the DM slides you discuss how, in statistics, you create the model first and then look at the data, while in DM it is the reverse. Could you please provide a brief example so I can get a better sense of what this means?



In the statistics class I took as part of my MBA program, we had to build regression models. We had to first think of logical relationships, second create a mathematical model to represent the relationship, and finally test the model by running data through it. When we tried to shortcut the process and run regressions against the data in search of good correlation scores (and then made up a logical “reason” afterwards), the professor accused us of “data mining”. The problem with “data mining” was (and is) that you can find correlations in historical data that may not have predictive value. (As your mutual fund prospectus states: “Past performance is no indication of future results” or as the slide says, quoting my statistics professor “Correlation does not imply causation.”)


Data mining models today do exactly what we were told not to do back in the late Jurassic period when I was in B-school. They look for correlations in the data and create coefficients for the models based on that data. One big difference is that the models are themselves designed to look for specific types of correlation (note for example Association model for Market Basket analysis), but it is still important to test the model (perhaps by splitting the available historical data in half) and to think about the logical basis for the correlations.


6. In the old AS2K course we created ROLAP aggs and went out to the db to see them.  This was great for helping people to understand what an aggregation is.  I tried doing this in our AS Core demo in Yukon, but could not see the aggs.  Do you know how to do this?



I couldn’t get ROLAP aggregation tables either, but I haven’t tried since Beta 2. I agree that they were useful for instructional purposes, but not for production—so I would have mixed feelings if it really is not possible in RTM.


7. When discussing the idea of the UDM, a key characteristic mentioned is that it can decide whether or not to pull data from the cubes or the relational db. Does this decision process occur in any situation when proactive caching has not been enabled?



It really doesn’t have a lot to do with proactive caching. When you create a dimension or measure group, you can specify (in the “UDM”) whether storage should be ROLAP or MOLAP. That’s what makes the fundamental decision.


It is true that when you enable pro-active caching, it’s possible for even a query against a MOLAP measure group to get redirected to the relational source, but I can’t imagine that being desirable in very many situations. My fundamental belief about query speed is that consistency is even more important than speed. So if a query takes 2 seconds 95% of the time, but occasionally takes 20 minutes, that’s worse than taking 4 seconds 100% of the time. That’s one of the problems I have with (data driven) proactive caching on a measure group—if it increases the risk of fluctuation in query speed, I won’t want to use it (very often). If you can force it to keep using the old MOLAP cache until the new one gets completed, that would make it more acceptable to me. My informal surveys lead me to believe that it’s a relatively small percentage of applications that require the level of “real time” update that would require data driven proactive caching on measure groups.


8. Is the UDM really just a fancy/new term for a cube? Or is there a real difference between a cube and a UDM?



My understanding is that the goal in 2005 is to be able to put the entire data model into one logical cube, in which case cube = UDM. If, for some reason, you split the data into multiple cubes, I would think that the combination of cubes would = UDM. I have been trying to find a case where it is truly impossible to put an entire data model into a single cube. The only examples I’ve seen have been cases where it could have been put into a single cube. In the April CTP, Adventure Works (the SSAS model) has a separate cube for some data mining models. I don’t know whether that is permanent and/or necessary.


[This part is a total tangent from your question, but whatever…] The real situation where I think you would want to separate the data model into multiple cubes is one where a large enterprise wants to have regional servers with data specific to each region, and then a corporate server with high-level aggregations from all the regional servers. I don’t think this would be possible to model with a single cube or even with multiple cubes in a single UDM—because you don’t want all the data on a single server; you want it distributed. I think you would need to create two logical models ( = UDM = Cube), one for the region and one for the corporate, and create several physical instances of one (one for each region) and one instance for the corporate summary. But this would not allow for seamless browsing from a single client PivotTable. It might be interesting to think through whether this scenario would be possible with a single UDM, using multiple distributed partitions for the regions—but I haven’t gotten that far yet.


9. Is there a way for an attribute hierarchy to be excluded from the "real" cube? For some reason I thought I saw somewhere that there is, but I am not seeing how you could have an attribute hierarchy that is not in the cube.



I’m not quite sure what you mean. If you want to exclude a dimension from some measure groups while keeping it in other measure groups, you do that in Dimension Usage. If you want an attribute to be part of a dimension, but don’t want it to be a structural part (of any measure group), then you set AttributeHierarchyEnabled = False for that one attribute.


If a dimension is associated with a measure group, the granularity attribute and all attributes that are (directly or indirectly) member properties of the granularity attribute are associated with that measure group. You can make an attribute unavailable for physical storage (i.e., for inclusion in an aggregation) by setting the AggregationUsage property of the attribute (in the cube definition) to None—but that (unfortunately) applies to all instances of the attribute in the entire cube, not just a specific measure group (or better, a specific partition).


You could use Perspectives (or Security) to make an attribute hierarchy invisible, but that is logical, not physical, so it would still be part of the “real” cube. I think it’s better to think of dimensions  


10. If there are 30 attribute hierarchies in a dimension, is that like having 30 dimensions in the old AS2K?  Is there tremendous data explosion with something like that?



It’s more like having 30 levels in AS2K because (by default) in AS2K each level was included in the structure of the data partition and was available for inclusion in an aggregation. In AS2K, a cube with a lot of levels would require a lot of time for designing aggregations, even if the size of the final cube wasn’t that big, just because AS has to consider all the possible permutations when designing aggregations for a given amount of disk space. In 2005, it’s a lot easier to have tons of attributes than it was in 2K to have tons of levels, which is why so many attributes are left out of the right column of the aggregation design wizard—just to keep the number of permutations manageable. Data explosion is still limited by the aggregations you design (i.e., Max size of data limit in Aggregation Wizard). The place I can see a potential increase is in the size of a single data partition “row”. In AS2K, a data partition row included a key for each level in the cube. If the same is true in 2005, the row for a measure group data partition could be a lot bigger because there might be a lot more attribute hierarchies than there were levels in 2K. But there is no documentation (for either 2K or 2005) on the internal data structure, and I haven’t tried hacking the files to figure out the structure (yet) in 2005.


11. Why are min and max considered to be pseudo-additive measures?



My definition of pseudo-additive is that it has all the same benefits as Sum (i.e., can take advantage of strategic aggregations), but is not technically addition. Min and Max definitely can be included in strategic aggregation tables, but they are not technically addition. I just used to loosely talk about “additive measures” as including anything that could have strategic aggregations, but too many people would get confused about referring to Min, Max, weighted average, etc, as “additive” so I made up “pseudo-additive” as a way to say the same thing without being technically incorrect.


12. Can you please explain the purpose of the xml.defaults statement in the AMO code set? What is the outcome of setting this to true vs. false?



Do you mean the CaptureXML property? Setting that to True means that an Update statement generates the XML (as when creating an AS project in Visual Studio), while setting it to False means that an Update statement makes the change on the deployed database (as when attaching to a database from Visual Studio—which I think is generally a very bad idea).


13. How would you explain why AS was completely re-written?



In one sense, the underlying conception is the same—the way strategic aggregations work. So to say it’s completely re-architected is a slight overstatement.


But to answer your question, AS2K dimensions were completely founded on levels in a hierarchy. That is a very rigid structure. One example I like is Calendar vs. Fiscal hierarchies. While the Year and Quarter levels may be different, often the Month and Day levels are identical between the two hierarchies. In AS2K, this was simply impossible to model properly. Another example is with non-orthogonal attributes—such as Size vs. Color: both are attributes of a Product, but they don’t form any kind of natural hierarchy, so in AS2K there was no way to recognize any relationships that might actually exist (for example, that size 44 exists only for Red and White, but not for Black or Gray).


In 2005, a dimension is founded on a leaf-level key with additional attributes of that key. Those attributes may or may not form hierarchical relationships. This is a very flexible structure. You can easily model both the Calendar/Fiscal and the Color/Size problems I mentioned earlier. With that flexibility comes some additional complexity, but based on the changes in Apr CTP vs Beta 2, many of those complexities are likely to be handled by default, rather than by manual work.


 - Reed



 


Tuesday, May 10, 2005

SSIS vs Bulk Load


Based on 2000 experience with DTS, a lot of people prefer set operations or bulk load over transformations. One big benefit of SSIS transformations is that you can execute several operations in one pass--which you can't do with set operations. And if you use the SQL Server Destination adapter (and the native SQL connection manager), you get speeds much faster than OLE DB--in fact, you can get virtually identical speed to bulk load. One thing to watch out for is that when using the SQL Server Destination, the package must run on the same machine as the destination database.


Reed


Wednesday, May 4, 2005

When is Ragged Right right for Fixed-length files?


It's natural to expect that if you have a fixed-length file, you should choose the SSIS Fixed Length file type. But Fixed Length is really for binary files, where there are no CRLF characters at the end of the lines. If you have a fixed-length text file and want the CRLF delimiter stripped off, use the Ragged Right flat file type and, for the final column, specify CRLF as the delimiter.


- Reed


Monday, May 2, 2005

Diagrams are Back!


One of the most (ahem) active threads in the SQL 2005 beta newsgroups was discussion around the loss of the trusty database diagramming tool, formerly found in SQL Enterprise Manager. I am happy to report that as of the April Community Technical Preview build, diagrams are back in the SQL Management Studio. Even though Microsoft has stated that the primary emphasis for diagramming will be with both Whidbey and with Visio Enterprise Architect, it’s nice to have this often used functionality returned to the tool-set that many of us use day to day to create or understand Star or Snowflake Schemas. While there are many tools that offer more extensive diagramming feature sets, the integration and ubiquity of the Database Diagramming Tool makes it invaluable. Personally, I use it to visually understand the layout of Star and Snowflake schemas that I am becoming familiar with. I also use it to create prototype relational databases for proofs of concepts.


Feature-wise, it seems to be much the same as the previous version. I’ve noticed two differences worth mentioning (so far):


1)  The procedure for adding related tables is slightly different. In the SQL 2000 version this was accomplished through the add-table dialog; now it is accomplished by selecting a table in the diagram and clicking the “Add Related Tables” toolbar icon.


2)  The ability to open tables is no longer part of the diagramming tools. That functionality is available as part of SSMS, just not directly from within the Diagramming Tool as in the SQL 2000 version.


Kudos to the SQL Server Development Team for incorporating our feedback around the Diagramming Tool.


- Tom Huguelet


 


Tuesday, April 26, 2005

DTS Service Benefit


I've sometimes wondered what the real benefits of the DTS Service are--aside from managing the server storage and monitoring jobs. I just found out that the DTS Service caches the enumeration of components. Turning on the service keeps the enumerations from having to happen each time you run the package. For some people this has made a difference of orders of magnitude in execution speed.


- Reed


Wednesday, April 6, 2005

Link to Unknown Member Blog


Christian Wade from Conchango has an excellent blog entry on issues related to the Unknown Member. He makes a compelling case for handling unknown members in the ETL whenever possible, and describes an excellent way of keeping clear which fact records need to be reprocessed.

 


 

- Reed

 

Thursday, February 24, 2005

Hybrid MOLAP/ROLAP dimensions


Some dimensions—if you push them to their physical extreme, something we never did in 2000—get very, very large at the bottom. As a simple example, you could have 50M customers. In 32-bit AS2K you would never dream of making that a dimension—or at least not a MOLAP dimension—and you wouldn’t want to make it into a ROLAP dimension either, because that would eliminate any MOLAP data storage anywhere in the cube. In 2005, I can easily imagine a situation where I would want the most aggregated attributes (Country, State, City) of a user hierarchy to be MOLAP—so that they can support good MOLAP aggregations—and the bottom attributes (such as Customer, or even Order ID) as ROLAP, so they don’t require MOLAP processing.


The developers imagined it as well, but apparently couldn’t get it to work in the Yukon time frame. So the whole dimension has to be either MOLAP or ROLAP. Then again, my experience with large ROLAP dimensions used with a MOLAP cube is very limited (which is to say, not yet existent), so it may not be all that much of an issue.


If it turns out that a truly “hybrid” dimension is desirable, there are a couple of ways I can think of for creating the hybrid. One is to split the dimension in two, with the higher-level (MOLAP) one a reference dimension by way of the lower-level (ROLAP) one. Another is to create a many-to-many dimension to connect the two. It will be interesting to experiment with different alternatives in specific scenarios.


-- Reed


Wednesday, February 23, 2005

Drill Through and Fact Dimensions


Fact Dimensions did not work in Beta 2. At the time I assumed that they were just “degenerate” dimensions, so that you could create dimensions directly from low-cardinality values stored directly in the fact table—such as whether a row was part of a promotion or something like that. I also assumed (without ever really looking at it closely) that Drill Through was basically the same as in AS 2000—a SQL Query retrieval of the rows that contribute to a specific cell.



In the Dec CTP, they fixed fact dimensions—and it turns out that they are completely different from what I had thought. A fact dimension is essentially using the fact table itself as a dimension—with multiple possible attributes—and it is the key to getting Drill Through to work.


Suppose that you had a PO number as a field in the fact table, but didn’t want to include that in any dimension. In 2000, you could retrieve the PO directly from the fact table via Drill Through and a SQL Query. In 2005, all Drill-Through requests go to the UDM (which actually seems internally consistent with the concept of the UDM). But that means that any column from the fact table that you want returned in the Drill Through result has to be part of a dimension. Hence the Fact dimension. First add a primary key to the fact table. (You can, in principle, use a combination of columns to get a unique key, but I think it’s good practice—and I’ve had better success—just adding an AutoNumber primary key column.) You then create a dimension using the fact table primary key as the key, and add any other columns from the fact table (such as the PO number) as attributes. Then make the dimension ROLAP storage. Once you add the dimension to the cube (and probably hide it from the users), you can include any of its attributes in the Drill Through Action definition. It actually worked quite well for us. Good luck.
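[Editor's note] One way to test the result without wiring up a client action is the DRILLTHROUGH statement in an MDX query window. A rough sketch, with invented cube, measure, and member names (the columns that come back are whatever the drillthrough definition and fact dimension expose):

DRILLTHROUGH MAXROWS 100
SELECT { ( [Date].[Calendar].[Date].[July 6, 2005],
           [Measures].[Sales Amount] ) } ON COLUMNS
FROM [Sales]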


--Reed


Tuesday, February 22, 2005

Proactive Caching


Another question that came up at the Issaquah Ascend class was related to Proactive Caching. Specifically, where is it stored? In case you haven’t heard about it yet, this is one of the interesting new features of Analysis Services 2005 that really changes maintenance strategies from the way you currently keep cubes up-to-date in AS2000. It’s a huge step towards real-time OLAP without the traditional issues associated with ROLAP storage. With proactive caching, you have MOLAP storage just as you always did with AS2000, but now you also have relational storage. (So unlike AS2000, where in theory you can toss your source database and run solely off of the cube – assuming you never wanted to update it again – you must maintain a relationship to the relational source in 2005 in order for proactive caching to work).


As changes to the relational source occur, updates to the MOLAP cache also occur—eventually (based on how you configure triggering events and frequency). However, as with AS2000, you still need to provide a query for the partition that correctly identifies new rows in the relational source. In theory, you could have two versions of the MOLAP cache concurrently - the cache responding to queries and the cache that is being rebuilt to conform to the changes in the relational source. When the rebuild is complete, the obsolete cache goes away.


- Stacia


Tuesday, February 1, 2005

Gaining Perspective on Perspectives


One of the questions that came up in the Ascend class held in (rainy) Issaquah in January: what happens if you have the same perspective name across multiple cubes in the same database? It seems logical that it should not be possible to create multiple perspectives with the same name inside the same cube, but would that also be true if the perspectives were in different cubes in the same database? We didn’t take the time to actually try this out in class, but I surmised that it wasn’t possible. Afterwards, I tested this out in the BI Development Studio (BIDS). I also tested by running the Deployment Utility against the .asdatabase file after I modified the file manually to add two perspectives with the same name in different cubes. Whether using BIDS or the utility, I was politely informed that the perspective name I tried to reuse was already in use, and I was stopped from continuing.



This behavior seems reasonable and shouldn’t surprise anyone. But perhaps a more interesting question stems from the original one: why would you want multiple cubes in the same database? In AS2000, it was not uncommon to have multiple cubes in the same database because there was a one-to-one relationship between a fact table and a cube. Now in AS2005, you can have one cube with multiple fact tables (using measure groups), and the one-to-one relationship is between a fact table and a measure group. The vision is now to have a “one cube fits all” solution – something I would never have believed I would find myself saying, having reeducated a lot of clients about the perils of having too much in a single cube. To be honest, I haven’t pondered all of the ramifications of this new design approach, but as yet I can’t come up with an argument against having a single cube. I’m looking forward to following the certain debate to come regarding the pros and cons of One Cube.



So with everything in one cube, it can sure get confusing for an end user to navigate that cube. Enter perspectives to save the day. In my front-end tools of choice, the default perspective of the cube (which includes everything) appears side-by-side with any additional perspectives that were added to the cube. The idea of perspectives is to filter a cube to show only the dimensions and measure groups that make sense together – whether oriented around subject matter or end-user roles. It’s important to note that there is no security on perspectives - cube- and dimension-level security will be honored, but you can’t limit certain users/groups to certain perspectives. BOL explicitly states that perspectives are not intended to be part of the security architecture. By pointing this out, I provoked a stimulating conversation on the subject of security in the class. The bottom line is that the analogy between virtual cubes and perspectives falls apart when it comes to security. Be sure to consider the impact of this constraint when you plan the migration of your AS2000 cubes.



-- Stacia


Thursday, January 27, 2005

Attributes queried separately


You may notice that when AS2005 processes a dimension, it creates a separate query for each attribute. One reason for doing this is that it allows you to incorporate data from different data sources. It will, however, sometimes query for multiple attributes--particularly at the leaf level--and in a snowflaked dimension, that may result in SQL Join clauses.

What it is really doing is creating a single query for all columns "related" to one attribute. That would include the Key, the Name, and all Member Properties. In a generic (manually created) dimension, all attributes are defined as member properties of the key attribute. In the sample AdventureWorks database Product dimension, you can see a join at the leaf level between Product and Subcategory. The Category member property, however, is moved from the key attribute up to the Subcategory attribute, so you should see the join in the query for Subcategory.


The developers told me that one of the ways IntelliCube (or whatever its name ends up being) will assign member properties is based on which table of a snowflaked dimension they come from (and, if I remember correctly, that portion of IntelliCube already works in Beta 2). You can, of course, manually move the member properties as needed. I would think that if a member property causes a join in a properly snowflaked dimension, it would indicate that you should move the member property to a higher level--to the attribute that is the PK of the higher table.


-- Reed