Thursday, December 23, 2004

Thin Client OLAP tool


There's a new, free, thin client OLAP tool available for download from the microsoft.com site. It's developed by the Grape consortium in asia. I haven't had a chance to look at it very closely, and I don't think it works yet with Yukon, but still seems to warrant a look, particularly since it's now a Microsoft download. To find it, Google for "OLAP GrapeCity Microsoft" and look for the reference at microsoft.com.


- Reed


Monday, December 20, 2004

Using Scripts to Simulate High-level Write-Back


Data write-back in Analysis Services has always been a bit frustrating because it's ultimately limited to leaf-level storage. (Not that leaf-level data write-back is not extremely useful--think of the millions of budgets that are input by using Excel spreadsheets--leaf-level data input without the benefits of an OLAP engine.) At any rate, Yukon write-back is still going to be ultimately leaf-level. One option for efficient high-level write-back is to "allocate" the values for each level to a single "dummy" member. But there's another option that we used occasionally in AS2K that would be even easier to implement in Yukon.


In AS2K, you can create Calculated Cell expressions that change the value of a single cell. So you can simulate data write-back by writing a calculated cell that assigns the target value to the specific single-cell subcube.


In Yukon, you get the same effect as a Calculated Cell by simply executing a script assignment. So, in a hypothetical cube with two non-measure attributes--account, and period, if the user enters the value 1000 for March Net Profit, you could "store" that value by using the simple assignment statement


([March], [Net Profit], [Amount]) = 1000;


Over the course of a session, your code collects these assignment statements, and at the "commit" portion, exports them to a process that parses, allocates and loads into a processable fact table.


-- Reed


Wednesday, December 15, 2004

Writing back descriptive data


OLAP is fundamentally numeric. Through "official" data write-back, you can't write comment values to a cell. That doesn't mean your app can't do it on the side


Inside an app, you could capture the tuple of the cell that gets the comment, convert the tuple into a unique relational key, and store the comment with the tuple. You then do the reverse on query.


Emir pointed out that you could accomplish the same effect completely within the cube: create a "comments" dimension that begins essentially empty. Then use dimension write-back to create the unique key for a tuple and add it as a new dimension member, with the comment text as the "name" of the member.


You could then create an action on the cell that looks up the unique value--perhaps using a User Defined Function (UDF).



- Reed


Thursday, November 4, 2004

CrossJoin vs SubCube


In Yukon MDX, you can crossjoin two sets by simply enclosing them in parentheses. In a way, it's like using the CrossJoin function without using the word. In other words, you just list the sets as if they were parameters, but leave the function name off.


But that's not really what's happening. Parentheses form a subcube. A subcube is very similar to a crossjoin, but with slight differences. With the Crossjoin function, you can't have the same hierarchy appear more than once, but with a subcube (that is, with the parentheses alone), you can use a specific member of one or more hierarchies, and then use the Tree() function to fill in all the remaining attribute hierarchies.


Thus, the expression Crossjoin([Bikes],Tree([Product])) is invalid, because the Category attribute appears both in [Bikes] and in Tree([Product]), but the expression ([Bikes],Tree([Product])) is valid, because the subcube operator just "fills in" the unused hierarchies.


One interesting thing about the subcube notation is that it is identical to tuple notation, except that as a tuple, you use single-members instead of sets. The intriguing part about that is that a tuple really is the crossjoin of individual members, so the notational similarity really does match an underlying semantic similarity. Form follows function. The Bauhaus architects would be happy. (See, I learned something in Munich.)


- Reed


Monday, November 1, 2004

Chertsey is not London


[This post has nothing to do with Yukon, and should in no way be interpreted as my having being the least bit disappointed by my experience in, uh, London]


If you ever go to a training class at Microsoft in London, check to see whether it is really in Chertsey, which is where their new training facility is. (Apparently, the facility came along with the Great Plains acquisition.) And if you go to Chertsey, be aware that you are going to a charming little town out in Surrey, and don't expect to pop down to the west end for a show in the evening. And if you stay at the Hilton Cobham (which really is a nice place), expect to be out in the middle of the forest at a popular "hideaway" where the only way out is by car or taxi--an easy hour to Chertsey during rush hour.


Thursday, October 28, 2004

Use Hierarchies are Fake


One of the strongest points I tried to make while teaching about the new world of Yukon Analysis Services is that User Hierarchies are completely devoid of meaning--except insofar as they map to attribute hierarchies. One of the students had an interesting scenario, where she wanted to calculate something only at the product level, never at any higher level of a user hierarchy. Adding IIF structures to test for every possible user hierarchy seemed daunting--and unsupportable. So we tried a simple test.


We put a set of User Hieararchy members on Rows in a report, and then created calculated members to show the name of the current member of the attribute hierarchies. So on rows, we had [User Hierarchy].Members, but on columns displayed [Attribute Hierarchy].CurrentMember.UniqueName, etc. In all cases, the current member of the attribute hierarchy was the member mapped to from the user hierarchy.


This was perfect. In the MDX, we could test for [attribute hierarchy].currentmember.level.ordinal <> 0, and it wouldn't matter where we were in whatever user hierarchy.


- Reed


Tuesday, October 26, 2004

DTS Programming


Let's be honest. It's, er, challenging;to code against the DTS object model. But a lot of the reason is because DTS--oops, Integration Services--is not really a set of objects, but rather a development platform with two primary engines, the runtime engine (for control flow) and the pipeline engine (for data flow). All the tasks and transforms are "plug-ins" that somebody wrote to fit into one of the two main engines. Once you get that distinction clear, and pay close attention to when you are writing to the DTS engine, and when you are writing to the plug-in componen; much of the confusion goes away.


I started using a pseudo-hungarian naming convention where the last letter of the prefix was 'o' for the 'outer' or engine object, and 'i' for the 'inner' or plug-in object. Once I did that, I realized that there was a remarkable pattern going. All the communication with the 'outer' object is essentially the same, regardless of what you're creating, and the 'inner' object code is where the uniqueness of the plug-in reveals itself.


Perhaps that's why they decided to give DTS a fancy new name.


- Reed


Monday, October 25, 2004

SQL and MDX


I just finished delivering the first two rounds of the BI Development course--in Munich and London.


One of the most popular sections was a comparison between SQL and MDX queries. In it, I try to clarify a) how the query structures are similar, b) how they way they are interpreted is different, and c) how MDX is very powerful and extensible for reporting--given the modular nature of set functions.


This is not particularly new for Yukon, but in fact the UDM is a major feature of Yukon, and access to the UDM requires MDX queries, so it becomes increasingly relevant in Yukon.


- Reed


Tuesday, October 19, 2004

When is Data Mining not Data Mining


After I taught my nice, dry, boring exposition on Data Mining in Munich, Raman Kohli (the Ascend liason who accompanied me from Redmond to make sure I didn't screw up too badly), came in and suggested the idea that data mining should never be referred to as such. That you should talk only about the solutions--Market Basket Analysis, Trend Analysis, etc--and not even use the words 'data mining'. Now, I managed to ignore most of what Raman said to me, but this one actually clicked home. Each of the Data Mining models is actually built to solve a particular type of problem, or analysis, and 'data mining' just happens to be the engine or technique that populates the model with coefficients from actual data.


So, in London, I changed the title slide of the Data Mining section to 'Market Basket Analysis and other Cool Tools'. Raman thought I was making fun of him, which I wasn't (OK, maybe I was, but just a little).


-- Reed


Sunday, October 17, 2004

Limiting Delimiters


I had already noticed that Yukon MDX did not require braces around a single member set. So the fragment SELECT [Sales Amount] ON COLUMNS now works fine. What I didn't know until a student pointed it out today is that the MDX in a WITH MEMBER clause no longer needs apostrophes around the expression. So you can say With Member [Measures].[Test] as Sum(Product) instead of ... as 'Sum(Product)'. One appreciates those little things that make ones life simpler.


-- Reed


Thursday, October 14, 2004

Conditional Strings and Numbers


One of those minor but terribly annoying facts of MDX in AS2K was that the IIF function was really two functions, one of which could return a number and another that could return a string. Thus, you could never return "OK" if True and 2.134 if False.


 


It is hard to describe how euphoric I got just from discovering that you can now mix and match strings and numbers in an IIF function. Maybe I just need more sleep.


 


-- Reed


Sunday, October 10, 2004

Inverse Sets


In an earlier entry “Getting something out of nothing”, Scott explained his solution for using a separate measure group to copy only the desired values from a cube, leaving the rest null.


I had an idea that might be even simpler in some situations. Rather than create a measure group, create a new dimension--similar to a Scenario dimension--where all the values in the fact table have one value. Then you can simply scope on the second member, assigning the selected values from the first.


- Reed


Thursday, October 7, 2004

Willkommen!


Greetings from Munich. I have been teaching the first full delivery of the BI Development course. A few rough spots, but overall I think it's gone fairly well.

 

I managed to spend my whole first day here touring the city desperately searching for a US-German power socket adapter. I managed to make full use of trains, subways, trams, busses, and foot, all over the city, only to find what I needed back at the main train station. But it was a fun way to explore the city. Even funner was going with the elegant receptionist at the hotel down to the workshop in the basement to hack-saw up the adapter to allow my cell phone charger to fit.

 

-- Reed

Tuesday, October 5, 2004

Getting something out of nothing


I was working on a cube that where I wanted values visible for some cells and all other cells to be null. I wanted to define a subcube that would have visible values and apply any needed MDX. Then I would define the complement of this subcube and set the cell values to null. The problem was that while the subcube where I wanted values was easy to define, the complement of that subcube was very difficult to define


I next tried to create a calculated member that would be null and then use a SCOPE statement and an assignment statement to make the calculated member have values where I wanted to see data. The problem with this is that you can only use assignment statements to assign values on the physical space of the cube. You can not assign a calculated member a new value.


My solution was to create a new measure group and measure. In the measure group SQL statement I had “WHERE 1=2”. This means that the measure group will always be empty. But because the measure is a physical part of the cube, I can use a SCOPE statement and an assignment statement to apply MDX to the cells that I want to have values, setting the measure in the empty measure group equal to the measure in the measure group that contains data. The cells that I want null are already null in this new measure group. The final step is the set Visible=False in my original measure.


Scott


Sunday, October 3, 2004

Hurricanes


I just finished up the Reporting Services pre-conference, and the regular session at PASS before flying to Germany to do the first full-week delivery of the Ascend BI course.

 

Unfortunately, because Jeanne swept through Orlando, the airport was shut down from Saturday noon until Monday noon, so none of us were able to get to the pre-pre-conference session on Yukon. I flew to Pensacola on Sunday, thinking it would be easier to get to Orlando from there, only to find that my flight to Orlando was cancelled at the last minute because Jeanne had closed down Atlanta, preventing the crew from arriving.

 

So to avoid cancelling the Tuesday pre-conference session, I caught the 5:20 early flight out, arriving in Orlando with 20 minutes to spare before my pre-conference session began. The PASS organizers were a little nervous. The session was on Reporting Services--mostly 2000--and went quite well.

 

Hopefully, tomorrow I can make it over to see a bit of the closing day of Oktoberfest.

 

-- Reed

Sunday, July 11, 2004

UDM, MDX, and SQL


One of the major new features in Yukon BI is the Unified Dimensional Model, or UDM, and one of the big benefits of the UDM is the seamless integration of relational and multi-dimensional (MOLAP) data. The idea is that you define the data logically and then decide separately how parts should be physically implement--using purely relational storage, by adding explicit MOLAP aggregations, or by some cool new combinations. It's a great concept.


Most people I've talked to who have heard about the UDM in Yukon have interpreted the idea of “seamless integration“ to mean that they could arbitrarily write either SQL or MDX queries against the UDM. I have to admit that I am one of those who made that assumption. The reality is that to use the UDX you always (for all practical purposes) use MDX, regardless of whether the data is stored relationally or multi-dimensionally. When I first found out that accessing the UDM requires MDX, I was a bit, shall we say, disappointed. I'm a big fan of MDX, and know that there are a lot of upsides to using MDX for reporting, but it has always been plagued by a few serious “downsides“ that made it unusable in some situations, so it was hard to envision building all reporting queries using MDX.


But most of the downsides of MDX have actually been limitations of Analysis Services 2000. For example, it's easy in SQL to create a query to group by, say, Invoice Number, even if there are 10 million invoices. (Not smart, necessarily, but easy.) In AS2K (at least 32 bit), it was simply impossible to create that kind of query. But the limitation was actually how big the dimension could be, not the MDX query per se.


Other limitations that were part of MDX have been removed in Yukon. For example, in SQL, if the filter is WHERE Type In (1,2,3), but there is no type 2, the result set simply does not include item 2. In AS2K MDX, if you request Types 1,2,3 on rows, and there is no type 2, the query fail. This is simply fixed in Yukon.


Some other time, I'll wax eloquent about the upsides of MDX as a reporting language.


Reed