SQL CUBE Usage - Examples

In the previous article, we discussed about the usage of ROLLUP statement in SQL with examples. We saw how ROLLUP modifies the output of a statement by running it against a test table. In this article, we will dwell on the usage of CUBE with examples.If you are jumping right into this article from the internet, I'd recommend you to read this article first, as it contains all the test tables that are required to run the examples presented in this post. 
Ok, we are ready to move on. We now have a test table SAM_STR, which is populated with numbers from 1000 to 9999 using DBMS_RANDOM.NUMBER method. Note that the results presented in this post and the previous one might differ, as I'm recreating the table for explanation purposes.

We then add two rows into this table, MAIN_CAT, [ categorizes the table based on a specified range of values ] and EVEN_ODD, which categorizes the rows based on 'E' for Even and 'O' for ODD. So, the table data looks like


SELECT MAIN_CAT,COUNT(*) FROM SAM_STR GROUP BY MAIN_CAT
A    299
B    261
C    256
D    184

At its usage, CUBE resembles the way ROLLUP is used, but the similarity ends here. In terms of the output, CUBE entirely brings a different set of results, if the number of columns used in GROUP BY clause is more than one. So, the statement

SELECT MAIN_CAT,COUNT(*) FROM SAM_STR GROUP BY CUBE(MAIN_CAT), yields the following output.

    1000 -> Total Row
A    299 -> Standard Output, Grouped by MAIN_CAT value of 'A'
B    261 ->Standard Output, Grouped by MAIN_CAT value of 'B'
C    256 ->Standard Output, Grouped by MAIN_CAT value of 'C'
D    184 -> Standard Output, Grouped by MAIN_CAT value of 'D'

Replacing CUBE with ROLLUP in the above SQL results in the same result set for me, but the order of the results are different.
Now to the twisting part. Consider the following SQL statement

SELECT MAIN_CAT, EVEN_ODD,COUNT(*) FROM SAM_STR GROUP BY CUBE(MAIN_CAT,EVEN_ODD)

The result would be divided into multiple sets as in the case of ROLLUP but we have more dimensions in the case of CUBE. The result sets are shown in the diagram below.

I have made my best attempts to group the rows which are shown in the result. We have a total of 15 rows in the result, which are grouped into four sets.

Green -> Which is equivalent to GROUP BY MAIN_CAT, EVEN_ODD. We have a total of 8 rows that matches this criteria. The equivalent SQL statement to get the 8 rows alone is our standard GROUP BY statement

SELECT MAIN_CAT, EVEN_ODD,COUNT(*) FROM SAM_STR GROUP BY MAIN_CAT,EVEN_ODD

Blue -> Which is the total number of rows at MAIN_CAT level.  Forgetting the EVEN_ODD column for a minute, the equivalent SQL statement for this would be [ 4 rows ]

SELECT MAIN_CAT, COUNT(*) FROM SAM_STR GROUP BY MAIN_CAT

Yellow -> Which is the total number of rows at EVEN_ODD level. Forgetting the MAIN_CAT column for a minute, the equivalent SQL statement for this would be [ 2 rows ]
SELECT EVEN_ODD, COUNT(*) FROM SAM_STR GROUP BY EVEN_ODD

We have accounted for 14 out of 15 rows in our result. The last row is the purple one.

Purple -> Which is simply the count(*) for the result. (note, we don't have any 'where' clause in our query and hence the result is simply the count(*) on our table. Or equivalently, [ 1 row ]

SELECT COUNT(*) FROM SAM_STR GROUP BY ' '

Now to the highlighting point. When we give CUBE, the table is scanned only once to get these 15 rows. The plan is provided below

--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |       |       |       |
|   1 |  SORT GROUP BY       |             |       |       |       |
|   2 |   GENERATE CUBE      |             |       |       |       |
|   3 |    SORT GROUP BY     |             |       |       |       |
|   4 |     TABLE ACCESS FULL| SAM_STR     |       |       |       |
--------------------------------------------------------------------


The same 15 row result can be achieved by the following SQL statement as well:-

SELECT MAIN_CAT, EVEN_ODD,COUNT(*) FROM SAM_STR GROUP BY MAIN_CAT,EVEN_ODD
UNION ALL
SELECT MAIN_CAT,'', COUNT(*) FROM SAM_STR GROUP BY MAIN_CAT
UNION ALL
SELECT ' ',EVEN_ODD, COUNT(*) FROM SAM_STR GROUP BY EVEN_ODD
UNION ALL
SELECT '','',COUNT(*) FROM SAM_STR GROUP BY ' '

But, this would scan the table four times.
--------------------------------------------------------------------
| Id  | Operation            |  Name       | Rows  | Bytes | Cost  |
--------------------------------------------------------------------
|   0 | SELECT STATEMENT     |             |       |       |       |
|   1 |  UNION-ALL           |             |       |       |       |
|   2 |   SORT GROUP BY      |             |       |       |       |
|   3 |    TABLE ACCESS FULL | SAM_STR     |       |       |       |
|   4 |   SORT GROUP BY      |             |       |       |       |
|   5 |    TABLE ACCESS FULL | SAM_STR     |       |       |       |
|   6 |   SORT GROUP BY      |             |       |       |       |
|   7 |    TABLE ACCESS FULL | SAM_STR     |       |       |       |
|   8 |   SORT GROUP BY      |             |       |       |       |
|   9 |    TABLE ACCESS FULL | SAM_STR     |       |       |       |

So, CUBE does this all for us in a single statement producing multiple super set of results. The general formula is:- A cube operation involving 'n' columns would result in 2^n result sets. Here, we had 2 columns and hence we got 4 result sets. (as provided by different colors)
Interchanging the columns in the query yields no difference for us in the result set. Thus, 

SELECT EVEN_ODD,MAIN_CAT, COUNT(*) FROM SAM_STR GROUP BY CUBE(EVEN_ODD,MAIN_CAT)

and

SELECT MAIN_CAT, EVEN_ODD,COUNT(*) FROM SAM_STR GROUP BY CUBE(MAIN_CAT,EVEN_ODD)
yield the same set of rows in the result. However, this is not the case with ROLLUP.

In the next article, we will discuss ROLLUP and CUBE in depth.



  


No comments:

Post a Comment