|
Pivot Query
[php]
A pivot query is when you want to take some data such as:
C1 C2 C3
----- ----- ------
a1 b1 x1
a1 b1 x2
a1 b1 x3
…
and you would like to display in the following format:
C1 C2 C3(1) C3(2) C3(3)
----- ----- ------ ----- ----
a1 b1 x1 x2 x3
…
This turns rows into columns. For example taking the distinct jobs within a department and making them be columns so the output would look like:
DEPTNO JOB_1 JOB_2 JOB_3
---------- --------- --------- ---------
10 CLERK MANAGER PRESIDENT
20 ANALYST ANALYST CLERK
30 CLERK MANAGER SALESMAN
instead of this:
DEPTNO JOB
---------- ---------
10 CLERK
10 MANAGER
10 PRESIDENT
20 ANALYST
20 CLERK
20 MANAGER
30 CLERK
30 MANAGER
30 SALESMAN
I'm going to show two examples for pivots. The first will be another implementation of the preceding question. The second shows how to pivot any resultset in a generic fashion and gives you a template for doing so.
In the first case, let's say you wanted to show the top 3 salary earners in each department as columns. The query needs to return exactly 1 row per department and the row would have 4 columns; the DEPTNO, the name of the highest paid employee in the department, the name of the next highest paid, and so on. Using this new functionality - this is almost easy (before these functions - this was virtually impossible):
ops$tkyte@DEV816> select deptno,
2 max(decode(seq,1,ename,null)) highest_paid,
3 max(decode(seq,2,ename,null)) second_highest,
4 max(decode(seq,3,ename,null)) third_highest
5 from (SELECT deptno, ename,
6 row_number() OVER
7 (PARTITION BY deptno
8 ORDER BY sal desc NULLS LAST) seq
9 FROM emp)
10 where seq <= 3
11 group by deptno
12 /
DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
---------- ---------- ---------- ----------
10 KING CLARK MILLER
20 SCOTT FORD JONES
30 BLAKE ALLEN TURNER
This simply created an inner resultset that had a sequence assigned to employees by department number in order of salary. The decode in the outer query keeps only rows with sequences 1, 2, or 3 and assigns them to the correct 'column'. The GROUP BY gets rid of the redundant rows and we are left with our collapsed result. It may be easier to understand what I mean by that if you see the resultset without the GROUP BY and MAX:
scott@TKYTE816> select deptno,
2 max(decode(seq,1,ename,null)) highest_paid,
3 max(decode(seq,2,ename,null)) second_highest,
4 max(decode(seq,3,ename,null)) third_highest
5 from (select deptno, ename,
6 row_number() over
7 (partition by deptno
8 order by sal desc nulls last)
9 seq from emp)
10 where seq <= 3
11 group by deptno
12 /
DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
---------- ---------- ---------- ----------
10 KING
10 CLARK
10 MILLER
20 SCOTT
20 FORD
20 JONES
30 ALLEN
30 BLAKE
30 MARTIN
9 rows selected.
The MAX aggregate function will be applied by the GROUP BY column DEPTNO. In any given DEPTNO above only one row will have a non-null value for HIGHTEST_PAID, the remaining rows in that group will always be NULL. The MAX function will pick out the non-null row and keep that for us. Hence, the group by and MAX will 'collapse' our resultset, removing the NULL values from it and giving us what we want.
If you have a table T with columns C1, C2 and you would like to get a result like:
C1 C2(1) C2(2) …. C2(N)
where column C1 is to stay cross record (going down the page) and column C2 will be pivoted to be in record (going across the page), the values of C2 are to become columns instead of rows - you will generate a query of the form:
Select c1
max(decode(rn,1,c2,null)) c2_1,
max(decode(rn,2,c2,null)) c2_2,
…
max(decode(rn,N,c2,null)) c2_N
from (select c1, c2
row_number() over (partition by C1
order by <something>
rn from T
<some predicate>
group by C1
In the above example, C1 was simply DEPTNO and C2 was ENAME. Since we ordered by SAL DESC, the first three columns we retrieved were the top three paid employees in that department (bearing in mind that if four people made the top three, we would of course lose one).
The second example is a more generic 'I want to pivot my resultset'. Here, instead of having a single column C1 to anchor on and a single column C2 to pivot - we'll look at the more general case where C1 is a set of columns as is C2. As it turns out, this is very similar to the above. Suppose you want to report by JOB and DEPTNO the employees in that job and their salary. The report needs to have the employees going across the page as columns, however, not down the page, and the same with their salaries. Additionally, the employees need to appear from left to right in order of their salary. The steps would be:
scott@TKYTE816> select max(count(*)) from emp group by deptno, job;
MAX(COUNT(*))
-------------
4
This tells us the number of columns; now we can generate the query:
scott@TKYTE816> select deptno, job,
2 max(decode(rn, 1, ename, null)) ename_1,
3 max(decode(rn, 1, sal, null)) sal_1,
4 max(decode(rn, 2, ename, null)) ename_2,
5 max(decode(rn, 2, sal, null)) sal_2,
6 max(decode(rn, 3, ename, null)) ename_3,
7 max(decode(rn, 3, sal, null)) sal_3,
8 max(decode(rn, 4, ename, null)) ename_4,
9 max(decode(rn, 4, sal, null)) sal_4
10 from (select deptno, job, ename, sal,
11 row_number() over (partition by deptno, job
12 order by sal, ename)
13 rn from emp)
14 group by deptno, job
15 /
DEPTNO JOB ENAME_1 SAL_1 ENAME_2 SAL_2 ENAME_3 SAL_3 ENAME_ SAL_4
------ --------- ------ ----- --------- ----- ---------- ----- ------ -----
10 CLERK MILLER 1300
10 MANAGER CLARK 2450
10 PRESIDENT KING 5000
20 ANALYST FORD 3000 SCOTT 3000
20 CLERK SMITH 800 ADAMS 1100
20 MANAGER JONES 2975
30 CLERK JAMES 99
30 MANAGER BLAKE 99
30 SALESMAN ALLEN 99 MARTIN 99 TURNER 99 WARD 99
9 rows selected.
We inserted values of 99 into the salary column of all the employees in department 30, earlier in the chapter. To pivot a resultset, we can generalize further. If you have a set of columns C1, C2, C3, ... CN and you want to keep columns C1 ... Cx cross record and Cx+1 ... CN in record, the syntax of the query would be:
Select C1, C2, ... CX,
max(decode(rn,1,C{X+1},null)) cx+1_1,...max(decode(rn,1,CN,null)) CN_1
max(decode(rn,2,C{X+1},null)) cx+1_2,...max(decode(rn,1,CN,null)) CN_2
...
max(decode(rn,N,c{X+1},null)) cx+1_N,...max(decode(rn,1,CN,null)) CN_N
from (select C1, C2, ... CN
row_number() over (partition by C1, C2, ... CX
order by <something>
rn from T
<some predicate>
group by C1, C2, ... CX
In the example above, we used C1 as DEPTNO, C2 as JOB, C3 as ENAME, and C4 as SAL.
One other thing that we must know is the maximum number of rows per partition that we anticipate. This will dictate the number of columns we will be generating. SQL needs to know the number of columns, there is no way around that fact, and without this knowledge we will not be able to pivot. This leads us into the next, more generic, example of pivoting. If we do not know the number of total columns until runtime, we'll have to use dynamic SQL to deal with the fact that the SELECT list is variable. We can use PL/SQL to demonstrate how to do this; this will result in a generic routine that can be reused whenever you need a pivot. This routine will have the following specification:
scott@TKYTE816> create or replace package my_pkg
2 as
3 type refcursor is ref cursor;
4 type array is table of varchar2(30);
5 procedure pivot(p_max_cols in number default NULL,
6 p_max_cols_query in varchar2 default NULL,
7 p_query in varchar2,
8 p_anchor in array,
9 p_pivot in array,
10 p_cursor in out refcursor);
12 end;
Package created.
Here, you must input values for either P_MAX_COLS or P_MAX_COLS_QUERY. SQL needs to know the number of columns in a query and this parameter will allow us to build a query with the proper number of columns. The value you should send in here will be the output of a query similar to:
scott@TKYTE816> select max(count(*)) from emp group by deptno, job;
This is the count of the discrete values that are currently in rows, which we will put into columns. You can either use a query to obtain this number, or insert the number yourself if you already know it.
The P_QUERY parameter is simply the query that gathers your data together. Using the example shown earlier the query would be:
10 from (select deptno, job, ename, sal,
11 row_number() over (partition by deptno, job
12 order by sal, ename)
13 rn from emp)
The next two inputs are arrays of column names. The P_ANCHOR tells us what columns will stay cross record (down the page) and P_PIVOT states the columns that will go in record (across the page). In our example from above, P_ANCHOR = ('DEPTNO', 'JOB') and P_PIVOT = ('ENAME','SAL'). Skipping over the implementation for a moment, the entire call put together might look like this:
scott@TKYTE816> variable x refcursor
scott@TKYTE816> set autoprint on
scott@TKYTE816> begin
2 my_pkg.pivot
3 (p_max_cols_query => 'select max(count(*)) from emp
4 group by deptno,job',
5 p_query => 'select deptno, job, ename, sal,
6 row_number() over (partition by deptno, job
7 order by sal, ename)
8 rn from emp a',
9
10 p_anchor => my_pkg.array('DEPTNO','JOB'),
11 p_pivot => my_pkg.array('ENAME', 'SAL'),
12 p_cursor => :x);
13 end;
PL/SQL procedure successfully completed.
DEPTNO JOB ENAME_ SAL_1 ENAME_2 SAL_2 ENAME_3 SAL_3 ENAME_ SAL_4
------ --------- ------ ----- ---------- ----- ---------- ----- ------ -----
10 CLERK MILLER 1300
10 MANAGER CLARK 2450
10 PRESIDENT KING 5000
20 ANALYST FORD 3000 SCOTT 3000
20 CLERK SMITH 800 ADAMS 1100
20 MANAGER JONES 2975
30 CLERK JAMES 99
30 MANAGER BLAKE 99
30 SALESMAN ALLEN 99 MARTIN 99 TURNER 99 WARD 99
9 rows selected.
As you can see, that dynamically rewrote our query using the generalized template we developed. The implementation of the package body is straightforward:
scott@TKYTE816> create or replace package body my_pkg
2 as
3
4 procedure pivot(p_max_cols in number default null,
5 p_max_cols_query in varchar2 default null,
6 p_query in varchar2,
7 p_anchor in array,
8 p_pivot in array,
9 p_cursor in out refcursor)
10 as
11 l_max_cols number;
12 l_query long;
13 l_cnames array;
14 begin
15 -- figure out the number of columns we must support
16 -- we either KNOW this or we have a query that can tell us
17 if (p_max_cols is not null)
18 then
19 l_max_cols := p_max_cols;
20 elsif (p_max_cols_query is not null)
21 then
22 execute immediate p_max_cols_query into l_max_cols;
23 else
24 raise_application_error(-20001, 'Cannot figure out max cols');
25 end if;
26
27
28 -- Now, construct the query that can answer the question for us...
29 -- start with the C1, C2, ... CX columns:
30
31 l_query := 'select ';
32 for i in 1 .. p_anchor.count
33
34 loop
35 l_query := l_query || p_anchor(i) || ',';
36 end loop;
37
38 -- Now add in the C{x+1}... CN columns to be pivoted:
39 -- the format is "max(decode(rn,1,C{X+1},null)) cx+1_1"
40
41 for i in 1 .. l_max_cols
42 loop
43 for j in 1 .. p_pivot.count
44 loop
45 l_query := l_query ||
46 'max(decode(rn,'||i||','||
47 p_pivot(j)||',null)) ' ||
48 p_pivot(j) || '_' || i || ',';
49 end loop;
50 end loop;
51
52 -- Now just add in the original query
53
54 l_query := rtrim(l_query,',') || ' from (' || p_query || ') group by ';
55
56 -- and then the group by columns...
57
58 for i in 1 .. p_anchor.count
59 loop
60 l_query := l_query || p_anchor(i) || ',';
61 end loop;
62 l_query := rtrim(l_query,',');
63
64 -- and return it
65
66 execute immediate 'alter session set cursor_sharing=force';
67 open p_cursor for l_query;
68 execute immediate 'alter session set cursor_sharing=exact';
69 end;
70
71 end;
72 /
Package body created.
It only does a little string manipulation to rewrite the query and open a REF CURSOR dynamically. In the likely event the query had a predicate with constants and such in it, we set cursor sharing on and then back off for the parse of this query to facilitate bind variables (see the section on tuning for more information on that). Now we have a fully parsed query that is ready to be fetched from.
---------
[/php] |
|