查看: 2342|回复: 1

[转载] Pivot Query(行列转换)TOM专家高级编程

[复制链接]
论坛徽章:
47
马上加薪
日期:2014-02-19 11:55:142011新春纪念徽章
日期:2011-01-25 15:42:332011新春纪念徽章
日期:2011-01-25 15:42:152011新春纪念徽章
日期:2011-01-25 15:41:502011新春纪念徽章
日期:2011-01-25 15:41:012010新春纪念徽章
日期:2010-03-01 11:20:512010年世界杯参赛球队:日本
日期:2010-02-26 11:04:222010新春纪念徽章
日期:2010-01-04 08:33:08祖国60周年纪念徽章
日期:2009-10-09 08:28:00生肖徽章2007版:牛
日期:2009-09-10 11:14:59
跳转到指定楼层
1#
发表于 2006-7-18 17:57 | 只看该作者 回帖奖励 |倒序浏览 |阅读模式
Pivot Query
[php]
A pivot query is when you want to take some data such as:

C1        C2      C3
-----     -----   ------
a1        b1      x1
a1        b1      x2
a1        b1      x3


and you would like to display in the following format:

C1        C2      C3(1)   C3(2)  C3(3)
-----     -----   ------  -----  ----
a1        b1      x1      x2     x3


This turns rows into columns. For example taking the distinct jobs within a department and making them be columns so the output would look like:

DEPTNO     JOB_1     JOB_2     JOB_3
---------- --------- --------- ---------
        10 CLERK     MANAGER   PRESIDENT
        20 ANALYST   ANALYST   CLERK
        30 CLERK     MANAGER   SALESMAN

instead of this:

DEPTNO     JOB
---------- ---------
        10 CLERK
        10 MANAGER
        10 PRESIDENT
        20 ANALYST
        20 CLERK
        20 MANAGER
        30 CLERK
        30 MANAGER
        30 SALESMAN

I'm going to show two examples for pivots. The first will be another implementation of the preceding question. The second shows how to pivot any resultset in a generic fashion and gives you a template for doing so.

In the first case, let's say you wanted to show the top 3 salary earners in each department as columns. The query needs to return exactly 1 row per department and the row would have 4 columns; the DEPTNO, the name of the highest paid employee in the department, the name of the next highest paid, and so on. Using this new functionality - this is almost easy (before these functions - this was virtually impossible):

ops$tkyte@DEV816> select deptno,
  2             max(decode(seq,1,ename,null)) highest_paid,
  3             max(decode(seq,2,ename,null)) second_highest,
  4             max(decode(seq,3,ename,null)) third_highest
  5    from (SELECT deptno, ename,
  6                 row_number() OVER
  7                     (PARTITION BY deptno
  8                      ORDER BY sal desc NULLS LAST) seq
  9          FROM emp)
10    where seq <= 3
11    group by deptno
12  /
   
    DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
---------- ---------- ---------- ----------
        10 KING       CLARK      MILLER
        20 SCOTT      FORD       JONES
        30 BLAKE      ALLEN      TURNER

This simply created an inner resultset that had a sequence assigned to employees by department number in order of salary. The decode in the outer query keeps only rows with sequences 1, 2, or 3 and assigns them to the correct 'column'. The GROUP BY gets rid of the redundant rows and we are left with our collapsed result. It may be easier to understand what I mean by that if you see the resultset without the GROUP BY and MAX:

scott@TKYTE816> select deptno,
  2    max(decode(seq,1,ename,null)) highest_paid,
  3    max(decode(seq,2,ename,null)) second_highest,
  4    max(decode(seq,3,ename,null)) third_highest
  5  from (select deptno, ename,
  6        row_number() over
  7          (partition by deptno
  8           order by sal desc nulls last)
  9        seq from emp)
10  where seq <= 3
11  group by deptno
12  /
   
    DEPTNO HIGHEST_PA SECOND_HIG THIRD_HIGH
---------- ---------- ---------- ----------
        10 KING
        10            CLARK
        10                       MILLER
   
        20 SCOTT
        20            FORD
        20                       JONES
   
        30 ALLEN
        30            BLAKE
        30                       MARTIN
   
9 rows selected.

The MAX aggregate function will be applied by the GROUP BY column DEPTNO. In any given DEPTNO above only one row will have a non-null value for HIGHTEST_PAID, the remaining rows in that group will always be NULL. The MAX function will pick out the non-null row and keep that for us. Hence, the group by and MAX will 'collapse' our resultset, removing the NULL values from it and giving us what we want.

If you have a table T with columns C1, C2 and you would like to get a result like:

C1     C2(1)   C2(2)  ….  C2(N)

where column C1 is to stay cross record (going down the page) and column C2 will be pivoted to be in record (going across the page), the values of C2 are to become columns instead of rows - you will generate a query of the form:

Select c1
  max(decode(rn,1,c2,null)) c2_1,
  max(decode(rn,2,c2,null)) c2_2,
  …
  max(decode(rn,N,c2,null)) c2_N
from (select c1, c2
      row_number() over (partition by C1
                         order by <something>
      rn from T
      <some predicate>
group by C1

In the above example, C1 was simply DEPTNO and C2 was ENAME. Since we ordered by SAL DESC, the first three columns we retrieved were the top three paid employees in that department (bearing in mind that if four people made the top three, we would of course lose one).

The second example is a more generic 'I want to pivot my resultset'. Here, instead of having a single column C1 to anchor on and a single column C2 to pivot - we'll look at the more general case where C1 is a set of columns as is C2. As it turns out, this is very similar to the above. Suppose you want to report by JOB and DEPTNO the employees in that job and their salary. The report needs to have the employees going across the page as columns, however, not down the page, and the same with their salaries. Additionally, the employees need to appear from left to right in order of their salary. The steps would be:

scott@TKYTE816> select max(count(*)) from emp group by deptno, job;
   
MAX(COUNT(*))
-------------
            4

This tells us the number of columns; now we can generate the query:

scott@TKYTE816> select deptno, job,
  2    max(decode(rn, 1, ename, null)) ename_1,
  3    max(decode(rn, 1, sal, null)) sal_1,
  4    max(decode(rn, 2, ename, null)) ename_2,
  5    max(decode(rn, 2, sal, null)) sal_2,
  6    max(decode(rn, 3, ename, null)) ename_3,
  7    max(decode(rn, 3, sal, null)) sal_3,
  8    max(decode(rn, 4, ename, null)) ename_4,
  9    max(decode(rn, 4, sal, null)) sal_4
10  from (select deptno, job, ename, sal,
11        row_number() over (partition by deptno, job
12                           order by sal, ename)
13        rn from emp)
14  group by deptno, job
15  /
   
DEPTNO JOB       ENAME_1 SAL_1 ENAME_2   SAL_2 ENAME_3    SAL_3 ENAME_ SAL_4
------ --------- ------  ----- --------- ----- ---------- ----- ------ -----
    10 CLERK     MILLER   1300
    10 MANAGER   CLARK    2450
    10 PRESIDENT KING     5000
   
    20 ANALYST   FORD     3000 SCOTT      3000
    20 CLERK     SMITH     800 ADAMS      1100
    20 MANAGER   JONES    2975
   
    30 CLERK     JAMES      99
    30 MANAGER   BLAKE      99
    30 SALESMAN  ALLEN      99 MARTIN       99 TURNER        99 WARD      99
   
9 rows selected.

We inserted values of 99 into the salary column of all the employees in department 30, earlier in the chapter. To pivot a resultset, we can generalize further. If you have a set of columns C1, C2, C3, ... CN and you want to keep columns C1 ... Cx cross record and Cx+1 ... CN in record, the syntax of the query would be:

Select C1, C2, ... CX,
  max(decode(rn,1,C{X+1},null)) cx+1_1,...max(decode(rn,1,CN,null)) CN_1
  max(decode(rn,2,C{X+1},null)) cx+1_2,...max(decode(rn,1,CN,null)) CN_2
  ...
  max(decode(rn,N,c{X+1},null)) cx+1_N,...max(decode(rn,1,CN,null)) CN_N
from (select C1, C2, ... CN
      row_number() over (partition by C1, C2, ... CX
                         order by <something>
      rn from T
      <some predicate>
group by C1, C2, ... CX

In the example above, we used C1 as DEPTNO, C2 as JOB, C3 as ENAME, and C4 as SAL.

One other thing that we must know is the maximum number of rows per partition that we anticipate. This will dictate the number of columns we will be generating. SQL needs to know the number of columns, there is no way around that fact, and without this knowledge we will not be able to pivot. This leads us into the next, more generic, example of pivoting. If we do not know the number of total columns until runtime, we'll have to use dynamic SQL to deal with the fact that the SELECT list is variable. We can use PL/SQL to demonstrate how to do this; this will result in a generic routine that can be reused whenever you need a pivot. This routine will have the following specification:

scott@TKYTE816> create or replace package my_pkg
  2  as
  3    type refcursor is ref cursor;
  4    type array is table of varchar2(30);
  5      procedure pivot(p_max_cols       in number   default NULL,
  6                      p_max_cols_query in varchar2 default NULL,
  7                      p_query          in varchar2,
  8                      p_anchor         in array,
  9                      p_pivot          in array,
10                      p_cursor in out refcursor);
12  end;
   
Package created.

Here, you must input values for either P_MAX_COLS or P_MAX_COLS_QUERY. SQL needs to know the number of columns in a query and this parameter will allow us to build a query with the proper number of columns. The value you should send in here will be the output of a query similar to:

scott@TKYTE816> select max(count(*)) from emp group by deptno, job;

This is the count of the discrete values that are currently in rows, which we will put into columns. You can either use a query to obtain this number, or insert the number yourself if you already know it.

The P_QUERY parameter is simply the query that gathers your data together. Using the example shown earlier the query would be:

10  from (select deptno, job, ename, sal,
11        row_number() over (partition by deptno, job
12                           order by sal, ename)
13        rn from emp)

The next two inputs are arrays of column names. The P_ANCHOR tells us what columns will stay cross record (down the page) and P_PIVOT states the columns that will go in record (across the page). In our example from above, P_ANCHOR = ('DEPTNO', 'JOB') and P_PIVOT = ('ENAME','SAL'). Skipping over the implementation for a moment, the entire call put together might look like this:

scott@TKYTE816> variable x refcursor
   
scott@TKYTE816> set autoprint on
   
scott@TKYTE816> begin
  2  my_pkg.pivot
3  (p_max_cols_query => 'select max(count(*)) from emp
4                        group by deptno,job',
5   p_query => 'select deptno, job, ename, sal,
6   row_number() over (partition by deptno, job
7                      order by sal, ename)
8   rn from emp a',
9
10     p_anchor => my_pkg.array('DEPTNO','JOB'),
11     p_pivot  => my_pkg.array('ENAME', 'SAL'),
12     p_cursor => :x);
13  end;
   
PL/SQL procedure successfully completed.
   
DEPTNO JOB       ENAME_ SAL_1 ENAME_2    SAL_2 ENAME_3    SAL_3 ENAME_ SAL_4
------ --------- ------ ----- ---------- ----- ---------- ----- ------ -----
    10 CLERK     MILLER  1300
    10 MANAGER   CLARK   2450
    10 PRESIDENT KING    5000
    20 ANALYST   FORD    3000 SCOTT       3000
    20 CLERK     SMITH    800 ADAMS       1100
    20 MANAGER   JONES   2975
    30 CLERK     JAMES     99
    30 MANAGER   BLAKE     99
    30 SALESMAN  ALLEN     99 MARTIN        99 TURNER        99 WARD      99
   
9 rows selected.

As you can see, that dynamically rewrote our query using the generalized template we developed. The implementation of the package body is straightforward:

scott@TKYTE816> create or replace package body my_pkg
  2  as
  3
  4  procedure pivot(p_max_cols       in number   default null,
  5                  p_max_cols_query in varchar2 default null,
  6                  p_query          in varchar2,
  7                  p_anchor         in array,
  8                  p_pivot          in array,
  9                  p_cursor in out refcursor)
10  as
11      l_max_cols number;
12      l_query    long;
13      l_cnames   array;
14  begin
15    -- figure out the number of columns we must support
16    -- we either KNOW this or we have a query that can tell us
17    if (p_max_cols is not null)
18    then
19          l_max_cols := p_max_cols;
20    elsif (p_max_cols_query is not null)
21    then
22          execute immediate p_max_cols_query into l_max_cols;
23    else
24          raise_application_error(-20001, 'Cannot figure out max cols');
25    end if;
26
27
28    -- Now, construct the query that can answer the question for us...
29    -- start with the C1, C2, ... CX columns:
30
31    l_query := 'select ';
32    for i in 1 .. p_anchor.count
33
34    loop
35      l_query := l_query || p_anchor(i) || ',';
36    end loop;
37
38    -- Now add in the C{x+1}... CN columns to be pivoted:
39    -- the format is "max(decode(rn,1,C{X+1},null)) cx+1_1"
40
41    for i in 1 .. l_max_cols
42    loop
43      for j in 1 .. p_pivot.count
44        loop
45          l_query := l_query ||
46                     'max(decode(rn,'||i||','||
47                     p_pivot(j)||',null)) ' ||
48                     p_pivot(j) || '_' || i || ',';
49        end loop;
50    end loop;
51
52    -- Now just add in the original query
53
54    l_query := rtrim(l_query,',') || ' from (' || p_query || ') group by ';
55
56    -- and then the group by columns...
57
58    for i in 1 .. p_anchor.count
59    loop
60        l_query := l_query || p_anchor(i) || ',';
61    end loop;
62    l_query := rtrim(l_query,',');
63
64    -- and return it
65
66    execute immediate 'alter session set cursor_sharing=force';
67    open p_cursor for l_query;
68    execute immediate 'alter session set cursor_sharing=exact';
69    end;
70
71  end;
72  /
   
Package body created.

It only does a little string manipulation to rewrite the query and open a REF CURSOR dynamically. In the likely event the query had a predicate with constants and such in it, we set cursor sharing on and then back off for the parse of this query to facilitate bind variables (see the section on tuning for more information on that). Now we have a fully parsed query that is ready to be fetched from.

---------
[/php]
论坛徽章:
5
授权会员
日期:2005-10-30 17:05:33会员2006贡献徽章
日期:2006-04-17 13:46:34会员2007贡献徽章
日期:2007-09-26 18:42:10BLOG每日发帖之星
日期:2009-11-18 01:01:072010新春纪念徽章
日期:2010-03-01 11:19:07
2#
发表于 2006-7-19 08:58 | 只看该作者
最近在研究分析函数,帮你顶一下

使用道具 举报

回复

您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

TOP技术积分榜 社区积分榜 徽章 团队 统计 知识索引树 积分竞拍 文本模式 帮助
  ITPUB首页 | ITPUB论坛 | 数据库技术 | 企业信息化 | 开发技术 | 微软技术 | 软件工程与项目管理 | IBM技术园地 | 行业纵向讨论 | IT招聘 | IT文档
  ChinaUnix | ChinaUnix博客 | ChinaUnix论坛
CopyRight 1999-2011 itpub.net All Right Reserved. 北京盛拓优讯信息技术有限公司版权所有 联系我们 未成年人举报专区 
京ICP备16024965号-8  北京市公安局海淀分局网监中心备案编号:11010802021510 广播电视节目制作经营许可证:编号(京)字第1149号
  
快速回复 返回顶部 返回列表