SQL解惑(第2版) 的一些样题

newkid · 发表于 2010-3-13 00:46

PUZZLE 3 THE ANESTHESIA PUZZLE
Leonard C. Medal came up with this nifty little problem many years ago.
Anesthesiologists administer anesthesia during surgeries in hospital
operating rooms. Information about each anesthesia procedure is
recorded in a table.
Procs
proc_id anest_name start_time end_time
======================================
10 'Baker' 08:00 11:00
20 'Baker' 09:00 13:00
30 'Dow' 09:00 15:30
40 'Dow' 08:00 13:30
50 'Dow' 10:00 11:30
60 'Dow' 12:30 13:30
70 'Dow' 13:30 14:30
80 'Dow' 18:00 19:00
Note that some of the times for a given anesthesiologist overlap. This
is not a mistake. Anesthesiologists, unlike surgeons, can move from one
operating room to another while surgeries are underway, checking on
each patient in turn, adjusting dosages, and leaving junior doctors and
anesthesia nurses to monitor the patients on a minute-to-minute basis.
Pay for the anesthesiologist is per procedure, but there’s a catch.
There is a sliding scale for remuneration for each procedure based on
the maximum count of simultaneous procedures that an
anesthesiologist has underway. The higher this count, the lower the
amount paid for the procedure.
For example, for procedure #10, at each instant during that
procedure Dr. Baker counts up how many total procedures in which he
was concurrently involved. This maximum count for procedure #10 is 2.
Based on the “concurrency” rules, Dr. Baker gets paid 75% of the fee for
procedure #10.
The problem then is to determine for each procedure over its
duration, the maximum, instantaneous count of procedures carried out
by the anesthesiologist.
We can derive the answer graphically at first to get a better
understanding of the problem.

Example 1 shows two overlapping procedures. The upper, Ganttlike
graph displays the elapsed time of the procedure we are evaluating
(the subject procedure) and all the doctor’s other procedures that
overlap in time.
The lower graph (Instantaneous Count of In-Progress Procedures)
shows how many procedures are underway at each moment. It helps to
think of a slide rule hairline moving from left to right over the Gantt
chart while each procedure start or end is plotted stepwise on the
lower chart.
We can see in Example 1 by inspection that the maximum is 2.
Example 1—Dr. Baker, Proc #10
Example 2 shows a more complex set of overlapping procedures, but
the principle is the same. The maximum, which happens twice, is 3.
Example 2—Dr. Dow, Proc #30
Note that the correct answer is not the number of overlapping
procedures but the maximum instantaneous count. The puzzle is how to
do this for each procedure using SQL. Here is the desired result for the
sample data:

proc_id max_inst_count
============================
10 2
20 2
30 3
40 3
50 3
60 3
70 2
80 1
Answer #1
The first step is to convert each procedure into two Events—a start and
end event—and put them in a view. The UNION operator appends the set
of end Events to the set of start Events. A (+1) indicates a start event and
a (-1) indicates an end event.
The WHERE clauses assure that the procedures compared overlap and
are for the same anesthesiologist. The NOT condition eliminates
procedures that do not overlap the subject procedure.
CREATE VIEW Events (proc_id, comparison_proc, anest_name,
event_time, event_type)
AS SELECT P1.proc_id,
P2.proc_id,
P1.anest_name,
P2.start_time,
+1
FROM Procs AS P1, Procs AS P2
WHERE P1.anest_name = P2.anest_name
AND NOT (P2.end_time <= P1.start_time
OR P2.start_time >= P1.end_time)
UNION
SELECT P1.proc_id,
P2.proc_id,
P1.anest_name,
P2.end_time,
-1 AS event_type
FROM Procs AS P1, Procs AS P2
WHERE P1.anest_name= P2.anest_name
AND NOT (P2.end_time <= P1.start_time
OR P2.start_time >= P1.end_time);

The result is this view shown here for procedure #10 only and sorted
by event_time for clarity. Notice that the same anesthesiologist can start
more than one procedure at the same time.
Events
proc_id comparison_proc anest_name event_time event_type
==========================================================
10 10 Baker 08:00 +1
10 20 Baker 09:00 +1
10 10 Baker 11:00 -1
10 20 Baker 13:00 -1
Now, for each set of Events with the same proc_id id, we can
compute for each event the sum of the event_types for Events that
occur earlier. This series of backward-looking sums gives us the values
represented by each step in the step charts.
SELECT E1.proc_id, E1.event_time,
(SELECT SUM(E2.event_type)
FROM Events AS E2
WHERE E2.proc_id = E1.proc_id
AND E2.event_time < E1.event_time)
AS instantaneous_count
FROM Events AS E1
ORDER BY E1.proc_id, E1.event_time;
The result of this query is shown here for procedure #10 only.
proc_id instantaneous_count
===============================
10 NULL
10 1
10 2
10 1
You could put this result set into a view called ConcurrentProcs,
then query the view to get the maximum instantaneous count for each
subject procedure using this statement:

SELECT proc_id,
MAX(instantaneous_count) AS max_inst
FROM ConcurrentProcs
GROUP BY proc_id;
But you could also extract the desired result directly from the Events
view. You could do this by merging the two select statements:
SELECT E1.proc_id,
MAX((SELECT SUM(E2.event_type)
FROM Events AS E2
WHERE E2.proc_id = E1.proc_id
AND E2.event_time < E1.event_time)) AS
max_inst_count
FROM Events AS E1
GROUP BY E1.proc_id;
However, it is illegal to put a subquery expression in an aggregate
function in SQL-92, so you are depending on a vendor extension.

Answer #2
Richard Romley’s single-query answer depends on the SQL-92 table
query expressions in the FROM clause, so that what had been a VIEW
can be folded into the query.
SELECT P3.proc_id, MAX(ConcurrentProcs.tally)
FROM (SELECT P1.anest_name, P1.start_time, COUNT(*)
FROM Procs AS P1
INNER JOIN
Procs AS P2
ON P1.anest_name= P2.anest
AND P2.start_time <= P1.start_time
AND P2.stop_time > P1.start_time
GROUP BY P1.anest_name, P1.start_time)
AS ConcurrentProcs(anest_name, start_time, tally)
INNER JOIN
Procs AS P3
ON ConcurrentProcs.anest_name= P3.anest
AND P3.start_time <= ConcurrentProcs.start_time
AND P3.stop_time > ConcurrentProcs.start_time
GROUP BY P3.proc_id;

Answer #3
This answer came from Lex van de Pol (aavdpol@hotmail.com) on June
9, 2000. The idea is to loop through all procedures (P1); for each
procedure P1 you look at procedures P2 where their start time lies in the
interval of procedure P2. For each start time you found of P2, count the
number of procedures (P3) that are ongoing on that time. Then, take the
maximum count for a certain procedure P1.
For doing this, Lex first created this view:
CREATE VIEW Vprocs (id1, id2, total)
AS SELECT P1.prc_id, P2.prc_id, COUNT(*)
FROM Procs AS P1, Procs AS P2, Procs AS P3
WHERE P2.ant_name = P1.ant_name
AND P3.ant_name = P1.ant_name
AND P1.prc_start <= P2.prc_start
AND P2.prc_start < P1.prc_end
AND P3.prc_start <= P2.prc_start
AND P2.prc_start < P3.prc_end
GROUP BY P1.prc_id, P2.prc_id;
He then took the maximum for each procedure P1:
SELECT id1 AS proc_id, MAX(total) AS max_inst_count
FROM Vprocs
GROUP BY id1;
The funny thing is, you do not need to look at the end times for
procedures P2.

Answer #4
Bert C. Hughes (bhughes@twincities.net) came up with a solution in
Microsoft ACCESS, a proprietary near-SQL language. Here is his code
translated into a single SQL statement:
SELECT P1.proc_id, P1.anest_name, MAX(E1.ecount) AS maxops
FROM Procs AS P1,
-- E1 is # of processes active at time for each
anesthetist
(SELECT P2.anest_name, P2.start_time, COUNT(*)
FROM Procs AS P1, Procs AS P2
WHERE P1.anest_name = P2.anest_name
AND P1.start_time <= P2.start_time
AND P1.end_time > P2.start_time
GROUP BY P2.anest_name, P2.start_time)
AS E1(anest_name, etime, ecount)
WHERE E1.anest_name= P1.anest_name
AND E1.etime >= P1.start_time
AND E1.etime < P1.end_time
GROUP BY P1.proc_id, P1.anest;

Answer #5
Another approach is to set up a Clock table, since you probably round
the billing to within a minute. That means we would have a table with
(24 hours * 60 minutes) = 1,440 rows per day, or 525,600 rows; a year’s
worth of scheduling. But you can also set up a VIEW for the current day:
SELECT X.anest_name, MAX(X.proc_tally)
FROM (SELECT P1.anest_name, COUNT(DISTINCT proc_id)
FROM Procs AS P1, Clock AS C
WHERE C1.clock_time BETWEEN P1.start_time
AND P1.end_time
GROUP BY P1.anest_name)
AS X(anest_name, proc_tally)
GROUP BY X.anest_name);
This is just another version of the Calendar auxiliary table. This kind
of table depends on a known granularity—Calendars work to the day,
and Clocks to the minute, usually. You also can create a VIEW that uses
a table of one day’s clock ticks stored in minutes and the system constant
CURRENT_DATE.
CREATE VIEW TodayClock (clock_time)
AS
SELECT CURRENT_DATE + ticks
FROM DayTicks;

题目大意：有一张手术时间表，每个手术指派一名麻醉师。同一名麻醉师服务的手术时间可以重叠。
对于每个手术，求出在时间轴上同时出现的重叠手术个数的最大值。这将被用于计算麻醉师的报酬。
比如手术10的最高重叠数是2（10和20）, 手术30的最高重叠数是3　（30/40/50, 或者30/40/60）,见附件插图。

CREATE TABLE Procs (proc_id NUMBER, anest_name VARCHAR2(20), start_time VARCHAR2(10), end_time VARCHAR2(10));

INSERT INTO PROCS VALUES (10,'Baker' ,'08:00','11:00');
INSERT INTO PROCS VALUES (20,'Baker' ,'09:00','13:00');
INSERT INTO PROCS VALUES (30,'Dow' ,'09:00','15:30');
INSERT INTO PROCS VALUES (40,'Dow' ,'08:00','13:30');
INSERT INTO PROCS VALUES (50,'Dow' ,'10:00','11:30');
INSERT INTO PROCS VALUES (60,'Dow' ,'12:30','13:30');
INSERT INTO PROCS VALUES (70,'Dow' ,'13:30','14:30');
INSERT INTO PROCS VALUES (80,'Dow' ,'18:00','19:00');

书中答案#1的ORACLE版本：
WITH Events AS (
SELECT P1.proc_id,
   P2.proc_id AS comparison_proc,
   P1.anest_name,
   P2.start_time event_time,
   +1 event_type
FROM Procs  P1, Procs  P2
WHERE P1.anest_name = P2.anest_name
   AND NOT (P2.end_time <= P1.start_time
   OR P2.start_time >= P1.end_time)
UNION ALL
SELECT P1.proc_id,
   P2.proc_id,
   P1.anest_name,
   P2.end_time,
   -1 AS event_type
FROM Procs  P1, Procs  P2
WHERE P1.anest_name = P2.anest_name
   AND NOT (P2.end_time <= P1.start_time
   OR P2.start_time >= P1.end_time)
)
,ConcurrentProcs AS (
SELECT E1.proc_id, E1.event_time,
   (SELECT SUM(E2.event_type)
   FROM Events  E2
   WHERE E2.proc_id = E1.proc_id
   AND E2.event_time < E1.event_time)
   AS instantaneous_count
FROM Events  E1
   )
SELECT proc_id,
MAX(instantaneous_count) AS max_inst
FROM ConcurrentProcs
GROUP BY proc_id
ORDER BY 1;

经我改良过的答案#1:
SELECT proc_id,MAX(cnt)
  FROM (SELECT proc_id,SUM(event_type) OVER (PARTITION BY proc_id ORDER BY event_time) AS cnt
      FROM (SELECT a.*
                  ,(CASE WHEN rn=1 THEN b.start_time ELSE b.end_time END) AS event_time
                  ,(CASE WHEN rn=1 THEN 1 ELSE -1 END) AS event_type
               FROM Procs a,Procs b,(SELECT ROWNUM rn FROM DUAL CONNECT BY ROWNUM<=2)
               WHERE a.anest_name = b.anest_name
                     AND a.end_time > b.start_time
                     AND a.start_time < b.end_time
            )
      WHERE event_time<end_time
   )
GROUP BY proc_id;

newkid · 发表于 2010-3-13 02:20

接上贴：
书中答案#2的ORACLE版本：这是一个相当优秀的答案
SELECT P3.proc_id, MAX(ConcurrentProcs.tally)
FROM (SELECT P1.anest_name, P1.start_time, COUNT(*) tally
   FROM Procs  P1
   INNER JOIN
   Procs  P2
   ON P1.anest_name= P2.anest_name
   AND P2.start_time <= P1.start_time
   AND P2.end_time > P1.start_time
   GROUP BY P1.anest_name, P1.start_time
   ) ConcurrentProcs
   INNER JOIN
   Procs P3
   ON ConcurrentProcs.anest_name= P3.anest_name
   AND P3.start_time <= ConcurrentProcs.start_time
   AND P3.end_time > ConcurrentProcs.start_time
GROUP BY P3.proc_id;

经我用分析函数改良过的答案#2: (认不出来了吧？)
SELECT proc_id,MAX(tally)
  FROM (SELECT p2.proc_id, COUNT(*) OVER(PARTITION BY p1.proc_id) tally
      FROM Procs  P1
            INNER JOIN
            Procs  P2
            ON P1.anest_name= P2.anest_name
               AND P2.start_time <= P1.start_time
               AND P2.end_time > P1.start_time
      )
GROUP BY proc_id;

#3仔细一看，和#2其实是很类似的，只不过把所有表连接放到同一层：
SELECT id1 AS proc_id, MAX(total) AS max_inst_count
  FROM (SELECT P1.proc_id id1, P2.proc_id id2, COUNT(*) total
      FROM Procs P1, Procs P2, Procs P3
      WHERE P2.anest_name = P1.anest_name
            AND P3.anest_name = P1.anest_name
            AND P1.start_time <= P2.start_time
            AND P2.start_time <  P1.end_time
            AND P3.start_time <= P2.start_time
            AND P2.start_time <  P3.end_time
      GROUP BY P1.proc_id, P2.proc_id
      ) Vprocs
GROUP BY id1;

#4其实和#2也是一样的：
SELECT P1.proc_id, P1.anest_name, MAX(E1.ecount) AS maxops
FROM Procs  P1,
   (SELECT P2.anest_name, P2.start_time etime, COUNT(*) ecount
   FROM Procs P1, Procs P2
   WHERE P1.anest_name = P2.anest_name
         AND P1.start_time <= P2.start_time
         AND P1.end_time > P2.start_time
   GROUP BY P2.anest_name, P2.start_time
   ) E1
WHERE E1.anest_name= P1.anest_name
   AND E1.etime >= P1.start_time
   AND E1.etime < P1.end_time
GROUP BY P1.proc_id, P1.anest_name;

#5有点意思，它的思路是把一天的时间打成碎片然后和手术表连接。可惜的是效率不怎么样。

作者给出的答案，按ORACLE写法实现如下，但其实是错的：
SELECT X.anest_name, MAX(X.proc_tally)
FROM (SELECT P1.anest_name, COUNT(DISTINCT proc_id) proc_tally
   FROM Procs P1,
         (SELECT TO_CHAR(TIMESTAMP '2008-1-1 00:00:00' + numtodsinterval(ROWNUM, 'minute' ),'HH24:MI') clock_time
            FROM DUAL
         CONNECT BY ROWNUM<=24*60-1) C1
   WHERE C1.clock_time BETWEEN P1.start_time AND P1.end_time
   GROUP BY P1.anest_name) X
GROUP BY X.anest_name;

按此思路经过我改正的答案：
SELECT X.proc_id, MAX(X.proc_tally)
FROM (SELECT P1.proc_id, COUNT(DISTINCT proc_id) OVER(PARTITION BY anest_name,clock_time) proc_tally
   FROM Procs P1,
         (SELECT TO_CHAR(TIMESTAMP '2008-1-1 00:00:00' + numtodsinterval(ROWNUM, 'minute' ),'HH24:MI') clock_time
            FROM DUAL
         CONNECT BY ROWNUM<=24*60-1) C1
   WHERE C1.clock_time >= P1.start_time AND C1.clock_time<P1.end_time
   ) X
GROUP BY X.proc_id;

[ 本帖最后由 newkid 于 2010-3-13 02:58 编辑 ]

newkid · 发表于 2010-3-13 06:36

根据英文版补充8楼的"谜题12　索赔状态"

书中答案#2的ORACLE版本：

SELECT E1.claim_id, C1.patient_name
   , (SELECT claim_seq
      FROM ClaimStatusCodes S1
      WHERE S1.claim_status = E1.claim_status
      ) claim_seq
  FROM LegalEvents  E1, Claims  C1
WHERE E1.claim_id = C1.claim_id
GROUP BY E1.claim_id, C1.patient_name, E1.claim_status
HAVING COUNT(*) = (SELECT COUNT(DISTINCT defendant_name)
                  FROM LegalEvents  E2
               WHERE E1.claim_id = E2.claim_id)
ORDER BY E1.claim_id;

作者的答案其实未完工，必须再加上一层：
SELECT claim_id, patient_name, MIN(claim_seq)
  FROM (SELECT E1.claim_id, C1.patient_name
            , (SELECT claim_seq
               FROM ClaimStatusCodes S1
               WHERE S1.claim_status = E1.claim_status
               ) claim_seq
         FROM LegalEvents  E1, Claims  C1
      WHERE E1.claim_id = C1.claim_id
      GROUP BY E1.claim_id, C1.patient_name, E1.claim_status
      HAVING COUNT(*) = (SELECT COUNT(DISTINCT defendant_name)
                           FROM LegalEvents  E2
                        WHERE E1.claim_id = E2.claim_id)
   )
GROUP BY claim_id, patient_name;

用此思路改良的版本，其实已经面目全非了：
SELECT cl.claim_id,cl.patient_name,MIN(claim_status) KEEP(DENSE_RANK FIRST ORDER BY claim_seq DESC)
  FROM (SELECT l.claim_id
            ,l.claim_status
            ,cd.claim_seq
            ,COUNT(*) OVER(PARTITION BY l.claim_id,l.claim_status) cnt_status
            ,COUNT(DISTINCT defendant_name) OVER(PARTITION BY claim_id) cnt_defendant
      FROM LegalEvents l, ClaimStatusCodes cd
      WHERE l.claim_status= cd.claim_status
      ) l
   ,claims cl
WHERE cnt_status = cnt_defendant
   AND l.claim_id = cl.claim_id
GROUP BY cl.claim_id,cl.patient_name;

#3在ORACLE运行不了:
SELECT claim_id, MAX(patient_name), MAX(T1.claim_status)
FROM ClaimStatusCodes  T1, Claims  C1
WHERE T1.claim_seq =
   (SELECT MAX(T2.claim_seq)
      FROM ClaimStatusCodes  T2, LegalEvents  E1
      WHERE T2.claim_status = E1.claim_status
            AND E1.claim_id = C1.claim_id
   GROUP BY E1.claim_id, defendant_name)
GROUP BY claim_id, T1.claim_seq
HAVING T1.claim_seq = MIN(T1.claim_seq)

Answer #4
This answer came from Francisco Moreno, and it is designed to avoid
subqueries by using the JOIN syntax of SQL-92.
Step 1
Insert a dummy row in the claim_status table:

INSERT INTO ClaimStatusCodes (claim_status, claim_status_desc,
claim_seq)
VALUES ('XX', 'Dummy', 5);

Step 2
The query:

SELECT C1.claim_id, C1.patient_name,
   CASE MIN(S1.claim_seq)  ------- 往前推一级即claim_seq-1
         WHEN 2 THEN 'AP'
         WHEN 3 THEN 'OR'
         WHEN 4 THEN 'SF'
         ELSE 'CL'
   END
FROM ( (Claims  C1
   INNER JOIN
   Defendants  D1
   ON C1.claim_id = D1.claim_id
   )
   CROSS JOIN
   ClaimStatusCodes S1
   )
   LEFT OUTER JOIN
   LegalEvents  E1
   ON C1.claim_id = E1.claim_id
   AND D1.defendant_name= E1.defendant_name
   AND S1.claim_status = E1.claim_status
WHERE E1.claim_id IS NULL -------- 数据中“没有”的那些状态
GROUP BY C1.claim_id, C1.patient_name;

这个#4思路十分怪异，一般人会从“已有”数据求答案，它是先凑出“没有”的那些状态，然后再倒过来推算“已有”的。
按此怪异思路，用10G的分区外连接改良如下：
SELECT C1.claim_id, C1.patient_name,
   CASE E2.claim_seq  ------- 往前推一级, 即claim_seq-1
         WHEN 2 THEN 'AP'
         WHEN 3 THEN 'OR'
         WHEN 4 THEN 'SF'
         ELSE 'CL'
   END
  FROM (SELECT E1.claim_id,MIN(S1.claim_seq) AS claim_seq
      FROM LegalEvents  E1 PARTITION BY (claim_id, defendant_name)
            RIGHT JOIN ClaimStatusCodes S1
            ON E1.claim_status = S1.claim_status
      WHERE E1.claim_status IS NULL
      GROUP BY E1.claim_id
      ) E2 JOIN Claims C1
   ON E2.claim_id = C1.claim_id;

sxlcom · 发表于 2010-3-13 19:20

佩服,值得学习

wabjtam123 · 发表于 2010-3-13 20:26

绝对的偶像！

QIvan_lee · 发表于 2010-3-13 21:57

感谢分享，顶～

newkid · 发表于 2010-3-14 03:53

谜题17 职业介绍

题目大意：
猎头手里有一些工作（JOB ORDERS）和一些候选人，现在要选出哪些人具备工作所要求的技能。

候选人及其拥有的技能:
CREATE TABLE CandidateSkills
(candidate_id INTEGER NOT NULL,
skill_code CHAR(15) NOT NULL,
PRIMARY KEY (candidate_id, skill_code));

INSERT INTO candidateskills VALUES(100, 'accounting') ;
INSERT INTO candidateskills VALUES(100, 'inventory')    ;
INSERT INTO candidateskills VALUES(100, 'manufacturing') ;
INSERT INTO candidateskills VALUES(200, 'accounting') ;
INSERT INTO candidateskills VALUES(200, 'inventory')    ;
INSERT INTO candidateskills VALUES(300, 'manufacturing') ;
INSERT INTO candidateskills VALUES(400, 'inventory')    ;
INSERT INTO candidateskills VALUES(400, 'manufacturing') ;
INSERT INTO candidateskills VALUES(500, 'accounting') ;
INSERT INTO candidateskills VALUES(500, 'manufacturing') ;

工作及其要求的技能：

CREATE TABLE JobOrders
(job_id INTEGER NOT NULL,
skill_group INTEGER NOT NULL,
skill_code CHAR(15) NOT NULL,
PRIMARY KEY (job_id, skill_group, skill_code));

同一个skill_group中的要求是“与”的关系，即必须全部拥有；不同skill_group之间是或的关系。

例子：
Job 1 = ('inventory' AND 'manufacturing') OR 'accounting'
Job 2 = ('inventory' AND 'manufacturing') OR ('accounting' AND 'manufacturing')
Job 3 = 'manufacturing'
Job 4 = ('inventory' AND 'manufacturing' AND 'accounting')

在表中的数据体现为：
This translates into:

INSERT INTO joborders VALUES(1, 1, 'inventory')    ;
INSERT INTO joborders VALUES(1, 1, 'manufacturing') ;
INSERT INTO joborders VALUES(1, 2, 'accounting') ;
INSERT INTO joborders VALUES(2, 1, 'inventory')    ;
INSERT INTO joborders VALUES(2, 1, 'manufacturing') ;
INSERT INTO joborders VALUES(2, 2, 'accounting') ;
INSERT INTO joborders VALUES(2, 2, 'manufacturing') ;
INSERT INTO joborders VALUES(3, 1, 'manufacturing') ;
INSERT INTO joborders VALUES(4, 1, 'inventory')    ;
INSERT INTO joborders VALUES(4, 1, 'manufacturing') ;
INSERT INTO joborders VALUES(4, 1, 'accounting') ;

answer#1 无实质内容。

ANSWER#2:
SELECT DISTINCT J1.job_id, C1.candidate_id
FROM  JobOrders  J1 INNER JOIN CandidateSkills  C1
   ON J1.skill_code = C1.skill_code
GROUP BY candidate_id, skill_group, job_id
HAVING COUNT(*) >= (SELECT COUNT(*)
                  FROM JobOrders  J2
                  WHERE J1.skill_group = J2.skill_group
                        AND J1.job_id = J2.job_id
                  );

里面用到了相关子查询。改用ORACLE分析函数：
SELECT DISTINCT J1.job_id, C1.candidate_id
FROM  (SELECT j.*
         ,COUNT(*) OVER(PARTITION BY job_id,skill_group) cnt
      FROM JobOrders j
      )  J1 INNER JOIN CandidateSkills  C1
   ON J1.skill_code = C1.skill_code
GROUP BY candidate_id, skill_group, job_id
HAVING COUNT(*) = MAX(cnt);
也可以这么写：
SELECT DISTINCT J1.job_id, C1.candidate_id
FROM  (SELECT j.*
         ,COUNT(*) OVER(PARTITION BY job_id,skill_group) cnt
      FROM JobOrders j
      )  J1 INNER JOIN CandidateSkills  C1
   ON J1.skill_code = C1.skill_code
GROUP BY candidate_id, skill_group, job_id,cnt
HAVING COUNT(*) = cnt;

Answer #3
Another answer came from Richard Romley at Smith Barney. He then
came up with an answer that does not involve a correlated subquery in
SQL-92, thus:
SELECT J1.job_id, C1.candidate_id
FROM (SELECT job_id, skill_grp, COUNT(*)
   FROM JobSkillRequirements
   GROUP BY job_id, skill_grp)
   AS J1(job_id, skill_grp, grp_cnt)
   CROSS JOIN
   (SELECT R1.job_id, R1.skill_grp, S1.candidate_id,
         COUNT(*)
   FROM JobSkillRequirements AS R1, CandidateSkills AS S1
   WHERE R1.skillid = S1.skillid
   GROUP BY R1.job_id, R1.skill_grp, S1.candidate_id)
   AS C1(job_id, skill_grp, candidate_id, candidate_cnt)
WHERE J1.job_id = C1.job_id
   AND J1.skill_grp = C1.skill_grp
   AND J1.grp_cnt = C1.candidate_cnt
GROUP BY J1.job_id, C1.candidate_id;

翻译成ORACLE语法：

SELECT J1.job_id, C1.candidate_id
FROM (SELECT job_id, skill_group, COUNT(*) grp_cnt
   FROM JobOrders
   GROUP BY job_id, skill_group) J1
   CROSS JOIN
   (SELECT R1.job_id, R1.skill_group, S1.candidate_id,
         COUNT(*) candidate_cnt
   FROM JobOrders R1, CandidateSkills S1
   WHERE R1.skill_code = S1.skill_code
   GROUP BY R1.job_id, R1.skill_group, S1.candidate_id) C1
WHERE J1.job_id = C1.job_id
   AND J1.skill_group = C1.skill_group
   AND J1.grp_cnt = C1.candidate_cnt
GROUP BY J1.job_id, C1.candidate_id;
其实和#2并无本质区别，还是用的COUNT相等这个判断方法。CROSS JOIN有些莫名其妙，明明下面有连接条件。

jvkojvko · 发表于 2010-3-14 18:31

顶一个

〇〇 · 发表于 2010-3-14 19:40

newkid · 发表于 2010-3-15 10:33

40 PERMUTATIONS 排列组合

题目大意：求出 N 个数的全排列

CREATE TABLE Elements
(i INTEGER NOT NULL PRIMARY KEY);
INSERT INTO Elements VALUES (1);
INSERT INTO Elements VALUES (2);
INSERT INTO Elements VALUES (3);
INSERT INTO Elements VALUES (4);
INSERT INTO Elements VALUES (5);
INSERT INTO Elements VALUES (6);
INSERT INTO Elements VALUES (7);

Answer #1
The obvious and horrible answer is:
SELECT E1.i, E2.i, E3.i, E4.i, E5.i, E6.i, E7.i
FROM Elements AS E1, Elements AS E2, Elements AS E3,
Elements AS E4, Elements AS E5, Elements AS E6,
Elements AS E7
WHERE E1.i NOT IN (E2.i, E3.i, E4.i, E5.i, E6.i, E7.i)
AND E2.i NOT IN (E1.i, E3.i, E4.i, E5.i, E6.i, E7.i)
AND E3.i NOT IN (E1.i, E2.i, E4.i, E5.i, E6.i, E7.i)
AND E4.i NOT IN (E1.i, E2.i, E3.i, E5.i, E6.i, E7.i)
AND E5.i NOT IN (E1.i, E2.i, E3.i, E4.i, E6.i, E7.i)
AND E6.i NOT IN (E1.i, E2.i, E3.i, E4.i, E5.i, E7.i)
AND E7.i NOT IN (E1.i, E2.i, E3.i, E4.i, E5.i, E6.i);

Answer #2
An improvement on this query can be made by adding one more
predicate to the WHERE clause:
SELECT E1.i, E2.i, E3.i, E4.i, E5.i, E6.i, E7.i
FROM Elements AS E1, Elements AS E2, Elements AS E3,
Elements AS E4, Elements AS E5, Elements AS E6,
Elements AS E7
WHERE (E1.i + E2.i + E3.i + E4.i + E5.i + E6.i + E7.i)
= 28
AND E1.i NOT IN (E2.i, E3.i, E4.i, E5.i, E6.i, E7.i)
AND E2.i NOT IN (E1.i, E3.i, E4.i, E5.i, E6.i, E7.i)
AND E3.i NOT IN (E1.i, E2.i, E4.i, E5.i, E6.i, E7.i)
AND E4.i NOT IN (E1.i, E2.i, E3.i, E5.i, E6.i, E7.i)
AND E5.i NOT IN (E1.i, E2.i, E3.i, E4.i, E6.i, E7.i)
AND E6.i NOT IN (E1.i, E2.i, E3.i, E4.i, E5.i, E7.i)
AND E7.i NOT IN (E1.i, E2.i, E3.i, E4.i, E5.i, E6.i);
作者说加上第一个谓词判断可以快速过滤掉一些组合，但我在ORACLE中比较结果并无改善。

Answer #3
But let's carry the totals trick one step further. First, redefine the
Elements table to have a weight for each element in the set:
CREATE TABLE Elements
(i INTEGER NOT NULL
,wgt INTEGER NOT NULL);

INSERT INTO Elements VALUES (1, 1) ;
INSERT INTO elements VALUES (2, 2) ;
INSERT INTO elements VALUES (3, 4) ;
INSERT INTO elements VALUES (4, 8) ;
INSERT INTO elements VALUES (5, 16);
INSERT INTO elements VALUES (6, 32);
INSERT INTO elements VALUES (7, 64);

SELECT E1.i, E2.i, E3.i, E4.i, E5.i, E6.i, E7.i
FROM Elements AS E1, Elements AS E2, Elements AS E3,
Elements AS E4, Elements AS E5, Elements AS E6,
Elements AS E7
WHERE (E1.wgt + E2.wgt + E3.wgt + E4.wgt
+ E5.wgt + E6.wgt + E7.wgt) = 127;

加上了以2的N次幂作为权重，为了快速判断各元素仅出现一次。
听起来不错，不知为何在ORACLE中行不通，SQL计划的成本和逻辑读比上个答案增加了十几倍！

利用这个加权的思路，我重新写了一种解法，把7表连接改为两两连接，每次都尽量过滤掉一些数据：
WITH r2 AS (
SELECT e1.i i1,e2.i i2,e1.wgt + e2.wgt as wgt
  FROM Elements e1, Elements e2
WHERE e1.i<>e2.i
)
,r3 AS (
SELECT r2.i1,r2.i2,e.i i3,r2.wgt+e.wgt as wgt
  FROM r2,Elements e
WHERE e.i NOT IN (r2.i1,r2.i2)
)
,r4 AS (
SELECT r21.i1,r21.i2,r22.i1 i3,r22.i2 i4,r21.wgt+r22.wgt as wgt
  FROM r2 r21,r2 r22
WHERE BITAND(r21.wgt,r22.wgt)=0
)
SELECT r3.i1,r3.i2,r3.i3,r4.i1 i4,r4.i2 i5,r4.i3 i6,r4.i4 i7
  FROM r3,r4
WHERE BITAND(r3.wgt,r4.wgt)=0
;


Answer #4
为了在ORACLE中实现这个答案，我们必须先写一个自定义函数，它的功能是在一个字符串中的指定位置塞入另一个串：
CREATE OR REPLACE FUNCTION STUFF(
   p_str IN VARCHAR2
   ,p_pos IN NUMBER
   ,p_len IN NUMBER
   ,p_new IN VARCHAR2
   ) RETURN VARCHAR2
AS
BEGIN
RETURN SUBSTR(p_str,1,p_pos-1)||p_new||SUBSTR(p_str,p_pos+p_len);
END STUFF;
/
在本例中，p_new总是为NULL, 实际上是在指定位置删除一个字符。

答案#4:
SELECT a || c
FROM (SELECT a || SUBSTR(c , i , 1) a,
         STUFF(c, i, 1, '') c
      FROM Elements,
         (SELECT a || SUBSTR(c , i , 1) a,
                  STUFF(c, i, 1, '') c
            FROM Elements,
                  (SELECT a || SUBSTR(c , i , 1) a,
                           STUFF(c, i, 1, '') c
                     FROM Elements,
                        (SELECT a || SUBSTR(c , i , 1) a,
                                 STUFF(c, i, 1, '') c
                           FROM Elements,
                                 (SELECT a || SUBSTR(c, i, 1) a,
                                          STUFF(c,i, 1, '') c
                                    FROM Elements,
                                          (SELECT SUBSTR('1234567', i, 1) a,
                                                STUFF('1234567',i, 1, '') c
                                             FROM Elements
                                          WHERE i <= 7) T1
                                    WHERE i <= 6) T2
                              WHERE i <= 5) T3
                     WHERE i <= 4) T4
            WHERE i <= 3) T5
      WHERE i <= 2) T6;
它也是把7表连接改为两两连接，但是很巧妙，每次连接中总有一个集合很小。

Answer #5
其实和上一种一样，但是我觉得可读性更差。
SELECT SUBSTR('1234567', a, 1) ||
   SUBSTR(STUFF('1234567', a, 1, ''), b, 1) ||
   SUBSTR(STUFF(STUFF('1234567', a, 1, ''), b, 1,''), c,1) ||
   SUBSTR(STUFF(STUFF(STUFF('1234567',a, 1, ''), b, 1, ''), c, 1, ''), d, 1) ||
   SUBSTR(STUFF(STUFF(STUFF(STUFF('1234567',a, 1, ''), b, 1, ''), c, 1, ''), d, 1, ''), e, 1) ||
   SUBSTR(STUFF(STUFF(STUFF(STUFF(STUFF('1234567',a, 1, ''), b, 1, ''), c, 1, ''), d, 1, ''), e, 1,''),f, 1) ||
   STUFF(STUFF(STUFF(STUFF(STUFF(STUFF('1234567',a, 1, ''), b, 1, ''), c, 1, ''), d, 1, ''), e, 1,''),f, 1, '')
  FROM (SELECT i a
      FROM Elements
      WHERE i <= 7) T1,
   (SELECT i b
      FROM Elements
      WHERE i <= 6) T2,
   (SELECT i c
      FROM Elements
      WHERE i <= 5) T3,
   (SELECT i d
      FROM Elements
      WHERE i <= 4) T4,
   (SELECT i e
      FROM Elements
      WHERE i <= 3) T5,
   (SELECT i f
      FROM Elements
      WHERE i <= 2) T6;

最后是我自己的方法，得力于10G的NOCYCLE：
SELECT SYS_CONNECT_BY_PATH(i,',')
  FROM Elements
WHERE LEVEL=7
CONNECT BY NOCYCLE i <> PRIOR i;

[精华] SQL解惑(第2版) 的一些样题

浏览过的版块