Knapsack problem: an elegant SQL discussion

newkid · posted on 2023-3-14 02:56

jlandzpa posted on 2023-3-13 09:34
I'm just here to learn

Boss, you're too funny. What could you possibly still need to learn? This place is where I come to do my square dancing now.

最闪亮滴星 · posted on 2023-3-14 12:29

jlandzpa posted on 2023-3-12 20:01
As soon as I saw the condition " > 150000", I knew the query isn't general.

Hi, guru. I was purely cutting corners there.

newkid · posted on 2023-3-15 00:33

最闪亮滴星 posted on 2023-3-14 12:29
Hi, guru. I was purely cutting corners there.

Have you figured it out yet? How should this code be changed?

jihuyao · posted on 2023-3-16 08:34

There are two cases: perfect grouping and imperfect grouping.

For example, a total of 70 is to be put into 7 groups of exactly 10 each. For this perfect grouping, one can first find 2 groups, one summing to 30 and one to 40. For the group with 30, find another two, 10 and 20. For the group with 40, find all possible 20s, which in turn find all possible 10s (each step divides a group into two). Cross joining all the 10s from the 30 and the 40 will generate at least one valid combination.
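A minimal Python sketch of the two-way splitting idea, assuming positive integers and that finding one valid grouping is enough (the cross join described above would enumerate all of them). `split_perfect` is a hypothetical helper name; with k = 7 it splits first into 3 + 4 groups, i.e. 30 and 40 in the example.

```python
from itertools import combinations


def split_perfect(values, k):
    """Split values into k groups of equal sum by repeated two-way splits.

    Returns a list of k groups, or None if no perfect grouping exists.
    Exhaustive, so only suitable for small inputs of positive integers.
    """
    if k == 1:
        return [list(values)]
    total = sum(values)
    if total % k:
        return None
    target = total // k
    left_k = k // 2                  # e.g. 7 groups -> 3 groups + 4 groups
    want = left_k * target           # e.g. 70 -> 30 on the left, 40 on the right
    idx = range(len(values))
    for r in range(1, len(values)):
        for combo in combinations(idx, r):
            if sum(values[i] for i in combo) != want:
                continue
            chosen = set(combo)
            left = [values[i] for i in combo]
            right = [values[i] for i in idx if i not in chosen]
            left_groups = split_perfect(left, left_k)     # split the "30" side further
            if left_groups is None:
                continue
            right_groups = split_perfect(right, k - left_k)  # split the "40" side further
            if right_groups is not None:
                return left_groups + right_groups
    return None
```

Each level only needs subsets hitting one target sum, which is what keeps the search smaller than trying all k-way assignments at once.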

For imperfect grouping, I would first use a partial random evolution approach to get a combination with small deviations, then use it to filter out all worse combinations as early as possible during the brute-force search.

One can try a series of different numbers to get a feel for the advantages of doing so.
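The "use a good grouping to cut off worse ones early" idea can be sketched as a branch-and-bound search in Python. Assumptions: positive values, and "worse" means a larger maximum group sum (the thread leaves the criterion open). The starting grouping can come from any heuristic; `refine_grouping` is a hypothetical name.

```python
def refine_grouping(values, initial_groups):
    """Brute-force search for a grouping whose largest sum beats initial_groups.

    The heuristic grouping's largest sum is the bound: any partial assignment
    that already reaches it is cut off immediately.
    """
    k = len(initial_groups)
    vals = sorted(values, reverse=True)      # big values first -> earlier cut-offs
    best_max = max(sum(g) for g in initial_groups)
    best_groups = [list(g) for g in initial_groups]
    sums = [0] * k
    groups = [[] for _ in range(k)]

    def place(i):
        nonlocal best_max, best_groups
        if i == len(vals):
            if max(sums) < best_max:
                best_max = max(sums)
                best_groups = [list(g) for g in groups]
            return
        seen = set()
        for j in range(k):
            if sums[j] in seen:
                continue                     # same partial sum -> symmetric subtree
            seen.add(sums[j])
            if sums[j] + vals[i] >= best_max:
                continue                     # cannot beat the best grouping so far
            sums[j] += vals[i]
            groups[j].append(vals[i])
            place(i + 1)
            sums[j] -= vals[i]
            groups[j].pop()

    place(0)
    return best_max, best_groups
```

The better the starting grouping, the lower the initial bound and the more branches are pruned; if it is already optimal, it is returned unchanged.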

Apfel · posted on 2023-3-17 15:03

I'm not very skilled, but here is a brute-force solution in PG syntax:

WITH RECURSIVE T1 AS (
SELECT datapath
, bytes::INT
, ROW_NUMBER()OVER(ORDER BY datapath) AS order1
, COUNT(*)OVER() AS all_datapath
FROM schema1.testpath2
),
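--Recursively find all combinations: order1 drives the recursion, order2 records the order1 of the starting element for use in the next loop
--all_datapath records the total number of files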
T2 AS (
SELECT ARRAY[datapath] AS datapath
, bytes
, order1
, order1 AS order2
, all_datapath
FROM T1
UNION ALL
SELECT T2.datapath||T1.datapath
, T2.bytes::INT + T1.bytes::INT
, T1.order1
, T2.order2
, T2.all_datapath
FROM T2
JOIN T1
ON ( T1.order1 > T2.order1 )
),
--Keep only the combinations totaling <= 16T
T3 AS (
SELECT datapath
, ARRAY[bytes] AS bytes
, all_datapath
, order2
FROM T2
WHERE bytes <= 160000
),
--In theory there is a better approach for this step; as written it produces many useless recursive results
T4 AS (
SELECT datapath
, datapath::TEXT AS GROUP1
, bytes
, all_datapath
, order2
FROM T3
WHERE order2 = 1
UNION ALL
SELECT T3.datapath||T4.datapath
, T4.GROUP1||T3.datapath::TEXT
, T4.bytes||T3.bytes
, T4.all_datapath
, T3.order2
FROM T4
JOIN T3
ON ( T3.order2 > T4.order2
AND NOT T3.datapath && T4.datapath
)
)
SELECT group1
, bytes
FROM T4
WHERE ARRAY_LENGTH(datapath, 1) = all_datapath
AND ARRAY_LENGTH(bytes,1) =
(
SELECT MIN(ARRAY_LENGTH(bytes,1))
FROM T4
WHERE ARRAY_LENGTH(datapath, 1) = all_datapath
);


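Running it on the original poster's data gives about 24 solutions.
I tried to post the query results, but the forum rejected them for containing illegal characters.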
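The same three stages as the query above (enumerate every combination that fits under the cap, stitch disjoint combinations together until all files are covered, keep the partitions with the fewest groups) can be sketched in Python for small inputs. `cap` plays the role of the 160000 limit; `min_group_partitions` is a hypothetical name. One deliberate difference: each next group must contain the lowest-numbered file not yet placed, so every partition is generated exactly once, which is one possible way to avoid the useless recursive results mentioned in the T4 comment.

```python
def min_group_partitions(sizes, cap):
    """Brute force: all partitions into groups of total <= cap, keep the fewest groups."""
    n = len(sizes)
    full = (1 << n) - 1
    # every non-empty subset (as a bitmask) whose total fits under the cap
    fits = [m for m in range(1, full + 1)
            if sum(sizes[i] for i in range(n) if m >> i & 1) <= cap]
    results = []

    def extend(covered, chosen):
        if covered == full:
            results.append(list(chosen))
            return
        free = ~covered & full
        lowest = free & -free          # the lowest-numbered file not yet placed
        for m in fits:
            # next group must contain that file and must not overlap placed files
            if m & lowest and not m & covered:
                chosen.append(m)
                extend(covered | m, chosen)
                chosen.pop()

    extend(0, [])
    if not results:
        return None, []                # some single file is larger than cap
    best = min(len(p) for p in results)
    as_lists = [[[i for i in range(n) if m >> i & 1] for m in p]
                for p in results if len(p) == best]
    return best, as_lists
```

This is still exponential in the number of files, like the SQL, so it is only a way to check small cases.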

newkid · posted on 2023-3-18 00:15

T3 can be merged into T2, which makes the recursion more efficient.
PG's array operator && is more convenient than the BITAND I used in Oracle.
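Both suggestions can be illustrated in Python, as a sketch assuming non-negative sizes (so a combination over the cap can never come back under it). The cap check sits inside the recursive step, the equivalent of moving T3's `bytes <= 160000` filter into T2, and each combination is a bitmask, so the overlap test is a single AND. `fitting_subsets` and `disjoint` are hypothetical names.

```python
def fitting_subsets(sizes, cap):
    """All index-ordered combinations with total <= cap, as (bitmask, total).

    The cap is checked inside the recursive step (the T3 filter merged into T2),
    so an over-limit combination is never extended.
    """
    out = []

    def grow(last, total, mask):
        out.append((mask, total))
        for j in range(last + 1, len(sizes)):
            if total + sizes[j] <= cap:      # prune during the recursion
                grow(j, total + sizes[j], mask | (1 << j))

    for i, s in enumerate(sizes):
        if s <= cap:
            grow(i, s, 1 << i)
    return out


def disjoint(a, b):
    """True when two bitmask combinations share no file."""
    return (a & b) == 0      # same role as NOT (x && y) in PG or BITAND(x, y) = 0 in Oracle
```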

jihuyao · posted on 2023-3-21 12:33

最闪亮滴星 posted on 2023-3-10 16:00
Test table: create table testpath2 as (select '/data/'as datas ,'file1' as datapath ,'9999' as bytes from ...

If an imperfect grouping is acceptable for the real request, my preferred approach (I don't have the link at hand) is as follows. Determine how many groups are needed. Sort all values in descending order. Put the n largest values (n = number of groups) into the groups first, one per group. Then put each following value into the group with the minimum sum at that step, until all values are processed. This approach is quite efficient for data distributions with a few large bills and many coins (similar to filling tanks with a few rocks and a lot of sand).
I quickly wrote one below to demonstrate it.

SQL> /
Enter value for gno: 5
Enter value for rowcnt: 10
old 1: with t0 as (select &gno gno, &rowcnt rowcnt from dual
new 1: with t0 as (select 5 gno, 10 rowcnt from dual

PATH_BY_ID                                        GROUP_VALUE
-------------------------------------------------- -----------
1,10                                                        11
5,6                                                         11
2,9                                                         11
4,7                                                         11
3,8                                                         11

SQL> /
Enter value for gno: 5
Enter value for rowcnt: 20
old 1: with t0 as (select &gno gno, &rowcnt rowcnt from dual
new 1: with t0 as (select 5 gno, 20 rowcnt from dual

PATH_BY_ID                                        GROUP_VALUE
-------------------------------------------------- -----------
3,8,11,20                                                   42
4,7,15,16                                                   42
2,9,12,19                                                   42
1,10,14,17                                                  42
5,6,13,18                                                   42

SQL>
SQL>
SQL> /
Enter value for gno: 5
Enter value for rowcnt: 40
old 1: with t0 as (select &gno gno, &rowcnt rowcnt from dual
new 1: with t0 as (select 5 gno, 40 rowcnt from dual

PATH_BY_ID                                        GROUP_VALUE
-------------------------------------------------- -----------
4,7,15,16,23,28,31,40                                      164
3,8,11,20,24,27,35,36                                      164
2,9,12,19,22,29,32,39                                      164
5,6,13,18,21,30,34,37                                      164
1,10,14,17,25,26,33,38                                     164

SQL>
SQL>

==========================================================

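with t0 as (select &gno gno, &rowcnt rowcnt from dual
-- gno = number of groups, rowcnt = number of test values (both are SQL*Plus substitution variables)
), t as (select rownum val from t0 connect by rownum<=rowcnt
-- test values 1..rowcnt, then id 1 = largest value
), tt as (select val, row_number() over (order by val desc) id from t
), tmp (rn, path, vals, lrn) as (
-- seed: the gno largest values start one group each, lrn = rank of the group by sum (descending)
select 1, to_char(id), val, id from tt, t0 where id <=t0.gno
union all
-- each step: only the group ranked gno (smallest sum) matches the next value, id = rn+gno,
-- the outer join leaves every other group unchanged, and row_number re-ranks the groups by sum
select rn+1, path||decode(id, null, null, ','||id), vals+nvl(val, 0),
row_number() over (order by  vals+nvl(val, 0) desc)
from tmp a, tt, t0
where rn+decode(lrn, gno, lrn) = tt.id (+)
and rn<=rowcnt-gno
)
-- after rowcnt-gno steps every value has been placed
select path path_by_id, vals group_value from tmp, t0
where rn=rowcnt-gno+1
/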
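For comparison, the same greedy procedure (seed each group with one of the n largest values, then always feed the group with the smallest sum) sketched in Python with a min-heap. `greedy_groups` is a hypothetical name; ties between equal sums are broken by group index here, so the group contents may differ from the SQL output even when the sums agree.

```python
import heapq


def greedy_groups(values, n_groups):
    """Seed n_groups with the largest values, then add each next value to the smallest group."""
    ordered = sorted(values, reverse=True)
    groups = [[v] for v in ordered[:n_groups]]          # the n largest values seed the groups
    heap = [(g[0], i) for i, g in enumerate(groups)]    # (current sum, group index)
    heapq.heapify(heap)
    for v in ordered[n_groups:]:
        total, i = heapq.heappop(heap)                  # group with the smallest sum so far
        groups[i].append(v)
        heapq.heappush(heap, (total + v, i))
    return [(g, sum(g)) for g in groups]


for g, s in greedy_groups(range(1, 21), 5):
    print(g, s)
```

With values 1..10, 1..20 and 1..40 in 5 groups, every group ends at 11, 42 and 164 respectively, the same sums as the SQL*Plus runs above.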

jihuyao · posted on 2023-3-21 12:55

I think it is possible to write SQL for a partial random evolution process for imperfect grouping, although it may look ugly. Basically, pick a value out of the group with the maximum sum and put it into the group with the minimum sum. Loop enough times and output the best grouping result (based on deviation or whatever criteria). One can pick several random initial group setups and run the process on them in parallel for better results. Statistically, if one runs it enough times, the best result will probably be reached.
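A minimal Python sketch of that evolution loop, assuming positive values and using max sum minus min sum ("spread") as the criterion. The rule for which value to move is an assumption: only values v with 0 < v < gap are moved, because then both affected groups end up strictly between the old minimum and maximum, so the sum of squared deviations always drops (by 2v(gap - v)). Restarts run sequentially here; `evolve` and its parameters are hypothetical.

```python
import random


def evolve(values, k, rounds=200, restarts=5, seed=0):
    """Random start + repeated 'move one value from the fullest to the emptiest group'.

    Returns (spread, groups) for the best restart, where spread = max sum - min sum.
    """
    rng = random.Random(seed)
    best = None
    for _ in range(restarts):                 # independent restarts (could run in parallel)
        groups = [[] for _ in range(k)]
        for v in values:
            groups[rng.randrange(k)].append(v)     # random initial grouping
        for _ in range(rounds):
            sums = [sum(g) for g in groups]
            hi = max(range(k), key=lambda j: sums[j])
            lo = min(range(k), key=lambda j: sums[j])
            gap = sums[hi] - sums[lo]
            # moving v helps only if 0 < v < gap: both groups end up strictly between
            movable = [v for v in groups[hi] if 0 < v < gap]
            if not movable:
                break                               # no improving move left
            v = rng.choice(movable)
            groups[hi].remove(v)
            groups[lo].append(v)
        sums = [sum(g) for g in groups]
        spread = max(sums) - min(sums)
        if best is None or spread < best[0]:
            best = (spread, [list(g) for g in groups])
    return best
```

Because every move strictly improves the grouping, a single run can stall in a local optimum; the random restarts are what give the "run it enough times" effect.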

For perfect grouping, a single recursive function is enough (hard to do in SQL). Pass an array or string with all the values as the input parameter, process it to get ONE group, then exclude the values in this group and pass the resulting array or string into the recursive function to pick the next group, until the last group is processed. I do have a demonstration arguing why to do it this way, but it needs some cleanup before I post it.
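The recursive function described above can be sketched in Python, assuming a list of positive integers and a known number of groups k. `peel_groups` is a hypothetical name; it is not the demonstration the post refers to. Only groups containing the first remaining value are tried, so the same grouping is never explored twice in a different order.

```python
from itertools import combinations


def peel_groups(values, k):
    """Pick ONE group with sum total/k, drop its values, recurse on the rest.

    Returns a list of k groups, or None when no perfect grouping exists.
    """
    if k == 0:
        return [] if not values else None
    if not values:
        return None
    total = sum(values)
    if total % k:
        return None
    target = total // k
    first, rest = values[0], values[1:]
    # Only groups containing values[0] are tried (canonical order).
    for r in range(len(rest) + 1):
        for combo in combinations(range(len(rest)), r):
            if first + sum(rest[i] for i in combo) != target:
                continue
            chosen = set(combo)
            remaining = [rest[i] for i in range(len(rest)) if i not in chosen]
            tail = peel_groups(remaining, k - 1)        # next group from what is left
            if tail is not None:
                return [[first] + [rest[i] for i in combo]] + tail
    return None
```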

newkid · posted on 2023-3-21 22:13

jihuyao posted on 2023-3-21 12:33
If it can be an imperfect grouping for real request. My favor is (do not have link at hand) as belo ...

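Your method is far too crude: it just adds values to the groups in turn, and it doesn't even check whether a group exceeds the limit.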
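One way to add the missing limit check to the greedy method, sketched in Python under the assumption that opening an extra group is acceptable when nothing fits. This is a worst-fit-decreasing style rule, a swapped-in technique not proposed in the thread: if even the least-loaded group cannot take the next value, no group can, so a new group is opened. `greedy_with_cap` is a hypothetical name.

```python
import heapq


def greedy_with_cap(values, n_groups, cap):
    """Least-loaded-group greedy that also enforces an upper limit per group."""
    ordered = sorted(values, reverse=True)
    if ordered and ordered[0] > cap:
        raise ValueError("a single value exceeds the cap")
    groups = [[] for _ in range(n_groups)]
    heap = [(0, i) for i in range(n_groups)]       # (current sum, group index)
    for v in ordered:
        if heap and heap[0][0] + v <= cap:
            total, i = heap[0]                     # least-loaded group still fits v
            groups[i].append(v)
            heapq.heapreplace(heap, (total + v, i))
        else:
            # every group is at least as full as the least-loaded one: open a new group
            groups.append([v])
            heapq.heappush(heap, (v, len(groups) - 1))
    return groups
```

With values 1..10, 5 groups and a limit of 11 this behaves exactly like the uncapped greedy; with a limit of 10 it opens a sixth group instead of overfilling one.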

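jihuyao · posted on 2023-3-25 10:52

newkid posted on 2023-3-21 22:13
Your method is far too crude: it just adds values to the groups in turn, and it doesn't even check whether a group exceeds the limit.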

It does not hurt to quickly try this simple way first for imperfect grouping. If it meets the requirement, why bother with other ways?

I am not sure what you mean by the upper limit. If one group already has a sum above the theoretical average, it can never be the group with the minimum sum, so all following (smaller) values will be added to other groups.
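The argument above can be turned into an explicit bound on the largest group sum. A short derivation, assuming positive values sorted as v_1 >= ... >= v_N, k groups with N > k, and the greedy exactly as described (seed with v_1..v_k, then always feed the smallest group). For the 1..40, 5-group run this gives max(40, 164 + 35) = 199, so the bound is loose but guarantees that any limit at or above it is never exceeded.

```latex
% v_1 >= v_2 >= ... >= v_N > 0, k groups, T = v_1 + ... + v_N.
% Greedy seeds the k groups with v_1..v_k, then gives each v_j (j > k)
% to the group whose current sum s_min is smallest.
\begin{align*}
s_{\min} &\le \frac{1}{k}\sum_{g=1}^{k} s_g \le \frac{T}{k}
  && \text{(a minimum never exceeds the average)}\\
s_{\min} + v_j &\le \frac{T}{k} + v_{k+1}
  && \text{(since } v_j \le v_{k+1} \text{ for } j > k)\\
\max_g s_g^{\text{final}} &\le \max\!\left(v_1,\; \frac{T}{k} + v_{k+1}\right)
  && \text{(a group either kept only its seed or last grew as the minimum)}
\end{align*}
```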
