|
|
本帖最后由 少佩恩S 于 2021-4-3 22:44 编辑
经前辈提醒,我发觉我构造的伪数据有问题:主要是数据空洞有问题
举例来说,伪数据中此行为空:
- select * from [dbo].[PseudoDataTable]
- where SOM = '2016-02-01'
- and Gender = 'M' and DistChannel = 'agent' and Product = 'Bronze';
复制代码 但是,2016年1月和2016年3月都有客户,这个是不合理的。
- select * from [dbo].[PseudoDataTable]
- where
- Gender = 'M' and DistChannel = 'agent' and Product = 'Bronze'
- order by SOM;
复制代码
| SOM | Gender | DistChannel | Product | NCustomersSOM | NCustomersEOM | RemainingNCustomerEOM | NewCustomers | 1/1/2016 | M | agent | Bronze | 19064 | 10669 | 9661 | 1008 | 3/1/2016 | M | agent | Bronze | 12605 | 10836 | 10317 | 519 | 4/1/2016 | M | agent | Bronze | 16783 | 17712 | 16053 | 1659 | 5/1/2016 | M | agent | Bronze | 11268 | 10560 | 8771 | 1789 | 6/1/2016 | M | agent | Bronze | 11694 | 13446 | 11463 | 1983 | 7/1/2016 | M | agent | Bronze | 16941 | **06 | 9299 | 807 | 8/1/2016 | M | agent | Bronze | 16021 | 17167 | 13035 | 4132 | 9/1/2016 | M | agent | Bronze | 10342 | 10759 | 9555 | 1204 | 10/1/2016 | M | agent | Bronze | 16747 | 17304 | 16490 | 814 | 11/1/2016 | M | agent | Bronze | 11593 | 13501 | 10468 | 3033 | 12/1/2016 | M | agent | Bronze | 13619 | 19017 | 10984 | 8033 | 1/1/2017 | M | agent | Bronze | 10965 | 13299 | 10340 | 2959 | 2/1/2017 | M | agent | Bronze | 12292 | 16181 | 12131 | 4050 | 现实中,NCustomersEOM = 下一个月的NCustomersSOM
现实中,出现空洞的行的情况普遍是某一业务基本未开展。假设agent这个销售渠道不推广Bronze产品,那么就没有这个维度组合的数据。
假设agent在2021年之前都不销售Bronze产品,在2020年开始才进行销售,产生以下数据
- drop table if exists [dbo].PseudoDataTable;
- create table [dbo].PseudoDataTable (
- SOM date,
- Gender char(1),
- DistChannel varchar(8),
- Product varchar(8),
- NCustomersSOM int,
- NCustomersEOM int,
- RemainingNCustomerEOM int,
- NewCustomers int
- );
复制代码- insert into [dbo].PseudoDataTable values('2020-12-01', 'M', 'agent', 'Bronze', 0, 0, 0, 0);
- insert into [dbo].PseudoDataTable values('2021-01-01', 'M', 'agent', 'Bronze', 0, 100, 0, 100);
- insert into [dbo].PseudoDataTable values('2021-02-01', 'M', 'agent', 'Bronze', 100, 110, 90, 20);
- insert into [dbo].PseudoDataTable values('2021-03-01', 'M', 'agent', 'Bronze', 110, 140, 100, 40);
复制代码 再将之前的维度表构造代码运行一遍
- drop table if exists [dbo].[PseudoDataTableUnikDmsn];
- select distinct
- [Gender]
- ,[DistChannel]
- ,[Product]
- into [dbo].[PseudoDataTableUnikDmsn]
- from [dbo].[PseudoDataTable];
- alter table [dbo].[PseudoDataTableUnikDmsn]
- add ID int identity(1,1);
- drop table if exists [dbo].[PseudoDataTablewithDM];
- select
- B.*,
- Dm.ID
- into [dbo].[PseudoDataTablewithDM]
- from [dbo].[PseudoDataTable] as B
- inner join [dbo].[PseudoDataTableUnikDmsn] as Dm
- on B.[DistChannel] = Dm.[DistChannel]
- and B.[Gender] = Dm.[Gender]
- and B.[Product] = Dm.[Product];
- drop table if exists #Final;
- select
- M.*, D.SOM,
- A1.NCustomersSOM AS NCustomersSOM1,
- A1.RemainingNCustomerEOM AS RemainingNCustomerEOM1,
- A2.NCustomersSOM AS NCustomersSOM2,
- A2.RemainingNCustomerEOM AS RemainingNCustomerEOM2,
- A3.NCustomersSOM AS NCustomersSOM3,
- A3.RemainingNCustomerEOM AS RemainingNCustomerEOM3
- INTO #FINAL
- from [dbo].[PseudoDataTableUnikDmsn] as M
- inner join (
- select
- distinct SOM
- from [dbo].[PseudoDataTablewithDM]
- ) as D
- on 1 = 1
- left join [dbo].[PseudoDataTablewithDM] as A1
- on M.ID = A1.ID and dateadd(month, 0, D.SOM) = A1.SOM
- left join [dbo].[PseudoDataTablewithDM] as A2
- on M.ID = A2.ID and dateadd(month, -1, D.SOM) = A2.SOM
- left join [dbo].[PseudoDataTablewithDM] as A3
- on M.ID = A3.ID and dateadd(month, -2, D.SOM) = A3.SOM;
复制代码 接下来查询三个维度的积,这样就不需要group by 了
- select
- *
- FROM #FINAL
- where
- Gender = 'M' and DistChannel = 'agent' and Product = 'Bronze'
- and SOM >= '2020-12-01'
复制代码| Gender | DistChannel | Product | ID | SOM | NCustomersSOM1 | RemainingNCustomerEOM1 | NCustomersSOM2 | RemainingNCustomerEOM2 | NCustomersSOM3 | RemainingNCustomerEOM3 | | M | agent | Bronze | 16 | 12/1/2020 | 0 | 0 | NULL | NULL | NULL | NULL | | M | agent | Bronze | 16 | 1/1/2021 | 0 | 0 | 0 | 0 | NULL | NULL | | M | agent | Bronze | 16 | 2/1/2021 | 100 | 90 | 0 | 0 | 0 | 0 | | M | agent | Bronze | 16 | 3/1/2021 | 110 | 100 | 100 | 90 | 0 | 0 | | M | agent | Bronze | 16 | 4/1/2021 | 140 | 105 | 110 | 100 | 100 | 90 | 那么,对于购买agent推销的青铜产品的男性顾客数目来说,只有2021年4月才能按照当前方式计算季度滚动留存率- SELECT
- SOM,
- Gender, DistChannel,Product,
- SUM(COALESCE(RemainingNCustomerEOM1, 0.0))/SUM(COALESCE(NCustomersSOM1, 0.0)) *
- SUM(COALESCE(RemainingNCustomerEOM2, 0.0))/SUM(COALESCE(NCustomersSOM2, 0.0)) *
- SUM(COALESCE(RemainingNCustomerEOM3, 0.0))/SUM(COALESCE(NCustomersSOM3, 0.0)) as retainratio,
- SUM(COALESCE(RemainingNCustomerEOM1, 0.0)),SUM(COALESCE(NCustomersSOM1, 0.0)),
- SUM(COALESCE(RemainingNCustomerEOM2, 0.0)),SUM(COALESCE(NCustomersSOM2, 0.0)),
- SUM(COALESCE(RemainingNCustomerEOM3, 0.0)),SUM(COALESCE(NCustomersSOM3, 0.0)),
- count(*)
- FROM #FINAL
- where
- Gender = 'M' and DistChannel = 'agent' and Product = 'Bronze'
- and SOM = '2021-04-01'
- GROUP BY SOM, Gender, PRODUCT, DistChannel;
复制代码| SOM | Gender | DistChannel | Product | retainratio | (No column name) | (No column name) | (No column name) | (No column name) | (No column name) | (No column name) | (No column name) | 4/1/2021 | M | agent | Bronze | 0.613636 | 105 | 140 | 100 | 110 | 90 | 100 | 1 | 其他月份都因为有0作为分母出错。
那么,对于有空洞的情况,比如在2021年3月的季度滚动留存率计算时有一个空洞,对于这种情况,我目前的想处理方式如下
(100/110 * 90/100)^(3/2)
- select power( (100.0/110.0 * 90.0/100.0), 1.5)
复制代码 约=0.74
对于在2021年1月及之前的季度滚动留存率都应该为空。当然这部分的代码都还没有写。之前没考虑到newkid网友提醒的空洞问题。
另外,现阶段的管理层只需要针对某一维度的sum进行滚动留存率计算,比如所有男性客户的留存率计算。或者所有购买了黄金会员产品的客户留存率计算。那么中间如果有某一细分维度组合的空洞,比如说在线销售(online)的白银产品(silver)在某段时间没有女性顾客购买,产生了几行空洞。由于现阶段管理层未关心到这么细的维度,所以只要某单一维度在所有月度都有数据,直接对此单一维度sum就可以了。业务存续时间不短,因此单一维度的年度滚动留存率计算时暂时未碰到空洞的情况。
针对数个月连乘,我下一步打算把数据log化,这样就可以用window sum的形式来取代相乘,就不用大量左关联了。不过这样形式是否适合Tableau来可视化,还要再进行试验。
|
|