楼主: 〇〇

一个俄国的列式数据库clickhouse

[复制链接]
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
11#
 楼主| 发表于 2019-12-24 10:11 | 只看该作者
本帖最后由 〇〇 于 2019-12-24 10:29 编辑

文档中一些sql报错了
localhost --Q5. 查询2007年各航空公司延误超过10分钟以上的百分比
:-] SELECT Carrier, c, c2, c*100/c2 as c3
:-] FROM
:-] (
:-]     SELECT
:-]         Carrier,
:-]         count(*) AS c
:-]     FROM ontime
:-]     WHERE DepDelay>10
:-]         AND Year=2007
:-]     GROUP BY Carrier
:-] )
:-] ANY INNER JOIN
:-] (
:-]     SELECT
:-]         Carrier,
:-]         count(*) AS c2
:-]     FROM ontime
:-]     WHERE Year=2007
:-]     GROUP BY Carrier
:-] ) USING Carrier
:-] ORDER BY c3 DESC;



Received exception from server (version 19.17.5):
Code: 48. DB::Exception: Received from localhost:9000. DB::Exception: Old ANY INNER|RIGHT|FULL JOINs are disabled by default. Their logic would be changed. Old logic is many-to-one for all kinds of ANY JOINs. It's equil to apply distinct for right table keys. Default bahaviour is reserved for many-to-one LEFT JOIN, one-to-many RIGHT JOIN and one-to-one INNER JOIN. It would be equal to apply distinct for keys to right, left and both tables respectively. Set any_join_distinct_right_table_keys=1 to enable old bahaviour..

0 rows in set. Elapsed: 0.005 sec.

按照提示修改设置后可以查询
SET any_join_distinct_right_table_keys = 1

Ok.

0 rows in set. Elapsed: 0.002 sec.

┌─Carrier─┬──────c─┬──────c2─┬─────────────────c3─┐
│ EV      │ 101796 │  286234 │ 35.563909249075934 │
│ US      │ 135987 │  485447 │ 28.012738774778708 │
│ AA      │ 176203 │  633857 │ 27.798541311368336 │
│ MQ      │ 145630 │  540494 │ 26.943869867195566 │
│ AS      │  42830 │  160185 │  26.73783437899928 │
│ B6      │  50740 │  191450 │ 26.503003395142336 │
│ UA      │ 128174 │  490002 │  26.15785241692891 │
│ WN      │ 296293 │ 1168871 │ 25.348648396615197 │
│ OH      │  59034 │  236032 │ 25.011015455531453 │
│ CO      │  76662 │  323151 │  23.72327487768876 │
│ F9      │  23035 │   97760 │ 23.562806873977088 │
│ YV      │  67905 │  294362 │ 23.068534661403305 │
│ XE      │  99915 │  434773 │  22.98095787916913 │
│ FL      │  59460 │  263159 │  22.59470510223857 │
│ NW      │  90429 │  414526 │  21.81503693375084 │
│ OO      │ 127426 │  597880 │  21.31297250284338 │
│ DL      │  93675 │  475889 │  19.68421207466447 │
│ 9E      │  46948 │  258851 │  18.13707499681284 │
│ AQ      │   4299 │   46360 │  9.273080241587575 │
│ HA      │   2746 │   56175 │  4.888295505117935 │
└─────────┴────────┴─────────┴────────────────────┘

20 rows in set. Elapsed: 0.088 sec. Processed 16.87 million rows, 41.77 MB (192.17 million rows/s., 475.90 MB/s.)
与文档的改进版sql结果一致
localhost SELECT
:-]     Carrier,
:-]     avg(DepDelay > 10) * 100 AS c3
:-] FROM ontime
:-] WHERE Year = 2007
:-] GROUP BY Carrier
:-] ORDER BY c3 desc;

SELECT
    Carrier,
    avg(DepDelay > 10) * 100 AS c3
FROM ontime
WHERE Year = 2007
GROUP BY Carrier
ORDER BY c3 DESC

┌─Carrier─┬─────────────────c3─┐
│ EV      │ 35.563909249075934 │
│ US      │  28.01273877477871 │
│ AA      │ 27.798541311368336 │
│ MQ      │ 26.943869867195563 │
│ AS      │  26.73783437899928 │
│ B6      │ 26.503003395142333 │
│ UA      │  26.15785241692891 │
│ WN      │ 25.348648396615197 │
│ OH      │ 25.011015455531453 │
│ CO      │  23.72327487768876 │
│ F9      │ 23.562806873977088 │
│ YV      │ 23.068534661403305 │
│ XE      │ 22.980957879169132 │
│ FL      │  22.59470510223857 │
│ NW      │  21.81503693375084 │
│ OO      │  21.31297250284338 │
│ DL      │  19.68421207466447 │
│ 9E      │  18.13707499681284 │
│ AQ      │  9.273080241587575 │
│ HA      │  4.888295505117935 │
└─────────┴────────────────────┘

20 rows in set. Elapsed: 0.053 sec. Processed 8.43 million rows, 67.48 MB (158.64 million rows/s., 1.27 GB/s.)

使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
12#
 楼主| 发表于 2019-12-24 10:16 | 只看该作者
testontime.sql (16.1 KB, 下载次数: 10) 测试0-10

使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
13#
 楼主| 发表于 2019-12-24 13:19 | 只看该作者
本帖最后由 〇〇 于 2019-12-24 13:21 编辑

性能测试,大表关联还是慢
localhost select count(1) from
:-] (select ArrTime,CRSArrTime,FlightDate from ontime limit 200000000) t1 ;

SELECT count(1)
FROM
(
    SELECT
        ArrTime,
        CRSArrTime,
        FlightDate
    FROM ontime
    LIMIT 200000000
) AS t1

┌──count(1)─┐
│ 183953732 │
└───────────┘

1 rows in set. Elapsed: 0.627 sec. Processed 183.95 million rows, 367.91 MB (293.53 million rows/s., 587.06 MB/s.)

localhost SELECT
:-]     OriginCityName,
:-]     DestCityName,
:-]     count(*) AS flights
:-] FROM (select OriginCityName,DestCityName from ontime limit 100000000) t1   GROUP BY OriginCityName, DestCityName ORDER BY flights DESC LIMIT 20 ;

SELECT
    OriginCityName,
    DestCityName,
    count(*) AS flights
FROM
(
    SELECT
        OriginCityName,
        DestCityName
    FROM ontime
    LIMIT 100000000
) AS t1
GROUP BY
    OriginCityName,
    DestCityName
ORDER BY flights DESC
LIMIT 20

┌─OriginCityName────┬─DestCityName──────┬─flights─┐                                                               ?
│ San Francisco, CA │ Los Angeles, CA   │  263188 │
│ Los Angeles, CA   │ San Francisco, CA │  261646 │
│ New York, NY      │ Chicago, IL       │  227524 │
│ Chicago, IL       │ New York, NY      │  223485 │
│ Chicago, IL       │ Minneapolis, MN   │  223465 │
│ Minneapolis, MN   │ Chicago, IL       │  221180 │
│ Los Angeles, CA   │ Las Vegas, NV     │  218676 │
│ Las Vegas, NV     │ Los Angeles, CA   │  215675 │
│ New York, NY      │ Boston, MA        │  213967 │
│ Boston, MA        │ New York, NY      │  212150 │
│ Washington, DC    │ New York, NY      │  207167 │
│ New York, NY      │ Washington, DC    │  206196 │
│ Houston, TX       │ Dallas, TX        │  204305 │
│ Washington, DC    │ Chicago, IL       │  201170 │
│ Chicago, IL       │ Washington, DC    │  199208 │
│ Dallas, TX        │ Houston, TX       │  196426 │
│ Phoenix, AZ       │ Los Angeles, CA   │  196197 │
│ Los Angeles, CA   │ Phoenix, AZ       │  195803 │
│ Boston, MA        │ Washington, DC    │  195316 │
│ Detroit, MI       │ Chicago, IL       │  193867 │
└───────────────────┴───────────────────┴─────────┘
                                                                                                                                    ?
20 rows in set. Elapsed: 11.860 sec. Processed 102.20 million rows, 4.53 GB (8.62 million rows/s., 381.92 MB/s.)

localhost select * from
:-] (select ArrTime,CRSArrTime,FlightDate from ontime limit 100000000) t1
:-] ALL INNER JOIN (select ArrTime,CRSArrTime,FlightDate from ontime  limit 10000000) t2 on t1.ArrTime=t2.CRSArrTime limit 10;

SELECT *
FROM
(
    SELECT
        ArrTime,
        CRSArrTime,
        FlightDate
    FROM ontime
    LIMIT 100000000
) AS t1
ALL INNER JOIN
(
    SELECT
        ArrTime,
        CRSArrTime,
        FlightDate
    FROM ontime
    LIMIT 10000000
) AS t2 ON t1.ArrTime = t2.CRSArrTime
LIMIT 10

↖ Progress: 26.42 million rows, 264.22 MB (35.80 million rows/s., 358.01 MB/s.) ███▍                                          %^CCancelling query.

Received exception from server (version 19.17.5):
Code: 241. DB::Exception: Received from localhost:9000. DB::Exception: Memory limit (for query) exceeded: would use 10.74 GiB (attempt to allocate chunk of 4295356920 bytes), maximum: 9.31 GiB.
Query was cancelled.

0 rows in set. Elapsed: 139.426 sec. Processed 26.42 million rows, 264.22 MB (189.50 thousand rows/s., 1.90 MB/s.)




使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
14#
 楼主| 发表于 2019-12-24 14:13 | 只看该作者
本帖最后由 〇〇 于 2019-12-24 14:21 编辑

临时表的建立
localhost CREATE TEMPORARY TABLE testtemp as select * from ontime limit 10;

CREATE TEMPORARY TABLE testtemp AS
SELECT *
FROM ontime
LIMIT 10

Ok.

0 rows in set. Elapsed: 0.196 sec.

localhost select count(*) from testtemp;

SELECT count(*)
FROM testtemp

┌─count()─┐
│      10 │
└─────────┘

1 rows in set. Elapsed: 0.008 sec.

查询表分区
localhost SELECT
:-]     partition,
:-]     name,
:-]     active
:-] FROM system.parts
:-] WHERE table = 'ontime';

SELECT
    partition,
    name,
    active
FROM system.parts
WHERE table = 'ontime'

┌─partition─┬─name─────────────┬─active─┐
│ 198710    │ 198710_1_1_1     │      1 │
│ 198711    │ 198711_2_2_1     │      1 │
│ 198712    │ 198712_3_3_1     │      1 │
│ 198801    │ 198801_7_7_1     │      1 │
│ 198802    │ 198802_8_8_1     │      1 │
│ 198803    │ 198803_9_9_1     │      1 │
│ 198804    │ 198804_10_10_1   │      1 │

使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
15#
 楼主| 发表于 2019-12-24 15:11 | 只看该作者
本帖最后由 〇〇 于 2019-12-24 15:32 编辑

后台一直在维护
合并删分区
2019.12.24 14:28:55.490486 [ 34 ] {} <Trace> system.trace_log: Renaming temporary part tmp_merge_201912_1_110_23 to 201912_1_110_23.
2019.12.24 14:28:55.490588 [ 34 ] {} <Trace> system.trace_log (MergerMutator): Merged 3 parts: from 201912_1_108_22 to 201912_110_110_0
2019.12.24 14:38:23.672580 [ 32 ] {} <Trace> system.trace_log: Found 3 old parts to remove.
2019.12.24 14:38:23.672691 [ 32 ] {} <Debug> system.trace_log: Removing part from filesystem 201912_1_108_22
2019.12.24 14:38:23.673360 [ 32 ] {} <Debug> system.trace_log: Removing part from filesystem 201912_109_109_0
2019.12.24 14:38:23.673794 [ 32 ] {} <Debug> system.trace_log: Removing part from filesystem 201912_110_110_0

也可以人工合并,不成功也不报错
localhost OPTIMIZE TABLE ontime PARTITION 201902;

OPTIMIZE TABLE ontime PARTITION 201902

Ok.

0 rows in set. Elapsed: 0.103 sec.
---------------

2019.12.24 15:13:30.934196 [ 30 ] {1a200a45-ed5e-47c2-9f3f-46100ce32bb5} <Debug> executeQuery: (from [::1]:61129) OPTIMIZE TABLE ontime PARTITION 201902
2019.12.24 15:13:31.019246 [ 30 ] {1a200a45-ed5e-47c2-9f3f-46100ce32bb5} <Information> datasets.ontime: Cannot OPTIMIZE table: Cannot select parts for optimization
2019.12.24 15:13:31.019369 [ 30 ] {1a200a45-ed5e-47c2-9f3f-46100ce32bb5} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
2019.12.24 15:13:31.019439 [ 30 ] {1a200a45-ed5e-47c2-9f3f-46100ce32bb5} <Debug> MemoryTracker: Peak memory usage (total): 0.00 B.
2019.12.24 15:13:31.019487 [ 30 ] {1a200a45-ed5e-47c2-9f3f-46100ce32bb5} <Information> TCPHandler: Processed in 0.085 sec.

使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
16#
 楼主| 发表于 2019-12-25 16:44 | 只看该作者

使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
17#
 楼主| 发表于 2019-12-26 08:36 | 只看该作者
本帖最后由 〇〇 于 2019-12-26 08:40 编辑

select 可以输出csv,json等格式,但格式名需要大写
http://www.clickhouse.com.cn/topic/5a3bb82a2141c2917483556d

localhost select * from ontime limit 1 format json;

SELECT *
FROM ontime
LIMIT 1
FORMAT json


Exception on client:
Code: 73. DB::Exception: Unknown format json

Connecting to database datasets at localhost:9000 as user default.
Connected to ClickHouse server version 19.17.5 revision 54428.

--------------
localhost select Year,Carrier from ontime limit 1 format JSON;

SELECT
    Year,
    Carrier
FROM ontime
LIMIT 1
FORMAT JSON

{
        "meta":
        [
                {
                        "name": "Year",
                        "type": "UInt16"
                },
                {
                        "name": "Carrier",
                        "type": "FixedString(2)"
                }
        ],

        "data":
        [
                {
                        "Year": 1987,                                                                                               ?
                        "Carrier": "PS"
                }
        ],

        "rows": 1,

        "rows_before_limit_at_least": 1,

        "statistics":
        {
                "elapsed": 0.003054294,
                "rows_read": 1,
                "bytes_read": 12
        }
}

1 rows in set. Elapsed: 0.035 sec.
---
localhost select Year,Carrier from ontime limit 1 format CSV;

SELECT
    Year,                                                                                                                           ?
    Carrier
FROM ontime
LIMIT 1
FORMAT CSV

1987,"PS"

1 rows in set. Elapsed: 0.039 sec.

----
localhost select Year,Carrier from ontime limit 3 format JSONCompact;

SELECT
    Year,
    Carrier
FROM ontime
LIMIT 3
FORMAT JSONCompact

{
        "meta":
        [
                {
                        "name": "Year",
                        "type": "UInt16"
                },
                {
                        "name": "Carrier",
                        "type": "FixedString(2)"
                }
        ],

        "data":
        [
                [1987, "PS"],
                [1987, "PS"],
                [1987, "PS"]
        ],

        "rows": 3,

        "rows_before_limit_at_least": 3,

        "statistics":
        {
                "elapsed": 0.003968895,
                "rows_read": 3,
                "bytes_read": 20
        }
}

3 rows in set. Elapsed: 0.037 sec.
-----
localhost select Year,Carrier from ontime limit 3 format JSONEachRow;

SELECT
    Year,
    Carrier
FROM ontime
LIMIT 3
FORMAT JSONEachRow

{"Year":1987,"Carrier":"PS"}
{"Year":1987,"Carrier":"PS"}
{"Year":1987,"Carrier":"PS"}

3 rows in set. Elapsed: 0.033 sec.

使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
18#
 楼主| 发表于 2019-12-27 08:24 | 只看该作者
定长字符串列导出的有非法字符
user@localhost ~]$ clickhouse-client --query="select * from datasets.ontime limit 3 format CSV"
1987,4,10,1,4,"1987-10-01","PS",19391,"PS","","1443",10800,1080001,32575,"BUR","Burbank, CA","CA","06","California",91,14831,1483101,32457,"SJC","San Jose, CA","CA","06","California",91,855,0,0,0,0,"","0800-0859",0,0,0,0,958,0,0,0,0,0,"0900-0959",1,"",0,63,0,0,1,297,2,0,0,0,0,0,"","","","","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","",""
1987,4,10,1,4,"1987-10-01","PS",19391,"PS","","1445",10800,1080001,32575,"BUR","Burbank, CA","CA","06","California",91,13796,1379601,32457,"OAK","Oakland, CA","CA","06","California",91,1145,1145,0,0,0,"0","1100-1159",0,0,0,0,1247,1248,1,1,0,0,"1200-1259",0,"",0,62,63,0,1,325,2,0,0,0,0,0,"","","","","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","",""
1987,4,10,1,4,"1987-10-01","PS",19391,"PS","","1446",13796,1379601,32457,"OAK","Oakland, CA","CA","06","California",91,10800,1080001,32575,"BUR","Burbank, CA","CA","06","California",91,1325,1331,6,6,0,"0","1300-1359",0,0,0,0,1422,1437,15,15,1,1,"1400-1459",0,"",0,57,66,0,1,325,2,0,0,0,0,0,"","","","","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","",""
被vectorwise导入报错
[actian@localhost ~]$ time vwload --fdelim "," --rdelim "\n" --table ontime tpch /tmp/ontime3.csv
loading
Error in file '/tmp/ontime3.csv', record 1: Illegal character encountered in input
Input record: '1987,4,10,1,4,"1987-10-01","PS?????",19391,"PS","","1443",10800,1080001,32575,"BUR??","Burbank, CA","CA","06","California",91,14831,1483101,32457,"SJC??","San Jose, CA","CA","06","California",91,855,0,0,0,0,"","0800-0859",0,0,0,0,958,0,0,0,0,0,"0900-0959",1,"?",0,63,0,0,1,297,2,0,0,0,0,0,"","","","","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","",""'
Error in file '/tmp/ontime3.csv', record 2: Illegal character encountered in input
Input record: '1987,4,10,1,4,"1987-10-01","PS?????",19391,"PS","","1445",10800,1080001,32575,"BUR??","Burbank, CA","CA","06","California",91,13796,1379601,32457,"OAK??","Oakland, CA","CA","06","California",91,1145,1145,0,0,0,"0","1100-1159",0,0,0,0,1247,1248,1,1,0,0,"1200-1259",0,"?",0,62,63,0,1,325,2,0,0,0,0,0,"","","","","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","",""'
Error in file '/tmp/ontime3.csv', record 3: Illegal character encountered in input
Input record: '1987,4,10,1,4,"1987-10-01","PS?????",19391,"PS","","1446",13796,1379601,32457,"OAK??","Oakland, CA","CA","06","California",91,10800,1080001,32575,"BUR??","Burbank, CA","CA","06","California",91,1325,1331,6,6,0,"0","1300-1359",0,0,0,0,1422,1437,15,15,1,1,"1400-1459",0,"?",0,57,66,0,1,325,2,0,0,0,0,0,"","","","","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","","","",0,0,"","","","",""'
processed 3 records, loaded 0 records, 3 errors

使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
19#
 楼主| 发表于 2019-12-27 08:35 | 只看该作者
这些?都是\u0000
SON 与 JavaScript 兼容。对于这个目的,一些符号是额外转义符: 斜杠 / 被转义为 /; 替代换行为 U+2028 和 U+2029, 在一些浏览器上是不能用的, 被转义为 uXXXX-序列. ASCII 控制字符被转义: 分别为退格, form feed, line feed, carriage return, 和 水平 Tab 键 \b,\f,\n,\r, 和 \ 剩下的字节从 00-1F,使用 uXXXX-序列. 无效的 UTF-8 序列 被更改为替换字符

还可以使用\t分隔
[actian@localhost ~]$ clickhouse-client --query="select * from datasets.ontime limit 3 format TabSeparated"
1987    4       10      1       4       1987-10-01      PS\0\0\0\0\0    19391   PS              1443    10800   1080001 32575   BUR\0\0     Burbank, CA     CA      06      California      91      14831   1483101 32457   SJC\0\0 San Jose, CA    CA      06      California  91      855     0       0       0       0               0800-0859       0       0       0       0       958     0       00       0       0       0900-0959       1       \0      0       63      0       0       1       297     2       0       0       00       0                                                                               0       00       0                                                       0       0                                                       00                                                       0       0

使用道具 举报

回复
论坛徽章:
407
紫蛋头
日期:2012-05-21 10:19:41迷宫蛋
日期:2012-06-06 16:02:49奥运会纪念徽章:足球
日期:2012-06-29 15:30:06奥运会纪念徽章:排球
日期:2012-07-10 21:24:24鲜花蛋
日期:2012-07-16 15:24:59奥运会纪念徽章:拳击
日期:2012-08-07 10:54:50奥运会纪念徽章:羽毛球
日期:2012-08-21 15:55:33奥运会纪念徽章:蹦床
日期:2012-08-21 21:09:51奥运会纪念徽章:篮球
日期:2012-08-24 10:29:11奥运会纪念徽章:体操
日期:2012-09-07 16:40:00
20#
 楼主| 发表于 2019-12-27 08:43 | 只看该作者
用sed 替换掉\0
clickhouse-client --query="select * from datasets.ontime limit 1 format CSV" |sed 's/\x00//g' >/tmp/ontime1.csv

使用道具 举报

回复

您需要登录后才可以回帖 登录 | 注册

本版积分规则 发表回复

TOP技术积分榜 社区积分榜 徽章 团队 统计 知识索引树 积分竞拍 文本模式 帮助
  ITPUB首页 | ITPUB论坛 | 数据库技术 | 企业信息化 | 开发技术 | 微软技术 | 软件工程与项目管理 | IBM技术园地 | 行业纵向讨论 | IT招聘 | IT文档
  ChinaUnix | ChinaUnix博客 | ChinaUnix论坛
CopyRight 1999-2011 itpub.net All Right Reserved. 北京盛拓优讯信息技术有限公司版权所有 联系我们 未成年人举报专区 
京ICP备16024965号-8  北京市公安局海淀分局网监中心备案编号:11010802021510 广播电视节目制作经营许可证:编号(京)字第1149号
  
快速回复 返回顶部 返回列表