Building a Big Data Table in the SAP Sybase IQ 15.1 Demo Database and Verifying Its Performance
Continuing from the previous section: we now have the demo database, but unfortunately its data sets are all fairly small. What do you do if you have no suitable test data set and no production environment that can provide one?
You can build a big table yourself and fill it with generated random data. This is a common technique among DBAs and developers, especially those who study system performance.
1. Create the target table
Create a table with plenty of columns. Here the table is named BIG and has 43 columns, defined as follows:
CREATE TABLE "BIG" ( "H0" VARchar(1), "ID" VARchar(18), "H02" VARchar(1), "H031" VARchar(2), "H032" VARchar(2), "H041" VARchar(2), "H042" VARchar(2), "H051" VARchar(2), "H052" VARchar(2), "H061" VARchar(2), "H062" VARchar(2), "H071" VARchar(1), "H072" VARchar(1), "H081" VARchar(1), "H082" VARchar(1), "H09" VARchar(2), "H10" VARchar(3), "H11" VARchar(1), "H12" VARchar(1), "H13" VARchar(4), "H14" VARchar(1), "H15" VARchar(1), "H16" VARchar(1), "H17" VARchar(1), "H18" VARchar(1), "H19" VARchar(1), "H20" VARchar(1), "H21" VARchar(1), "H22" VARchar(1), "H23" VARchar(1), "HA0" VARchar(1), "HA1" VARchar(2), "HA2" VARchar(2), "HA3" VARchar(1), "HA4" VARchar(1), "HA5" VARchar(1), "HA6" VARchar(1), "HA7" VARchar(1), "HA8" VARchar(1), "HA9" VARchar(1), "HA10" VARchar(3), "HA11" VARchar(1), "HA20" VARchar(2));
2. Generate the data
The most direct, if crude, approach is to insert random data row by row. In practice, though, its performance turned out to be abysmal, and in the end I could not bear to wait for it. The script is as follows:
BEGIN
    DECLARE i INT;
    SET i = 1;
    WHILE i <= 1000000 LOOP
        -- each of the 43 columns gets one random digit ('0'..'9')
        INSERT INTO "BIG" VALUES (
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48), char(cast(rand() * 10 as int) + 48),
            char(cast(rand() * 10 as int) + 48)
        );
        SET i = i + 1;
        IF (i % 2000 = 0) THEN
            COMMIT;
        END IF;
    END LOOP;
END;

At that rate, 1,000,000 rows take several hours to finish, which works out to well under 200 inserted rows per second.
Is there a faster way?
Yes: write a small program that generates a data file, then load that file with INPUT or LOAD.
Here is that small program, implemented in C:
#include "stdafx.h"#include <stdio.h>#include <time.h>int getN(){ return rand() % 10;}static void gen_data(FILE* f, int n, int colCount){ // file : d:/work/demo/BIG.txt, default colCount is 43 for (int i=0; i<n; i++) { for (int j=0; j<colCount-1; j++) { fprintf(f, "%ld,", getN()); } fprintf(f, "%ld/n", getN()); }}int main(int argc,char** argv){ srand( (unsigned)time( NULL ) ); printf("rand = %ld/n", getN()); long begin = (long) time(NULL); FILE * f = fopen("d://asa120//BIG.txt", "wt"); gen_data(f, 1000000, 43); fclose(f); printf("generate finished.../n"); printf("time consumed: %ld /n", (long)time(NULL) - begin); return 0;}
It generates all the data in about 10 seconds.
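For reference, each line the program writes to BIG.txt is simply 43 comma-separated random digits; the line below only illustrates the format, the actual digits will differ:

4,7,1,0,9,3,8,2,5,6,0,1,7,4,2,9,8,3,6,5,1,0,4,7,2,8,9,3,5,6,0,2,4,1,7,9,8,5,3,6,2,0,4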
3. Load the data
If you use the INPUT command:

INPUT INTO BIG from 'd:\\asa120\\BIG.txt' format ascii escapes on escape character '\\' delimited by ',' encoding 'GBK';

the measured speed is quite slow, only about 300 rows per second. The documentation explains the difference:
The LOAD TABLE statement adds rows into a table; it doesn't replace them.
Loading data using the LOAD TABLE statement (without the WITH ROW LOGGING and WITH CONTENT LOGGING options) is considerably faster than using the INPUT statement.
So the INPUT operation apparently performs extra logging that LOAD skips, which is where the time goes.
Then the painful part began: LOAD TABLE. I tried quite a few variations before the statement below finally went through; it loaded the 1,000,000 rows in about 10 seconds, which is very fast.
LOAD TABLE BIG (
    H0 ',', ID ',', H02 ',', H031 ',', H032 ',', H041 ',', H042 ',', H051 ',',
    H052 ',', H061 ',', H062 ',', H071 ',', H072 ',', H081 ',', H082 ',', H09 ',',
    H10 ',', H11 ',', H12 ',', H13 ',', H14 ',', H15 ',', H16 ',', H17 ',',
    H18 ',', H19 ',', H20 ',', H21 ',', H22 ',', H23 ',', HA0 ',', HA1 ',',
    HA2 ',', HA3 ',', HA4 ',', HA5 ',', HA6 ',', HA7 ',', HA8 ',', HA9 ',',
    HA10 ',', HA11 ',', HA20 '\X0A'
)
from 'd:\\asa120\\BIG.txt'
ESCAPES OFF
QUOTES OFF
NOTIFY 100000
WITH CHECKPOINT ON
Note that the statement above fails if the per-column delimiter specifications are omitted, and it also fails if QUOTES is left at its default value of ON.
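Once the load finishes without errors, a simple count is a cheap way to confirm that all rows actually arrived (a trivial check of my own, not part of the original test):

select count(*) from BIG;   -- expect 1000000 after the load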
My guess is that the LOAD TABLE command is not entirely mature in this version.
For example, take a simple data file with the following content:
'123','456'
'222','111'
The INPUT command loads it into table abc without any trouble:

input into abc from 'd:\\asa120\\abc.txt' format ascii escapes on escape character '\\' delimited by ',' encoding 'GBK';
But LOAD TABLE fails, even with the default settings:

truncate table abc;
load table abc (col1 ',', col2 '0x0A') from 'd:\\asa120\\abc.txt' escapes off;

This reports the error: Non-space text found after ending quote character for an enclosed field ...
4. Verify with a query
select ID,
       sum(case when ID <> '0' then cnt end) c1,
       sum(case when ha3 = '1' then cnt end) c2,
       sum(case when ha3 = '2' then cnt end) c4,
       sum(case when ha3 = '3' then cnt end) c6,
       sum(case when ha3 = '4' then cnt end) c8
from (select substr(ID, 1, 6) ID, count(*) cnt, ha3
      from BIG
      group by id, ha3) A
group by ID
order by ID
It took 0.032 seconds. Very fast. Well done.
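As one more sanity check on the generated data (my own addition): since every column is filled with a uniformly random digit, each ha3 value should account for roughly a tenth of the 1,000,000 rows, which a simple aggregate shows:

select ha3, count(*) cnt
from BIG
group by ha3
order by ha3;
-- each of '0'..'9' should come in at roughly 100,000 rows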