How to selectively read the contents of a txt file in JSP

欣水寓言 · posted on 2007-1-5 22:50

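The approach I'm using now is to take the data I've read
and process it with replaceAll, replacing two spaces with one and looping until the string no longer contains two consecutive spaces.
Then I use split to separate the fields on spaces, and that works.
But the method doesn't seem very good, because with a lot of data it uses up a lot of resources and time.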
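
For reference, the approach described above might look roughly like the sketch below. The class and method names are made up for illustration, and the trim() call is an added assumption so that leading or trailing spaces do not produce empty fields. Each pass of the loop copies the whole string, which is where the extra time and memory go on large inputs.

```java
import java.util.Arrays;

class CollapseThenSplit {
    // Hypothetical helper: collapse double spaces in a loop, then split on one space.
    static String[] fields(String line) {
        String s = line;
        while (s.contains("  ")) {          // still two consecutive spaces somewhere
            s = s.replaceAll("  ", " ");    // each pass builds a new copy of the string
        }
        return s.trim().split(" ");         // trim() is an assumption, not from the post
    }

    public static void main(String[] args) {
        String line = "a   59125566  2007-1-4 10:17:18";
        System.out.println(Arrays.toString(fields(line)));
    }
}
```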

Does anyone have a better way?
And when you say "regex", which regex do you mean?

greenflute · posted on 2007-1-7 07:25

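Now I see why you asked about turning two spaces into one, heh.

Luckily, the program that generates the text didn't bother with alignment or other padding; otherwise a simple replacement wouldn't work either, heh.

As for XML output, not every program is designed with that in mind. On the other hand, even when it is, the result can be harder to deal with. I've had the misfortune of running into that a few times, and it was quite a headache, heh.

With regular expressions, the main difficulty is matching Chinese characters (perhaps that isn't a problem on a Chinese-language system); everything else is fairly simple.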

import java.util.regex.*;

public class RegexpTest {

    // Chinese text spans several Unicode blocks. Inside one character class
    // the blocks are simply listed side by side; no "|" is needed.
    private static final String CJK = "[\\p{InCJKUnifiedIdeographs}\\p{InCJKUnifiedIdeographsExtensionA}"
        + "\\p{InCJKCompatibilityIdeographs}\\p{InCJKCompatibilityForms}"
        + "\\p{InEnclosedCJKLettersAndMonths}\\p{InSmallFormVariants}"
        + "\\p{InBopomofo}\\p{InBopomofoExtended}]*";
    private static final String DATE = "[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}";
    private static final String TIME = "[0-9]{2}:[0-9]{2}:[0-9]{2}";

    // 11 capturing groups, one per field of a record line.
    private static final Pattern pattern = Pattern.compile(
        "\\s*(" + CJK + ")\\s*([0-9]*)\\s*(" + DATE + ")\\s*(" + TIME + ")"
        + "\\s*(" + DATE + ")\\s*(" + TIME + ")\\s*(" + TIME + ")\\s*([0-9]*)"
        + "\\s*(" + CJK + ")\\s*([0-9]*\\.[0-9]*)\\s*([0-9]*\\.[0-9]*)\\s*");

    public static void main(String[] args) throws Exception {

        String[] txt = new String[] {
            "\u6bf3 59125566 2007-1-4 10:17:18 2007-1-4 10:18:06 00:00:48 1 \u6bf3 0.11 0.22",
            "小明 59125566 2007-1-4 10:17:18 2007-1-4 10:18:06 00:00:48 1 市话默认 0.11 0.22\n",
            // Round-trip through UTF-8 bytes, in case of character set problems.
            new String("小强 59125566120 2007-1-4 10:25:06 2007-1-4 10:27:31 00:02:25 1 市话默认 0.11 0.22 \n".getBytes("UTF-8"), "UTF-8") };

        for (String s : txt) {
            Matcher m = pattern.matcher(s);
            if (m.matches()) {
                // Group 0 is the whole match; the fields are groups 1..11.
                for (int i = 1; i <= m.groupCount(); i++)
                    System.out.printf("Group %d = %s%n", i, m.group(i));
            } else {
                System.out.printf("Error! String \"%s\" not matched!%n", s);
            }
        }

    }
}

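The more complicated part of the code is mainly the distribution of Chinese characters in the Unicode tables: because they occupy more than one Unicode block, testing for them gets rather tedious.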
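
To see the point about multiple blocks, a small sketch like the following can print which block a given character belongs to. It uses the standard Character.UnicodeBlock.of lookup; the class name and the sample characters are chosen here purely for illustration.

```java
class BlockLookup {
    // Returns the Unicode block containing the given character.
    static Character.UnicodeBlock blockOf(char c) {
        return Character.UnicodeBlock.of(c);
    }

    public static void main(String[] args) {
        // A common ideograph, the first code point of Extension A, and a Bopomofo letter.
        char[] samples = {'\u5c0f', '\u3400', '\u3105'};
        for (char c : samples) {
            System.out.printf("U+%04X -> %s%n", (int) c, blockOf(c));
        }
    }
}
```

The three sample characters land in three different blocks, which is why the character class in the pattern has to list several of them.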

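The result is fairly simple: 11 groups. Of course, if you need to break the dates and times down further into year, month, day, hours, minutes and seconds, there will be more groups, and if some fields don't need matching, there can be fewer. I think these are minor issues that can be handled separately.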
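
As an illustration of breaking a field down further, the sketch below applies a second, smaller pattern to an already extracted date string and returns year, month and day as numbers. The class and method names are hypothetical; the same idea would apply to the time fields.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DateParts {
    // One capturing group each for year, month and day.
    private static final Pattern DATE =
        Pattern.compile("([0-9]{4})-([0-9]{1,2})-([0-9]{1,2})");

    // Returns {year, month, day}, or null if the text is not in y-m-d form.
    static int[] split(String date) {
        Matcher m = DATE.matcher(date);
        if (!m.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(m.group(1)),
            Integer.parseInt(m.group(2)),
            Integer.parseInt(m.group(3))
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("2007-1-4")));
    }
}
```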

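One more note: if you run into character set problems, you can follow the handling in the code, though it may not be necessary. Compiling on a non-Chinese system isn't a big problem, but it is annoying and took longer than writing the program, heh.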

Group 1 = 小强
Group 2 = 59125566120
Group 3 = 2007-1-4
Group 4 = 10:25:06
Group 5 = 2007-1-4
Group 6 = 10:27:31
Group 7 = 00:02:25
Group 8 = 1
Group 9 = 市话默认
Group 10 = 0.11
Group 11 = 0.22
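
On the character set note above: when the lines come from a file rather than from string literals, one way to avoid depending on the platform default is to name the charset when the bytes are decoded. The sketch below assumes the file is UTF-8, which the thread does not state; the class and method names are made up, and in a JSP the stream would typically be a FileInputStream on the txt file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReadWithCharset {
    // Reads all lines from the stream, decoding the bytes as UTF-8 (an assumption).
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Stand-in for a file: the UTF-8 bytes of one record line.
        byte[] raw = "\u5c0f\u660e 59125566 2007-1-4\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(raw)));
    }
}
```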

Also, for the Unicode block names used here and the regular expression standard, see http://www.unicode.org/reports/tr18/#Simple_Word_Boundaries and http://www.unicode.org/reports/tr18/#Character_Blocks

greenflute · posted on 2007-1-7 07:29

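If that really feels like too much trouble, you can try "([^\\s]*)" (any run of non-whitespace characters) in place of the Chinese character class; it just isn't guaranteed to work, heh.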
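
A looser variant of the same idea is to treat every field as a run of non-whitespace characters and collect the runs one by one with Matcher.find. This is only a sketch with made-up names, and it does no validation of what each field contains.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TokenScan {
    // "\\S+" matches one or more consecutive non-whitespace characters.
    private static final Pattern FIELD = Pattern.compile("\\S+");

    // Collects every whitespace-free run in the line, in order.
    static List<String> fields(String line) {
        List<String> out = new ArrayList<>();
        Matcher m = FIELD.matcher(line);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(fields(" \u5c0f\u660e   59125566\t2007-1-4 10:17:18 \n"));
    }
}
```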

greenflute · posted on 2007-1-7 07:37

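I've now used split to read the fields separated by spaces. The problem is that however many separating spaces there are, that's how many pieces it splits into, which is a real nuisance, so I'd like to know whether there is a better way.
Could everyone help me think of a better approach?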

split also accepts a regular expression, and that may well work better.
For example:

String[] txt = new String[] {
    "\u6bf3 59125566 2007-1-4 10:17:18 2007-1-4 10:18:06 00:00:48 1 \u6bf3 0.11 0.22",
    "小明 59125566 2007-1-4 10:17:18 2007-1-4 10:18:06 00:00:48 1 市话默认 0.11 0.22\n" };

for (String s : txt) {
    // "\\s+" treats any run of whitespace as a single separator.
    String[] ss = s.split("\\s+");
    for (String s_ : ss) System.out.println(s_);
}

Same result, but much more concise, heh.
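
One detail worth keeping in mind with this approach: if a line starts with whitespace, String.split returns an empty string as its first element. A minimal sketch that trims the line first is shown below; the class and method names are hypothetical.

```java
import java.util.Arrays;

class SplitFields {
    // Trim first so leading whitespace does not yield an empty first field,
    // then split on any run of whitespace.
    static String[] fields(String line) {
        return line.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String line = "  a   59125566\t2007-1-4 10:17:18\n";
        System.out.println(Arrays.toString(line.split("\\s+"))); // first element is empty
        System.out.println(Arrays.toString(fields(line)));       // four fields
    }
}
```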