
Several Ways to Read Large Files (CSV, Text) Efficiently in Java

Updated: 2024-07-03 09:40:14 Author: w_l666

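This article walks through several ways to read large files (CSV, text) efficiently in Java. Processing large files in Java usually calls for specific strategies to avoid out-of-memory errors and performance problems. Each approach is illustrated with code and monitoring screenshots.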

Preface

Processing a file of 2 GB or more puts a heavy load on the system, and handling it carelessly can crash the application. Below are four ways to read a large file, along with the resource usage observed for each.
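Each test below times itself with System.currentTimeMillis(), while heap and CPU usage are read from a monitoring tool. As a rough complement, here is a minimal sketch of a helper that records elapsed time and approximate heap growth in code. The names ReadBenchmark, Result and measure are hypothetical and not part of the original tests, and the heap figure is only an estimate because the garbage collector may run during the measurement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

// Hypothetical helper, not from the original article.
class ReadBenchmark {

    /** Outcome of one measured run. */
    static final class Result {
        final long elapsedMillis;   // wall-clock time of the task
        final long heapDeltaBytes;  // approximate change in used heap (may be negative after a GC)

        Result(long elapsedMillis, long heapDeltaBytes) {
            this.elapsedMillis = elapsedMillis;
            this.heapDeltaBytes = heapDeltaBytes;
        }
    }

    /** Runs the task once and reports elapsed time and used-heap difference. */
    static Result measure(Runnable task) {
        Runtime rt = Runtime.getRuntime();
        long usedBefore = rt.totalMemory() - rt.freeMemory();
        long start = System.nanoTime();

        task.run();

        long elapsedMillis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        long usedAfter = rt.totalMemory() - rt.freeMemory();
        return new Result(elapsedMillis, usedAfter - usedBefore);
    }

    public static void main(String[] args) {
        // Stand-in workload: hold many strings in memory, as a whole-file read would.
        List<String> held = new ArrayList<>();
        Result r = measure(() -> {
            for (int i = 0; i < 200_000; i++) {
                held.add("line-" + i);
            }
        });
        System.out.println("Elapsed: " + r.elapsedMillis + " ms");
        System.out.println("Approx. heap growth: " + r.heapDeltaBytes / (1024 * 1024) + " MB");
        System.out.println("Lines held: " + held.size());
    }
}
```

A profiler such as the one used for the screenshots remains the more reliable source for peak heap and CPU; this helper only gives a quick before/after comparison.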

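Method 1: Reading with Guava

I ran these tests on Windows. My first attempt used a 2 GB file: the read ran for a long time and then failed with a heap out-of-memory error. This shows that the approach loads the entire file into memory in one go, so the larger the file, the more resources it consumes. I then switched to a 624 MB CSV file for the tests.

Code example: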

    // Imports: com.google.common.base.Charsets, com.google.common.io.Files (Guava),
    //          java.io.File, java.io.IOException, java.util.List
    @org.junit.Test
    public void testGuavaReadFile() throws IOException {
        // The outPut.csv file used in this test is 624 MB
        String filePath = "D:\\outPut.csv";
        File file = new File(filePath);
        long startTime = System.currentTimeMillis();
        // Read the whole file; each line is returned as one String in the list
        List<String> lines = Files.readLines(file, Charsets.UTF_8);

        for (String line : lines) {
            // Add the per-line processing logic here
            System.out.println("Processing line: " + line);
        }
        long endTime = System.currentTimeMillis();
        long consume = (endTime - startTime) / 1000;
        System.out.println("************Total time: " + consume + " s*****************");
    }

Monitoring results:

As the figure above shows:

Time: 20 seconds
Heap memory: peak 2.5 GB
CPU usage: peak 50%
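The heap peak follows from readLines returning every line as a List held in memory at once. For contrast, here is a minimal sketch of lazy, line-by-line reading using only the JDK (java.nio.file.Files.lines, not Guava). The class name StreamingLineCount, the method countNonEmptyLines and the sample data are hypothetical, chosen only for illustration; this variant was not part of the measurements above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.stream.Stream;

// Hypothetical example class, not from the original article.
class StreamingLineCount {

    /** Counts non-empty lines without building a List of the whole file. */
    static long countNonEmptyLines(Path path) throws IOException {
        // Files.lines reads lazily; closing the stream releases the file handle.
        try (Stream<String> lines = Files.lines(path, StandardCharsets.UTF_8)) {
            return lines.filter(line -> !line.isEmpty()).count();
        }
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for the large CSV.
        Path sample = Files.createTempFile("sample", ".csv");
        try {
            Files.write(sample,
                    Arrays.asList("id,name", "1,alice", "", "2,bob"),
                    StandardCharsets.UTF_8);
            System.out.println(countNonEmptyLines(sample)); // prints 3
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

Because only the current line needs to stay reachable, memory use should depend far less on file size than with a whole-file read, at the cost of not being able to revisit earlier lines.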

Method 2: Apache Commons IO, standard approach

Code:

    // Imports: org.apache.commons.io.FileUtils (Commons IO),
    //          java.io.File, java.io.IOException, java.util.List
    @org.junit.Test
    public void testCommonsIoReadFile() throws IOException {
        // The outPut.csv file used in this test is 624 MB
        String filePath = "D:\\outPut.csv";
        File file = new File(filePath);
        long startTime = System.currentTimeMillis();
        // Read the whole file into a list with Apache Commons IO
        List<String> lines = FileUtils.readLines(file, "UTF-8");

        for (String line : lines) {
            // Add the per-line processing logic here
            System.out.println("Processing line: " + line);
        }
        long endTime = System.currentTimeMillis();
        long consume = (endTime - startTime) / 1000;
        System.out.println("************Commons IO total time: " + consume + " s*****************");
    }

Run results:

As the figure above shows:

Time: 17 seconds
CPU usage: peak 50%, steady at around 25%

Method 3: Java file stream

Code:

    // Imports: java.io.FileInputStream, java.io.IOException, java.util.Scanner
    @org.junit.Test
    public void testJavaIoReadFile() throws IOException {
        // The outPut.csv file used in this test is 624 MB
        String filePath = "D:\\outPut.csv";
        long startTime = System.currentTimeMillis();

        // try-with-resources closes the scanner first, then the stream
        try (FileInputStream inputStream = new FileInputStream(filePath);
             Scanner scanner = new Scanner(inputStream, "UTF-8")) {

            while (scanner.hasNextLine()) {
                // Read the file one line at a time
                String line = scanner.nextLine();
                System.out.println(line);
            }

            // Scanner swallows IOExceptions, so rethrow one if it occurred
            if (scanner.ioException() != null) {
                throw scanner.ioException();
            }
        }
        long endTime = System.currentTimeMillis();
        long consume = (endTime - startTime) / 1000;
        System.out.println("************Java IO total time: " + consume + " s*****************");
    }

Run results:

As the figure above shows:

Time: 32 seconds, roughly double
Heap memory: peak 1 GB, about half as much as before
CPU usage: steady at around 25%
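Scanner tokenizes its input with regular expressions, which may account for part of the extra time. A plain BufferedReader also reads line by line and is a common JDK alternative. The sketch below is a minimal illustration of it under that assumption; it was not measured in this article, and the class name BufferedLineReader and the method forEachLine are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.function.Consumer;

// Hypothetical example class, not from the original article.
class BufferedLineReader {

    /** Passes each line to the handler and returns the number of lines read. */
    static long forEachLine(Path path, Consumer<String> handler) throws IOException {
        long count = 0;
        // try-with-resources closes the reader even if the handler throws
        try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                handler.accept(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for the large CSV.
        Path sample = Files.createTempFile("sample", ".csv");
        try {
            Files.write(sample, Arrays.asList("id,name", "1,alice", "2,bob"),
                    StandardCharsets.UTF_8);
            long total = forEachLine(sample, line -> System.out.println("Processing line: " + line));
            System.out.println("Lines read: " + total); // prints "Lines read: 3"
        } finally {
            Files.deleteIfExists(sample);
        }
    }
}
```

Unlike Scanner, BufferedReader surfaces I/O errors directly as IOException, so no separate ioException() check is needed.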

Method 4: Apache Commons IO streaming

Code:

    // Imports: org.apache.commons.io.FileUtils, org.apache.commons.io.LineIterator (Commons IO),
    //          java.io.File, java.io.IOException
    @org.junit.Test
    public void testApacheCommonsIoReadFile() throws IOException {
        // The outPut.csv file used in this test is 624 MB
        String filePath = "D:\\outPut.csv";
        long startTime = System.currentTimeMillis();
        LineIterator lineIterator = null;

        try {
            // The iterator reads lines on demand instead of loading the whole file
            lineIterator = FileUtils.lineIterator(new File(filePath), "UTF-8");

            while (lineIterator.hasNext()) {
                String line = lineIterator.nextLine();
                System.out.println(line);
            }
        } finally {
            // Releases the underlying file handle; safe to call with null
            LineIterator.closeQuietly(lineIterator);
        }
        long endTime = System.currentTimeMillis();
        long consume = (endTime - startTime) / 1000;
        System.out.println("************Commons IO streaming total time: " + consume + " s*****************");
    }

Run results: