Complete Steps for Parsing Excel Files with JS

Updated: 2022-10-16 10:13:24 Author: 嘿嘿Z

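Parsing Excel files is a requirement we run into often in day-to-day development. This article walks through the complete steps for parsing an Excel file with JS, with detailed sample code. Readers who need it can use it as a reference.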

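Preface

Today let's talk about how to parse an excel file with JS. Not by reaching straight for a library such as exceljs or sheetjs, which would be no fun, but by looking at how parsing an excel sheet in JS actually works.

Note that this article only covers the xlsx format. I haven't looked into the other formats, so I can't speak for them.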

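What an excel file actually is

To parse an excel file, we first need to understand how it stores its data. After a lot of searching I finally found the answer on Google: an excel file is really just a zip archive! So I quickly created an xlsx file with two sheets in it, containing the following data: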

This is sheet 1 (rows 3 to 5 are blank):

A	B	C
1		2
1		2



1		2
1		2

This is sheet 2:

A	B
q	a
q	a
q	a

Then extract it with unzip:

unzip test.xlsx -d test

Running tree on the result gives us this directory structure:

test
├── [Content_Types].xml
├── _rels
├── docProps
│   ├── app.xml
│   ├── core.xml
│   └── custom.xml
└── xl
    ├── _rels
    │   └── workbook.xml.rels
    ├── sharedStrings.xml
    ├── styles.xml
    ├── theme
    │   └── theme1.xml
    ├── workbook.xml
    └── worksheets
        ├── sheet1.xml
        └── sheet2.xml

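Aha, nice: they are all xml files.

Let's open the xml files and take a closer look. A few of them stand out: sheet1.xml and sheet2.xml under worksheets, plus workbook.xml. The others, styles and theme, are obviously related to styling, and _rels looks like some kind of internal references. Let's start with the sheet xml files to check whether that guess is right. Here is sheet1.xml: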

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<worksheet xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
    xmlns:x14="http://schemas.microsoft.com/office/spreadsheetml/2009/9/main"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:etc="http://www.wps.cn/officeDocument/2017/etCustomData">
    <sheetPr/>
    <dimension ref="A1:C7"/>
    <sheetViews>
        <sheetView workbookViewId="0">
            <selection activeCell="D5" sqref="A3:D5"/>
        </sheetView>
    </sheetViews>
    <sheetFormatPr defaultColWidth="9.23076923076923" defaultRowHeight="16.8" outlineLevelRow="6" outlineLevelCol="2"/>
    <sheetData>
        <row r="1" spans="1:3">
            <c r="A1">
                <v>1</v>
            </c>
            <c r="C1">
                <v>2</v>
            </c>
        </row>
        <row r="2" spans="1:3">
            <c r="A2">
                <v>1</v>
            </c>
            <c r="C2">
                <v>2</v>
            </c>
        </row>
        <row r="6" spans="1:3">
            <c r="A6">
                <v>1</v>
            </c>
            <c r="C6">
                <v>2</v>
            </c>
        </row>
        <row r="7" spans="1:3">
            <c r="A7">
                <v>1</v>
            </c>
            <c r="C7">
                <v>2</v>
            </c>
        </row>
    </sheetData>
    <pageMargins left="0.75" right="0.75" top="1" bottom="1" header="0.5" footer="0.5"/>
    <headerFooter/>
</worksheet>

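As you have probably noticed, sheetData holds the data of the excel sheet. <row> is a row, and its r attribute is the row number. The <c> inside a row is presumably a cell: its <v> holds the cell's value, and its r attribute is the cell's position. A7, for example, means column A, row 7.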
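One detail the numeric sample above does not show: in the xlsx format, text cells are normally written as <c t="s">, and their <v> is then a zero-based index into xl/sharedStrings.xml rather than the text itself. A minimal sketch of that lookup follows. The { type, raw } cell shape and the function name are my own assumptions for illustration, and so is the order of the strings in the table.

```javascript
// Sketch: resolve the value of one cell.
// `cell` is a hypothetical plain object { type, raw }:
//   type = the optional t attribute of <c>, raw = the text inside <v>.
// `sharedStrings` is the list of strings read from xl/sharedStrings.xml.
function resolveCellValue(cell, sharedStrings) {
    if (cell.raw === undefined) return null; // empty cell, no <v>
    if (cell.type === 's') {
        // t="s": <v> is a zero-based index into the shared string table
        return sharedStrings[Number(cell.raw)];
    }
    if (cell.type === 'b') return cell.raw === '1'; // boolean cell
    if (cell.type === undefined || cell.type === 'n') return Number(cell.raw);
    return cell.raw; // other cell types are returned unchanged in this sketch
}

// Assumed string table for sheet 2, which only contains "q" and "a"
const sharedStrings = ['q', 'a'];
console.log(resolveCellValue({ type: 's', raw: '1' }, sharedStrings)); // a
console.log(resolveCellValue({ raw: '2' }, sharedStrings));            // 2
```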

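A few other elements are self-explanatory. dimension is the extent of the sheet: the rectangle from cell A1 to cell C7. <sheetViews> presumably stores the state of the view, and <selection> the range that is currently selected.

workbook.xml, in turn, stores information about the sheets: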
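To make the A7 and A1:C7 notation concrete, here is a small sketch that turns a cell reference into zero-based indexes and a dimension into a row and column count. The function names are mine, not part of any library.

```javascript
// Convert column letters to a zero-based index: "A" -> 0, "Z" -> 25, "AA" -> 26.
function columnToIndex(letters) {
    let n = 0;
    for (const ch of letters) {
        n = n * 26 + (ch.charCodeAt(0) - 64); // 'A' has char code 65
    }
    return n - 1;
}

// Split a reference such as "A7" into zero-based { col, row }.
function parseCellRef(ref) {
    const m = /^([A-Z]+)(\d+)$/.exec(ref);
    if (!m) throw new Error(`Unexpected cell reference: ${ref}`);
    return { col: columnToIndex(m[1]), row: Number(m[2]) - 1 };
}

// Turn a dimension such as "A1:C7" into a row and column count.
// A single-cell dimension such as "A1" is treated as a 1 x 1 range.
function parseDimension(ref) {
    const [start, end = start] = ref.split(':').map(parseCellRef);
    return { rows: end.row - start.row + 1, cols: end.col - start.col + 1 };
}

console.log(parseCellRef('A7'));      // { col: 0, row: 6 }
console.log(parseDimension('A1:C7')); // { rows: 7, cols: 3 }
```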

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<workbook xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <fileVersion appName="xl" lastEdited="3" lowestEdited="5" rupBuild="9302"/>
    <workbookPr/>
    <bookViews>
        <workbookView windowHeight="16360" activeTab="1"/>
    </bookViews>
    <sheets>
        <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
        <sheet name="Sheet2" sheetId="2" r:id="rId2"/>
    </sheets>
    <calcPr calcId="144525"/>
</workbook>

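I skimmed the remaining xml files, and what they store is fairly clear. For example:

app stores information about the application that produced the file, and apparently the file name too
core stores the author plus the creation and modification times
the rels files are also xml and store references to other xml files
theme stores the colors and fonts defined for the sheets
[Content_Types] references all the files; my guess is that it is the entry point for parsing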
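The r:id attributes in workbook.xml and the references in xl/_rels/workbook.xml.rels fit together: each <sheet> points at a <Relationship>, whose Target is the path of the sheet file. The article does not show the rels file, so the relsXml sample below is my assumption of what it typically looks like, and the regex-based reader is only a rough sketch for simple input like this, not a real XML parser.

```javascript
// Collect the attributes of every <tagName ...> element in an XML string.
// Regex-based, so only suitable for simple, well-formed input like the samples here.
function readElements(xml, tagName) {
    const tagPattern = new RegExp(`<${tagName}\\s([^>]*?)/?>`, 'g');
    return [...xml.matchAll(tagPattern)].map(([, attrText]) => {
        const attrs = {};
        for (const [, key, value] of attrText.matchAll(/([\w:]+)="([^"]*)"/g)) {
            attrs[key] = value;
        }
        return attrs;
    });
}

// Map each sheet name in workbook.xml to the file that holds its data.
function mapSheetsToFiles(workbookXml, relsXml) {
    const targets = new Map(
        readElements(relsXml, 'Relationship').map(rel => [rel.Id, rel.Target])
    );
    return readElements(workbookXml, 'sheet').map(sheet => ({
        name: sheet.name,
        // Assumes targets are relative to the xl/ directory, as in this sample
        path: `xl/${targets.get(sheet['r:id'])}`,
    }));
}

const workbookXml = `
<sheets>
    <sheet name="Sheet1" sheetId="1" r:id="rId1"/>
    <sheet name="Sheet2" sheetId="2" r:id="rId2"/>
</sheets>`;

// Assumed content of xl/_rels/workbook.xml.rels (not shown in the article)
const relsXml = `
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet1.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet2.xml"/>
</Relationships>`;

const sheets = mapSheetsToFiles(workbookXml, relsXml);
console.log(sheets);
```

With these samples, Sheet1 resolves to xl/worksheets/sheet1.xml and Sheet2 to xl/worksheets/sheet2.xml.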

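JS implementation steps

Now that we know how an excel file stores its data, parsing it with js is straightforward. There are three main steps:

Unzip the excel file with js
Get the contents of the sheet files and parse the xml data
Transform the data into the shape we want

Enough talk, let's try it out: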

ZIP decompression

How ZIP decompression works in JS was covered in the previous article, so I won't go into detail here and will simply use jszip:

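document.querySelector('#file').addEventListener('change', async e => {
    // First file chosen in the file input with id "file"
    const file = e.target.files[0];
    if (!file) return;
    // JSZip accepts the File (a Blob) directly and unpacks it in memory
    const zip = await JSZip.loadAsync(file);
    // Read one entry of the archive as a string
    const sheetXML = await zip.files['xl/worksheets/sheet1.xml'].async('string');
});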

Done. sheetXML now holds the contents of the sheet1.xml we just looked at.
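The snippet above hard-codes sheet1.xml. If you want every sheet, one option is to filter the entry names of the archive, which JSZip exposes as the keys of zip.files. The helper below is a sketch of mine that works on a plain array of names, and it relies on the sheetN.xml naming seen in this sample, which is a convention rather than a guarantee.

```javascript
// Pick the worksheet entries out of a list of zip entry names and
// sort them by their numeric suffix (so sheet10 comes after sheet2).
function listWorksheetPaths(entryNames) {
    const pattern = /^xl\/worksheets\/sheet(\d+)\.xml$/;
    return entryNames
        .filter(name => pattern.test(name))
        .sort((a, b) => Number(pattern.exec(a)[1]) - Number(pattern.exec(b)[1]));
}

// Entry names as they appear in the sample archive (subset)
const names = [
    '[Content_Types].xml',
    'xl/workbook.xml',
    'xl/worksheets/sheet2.xml',
    'xl/worksheets/sheet1.xml',
    'xl/styles.xml',
];
console.log(listWorksheetPaths(names));
// [ 'xl/worksheets/sheet1.xml', 'xl/worksheets/sheet2.xml' ]
```

Inside the change handler this would be called as listWorksheetPaths(Object.keys(zip.files)), after which each path can be read with async('string') as before.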

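XML parsing

Next we parse the XML to extract the data. The principle behind XML parsing is simple and the same as parsing HTML. Since we understand the principle, let's just grab an open-source library to do the work: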

import convert from 'xml-js';

const result = convert.xml2json(sheetXML, { compact: true, spaces: 4 });

This gives us a JSON string like the following (part of the content removed):

{
    "_declaration": {
        "_attributes": {}
    },
    "worksheet": {
        "_attributes": {},
        "sheetPr": {},
        "dimension": {
            "_attributes": {
                "ref": "A1:C7"
            }
        },
        "sheetData": {
            "row": [
                {
                    "_attributes": {
                        "r": "1",
                        "spans": "1:3"
                    },
                    "c": [
                        {
                            "_attributes": {
                                "r": "A1"
                            },
                            "v": {
                                "_text": "1"
                            }
                        },
                        {
                            "_attributes": {
                                "r": "C1"
                            },
                            "v": {
                                "_text": "2"
                            }
                        }
                    ]
                },
                {
                    "_attributes": {
                        "r": "7",
                        "spans": "1:3"
                    },
                    "c": [
                        {
                            "_attributes": {
                                "r": "A7"
                            },
                            "v": {
                                "_text": "1"
                            }
                        },
                        {
                            "_attributes": {
                                "r": "C7"
                            },
                            "v": {
                                "_text": "2"
                            }
                        }
                    ]
                }
            ]
        }
    }
}

After that, we only need to take the data out of sheetData and build whatever data format we want from its attributes.
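As one possible target shape, the sketch below turns the parsed structure into a two-dimensional array indexed by row and column, with null for cells that are missing. It assumes the input is the object you get from JSON.parse(result), in the compact layout shown above. The function names are mine, and the values stay strings because that is how they arrive from the XML.

```javascript
// In the compact layout a single child is an object and several children are an array.
const toArray = value =>
    value === undefined ? [] : Array.isArray(value) ? value : [value];

// Column index of a cell reference: "A1" -> 0, "C7" -> 2, "AA3" -> 26.
function columnIndex(ref) {
    const letters = ref.replace(/\d+$/, '');
    let n = 0;
    for (const ch of letters) n = n * 26 + (ch.charCodeAt(0) - 64);
    return n - 1;
}

function sheetToGrid(parsed) {
    const grid = [];
    for (const row of toArray(parsed.worksheet.sheetData.row)) {
        const rowIndex = Number(row._attributes.r) - 1;
        // Fill skipped rows so grid[i] always matches sheet row i + 1
        while (grid.length <= rowIndex) grid.push([]);
        const line = grid[rowIndex];
        for (const cell of toArray(row.c)) {
            const col = columnIndex(cell._attributes.r);
            // Pad skipped columns (column B in the sample) with null
            while (line.length < col) line.push(null);
            line[col] = cell.v ? cell.v._text : null;
        }
    }
    return grid;
}

// Trimmed version of the JSON above: rows 1 and 7 only
const cellsOf = n => [
    { _attributes: { r: `A${n}` }, v: { _text: '1' } },
    { _attributes: { r: `C${n}` }, v: { _text: '2' } },
];
const parsed = {
    worksheet: {
        sheetData: {
            row: [
                { _attributes: { r: '1', spans: '1:3' }, c: cellsOf(1) },
                { _attributes: { r: '7', spans: '1:3' }, c: cellsOf(7) },
            ],
        },
    },
};

const grid = sheetToGrid(parsed);
console.log(grid[0]); // [ '1', null, '2' ]
```

Rows that do not appear in the XML come out as empty arrays, so grid.length is 7 here even though only two rows carry data.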

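Summary

An excel file is essentially a zip archive. With three steps (zip decompression, xml parsing, data processing) we can read its data with JS. There are of course plenty of details to take care of, but if you only have a simple excel template, it is worth trying this yourself.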
