
How to Read Word Document Content from Python with spire.doc

Updated: 2025-02-13 09:13:01    Author: 觅远

Spire.Doc for .NET is a .NET class library dedicated to working with Word documents. This article shows how to call spire.doc from Python to read the contents of a Word document.

Introduction

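Spire.Doc for .NET is a .NET class library dedicated to processing Word documents. Its main purpose is to help developers create, edit, convert, compare, and print Microsoft Word documents quickly and efficiently. It is a standalone Word .NET component, so Microsoft Word does not need to be installed on the system where it runs (server or client). Even so, it can bring Word document processing into any .NET application: ASP.NET, Windows Forms, .NET Core, .NET 5.0, .NET 6.0, .NET 7.0, .NET Standard, Xamarin, and Mono Android. The examples below use the Python edition of the library, Spire.Doc for Python, which is imported as spire.doc.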

Note: the file must not be open in another program (for example, in Microsoft Word) while it is being read or written. Otherwise an error is raised.

Reading all of the text

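```python
from spire.doc import *
from spire.doc.common import *

inputFile = r'自检测试报告.doc'    # "self-check test report" (legacy .doc format)
outputFile = r'自检测试报告.docx'  # used later by the find-and-highlight example

document = Document()               # create a Document instance
document.LoadFromFile(inputFile)    # load the Word document
document_text = document.GetText()  # the whole document's text as one string
print(document_text)
```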
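GetText() returns the whole document as a single string, and that string often contains blank lines. If you only want the lines that actually carry text, a small helper can filter them. This is a plain-Python sketch. The name nonempty_lines is hypothetical and is not part of Spire.Doc. The helper also makes no assumption about which line separator GetText() uses.

```python
from typing import List


def nonempty_lines(text: str) -> List[str]:
    """Return the stripped, non-empty lines of extracted document text."""
    # str.splitlines() splits on \n, \r\n, \r and other line boundaries,
    # so the helper works whichever separator the text uses.
    return [line.strip() for line in text.splitlines() if line.strip()]


# Usage with the document loaded above (hypothetical helper):
# for line in nonempty_lines(document.GetText()):
#     print(line)
```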

Reading data by section

The Document.Sections[index] property returns a specific section of a Word document. Once you have a section, you can iterate over its paragraphs, tables, and other content.

```python
# Continues from the first example: `document` is already loaded.
print(len(document.Sections))   # number of sections
print(document.Sections.Count)  # same value, via the Count property
sections = document.Sections

# Print the text of each paragraph, section by section
for i in range(sections.Count):
    paragraphs = sections[i].Paragraphs
    for p in range(paragraphs.Count):
        print(paragraphs[p].Text)
```
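The loops above walk each collection by position using Count and an index. The table example later in this article uses the same Count property together with get_Item(index). The generator below wraps that Count/get_Item pattern so the nested loops read more naturally. iter_items and paragraph_texts are hypothetical helper names. The sketch assumes only the Count property and get_Item method that the article's own code already relies on.

```python
from typing import Any, Iterator, List


def iter_items(collection: Any) -> Iterator[Any]:
    """Yield every item of a Spire-style collection.

    Assumes only a Count property and a get_Item(index) method,
    which is the access pattern the article's own snippets use.
    """
    for index in range(collection.Count):
        yield collection.get_Item(index)


def paragraph_texts(document: Any) -> List[str]:
    """Collect the text of every body paragraph, section by section."""
    return [
        paragraph.Text
        for section in iter_items(document.Sections)
        for paragraph in iter_items(section.Paragraphs)
    ]


# Usage (hypothetical helpers, `document` loaded as above):
# for text in paragraph_texts(document):
#     print(text)
```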

Reading by page

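A Word document is a flow document: its content is laid out dynamically, so the file itself has no concept of "pages". To work with pages, Spire.Doc for Python provides the FixedLayoutDocument class, which converts a Word document into a fixed (paginated) layout.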

```python
# Convert the flow document into a fixed layout so that pages exist
layoutDoc = FixedLayoutDocument(document)

print(layoutDoc.Pages.Count)  # number of pages

for p in range(layoutDoc.Pages.Count):
    page_data = layoutDoc.Pages[p]
    # print(page_data.Text)   # read the text of a whole page
    columns = page_data.Columns
    for col in range(len(columns)):
        # print(columns[col].Text)  # read the text of one column
        lines = columns[col].Lines
        for n in range(len(lines)):
            print(lines[n].Text)  # read the text line by line
```
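Once the layout has been built, it is often handy to keep the lines grouped by page instead of printing them. You could then, for example, report on which page a phrase appears. The sketch below does this using only the members the loop above already uses: Pages.Count, indexing, len() on Columns and Lines, and Text. lines_by_page and pages_containing are hypothetical names. Page numbers are 1-based. A phrase that wraps across two lines will not be found by this simple per-line check.

```python
from typing import Any, List


def lines_by_page(layout_doc: Any) -> List[List[str]]:
    """Return the text of each layout line, grouped by page."""
    pages = []
    for p in range(layout_doc.Pages.Count):
        page = layout_doc.Pages[p]
        page_lines = []
        for c in range(len(page.Columns)):
            lines = page.Columns[c].Lines
            for n in range(len(lines)):
                page_lines.append(lines[n].Text)
        pages.append(page_lines)
    return pages


def pages_containing(pages: List[List[str]], needle: str) -> List[int]:
    """Return 1-based page numbers with at least one line containing needle."""
    return [
        number
        for number, page_lines in enumerate(pages, start=1)
        if any(needle in line for line in page_lines)
    ]


# Usage (hypothetical helpers):
# pages = lines_by_page(FixedLayoutDocument(document))
# print(pages_containing(pages, "测试报告"))
```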

Reading headers and footers

```python
sections = document.Sections

for i in range(sections.Count):
    header = sections[i].HeadersFooters.Header  # header object of this section
    footer = sections[i].HeadersFooters.Footer  # footer object of this section

    for h in range(header.Paragraphs.Count):
        print(header.Paragraphs[h].Text)

    for f in range(footer.Paragraphs.Count):
        print(footer.Paragraphs[f].Text)
```
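If several sections carry the same header or footer text, the loop above prints it once per section. Collecting the texts first and then dropping repeats keeps the output readable. The order-preserving helper below is a plain-Python sketch. unique_in_order is a hypothetical name, and the helper makes no assumption about how Spire.Doc stores headers that are linked to a previous section.

```python
from typing import Iterable, List


def unique_in_order(texts: Iterable[str]) -> List[str]:
    """Strip each string and keep only the first occurrence of each non-blank one."""
    seen = set()
    result = []
    for text in texts:
        key = text.strip()
        if key and key not in seen:
            seen.add(key)
            result.append(key)
    return result


# Usage sketch: gather texts in the loop above instead of printing them
# header_texts = []
# ...  header_texts.append(header.Paragraphs[h].Text)  ...
# for text in unique_in_order(header_texts):
#     print(text)
```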

Iterating over table data

```python
document = Document()             # create a Document instance
document.LoadFromFile(inputFile)  # load the Word document

for i in range(document.Sections.Count):
    section = document.Sections.get_Item(i)
    for j in range(section.Tables.Count):
        table = section.Tables.get_Item(j)

        # Iterate over the rows of the table
        for row in range(table.Rows.Count):
            row_obj = table.Rows.get_Item(row)
            row_data = []

            # Iterate over the cells in the row
            for cell in range(row_obj.Cells.Count):
                cell_obj = row_obj.Cells.get_Item(cell)
                cell_text = ""

                # A cell can hold several paragraphs; concatenate their text
                for paragraph_index in range(cell_obj.Paragraphs.Count):
                    paragraph = cell_obj.Paragraphs.get_Item(paragraph_index)
                    cell_text += paragraph.Text

                row_data.append(cell_text.strip())

            # Print the row's cell values as a list
            print(row_data)

document.Close()
```
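Printing row_data works for inspection, but the rows are easier to reuse once they are written out as CSV. The sketch below uses Python's standard csv module. rows_to_csv is a hypothetical helper name, and the sketch assumes you append each row_data list to a list of rows instead of printing it.

```python
import csv
import io
from typing import Sequence


def rows_to_csv(rows: Sequence[Sequence[str]]) -> str:
    """Serialize table rows (lists of cell strings) to CSV text."""
    buffer = io.StringIO()
    # csv.writer quotes cells that contain commas, quotes or newlines.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


# Usage sketch ("table.csv" is a placeholder path):
# table_rows = []  # append each row_data here instead of printing it
# with open("table.csv", "w", newline="", encoding="utf-8-sig") as f:
#     f.write(rows_to_csv(table_rows))
```

The utf-8-sig encoding writes a byte-order mark at the start of the file. This helps Excel recognize the file as UTF-8 when the cells contain Chinese text.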

Finding specified text

```python
def FindAllString(self, matchString: str, caseSensitive: bool, wholeWord: bool) -> List['TextSelection']
```

Parameters:

matchString: str, the text to search for.
caseSensitive: bool, if True, the match is case-sensitive.
wholeWord: bool, if True, only matches that form a complete word are returned.
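To make the effect of caseSensitive and wholeWord concrete, here is a pure-Python analogue built on the re module. It only mimics the documented meaning of the two flags. It is not how Spire.Doc implements FindAllString, and the name find_all_spans is hypothetical. One detail is worth checking against the real library. With Python's \b word boundaries, CJK characters count as word characters, so a whole-word search for 测试报告 does not match inside 自检测试报告. Whether FindAllString treats Chinese text the same way is not verified here.

```python
import re
from typing import List, Tuple


def find_all_spans(text: str, match_string: str,
                   case_sensitive: bool, whole_word: bool) -> List[Tuple[int, int]]:
    """Return (start, end) spans of every non-overlapping match."""
    pattern = re.escape(match_string)  # treat the search text literally
    if whole_word:
        # \b only marks a boundary next to a word character, so this assumes
        # match_string starts and ends with letters, digits or underscores.
        pattern = r"\b" + pattern + r"\b"
    flags = 0 if case_sensitive else re.IGNORECASE
    return [m.span() for m in re.finditer(pattern, text, flags)]
```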

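The returned TextSelection objects can then be processed further. The following example highlights every match and saves the result: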

```python
document = Document()             # create a Document instance
document.LoadFromFile(inputFile)  # load the Word document

# Case-insensitive, whole-word search for "测试报告" ("test report")
textSelections = document.FindAllString("测试报告", False, True)

# Set a highlight color on every match
for selection in textSelections:
    selection.GetAsOneRange().CharacterFormat.HighlightColor = Color.get_Blue()

# Save as .docx (the input was a .doc file)
document.SaveToFile(outputFile, FileFormat.Docx)
document.Close()
```
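Several snippets above call document.Close() by hand at the end, so an exception raised between LoadFromFile() and Close() would skip that call. A context manager guarantees that Close() runs. This is a sketch: loaded_document is a hypothetical helper. The document class is passed in as factory (for example, Document from spire.doc) so the helper itself does not import the library. Whether Close() is safe to call after a failed LoadFromFile() is an assumption worth checking.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def loaded_document(path: str, factory: Callable[[], Any]) -> Iterator[Any]:
    """Create a document via factory, load path, and always call Close()."""
    document = factory()
    try:
        document.LoadFromFile(path)
        yield document
    finally:
        # Runs on normal exit and when the with-block raises.
        document.Close()


# Usage (hypothetical helper):
# with loaded_document(inputFile, Document) as document:
#     print(document.GetText())
```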
