Batch-Converting Office Documents (Word/Excel/PowerPoint) to PDF with Python

Updated: October 22, 2024, 08:41:04 Author: Eiceblue

Converting Office documents (Word, Excel, and PowerPoint) to PDF is a common requirement. This article walks through how to batch-convert Word, Excel, and PowerPoint files to PDF with Python.

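When working with Office documents in different formats (Word, Excel, and PowerPoint), converting them to PDF is a common requirement. The conversion keeps a file's appearance consistent across devices and operating systems and makes the original content harder to modify casually, which suits formal reports, proposals, and archiving. With Python, developers can write short scripts that automate this work for business or personal document management. This article shows how to batch-convert Word, Excel, and PowerPoint documents to PDF files with Python code, and how to merge Office documents into a single PDF.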

The approach in this article requires Spire.Office for Python, which can be installed from PyPI: pip install spire.office.

Batch-converting Word, Excel, and PowerPoint documents to separate PDF files

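We can check each file's extension, load the file with the LoadFromFile method of the matching class, Document (Word), Workbook (Excel), or Presentation (PowerPoint), and then call SaveToFile(fileName, FileFormat.PDF) to convert it and save it as a PDF. Repeating this for every file in a folder gives a batch conversion from Office documents to PDF. The detailed steps are: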

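Import the required modules.
Define the folder path to process, collect the files of the supported types, and sort them.
Create a PdfDocument object.
Loop over the file list and determine each file's type from its extension.
Create a Document, Workbook, or Presentation object according to the file type.
Load the document with the LoadFromFile method.
Convert the document to PDF and save it with the SaveToFile method.
Release the resources.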
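The file-collection step can also be sketched with pathlib. This is only an illustration under a few assumptions: the names list_office_files and SUPPORTED_EXTENSIONS are hypothetical, and skipping the "~$" owner files that Office creates next to an open document is an addition that the original script does not make.

```python
from pathlib import Path

# Extensions handled by the conversion script (same set as in the article).
SUPPORTED_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def list_office_files(folder):
    """Return the supported Office files in `folder`, sorted by path."""
    return sorted(
        path
        for path in Path(folder).iterdir()
        if path.is_file()
        and path.suffix.lower() in SUPPORTED_EXTENSIONS
        # Office writes "~$name.docx" owner files while a document is open;
        # they are not real documents, so leave them out.
        and not path.name.startswith("~$")
    )
```

Calling list_office_files("Documents/") would return Path objects; wrap each one in str() before passing it to LoadFromFile if a plain string path is needed.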

Code example

from spire.pdf import PdfDocument
from spire.doc import Document
from spire.xls import Workbook
from spire.presentation import Presentation
from spire.doc import FileFormat as wFileFormat
from spire.xls import FileFormat as eFileFormat
from spire.presentation import FileFormat as pFileFormat
import os

# Folder containing the documents to convert
folderPath = "Documents/"
# Folder that receives the converted PDFs; create it if it does not exist yet
outputFolder = "output/Documents/"
os.makedirs(outputFolder, exist_ok=True)
# Collect all files of the supported types and sort them
extensions = [".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"]
files = sorted([os.path.join(folderPath, f) for f in os.listdir(folderPath) if f.lower().endswith(tuple(extensions))])

# Create a PdfDocument object (the per-file conversion below does not use it)
pdf = PdfDocument()

# Loop over the file list
for file in files:
    extension = os.path.splitext(file)[1].lower()
    # The output name keeps the source extension (for example report.docx.pdf),
    # so report.docx and report.xlsx do not overwrite each other
    outputPath = os.path.join(outputFolder, os.path.basename(file) + ".pdf")
    if extension in [".doc", ".docx"]:
        # Create a Document object
        doc = Document()
        # Load the Word document
        doc.LoadFromFile(file)
        # Convert the Word document to PDF
        doc.SaveToFile(outputPath, wFileFormat.PDF)
        doc.Close()
    elif extension in [".xls", ".xlsx"]:
        # Create a Workbook object
        workbook = Workbook()
        # Load the Excel file
        workbook.LoadFromFile(file)
        # Convert the Excel file to PDF
        workbook.SaveToFile(outputPath, eFileFormat.PDF)
        workbook.Dispose()
    elif extension in [".ppt", ".pptx"]:
        # Create a Presentation object
        presentation = Presentation()
        # Load the PowerPoint file
        presentation.LoadFromFile(file)
        # Convert the PowerPoint file to PDF
        presentation.SaveToFile(outputPath, pFileFormat.PDF)
        presentation.Dispose()

# Close the PdfDocument object
pdf.Close()
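In the loop above, any exception raised while loading or saving one document would stop the whole batch. The sketch below shows one possible way to isolate failures, assuming a hypothetical helper convert_all that receives the converters as plain callables keyed by extension. The helper itself does not depend on Spire; word_to_pdf is an illustrative converter that uses only the Document calls already shown in this article.

```python
import os


def convert_all(files, converters, output_dir):
    """Convert each file with the converter registered for its extension.

    `converters` maps a lowercase extension to a callable(src, dst).
    Returns (converted, failed); `failed` holds (path, error message) pairs.
    """
    os.makedirs(output_dir, exist_ok=True)
    converted, failed = [], []
    for src in files:
        extension = os.path.splitext(src)[1].lower()
        convert = converters.get(extension)
        if convert is None:
            continue  # no converter registered for this type: skip the file
        dst = os.path.join(output_dir, os.path.basename(src) + ".pdf")
        try:
            convert(src, dst)
            converted.append(dst)
        except Exception as error:  # one bad document should not end the batch
            failed.append((src, str(error)))
    return converted, failed


def word_to_pdf(src, dst):
    """Illustrative converter built from the calls used in the article."""
    # Imported lazily so convert_all can be used and tested without Spire.
    from spire.doc import Document, FileFormat

    doc = Document()
    try:
        doc.LoadFromFile(src)
        doc.SaveToFile(dst, FileFormat.PDF)
    finally:
        doc.Close()  # release the document even if the conversion fails
```

A call such as convert_all(files, {".doc": word_to_pdf, ".docx": word_to_pdf}, "output/Documents/") would then return the list of PDFs written and the list of files that failed, which can be logged or retried. Converters for Excel and PowerPoint could be written the same way around Workbook and Presentation.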


Merging Word, Excel, PowerPoint, and PDF documents into a single PDF

Besides converting Office documents one by one, we can also merge documents of different types into a single PDF file. The steps are:

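Import the required modules.
Define the folder path to process, collect the files of the supported types, and sort them.
Create a PdfDocument object, pdf, to hold the final PDF document.
Create a second PdfDocument object, temPdf, and a temporary PDF path for the intermediate PDF files produced by the conversion.
Loop over the file list and determine each file's type from its extension.
Create a Document, Workbook, or Presentation object according to the file type, and load the document with the LoadFromFile method.
Convert the document to PDF with the SaveToFile method and save it to the temporary PDF path.
Load the temporary PDF with temPdf.LoadFromFile() and append its pages to the final PDF with pdf.AppendPage(temPdf).
When all files are processed, save the final PDF document with pdf.SaveToFile().
Clean up the temporary file and release the resources.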
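The script in this section writes its intermediate PDF to a fixed temp.pdf in the working directory. As an alternative sketch, the temporary path could come from the standard tempfile module so that two runs never share the same file. The context manager name temporary_pdf_path is hypothetical and is not part of Spire.Office.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_pdf_path():
    """Yield a unique path for an intermediate PDF and delete it afterwards."""
    fd, path = tempfile.mkstemp(suffix=".pdf")
    os.close(fd)  # only the path is needed; the converter writes the file itself
    try:
        yield path
    finally:
        # Remove the file even if the conversion inside the block failed.
        if os.path.exists(path):
            os.remove(path)
```

Inside a "with temporary_pdf_path() as temPdfPath:" block, the path can be passed to SaveToFile and LoadFromFile exactly like the fixed name. Any object that still has the file loaded should be closed before the block ends, because some platforms refuse to delete a file that is still open.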

Code example

from spire.pdf import PdfDocument
from spire.doc import Document
from spire.xls import Workbook
from spire.presentation import Presentation
from spire.doc import FileFormat as wFileFormat
from spire.xls import FileFormat as eFileFormat
from spire.presentation import FileFormat as pFileFormat

import os

# Folder containing the documents to merge
folderPath = 'Documents/'
# Collect all files of the supported types and sort them
# ('.pdf' is included so that the PDF branch in the loop below is reachable)
extensions = ['.doc', '.docx', '.xls', '.xlsx', '.ppt', '.pptx', '.pdf']
files = sorted([os.path.join(folderPath, f) for f in os.listdir(folderPath) if f.lower().endswith(tuple(extensions))])

# Create the PdfDocument object that holds the final PDF
pdf = PdfDocument()
# Create a PdfDocument for intermediate results and the path of the temporary PDF
temPdf = PdfDocument()
temPdfPath = 'temp.pdf'

# Loop over the file list
for file in files:
    extension = os.path.splitext(file)[1].lower()

    if extension in ['.doc', '.docx']:
        # Load the Word document
        doc = Document()
        doc.LoadFromFile(file)
        # Save it as the temporary PDF
        doc.SaveToFile(temPdfPath, wFileFormat.PDF)
        # Load the temporary PDF and append its pages to the final PDF
        temPdf.LoadFromFile(temPdfPath)
        pdf.AppendPage(temPdf)
        doc.Close()  # explicitly close the document

    elif extension in ['.xls', '.xlsx']:
        # Load the Excel workbook
        workbook = Workbook()
        workbook.LoadFromFile(file)
        # Save it as the temporary PDF
        workbook.SaveToFile(temPdfPath, eFileFormat.PDF)
        # Load the temporary PDF and append its pages to the final PDF
        temPdf.LoadFromFile(temPdfPath)
        pdf.AppendPage(temPdf)
        workbook.Dispose()  # explicitly release the workbook

    elif extension in ['.ppt', '.pptx']:
        # Load the PowerPoint presentation
        presentation = Presentation()
        presentation.LoadFromFile(file)
        # Save it as the temporary PDF
        presentation.SaveToFile(temPdfPath, pFileFormat.PDF)
        # Load the temporary PDF and append its pages to the final PDF
        temPdf.LoadFromFile(temPdfPath)
        pdf.AppendPage(temPdf)
        presentation.Dispose()  # explicitly release the presentation

    elif extension == '.pdf':
        # Already a PDF: load it directly and append its pages to the final PDF
        temPdf.LoadFromFile(file)
        pdf.AppendPage(temPdf)

# Save the final PDF, creating the output folder if needed
outputPath = "output/CombinedPDF.pdf"
os.makedirs(os.path.dirname(outputPath), exist_ok=True)
pdf.SaveToFile(outputPath)

# Release resources before deleting the temporary file
pdf.Close()
temPdf.Close()

# Clean up the temporary file
if os.path.exists(temPdfPath):
    os.remove(temPdfPath)
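The page order of the merged PDF follows the sorted file list, and a plain sorted() compares names character by character, so "10_summary.docx" would be placed before "2_intro.docx". If numbered file names should merge in numeric order, a natural-sort key is one option. The function natural_key below is a hypothetical helper and not part of the original script.

```python
import re


def natural_key(name):
    """Sort key that compares digit runs as numbers and text case-insensitively."""
    # re.split with a capturing group alternates text and digit runs,
    # always starting with a (possibly empty) text part.
    return [
        int(part) if part.isdecimal() else part.lower()
        for part in re.split(r"(\d+)", name)
    ]


names = ["10_summary.docx", "2_intro.docx", "1_cover.pptx"]
ordered = sorted(names, key=natural_key)
```

In the script above, the same key could be passed as sorted(..., key=natural_key) when building the files list.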
