Splitting PDF Files with Python: Examples Explained

Updated: 2025-02-28 15:43:18  Author: nuclear2011

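Large PDF files come up often in everyday work, and they can be hard to send, manage, and read. Splitting a PDF into several smaller files is a practical fix with several benefits. First, smaller files are easier to read, so users can find the information they need more quickly. Second, split files are easier to share and collaborate on, which suits team projects where different members work on their own parts at the same time. Finally, splitting lets sensitive information be handled separately, which can reduce the risk of a data leak.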

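This post looks at how to split PDF files with Python and covers the following topics:

Split a PDF by page count
- Split every page of a PDF into a separate file
- Split a PDF by a specified number of pages
Split a PDF by page range
Split a PDF by specified content
Split one page of a PDF into multiple pages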

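Tools

To split PDF files in Python, you can use the Spire.PDF for Python library. It is mainly used to generate and process PDF documents in Python applications, and it also supports converting PDFs to other formats such as images, Word, and Excel.

Installing Spire.PDF

Before you start, install the Spire.PDF library by running the following command in a terminal: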

pip install spire.pdf

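Split a PDF by page count

When splitting by page count, you can either split every page of the PDF into its own file or split the PDF into files of a specified number of pages. Both approaches are described below.

Split every page of a PDF into a separate file

Spire.PDF for Python provides the PdfDocument.Split() method, which splits a PDF by page so that each generated file contains exactly one page of the original document. The steps are:

Create a PdfDocument object.
Open the PDF document with PdfDocument.LoadFromFile().
Split every page of the document into a separate PDF with PdfDocument.Split().

Code: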

from spire.pdf.common import *
from spire.pdf import *
 
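# Create a PdfDocument object
pdf = PdfDocument()
# Load the PDF file
pdf.LoadFromFile("心理健康.pdf")

# Split the PDF into multiple files, each containing a single page of the original.
# {0} in the file name pattern is replaced by a number; the second argument is the starting number.
pdf.Split("拆分PDF/第{0}頁.pdf", 1)

# Close the PdfDocument object
pdf.Close()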
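
To see which file names the pattern above should yield, the numbering can be previewed in plain Python. This is only a sketch: preview_split_names is a hypothetical helper, not part of Spire.PDF, and it assumes that Split() substitutes {0} with consecutive numbers starting at the second argument.

```python
def preview_split_names(pattern, page_count, start_number=1):
    """Return the names a '{0}'-style pattern would produce, one per page.

    Hypothetical helper: it imitates the assumed numbering with str.format
    and does not call Spire.PDF.
    """
    return [pattern.format(start_number + i) for i in range(page_count)]


# Preview the names for a three-page document
for name in preview_split_names("拆分PDF/第{0}頁.pdf", 3):
    print(name)
```

Whether the library creates a missing output folder is not covered here, so creating it first with os.makedirs(folder, exist_ok=True) is a safe precaution.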

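Split a PDF by a specified number of pages

To split a PDF into files of a specified number of pages, create a new PDF document for each chunk and insert that chunk's pages into it. The steps are:

Create a PdfDocument object.

Open the PDF document with PdfDocument.LoadFromFile().

Get the total number of pages in the document.

Loop over the document in steps of the specified page count:

- Set the start page and end page.
- Create a new PdfDocument object.
- Insert the pages in the current range into the new document with PdfDocument.InsertPageRange().
- Save the generated document with PdfDocument.SaveToFile().

Code: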

from spire.pdf.common import *
from spire.pdf import *
 
# Split a PDF into files of page_count pages each
def split_pdf_by_page_count(input_file, page_count):
    # Create a PdfDocument object
    pdf = PdfDocument()
    # Load the PDF file
    pdf.LoadFromFile(input_file)

    # Get the total number of pages
    total_pages = pdf.Pages.Count

    # Walk through the document page_count pages at a time
    for i in range(0, total_pages, page_count):
        # Create a new PdfDocument object
        new_pdf = PdfDocument()

        # Work out the zero-based page range for this chunk
        start_page = i
        end_page = min(i + page_count - 1, total_pages - 1)  # do not run past the last page

        # Insert the pages in the current range into the new PDF
        new_pdf.InsertPageRange(pdf, start_page, end_page)

        # Save the generated file, using one-based page numbers in its name
        new_pdf.SaveToFile(f"拆分PDF/{start_page + 1}-{end_page + 1}頁.pdf")
        # Close the new PdfDocument object
        new_pdf.Close()

    # Close the original PdfDocument object
    pdf.Close()

# Split the PDF into files of 3 pages each
split_pdf_by_page_count("心理健康.pdf", 3)
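
The range arithmetic in the loop can be checked on its own, without loading a PDF. The sketch below uses a hypothetical helper, chunk_page_ranges, that reproduces the same start and end calculation and returns the zero-based, inclusive ranges that would be passed to InsertPageRange().

```python
def chunk_page_ranges(total_pages, page_count):
    """Return zero-based, inclusive (start, end) page ranges, page_count pages per chunk."""
    if page_count < 1:
        raise ValueError("page_count must be at least 1")
    ranges = []
    for start in range(0, total_pages, page_count):
        # The last chunk may be shorter, so clamp the end to the final page
        end = min(start + page_count - 1, total_pages - 1)
        ranges.append((start, end))
    return ranges


# A 10-page document split every 3 pages
print(chunk_page_ranges(10, 3))  # [(0, 2), (3, 5), (6, 8), (9, 9)]
```

The last chunk holds only one page because 10 is not a multiple of 3, which is exactly the case the min() call guards against.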

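Split a PDF by page range

Besides splitting by page count, you can extract the pages within a specified page range into a separate file. The steps are similar to splitting by a specified number of pages, so they are not repeated here.

Code: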

from spire.pdf.common import *
from spire.pdf import *
 
# Extract the pages in a given range (zero-based, inclusive) and save them as a new file
def split_pdf_by_page_range(input_file, start_page, end_page, output_file):
    # Create a PdfDocument object and load the PDF file
    pdf = PdfDocument()
    pdf.LoadFromFile(input_file)

    # Create a new PdfDocument object
    new_pdf = PdfDocument()

    # Insert the pages in the specified range into the new PDF document
    new_pdf.InsertPageRange(pdf, start_page, end_page)

    # Save the generated file
    new_pdf.SaveToFile(output_file)

    # Close the PdfDocument objects
    pdf.Close()
    new_pdf.Close()

# Extract pages 1-3 (indexes 0 to 2) from the PDF and save them as a new file
split_pdf_by_page_range("心理健康.pdf", 0, 2, "拆分PDF/指定頁碼范圍.pdf")
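
Because the example call passes 0 and 2 to extract pages 1-3, it can help to convert the page numbers a reader sees into indexes before calling the function. The following is a minimal sketch; to_zero_based_range is a hypothetical helper and the validation rules are an assumption about what a caller would want, not something the library requires.

```python
def to_zero_based_range(first_page, last_page, total_pages):
    """Convert one-based, inclusive page numbers to zero-based indexes.

    Raises ValueError if the range is empty or falls outside the document.
    """
    if not 1 <= first_page <= last_page <= total_pages:
        raise ValueError(
            f"invalid page range {first_page}-{last_page} for a {total_pages}-page document"
        )
    return first_page - 1, last_page - 1


# Pages 1-3 of a 10-page document map to indexes 0 and 2
print(to_zero_based_range(1, 3, 10))  # (0, 2)
```

The returned pair could then be passed as start_page and end_page to split_pdf_by_page_range.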

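Split a PDF by specified content

In some cases you may need to split a PDF based on a particular keyword or phrase. This approach extracts the pages that contain specific content, which makes it easier to gather related information. The code below searches the text of each page and, if the keyword is found, adds that page to a new PDF: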

from spire.pdf.common import *
from spire.pdf import *
 
# Extract the pages that contain a given keyword into a new PDF
def extract_pages_with_keyword(pdf_path, output_path, keyword):
    # Create a PdfDocument object
    pdf = PdfDocument()
    # Load the PDF file
    pdf.LoadFromFile(pdf_path)

    # Create a new PdfDocument object
    new_pdf = PdfDocument()

    # Go through every page in the document
    for i in range(pdf.Pages.Count):
        page = pdf.Pages[i]
        # Create a PdfTextFinder instance for this page
        finder = PdfTextFinder(page)
        # Set the search option to whole-word matching
        finder.Options.Parameter = TextFindParameter.WholeWord
        # Search for the keyword
        results = finder.Find(keyword)

        # If the keyword was found
        if results:
            # Add the current page to the new document
            new_pdf.InsertPage(pdf, i)

    # Save the extracted pages
    new_pdf.SaveToFile(output_path)

    # Close the PdfDocument objects
    new_pdf.Close()
    pdf.Close()

# Save the pages of the PDF that contain the keyword as a new file
extract_pages_with_keyword("心理健康.pdf", "拆分PDF/含關鍵字頁面.pdf", "問題")
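
The selection step itself is just a filter over pages. The sketch below shows that logic without Spire.PDF: plain strings stand in for the text of each page, pages_containing is a hypothetical helper, and it uses simple substring matching rather than the whole-word option used above.

```python
def pages_containing(page_texts, keyword):
    """Return the zero-based indexes of pages whose text contains keyword.

    Uses plain substring matching; this is a simplification and is not
    equivalent to whole-word matching in a PDF text finder.
    """
    return [i for i, text in enumerate(page_texts) if keyword in text]


# Sample page texts standing in for a three-page document
sample_pages = ["Introduction", "Common problems and fixes", "Summary of problems"]
print(pages_containing(sample_pages, "problems"))  # [1, 2]
```

Collecting the matching indexes first also makes it easy to notice when nothing matched before writing an output file.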

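Split one page of a PDF into multiple pages

In some cases you may need to split a single page of a PDF into two or more pages. You can split the page horizontally or vertically. With N as the number of pages to produce, a horizontal split gives each new page 1/N of the original width, and a vertical split gives each new page 1/N of the original height.

The code below splits a specified page of a PDF vertically or horizontally into a given number of pages (two in the example call):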

from spire.pdf.common import *
from spire.pdf import *
 
# Split one page of a PDF horizontally or vertically into several pages
def split_specific_pdf_page(pdf_path, output_folder, page_index, num_pages, split_direction='vertical'):
    # Create a PdfDocument object
    pdf = PdfDocument()
    # Load the PDF file
    pdf.LoadFromFile(pdf_path)

    # Validate the zero-based page index, closing the document before returning
    if page_index < 0 or page_index >= pdf.Pages.Count:
        print("Error: the specified page index is out of range.")
        pdf.Close()
        return

    page = pdf.Pages[page_index]

    # Create a new PdfDocument object
    newPdf = PdfDocument()
    # Remove all page margins
    newPdf.PageSettings.Margins.All = 0.0

    if split_direction == 'vertical':
        # Keep the width and divide the height
        newPdf.PageSettings.Width = page.Size.Width
        newPdf.PageSettings.Height = page.Size.Height / float(num_pages)
    elif split_direction == 'horizontal':
        # Keep the height and divide the width
        newPdf.PageSettings.Height = page.Size.Height
        newPdf.PageSettings.Width = page.Size.Width / float(num_pages)
    else:
        print("Error: invalid split direction, choose 'vertical' or 'horizontal'.")
        newPdf.Close()
        pdf.Close()
        return

    # Add a page to the new PDF
    newPage = newPdf.Pages.Add()

    # Set the layout to paginate automatically
    layout_format = PdfTextLayout()
    layout_format.Break = PdfLayoutBreakType.FitPage
    layout_format.Layout = PdfLayoutType.Paginate

    # Draw the original page onto the new page; the call is the same for both directions
    page.CreateTemplate().Draw(newPage, PointF(0.0, 0.0), layout_format)

    # Save the generated file
    newPdf.SaveToFile(f"{output_folder}/拆分第{page_index + 1}頁.pdf")

    # Close the PdfDocument objects
    newPdf.Close()
    pdf.Close()

# Split page 1 of the PDF vertically into 2 pages; 0 is the page index and 2 is the number of pages to produce
# split_specific_pdf_page("心理健康.pdf", "拆分PDF", 0, 2, 'vertical')
# Or split page 1 of the PDF horizontally into 2 pages
split_specific_pdf_page("心理健康.pdf", "拆分PDF", 0, 2, 'horizontal')
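
The page-size rule used by the function can be worked through with plain numbers. This sketch uses a hypothetical helper, split_page_size, that applies the same division as the code above; the 595 x 842 point size is only an illustrative A4-like page, not a value taken from the sample file.

```python
def split_page_size(width, height, num_pages, split_direction="vertical"):
    """Return the (width, height) of each page produced by the split."""
    if num_pages < 1:
        raise ValueError("num_pages must be at least 1")
    if split_direction == "vertical":
        # Vertical split: keep the width, divide the height
        return width, height / num_pages
    if split_direction == "horizontal":
        # Horizontal split: keep the height, divide the width
        return width / num_pages, height
    raise ValueError("split_direction must be 'vertical' or 'horizontal'")


print(split_page_size(595.0, 842.0, 2, "horizontal"))  # (297.5, 842.0)
print(split_page_size(595.0, 842.0, 2, "vertical"))    # (595.0, 421.0)
```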
