
How to generate Word and PDF documents with a Python program

Updated: 2017-02-14 08:35:33  Author: 会心一击

This article shows how to generate Word and PDF documents from a Python program, with detailed explanations and sample code. I hope it is a useful reference; if you need it, read on.

I. Exporting Word documents programmatically

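Java offers many ways to export web/HTML content as a Word document, such as Jacob, Apache POI, Java2Word, and iText, or a template engine such as FreeMarker. PHP has some equivalents as well, but Python has very few ways to turn web/HTML content into a Word document. The hardest part is getting data that JavaScript loads asynchronously, and images, into the Word document.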

1. unoconv

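Features:

1. unoconv (which drives LibreOffice/OpenOffice to do the actual conversion) can convert a local HTML file to .docx, so the web page's HTML must first be saved to disk and then passed to unoconv. The conversion quality is decent and the tool is very easy to use.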

```shell
# Install
sudo apt-get install unoconv
# Usage: convert .odt files to PDF, DOC, or HTML
unoconv -f pdf *.odt
unoconv -f doc *.odt
unoconv -f html *.odt
```
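Because unoconv is a command-line tool, calling it from Python usually means spawning a subprocess. The sketch below is a minimal wrapper, assuming unoconv is on the PATH and that your LibreOffice installation can import the saved HTML file as a text document; `unoconv_command` and `convert` are hypothetical helper names, not part of unoconv.

```python
import shutil
import subprocess
from pathlib import Path

# Hypothetical helpers (not part of unoconv) that wrap the CLI.
def unoconv_command(src, fmt="docx", output_dir=None):
    """Build an unoconv command line converting `src` to the format `fmt`."""
    cmd = ["unoconv", "-f", fmt]          # -f selects the output format
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]    # -o: write the result here instead
    cmd.append(str(src))
    return cmd

def convert(src, fmt="docx", output_dir=None):
    """Run unoconv on a locally saved HTML file; raises if the tool is missing."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv not found; install it first")
    subprocess.run(unoconv_command(src, fmt, output_dir), check=True)
```

For example, `convert(Path("page.html"))` should leave `page.docx` next to the input, since unoconv writes beside the source file unless `-o` is given.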

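Drawbacks:

1. Only static HTML can be converted. Content the page loads asynchronously via AJAX is lost (what matters is that the HTML file saved from the page actually contains the data).

2. Only the HTML markup is converted. Charts drawn by JavaScript libraries such as ECharts or Highcharts do not make it into the Word document.

3. The formatting of the generated Word document is hard to control.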

2. python-docx

Features:

1. python-docx is a Python library for reading and writing Word (.docx) documents.

Usage:

1. Extract the data from the web page, then lay it out in the Word document by hand with Python code.

```python
from docx import Document
from docx.shared import Inches

# Sample rows for the table below (stand-in for data taken from the page)
records = (
    (3, '101', 'Spam'),
    (7, '422', 'Eggs'),
    (4, '631', 'Spam, spam, eggs, and spam'),
)

document = Document()
document.add_heading('Document Title', 0)  # level 0 uses the "Title" style
p = document.add_paragraph('A plain paragraph having some ')
p.add_run('bold').bold = True
p.add_run(' and some ')
p.add_run('italic.').italic = True
document.add_heading('Heading, level 1', level=1)
# Styles are looked up by their display names
document.add_paragraph('Intense quote', style='Intense Quote')
document.add_paragraph('first item in unordered list', style='List Bullet')
document.add_paragraph('first item in ordered list', style='List Number')
# The image file must exist in the working directory
document.add_picture('monty-truth.png', width=Inches(1.25))
table = document.add_table(rows=1, cols=3)
hdr_cells = table.rows[0].cells
hdr_cells[0].text = 'Qty'
hdr_cells[1].text = 'Id'
hdr_cells[2].text = 'Desc'
for qty, id_, desc in records:
    row_cells = table.add_row().cells
    row_cells[0].text = str(qty)
    row_cells[1].text = id_
    row_cells[2].text = desc
document.add_page_break()
document.save('demo.docx')
```

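Setting table column widths (autofit has to be turned off first):

```python
from docx import Document
from docx.shared import Inches

document = Document()
for row in range(9):
    # One single-column table per step, each 0.5 inch wider than the last
    t = document.add_table(rows=1, cols=1, style='Table Grid')
    t.autofit = False  # important! otherwise Word may resize the column
    w = float(row) / 2.0
    t.columns[0].width = Inches(w)
document.save('table-step.docx')
```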

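Drawbacks:

Its functionality is quite limited. There is no built-in template mechanism (the closest you get is opening an existing .docx and reusing its styles), so it is really only suited to Word documents with simple formatting.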

II. Exporting PDF documents programmatically

1. pdfkit

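Features:

1. wkhtmltopdf is a command-line tool for rendering HTML to PDF.

2. pdfkit is a Python wrapper around wkhtmltopdf. It converts URLs, local files, or strings to PDF, but in the end it simply invokes the wkhtmltopdf command. Of the Python options I have tried, it produces the best PDFs.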

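Advantages:

1. wkhtmltopdf uses the WebKit rendering engine to convert HTML to PDF.

WebKit is an efficient, open-source browser engine. Safari is built on it, and Chrome used it until 2013, when it switched to Blink, a fork of WebKit. Browsers can produce PDFs themselves too: Chrome's print dialog, for instance, offers a "Save as PDF" option.

2. Because wkhtmltopdf renders the page with WebKit before printing it to PDF, the output is high-fidelity, the conversion quality is very good, and the tool is very simple to use.
Usage: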

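```shell
# Install (pdfkit also needs the wkhtmltopdf binary on the PATH)
pip install pdfkit
sudo apt-get install wkhtmltopdf
```

```python
# Usage
import pdfkit
pdfkit.from_url('http://google.com', 'out.pdf')
pdfkit.from_file('test.html', 'out.pdf')
pdfkit.from_string('Hello!', 'out.pdf')
```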
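When the data has already been extracted (for instance from the page's JSON endpoint), one option is to render it into a small HTML page yourself and hand the string to `pdfkit.from_string`. The sketch below assumes that approach; `records_to_html`, `write_pdf`, and `PDF_OPTIONS` are hypothetical names, while `page-size` and `encoding` are wkhtmltopdf options passed through pdfkit. Declaring UTF-8 both in the page and via `encoding` is a reasonable precaution for Chinese text.

```python
import html

def records_to_html(title, rows):
    """Hypothetical helper: render a title and (label, value) rows as HTML."""
    body = "".join(
        f"<tr><td>{html.escape(str(k))}</td><td>{html.escape(str(v))}</td></tr>"
        for k, v in rows
    )
    return (
        '<html><head><meta charset="utf-8"></head><body>'
        f"<h1>{html.escape(title)}</h1><table>{body}</table></body></html>"
    )

# wkhtmltopdf options, written without the leading "--"
PDF_OPTIONS = {"page-size": "A4", "encoding": "UTF-8"}

def write_pdf(html_text, path):
    """Hypothetical helper: needs pdfkit and the wkhtmltopdf binary installed."""
    import pdfkit
    return pdfkit.from_string(html_text, path, options=PDF_OPTIONS)
```

Passing `False` as the path makes `pdfkit.from_string` return the PDF as bytes instead of writing a file.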

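Drawbacks:

1. In practice, charts generated by JavaScript libraries such as ECharts or Highcharts do not come out in the PDF. wkhtmltopdf does execute JavaScript, so the likely cause is that charts which fetch their data asynchronously or draw with animation are not finished when the page is captured. Purely static pages convert well.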
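If the charts are missing because they have not finished drawing, giving wkhtmltopdf more time before it prints is worth trying. The sketch below assumes that cause; `chart_page_options` and `url_to_pdf` are hypothetical helpers, while `javascript-delay` and `no-stop-slow-scripts` are wkhtmltopdf options passed through pdfkit's `options` dict. Turning off chart animation in the page (ECharts, for example, accepts `animation: false`) may also help. Whether this is enough depends on the page, so treat it as an experiment rather than a guaranteed fix.

```python
def chart_page_options(delay_ms=2000):
    """Hypothetical helper: wkhtmltopdf options for JavaScript-drawn charts."""
    return {
        "javascript-delay": str(delay_ms),  # wait this many ms before printing
        "no-stop-slow-scripts": None,       # flag option: don't abort slow scripts
        "encoding": "UTF-8",
    }

def url_to_pdf(url, path, delay_ms=2000):
    """Hypothetical helper: needs pdfkit and the wkhtmltopdf binary installed."""
    import pdfkit
    return pdfkit.from_url(url, path, options=chart_page_options(delay_ms))
```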

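2. Others

Other PDF-generating libraries include WeasyPrint, ReportLab, and PyPDF2. In quick tests none of them matched pdfkit's output, and some are more complicated to use (PyPDF2, for instance, mainly manipulates existing PDFs rather than rendering HTML).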

Summary

That's all for this article. I hope it helps with your study or work; if you have questions, feel free to leave a comment.
