
Learn to Build a Document Scanner with OpenCV in One Article

Updated: 2022-11-03 10:31:06  Author: woshicver

This article uses OpenCV to build a simple document scanner, much like the camera scanner apps in common use. It walks through the material needed to build such a scanner; readers who need it can use it as a reference.

Introduction

In this article, we will use the OpenCV library to develop a document scanner in Python.

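A brief overview of OpenCV: OpenCV is an open-source library for image processing, available from several programming languages including Python and C++. It can be used to detect objects in photos, for example faces in a face detection system.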

To learn more about OpenCV for Python, see the opencv-python project page on PyPI: https://pypi.org/project/opencv-python/

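Our software should be able to align the document correctly, detect the boundaries of the captured image, improve the quality of the document, and finally produce a better image as output. In essence, we feed in a document, that is, an unedited photo taken with a camera, and OpenCV processes that image.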

Our basic workflow is:

Morphological operations
Edge and contour detection
Identifying the corners
Transforming the perspective

Performing morphological operations

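Morphology: a family of image processing operations that process an image based on the shapes it contains. The result is governed mainly by the size and shape of the kernel (structuring element) that is slid over the image.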

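import cv2
import numpy as np
kernel = np.ones((5, 5), np.uint8)  # 5x5 structuring element
img = cv2.morphologyEx(img, cv2.MORPH_CLOSE, kernel, iterations=3)  # img: assumed to be the BGR input image, e.g. from cv2.imread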

We can perform the operation with the morphologyEx() function. The "close" operation in morphology is a dilation followed by an erosion.

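Repeating the close operation leaves us with an essentially blank page. We want that because the content on the page gets in the way when we work with the edges, and we do not want to risk losing the edges we need.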
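To make the effect of closing concrete, here is a minimal NumPy-only sketch of a grayscale dilation followed by an erosion on a tiny synthetic "page". The helper names (`dilate`, `erode`, `close_image`) and the 3x3 window are illustrative assumptions; this is not how OpenCV implements morphologyEx(), only the same idea on a toy input.

```python
import numpy as np

def _window_filter(img, k, reducer, pad_value):
    """Apply `reducer` (np.max or np.min) over every k x k neighbourhood."""
    r = k // 2
    padded = np.pad(img, r, mode="constant", constant_values=pad_value)
    h, w = img.shape
    # One shifted view of the image per kernel position.
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)]
    return reducer(np.stack(windows), axis=0)

def dilate(img, k=3):
    # Pad with the minimum value so the border never wins the max.
    return _window_filter(img, k, np.max, 0)

def erode(img, k=3):
    # Pad with the maximum value so the border never wins the min.
    return _window_filter(img, k, np.min, 255)

def close_image(img, k=3):
    # Closing = dilation followed by erosion.
    return erode(dilate(img, k), k)

scan = np.full((9, 9), 50, dtype=np.uint8)  # dark background
scan[2:7, 2:7] = 255                        # bright sheet of paper
scan[4, 4] = 0                              # one dark "text" pixel
closed = close_image(scan)
print(closed)
```

The dark pixel inside the bright region disappears while the outline of the sheet stays where it was, which is the property the scanner relies on.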

Removing the background from the captured image

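The parts of the photo that are not our subject must also be removed. Much like cropping, we keep only the part of the image we need. The GrabCut algorithm, available in OpenCV as cv2.grabCut(), can do this.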

Given an input image and a bounding rectangle, GrabCut treats everything outside the rectangle as background and removes it.

To help GrabCut identify the background, we could also give the user the option of setting the document border manually.

For now, however, GrabCut can identify the foreground automatically if we treat a 20-pixel border around the image as background.

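# Mask and the temporary model arrays that grabCut uses internally.
mask = np.zeros(img.shape[:2], np.uint8)
bgdModel = np.zeros((1, 65), np.float64)
fgdModel = np.zeros((1, 65), np.float64)
# rect is (x, y, width, height): leave a 20-pixel margin on every side.
rect = (20, 20, img.shape[1] - 40, img.shape[0] - 40)
cv2.grabCut(img, mask, rect, bgdModel, fgdModel, 5, cv2.GC_INIT_WITH_RECT)
# Mask values 0 and 2 mean (probable) background; 1 and 3 mean (probable) foreground.
mask2 = np.where((mask == 2) | (mask == 0), 0, 1).astype('uint8')
img = img * mask2[:, :, np.newaxis]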

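The "rect" variable here is the boundary we want to separate along. Some background may end up inside the rectangle, and that is acceptable. What matters is that no part of the object falls outside the boundary.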
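The masking step after grabCut() can be looked at in isolation. The sketch below uses a hand-written 3x3 mask instead of a real GrabCut result; the four label values match OpenCV's cv2.GC_BGD, cv2.GC_FGD, cv2.GC_PR_BGD and cv2.GC_PR_FGD constants, while the mask contents and the uniform test image are made up for illustration.

```python
import numpy as np

# Label values used in a GrabCut mask (same as the cv2.GC_* constants).
GC_BGD, GC_FGD, GC_PR_BGD, GC_PR_FGD = 0, 1, 2, 3

# Hypothetical mask for a 3x3 image.
mask = np.array([[0, 2, 2],
                 [2, 3, 1],
                 [0, 3, 3]], dtype=np.uint8)

# 0 where the pixel is (probably) background, 1 where it is (probably) foreground.
keep = np.where((mask == GC_BGD) | (mask == GC_PR_BGD), 0, 1).astype("uint8")

# A uniform grey BGR image; broadcasting applies `keep` to all three channels.
img = np.full((3, 3, 3), 200, dtype=np.uint8)
result = img * keep[:, :, np.newaxis]

print(keep)
print(result[:, :, 0])
```

Background pixels are zeroed in every channel, and foreground pixels keep their original values.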

Edge and contour detection

We now have a blank document the same size as the original. Next we run edge detection on it, using the Canny function.

To clean up noise in the document, we also apply a Gaussian blur.

(Note: Canny is normally applied to a single-channel grayscale image, so convert the image to grayscale first if it is not already.)

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
gray = cv2.GaussianBlur(gray, (11, 11), 0)
# Edge Detection.
canny = cv2.Canny(gray, 0, 200)
canny = cv2.dilate(canny, cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5)))

In the last line we dilate the edge map, which thickens the detected edges and closes small gaps in them.
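The grayscale conversion at the top of the block above is a weighted sum of the colour channels. OpenCV documents the weights as Y = 0.299 R + 0.587 G + 0.114 B; the following NumPy sketch applies them to four test pixels. The function name `bgr_to_gray` is an assumption for illustration, and results may differ from cv2.cvtColor() by a rounding step.

```python
import numpy as np

# Channel weights in B, G, R order, matching OpenCV's default channel layout.
BGR_WEIGHTS = np.array([0.114, 0.587, 0.299])

def bgr_to_gray(img):
    """Weighted sum over the last axis, rounded back to 8-bit values."""
    gray = img.astype(np.float64) @ BGR_WEIGHTS
    return np.rint(gray).astype(np.uint8)

# Pure blue, pure green, pure red and white, in BGR order.
pixels = np.array([[[255, 0, 0], [0, 255, 0]],
                   [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
gray = bgr_to_gray(pixels)
print(gray)
```

Green contributes the most to perceived brightness and blue the least, which is why the pure-blue pixel comes out darkest.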

After this, we can move on to contour detection:

We keep only the largest contours and draw them on a new blank canvas.

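# Find contours in the dilated edge map (OpenCV 4.x returns two values).
contours, hierarchy = cv2.findContours(canny, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
# Keep only the largest contours by area; `page` is used in the next step.
page = sorted(contours, key=cv2.contourArea, reverse=True)[:5]
# Draw them on a blank canvas the same size as the image.
con = np.zeros_like(img)
con = cv2.drawContours(con, page, -1, (0, 255, 255), 3)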
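Keeping only the largest contour comes down to ranking contours by the area they enclose. The pure-Python sketch below does this with the shoelace formula on three made-up rectangles; `polygon_area` and the sample outlines are illustrative assumptions standing in for cv2.contourArea() and real detected contours.

```python
def polygon_area(points):
    """Area of a simple polygon from its vertices (shoelace formula)."""
    twice_area = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical outlines: a noise speck, the page itself, and a label on the page.
outlines = {
    "speck": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "page": [(10, 10), (110, 10), (110, 160), (10, 160)],
    "label": [(20, 20), (50, 20), (50, 30), (20, 30)],
}

# Sort names by enclosed area, largest first.
ranked = sorted(outlines, key=lambda name: polygon_area(outlines[name]), reverse=True)
largest = ranked[0]
print(ranked, polygon_area(outlines[largest]))
```

Sorting by area and taking the first entry is a heuristic: it assumes the document is the biggest closed shape in the frame.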

Identifying the corners

We will straighten the page using its four corners. To find them, we use the Douglas-Peucker method through the approxPolyDP() function.

con = np.zeros_like(img)
# Loop over the contours.
for c in page:
    # Approximate the contour.
    epsilon = 0.02 * cv2.arcLength(c, True)
    corners = cv2.approxPolyDP(c, epsilon, True)
    # If our approximated contour has four points
    if len(corners) == 4:
        break
cv2.drawContours(con, c, -1, (0, 255, 255), 3)
cv2.drawContours(con, corners, -1, (0, 255, 0), 10)
corners = sorted(np.concatenate(corners).tolist())
for index, c in enumerate(corners):
    character = chr(65 + index)
    cv2.putText(con, character, tuple(c), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0), 1, cv2.LINE_AA)
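The Douglas-Peucker idea behind the approximation step can be sketched in a few lines of pure Python. This is a textbook version for an open polyline, not OpenCV's implementation (approxPolyDP() also handles closed curves); the function names and the sample outline are assumptions for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def douglas_peucker(points, epsilon):
    """Drop points that lie within epsilon of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= epsilon:
        return [first, last]
    # Keep the farthest point and simplify both halves recursively.
    left = douglas_peucker(points[:index + 1], epsilon)
    right = douglas_peucker(points[index:], epsilon)
    return left[:-1] + right

# A slightly wobbly "L" shape: two nearly straight edges meeting at a corner.
outline = [(0, 0), (5, 0.2), (10, 0), (10.1, 5), (10, 10)]
simplified = douglas_peucker(outline, epsilon=0.5)
print(simplified)
```

The small wobbles along each edge are discarded and only the true corner survives. In the article's code, epsilon is set to 2% of the contour perimeter, so the tolerance scales with the size of the page.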

Standardizing the order of the four points

def order_points(pts):
    rect = np.zeros((4, 2), dtype='float32')
    pts = np.array(pts)
    s = pts.sum(axis=1)
    rect[0] = pts[np.argmin(s)]
    rect[2] = pts[np.argmax(s)]
    diff = np.diff(pts, axis=1)
    rect[1] = pts[np.argmin(diff)]
    rect[3] = pts[np.argmax(diff)]
    return rect.astype('int').tolist()
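The ordering trick above relies on two observations: the top-left corner has the smallest x + y and the bottom-right the largest, while the top-right has the smallest y - x and the bottom-left the largest. The compact sketch below applies the same idea to four shuffled sample points; `order_corners` and the coordinates are illustrative assumptions.

```python
import numpy as np

def order_corners(pts):
    """Return corners as [top-left, top-right, bottom-right, bottom-left]."""
    pts = np.array(pts)
    s = pts.sum(axis=1)               # x + y for each point
    d = np.diff(pts, axis=1).ravel()  # y - x for each point
    return [
        pts[np.argmin(s)].tolist(),   # top-left: smallest x + y
        pts[np.argmin(d)].tolist(),   # top-right: smallest y - x
        pts[np.argmax(s)].tolist(),   # bottom-right: largest x + y
        pts[np.argmax(d)].tolist(),   # bottom-left: largest y - x
    ]

# Corners of a tilted page, given in arbitrary order.
shuffled = [(110, 120), (5, 110), (10, 10), (100, 20)]
ordered = order_corners(shuffled)
print(ordered)
```

The heuristic assumes the page is not rotated too far: for a sheet turned by roughly 45 degrees, two corners can tie on the sum or the difference and the ordering becomes ambiguous.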

Finding the destination coordinates:

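This final set of coordinates lets us change the perspective of the image. That is helpful when the photo was taken at an angle rather than straight on.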

# Order the detected corners as top-left, top-right, bottom-right, bottom-left.
corners = order_points(corners)
(tl, tr, br, bl) = corners
# Finding the maximum width.
widthA = np.sqrt(((br[0] - bl[0]) ** 2) + ((br[1] - bl[1]) ** 2))
widthB = np.sqrt(((tr[0] - tl[0]) ** 2) + ((tr[1] - tl[1]) ** 2))
maxWidth = max(int(widthA), int(widthB))
# Finding the maximum height.
heightA = np.sqrt(((tr[0] - br[0]) ** 2) + ((tr[1] - br[1]) ** 2))
heightB = np.sqrt(((tl[0] - bl[0]) ** 2) + ((tl[1] - bl[1]) ** 2))
maxHeight = max(int(heightA), int(heightB))
# Final destination coordinates, in the same corner order as above.
destination_corners = [[0, 0], [maxWidth, 0], [maxWidth, maxHeight], [0, maxHeight]]
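As a worked example of the size calculation, the sketch below runs the same arithmetic on four made-up corners using only the standard library. The helper `destination_for` is a hypothetical name, and the input is assumed to be already ordered top-left, top-right, bottom-right, bottom-left.

```python
import math

def destination_for(corners):
    """Output rectangle for corners ordered tl, tr, br, bl."""
    tl, tr, br, bl = corners
    # Width: the longer of the bottom and top edges.
    width_bottom = math.hypot(br[0] - bl[0], br[1] - bl[1])
    width_top = math.hypot(tr[0] - tl[0], tr[1] - tl[1])
    max_width = max(int(width_bottom), int(width_top))
    # Height: the longer of the right and left edges.
    height_right = math.hypot(tr[0] - br[0], tr[1] - br[1])
    height_left = math.hypot(tl[0] - bl[0], tl[1] - bl[1])
    max_height = max(int(height_right), int(height_left))
    return [[0, 0], [max_width, 0], [max_width, max_height], [0, max_height]]

sample_corners = [(10, 10), (100, 20), (110, 120), (5, 110)]
destination = destination_for(sample_corners)
print(destination)
```

Taking the longer of each pair of opposite edges keeps the output from being smaller than the page appears in the photo, at the cost of slightly stretching the shorter side.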

Perspective transformation

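The corner coordinates in the source photo must now be mapped onto the destination coordinates we found earlier. Once this stage is complete, the image looks as if it had been taken from directly above the page.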

# Getting the homography.
M = cv2.getPerspectiveTransform(np.float32(corners), np.float32(destination_corners))
# orig_img: assumed to be a copy of the input image made before any processing.
final = cv2.warpPerspective(orig_img, M, (destination_corners[2][0], destination_corners[2][1]), flags=cv2.INTER_LINEAR)
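For readers curious about what the 3x3 matrix M contains, the NumPy sketch below solves for a homography from four point pairs by setting up the standard 8x8 linear system. It illustrates the underlying math under the usual assumption that the bottom-right entry is fixed to 1; it is not OpenCV's code, and the function names and sample points are made up.

```python
import numpy as np

def homography_from_points(src, dst):
    """Solve for H (3x3, H[2, 2] = 1) such that H maps each src point to dst."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # u = (h11 x + h12 y + h13) / (h31 x + h32 y + 1), likewise for v.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, point):
    """Map one (x, y) point through H using homogeneous coordinates."""
    x, y, w = H @ np.array([point[0], point[1], 1.0])
    return x / w, y / w

src = [(10, 10), (100, 20), (110, 120), (5, 110)]   # corners in the photo
dst = [(0, 0), (105, 0), (105, 100), (0, 100)]      # corners of the output
H = homography_from_points(src, dst)
mapped = [apply_homography(H, p) for p in src]
print(np.round(mapped, 3))
```

Each source corner lands on its destination corner, up to floating-point error. warpPerspective() then uses such a matrix to resample every pixel of the output image.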