人臉識(shí)別實(shí)戰(zhàn)之Opencv+SVM實(shí)現(xiàn)人臉識(shí)別

更新時(shí)間：2021年12月16日 16:59:33 作者：AI浩

這篇文章主要介紹了通過(guò)Opencv+SVM實(shí)現(xiàn)人臉識(shí)別功能，文中的示例代碼介紹詳細(xì)，對(duì)于我們學(xué)習(xí)人臉識(shí)別和OpenCV都有一定的幫助，感興趣的小伙伴可以學(xué)習(xí)一下

前言

在本文中，您將學(xué)習(xí)如何使用 OpenCV 進(jìn)行人臉識(shí)別。文章分三部分介紹：

第一，將首先執(zhí)行人臉檢測(cè)，使用深度學(xué)習(xí)從每個(gè)人臉中提取人臉量化為128位的向量。

第二，在嵌入基礎(chǔ)上使用支持向量機(jī)（SVM）訓(xùn)練人臉識(shí)別模型。

第三，最后使用 OpenCV 識(shí)別圖像和視頻流中的人臉。

項(xiàng)目結(jié)構(gòu)

編碼

新建face_embeddings.py腳本，寫(xiě)入如下代碼：

# import the necessary packages
import numpy as np
import pickle
import cv2
import os
import os

導(dǎo)入需要的包。然后定義幾個(gè)函數(shù)：

def list_images(basePath, contains=None):
    # return the set of files that are valid
    return list_files(basePath, validExts=image_types, contains=contains)


def list_files(basePath, validExts=None, contains=None):
    # loop over the directory structure
    for (rootDir, dirNames, filenames) in os.walk(basePath):
        # loop over the filenames in the current directory
        for filename in filenames:
            # if the contains string is not none and the filename does not contain
            # the supplied string, then ignore the file
            if contains is not None and filename.find(contains) == -1:
                continue
            # determine the file extension of the current file
            ext = filename[filename.rfind("."):].lower()
            # check to see if the file is an image and should be processed
            if validExts is None or ext.endswith(validExts):
                # construct the path to the image and yield it
                imagePath = os.path.join(rootDir, filename)
                yield imagePath

def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    (h, w) = image.shape[:2]
    # 如果高和寬為None則直接返回
    if width is None and height is None:
        return image
    # 檢查寬是否是None
    if width is None:
        # 計(jì)算高度的比例并并按照比例計(jì)算寬度
        r = height / float(h)
        dim = (int(w * r), height)
    # 高為None
    else:
        # 計(jì)算寬度比例，并計(jì)算高度
        r = width / float(w)
        dim = (width, int(h * r))
    resized = cv2.resize(image, dim, interpolation=inter)
    # return the resized image
    return resized

list_images函數(shù)，讀取數(shù)據(jù)集文件夾下面的圖片。

resize函數(shù)，等比例resize圖片。接下來(lái)定義一些變量：

dataset_path='dataset'
embeddings_path='output/embeddings.pickle'
detector_path='face_dete_model'
embedding_model='nn4.small2.v1.t7'
confidence_low=0.5

dataset_path：數(shù)據(jù)集路徑

embeddings_path：輸出編碼文件的路徑

detector_path：人臉檢測(cè)模型的路徑

embedding_model：編碼模型

confidence_low：最低的置信度。

接下來(lái)就是代碼的最重要的部分：

print("loading face detector...")
protoPath = os.path.sep.join([detector_path, "deploy.proto.txt"])
modelPath = os.path.sep.join([detector_path,"res10_300x300_ssd_iter_140000_fp16.caffemodel"])
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)
# 加載序列化的人臉編碼模型
print("loading face recognizer...")
embedder = cv2.dnn.readNetFromTorch(embedding_model)
# 獲取數(shù)據(jù)集中輸入圖像的路徑
print("quantifying faces...")
imagePaths = list(list_images(dataset_path))
# 初始化我們提取的面部編碼列表和相應(yīng)的人名
knownEmbeddings = []
knownNames = []
# 初始化處理的人臉總數(shù)
total = 0
# loop over the image paths
for (i, imagePath) in enumerate(imagePaths):
    # extract the person name from the image path
    print("processing image {}/{}".format(i + 1,len(imagePaths)))
    name = imagePath.split(os.path.sep)[-2]
    # 加載圖像，將其調(diào)整為寬度為 600 像素（同時(shí)保持縱橫比），然后抓取圖像尺寸
    image = cv2.imread(imagePath)
    image = resize(image, width=600)
    (h, w) = image.shape[:2]
    # 從圖像構(gòu)建一個(gè) blob
    imageBlob = cv2.dnn.blobFromImage(
        cv2.resize(image, (300, 300)), 1.0, (300, 300),
        (104.0, 177.0, 123.0), swapRB=False, crop=False)
    # 使用 OpenCV 的基于深度學(xué)習(xí)的人臉檢測(cè)器來(lái)定位輸入圖像中的人臉
    detector.setInput(imageBlob)
    detections = detector.forward()
    # ensure at least one face was found
    if len(detections) > 0:
        # 假設(shè)每個(gè)圖像只有一張臉，所以找到概率最大的邊界框
        i = np.argmax(detections[0, 0, :, 2])
        confidence = detections[0, 0, i, 2]
        # 確保最大概率的檢測(cè)也意味著我們的最小概率測(cè)試（從而幫助過(guò)濾掉弱檢測(cè)）
        if confidence > confidence_low:
            # 計(jì)算人臉邊界框的 (x, y) 坐標(biāo)
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # 提取人臉ROI并抓取ROI維度
            face = image[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]
            # 確保人臉寬度和高度足夠大
            if fW < 20 or fH < 20:
                continue
            # 為人臉 ROI 構(gòu)造一個(gè) blob，然后將 blob 通過(guò)我們的人臉嵌入模型來(lái)獲得人臉的 128-d 量化
            faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255,
                                             (96, 96), (0, 0, 0), swapRB=True, crop=False)
            embedder.setInput(faceBlob)
            vec = embedder.forward()
            # 將人名+對(duì)應(yīng)的人臉嵌入添加到各自的列表中
            knownNames.append(name)
            knownEmbeddings.append(vec.flatten())
            total += 1
# 保存編碼文件
print("serializing {} encodings...".format(total))
data = {"embeddings": knownEmbeddings, "names": knownNames}
f = open(embeddings_path, "wb")
f.write(pickle.dumps(data))
f.close()

加載人臉檢測(cè)器和編碼器：

檢測(cè)器：使用基于Caffe的DL人臉檢測(cè)器來(lái)定位圖像中的人臉。

編碼器：模型基于Torch，負(fù)責(zé)通過(guò)深度學(xué)習(xí)特征提取來(lái)提取人臉編碼。

接下來(lái)，讓我們抓取圖像路徑并執(zhí)行初始化。

遍歷 imagePaths。從路徑中提取人名。

構(gòu)造了一個(gè) blob。

然后，通過(guò)將 imageBlob 通過(guò)檢測(cè)器網(wǎng)絡(luò)來(lái)檢測(cè)圖像中的人臉。

檢測(cè)列表包含定位圖像中人臉的概率和坐標(biāo)。

假設(shè)我們至少有一個(gè)檢測(cè)，將進(jìn)入 if 語(yǔ)句的主體。

假設(shè)圖像中只有一張臉，因此提取具有最高置信度的檢測(cè)并檢查以確保置信度滿足用于過(guò)濾弱檢測(cè)的最小概率閾值。

假設(shè)已經(jīng)達(dá)到了這個(gè)閾值，提取面部 ROI 并抓取/檢查尺寸以確保面部 ROI 足夠大。

然后，我們將利用編碼器并提取人臉編碼。

繼續(xù)構(gòu)建另一個(gè) blob。

隨后，將 faceBlob 通過(guò)編碼器。這會(huì)生成一個(gè) 128 維向量 (vec) 來(lái)描述面部。

然后我們簡(jiǎn)單地將名稱和嵌入 vec 分別添加到 knownNames 和 knownEmbeddings 中。

繼續(xù)循環(huán)圖像、檢測(cè)人臉并為數(shù)據(jù)集中的每個(gè)圖像提取人臉編碼的過(guò)程。

循環(huán)結(jié)束后剩下的就是將數(shù)據(jù)轉(zhuǎn)儲(chǔ)到磁盤(pán)。

運(yùn)行結(jié)果：

loading face detector...

loading face recognizer...

quantifying faces...

processing image 1/19

processing image 2/19

processing image 3/19

processing image 4/19

processing image 5/19

processing image 6/19

processing image 7/19

processing image 8/19

processing image 9/19

processing image 10/19

processing image 11/19

processing image 12/19

processing image 13/19

processing image 14/19

processing image 15/19

processing image 16/19

processing image 17/19

processing image 18/19

processing image 19/19

serializing 19 encodings...

Process finished with exit code 0

訓(xùn)練人臉識(shí)別模型

已經(jīng)為每張臉提取了 128 維編碼——但是我們?nèi)绾胃鶕?jù)這些嵌入來(lái)識(shí)別一個(gè)人呢？

答案是我們需要在嵌入之上訓(xùn)練一個(gè)“標(biāo)準(zhǔn)”機(jī)器學(xué)習(xí)模型（例如 SVM、k-NN 分類器、隨機(jī)森林等）。

今天我們使用SVM實(shí)現(xiàn)

打開(kāi) train_face.py 文件并插入以下代碼：

from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC
import pickle

embeddings_path='output/embeddings.pickle'
recognizer_path='output/recognizer.pickle'
lable_path='output/le.pickle'
# 加載編碼模型
print("[INFO] loading face embeddings...")
data = pickle.loads(open(embeddings_path, "rb").read())

# 給label編碼
print("[INFO] encoding labels...")
le = LabelEncoder()
labels = le.fit_transform(data["names"])
# 訓(xùn)練用于接受人臉 128-d 嵌入的模型，然后產(chǎn)生實(shí)際的人臉識(shí)別
recognizer = SVC(C=1.0, kernel="linear", probability=True)
recognizer.fit(data["embeddings"], labels)
# 保存模型
f = open(recognizer_path, "wb")
f.write(pickle.dumps(recognizer))
f.close()
# 保存lable
f = open(lable_path, "wb")
f.write(pickle.dumps(le))
f.close()

導(dǎo)入包和模塊。我們將使用 scikit-learn 的支持向量機(jī) (SVM) 實(shí)現(xiàn)，這是一種常見(jiàn)的機(jī)器學(xué)習(xí)模型。

定義變量。

embeddings_path：序列化編碼。
recognizer_path：這將是我們識(shí)別人臉的輸出模型。它基于 SVM。
lable_path：標(biāo)簽編碼器輸出文件路徑

加載編碼。

然后初始化 scikit-learn LabelEncoder 并編碼名稱標(biāo)簽。

訓(xùn)練模型。本文使用的是線性支持向量機(jī) (SVM)，但如果您愿意，您可以嘗試使用其他機(jī)器學(xué)習(xí)模型進(jìn)行試驗(yàn)。

訓(xùn)練模型后，我們將模型和標(biāo)簽編碼器保存到電腦上。

運(yùn)行train_face.py 腳本。

識(shí)別圖像中的人臉

新建腳本文件recognize_face.py，插入一下代碼：

import numpy as np
import pickle
import cv2
import os

導(dǎo)入包，然后我們需要新增一個(gè)resize方法。

def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    (h, w) = image.shape[:2]
    # 如果高和寬為None則直接返回
    if width is None and height is None:
        return image
    # 檢查寬是否是None
    if width is None:
        # 計(jì)算高度的比例并并按照比例計(jì)算寬度
        r = height / float(h)
        dim = (int(w * r), height)
    # 高為None
    else:
        # 計(jì)算寬度比例，并計(jì)算高度
        r = width / float(w)
        dim = (width, int(h * r))
    resized = cv2.resize(image, dim, interpolation=inter)
    # return the resized image
    return resized

等比例resize圖像,定義變量：

image_path = '11.jpg'
detector_path = 'face_dete_model'
embedding_path = 'nn4.small2.v1.t7'
recognizer_path = 'output/recognizer.pickle'
label_path = 'output/le.pickle'
confidence_low = 0.5

這六個(gè)變量的含義如下：

image_path ：輸入圖像的路徑。
detector_path：OpenCV 深度學(xué)習(xí)人臉檢測(cè)器的路徑。使用這個(gè)模型來(lái)檢測(cè)人臉 ROI 在圖像中的位置。
embedding_path : OpenCV 深度學(xué)習(xí)人臉編碼模型的路徑。我們將使用這個(gè)模型從人臉 ROI 中提取 128 維人臉嵌入——然后將把數(shù)據(jù)輸入到識(shí)別器中。
recognizer_path ：識(shí)別器模型的路徑。
label_path : 標(biāo)簽編碼器的路徑。
confidence_low：過(guò)濾弱人臉檢測(cè)的可選閾值。

接下來(lái)是代碼的主體部分：

# 加載序列化人臉檢測(cè)器
print("[INFO] loading face detector...")
protoPath = os.path.sep.join([detector_path, "deploy.proto.txt"])
modelPath = os.path.sep.join([detector_path,"res10_300x300_ssd_iter_140000_fp16.caffemodel"])
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)
# 加載我們序列化的人臉編碼模型
print("[INFO] loading face recognizer...")
embedder = cv2.dnn.readNetFromTorch(embedding_path)
# 加載實(shí)際的人臉識(shí)別模型和標(biāo)簽編碼器
recognizer = pickle.loads(open(recognizer_path, "rb").read())
le = pickle.loads(open(label_path, "rb").read())
# 加載圖像，將其調(diào)整為寬度為 600 像素（同時(shí)保持縱橫比），然后抓取圖像尺寸
image = cv2.imread(image_path)
image = resize(image, width=600)
(h, w) = image.shape[:2]
# 從圖像構(gòu)建一個(gè) blob
imageBlob = cv2.dnn.blobFromImage(
    cv2.resize(image, (300, 300)), 1.0, (300, 300),
    (104.0, 177.0, 123.0), swapRB=False, crop=False)

# 應(yīng)用 OpenCV 的基于深度學(xué)習(xí)的人臉檢測(cè)器來(lái)定位輸入圖像中的人臉
detector.setInput(imageBlob)
detections = detector.forward()

# loop over the detections
for i in range(0, detections.shape[2]):
    # 提取與預(yù)測(cè)相關(guān)的置信度（即概率）
    confidence = detections[0, 0, i, 2]
    # filter out weak detections
    if confidence > confidence_low:
        # 計(jì)算人臉邊界框的 (x, y) 坐標(biāo)
        box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
        (startX, startY, endX, endY) = box.astype("int")
        # 提取人臉ROI
        face = image[startY:endY, startX:endX]
        (fH, fW) = face.shape[:2]
        # 確保人臉寬度和高度足夠大
        if fW < 20 or fH < 20:
            continue

        # 為人臉 ROI 構(gòu)造一個(gè) blob，然后將 blob 通過(guò)我們的人臉嵌入模型來(lái)獲得人臉的 128-d 量化
        faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255, (96, 96),
                                         (0, 0, 0), swapRB=True, crop=False)
        embedder.setInput(faceBlob)
        vec = embedder.forward()
        # 執(zhí)行分類以識(shí)別人臉
        preds = recognizer.predict_proba(vec)[0]
        j = np.argmax(preds)
        proba = preds[j]
        name = le.classes_[j]
        # 繪制人臉的邊界框以及相關(guān)的概率
        text = "{}: {:.2f}%".format(name, proba * 100)
        y = startY - 10 if startY - 10 > 10 else startY + 10
        cv2.rectangle(image, (startX, startY), (endX, endY),
                      (0, 0, 255), 2)
        cv2.putText(image, text, (startX, y),
                    cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

# 展示結(jié)果
cv2.imshow("Image", image)
cv2.waitKey(0)

我們?cè)谶@個(gè)塊中加載三個(gè)模型。冒著冗余的風(fēng)險(xiǎn)，我想明確提醒您模型之間的差異：

檢測(cè)器：一個(gè)預(yù)訓(xùn)練的 Caffe DL 模型，用于檢測(cè)人臉在圖像中的位置。
embedder：一個(gè)預(yù)訓(xùn)練的 Torch DL 模型，用于計(jì)算我們的 128-D 人臉嵌入。
識(shí)別器：線性 SVM 人臉識(shí)別模型。

1 和 2 都是預(yù)先訓(xùn)練好的，這意味著它們是由 OpenCV 按原樣提供給您的

加載標(biāo)簽編碼器，其中包含我們的模型可以識(shí)別的人的姓名。

將圖像加載到內(nèi)存中并構(gòu)建一個(gè) blob。

通過(guò)我們的檢測(cè)器定位圖像中的人臉。

您將從步驟 1 中識(shí)別出此塊。我在這里再解釋一遍：

遍歷檢測(cè)并提取每個(gè)檢測(cè)的置信度。

然后將置信度與命令行最小概率檢測(cè)閾值進(jìn)行比較，確保計(jì)算出的概率大于最小概率。

我們提取人臉 ROI并確保它的空間維度足夠大。

下面是識(shí)別人臉 ROI代碼：

首先，構(gòu)建一個(gè) faceBlob）并將其通過(guò)編碼器以生成描述面部的 128 維向量

然后，我們將 vec 通過(guò)我們的 SVM 識(shí)別器模型，其結(jié)果是我們對(duì)面部 ROI 中的人的預(yù)測(cè)。

我們?nèi)∽罡吒怕手笖?shù)并查詢我們的標(biāo)簽編碼器以找到名稱。

循環(huán)中識(shí)別每一張臉（包括“未知”）人：

在構(gòu)造了一個(gè)包含名稱和概率的文本字符串。

然后在臉部周圍繪制一個(gè)矩形并將文本放在框上方。

最后我們?cè)谄聊簧峡梢暬Y(jié)果，直到按下某個(gè)鍵。

可以看出使用機(jī)器學(xué)習(xí)的方式準(zhǔn)確率還是比較低的，但是優(yōu)點(diǎn)是速度快。

攝像頭識(shí)別人臉

這里我用視頻代替，代碼和圖像中識(shí)別人臉的步驟一致，直接上代碼。

新建recognize_video.py腳本，插入一下代碼：

import numpy as np
import pickle
import time
import cv2
import os


def resize(image, width=None, height=None, inter=cv2.INTER_AREA):
    dim = None
    (h, w) = image.shape[:2]
    # 如果高和寬為None則直接返回
    if width is None and height is None:
        return image
    # 檢查寬是否是None
    if width is None:
        # 計(jì)算高度的比例并并按照比例計(jì)算寬度
        r = height / float(h)
        dim = (int(w * r), height)
    # 高為None
    else:
        # 計(jì)算寬度比例，并計(jì)算高度
        r = width / float(w)
        dim = (width, int(h * r))
    resized = cv2.resize(image, dim, interpolation=inter)
    # return the resized image
    return resized

out_put='output.avi'
video_path = '1.mp4'
detector_path = 'face_dete_model'
embedding_path = 'nn4.small2.v1.t7'
recognizer_path = 'output/recognizer.pickle'
label_path = 'output/le.pickle'
confidence_low = 0.5
# load our serialized face detector from disk
print("[INFO] loading face detector...")
protoPath = os.path.sep.join([detector_path, "deploy.proto.txt"])
modelPath = os.path.sep.join([detector_path,"res10_300x300_ssd_iter_140000_fp16.caffemodel"])
detector = cv2.dnn.readNetFromCaffe(protoPath, modelPath)

# load our serialized face embedding model from disk
print("[INFO] loading face recognizer...")
embedder = cv2.dnn.readNetFromTorch(embedding_path)

# load the actual face recognition model along with the label encoder
recognizer = pickle.loads(open(recognizer_path, "rb").read())
le = pickle.loads(open(label_path, "rb").read())

# initialize the video stream, then allow the camera sensor to warm up
print("[INFO] starting video stream...")
#vs = cv2.VideoCapture(0) #攝像頭
vs=cv2.VideoCapture(video_path)# 視頻
time.sleep(2.0)

# start the FPS throughput estimator

writer=None
# loop over frames from the video file stream
while True:
    # grab the frame from the threaded video stream
    ret_val, frame = vs.read()

    if ret_val is False:
        break
    # resize the frame to have a width of 600 pixels (while
    # maintaining the aspect ratio), and then grab the image
    # dimensions
    frame = resize(frame, width=600)
    (h, w) = frame.shape[:2]

    # construct a blob from the image
    imageBlob = cv2.dnn.blobFromImage(
        cv2.resize(frame, (300, 300)), 1.0, (300, 300),
        (104.0, 177.0, 123.0), swapRB=False, crop=False)

    # apply OpenCV's deep learning-based face detector to localize
    # faces in the input image
    detector.setInput(imageBlob)
    detections = detector.forward()

    # loop over the detections
    for i in range(0, detections.shape[2]):
        # extract the confidence (i.e., probability) associated with
        # the prediction
        confidence = detections[0, 0, i, 2]

        # filter out weak detections
        if confidence >confidence_low:
            # compute the (x, y)-coordinates of the bounding box for
            # the face
            box = detections[0, 0, i, 3:7] * np.array([w, h, w, h])
            (startX, startY, endX, endY) = box.astype("int")
            # extract the face ROI
            face = frame[startY:endY, startX:endX]
            (fH, fW) = face.shape[:2]

            # ensure the face width and height are sufficiently large
            if fW < 20 or fH < 20:
                continue

            # construct a blob for the face ROI, then pass the blob
            # through our face embedding model to obtain the 128-d
            # quantification of the face
            faceBlob = cv2.dnn.blobFromImage(face, 1.0 / 255,
                                             (96, 96), (0, 0, 0), swapRB=True, crop=False)
            embedder.setInput(faceBlob)
            vec = embedder.forward()

            # perform classification to recognize the face
            preds = recognizer.predict_proba(vec)[0]
            j = np.argmax(preds)
            proba = preds[j]
            name = le.classes_[j]

            # draw the bounding box of the face along with the
            # associated probability
            text = "{}: {:.2f}%".format(name, proba * 100)
            y = startY - 10 if startY - 10 > 10 else startY + 10
            cv2.rectangle(frame, (startX, startY), (endX, endY),
                          (0, 0, 255), 2)
            cv2.putText(frame, text, (startX, y),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.45, (0, 0, 255), 2)

            if writer is None and out_put is not None:
                fourcc = cv2.VideoWriter_fourcc(*"MJPG")
                writer = cv2.VideoWriter(out_put, fourcc, 20,
                                         (frame.shape[1], frame.shape[0]), True)
                # 如果 writer 不是 None，則將識(shí)別出人臉的幀寫(xiě)入磁盤(pán)
            if writer is not None:
                writer.write(frame)
    # show the output frame
    cv2.imshow("Frame", frame)
    key = cv2.waitKey(1) & 0xFF
    # if the `q` key was pressed, break from the loop
    if key == ord("q"):
        break

# do a bit of cleanup
cv2.destroyAllWindows()
vs.release()
if writer is not None:
    writer.release()

運(yùn)行結(jié)果：?