How to Generate a 3D Point Cloud from a Depth Map in Python: A Detailed Guide

Updated: 2022-12-20 10:33:13   Author: Huterox

3D cameras such as ToF sensors usually output only a depth map. This article shows how to turn such a depth map into a 3D point cloud with Python, with example code for each step.

Introduction

Let's get straight to it. We have two goals here: first, how to map a point on a 2D image into 3D space; second, how to generate a 3D point cloud. In practice these are one problem, because once the first is solved, the second comes almost for free.

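2D RGB Images

Before talking about 3D point clouds, we have to talk about RGB images, that is, ordinary 2D images.

An image is stored as three channels: R, G and B.

What we see on screen is a flat 2D picture, but in memory it is stored as three stacked channel planes.

Viewed as a matrix, it looks roughly like this: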

[
[[r,g,b],[r,g,b],[r,g,b],[r,g,b]],
[[r,g,b],[r,g,b],[r,g,b],[r,g,b]],
...
]

Two kinds of information are stored here.

The first, and the easiest to overlook, is position: where each pixel sits in the grid.

The second is color.
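To make the two kinds of information concrete, here is a minimal sketch using a small NumPy array. The array size and pixel values are made up for illustration, and the channels are assumed to be in R, G, B order (note that OpenCV's `cv2.imread` returns B, G, R instead).

```python
import numpy as np

# A tiny "image": 2 rows (height) x 4 columns (width) x 3 channels (R, G, B).
img = np.zeros((2, 4, 3), dtype=np.uint8)
img[1, 2] = (255, 128, 0)   # set the pixel at row 1, column 2 to orange

# Position information: the array indices (row, col) of the pixel.
row, col = 1, 2
# Color information: the three values stored at that position.
r, g, b = (int(c) for c in img[row, col])

print(img.shape)                # (2, 4, 3)
print((row, col), (r, g, b))    # (1, 2) (255, 128, 0)
```

The indices give a position on the 2D plane only; the third coordinate has to come from somewhere else, which is what the rest of the article is about.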

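Clearly, if we want a colored 3D point cloud, the color information is required.

The position information, however, lives on a 2D plane: it is a projection of 3D space.

So to obtain a 3D point cloud we must recover the full position coordinates, that is, x, y and z.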

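How an Image Is Formed

This brings us to how a camera maps the scene in front of it onto an image.

Roughly, imaging works as follows.

There are three things to note here.

First, the position of the camera.

Then the position of the photo (the image plane), which lies between the camera and the object.

Finally, the actual position of the object.

So a 2D image is really a projection; in practice we can think of it as a slice of space at depth 1.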

But this projection discards the true depth of each point.

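So, in matrix form, the projection is expressed as:

$$
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} f/dx & 0 & u_0 \\ 0 & f/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R & T \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
$$

Here (u, v) are the coordinates of a point on the 2D image plane, the 1 corresponds to the plane at depth 1, and Zc is the depth of the point in the camera coordinate system. f is the focal length, dx and dy are the pixel sizes, and (u0, v0) is the principal point.

R and T form the extrinsic matrix, and our world coordinate origin is the same as the camera's coordinate origin.

So R can be taken as the identity matrix and T as the zero vector, since no rotation or translation is needed between two coincident frames (the general theory is more involved and I haven't fully worked through it; I'll brush up on the math later):

$$
R = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
T = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}
$$

Substituting gives:

$$
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} =
\begin{bmatrix} f/dx & 0 & u_0 \\ 0 & f/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix}
$$

which finally rearranges to this formula:

$$
X_w = \frac{Z_c \,(u - u_0)\, dx}{f}, \qquad
Y_w = \frac{Z_c \,(v - v_0)\, dy}{f}, \qquad
Z_w = Z_c
$$

Next we apply this formula to generate our point cloud.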
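As a quick check of the back-projection formula, here is a minimal sketch that converts a single pixel into a 3D point. The function name `pixel_to_point`, the intrinsic values and the pixel are made up for illustration; `fx` and `fy` stand for f/dx and f/dy, `cx` and `cy` for u0 and v0, and the raw depth is assumed to be stored in millimeters.

```python
def pixel_to_point(u, v, depth_raw, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project one pixel (u, v) with a raw depth reading into camera coordinates."""
    z = depth_raw / depth_scale   # raw units -> meters
    x = (u - cx) * z / fx         # X = Z * (u - u0) / fx
    y = (v - cy) * z / fy         # Y = Z * (v - v0) / fy
    return x, y, z

# Hypothetical 640x480 camera with the principal point at the image center.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

# A pixel 100 columns right of the center, 2000 mm away.
print(pixel_to_point(420, 240, 2000, fx, fy, cx, cy))   # (0.4, 0.0, 2.0)
```

A pixel at the principal point always maps to x = y = 0, which is a handy sanity check when plugging in real intrinsics.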

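Data Preparation

From the formula we can see that a 2D image carries RGB information plus each pixel's "projection" onto the 2D plane. We need to restore each of these points to 3D space, and that requires depth information. Depth is only available from dedicated cameras, such as the depth sensor on an iPhone or the depth camera on a robot.

With the depth image we can restore the coordinates. At this point it also becomes clear that, when reconstructing a 3D point cloud, the RGB image only contributes color, because the depth map and the RGB matrix correspond pixel for pixel.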

Loading the Images

Now we can start the actual coding.

First comes the image loading stage: we set up a few parameters and read the images.

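    def __init__(self, rgb_file, depth_file, save_ply, camera_intrinsics=[784.0, 779.0, 649.0, 405.0]):
        self.rgb_file = rgb_file
        self.depth_file = depth_file
        self.save_ply = save_ply

        # cv2.imread returns the color image in BGR channel order
        self.rgb = cv2.imread(rgb_file)
        # -1 (cv2.IMREAD_UNCHANGED) keeps the 16-bit depth values intact
        self.depth = cv2.imread(self.depth_file, -1)

        print("your depth image shape is:",self.depth.shape)

        self.width = self.rgb.shape[1]
        self.height = self.rgb.shape[0]

        # [fx, fy, cx, cy], all in pixels
        self.camera_intrinsics = camera_intrinsics
        # raw depth units per meter (1000 means depth is stored in millimeters)
        self.depth_scale = 1000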

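The list below holds the camera intrinsics used by the formula, in the order fx, fy, cx, cy, where fx = f/dx, fy = f/dy, cx = u0 and cy = v0: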

camera_intrinsics=[784.0, 779.0, 649.0, 405.0]

Note that these parameters differ from camera to camera, so set them according to your own device.

Cameras usually provide an intrinsic matrix of this form:

[
[fx, 0, cx],
[0, fy, cy],
[0, 0, 1]
]

Just match the entries up.
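Matching the entries up can be sketched as a small helper. The name `intrinsics_from_matrix` is hypothetical, and the matrix below is simply the article's example intrinsics arranged into the standard 3x3 layout, which is an assumption about how your camera reports them.

```python
import numpy as np

def intrinsics_from_matrix(K):
    """Return [fx, fy, cx, cy] from a 3x3 intrinsic matrix K."""
    K = np.asarray(K, dtype=float)
    if K.shape != (3, 3):
        raise ValueError("K must be a 3x3 matrix")
    # fx and fy sit on the diagonal, cx and cy in the last column.
    return [float(K[0, 0]), float(K[1, 1]), float(K[0, 2]), float(K[1, 2])]

K = [[378.998657, 0.0, 321.935120],
     [0.0, 378.639862, 240.766663],
     [0.0, 0.0, 1.0]]
camera_intrinsics = intrinsics_from_matrix(K)
print(camera_intrinsics)
```

The resulting list can be passed directly as the `camera_intrinsics` argument of the constructor.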

Also note that the depth map is uint16, so pass -1 (cv2.IMREAD_UNCHANGED) when reading it; otherwise OpenCV converts it to an 8-bit image.

There is one more thing here:

Depth Map and Scale Factor (scale_factor)

In the code it is:

self.depth_scale = 1000

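The scale factor of a depth map is the ratio between the value stored in the depth map and the true depth (in meters).

Typically, depth values are stored in millimeters as 16-bit unsigned integers (0~65535), so to get z in meters, the depth map pixels are divided by a scale factor of 1000. The scale factor may differ between cameras, though; mine is 1000, so check what applies to your own device.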
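The conversion can be sketched on a small made-up array. The readings below are invented, and treating a raw value of 0 as "no measurement" is an assumption that holds for many depth sensors but should be checked against your camera's documentation.

```python
import numpy as np

# Made-up raw readings in millimeters, stored as uint16 like a typical depth PNG.
depth_raw = np.array([[0, 500],
                      [1000, 2500]], dtype=np.uint16)
depth_scale = 1000.0  # raw units per meter; camera-specific

# Convert to float before dividing so the result is in meters.
depth_m = depth_raw.astype(np.float32) / depth_scale

# Assumption: a raw value of 0 means the sensor returned no measurement.
valid = depth_raw > 0

print(depth_m)
print(valid)
```

Keeping a validity mask around makes it easy to drop empty pixels later instead of writing them out as points at the camera origin.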

Algorithm Implementation

Next comes the algorithm itself, which is mainly a coordinate conversion to obtain the real x, y and z.

        # Transpose so the arrays are indexed [u, v], i.e. shape (width, height)
        depth = np.asarray(self.depth, dtype=np.uint16).T
        # depth[depth==65535]=0
        self.Z = depth / self.depth_scale  # raw units -> meters
        fx, fy, cx, cy = self.camera_intrinsics

        # X[i, j] = i (pixel column u), Y[i, j] = j (pixel row v)
        X = np.zeros((self.width, self.height))
        Y = np.zeros((self.width, self.height))
        for i in range(self.width):
            X[i, :] = np.full(X.shape[1], i)
        for i in range(self.height):
            Y[:, i] = np.full(Y.shape[0], i)

        # Back-project: x = (u - cx) * z / fx, y = (v - cy) * z / fy
        self.X = ((X - cx) * self.Z) / fx
        self.Y = ((Y - cy) * self.Z) / fy

After this, x, y and z are computed.

Note that after the computation, each of the x, y and z arrays has shape width × height.
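The same computation can also be written without explicit loops. The following is a minimal vectorized sketch using `np.meshgrid`; the function name `depth_to_xyz` and the test values are made up, and unlike the code above it keeps the usual (height, width) layout instead of transposing.

```python
import numpy as np

def depth_to_xyz(depth, fx, fy, cx, cy, depth_scale=1000.0):
    """Back-project an (H, W) depth image to an (H, W, 3) array of camera-space points."""
    h, w = depth.shape
    # u holds the column index and v the row index of every pixel; both are (H, W).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.astype(np.float64) / depth_scale   # raw units -> meters
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1)

# A flat wall 2 m away, seen by a hypothetical 6x4 pixel camera.
depth = np.full((4, 6), 2000, dtype=np.uint16)
xyz = depth_to_xyz(depth, fx=500.0, fy=500.0, cx=3.0, cy=2.0)
print(xyz.shape)    # (4, 6, 3)
print(xyz[2, 3])    # the pixel at the principal point: [0. 0. 2.]
```

Broadcasting does the per-pixel work here, which is typically much faster than filling the index grids row by row.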

Generating the Point Cloud

We now have the concrete coordinates. The next step is to generate the point cloud by attaching the remaining color information.

The code is simple:

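        data_ply = np.zeros((6, self.width * self.height))
        # Transpose back to (height, width) so the points follow the image's row order
        data_ply[0] = self.X.T.reshape(-1)
        # Flip y and z (camera y points down, z points forward) for viewing
        data_ply[1] = -self.Y.T.reshape(-1)
        data_ply[2] = -self.Z.T.reshape(-1)
        img = np.array(self.rgb, dtype=np.uint8)
        # cv2.imread returns BGR, so red is channel 2 and blue is channel 0
        data_ply[3] = img[:, :, 2].reshape(-1)
        data_ply[4] = img[:, :, 1].reshape(-1)
        data_ply[5] = img[:, :, 0].reshape(-1)
        self.data_ply = data_ply
        t2 = time.time()  # t1 is recorded at the start of compute()
        print('calculate 3d point cloud Done.', t2 - t1)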
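One variation worth considering is to drop pixels that have no depth before saving. The sketch below is an assumption-laden alternative, not the article's method: `pack_points` is a hypothetical helper, it stores one point per row as an (N, 6) array rather than the (6, N) layout used above, and it expects camera-space z values that have not been sign-flipped yet.

```python
import numpy as np

def pack_points(xyz, rgb):
    """Flatten (H, W, 3) positions and colors into an (N, 6) array, dropping z <= 0."""
    pts = xyz.reshape(-1, 3).astype(np.float64)
    cols = rgb.reshape(-1, 3).astype(np.float64)
    data = np.hstack([pts, cols])     # columns: x, y, z, r, g, b
    return data[pts[:, 2] > 0]        # keep only points with a positive depth

# Made-up 2x2 example in which one pixel has no depth reading.
xyz = np.zeros((2, 2, 3))
xyz[..., 2] = [[1.0, 0.0], [2.0, 3.0]]
rgb = np.full((2, 2, 3), 200, dtype=np.uint8)

cloud = pack_points(xyz, rgb)
print(cloud.shape)   # (3, 6)
```

Filtering at this stage keeps the output file smaller and avoids a cluster of meaningless points at the origin.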

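Next we save the file.

The main storage formats for point clouds currently include pts, LAS, PCD, .xyz and .pcap, among others.

For example:

pts is the simplest point cloud format: it stores the points directly in XYZ order, as integers or floats.

LAS is a format for LiDAR data. It is more complex than pts and is intended as an open standard that lets different hardware and software vendors output one interoperable format. A LAS point record can carry fields such as C: class, F: flight (flight line number), T: time (GPS time), I: intensity (return intensity), R: return (return number), N: number of returns, A: scan angle, and RGB: red green blue (RGB color values).

And so on. The format we use here is PLY.

Its full name is Polygon File Format, also known as the Stanford Triangle Format. It is mainly used to store 3D data from 3D scanners and describes a 3D object as a collection of polygonal faces, which is a comparatively simple approach next to other formats. It can store information including color, transparency, surface normals, texture coordinates and data confidence values, and it can assign different properties to the front and back of a polygon.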

The file layout is:

Header

Vertex list

Face list

Lists of other elements
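To see what this layout looks like for a colored point cloud, here is a minimal sketch that builds the text of a tiny ASCII PLY file. The helper name `ply_text` and the two sample points are made up; since a point cloud has no polygons, the file contains only the header and the vertex list.

```python
def ply_text(points):
    """Return ASCII PLY text for a list of (x, y, z, r, g, b) tuples."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",   # number of vertex lines that follow
        "property float x",
        "property float y",
        "property float z",
        "property uchar red",
        "property uchar green",
        "property uchar blue",
        "end_header",
    ]
    # One line per vertex, with values in the order declared in the header.
    body = [f"{x:.4f} {y:.4f} {z:.4f} {r} {g} {b}" for x, y, z, r, g, b in points]
    return "\n".join(header + body) + "\n"

text = ply_text([(0.0, 0.0, -1.0, 255, 0, 0), (0.1, 0.0, -1.0, 0, 255, 0)])
print(text)
```

The order of the `property` lines defines the order of the values on every vertex line, so the two must always be kept in sync.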

        start = time.time()
        float_formatter = lambda x: "%.4f" % x
        points = []
        # One line per vertex: x y z red green blue alpha
        for i in self.data_ply.T:
            points.append("{} {} {} {} {} {} 0\n".format
                          (float_formatter(i[0]), float_formatter(i[1]), float_formatter(i[2]),
                           int(i[3]), int(i[4]), int(i[5])))

        # Build the header line by line so no indentation leaks into the file
        header = "\n".join([
            "ply",
            "format ascii 1.0",
            "element vertex %d" % len(points),
            "property float x",
            "property float y",
            "property float z",
            "property uchar red",
            "property uchar green",
            "property uchar blue",
            "property uchar alpha",
            "end_header",
        ]) + "\n"
        with open(self.save_ply, "w") as file:
            file.write(header)
            file.write("".join(points))

        end = time.time()
        print("Write into .ply file Done.", end - start)

Displaying the Point Cloud

You can view the result in a point cloud viewer application,

or with the open3d module.

Complete Code

import cv2
import numpy as np
import open3d as o3d

import time

class point_cloud_generator():

    def __init__(self, rgb_file, depth_file, save_ply, camera_intrinsics=[784.0, 779.0, 649.0, 405.0]):
        self.rgb_file = rgb_file
        self.depth_file = depth_file
        self.save_ply = save_ply

        self.rgb = cv2.imread(rgb_file)
        self.depth = cv2.imread(self.depth_file, -1)

        print("your depth image shape is:",self.depth.shape)

        self.width = self.rgb.shape[1]
        self.height = self.rgb.shape[0]

        self.camera_intrinsics = camera_intrinsics
        self.depth_scale = 1000

    def compute(self):
        t1 = time.time()

        depth = np.asarray(self.depth, dtype=np.uint16).T
        # depth[depth==65535]=0
        self.Z = depth / self.depth_scale
        fx, fy, cx, cy = self.camera_intrinsics

        X = np.zeros((self.width, self.height))
        Y = np.zeros((self.width, self.height))
        for i in range(self.width):
            X[i, :] = np.full(X.shape[1], i)
        for i in range(self.height):
            Y[:, i] = np.full(Y.shape[0], i)

        # Back-project: x = (u - cx) * z / fx, y = (v - cy) * z / fy
        self.X = ((X - cx) * self.Z) / fx
        self.Y = ((Y - cy) * self.Z) / fy

        data_ply = np.zeros((6, self.width * self.height))
        data_ply[0] = self.X.T.reshape(-1)
        data_ply[1] = -self.Y.T.reshape(-1)
        data_ply[2] = -self.Z.T.reshape(-1)
        img = np.array(self.rgb, dtype=np.uint8)
        # cv2.imread returns BGR, so red is channel 2 and blue is channel 0
        data_ply[3] = img[:, :, 2].reshape(-1)
        data_ply[4] = img[:, :, 1].reshape(-1)
        data_ply[5] = img[:, :, 0].reshape(-1)
        self.data_ply = data_ply
        t2 = time.time()
        print('calculate 3d point cloud Done.', t2 - t1)

    def write_ply(self):
        start = time.time()
        float_formatter = lambda x: "%.4f" % x
        points = []
        for i in self.data_ply.T:
            points.append("{} {} {} {} {} {} 0\n".format
                          (float_formatter(i[0]), float_formatter(i[1]), float_formatter(i[2]),
                           int(i[3]), int(i[4]), int(i[5])))

        # Build the header line by line so no indentation leaks into the file
        header = "\n".join([
            "ply",
            "format ascii 1.0",
            "element vertex %d" % len(points),
            "property float x",
            "property float y",
            "property float z",
            "property uchar red",
            "property uchar green",
            "property uchar blue",
            "property uchar alpha",
            "end_header",
        ]) + "\n"
        with open(self.save_ply, "w") as file:
            file.write(header)
            file.write("".join(points))

        end = time.time()
        print("Write into .ply file Done.", end - start)

    def show_point_cloud(self):
        pcd = o3d.io.read_point_cloud(self.save_ply)
        o3d.visualization.draw([pcd])


if __name__ == '__main__':
    camera_intrinsics = [378.998657, 378.639862, 321.935120, 240.766663]
    rgb_file = "data/1.jpg"
    depth_file = "data/1.png"
    save_ply = "data.ply"
    a = point_cloud_generator(rgb_file=rgb_file,
                              depth_file=depth_file,
                              save_ply=save_ply,
                              camera_intrinsics=camera_intrinsics
                              )
    a.compute()
    a.write_ply()
    a.show_point_cloud()

Running the script writes data.ply and opens a viewer window showing the colored point cloud.

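Summary

What we covered here goes from a full depth map to a 3D point cloud. Sometimes you only need to recover a single set of coordinates. That is not hard either, but it takes a little extra conversion; the formula stays the same, so I won't go into it further here. One more point worth mentioning: a 3D point cloud obtained this way is not especially accurate, so more powerful AI-based algorithms are still needed. One integrated toolkit worth playing with is MediaPipe. In this area, Google is still the one to look to.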

