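A Detailed Guide to Running Python Code on a GPU
Introduction
A few days ago I tinkered with Ubuntu because I wanted to put the NVIDIA card in my old laptop to work, run code on the GPU, and enjoy the fun of many cores.
Fortunately, this old machine does support CUDA: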
$ sudo lshw -C display
*-display
description: 3D controller
product: GK208M [GeForce GT 740M]
vendor: NVIDIA Corporation
physical id: 0
bus info: pci@0000:01:00.0
version: a1
width: 64 bits
clock: 33MHz
capabilities: pm msi pciexpress bus_master cap_list rom
configuration: driver=nouveau latency=0
resources: irq:35 memory:f0000000-f0ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:6000(size=128)
Installing the tools
First install the CUDA development toolkit with the following command:
$ sudo apt install nvidia-cuda-toolkit
Check the version information:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Install the required packages with Conda:
conda install numba && conda install cudatoolkit
Installing with pip also works.
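Before running any kernels, it can help to ask Numba whether it sees a usable CUDA setup. The following is a minimal sketch, not part of the original test script: the helper name cuda_status is hypothetical, and it relies only on numba.cuda.is_available(), which returns False rather than raising when the driver or device is missing.

```python
def cuda_status():
    """Return a short summary of whether Numba can use CUDA on this machine."""
    try:
        from numba import cuda
    except ImportError:
        return "numba is not installed"
    # is_available() reports False instead of raising when the
    # NVIDIA driver or a supported device cannot be found.
    if cuda.is_available():
        return "CUDA is available to Numba"
    return "numba is installed, but no usable CUDA driver/device was found"


if __name__ == "__main__":
    print(cuda_status())
```

If the last message is printed, the driver is the first thing to check.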
Testing and driver installation
A quick test failed with an error:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/test1.py
Traceback (most recent call last):
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 246, in ensure_initialized
self.cuInit(0)
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 319, in safe_cuda_api_call
self._check_ctypes_error(fname, retcode)
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 387, in _check_ctypes_error
raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/larry/code/pkslow-samples/python/src/main/python/cuda/test1.py", line 15, in <module>
gpu_print[1, 2]()
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 862, in __getitem__
return self.configure(*args)
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 857, in configure
return _KernelConfiguration(self, griddim, blockdim, stream, sharedmem)
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 718, in __init__
ctx = get_context()
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 220, in get_context
return _runtime.get_or_create_context(devnum)
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 138, in get_or_create_context
return self._get_or_create_context_uncached(devnum)
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 153, in _get_or_create_context_uncached
with driver.get_active_context() as ac:
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 487, in __enter__
driver.cuCtxGetCurrent(byref(hctx))
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 284, in __getattr__
self.ensure_initialized()
File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 250, in ensure_initialized
raise CudaSupportError(f"Error at driver init: {description}")
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
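A web search showed that this is a driver problem: the lshw output above reports driver=nouveau, the open-source driver, and CUDA needs NVIDIA's proprietary driver. I first tried installing the graphics driver with Ubuntu's built-in driver tool, but it still failed: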
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
In the end, installing the driver from the command line solved the problem:
$ sudo apt install nvidia-driver-470
Checking again shows that everything is working:
$ nvidia-smi
Wed Dec 7 22:13:49 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03 Driver Version: 470.161.03 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:01:00.0 N/A | N/A |
| N/A 51C P8 N/A / N/A | 4MiB / 2004MiB | N/A Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
The test code now runs as well.
Testing Python code
Printing IDs
Prepare the following code:
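from numba import cuda


def cpu_print():
    # Ordinary Python function; runs on the CPU.
    print('cpu print')


@cuda.jit
def gpu_print():
    # Global index of this thread across the whole grid.
    dataIndex = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    print('gpu print ', cuda.threadIdx.x, cuda.blockIdx.x, cuda.blockDim.x, dataIndex)


if __name__ == '__main__':
    # Launch 4 blocks of 4 threads each: 16 threads in total.
    gpu_print[4, 4]()
    # Kernel launches are asynchronous; wait for the GPU to finish.
    cuda.synchronize()
    cpu_print()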
This code has two functions, one executed on the CPU and one on the GPU, and both simply print. The key is the @cuda.jit decorator, which compiles the function as a kernel that runs on the GPU. The output is as follows:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/print_test.py
gpu print 0 3 4 12
gpu print 1 3 4 13
gpu print 2 3 4 14
gpu print 3 3 4 15
gpu print 0 2 4 8
gpu print 1 2 4 9
gpu print 2 2 4 10
gpu print 3 2 4 11
gpu print 0 1 4 4
gpu print 1 1 4 5
gpu print 2 1 4 6
gpu print 3 1 4 7
gpu print 0 0 4 0
gpu print 1 0 4 1
gpu print 2 0 4 2
gpu print 3 0 4 3
cpu print
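The GPU printed 16 times in total (4 blocks of 4 threads), each line coming from a different thread. The order can differ from run to run, because work submitted to the GPU executes asynchronously and there is no guarantee which unit runs first. For the same reason the synchronization function cuda.synchronize() has to be called, so that the GPU finishes before the program continues.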
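The index arithmetic in gpu_print can be reproduced on the CPU to see why the 16 threads cover the indices 0 to 15 exactly once. This is only an illustrative sketch in plain Python: the function name global_indices is hypothetical, and unlike the real kernel it runs sequentially, so its order is fixed.

```python
def global_indices(blocks, threads_per_block):
    """Enumerate (threadIdx, blockIdx, blockDim, dataIndex) for a 1-D launch."""
    rows = []
    for block_idx in range(blocks):
        for thread_idx in range(threads_per_block):
            # Same formula as in the kernel:
            # threadIdx.x + blockIdx.x * blockDim.x
            data_index = thread_idx + block_idx * threads_per_block
            rows.append((thread_idx, block_idx, threads_per_block, data_index))
    return rows


if __name__ == "__main__":
    rows = global_indices(4, 4)  # mirrors the launch gpu_print[4, 4]()
    for row in rows:
        print("simulated", *row)
    # Check that every index from 0 to 15 appears exactly once.
    print(sorted(r[3] for r in rows) == list(range(16)))
```

Each simulated row matches one line of the GPU output above, for example thread 0 of block 3 maps to index 12.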
Measuring time
We use the following functions to see the power of GPU parallelism:
from timeit import default_timer as timer  # to measure execution time

import numpy as np
from numba import jit


# normal function to run on the CPU
def func(a):
    for i in range(10000000):
        a[i] += 1


# function compiled by Numba's JIT.
# Note: target_backend='cuda' is not a documented option of numba.jit,
# so this may well be compiled for the CPU rather than for the GPU.
@jit(target_backend='cuda')
def func2(a):
    for i in range(10000000):
        a[i] += 1


if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    # This timing also includes the one-time JIT compilation of func2.
    start = timer()
    func2(a)
    print("with GPU:", timer() - start)
The results are as follows:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/time_test.py
without GPU: 3.7136273959999926
with GPU: 0.4040513340000871
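The plain Python loop took about 3.7 seconds, while the Numba-compiled version took about 0.4 seconds, which is a lot faster. One caveat: target_backend='cuda' is not a documented option of numba.jit, and as far as I can tell such a function is JIT-compiled for the CPU, so most of this gain probably comes from compilation rather than from the GPU; a real GPU kernel needs @cuda.jit, as in the print example. Of course, this does not mean a GPU is always faster than a CPU; it depends on the type of task.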
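For comparison, the same increment can be written as an explicit CUDA kernel with @cuda.jit, the decorator used in the print example. This is a minimal sketch, assuming numba and a working NVIDIA driver; the names launch_config, add_one_on_gpu and add_one_kernel are hypothetical, and 256 threads per block is an arbitrary choice.

```python
def launch_config(n, threads_per_block=256):
    """Return (blocks, threads_per_block) such that blocks * threads >= n."""
    blocks = (n + threads_per_block - 1) // threads_per_block  # ceiling division
    return blocks, threads_per_block


def add_one_on_gpu(a):
    """Add 1 to every element of a 1-D NumPy array on the GPU; returns a new array."""
    # Imported lazily so launch_config() stays usable without numba or a GPU.
    from numba import cuda

    @cuda.jit
    def add_one_kernel(arr):
        i = cuda.grid(1)   # global thread index (threadIdx + blockIdx * blockDim)
        if i < arr.size:   # threads in the last block may fall past the end
            arr[i] += 1

    d_a = cuda.to_device(a)                  # copy host -> device
    blocks, threads = launch_config(a.size)
    add_one_kernel[blocks, threads](d_a)     # asynchronous kernel launch
    cuda.synchronize()                       # wait for the GPU to finish
    return d_a.copy_to_host()                # copy device -> host
```

On a machine with a working GPU, calling add_one_on_gpu(np.ones(10000000)) would take the place of func2(a) in the timing script. A fair measurement should exclude the first call, which compiles the kernel, and should keep in mind that the two memory copies are part of the cost. In real code the kernel would normally be defined once at module level, since defining it inside the function recompiles it on every call.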