A Detailed Guide to Running Python Code on the GPU
Introduction
A few days ago I was tinkering with Ubuntu, mainly because I wanted to put the NVIDIA card in my old laptop to work: run code on the GPU and enjoy the fun of many-core parallelism.
Fortunately, even this aging machine supports CUDA:
$ sudo lshw -C display
  *-display
       description: 3D controller
       product: GK208M [GeForce GT 740M]
       vendor: NVIDIA Corporation
       physical id: 0
       bus info: pci@0000:01:00.0
       version: a1
       width: 64 bits
       clock: 33MHz
       capabilities: pm msi pciexpress bus_master cap_list rom
       configuration: driver=nouveau latency=0
       resources: irq:35 memory:f0000000-f0ffffff memory:c0000000-cfffffff memory:d0000000-d1ffffff ioport:6000(size=128)
Installing the Tools
First, install the CUDA development tools with the following command:
$ sudo apt install nvidia-cuda-toolkit
Check the version information:
$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0
Install the required packages through Conda:
$ conda install numba
$ conda install cudatoolkit
Installing them with pip works just as well.
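Once the packages are in place, it is worth asking Numba whether it can actually see a CUDA device before writing any kernels. Below is a minimal sanity-check sketch (the file name check_cuda.py is just an illustrative choice, not from the original article):

# check_cuda.py -- quick sanity check that Numba can see the GPU
from numba import cuda

if __name__ == '__main__':
    # True only if a usable CUDA device and driver were found
    print('CUDA available:', cuda.is_available())
    # Prints the detected devices and whether they are supported
    cuda.detect()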
Testing and Driver Installation
A quick test immediately threw an error:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/test1.py
Traceback (most recent call last):
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 246, in ensure_initialized
    self.cuInit(0)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 319, in safe_cuda_api_call
    self._check_ctypes_error(fname, retcode)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 387, in _check_ctypes_error
    raise CudaAPIError(retcode, msg)
numba.cuda.cudadrv.driver.CudaAPIError: [100] Call to cuInit results in CUDA_ERROR_NO_DEVICE

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/larry/code/pkslow-samples/python/src/main/python/cuda/test1.py", line 15, in <module>
    gpu_print[1, 2]()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 862, in __getitem__
    return self.configure(*args)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 857, in configure
    return _KernelConfiguration(self, griddim, blockdim, stream, sharedmem)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/compiler.py", line 718, in __init__
    ctx = get_context()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 220, in get_context
    return _runtime.get_or_create_context(devnum)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 138, in get_or_create_context
    return self._get_or_create_context_uncached(devnum)
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/devices.py", line 153, in _get_or_create_context_uncached
    with driver.get_active_context() as ac:
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 487, in __enter__
    driver.cuCtxGetCurrent(byref(hctx))
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 284, in __getattr__
    self.ensure_initialized()
  File "/home/larry/anaconda3/lib/python3.9/site-packages/numba/cuda/cudadrv/driver.py", line 250, in ensure_initialized
    raise CudaSupportError(f"Error at driver init: {description}")
numba.cuda.cudadrv.error.CudaSupportError: Error at driver init: Call to cuInit results in CUDA_ERROR_NO_DEVICE (100)
A quick search showed that this is a driver problem. I first tried installing the graphics driver through Ubuntu's built-in Additional Drivers tool:
It still failed:
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
In the end, installing the driver from the command line fixed the problem:
$ sudo apt install nvidia-driver-470
Checking again shows that everything is working:
$ nvidia-smi
Wed Dec  7 22:13:49 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.161.03   Driver Version: 470.161.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 N/A |                  N/A |
| N/A   51C    P8    N/A /  N/A |      4MiB /  2004MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
The test code now runs as well.
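As an optional extra check (not part of the original steps), you can also query the device from Python once the driver is in place. A small sketch using Numba's standard cuda API:

from numba import cuda

# List the GPUs Numba can see and print details of the current one
print(cuda.gpus)                                   # e.g. <Managed Device 0>
device = cuda.get_current_device()
print('Using device:', device.name)                # device name as bytes
print('Compute capability:', device.compute_capability)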
Testing Python Code
Printing Thread IDs
Prepare the following code:
from numba import cuda

def cpu_print():
    print('cpu print')

@cuda.jit
def gpu_print():
    # Global thread index = thread index within the block + block index * block size
    dataIndex = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
    print('gpu print ', cuda.threadIdx.x, cuda.blockIdx.x, cuda.blockDim.x, dataIndex)

if __name__ == '__main__':
    gpu_print[4, 4]()       # launch 4 blocks of 4 threads each
    cuda.synchronize()      # wait for the GPU kernel to finish
    cpu_print()
The code contains two functions that both print a message: one runs on the CPU, the other on the GPU. The key is the @cuda.jit decorator, which compiles the function into a CUDA kernel so that it executes on the GPU. The output looks like this:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/print_test.py
gpu print 0 3 4 12
gpu print 1 3 4 13
gpu print 2 3 4 14
gpu print 3 3 4 15
gpu print 0 2 4 8
gpu print 1 2 4 9
gpu print 2 2 4 10
gpu print 3 2 4 11
gpu print 0 1 4 4
gpu print 1 1 4 5
gpu print 2 1 4 6
gpu print 3 1 4 7
gpu print 0 0 4 0
gpu print 1 0 4 1
gpu print 2 0 4 2
gpu print 3 0 4 3
cpu print
You can see that the GPU printed 16 times, each time from a different thread. The order of the lines may differ from run to run, because kernel launches are asynchronous and there is no guarantee which thread executes first. That is also why we call the synchronization function cuda.synchronize(): it makes sure the GPU has finished before the program continues.
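As a side note (not part of the original example), Numba also provides cuda.grid(1) as a shorthand for exactly this index arithmetic. A minimal sketch with an illustrative kernel name:

from numba import cuda

@cuda.jit
def gpu_print_grid():
    # cuda.grid(1) is equivalent to threadIdx.x + blockIdx.x * blockDim.x
    dataIndex = cuda.grid(1)
    print('gpu print ', dataIndex)

if __name__ == '__main__':
    gpu_print_grid[4, 4]()   # same launch configuration: 4 blocks x 4 threads
    cuda.synchronize()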
Measuring Execution Time
Let's use the following program to get a feel for the speedup:
from numba import jit, cuda
import numpy as np
# to measure exec time
from timeit import default_timer as timer

# normal function to run on cpu
def func(a):
    for i in range(10000000):
        a[i] += 1

# function optimized to run on gpu
@jit(target_backend='cuda')
def func2(a):
    for i in range(10000000):
        a[i] += 1

if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    start = timer()
    func(a)
    print("without GPU:", timer() - start)

    start = timer()
    func2(a)
    print("with GPU:", timer() - start)
The results are as follows:
$ /home/larry/anaconda3/bin/python /home/larry/code/pkslow-samples/python/src/main/python/cuda/time_test.py
without GPU: 3.7136273959999926
with GPU: 0.4040513340000871
The plain CPU version takes about 3.7 seconds, while the GPU version takes only about 0.4 seconds, which is a noticeable speedup. Of course, this does not mean the GPU is always faster than the CPU; it depends on the type of task.
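For comparison, here is a hedged sketch (not from the original article) of how the same element-wise increment could be written as an explicit @cuda.jit kernel, with the data copied to the device and each thread handling one element; the function and variable names are illustrative:

from numba import cuda
import numpy as np
from timeit import default_timer as timer

@cuda.jit
def increment_kernel(a):
    i = cuda.grid(1)              # global thread index
    if i < a.size:                # guard against out-of-range threads
        a[i] += 1

if __name__ == "__main__":
    n = 10000000
    a = np.ones(n, dtype=np.float64)

    d_a = cuda.to_device(a)                        # copy data to the GPU
    threads_per_block = 256
    blocks = (n + threads_per_block - 1) // threads_per_block

    start = timer()
    increment_kernel[blocks, threads_per_block](d_a)
    cuda.synchronize()                             # wait for the kernel to finish
    print("cuda.jit kernel:", timer() - start)

    a = d_a.copy_to_host()                         # copy the result back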
That concludes this detailed guide to running Python code on the GPU. For more material on running Python code with a GPU, see the other related articles on 腳本之家!