
PyTorch Basics Tutorial: Model Deployment with torchserve Explained

Updated: 2023-07-14 08:50:16  Author: 山頂夕景

torchserve is built on the Netty network framework. Under the hood it uses EpollServerSocketChannel for network communication and relies on epoll I/O multiplexing to handle a large number of concurrent connections. This article covers model deployment and inference with torchserve.

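Notes

- torch-model-archiver packages the model; torchserve then loads the packaged model and exposes inference through gRPC and HTTP interfaces.
- When a custom handler class inherits from TorchServe's BaseHandler, the four methods initialize(), preprocess(), postprocess(), and handle() are all optional to override.
- The workflow is: start the model's API service, then send an HTTP POST request to it with curl. This is much the same as the TensorFlow Serving workflow.
- torchserve is built on the Netty network framework. Under the hood it uses EpollServerSocketChannel for network communication and relies on epoll I/O multiplexing to handle a large number of concurrent connections.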

1. The torchserve and archiver modules


Model deployment relies on two modules:
torchserve: serves the model
torch-model-archiver: packages the model

pip:
    - torch-workflow-archiver
    - torch-model-archiver 
    - torchserve

2. Deploying the Speech2Text Wav2Vec2 model

2.1 Preparing the model and a custom handler

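Wav2Vec2 is a speech-to-text model. To keep things simple, we download the model from Hugging Face, save it locally, and deploy it with torchserve.
The handler converts the raw request data into the input format the model expects. Many NLP tasks do not need a custom handler and can use the built-in text_classifier handler, which relies on torchtext.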

# 1. Download the Hugging Face model and save it locally
from transformers import AutoModelForCTC, AutoProcessor
import os
modelname = "facebook/wav2vec2-base-960h"
model = AutoModelForCTC.from_pretrained(modelname)
processor = AutoProcessor.from_pretrained(modelname)
modelpath = "model"
os.makedirs(modelpath, exist_ok=True)
model.save_pretrained(modelpath)
processor.save_pretrained(modelpath)
# 2. Custom handler (saved as handler.py)
import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC
import io
class Wav2VecHandler(object):
    def __init__(self):
        self._context = None
        self.initialized = False
        self.model = None
        self.processor = None
        self.device = None
        # Sampling rate for Wav2Vec model must be 16k
        self.expected_sampling_rate = 16_000
    def initialize(self, context):
        """Initialize properties and load model"""
        self._context = context
        self.initialized = True
        properties = context.system_properties
        # See https://pytorch.org/serve/custom_service.html#handling-model-execution-on-multiple-gpus
        self.device = torch.device("cuda:" + str(properties.get("gpu_id")) if torch.cuda.is_available() else "cpu")
        model_dir = properties.get("model_dir")
        self.processor = AutoProcessor.from_pretrained(model_dir)
        self.model = AutoModelForCTC.from_pretrained(model_dir)
    def handle(self, data, context):
        """Transform input to tensor, resample, run model and return transcribed text."""
        input = data[0].get("data")
        if input is None:
            input = data[0].get("body")
        # torchaudio.load accepts file like object, here `input` is bytes
        model_input, sample_rate = torchaudio.load(io.BytesIO(input), format="WAV")
        # Ensure sampling rate is the same as the trained model
        if sample_rate != self.expected_sampling_rate:
            model_input = torchaudio.functional.resample(model_input, sample_rate, self.expected_sampling_rate)
        model_input = self.processor(model_input, sampling_rate = self.expected_sampling_rate, return_tensors="pt").input_values[0]
        logits = self.model(model_input)[0]
        pred_ids = torch.argmax(logits, axis=-1)[0]
        output = self.processor.decode(pred_ids)
        return [output]
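
The last two lines of handle() take the argmax over the logits and pass the resulting ids to processor.decode. For a CTC model this decoding step amounts to collapsing repeated ids and dropping the blank token. The sketch below illustrates that idea in plain Python; it is not the Hugging Face implementation, and the toy vocabulary, BLANK_ID, and greedy_ctc_decode are hypothetical names chosen for illustration (the real vocabulary is in vocab.json).

```python
from itertools import groupby

# Hypothetical toy vocabulary for illustration; the real one lives in vocab.json.
VOCAB = {0: "<pad>", 1: "|", 2: "H", 3: "I", 4: "A", 5: "D"}
BLANK_ID = 0          # assumed id of the CTC blank (padding) token
WORD_DELIMITER = "|"  # assumed word separator symbol in the vocabulary


def greedy_ctc_decode(pred_ids):
    """Collapse repeated ids, drop blanks, and map the remaining ids to text."""
    collapsed = [token_id for token_id, _ in groupby(pred_ids)]
    chars = [VOCAB[i] for i in collapsed if i != BLANK_ID]
    return "".join(chars).replace(WORD_DELIMITER, " ").strip()


# One id per audio frame, as produced by argmax over the logits.
frame_ids = [3, 3, 0, 1, 1, 2, 0, 4, 4, 0, 5, 5]
print(greedy_ctc_decode(frame_ids))
```

Because repeats are collapsed before blanks are removed, a blank between two identical ids keeps both characters, which is how CTC represents doubled letters.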

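A custom handler usually splits its work into the following methods (the handler above folds the last three into handle):

initialize: initializes the model, loads the weights, and so on.
preprocess: converts the raw input data into PyTorch tensors.
inference: runs model inference.
postprocess: converts the model output into the API output format.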
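
To make the division of labor concrete, here is a minimal sketch of a handler that keeps the four steps separate and chains them in handle(). It is a toy under stated assumptions: EchoLengthHandler is a hypothetical class, the "model" is just Python's len, and it does not inherit from any TorchServe class, so it only illustrates the call flow.

```python
class EchoLengthHandler:
    """Toy handler: reports the payload size of each request in a batch."""

    def __init__(self):
        self.initialized = False
        self.model = None

    def initialize(self, context):
        # A real handler would load weights from the model directory
        # exposed through context.system_properties here.
        self.model = len  # stand-in "model" for illustration only
        self.initialized = True

    def preprocess(self, data):
        # Each request in the batch carries its payload under "data" or "body".
        inputs = []
        for row in data:
            payload = row.get("data")
            if payload is None:
                payload = row.get("body")
            inputs.append(payload)
        return inputs

    def inference(self, inputs):
        return [self.model(x) for x in inputs]

    def postprocess(self, outputs):
        # Return one response item per request in the batch.
        return [{"num_bytes": n} for n in outputs]

    def handle(self, data, context):
        if not self.initialized:
            self.initialize(context)
        return self.postprocess(self.inference(self.preprocess(data)))


handler = EchoLengthHandler()
print(handler.handle([{"body": b"abc"}, {"data": b"hello"}], context=None))
```

Keeping the steps separate makes each one easy to test on its own, at the cost of a few more lines than a single handle() method.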

2.2 Packaging the model and starting the model API service

The following steps can be run directly in a Linux terminal: package the model, start the model's API service, and send an HTTP POST request to the service with curl.
If the curl request fails with java.lang.NoSuchMethodError: java.nio.file.Files.readString(Ljava/nio/file/Path;)Ljava/lang/String;, the JRE is probably missing or too old (Files.readString requires Java 11 or later); sudo apt install default-jre should fix it.
The curl command normally returns I HAD THAT CURIOSITY BESIDE ME AT THIS MOMENT%. The test input is a short speech file, sample.wav.
Waveform Audio File Format (WAVE, better known by its WAV file extension) uses the RIFF (Resource Interchange File Format) file structure. It usually stores raw audio data in PCM format, which is why it is commonly described as lossless audio.
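
Since the handler expects 16 kHz audio and resamples anything else, it can be useful to check a WAV file's header before sending it. The sketch below uses Python's standard wave module; make_silent_wav and describe_wav are hypothetical helper names, and the generated file contains only silence so that the example needs no external data.

```python
import io
import wave

EXPECTED_RATE = 16_000  # sampling rate the Wav2Vec2 handler expects


def make_silent_wav(sample_rate, seconds=1):
    """Build a mono 16-bit PCM WAV file in memory (silence only)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * sample_rate * seconds)
    return buffer.getvalue()


def describe_wav(wav_bytes):
    """Read channel count, sample rate, and duration from the WAV header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "seconds": wav.getnframes() / rate,
            "needs_resampling": rate != EXPECTED_RATE,
        }


info = describe_wav(make_silent_wav(44_100))
print(info)
```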

# Package the model files into a .mar archive for torchserve
torch-model-archiver --model-name Wav2Vec2 --version 1.0 --serialized-file model/pytorch_model.bin --handler ./handler.py --extra-files "model/config.json,model/special_tokens_map.json,model/tokenizer_config.json,model/vocab.json,model/preprocessor_config.json" -f
# Make sure the model store directory exists before moving the archive into it
mkdir -p model_store
mv Wav2Vec2.mar model_store
# Start the model service: load the archive packaged above and serve inference over gRPC and HTTP
torchserve --start --model-store model_store --models Wav2Vec2=Wav2Vec2.mar --ncs
# Once the server is running, let's try it with:
curl -X POST http://127.0.0.1:8080/predictions/Wav2Vec2 --data-binary '@./sample.wav' -H "Content-Type: audio/basic"
# Stop torchserve
torchserve --stop
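
The curl request can also be issued from Python. The sketch below mirrors the same URL, body, and Content-Type header using only the standard library; build_prediction_request and transcribe are hypothetical helper names, and the 60-second timeout is an arbitrary assumption. It only works while the server is running, so run it before stopping torchserve.

```python
import urllib.request


def build_prediction_request(audio_bytes, model_name="Wav2Vec2",
                             base_url="http://127.0.0.1:8080"):
    """Build the same POST request that the curl command sends."""
    return urllib.request.Request(
        url=f"{base_url}/predictions/{model_name}",
        data=audio_bytes,
        headers={"Content-Type": "audio/basic"},
        method="POST",
    )


def transcribe(wav_path):
    """Send a WAV file to the running server and return the response text."""
    with open(wav_path, "rb") as f:
        request = build_prediction_request(f.read())
    # The timeout value is an assumption; tune it to the model's latency.
    with urllib.request.urlopen(request, timeout=60) as response:
        return response.read().decode("utf-8")


# Usage (requires a running torchserve instance and a local sample.wav):
# print(transcribe("./sample.wav"))
```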

2.3 Parameter reference

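torch-model-archiver: packages the model

model-name: the name under which the model is deployed
version: the version of the deployed model
model-file: the Python file that defines the model architecture
serialized-file: the file holding the trained model's saved weights (for example a .pth file)
export-path: the directory where the packaged model is written
extra-files: additional files to bundle into the model archive, such as the file that maps labels to ids
handler: the handler that processes data before and after inference, for example tokenizing text and converting it to ids for an NLP model, or mapping predictions to concrete labels for a classification model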

The full command-line usage of torch-model-archiver:
usage: torch-model-archiver [-h] --model-name MODEL_NAME
                            [--serialized-file SERIALIZED_FILE]
                            [--model-file MODEL_FILE] --handler HANDLER
                            [--extra-files EXTRA_FILES]
                            [--runtime {python,python2,python3}]
                            [--export-path EXPORT_PATH]
                            [--archive-format {tgz,no-archive,default}] [-f]
                            -v VERSION [-r REQUIREMENTS_FILE]
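
When packaging is scripted, the flags above can be assembled from Python values instead of being typed by hand. The following is a minimal sketch that only builds the argument list; build_archiver_command is a hypothetical helper, and it covers just the flags used in this article.

```python
import shlex


def build_archiver_command(model_name, version, handler, serialized_file=None,
                           extra_files=(), export_path=None, force=False):
    """Assemble a torch-model-archiver argument list from Python values."""
    cmd = ["torch-model-archiver", "--model-name", model_name,
           "--version", version, "--handler", handler]
    if serialized_file:
        cmd += ["--serialized-file", serialized_file]
    if extra_files:
        # The CLI takes the extra files as one comma-separated string.
        cmd += ["--extra-files", ",".join(extra_files)]
    if export_path:
        cmd += ["--export-path", export_path]
    if force:
        cmd.append("-f")
    return cmd


cmd = build_archiver_command(
    model_name="Wav2Vec2",
    version="1.0",
    handler="./handler.py",
    serialized_file="model/pytorch_model.bin",
    extra_files=[
        "model/config.json",
        "model/special_tokens_map.json",
        "model/tokenizer_config.json",
        "model/vocab.json",
        "model/preprocessor_config.json",
    ],
    export_path="model_store",
    force=True,
)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Passing --export-path writes the archive straight into the model store, which would replace the separate mv step, provided the directory already exists.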

torchserve: loads the model packaged above and exposes inference through gRPC and HTTP interfaces