
A Detailed Guide to Calling the GPT-3 API from Python

Updated: 2023-02-16 15:14:43   Author: 老齊Py
This article walks through the process of calling the GPT-3 API from Python. Readers who need it are welcome to use it as a reference, and I hope it proves helpful.

Calling the GPT-3 API from Python

GPT-3 is a machine learning language model released by OpenAI last year. It has received wide media attention for its ability to write articles, songs, and poems, and even code! The tool is free to use and only requires signing up with an email address.

GPT-3 is a kind of machine learning model called a transformer; specifically, it is a Generative Pre-trained Transformer, hence "GPT". The transformer architecture uses self-attention and reinforcement learning to model conversational text. It typically processes one word at a time, using the preceding words to predict the next word in the sequence.

GPT-3 has a wide range of applications across science, art, and technology. It can answer basic questions about science and math, and it can even answer questions about graduate-level math and science concepts accurately. Even more surprisingly, I asked it some questions related to my doctoral research in physical chemistry, and it gave fairly good explanations. It has its limits, though: when I asked GPT-3 about more novel research methods in physical chemistry, it could not give clear answers. So GPT-3 should be used cautiously as a search engine for education and research, since it has no fact-checking mechanism. As fact-checking improves, I can imagine GPT-3 becoming very useful at the graduate level and even in research.

Beyond my personal experience, I have seen many other cool applications of the tool. For example, one developer used GPT-3 to orchestrate cloud services for completing complex tasks. Other users have generated working Python and SQL scripts with GPT-3, as well as programs in other languages. In the arts, users have asked GPT-3 to write essays comparing modern and contemporary art. GPT-3's potential applications are abundant in almost any field.

GPT-3 does well at answering basic questions with well-established answers. For example, it can give a pretty good explanation of photosynthesis. It does not do well on cutting-edge research questions about photosynthesis; for instance, it cannot describe the mechanism of photosynthesis or the quantum concepts involved. It can give a decent response, but it is unlikely to supply the technical details behind most research questions. Likewise, GPT-3 can write simple working code, but as task complexity grows, the generated code becomes more error-prone. It also cannot generate the kinds of content normally produced by humans, such as political opinions, ethical judgments, investment advice, or accurate news reporting.

Despite its limitations, GPT-3's broad applicability is impressive. I thought it would be interesting to come up with some data science and machine learning prompts and see whether they can supplement parts of a data science workflow.

First, we will generate some data-science-related text from a few simple prompts. Once we have a feel for the tool, we can ask questions that might actually help with data science tasks. There are several interesting data science and machine learning questions we could put to GPT-3: for example, can it point us to publicly available datasets? How much training data does GPT-3 have? Another interesting application is problem framing: can GPT-3 help users frame good machine learning research questions? While it struggles to give specific technical answers, it might do well at framing open research questions.

Another cool application is using GPT-3 to decide which ML model to use for a particular problem. For well-validated techniques with abundant online literature, it should be able to help users choose a model and explain why the chosen model fits best. Finally, we will try using GPT-3 to write Python code for some data science tasks. For example, we will see whether it can write code that generates synthetic data for specific use cases.

Note: results from the GPT-3 API are non-deterministic, so the output you get may differ slightly from what is shown here. Also, since GPT-3 has no fact-checking mechanism, you should double-check any factual results you plan to use for work, school, or personal projects.

For this work, I will write the code in Deepnote, a collaborative data science notebook that makes running reproducible experiments straightforward.

Installing the OpenAI Library

First, go to Deepnote and create a new project (you can sign up for free if you don't have an account yet).

Create a project named "GPT3" and, inside it, a notebook named "GPT3_ds".

Next, install OpenAI with pip in the first cell:

%pip install openai
%pip install catboost
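
Note: this article uses the legacy openai.Completion interface, which v1.0 of the openai package later removed. If you are reproducing these examples today, pinning a pre-1.0 version (an assumption about your environment, not a step from the original article) avoids breakage:

%pip install "openai<1.0"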

Save your key in the api_key attribute of the openai object:

import openai
openai.api_key = "your-key"
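
Hard-coding a key in a notebook cell is easy to leak. A safer minimal sketch, assuming you have stored the key in an OPENAI_API_KEY environment variable (for example through your notebook's environment settings):

import os
import openai

# Read the key from the environment instead of pasting it into the notebook
openai.api_key = os.environ["OPENAI_API_KEY"]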

Now we can ask questions, for example "What is the pandas library?", and GPT-3 will respond:

completion = openai.Completion.create(engine="text-davinci-003", prompt="What is the pandas library?", max_tokens=1000)
print(completion.choices[0]['text'])
# output
Pandas is an open source software library written in Python for data manipulation and analysis. Pandas is widely used in data science, machine learning and many other fields. It provides high-level data structures and tools for handling and manipulating data, including data frames, series, plotting tools and more.
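
Since the rest of the article issues many near-identical calls, a small convenience wrapper (my own sketch, not part of the original article) keeps them shorter; temperature is optional because some later calls set it to 0 for more deterministic output:

def ask_gpt3(prompt, max_tokens=1000, temperature=None):
    # Thin wrapper around the Completion endpoint used throughout this article
    kwargs = dict(engine="text-davinci-003", prompt=prompt, max_tokens=max_tokens)
    if temperature is not None:
        kwargs["temperature"] = temperature
    return openai.Completion.create(**kwargs).choices[0]["text"]

print(ask_gpt3("What is the pandas library?"))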

We can even ask more specific questions, such as "What are some common Pandas use cases?". It gives a reasonable answer, listing tasks such as data cleaning, data exploration, time series analysis, and data visualization:

completion = openai.Completion.create(engine="text-davinci-003", prompt="what are some common Pandas use cases?", max_tokens=240)
print(completion.choices[0]['text'])
# output
1. Data Cleaning and Transformation
2. Data Analysis and Exploration
3. Time Series Analysis
4. Data Visualization
5. Statistical Modeling
6. Predictive Modeling
7. Machine Learning
8. Web Scraping

Ask "What are the most common deep learning libraries?":

#what are the most common deep learning libraries?
completion = openai.Completion.create(engine="text-davinci-003", prompt="what are the most common deep learning libraries?", max_tokens=240)
print(completion.choices[0]['text'])
# output
. TensorFlow 
2. PyTorch 
3. Keras 
4. Caffe 
5. CNTK 
6. MXNet 
7. Theano 
8. Deeplearning4j 
9. Gensim 
10. LUNA

We can even ask "What is a deep neural network?":

completion = openai.Completion.create(engine="text-davinci-003", prompt="What is a deep neural network?", max_tokens=1000)
print(completion.choices[0]['text'])
# output
A deep neural network (DNN) is a type of artificial neural network (ANN) with multiple layers of neurons between the input and output layers. DNNs are designed to learn complex non-linear relationships from data, and have been successfully applied in a wide range of areas such as image recognition, natural language processing, and financial forecasting.

Factual lookups with GPT-3 should be done cautiously, since there is no fact-checking mechanism. In most cases, if you want a surface-level understanding of concepts that are already well covered by the many online blogs and forums on data science and machine learning, GPT-3 works well.

Feature Engineering

Another interesting use case for GPT-3 is as an assistant to guide data science work. For example, can GPT-3 suggest feature transformations that might improve model performance? Let's ask it:

completion = openai.Completion.create(engine="text-davinci-003", prompt="give some ideas on feature transformations that can improve model performance", max_tokens=1000)
print(completion.choices[0]['text'])
# output
1. Standardization/Normalization: A common feature transform used to ensure features are on the same scale, standardizing or normalizing variables can help limit the severity of outliers and improve the overall model performance.
2. Feature Binning: Binning is a process of transforming numerical variables into categorical ones. This can be useful when working with variables that have too many levels and can have a significant effect on the model performance.
3. Polynomial Expansion: When a nonlinear relationship is expected between features and the output variable, a polynomial expansion feature transformation can help improve model performance.
4. Feature Selection: Removing redundant or irrelevant features from the dataset can help improve the model performance as these features may lead to overfitting.
5. Ensemble: Combining different types of models (or different versions of the same model) can often improve performance due to their combined capabilities.

We can see that it gives some good feature-transformation suggestions, along with an explanation of each.
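
To make one of these suggestions concrete, scikit-learn implements polynomial expansion directly; here is a minimal sketch (my own illustration, assuming scikit-learn is installed, not part of the GPT-3 output):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1, 2],
              [3, 4]])
# degree=2 adds squared and pairwise interaction terms for each feature
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']
print(X_poly)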

Let's see if we can take it a step further and have it write some Python code:

completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that performs data standardization", max_tokens=1000)
print(completion.choices[0]['text'])
#output
# Import the necessary libraries
import numpy as np
# Define the data 
data = np.array([[-3, 9, 0, 8],
                 [ 4, 6, 5, 12],
                 [20, 2, 3, 15]])
# Calculate mean and standard deviation
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
# Perform data standardization
standardized_data = (data - mean) / std
# Print the results
print(standardized_data)

Copy and paste the result into a new cell and run it:

# Import the necessary libraries
import numpy as np
# Define the data 
data = np.array([[-3, 9, 0, 8],
                 [ 4, 6, 5, 12],
                 [20, 2, 3, 15]])
# Calculate mean and standard deviation
mean = np.mean(data, axis=0)
std = np.std(data, axis=0)
# Perform data standardization
standardized_data = (data - mean) / std
# Print the results
print(standardized_data)
# output
[[-1.03881504  1.16247639 -1.29777137 -1.27872403]
 [-0.31164451  0.11624764  1.13554995  0.11624764]
 [ 1.35045955 -1.27872403  0.16222142  1.16247639]]
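
In a real project you would more likely reach for scikit-learn's StandardScaler, which applies the same per-column transform (a sketch assuming scikit-learn is installed; both np.std and StandardScaler use the population standard deviation, so the results match):

import numpy as np
from sklearn.preprocessing import StandardScaler

data = np.array([[-3, 9, 0, 8],
                 [ 4, 6, 5, 12],
                 [20, 2, 3, 15]])
# fit_transform subtracts each column's mean and divides by its std
standardized_data = StandardScaler().fit_transform(data)
print(standardized_data)  # same values as the manual version above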

Next, let's do the same for a normalization transform:

completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that performs data normalization on fake data", max_tokens=1000)
print(completion.choices[0]['text'])
# output
# Normalizing data will rescale features in the range [0,1]
data = [3, 7, 10, 13] # Sample data
# Calculate the maximum and minimum of the data
max_data = max(data)
min_data = min(data)
# Normalize the data
normalized_data = [(x-min_data)/(max_data-min_data) for x in data]
# Print first value to check 
print(normalized_data[0]) # Prints 0.2

Run the returned code (note that the model's inline comment is wrong: the first normalized value is (3-3)/(13-3) = 0.0, not 0.2):

# Normalizing data will rescale features in the range [0,1]
data = [3, 7, 10, 13] # Sample data
# Calculate the maximum and minimum of the data
max_data = max(data)
min_data = min(data)
# Normalize the data
normalized_data = [(x-min_data)/(max_data-min_data) for x in data]
# Print first value to check 
print(normalized_data) 
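
The scikit-learn equivalent here is MinMaxScaler, which rescales each feature to [0, 1] in the same way (again my own sketch, not GPT-3 output):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

data = np.array([3, 7, 10, 13], dtype=float).reshape(-1, 1)  # one feature column
normalized = MinMaxScaler().fit_transform(data)
print(normalized.ravel())  # [0.  0.4 0.7 1. ]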

Model Selection

Another cool application is model selection. Let's see whether GPT-3 can guide us in choosing a machine learning model. Ask "How do I select a time series model?":

completion = openai.Completion.create(engine="text-davinci-003", prompt="How do i select a time series model?", max_tokens=1000)
print(completion.choices[0]['text'])
# output
The selection of a time series model depends on the type of data that is being analyzed. If there are long-term trends or cycles in the data, then an autoregressive integrated moving average (ARIMA) model or a exponential smoothing (ETS) model may be the best option. If the data are seasonal or there are outliers present, then a Holt-Winters model may be a better choice. If the data do not contain any long-term trends or seasonality, a Box-Jenkins model may be suitable. Regardless of the model selected, it is important to consider the type of data that is being modeled when selecting a time series model.

The answer is reasonable, covering trends, cycles, seasonality, and outliers as things to check against the data. Let's ask something more specific, such as "When to use ARIMA vs Prophet?":

completion = openai.Completion.create(engine="text-davinci-003", prompt="When to use ARIMA vs FB prophet?", max_tokens=1000)
print(completion.choices[0]['text'])
# output
ARIMA and FB Prophet are both used for time series forecasting. ARIMA is a linear model and is better for predicting long-term trends with stationary data. FB Prophet is a non-linear model and is better for predicting short-term trends with non-stationary data. So, you would use ARIMA when you are mainly concerned with predicting long-term trends, and you would use FB Prophet when you are mainly concerned with predicting short-term trends.
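
As a quick, concrete illustration of the ARIMA side of this comparison (my own sketch assuming statsmodels is installed; none of this comes from GPT-3):

import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Build a synthetic AR(1) series: y_t = 0.7 * y_{t-1} + noise
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()

# order=(1, 0, 0): one autoregressive term, no differencing, no MA terms
fit = ARIMA(y, order=(1, 0, 0)).fit()
print(fit.params)  # the estimated AR coefficient should be close to 0.7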

Generating Synthetic Data

Can GPT-3 generate synthetic data? Let's find out! We will generate two well-known kinds of data: hospital readmission data and transaction data.

  • Hospital readmission data

Ask GPT-3: "Write Python code that generates synthetic hospital readmission data".

completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that generates synthetic healthcare readmission data stored in a dataframe", max_tokens=1000)
print(completion.choices[0]['text'])
# output
import pandas as pd
import numpy as np
# Create Dataframe 
df = pd.DataFrame(columns=['Patient_ID', 'Age', 'Admission_Type', 'Readmitted'])
# Generate Data
np.random.seed(0)
for i in range(10):
  admission_type = np.random.choice(['Urgent', 'Scheduled', 'Emergency'])
  patient_age = np.random.randint(18, 80)
  readmission = np.random.choice([0, 1])
  df.loc[i] = [i+1, patient_age, admission_type, readmission]
# Print Dataframe to Console
print(df)

Run this code:

import pandas as pd
import numpy as np
# Create Dataframe 
df = pd.DataFrame(columns=['Patient_ID', 'Age', 'Admission_Type', 'Readmitted'])
# Generate Data
np.random.seed(0)
for i in range(10):
  admission_type = np.random.choice(['Urgent', 'Scheduled', 'Emergency'])
  patient_age = np.random.randint(18, 80)
  readmission = np.random.choice([0, 1])
  df.loc[i] = [i+1, patient_age, admission_type, readmission]
# Print Dataframe to Console
df

The output is the ten-row dataframe of synthetic patient records.

Let's see whether we can use this synthetic data to build a classification model that predicts who will be readmitted, and then evaluate its performance.

completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that generates synthetic healthcare readmission data stored in a dataframe. From this write code that builds a catboost model that predicts readmission outcomes. Also write code to calculate and print performance", max_tokens=3000)
print(completion.choices[0]['text'])
# output
 metrics
## Generate Synthetic Healthcare Readmission Data
import pandas as pd 
import numpy as np 
# set the seed for reproducibility 
np.random.seed(1)
# create dataframe 
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=['age','gender','length_of_stay','diagnosis','NIV','laboratory','past_hospitalizations','medications','bmi','readmission'])
# add labels to data frame 
df['age'] = np.random.randint(20, 80, size=(100))
df['gender'] = np.random.randint(1, 2, size=(100))
df['length_of_stay'] = np.random.randint(2, 14, size=(100))
df['diagnosis'] = np.random.randint(1, 5, size=(100))
df['NIV'] = np.random.randint(0, 2, size=(100))
df['laboratory'] = np.random.randint(1, 6, size=(100))
df['past_hospitalizations'] = np.random.randint(0, 10, size=(100))
df['medications'] = np.random.randint(1, 6, size=(100))
df['bmi'] = np.random.randint(18, 35, size=(100))
df['readmission'] = np.random.randint(0, 2, size=(100))
# print the dataframe 
print(df)
## Build a CatBoost Model
from catboost import CatBoostClassifier
from sklearn.metrics import confusion_matrix
# separate X and y
X = df.iloc[:, 0:9]
y = df.iloc[:, 9]
# initialize catboost classifier 
cat_clf = CatBoostClassifier(iterations=50,
                            learning_rate=0.3,
                            depth=8,
                            eval_metric='Accuracy',
                            random_seed=42)
# fit the model 
cat_clf.fit(X, y)
# predict values
y_pred = cat_clf.predict(X)
# print confusion matrix
conf_mat = confusion_matrix(y, y_pred)
print(conf_mat)
## Calculate and Print Performance Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score
# calculate performance metrics
acc = accuracy_score(y, y_pred)
precision = precision_score(y, y_pred)
recall = recall_score(y, y_pred)
# print performance metrics 
print("Accuracy: {}".format(acc))
print("Precision: {}".format(precision))
print("Recall: {}".format(recall))

復(fù)制上面的代碼,并執(zhí)行:

## Generate Synthetic Healthcare Readmission Data
import pandas as pd 
import numpy as np 
# set the seed for reproducibility 
np.random.seed(1)
# create dataframe 
df = pd.DataFrame(np.random.randint(0, 100, size=(100, 10)), columns=['age','gender','length_of_stay','diagnosis','NIV','laboratory','past_hospitalizations','medications','bmi','readmission'])
# add labels to data frame 
df['age'] = np.random.randint(20, 80, size=(100))
df['gender'] = np.random.randint(1, 2, size=(100))
df['length_of_stay'] = np.random.randint(2, 14, size=(100))
df['diagnosis'] = np.random.randint(1, 5, size=(100))
df['NIV'] = np.random.randint(0, 2, size=(100))
df['laboratory'] = np.random.randint(1, 6, size=(100))
df['past_hospitalizations'] = np.random.randint(0, 10, size=(100))
df['medications'] = np.random.randint(1, 6, size=(100))
df['bmi'] = np.random.randint(18, 35, size=(100))
df['readmission'] = np.random.randint(0, 2, size=(100))
# print the dataframe 
print(df)
## Build a CatBoost Model
from catboost import CatBoostClassifier
from sklearn.metrics import confusion_matrix
# separate X and y
X = df.iloc[:, 0:9]
y = df.iloc[:, 9]
# initialize catboost classifier 
cat_clf = CatBoostClassifier(iterations=50,
                            learning_rate=0.3,
                            depth=8,
                            eval_metric='Accuracy',
                            random_seed=42)
# fit the model 
cat_clf.fit(X, y)
# predict values
y_pred = cat_clf.predict(X)
# print confusion matrix
conf_mat = confusion_matrix(y, y_pred)
print(conf_mat)
## Calculate and Print Performance Metrics
from sklearn.metrics import accuracy_score, precision_score, recall_score
# calculate performance metrics
acc = accuracy_score(y, y_pred)
precision = precision_score(y, y_pred)
recall = recall_score(y, y_pred)
# print performance metrics 
print("Accuracy: {}".format(acc))
print("Precision: {}".format(precision))
print("Recall: {}".format(recall))
# output omitted
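
Note that the generated code evaluates the model on the same data it was trained on, which inflates the reported metrics. A minimal fix (my own sketch, reusing the X, y, and cat_clf objects defined above and assuming scikit-learn) is to hold out a test set:

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Hold out 25% of the rows so the metrics reflect unseen data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
cat_clf.fit(X_train, y_train, verbose=False)
y_pred = cat_clf.predict(X_test)
print("Test accuracy: {}".format(accuracy_score(y_test, y_pred)))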

  • Transaction data

Ask GPT-3: "Write Python code that generates synthetic transaction data".

completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that generates synthetic transaction data stored in a dataframe", max_tokens=1000)
print(completion.choices[0]['text'])
# output
import pandas as pd 
import numpy as np 
#create randomly generated customer data
customer_id = np.arange(1,101) 
customer_names = [f'John Doe {x}' for x in range(1,101)] 
#create randomly generated transaction data
transaction_id = np.arange(1,101)
dates = [f'2020-07-{x}' for x in range(1,101)]
amounts = np.random.randint(low=1, high=1000, size=(100,)) 
#create dataframe with randomly generated data
transaction_data = pd.DataFrame({'Customer ID': customer_id, 
                            'Customer Name': customer_names,
                            'Transaction ID': transaction_id, 
                            'Date': dates, 
                            'Amount': amounts})
print(transaction_data)

Copy the code and run it:

import pandas as pd 
import numpy as np 
#create randomly generated customer data
customer_id = np.arange(1,101) 
customer_names = [f'John Doe {x}' for x in range(1,101)] 
#create randomly generated transaction data
transaction_id = np.arange(1,101)
dates = [f'2020-07-{x}' for x in range(1,101)]
amounts = np.random.randint(low=1, high=1000, size=(100,)) 
#create dataframe with randomly generated data
transaction_data = pd.DataFrame({'Customer ID': customer_id, 
                            'Customer Name': customer_names,
                            'Transaction ID': transaction_id, 
                            'Date': dates, 
                            'Amount': amounts})
transaction_data

(Partial output.)
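
One flaw worth noting: the generated date strings run from 2020-07-1 to 2020-07-100, and most of those are not valid calendar dates. A minimal fix (my own sketch) uses pandas' date_range instead:

import pandas as pd

# 100 consecutive valid dates starting July 1, 2020
dates = pd.date_range(start="2020-07-01", periods=100).strftime("%Y-%m-%d").tolist()
print(dates[0], "...", dates[-1])  # 2020-07-01 ... 2020-10-08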

We now have customer IDs, transaction IDs, dates, and purchase amounts. Let's see if we can get more specific and add item details, age, gender, and zip code as well.

completion = openai.Completion.create(engine="text-davinci-003", prompt="Write example python code that generates synthetic transaction data stored in a dataframe. Include customer ID, transaction amount, item ID, item name, age, gender, and zipcode", max_tokens=2000)
print(completion.choices[0]['text'])
# output
import pandas as pd
import numpy as np
rows = ['customer_ID', 'transaction_amnt', 'item_ID', 'item_name', 'age', 'gender', 'zipcode']
data = pd.DataFrame(columns=rows)  
for i in range(1,100):
        customer_ID = int( np.random.uniform(100,600-100)) 
        transaction_amnt = np.random.uniform(1.25, 10.00)
        item_ID = int( np.random.uniform(1,35))
        item_name = np.random.choice(["phone", "tablet", "laptop", "smartwatch"])
        age = int( np.random.uniform(17,75)) 
        gender = np.random.choice(["male", "female"]) 
        zipcode = np.random.choice(["98101", "98200", "98469", "98801"])
        data.loc[i] = [customer_ID, transaction_amnt, item_ID, item_name, age, gender, zipcode]
print (data)

Run the code:

import pandas as pd
import numpy as np
rows = ['customer_ID', 'transaction_amnt', 'item_ID', 'item_name', 'age', 'gender', 'zipcode']
data = pd.DataFrame(columns=rows)  
for i in range(1,100):
        customer_ID = int( np.random.uniform(100,600-100)) 
        transaction_amnt = np.random.uniform(1.25, 10.00)
        item_ID = int( np.random.uniform(1,35))
        item_name = np.random.choice(["phone", "tablet", "laptop", "smartwatch"])
        age = int( np.random.uniform(17,75)) 
        gender = np.random.choice(["male", "female"]) 
        zipcode = np.random.choice(["98101", "98200", "98469", "98801"])
        data.loc[i] = [customer_ID, transaction_amnt, item_ID, item_name, age, gender, zipcode]
data

(Partial output.)
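
As a side note, appending rows one at a time with data.loc is slow for large n; a vectorized sketch of the same idea (my own version, not GPT-3 output):

import numpy as np
import pandas as pd

n = 99
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "customer_ID": rng.integers(100, 500, size=n),
    "transaction_amnt": rng.uniform(1.25, 10.00, size=n),
    "item_ID": rng.integers(1, 35, size=n),
    "item_name": rng.choice(["phone", "tablet", "laptop", "smartwatch"], size=n),
    "age": rng.integers(17, 75, size=n),
    "gender": rng.choice(["male", "female"], size=n),
    "zipcode": rng.choice(["98101", "98200", "98469", "98801"], size=n),
})
print(data.head())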

Asking About Public Datasets

Another application is asking GPT-3 about public datasets. Let's ask it to list some:

completion = openai.Completion.create(engine="text-davinci-003", prompt=" list some good public datasets", max_tokens=1000)
print(completion.choices[0]['text'])
# output
1. US Census Data
2. Enron Email Dataset
3. Global Open Data Index
4. Air Quality Monitoring Data
5. New York City Taxi Trip Data
6. IMF Data
7. World Bank Open Data
8. Google Books Ngrams Dataset
9. Amazon Reviews Dataset
10. UCI Machine Learning Repository

Let's see whether it can find public datasets released under the Apache 2.0 license, and ask it for links to the sources as well:

completion = openai.Completion.create(engine="text-davinci-003", prompt=" list some good public datasets under apache 2.0 license. provide links to their source", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])
# output
1. OpenStreetMap: https://www.openstreetmap.org/
2. US Census Data: https://www.census.gov/data.html
3. Google Books Ngrams: https://aws.amazon.com/datasets/google-books-ngrams/
4. Wikipedia: https://dumps.wikimedia.org/enwiki/
5. US Government Spending Data: https://www.usaspending.gov/
6. World Bank Open Data: https://data.worldbank.org/
7. Common Crawl: http://commoncrawl.org/
8. Open Images: https://storage.googleapis.com/openimages/web/index.html
9. OpenFlights: https://openflights.org/data.html
10. GDELT: http://data.gdeltproject.org/

Not all of these links are correct, but it does a decent job of tracking down sources; the Google Books Ngrams and Common Crawl entries, for example, are solid. Where it does not give the exact location of the data, it usually provides a link to a page from which the data can be found.

Let's also ask it to describe the datasets. Note that although the results may overlap, they differ slightly on each run; as far as I can tell, the results are not always reproducible:

completion = openai.Completion.create(engine="text-davinci-003", prompt=" list some good public datasets under apache 2.0 license. provide links to their source and descriptions", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])
# output
1. OpenStreetMap: OpenStreetMap is a free, editable map of the world, created and maintained by volunteers and available for use under an open license. It contains millions of data points, including roads, buildings, and points of interest. Source: https://www.openstreetmap.org/
2. Google Books Ngrams: Google Books Ngrams is a dataset of over 5 million books from Google Books, spanning from 1500 to 2008. It contains word counts for each year, allowing researchers to track the usage of words over time. Source: https://aws.amazon.com/datasets/google-books-ngrams/
3. Wikipedia: Wikipedia is a free, open-source encyclopedia with millions of articles in hundreds of languages. It is available for use under the Creative Commons Attribution-ShareAlike license. Source: https://www.wikipedia.org/
4. Common Crawl: Common Crawl is a large-scale web crawl that collects data from over 5 billion webpages. It is available for use under the Apache 2.0 license. Source: https://commoncrawl.org/
5. Open Images Dataset: The Open Images Dataset is a collection of 9 million images annotated with labels spanning over 6000 categories. It is available for use under the Apache 2.0 license. Source: https://storage.googleapis.com/openimages/web/index.html

Framing Machine Learning Problems

As a final example, let's see whether GPT-3 can help us frame machine learning problems.

  • Asking about use cases

Although GPT-3's training data only extends to 2021, it can still help us frame ML use cases that are relevant today. Let's ask "What are some emerging machine learning use-cases in social media?":

completion = openai.Completion.create(engine="text-davinci-003", prompt="What are some emerging machine learning use-cases in social media?", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])
# output
1. Automated Content Curation: Automatically curating content from social media platforms to create personalized content feeds for users.
2. Sentiment Analysis: Analyzing user sentiment from social media posts to gain insights into customer opinions and preferences.
3. Social Media Monitoring: Using machine learning algorithms to monitor social media conversations and detect potential issues or trends.
4. Social Media Advertising: Leveraging machine learning to optimize social media advertising campaigns and target the right audience.
5. Social Media Recommendations: Using machine learning to recommend content to users based on their interests and preferences.

We can also look at its response for the healthcare domain:

completion = openai.Completion.create(engine="text-davinci-003", prompt="What are some emerging machine learning use-cases in healthcare?", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])
# output
1. Automated Diagnosis: Machine learning algorithms can be used to analyze patient data and medical images to detect and diagnose diseases.
2. Personalized Medicine: Machine learning algorithms can be used to analyze patient data and medical images to create personalized treatment plans for each patient.
3. Drug Discovery: Machine learning algorithms can be used to analyze large datasets of chemical compounds to identify potential new drugs.
4. Clinical Decision Support: Machine learning algorithms can be used to analyze patient data and medical images to provide clinicians with real-time decision support.
5. Predictive Analytics: Machine learning algorithms can be used to analyze patient data and medical images to predict future health outcomes.

  • Asking about research questions

Now let's see whether it can propose some research questions. Ask: "What are some good research questions on using deep learning for image detection?"

completion = openai.Completion.create(engine="text-davinci-003", prompt="What are some good research questions on using deep learning for image detection?", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])
# output
1. What are the most effective deep learning architectures for image detection?
2. How can deep learning be used to improve the accuracy of image detection?
3. What are the most effective methods for training deep learning models for image detection?
4. How can deep learning be used to detect objects in images with varying levels of complexity?
5. How can deep learning be used to detect objects in images with varying levels of illumination?
6. How can deep learning be used to detect objects in images with varying levels of noise?
7. How can deep learning be used to detect objects in images with varying levels of resolution?
8. How can deep learning be used to detect objects in images with varying levels of occlusion?
9. How can deep learning be used to detect objects in images with varying levels of background clutter?
10. How can deep learning be used to detect objects in images with varying levels of rotation?

Let's also ask about NLP transformer models:

completion = openai.Completion.create(engine="text-davinci-003", prompt="What are some good research questions related to NLP transformer models?", max_tokens=1000, temperature=0)
print(completion.choices[0]['text'])
# output
1. How can transformer models be used to improve the accuracy of natural language processing tasks?
2. What are the most effective methods for training transformer models for natural language processing tasks?
3. How can transformer models be used to improve the efficiency of natural language processing tasks?
4. What are the most effective methods for optimizing transformer models for natural language processing tasks?
5. How can transformer models be used to improve the interpretability of natural language processing tasks?
6. What are the most effective methods for deploying transformer models for natural language processing tasks?
7. How can transformer models be used to improve the scalability of natural language processing tasks?
8. What are the most effective methods for combining transformer models with other natural language processing techniques?
9. How can transformer models be used to improve the robustness of natural language processing tasks?
10. What are the most effective methods for evaluating transformer models for natural language processing tasks?

All of the code in this article is published on GitHub.

That concludes this detailed walkthrough of calling the GPT-3 API from Python. For more material on the topic, please see the other related articles on 腳本之家!
