How to Create Your Own Dataset in PyTorch

Updated: 2022-11-28 15:06:09  Author: ZQ_ZHU

This article explains how to create your own dataset in PyTorch. Hopefully it serves as a useful reference; corrections and suggestions are welcome.

Creating your own dataset in PyTorch

Image files in a single folder

The idea is to subclass torch.utils.data.Dataset and override its __getitem__ method (along with __len__). Example code:

import glob

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset


class ImageFolder(Dataset):
    def __init__(self, folder_path):
        # Collect every file in the folder, sorted for a stable order
        self.files = sorted(glob.glob('%s/*.*' % folder_path))

    def __getitem__(self, index):
        path = self.files[index % len(self.files)]
        img = np.array(Image.open(path).convert('RGB'))  # (H, W, 3)

        # Edge-pad height by 40 and width by 4 on each side; the reshape below
        # needs a padded size of 1280 x 1408, i.e. 1200 x 1400 input images
        pad = ((40, 40), (4, 4), (0, 0))
        # img = np.pad(img, pad, 'constant', constant_values=0) / 255
        img = np.pad(img, pad, mode='edge') / 255.0

        # Convert to a channel-first float tensor: (3, 1280, 1408)
        img = torch.from_numpy(img).permute(2, 0, 1).float()
        # Split into a 10 x 11 grid of 128 x 128 patches: (3, 10, 11, 128, 128)
        patches = img.reshape(3, 10, 128, 11, 128).permute(0, 1, 3, 2, 4)

        return img, patches, path

    def __len__(self):
        return len(self.files)
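The reshape-and-transpose step is easy to get wrong, so it can help to check it on a small synthetic array first. The sketch below is not part of the original code: it assumes a channel-first array, uses a hypothetical helper named `split_into_patches`, and works on a tiny 2 x 3 grid of 4 x 4 patches instead of the 10 x 11 grid of 128 x 128 patches used above.

```python
import numpy as np


def split_into_patches(img, patch):
    """Split a (C, H, W) array into a (C, rows, cols, patch, patch) grid.

    Hypothetical helper for illustration; H and W must be multiples of `patch`.
    """
    c, h, w = img.shape
    rows, cols = h // patch, w // patch
    # (C, H, W) -> (C, rows, patch, cols, patch)
    grid = img.reshape(c, rows, patch, cols, patch)
    # Move the two patch axes to the end: (C, rows, cols, patch, patch)
    return grid.transpose(0, 1, 3, 2, 4)


# Synthetic 3-channel "image": 2 rows x 3 columns of 4x4 patches.
img = np.arange(3 * 8 * 12, dtype=np.float32).reshape(3, 8, 12)
patches = split_into_patches(img, patch=4)

# The patch at grid position (row=1, col=2) should equal the matching slice.
expected = img[:, 4:8, 8:12]
print(patches.shape)
print(np.array_equal(patches[:, 1, 2], expected))
```

The array has to be channel-first before it is reshaped. Reshaping an (H, W, C) array straight into this shape runs without error but mixes pixels from unrelated positions.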

Image files in different folders

Suppose the data is laid out as follows:

data
├── train
│   ├── 0.jpg
│   └── 1.jpg
├── test
│   ├── 0.jpg
│   └── 1.jpg
└── val
    ├── 1.jpg
    └── 2.jpg

In this case the code above needs only one small change, to the line that collects the file list:

self.files = sorted(glob.glob('%s/**/*.*' % folder_path, recursive=True))

The rest of the code stays the same.
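The difference between the two glob patterns can be seen without any real images. The following sketch recreates the layout above in a temporary directory, with empty placeholder files standing in for the images, and compares what each pattern returns.

```python
import glob
import os
import tempfile

# Placeholder layout mirroring the tree above (the files are empty, not real images).
layout = {
    "train": ["0.jpg", "1.jpg"],
    "test": ["0.jpg", "1.jpg"],
    "val": ["1.jpg", "2.jpg"],
}

with tempfile.TemporaryDirectory() as root:
    for split, names in layout.items():
        os.makedirs(os.path.join(root, split))
        for name in names:
            open(os.path.join(root, split, name), "wb").close()

    # Single-folder pattern: only looks at direct children of root.
    flat = sorted(glob.glob('%s/*.*' % root))
    # Recursive pattern: '**' also descends into subfolders.
    nested = sorted(glob.glob('%s/**/*.*' % root, recursive=True))
    relative = [os.path.relpath(p, root).replace(os.sep, "/") for p in nested]

print(len(flat))
print(relative)
```

With this layout the single-folder pattern finds nothing, because every image sits one level down, while the recursive pattern returns all six files.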

Using PyTorch's common datasets

Example code for using one of PyTorch's built-in datasets:

from torch.utils.tensorboard import SummaryWriter
from torchvision.transforms import Compose
from torchvision import transforms
import torchvision
import ssl

ssl._create_default_https_context = ssl._create_unverified_context

dataset_transform = Compose([transforms.ToTensor()])


# For the built-in datasets, the official PyTorch documentation is the main reference
train_set = torchvision.datasets.CIFAR10(root="./CIFAR10",train=True,transform=dataset_transform,download=True)
test_set = torchvision.datasets.CIFAR10(root="./CIFAR10",train=False,transform=dataset_transform,download=True)

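# Inspect the first sample in the test set
# print(test_set[0])
# Inspect the class names of the test set
# print(test_set.classes)
#
# Unpack the first sample into the image (img) and its class label (target)
# img,target = test_set[0]
# Inspect the image data and the label
# print(img)
# print(target)
# Print the class name
# print(test_set.classes[target])
# Show the image (only works on a PIL image, i.e. without the ToTensor transform)
# img.show()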

# Display tensor images with TensorBoard
writer = SummaryWriter("logs")
for i in range(10):
    # Get the image (img) and its class label (target)
    img, target = test_set[i]
    writer.add_image("test_set", img, i)

writer.close()

The result of running the code above, visualized in TensorBoard:

Code:

train_set = torchvision.datasets.CIFAR10(root="./CIFAR10",train=True,transform=dataset_transform,download=True)

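Common parameters explained

root: the root directory where the dataset is stored
train: if True, the training set is loaded; if False, the test set is loaded
transform: the transform applied to each input image
download: if True, the dataset is downloaded into the directory given by root; otherwise nothing is downloaded

The official documentation describes the parameters as follows: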

root (string) – Root directory of dataset where directory cifar-10-batches-py exists or will be saved to if download is set to True.

train (bool, optional) – If True, creates dataset from training set, otherwise creates from test set.

transform (callable, optional) – A function/transform that takes in an PIL image and returns a transformed version. E.g, transforms.RandomCrop

target_transform (callable, optional) – A function/transform that takes in the target and transforms it.

download (bool, optional) – If true, downloads the dataset from the internet and puts it in root directory. If dataset is already downloaded, it is not downloaded again.
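To make the roles of transform and target_transform concrete, here is a simplified sketch of how a dataset of this kind typically applies the two callables when an item is fetched. `TinyDataset` and its sample data are hypothetical stand-ins for CIFAR10, and this is not torchvision's actual source.

```python
class TinyDataset:
    """Hypothetical stand-in for a torchvision-style dataset."""

    def __init__(self, samples, transform=None, target_transform=None):
        self.samples = samples  # list of (data, target) pairs
        self.transform = transform
        self.target_transform = target_transform

    def __getitem__(self, index):
        data, target = self.samples[index]
        # transform is applied to the input only
        if self.transform is not None:
            data = self.transform(data)
        # target_transform is applied to the label only
        if self.target_transform is not None:
            target = self.target_transform(target)
        return data, target

    def __len__(self):
        return len(self.samples)


classes = ["airplane", "automobile"]
samples = [([0, 255], 0), ([128, 64], 1)]  # made-up pixel values and labels

dataset = TinyDataset(
    samples,
    transform=lambda pixels: [p / 255 for p in pixels],  # scale to [0, 1]
    target_transform=lambda t: classes[t],               # index -> class name
)

data, target = dataset[0]
print(data, target)
```

Both callables run lazily, each time an item is indexed, so the stored samples themselves are never modified.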

Note: