
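Pitfalls with the Batch Normalization layer in PyTorch and how to resolve them

Updated: 2021-05-27 09:48:58   Author: 機器AI
This article describes pitfalls encountered with the Batch Normalization layer in PyTorch and how to resolve them. Hopefully it is a useful reference. If anything is wrong or incomplete, corrections are welcome.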

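1. Note the definition of momentum

The momentum smoothing in PyTorch's BN layer works the opposite way from the momentum commonly used in optimizers: the momentum value weights the new observation, not the accumulated history. The running estimate is updated as x̂_new = (1 − momentum) × x̂ + momentum × x_t, where x̂ is the running statistic and x_t is the statistic of the current batch. The default is momentum=0.1.

The expression computed by the BN layer is:

$$y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} \cdot \gamma + \beta$$

where γ and β are learnable parameters. In PyTorch, the BN layer class takes the following parameters: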

CLASS torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

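See the documentation for the meaning of each parameter. Note that affine determines whether the BN layer has learnable parameters γ and β. With affine=False there are no such parameters, which is equivalent to fixing them at the constants 1 and 0.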
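The momentum convention described in this section can be checked numerically. The following is a minimal sketch, assuming a small `BatchNorm1d` layer and a random batch (the shapes, the seed, and the variable names are illustrative and not part of the original article). It compares the running statistics after one forward pass with the update rule (1 − momentum) × old + momentum × batch statistic.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

momentum = 0.1
bn = nn.BatchNorm1d(3, momentum=momentum)  # running_mean starts at 0, running_var at 1
x = torch.randn(16, 3) * 2.0 + 5.0         # hypothetical batch: 16 samples, 3 features

bn.train()
bn(x)  # one forward pass in train mode updates the running statistics

batch_mean = x.mean(dim=0)
batch_var = x.var(dim=0, unbiased=True)    # running_var is updated with the unbiased variance

# The momentum weights the NEW batch statistic, not the accumulated history
expected_mean = (1 - momentum) * torch.zeros(3) + momentum * batch_mean
expected_var = (1 - momentum) * torch.ones(3) + momentum * batch_var

mean_matches = torch.allclose(bn.running_mean, expected_mean, atol=1e-6)
var_matches = torch.allclose(bn.running_var, expected_var, atol=1e-5)
print(mean_matches, var_matches)
```

With momentum=0.1, a single batch moves the running estimate only 10% of the way toward the batch statistic, which is why the running statistics need many iterations to settle.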

2. Note that the BN layer holds statistics, namely the mean and variance

track_running_stats – a boolean value that when set to True, this module tracks the running mean and variance, and when set to False, this module does not track such statistics and always uses batch statistics in both training and eval modes. Default: True

During training, after model.train(), the BN statistics (mean and variance) used for normalization are estimated from the current batch.

At test time, after model.eval(), if track_running_stats=True the model uses the running statistics, i.e. the values accumulated so far through the exponential-decay rule. Otherwise it still uses estimates computed from the current batch.
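The difference between the two modes can be observed directly. This is a minimal sketch, assuming two small `BatchNorm1d` layers and a random batch whose mean is deliberately far from zero (all names and shapes are illustrative). It checks which statistics each layer normalizes with by looking at the mean of the output.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.randn(32, 4) * 3.0 + 10.0  # hypothetical batch, far from zero mean / unit variance

tracked = nn.BatchNorm1d(4, affine=False)  # track_running_stats=True by default
untracked = nn.BatchNorm1d(4, affine=False, track_running_stats=False)

# Train mode: normalization uses the statistics of the current batch,
# so the output has (almost exactly) zero mean per feature.
tracked.train()
train_mean_abs = tracked(x).mean(dim=0).abs().max().item()

# Eval mode with running statistics: after a single update the running mean
# has moved only a little toward the batch mean, so the output is not centered.
tracked.eval()
eval_tracked_mean_abs = tracked(x).mean(dim=0).abs().max().item()

# Eval mode without running statistics: batch statistics are still used.
untracked.eval()
eval_untracked_mean_abs = untracked(x).mean(dim=0).abs().max().item()

print(train_mean_abs, eval_tracked_mean_abs, eval_untracked_mean_abs)
print(untracked.running_mean)  # None: nothing is tracked
```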

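3. Updating the BN layer's statistics

The statistics are updated automatically inside forward() whenever the layer is in training mode (after model.train()). They are not updated during gradient computation and backpropagation, nor in optim.step().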
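A minimal sketch of this behaviour, assuming a standalone `BatchNorm1d` layer and a random input (names and shapes are illustrative): the forward pass runs under `torch.no_grad()` with no backward pass and no optimizer at all, and the running statistics still change in train mode.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(2)
x = torch.randn(8, 2) + 3.0  # hypothetical batch

before = bn.running_mean.clone()

bn.train()
with torch.no_grad():  # no autograd graph, no backward(), no optimizer
    bn(x)
after_train_forward = bn.running_mean.clone()

bn.eval()
bn(x)  # a forward pass in eval mode leaves the statistics untouched
after_eval_forward = bn.running_mean.clone()

changed_in_train = not torch.equal(before, after_train_forward)
unchanged_in_eval = torch.equal(after_train_forward, after_eval_forward)
num_batches = int(bn.num_batches_tracked)  # counts train-mode forward passes
print(changed_in_train, unchanged_in_eval, num_batches)
```

One practical consequence is that running validation data through a model that is still in train mode silently contaminates the running statistics, even when no gradients are computed.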

4. Freezing BN and its statistics

From the analysis above, the correct way to freeze BN is to single out the BN layers while the model is training and set them back to eval mode (overriding the training state after model.train() has been called).

Solution:

You should use apply instead of searching its children, because named_children() does not recursively search submodules.

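def set_bn_eval(m):
    # Match every BatchNorm variant by class name and put it into eval mode
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval()

# apply() visits every submodule recursively; call it after model.train()
model.apply(set_bn_eval)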

Alternatively, override the train() method of the module:

def train(self, mode=True):
    """
    Override the default train() to freeze the BN statistics
    (and, optionally, the BN affine parameters).
    """
    super(MyNet, self).train(mode)
    if self.freeze_bn:
        print("Freezing Mean/Var of BatchNorm2D.")
        if self.freeze_bn_affine:
            print("Freezing Weight/Bias of BatchNorm2D.")
        for m in self.backbone.modules():
            if isinstance(m, nn.BatchNorm2d):
                m.eval()  # stop updating running_mean / running_var
                if self.freeze_bn_affine:
                    m.weight.requires_grad = False
                    m.bias.requires_grad = False
    return self  # nn.Module.train() returns self, so keep that contract
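To see the freezing approach from this section in action, here is a minimal sketch on a toy model. The model, the optimizer settings, and the input are assumptions made for illustration, and the helper uses `isinstance` instead of the class-name search. It runs one full training step and checks that the BN running statistics did not move, including those of a BN layer nested inside a submodule.

```python
import torch
import torch.nn as nn

def set_bn_eval(m):
    # Variant of the helper above, using isinstance instead of a class-name search
    if isinstance(m, nn.modules.batchnorm._BatchNorm):
        m.eval()

torch.manual_seed(0)
model = nn.Sequential(  # hypothetical toy model
    nn.Conv2d(3, 4, kernel_size=3, padding=1),
    nn.BatchNorm2d(4),
    nn.ReLU(),
    nn.Sequential(  # nested submodule: reached by apply(), not by children()
        nn.Conv2d(4, 4, kernel_size=3, padding=1),
        nn.BatchNorm2d(4),
    ),
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(2, 3, 8, 8)

bn_layers = [m for m in model.modules() if isinstance(m, nn.BatchNorm2d)]
stats_before = [m.running_mean.clone() for m in bn_layers]

model.train()             # resets every submodule, BN included, to training mode
model.apply(set_bn_eval)  # so the BN layers have to be switched back afterwards

loss = model(x).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()

stats_frozen = all(torch.equal(a, m.running_mean) for a, m in zip(stats_before, bn_layers))
bn_modes = [m.training for m in bn_layers]
conv_mode = model[0].training
print(stats_frozen, bn_modes, conv_mode)
```

Note that this only freezes the running statistics. The affine parameters γ and β are still updated by the optimizer unless their requires_grad is also set to False, as in the train() override above.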

5. Fixing/freezing Batch Norm during training may lead to RuntimeError: expected scalar type Half but found Float

Solution (the code below reproduces the error; the fix follows it):

import torch
from torchvision import models
from apex.fp16_utils import network_to_half  # requires NVIDIA apex

def fix_bn(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval()

model = models.resnet50(pretrained=True)
model.cuda()
model = network_to_half(model)
model.train()
model.apply(fix_bn)  # switch the batchnorm layers to eval mode

# torch.autograd.Variable is deprecated; a plain tensor is enough
inputs = torch.randn(8, 3, 224, 224).cuda().half()
output = model(inputs)
output_mean = torch.mean(output)
output_mean.backward()

Please do the following instead:

def fix_bn(m):
    classname = m.__class__.__name__
    if classname.find('BatchNorm') != -1:
        m.eval().half()

The reason is that, for regular training, it is better (performance-wise) to use cudnn batch norm, which requires its weights to be in fp32, so batch norm modules are not converted to half in network_to_half. However, cudnn does not support batchnorm backward in eval mode, which is what you are doing, and to use the PyTorch implementation for this, the weights have to be of the same type as the inputs.

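Supplement: deep learning summary: things to note when using dropout and Batch Normalization in PyTorch, and when using dropout and BN in TensorFlow

Things to note when using dropout and BN in PyTorch

Dropout in PyTorch:

Dropout is applied during training and is not applied during testing.

In PyTorch, net.eval() switches the whole network to inference behaviour: forward-pass state is no longer updated, dropout is disabled, and the BN statistics are fixed. It does not freeze the learnable weights. In principle, net.eval() should be used for every validation set.

net.train() switches the network back to training mode, which re-enables dropout and the BN statistics updates. It does not by itself control whether gradients are computed.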

net_dropped = torch.nn.Sequential(
    torch.nn.Linear(1, N_HIDDEN),
    torch.nn.Dropout(0.5),  # drop 50% of the neurons
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN, N_HIDDEN),
    torch.nn.Dropout(0.5),  # drop 50% of the neurons
    torch.nn.ReLU(),
    torch.nn.Linear(N_HIDDEN, 1),
)
for t in range(500):
    pred_drop = net_dropped(x)
    loss_drop = loss_func(pred_drop, y)
    optimizer_drop.zero_grad()
    loss_drop.backward()
    optimizer_drop.step()
    if t % 10 == 0:
        # switch to eval mode so that dropout is disabled
        net_dropped.eval()  # dropout behaves differently from train mode
        test_pred_drop = net_dropped(test_x)
        # switch back to train mode
        net_dropped.train()
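The effect of switching modes on a dropout layer can be isolated in a few lines. This is a minimal sketch, assuming a standalone `nn.Dropout` layer applied to a vector of ones (the size and the seed are illustrative). In train mode the surviving elements are scaled by 1/(1 − p), and in eval mode the layer is the identity.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1000)  # hypothetical input

drop.train()
y_train = drop(x)  # about half the entries are zeroed, the rest are scaled by 1 / (1 - p) = 2

drop.eval()
y_eval = drop(x)   # eval mode: dropout does nothing

train_values = sorted(set(y_train.tolist()))
eval_is_identity = torch.equal(y_eval, x)
print(train_values, eval_is_identity)
```

The scaling in train mode keeps the expected activation the same in both modes, which is why no rescaling is needed at test time.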

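Batch Normalization in PyTorch:

net.eval() puts the BN layers into inference mode, so moving_mean and moving_var (named running_mean and running_var in PyTorch) are used as they are and are no longer updated. If this is unclear, see the code excerpt and the figure below: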

# Excerpt from the network's constructor (i appears to be the layer index):
if self.do_bn:
    bn = nn.BatchNorm1d(10, momentum=0.5)
    setattr(self, 'bn%i' % i, bn)   # IMPORTANT: register the layer on the Module
    self.bns.append(bn)

# Excerpt from the training loop:
for epoch in range(EPOCH):
    print('Epoch: ', epoch)
    for net, l in zip(nets, losses):
        net.eval()              # eval mode: moving_mean and moving_var are fixed
        pred, layer_input, pre_act = net(test_x)
        net.train()             # train mode: moving_mean and moving_var are updated again
    plot_histogram(*layer_inputs, *pre_acts)

[Figure: moving_mean and moving_var]

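Things to note when using dropout and BN in TensorFlow

Both dropout and BN take a training argument that states whether the layer is running in training or test mode. In test mode, dropout drops nothing and BN uses its fixed moving statistics.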

tf_is_training = tf.placeholder(tf.bool, None)  # to control dropout when training and testing
# dropout net
d1 = tf.layers.dense(tf_x, N_HIDDEN, tf.nn.relu)
d1 = tf.layers.dropout(d1, rate=0.5, training=tf_is_training)   # drop out 50% of inputs
d2 = tf.layers.dense(d1, N_HIDDEN, tf.nn.relu)
d2 = tf.layers.dropout(d2, rate=0.5, training=tf_is_training)   # drop out 50% of inputs
d_out = tf.layers.dense(d2, 1)
for t in range(500):
    sess.run([o_train, d_train], {tf_x: x, tf_y: y, tf_is_training: True})  # train, set is_training=True
    if t % 10 == 0:
        # plotting
        plt.cla()
        o_loss_, d_loss_, o_out_, d_out_ = sess.run(
            [o_loss, d_loss, o_out, d_out], {tf_x: test_x, tf_y: test_y, tf_is_training: False} # test, set is_training=False
        )
# batch normalization (excerpt from a class method)
    def add_layer(self, x, out_size, ac=None):
        x = tf.layers.dense(x, out_size, kernel_initializer=self.w_init, bias_initializer=B_INIT)
        self.pre_activation.append(x)
        # the momentum plays an important role: the default 0.99 is too high in this case!
        if self.is_bn: x = tf.layers.batch_normalization(x, momentum=0.4, training=tf_is_train)    # when BN is enabled
        out = x if ac is None else ac(x)
        return out

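When BN's training argument is True, BN normalizes with the statistics of the current batch, but this does not mean that moving_mean and moving_var are updated automatically. The update is a separate op in the forward graph, and you must make sure it runs together with each training step. The ops that update moving_mean and moving_var are collected in tf.GraphKeys.UPDATE_OPS: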

        # !! IMPORTANT !! the moving_mean and moving_variance need to be updated,
        # pass the update_ops with control_dependencies to the train_op
        update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
        with tf.control_dependencies(update_ops):
            self.train = tf.train.AdamOptimizer(LR).minimize(self.loss)

The above is based on personal experience. I hope it serves as a useful reference, and I hope you will continue to support 腳本之家.
