
SE block

0. Preface

This paper strengthens the relationships between channels by inserting SE blocks into a neural network, which improves performance. In principle an SE block can be added to any network for any task, and this method won first place in the ImageNet 2017 classification challenge. Very impressive.

The paper is clear and easy to follow, explained in great detail (with a fair amount of filler too), and it really works, so let's read through it.

1. Introduction

  • SE block: Squeeze-and-Excitation block

Notation: $F_{tr}: X \to U,\ X \in R^{H' \times W' \times C'},\ U \in R^{H \times W \times C}$

In lower layers, the SE block tends to excite features that are shared across different classes, while in higher layers its excitations become increasingly class-specific.

3. Squeeze and Excitation Blocks

3.1 Squeeze: Global Information Embedding

The squeeze step compresses each channel's spatial information into a single descriptor with global average pooling:

$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j)$

where $z \in R^C$.
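As a minimal PyTorch sketch (the tensor names here are illustrative, not from the paper's code), the squeeze step is just a spatial mean per channel:

import torch

# x plays the role of U, with shape (batch, C, H, W)
x = torch.randn(2, 64, 32, 32)

# Global average pooling over the spatial dimensions gives z with shape (batch, C)
z = x.mean(dim=(2, 3))
print(z.shape)  # torch.Size([2, 64])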

3.2 Excitation: Adaptive Recalibration

The excitation step gates the channels through a bottleneck of two FC layers:

$s = F_{ex}(z, W) = \sigma(W_2\,\delta(W_1 z))$

where $\delta$ is the ReLU function, $W_1 \in R^{\frac{C}{r} \times C}$ and $W_2 \in R^{C \times \frac{C}{r}}$, i.e. two FC layers with reduction ratio $r$. The resulting gates rescale the original feature map channel by channel: $\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$.
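A minimal PyTorch sketch of the excitation step (layer names are illustrative; they mirror the SELayer code in Section 6 rather than the authors' release):

import torch
from torch import nn

C, r = 64, 16
# Bottleneck: C -> C/r -> C, ReLU in between, Sigmoid gate at the end
excite = nn.Sequential(
    nn.Linear(C, C // r, bias=False),
    nn.ReLU(inplace=True),
    nn.Linear(C // r, C, bias=False),
    nn.Sigmoid(),
)

z = torch.randn(2, C)   # squeezed descriptors, shape (batch, C)
s = excite(z)           # per-channel gates in (0, 1), shape (batch, C)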

3.3 Instantiations

4. Experiments

The experiments focus mainly on classification-style settings and show that SE blocks improve a variety of architectures (ResNet, Inception, etc.) on datasets such as ImageNet and CIFAR-100.

5. Ablation Study

5.1 Reduction ratio

This ablation studies the reduction ratio $r$ in the excitation step. The authors settle on $r = 16$ as a good trade-off between accuracy and extra parameters.
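As a quick arithmetic example (illustrative numbers, not taken from the paper): with $C = 512$ channels and $r = 16$, the excitation bottleneck has $C / r = 32$ units, so the two bias-free FC layers add

$2 \times \frac{C^2}{r} = 2 \times 512 \times 32 = 32768$

extra parameters per SE block; a smaller $r$ widens the bottleneck and costs more parameters.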

5.2 Squeeze Operator

The authors compare only max pooling and average pooling as the squeeze operator; the two perform similarly (average pooling is the default).
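A minimal sketch of swapping the squeeze operator (a hypothetical variation on the SELayer code in Section 6, not the authors' released ablation code):

import torch
from torch import nn

x = torch.randn(2, 64, 32, 32)

avg_squeeze = nn.AdaptiveAvgPool2d(1)   # default squeeze: global average pooling
max_squeeze = nn.AdaptiveMaxPool2d(1)   # ablation variant: global max pooling

z_avg = avg_squeeze(x).flatten(1)       # shape (2, 64)
z_max = max_squeeze(x).flatten(1)       # shape (2, 64)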

5.3 Excitation Operator

The authors compare ReLU, Tanh, and Sigmoid for the excitation's gating non-linearity (the second activation, after the two FC layers); Sigmoid performs best.
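A sketch of where that second activation sits, with the gate made configurable (a hypothetical refactor for the ablation, not code from the paper):

from torch import nn

def excitation(channel, reduction=16, gate=nn.Sigmoid):
    # Two FC layers; `gate` is the second activation that the ablation varies
    return nn.Sequential(
        nn.Linear(channel, channel // reduction, bias=False),
        nn.ReLU(inplace=True),                    # first activation: always ReLU
        nn.Linear(channel // reduction, channel, bias=False),
        gate(),                                   # Sigmoid performs best here
    )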

5.4 Different stages

5.5 Integration strategy

6. Code

The code is quite simple:

# SE-ResNet building blocks: SE layer and SE basic block
from torch import nn


def conv3x3(in_planes, out_planes, stride=1):
    # 3x3 convolution with padding, as in torchvision's ResNet
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        # Squeeze: global average pooling down to one value per channel
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: bottleneck of two FC layers with a sigmoid gate
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)      # squeeze: (b, c, h, w) -> (b, c)
        y = self.fc(y).view(b, c, 1, 1)      # excitation: per-channel gates
        return x * y.expand_as(x)            # scale: recalibrate the input channels


class SEBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, 1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.se = SELayer(planes, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)   # SE recalibration before the residual addition

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
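A quick usage sketch (hypothetical shapes, just to check that the blocks run):

import torch

layer = SELayer(channel=64, reduction=16)
block = SEBasicBlock(inplanes=64, planes=64)

x = torch.randn(2, 64, 56, 56)
print(layer(x).shape)   # torch.Size([2, 64, 56, 56]) -- same shape, channels rescaled
print(block(x).shape)   # torch.Size([2, 64, 56, 56])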