
SE block

0. Preface

This paper strengthens the relationships between channels by inserting SE blocks into a neural network, which improves performance. In principle an SE block can be added to any network for any task, and this method won first place in the ImageNet 2017 classification challenge. Very impressive.

The paper is clear and easy to follow, explained in great detail (with a fair amount of filler too), and it really works, so let's read through it.

1. Introduction

  • SE block: Squeeze-and-Excitation block

Notation: $F_{tr}: X \to U,\ X \in R^{H' \times W' \times C'},\ U \in R^{H \times W \times C}$

In lower layers, the SE block tends to excite features that are shared across different classes, while in higher layers its excitations become increasingly class-specific.

3. Squeeze and Excitation Blocks

3.1 Squeeze: Global Information Embedding

The squeeze step compresses each channel's spatial information into a single descriptor with global average pooling:

$z_c = F_{sq}(u_c) = \frac{1}{H \times W}\sum_{i=1}^{H}\sum_{j=1}^{W} u_c(i, j)$

where $z \in R^C$.
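As a minimal PyTorch sketch (the tensor names here are illustrative, not from the paper's code), the squeeze step is just a spatial mean per channel:

import torch

# x plays the role of U, with shape (batch, C, H, W)
x = torch.randn(2, 64, 32, 32)

# Global average pooling over the spatial dimensions gives z with shape (batch, C)
z = x.mean(dim=(2, 3))
print(z.shape)  # torch.Size([2, 64])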

3.2 Excitation: Adaptive Recalibration

The excitation step gates the channels through a bottleneck of two FC layers:

$s = F_{ex}(z, W) = \sigma(W_2\,\delta(W_1 z))$

where $\delta$ is the ReLU function, $W_1 \in R^{\frac{C}{r} \times C}$ and $W_2 \in R^{C \times \frac{C}{r}}$, i.e. two FC layers with reduction ratio $r$. The resulting gates rescale the original feature map channel by channel: $\tilde{x}_c = F_{scale}(u_c, s_c) = s_c \cdot u_c$.
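A minimal PyTorch sketch of the excitation step (layer names are illustrative; they mirror the SELayer code in Section 6 rather than the authors' release):

import torch
from torch import nn

C, r = 64, 16
# Bottleneck: C -> C/r -> C, ReLU in between, Sigmoid gate at the end
excite = nn.Sequential(
    nn.Linear(C, C // r, bias=False),
    nn.ReLU(inplace=True),
    nn.Linear(C // r, C, bias=False),
    nn.Sigmoid(),
)

z = torch.randn(2, C)   # squeezed descriptors, shape (batch, C)
s = excite(z)           # per-channel gates in (0, 1), shape (batch, C)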

3.3 Instantiations

4. Experiments

The experiments focus mainly on classification-style settings and show that SE blocks improve a variety of architectures (ResNet, Inception, etc.) on datasets such as ImageNet and CIFAR-100.

5. Ablation Study

5.1 Reduction ratio

This ablation studies the reduction ratio $r$ in the excitation step. The authors settle on $r = 16$ as a good trade-off between accuracy and extra parameters.
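As a quick arithmetic example (illustrative numbers, not taken from the paper): with $C = 512$ channels and $r = 16$, the excitation bottleneck has $C / r = 32$ units, so the two bias-free FC layers add

$2 \times \frac{C^2}{r} = 2 \times 512 \times 32 = 32768$

extra parameters per SE block; a smaller $r$ widens the bottleneck and costs more parameters.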

5.2 Squeeze Operator

The authors compare only max pooling and average pooling as the squeeze operator; the two perform similarly (average pooling is the default).
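A minimal sketch of swapping the squeeze operator (a hypothetical variation on the SELayer code in Section 6, not the authors' released ablation code):

import torch
from torch import nn

x = torch.randn(2, 64, 32, 32)

avg_squeeze = nn.AdaptiveAvgPool2d(1)   # default squeeze: global average pooling
max_squeeze = nn.AdaptiveMaxPool2d(1)   # ablation variant: global max pooling

z_avg = avg_squeeze(x).flatten(1)       # shape (2, 64)
z_max = max_squeeze(x).flatten(1)       # shape (2, 64)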

5.3 Excitation Operator

The authors compare ReLU, Tanh, and Sigmoid for the excitation's gating non-linearity (the second activation, after the two FC layers); Sigmoid performs best.
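A sketch of where that second activation sits, with the gate made configurable (a hypothetical refactor for the ablation, not code from the paper):

from torch import nn

def excitation(channel, reduction=16, gate=nn.Sigmoid):
    # Two FC layers; `gate` is the second activation that the ablation varies
    return nn.Sequential(
        nn.Linear(channel, channel // reduction, bias=False),
        nn.ReLU(inplace=True),                    # first activation: always ReLU
        nn.Linear(channel // reduction, channel, bias=False),
        gate(),                                   # Sigmoid performs best here
    )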

5.4 Different stages

5.5 Integration strategy

6. Code

The code is quite simple:

# SE-ResNet building blocks: SE layer and SE basic block
from torch import nn


def conv3x3(in_planes, out_planes, stride=1):
    # 3x3 convolution with padding, as in torchvision's ResNet
    return nn.Conv2d(in_planes, out_planes, kernel_size=3, stride=stride,
                     padding=1, bias=False)


class SELayer(nn.Module):
    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        # Squeeze: global average pooling down to one value per channel
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: bottleneck of two FC layers with a sigmoid gate
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)      # squeeze: (b, c, h, w) -> (b, c)
        y = self.fc(y).view(b, c, 1, 1)      # excitation: per-channel gates
        return x * y.expand_as(x)            # scale: recalibrate the input channels


class SEBasicBlock(nn.Module):
    expansion = 1

    def __init__(self, inplanes, planes, stride=1, downsample=None, reduction=16):
        super(SEBasicBlock, self).__init__()
        self.conv1 = conv3x3(inplanes, planes, stride)
        self.bn1 = nn.BatchNorm2d(planes)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = conv3x3(planes, planes, 1)
        self.bn2 = nn.BatchNorm2d(planes)
        self.se = SELayer(planes, reduction)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.se(out)   # SE recalibration before the residual addition

        if self.downsample is not None:
            residual = self.downsample(x)

        out += residual
        out = self.relu(out)

        return out
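A quick usage sketch (hypothetical shapes, just to check that the blocks run):

import torch

layer = SELayer(channel=64, reduction=16)
block = SEBasicBlock(inplanes=64, planes=64)

x = torch.randn(2, 64, 56, 56)
print(layer(x).shape)   # torch.Size([2, 64, 56, 56]) -- same shape, channels rescaled
print(block(x).shape)   # torch.Size([2, 64, 56, 56])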