
Grad-CAM

Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization

Gradient-weighted Class Activation Mapping

Ramprasaath R. Selvaraju, Michael Cogswell, Abhishek Das, Ramakrishna Vedantam, Devi Parikh, Dhruv Batra

code-torch: https://github.com/ramprs/grad-cam/

code-pytorch: https://github.com/jacobgil/pytorch-grad-cam

code-keras: https://github.com/jacobgil/keras-grad-cam


1. Notes

I recently came across Grad-CAM while reading the person re-ID paper Re-Identification with Consistent Attentive Siamese Networks, so this post is a quick study of Grad-CAM rather than a deep dive.

2018-12-18

While using this post I found it hard to take in at a glance, so I have reorganized it following the reference links.

2018-12-20

Re-reading the person re-ID paper, I found that the network architecture it builds on is GAIN, so I have added a description of GAIN and its code. The focus of this post is now Grad-CAM and GAIN.

2. Background

A number of methods have been developed for interpreting and visualizing deep models, including but not limited to Deconvolution, Guided-Backpropagation, CAM, and Grad-CAM.

Deconvolution and Guided-Backpropagation produce fine-grained visualizations, while CAM and Grad-CAM produce class-discriminative heatmaps.

For implementations and sample outputs of the various visualization methods, see: https://github.com/utkuozbulak/pytorch-cnn-visualizations

Reference: https://blog.csdn.net/geek_wh2016/article/details/81060315

2.1 Deconvolution

Deconvolution: Visualizing and Understanding Convolutional Networks

code: https://github.com/kvfrans/feature-visualization

Overview: this paper is the seminal work on CNN visualization (published in 2013 by Matthew Zeiler, a star student of LeCun). It mainly addresses two questions:

  1. Why do CNNs perform so well?
  2. How might CNNs be improved?

Implementation: for a CNN, visualization runs the forward pass in reverse, i.e., Unpooling + ReLU + Deconv:

  1. Unpooling: record the location of each max-pooling maximum (the "switches" table); when unpooling, place the max value back at its recorded location and fill the rest with zeros. (A minimal sketch of switch-based unpooling follows this list.)
  2. ReLU: apply ReLU again.
  3. Deconv: convolve the features with the transpose of the corresponding forward kernels.
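
To make the switch mechanism concrete, here is a minimal NumPy sketch of max-pooling with recorded switches and the matching unpooling (function names are illustrative, not from the paper's code):

import numpy as np

def max_pool_with_switches(x, k=2):
    # k x k max-pool that also records each window's argmax (the "switch")
    h, w = x.shape
    out = np.zeros((h // k, w // k), dtype=x.dtype)
    switches = np.zeros_like(out, dtype=np.int64)  # flat index within each window
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            switches[i, j] = window.argmax()
            out[i, j] = window.max()
    return out, switches

def unpool_with_switches(y, switches, k=2):
    # place each value back at its recorded location; zeros everywhere else
    h, w = y.shape
    out = np.zeros((h * k, w * k), dtype=y.dtype)
    for i in range(h):
        for j in range(w):
            di, dj = divmod(int(switches[i, j]), k)
            out[i*k + di, j*k + dj] = y[i, j]
    return out

x = np.arange(16, dtype=np.float32).reshape(4, 4)
y, sw = max_pool_with_switches(x)
print(unpool_with_switches(y, sw))  # maxima restored in place, zeros elsewhere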


Figure: structure of the Deconvolution visualization.

2.2 Guided-Backpropagation

Guided-Backpropagation: Striving for Simplicity: The All Convolutional Net

Vanilla backpropagation, deconvnet, and guided backpropagation are all backward passes; they differ only in how the gradient is treated when passing through a ReLU layer. The paper explains this in detail.

With $f^l$ the forward activation entering a ReLU and $R^{l+1}$ the signal arriving from the layer above, the three rules are:

- Backpropagation: $R^l = (f^l > 0) \cdot R^{l+1}$
- Deconvnet: $R^l = (R^{l+1} > 0) \cdot R^{l+1}$
- Guided backpropagation: $R^l = (f^l > 0) \cdot (R^{l+1} > 0) \cdot R^{l+1}$
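
A quick NumPy illustration of the three rules on toy values:

import numpy as np

f = np.array([-1.0, 2.0, 3.0, -4.0])   # forward input to the ReLU
R = np.array([ 0.5, -1.0, 2.0,  3.0])  # signal arriving from the layer above

backprop = (f > 0) * R                  # mask by the forward activation
deconv = (R > 0) * R                    # mask by the sign of the incoming signal
guided = (f > 0) * (R > 0) * R          # mask by both

print(backprop)  # [ 0. -1.  2.  0.]
print(deconv)    # [0.5 0.  2.  3. ]
print(guided)    # [0. 0. 2. 0.]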

The paper also proposes replacing pooling with strided convolutions and studies the effectiveness of this all-convolutional structure.

The results look as follows:

Figure: outputs of the three backpropagation variants.

As the figure shows, Guided-Backpropagation extracts the features that matter for classification, but the result is not class-discriminative.

2.3 CAM

CAM: Learning Deep Features for Discriminative Localization

Overview: the paper revisits the effectiveness of global average pooling (GAP) and details how GAP gives a CNN strong object-localization ability.
Idea: drop the fully connected layers and use GAP instead.
Implementation: see the figures below; a minimal sketch follows after them.

Figure: CAM network structure.

Figure: CAM results.
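
A minimal NumPy sketch of the idea (names are illustrative, not from the paper's code): after GAP and a single linear layer, the class-c heatmap is just the classifier-weighted sum of the final feature maps:

import numpy as np

def compute_cam(A, W_fc, c):
    # A: final conv feature maps (K, H, W); W_fc: classifier weights (n_class, K)
    cam = np.tensordot(W_fc[c], A, axes=1)                    # weighted sum -> (H, W)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)  # normalize for display
    return cam                                                # upsample before overlaying

A = np.random.rand(512, 14, 14).astype(np.float32)
W_fc = np.random.rand(20, 512).astype(np.float32)
heatmap = compute_cam(A, W_fc, c=3)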

2.4 Grad-CAM

Grad-CAM is an improved version of CAM. The difference is that Grad-CAM obtains the feature-map weights by backpropagation, whereas CAM takes them directly from the classifier's weights.

Grad-CAM can be attached to any architecture without modifying it, while CAM requires the network to use GAP.

A detailed description follows below.

2.5 GAIN

GAIN: Tell Me Where to Look: Guided Attention Inference Network

code: https://github.com/alokwhitewolf/Guided-Attention-Inference-Network

GAIN builds on Grad-CAM. Grad-CAM can only visualize and explain what an already-trained network attends to; it cannot steer the network. GAIN turns the attention map into a training signal, guiding the network to correct its mistakes and attend to the right locations.

Problem: when recognizing boats, the network attends to the water surface rather than the boats.

Figure: misplaced attention.

Implementation: the attention map is used to mask the object out of the image, and the network is trained to minimize the class scores of the masked image.

Figure: GAIN network architecture.

In the overall architecture there is only one network; the two streams share the same weights.

Loss function: the total loss is the classification loss plus the attention-mining loss, where the latter averages the masked image's scores over the $n$ ground-truth classes:

$$L_{self} = L_{cl} + \alpha L_{am}, \qquad L_{am} = \frac{1}{n} \sum_c S^c(I^{*c})$$

where $S^c(\cdot)$ is the class-$c$ score and $I^{*c}$ is the image with the class-$c$ attention region masked out.

Extension: if extra ground-truth supervision is available (e.g., segmentation masks), the network can be extended with an additional loss on the attention map.

Figure: extended network architecture.

2.5.1 GAIN-code

Step 1: train the classification network

FCN:

self.conv1_1 = L.Convolution2D(3, 64, 3, 1, 1)
self.conv1_2 = L.Convolution2D(64, 64, 3, 1, 1)

self.conv2_1 = L.Convolution2D(64, 128, 3, 1, 1)
self.conv2_2 = L.Convolution2D(128, 128, 3, 1, 1)

self.conv3_1 = L.Convolution2D(128, 256, 3, 1, 1)
self.conv3_2 = L.Convolution2D(256, 256, 3, 1, 1)
self.conv3_3 = L.Convolution2D(256, 256, 3, 1, 1)

self.conv4_1 = L.Convolution2D(256, 512, 3, 1, 1)
self.conv4_2 = L.Convolution2D(512, 512, 3, 1, 1)
self.conv4_3 = L.Convolution2D(512, 512, 3, 1, 1)

self.conv5_1 = L.Convolution2D(512, 512, 3, 1, 1)
self.conv5_2 = L.Convolution2D(512, 512, 3, 1, 1)
self.conv5_3 = L.Convolution2D(512, 512, 3, 1, 1)

self.fc6 = L.Convolution2D(512, 4096, 7, 1, 0)
self.fc7 = L.Convolution2D(4096, 4096, 1, 1, 0)
self.score_fr = L.Convolution2D(4096, n_class, 1, 1, 0)
# Not shown in this excerpt: the FCN8s skip/upsampling layers used below
# (score_pool3, score_pool4, upscore2, upscore_pool4, upscore8).

def segment(self, x, t=None):
    # conv1
    self.conv1_1.pad = (100, 100)  # FCN trick: pad so fc6's 7x7 conv fits any input
    h = F.relu(self.conv1_1(x))
    conv1_1 = h
    h = F.relu(self.conv1_2(conv1_1))
    conv1_2 = h
    h = _max_pooling_2d(conv1_2)
    pool1 = h  # 1/2

    # conv2
    h = F.relu(self.conv2_1(pool1))
    conv2_1 = h
    h = F.relu(self.conv2_2(conv2_1))
    conv2_2 = h
    h = _max_pooling_2d(conv2_2)
    pool2 = h  # 1/4

    # conv3
    h = F.relu(self.conv3_1(pool2))
    conv3_1 = h
    h = F.relu(self.conv3_2(conv3_1))
    conv3_2 = h
    h = F.relu(self.conv3_3(conv3_2))
    conv3_3 = h
    h = _max_pooling_2d(conv3_3)
    pool3 = h  # 1/8

    # conv4
    h = F.relu(self.conv4_1(pool3))
    h = F.relu(self.conv4_2(h))
    h = F.relu(self.conv4_3(h))
    h = _max_pooling_2d(h)
    pool4 = h  # 1/16

    # conv5
    h = F.relu(self.conv5_1(pool4))
    h = F.relu(self.conv5_2(h))
    h = F.relu(self.conv5_3(h))
    h = _max_pooling_2d(h)
    pool5 = h  # 1/32

    # fc6
    h = F.relu(self.fc6(pool5))
    h = F.dropout(h, ratio=.5)
    fc6 = h  # 1/32

    # fc7
    h = F.relu(self.fc7(fc6))
    h = F.dropout(h, ratio=.5)
    fc7 = h  # 1/32

    # score_fr
    h = self.score_fr(fc7)
    score_fr = h  # 1/32

    # score_pool3
    h = self.score_pool3(pool3)
    score_pool3 = h  # 1/8

    # score_pool4
    h = self.score_pool4(pool4)
    score_pool4 = h  # 1/16

    # upscore2
    h = self.upscore2(score_fr)
    upscore2 = h  # 1/16

    # score_pool4c: crop score_pool4 to the upsampled size
    h = score_pool4[:, :,
                    5:5 + upscore2.shape[2],
                    5:5 + upscore2.shape[3]]
    score_pool4c = h  # 1/16

    # fuse_pool4
    h = upscore2 + score_pool4c
    fuse_pool4 = h  # 1/16

    # upscore_pool4
    h = self.upscore_pool4(fuse_pool4)
    upscore_pool4 = h  # 1/8

    # score_pool3c: crop score_pool3 to the upsampled size
    h = score_pool3[:, :,
                    9:9 + upscore_pool4.shape[2],
                    9:9 + upscore_pool4.shape[3]]
    score_pool3c = h  # 1/8

    # fuse_pool3
    h = upscore_pool4 + score_pool3c
    fuse_pool3 = h  # 1/8

    # upscore8
    h = self.upscore8(fuse_pool3)
    upscore8 = h  # 1/1

    # score: crop back to the input size
    h = upscore8[:, :, 31:31 + x.shape[2], 31:31 + x.shape[3]]
    score = h  # 1/1
    self.score = score

    if t is None:
        assert not chainer.config.train
        return

    loss = F.softmax_cross_entropy(score, t, normalize=True)
    if np.isnan(float(loss.data)):
        raise ValueError('Loss is nan.')
    chainer.report({'loss': loss}, self)
    self.conv1_1.pad = (1, 1)  # restore the normal padding
    return loss

FCN-v1.0: plain classification

self.conv1_1 = L.Convolution2D(3, 64, 3, 1, 1)
self.conv1_2 = L.Convolution2D(64, 64, 3, 1, 1)

self.conv2_1 = L.Convolution2D(64, 128, 3, 1, 1)
self.conv2_2 = L.Convolution2D(128, 128, 3, 1, 1)

self.conv3_1 = L.Convolution2D(128, 256, 3, 1, 1)
self.conv3_2 = L.Convolution2D(256, 256, 3, 1, 1)
self.conv3_3 = L.Convolution2D(256, 256, 3, 1, 1)

self.conv4_1 = L.Convolution2D(256, 512, 3, 1, 1)
self.conv4_2 = L.Convolution2D(512, 512, 3, 1, 1)
self.conv4_3 = L.Convolution2D(512, 512, 3, 1, 1)

self.conv5_1 = L.Convolution2D(512, 512, 3, 1, 1)
self.conv5_2 = L.Convolution2D(512, 512, 3, 1, 1)
self.conv5_3 = L.Convolution2D(512, 512, 3, 1, 1)

self.fc6_cl = L.Linear(512, 4096)
self.fc7_cl = L.Linear(4096, 4096)
self.score_cl = L.Linear(4096, n_class - 1)  # disregard class 0 for classification

self.final_conv_layer = 'conv5_3'
self.grad_target_layer = 'prob'
self.freezed_layers = ['fc6_cl', 'fc7_cl', 'score_cl']

def classify(self, x, is_training=True):
    with chainer.using_config('train', False):
        # conv1
        h = F.relu(self.conv1_1(x))
        h = F.relu(self.conv1_2(h))
        h = _max_pooling_2d(h)

        # conv2
        h = F.relu(self.conv2_1(h))
        h = F.relu(self.conv2_2(h))
        h = _max_pooling_2d(h)

        # conv3
        h = F.relu(self.conv3_1(h))
        h = F.relu(self.conv3_2(h))
        h = F.relu(self.conv3_3(h))
        h = _max_pooling_2d(h)

        # conv4
        h = F.relu(self.conv4_1(h))
        h = F.relu(self.conv4_2(h))
        h = F.relu(self.conv4_3(h))
        h = _max_pooling_2d(h)

        # conv5
        h = F.relu(self.conv5_1(h))
        h = F.relu(self.conv5_2(h))
        h = F.relu(self.conv5_3(h))
        h = _max_pooling_2d(h)
        h = _average_pooling_2d(h)

    with chainer.using_config('train', is_training):
        h = F.relu(F.dropout(self.fc6_cl(h), .5))
        h = F.relu(F.dropout(self.fc7_cl(h), .5))
        h = self.score_cl(h)
        # output: 1 x 20

    return h

loss:

# cl_output = classify(image)  -> cl_output: 1 x 20
# target: 1 x 20 with entries in {0, 1}; 1 means the image has that label, 0 means it does not.
# Multi-label loss: classes are not mutually exclusive, so each class is treated
# as an independent binary classification (sigmoid cross-entropy).
loss = F.sigmoid_cross_entropy(cl_output, target, normalize=True)
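
For instance, with the VOC-style ground truth `array([0, 14])` used in the `stream_cl` example further down, the 1 x 20 multi-hot `target` could be built like this (a sketch; variable names are illustrative):

import numpy as np

n_class = 20
gt_labels = np.array([0, 14])              # ground-truth class indices
target = np.zeros((1, n_class), dtype=np.int32)
target[0, gt_labels] = 1                   # 1 where a label is present, 0 elsewhere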

Step 2: train GAIN

self.GAIN_functions = collections.OrderedDict([
    ('conv1_1', [self.conv1_1, F.relu]),
    ('conv1_2', [self.conv1_2, F.relu]),
    ('pool1', [_max_pooling_2d]),

    ('conv2_1', [self.conv2_1, F.relu]),
    ('conv2_2', [self.conv2_2, F.relu]),
    ('pool2', [_max_pooling_2d]),

    ('conv3_1', [self.conv3_1, F.relu]),
    ('conv3_2', [self.conv3_2, F.relu]),
    ('conv3_3', [self.conv3_3, F.relu]),
    ('pool3', [_max_pooling_2d]),

    ('conv4_1', [self.conv4_1, F.relu]),
    ('conv4_2', [self.conv4_2, F.relu]),
    ('conv4_3', [self.conv4_3, F.relu]),
    ('pool4', [_max_pooling_2d]),

    ('conv5_1', [self.conv5_1, F.relu]),
    ('conv5_2', [self.conv5_2, F.relu]),
    ('conv5_3', [self.conv5_3, F.relu]),
    ('pool5', [_max_pooling_2d]),

    ('avg_pool', [_average_pooling_2d]),

    ('fc6_cl', [self.fc6_cl, F.relu]),
    ('fc7_cl', [self.fc7_cl, F.relu]),
    ('prob', [self.score_cl, F.sigmoid]),
])
self.final_conv_layer = 'conv5_3'
self.grad_target_layer = 'prob'
self.freezed_layers = ['fc6_cl', 'fc7_cl', 'score_cl']

Obtaining the mask from the classification result:

def stream_cl(self, inp, label=None):
    # inp: 1 x 3 x 281 x 500
    # label: ground-truth class indices, e.g. array([0, 14])
    # returns: gcam: attention mask resized to the input size;
    #          h: class scores, shape (1, 20); class_id: a single class index
    h = inp
    for key, funcs in self.GAIN_functions.items():
        for func in funcs:
            h = func(h)
        if key == self.final_conv_layer:
            activation = h
        if key == self.grad_target_layer:
            break

    gcam, class_id = self.get_gcam(h, activation, (inp.shape[-2], inp.shape[-1]), label=label)
    return gcam, h, class_id

def get_gcam(self, end_output, activations, shape, label):
    # end_output: class scores, shape 1 x 20
    # activations: final conv feature maps, shape 1 x 512 x 18 x 32
    # shape: target spatial size, e.g. (281, 500)
    # label: ground-truth class indices
    self.cleargrads()
    class_id = self.set_init_grad(end_output, label)
    end_output.backward(retain_grad=True)
    grad = activations.grad_var
    # global-average-pool the gradients -> one weight per channel (alpha_k^c)
    grad = F.average_pooling_2d(grad, (grad.shape[-2], grad.shape[-1]), 1)
    grad = F.expand_dims(F.reshape(grad, (grad.shape[0] * grad.shape[1], grad.shape[2], grad.shape[3])), 0)
    weights = activations
    weights = F.expand_dims(F.reshape(weights, (weights.shape[0] * weights.shape[1], weights.shape[2], weights.shape[3])), 0)
    # a 1x1 convolution of the feature maps with the pooled gradients computes the
    # weighted sum over channels; ReLU and resizing give the Grad-CAM map
    gcam = F.resize_images(F.relu(F.convolution_2d(weights, grad, None, 1, 0)), shape)
    return gcam, class_id

$L_{cl}$

gcam, cl_scores, class_id = self._optimizers['main'].target.stream_cl(image, gt_labels)
cl_loss = F.sigmoid_cross_entropy(cl_scores, target, normalize=True)

$L_{am}$

masked_output = self._optimizers['main'].target.stream_am(masked_image)
masked_output = F.sigmoid(masked_output)
am_loss = masked_output[0][class_id][0]

Note: $L_{cl}$ and $L_{am}$ are computed with fully shared network weights.
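
For reference, the `masked_image` passed to `stream_am` above comes from soft-masking the input with the thresholded attention map, $I^{*c} = I - T(A^c) \odot I$ with $T(A^c) = \mathrm{sigmoid}(\omega(A^c - \sigma))$. A minimal Chainer-style sketch (the helper name is illustrative; the paper uses $\omega = 10$, $\sigma = 0.5$ on an attention map normalized to [0, 1]):

import chainer.functions as F

def get_masked_image(image, gcam, sigma=0.5, omega=10):
    # gcam: attention map, same spatial size as image, normalized to [0, 1]
    mask = F.sigmoid(omega * (gcam - sigma))                   # T(A^c): soft threshold
    return image - image * F.broadcast_to(mask, image.shape)  # I^{*c}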

3. Introduction

A good visualization should be both high-resolution and class-discriminative.

Figures: example images.

4. Approach

CAM
In CAM, the final fully connected layer is replaced by GAP (see the CAM figure above), so the classification score can be written as

$$y^c = \sum_k w_k^c \cdot \frac{1}{Z} \sum_i \sum_j A_{ij}^k$$

where $y^c$ is the score for class $c$; $w_k^c$ is the contribution of the $k$-th feature map to class $c$, i.e. a weight of the final layer; $A_{ij}^k$ is the value of the $k$-th feature map at position $(i, j)$; and $Z = h \cdot w$ is the spatial size of a feature map.

The CAM output map is then

$$L_{CAM}^c = \sum_k w_k^c A^k$$

Grad-CAM
In Grad-CAM, the weights are obtained by backpropagation:

$$\alpha_k^c = \frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$$

and the Grad-CAM output map is

$$L_{Grad\text{-}CAM}^c = \mathrm{ReLU}\Big(\sum_k \alpha_k^c A^k\Big)$$

It can be shown that the Grad-CAM and CAM formulas are variants of the same expression: for a GAP network, $\alpha_k^c$ reduces to $w_k^c$ up to the constant $1/Z$.
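
As a sanity check on these formulas, here is a minimal PyTorch sketch of computing $\alpha_k^c$ and the Grad-CAM map with autograd (the shapes and the stand-in GAP classifier head are illustrative):

import torch

A = torch.rand(1, 512, 14, 14, requires_grad=True)  # final conv feature maps
head = torch.rand(512, 20)                          # stand-in classifier weights
y = A.mean(dim=(2, 3)) @ head                       # GAP then linear -> scores (1, 20)

c = int(y.argmax())                                 # explain the top-scoring class
y[0, c].backward()                                  # dy^c / dA

alpha = A.grad.mean(dim=(2, 3))                     # (1, 512): alpha_k^c, GAP of gradients
cam = torch.relu((alpha[:, :, None, None] * A.detach()).sum(dim=1))  # (1, 14, 14)

# For this GAP head, alpha equals head[:, c] / (14 * 14): the CAM weights up to 1/Z.
print(torch.allclose(alpha[0], head[:, c] / (14 * 14)))  # True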

Guided Grad-CAM
Guided Grad-CAM is the element-wise product of the Grad-CAM map and the Guided-Backpropagation map, yielding a class-discriminative, high-resolution visualization.

Figure: Guided Grad-CAM network architecture.

The authors also use it to analyze samples the CNN misclassifies.
Figure: Guided Grad-CAM on misclassified samples.

5. Code

Below we analyze the PyTorch implementation.

5.1 Grad-CAM

Computing Grad-CAM:

Backward pass: first compute the classification output for the image, `output` (say of size 1*5 for 5 classes); pick the highest-scoring class, say class 2, so one-hot = [0, 1, 0, 0, 0]; reduce `sum(one_hot * output)` to a single scalar; then backpropagate from that scalar.

cam: (H, W), values in (0, 1)

import cv2
import numpy as np
import torch
from torch.autograd import Variable


class FeatureExtractor(object):
    """Class for extracting activations and
    registering gradients from targeted intermediate layers.

    Usage:
        outputs, x = feature_extractor(x)
        gradients = feature_extractor.gradients
    """

    def __init__(self, model, target_layers):
        self.model = model
        self.target_layers = target_layers
        self.gradients = []

    def save_gradient(self, grad):
        self.gradients.append(grad)

    def __call__(self, x):
        """
        :param x: N*C*H*W, a picture
        :return: outputs: list, activations of the target layers (A in the equation)
                 x: feature map output of the model, N*C*H*W
        """
        outputs = []
        self.gradients = []
        for name, module in self.model._modules.items():
            x = module(x)
            if name in self.target_layers:
                x.register_hook(self.save_gradient)
                outputs += [x]
        return outputs, x


class ModelOutputs(object):
    """Class for making a forward pass, and getting:
    1. The network output.
    2. Activations from intermediate targeted layers.
    3. Gradients from intermediate targeted layers.

    Usage:
        target_activations, output = model_outputs(x)
        gradients = model_outputs.get_gradients()
    """

    def __init__(self, model, target_layers):
        self.model = model
        self.feature_extractor = FeatureExtractor(self.model.features, target_layers)

    def get_gradients(self):
        return self.feature_extractor.gradients

    def __call__(self, x):
        """
        :param x: N*C*H*W, a picture
        :return: target_activations: list, activations of the target layers (A in the equation)
                 output: tensor, classification scores, N*n_class (y in the equation)
        """
        target_activations, output = self.feature_extractor(x)
        output = output.view(output.size(0), -1)
        output = self.model.classifier(output)
        return target_activations, output


class GradCam(object):
    """Class for computing the Grad-CAM mask.

    Usage:
        mask = grad_cam(input)
    """

    def __init__(self, model, target_layer_names, use_cuda):
        self.model = model
        self.model.eval()
        self.cuda = use_cuda
        if self.cuda:
            self.model = model.cuda()

        self.extractor = ModelOutputs(self.model, target_layer_names)

    def forward(self, input):
        return self.model(input)

    def __call__(self, input, index=None):
        """
        :param input: N*C*H*W, a picture
        :param index: int, class index; defaults to the top-scoring class
        :return: cam: H*W, values in (0, 1) -- L_{Grad-CAM}^c
        """
        if self.cuda:
            features, output = self.extractor(input.cuda())
        else:
            features, output = self.extractor(input)

        if index is None:
            index = np.argmax(output.cpu().data.numpy())

        one_hot = np.zeros((1, output.size()[-1]), dtype=np.float32)
        one_hot[0][index] = 1
        # (tested: requires_grad=False would also work here)
        one_hot = Variable(torch.from_numpy(one_hot), requires_grad=True)
        if self.cuda:
            one_hot = torch.sum(one_hot.cuda() * output)
        else:
            one_hot = torch.sum(one_hot * output)

        self.model.features.zero_grad()
        self.model.classifier.zero_grad()
        one_hot.backward(retain_graph=True)

        grads_val = self.extractor.get_gradients()[-1].cpu().data.numpy()

        target = features[-1]
        target = target.cpu().data.numpy()[0, :]

        # alpha_k^c: global-average-pool the gradients over the spatial dims
        weights = np.mean(grads_val, axis=(2, 3))[0, :]
        cam = np.zeros(target.shape[1:], dtype=np.float32)

        # weighted sum of the feature maps, then ReLU and min-max normalization
        for i, w in enumerate(weights):
            cam += w * target[i, :, :]

        cam = np.maximum(cam, 0)
        cam = cv2.resize(cam, (224, 224))
        cam = cam - np.min(cam)
        cam = cam / np.max(cam)
        return cam

Displaying Grad-CAM:

def show_cam_on_image(img, mask):
    # img: float32 BGR image in [0, 1]; mask: the (H, W) cam in [0, 1]
    heatmap = cv2.applyColorMap(np.uint8(255 * mask), cv2.COLORMAP_JET)
    heatmap = np.float32(heatmap) / 255
    cam = heatmap + np.float32(img)
    cam = cam / np.max(cam)
    cv2.imwrite("cam.jpg", np.uint8(255 * cam))
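
Putting the pieces together, the repository's demo drives these functions roughly as follows (a sketch: `target_layer_names=["35"]` selects VGG19's last conv-stage activation as in the repo's example, and the preprocessing here is simplified relative to the repo's ImageNet normalization):

import cv2
import numpy as np
import torch
from torchvision import models

grad_cam = GradCam(model=models.vgg19(pretrained=True),
                   target_layer_names=["35"], use_cuda=False)

img = np.float32(cv2.resize(cv2.imread("examples/both.png", 1), (224, 224))) / 255
input = torch.from_numpy(img.transpose(2, 0, 1)[None]).requires_grad_(True)

mask = grad_cam(input, index=None)   # index=None -> explain the top-scoring class
show_cam_on_image(img, mask)         # writes cam.jpg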

5.2 GuidedBackpropReLUModel

gb: (C, H, W), unbounded values

class GuidedBackpropReLU(Function):
    # follows the original repo's legacy autograd Function style,
    # where a Function instance is itself callable

    def forward(self, input):
        # standard ReLU forward: keep positive inputs, zero the rest
        positive_mask = (input > 0).type_as(input)
        output = torch.addcmul(torch.zeros(input.size()).type_as(input), input, positive_mask)
        self.save_for_backward(input, output)
        return output

    def backward(self, grad_output):
        input, output = self.saved_tensors
        grad_input = None

        # guided backprop: mask by both the forward activation (> 0)
        # and the incoming gradient (> 0)
        positive_mask_1 = (input > 0).type_as(grad_output)
        positive_mask_2 = (grad_output > 0).type_as(grad_output)
        grad_input = torch.addcmul(torch.zeros(input.size()).type_as(input),
                                   torch.addcmul(torch.zeros(input.size()).type_as(input),
                                                 grad_output, positive_mask_1),
                                   positive_mask_2)

        return grad_input


class GuidedBackpropReLUModel:
    def __init__(self, model, use_cuda):
        self.model = model
        self.model.eval()
        self.cuda = use_cuda
        if self.cuda:
            self.model = model.cuda()

        # replace every ReLU in the feature extractor with GuidedBackpropReLU
        for idx, module in self.model.features._modules.items():
            if module.__class__.__name__ == 'ReLU':
                self.model.features._modules[idx] = GuidedBackpropReLU()

    def forward(self, input):
        return self.model(input)

    def __call__(self, input, index=None):
        if self.cuda:
            output = self.forward(input.cuda())
        else:
            output = self.forward(input)

        if index is None:
            index = np.argmax(output.cpu().data.numpy())

        one_hot = np.zeros((1, output.size()[-1]), dtype=np.float32)
        one_hot[0][index] = 1
        # (tested: requires_grad=False would also work here)
        one_hot = Variable(torch.from_numpy(one_hot), requires_grad=True)
        if self.cuda:
            one_hot = torch.sum(one_hot.cuda() * output)
        else:
            one_hot = torch.sum(one_hot * output)

        # self.model.features.zero_grad()
        # self.model.classifier.zero_grad()
        one_hot.backward(retain_graph=True)

        # the explanation is the gradient with respect to the input image
        output = input.grad.cpu().data.numpy()
        output = output[0, :, :, :]

        return output

5.3 Guided Grad-CAM

# broadcast the (H, W) Grad-CAM mask across the gb channels,
# then take the element-wise product
cam_mask = np.zeros(gb.shape)
for i in range(0, gb.shape[0]):
    cam_mask[i, :, :] = mask

cam_gb = np.multiply(cam_mask, gb)
utils.save_image(torch.from_numpy(cam_gb), 'cam_gb.jpg')

6. Results

Figure: original image.

Figure: Grad-CAM.

Figure: Guided-Backpropagation.

Figure: Guided Grad-CAM.