STGAN

0. Preface

reference: CVPR2018_StarGAN: StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation
code: pytorch
reference: Valse2018_AttGAN: Facial Attribute Editing by Only Changing What You Want
code: tensorflow, pytorch

This is another GAN paper. Throughout, it compares itself against StarGAN and AttGAN. I have only read StarGAN, not AttGAN.

1. Introduction

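The authors first identify the drawbacks of the skip connection and the target attribute vector: the skip connection helps reconstruction but hurts attribute manipulation, and the target attribute vector hurts reconstruction.

The authors back up their contributions with solid evidence.

When using a GAN, the authors observe that skip connections improve the quality of the reconstructed image (reconstruction, where the target attributes equal the source attributes), but weaken the strength of the attribute change (attribute manipulation, where the target attributes differ from the source attributes).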

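The authors tackle these problems from the perspective of selective transfer:

selective: a. only the attributes to be changed are considered; b. encoder features are selectively concatenated with decoder features
transfer: a unified framework for both local and global attributes

Question: I don't quite follow the explanation of "transfer"; let's see how it is applied later.

Contributions:

The input is the difference attribute vector
For the skip connection, selective transfer units are used, which strengthen the influence of the input attributes and improve image quality at the same time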

2. Proposed Method

2.1 Limitation of Skip Connection in AttGAN

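The paper's comparison shows that adding skip connections benefits image reconstruction (target attributes equal to the source attributes), but lowers accuracy when generating images with other attributes (target attributes differ from the source attributes; attribute manipulation ability). The main reason is that the encoder and decoder features are concatenated directly.

To address this, the authors use selective transfer units to adaptively transform encoder features guided by attributes to be changed.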

2.2 Taking Difference Attribute Vector as Input

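StarGAN and AttGAN take the target attribute vector $att_t$ as input. By inspecting the images reconstructed by StarGAN and AttGAN, the authors find that the target attribute vector harms reconstruction. This is not hard to see: for the image to stay unchanged, the conditioning input would ideally contribute nothing (be 0), but the attribute vector is certainly not 0 and the convolution weights cannot be 0 either, so the attribute vector inevitably alters the generated image. Reconstruction suffers most, because there the image is expected to stay exactly the same, which is a stricter requirement.

To address this drawback of the target attribute vector, the authors propose the difference attribute vector.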

$att_{diff}=att_t-att_s$
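As a small illustration of the difference vector, the sketch below uses made-up binary attribute labels (the attribute names and their order are hypothetical, chosen only for the example). Entries are 0 where nothing should change and +1 or -1 where an attribute is added or removed.

```python
import numpy as np

# Hypothetical attribute order, for illustration only.
attrs = ["bangs", "eyeglasses", "mustache"]

att_s = np.array([0, 1, 0])  # source image: has eyeglasses
att_t = np.array([1, 0, 0])  # target: add bangs, remove eyeglasses

att_diff = att_t - att_s
print(att_diff)  # [ 1 -1  0]

# For reconstruction the target equals the source, so the difference is all zeros.
att_rec = att_s - att_s
```

With the target vector as input, the reconstruction case would still feed a nonzero vector into the network; with the difference vector it feeds exactly zero.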

2.3 Selective Transfer Units(STU)

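The authors build STUs from a modified GRU and use them as skip connections.

Let $f_{enc}^l$ denote the encoder feature of the $l$-th layer and $s^{l+1}$ the hidden state from the $(l+1)$-th layer. The hidden state $s^l$ and the transformed encoder feature $f_t^l$ are then updated as follows: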

$\begin{aligned} \hat{s}^{l+1}&=W_t *_T [s^{l+1}, att_{diff}]\\ r^l &= \sigma(W_r * [f_{enc}^l, \hat{s}^{l+1}])\\ z^l &= \sigma(W_z * [f_{enc}^l, \hat{s}^{l+1}])\\ s^l&=r^l\circ \hat{s}^{l+1}\\ \hat{f}_t^l &= \tanh(W_h*[f_{enc}^l, s^l])\\ f_t^l &= (1-z^l)\circ\hat{s}^{l+1}+z^l\circ \hat{f}_t^l \end{aligned}$

Here $[\cdot ,\cdot]$ denotes the concatenation operation and $*_T$ denotes transposed convolution.

$*$ denotes the convolution operation, $\circ$ the entry-wise product, and $\sigma(\cdot)$ the sigmoid function.
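To make the six equations concrete, here is a minimal NumPy sketch of a single STU. It is not the authors' implementation: every convolution is replaced by a 1x1 convolution (per-pixel channel mixing), the transposed convolution is replaced by a 1x1 convolution followed by nearest-neighbour 2x upsampling, the weights are random, and all function names and sizes are made up for the demo.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(w, x):
    # w: (C_out, C_in), x: (C_in, H, W) -> (C_out, H, W).
    # Stand-in for a real convolution: mixes channels at each pixel only.
    return np.einsum("oc,chw->ohw", w, x)

def upsample2x(x):
    # Nearest-neighbour 2x upsampling of a (C, H, W) array.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def stu(f_enc, s_next, att_diff, W_t, W_r, W_z, W_h):
    """One selective transfer unit, following the six equations above."""
    _, h, w = s_next.shape
    # Tile the attribute vector over the spatial grid so it can be concatenated.
    att_map = np.broadcast_to(att_diff[:, None, None], (att_diff.size, h, w))
    # s_hat^{l+1}: brought to the resolution of f_enc (assumed stand-in for *_T).
    s_hat = upsample2x(conv1x1(W_t, np.concatenate([s_next, att_map], axis=0)))
    x = np.concatenate([f_enc, s_hat], axis=0)
    r = sigmoid(conv1x1(W_r, x))           # reset gate r^l
    z = sigmoid(conv1x1(W_z, x))           # update gate z^l
    s = r * s_hat                          # new hidden state s^l
    f_hat = np.tanh(conv1x1(W_h, np.concatenate([f_enc, s], axis=0)))
    f_t = (1.0 - z) * s_hat + z * f_hat    # transformed encoder feature f_t^l
    return f_t, s

rng = np.random.default_rng(0)
C, n_att = 4, 3                        # arbitrary sizes for the demo
f_enc = rng.normal(size=(C, 8, 8))     # encoder feature at layer l
s_next = rng.normal(size=(C, 4, 4))    # hidden state from layer l+1 (half resolution)
att_diff = np.array([1.0, -1.0, 0.0])
W_t = rng.normal(size=(C, C + n_att))
W_r, W_z, W_h = (rng.normal(size=(C, 2 * C)) for _ in range(3))

f_t, s = stu(f_enc, s_next, att_diff, W_t, W_r, W_z, W_h)
print(f_t.shape, s.shape)  # (4, 8, 8) (4, 8, 8)
```

The two outputs go to different places: `f_t` is what the decoder receives in place of the raw encoder feature, while `s` is passed on to the STU of the next shallower layer.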

These look a lot like LSTM-style equations, but I have little experience in this area, so I'll briefly write down the original GRU formulas.
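For reference, the standard GRU update in one common convention, with input $x_t$ and hidden state $h_t$ (these symbols are the usual textbook ones, not the STGAN paper's; bias terms are omitted, and some references swap the roles of $z_t$ and $1-z_t$):

$\begin{aligned} z_t &= \sigma(W_z [h_{t-1}, x_t])\\ r_t &= \sigma(W_r [h_{t-1}, x_t])\\ \tilde{h}_t &= \tanh(W_h [r_t\circ h_{t-1}, x_t])\\ h_t &= (1-z_t)\circ h_{t-1}+z_t\circ \tilde{h}_t \end{aligned}$

Read against the STU, $h_{t-1}$ plays the role of $\hat{s}^{l+1}$ and $x_t$ the role of $f_{enc}^l$.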

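The two sets of formulas are essentially the same, though I don't fully understand the underlying principle or the effect it achieves. The meaning of the terms also shifts somewhat: in a GRU, the counterpart of $f_t^l$ is the output hidden state, whereas the authors take $s^l$ as the output hidden state and $f_t^l$ as the output transformed encoder feature.

I don't fully understand it, but I think coming up with this is impressive.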

2.4 Network Architecture

STGAN consists of two parts: generator G and discriminator D.

G: $G_{enc}$ and $G_{dec}$, each with five convolution operations. STUs are applied after the first four encoder layers, i.e.

$(f_t^l, s^l)=G_{st}^l(f_{enc}^l, s^{l+1}, att_{diff})$

D: $D_{adv}$ and $D_{att}$. $D_{adv}$ and $D_{att}$ share the first five convolutional layers, and each has two fully connected layers of its own for prediction.

2.5 Loss Functions

Step 1: given an image $x$, obtain the encoder features:

$f=\lbrace f_{enc}^1, f_{enc}^2,..., f_{enc}^5 \rbrace=G_{enc}(x)$

Step 2: pass them through the STUs to obtain the transformed encoder features:

$(f_t^l, s^l)=G_{st}^l(f_{enc}^l, s^{l+1}, att_{diff})$

The STUs at different layers do not share parameters.
That is:

$f_t=\lbrace f_t^1, f_t^2, f_t^3, f_t^4 \rbrace$

Step 3: obtain the result:

$\hat{y}=G_{dec}(f_{enc}^5, f_t)$

That is:

$\hat{y}=G(x, att_{diff})$
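The three steps can be sketched as plain Python wiring. `g_enc`, `stus` and `g_dec` are hypothetical callables standing in for $G_{enc}$, $G_{st}^1,\dots,G_{st}^4$ and $G_{dec}$, and initialising $s^5$ with $f_{enc}^5$ is my assumption, since these notes do not say how the deepest hidden state is set. The point of the sketch is the evaluation order: each STU needs $s^{l+1}$, so the loop runs from the deepest layer to the shallowest.

```python
def generator_forward(x, att_diff, g_enc, stus, g_dec):
    """Wire encoder, STUs and decoder together as in steps 1-3."""
    f_enc = g_enc(x)          # [f_enc^1, ..., f_enc^5]
    s = f_enc[-1]             # ASSUMPTION: s^5 is initialised with f_enc^5
    f_t = [None] * len(stus)
    # s^{l+1} is needed to compute s^l, so go from l = 4 down to l = 1.
    for l in range(len(stus), 0, -1):
        f_t[l - 1], s = stus[l - 1](f_enc[l - 1], s, att_diff)
    return g_dec(f_enc[-1], f_t)

# Dummy stand-ins that only record what gets called, in which order.
calls = []

def make_stu(l):
    def stu(f, s, att):
        calls.append(l)
        return f"f_t^{l}", f"s^{l}"
    return stu

g_enc = lambda x: [f"f_enc^{l}" for l in range(1, 6)]
g_dec = lambda f5, f_t: (f5, f_t)

out = generator_forward("x", [0, 1], g_enc, [make_stu(l) for l in range(1, 5)], g_dec)
print(calls)  # [4, 3, 2, 1]
print(out)    # ('f_enc^5', ['f_t^1', 'f_t^2', 'f_t^3', 'f_t^4'])
```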

Reconstruction loss:

$L_{rec}=\parallel x-G(x,0) \parallel_1$

Adversarial loss:

$\max_{D_{adv}} L_{D_{adv}} = \mathbb{E}_x D_{adv}(x)-\mathbb{E}_{\hat{y}} D_{adv}(\hat{y})+\lambda \mathbb{E}_{\hat{x}} [ (\parallel \nabla_{\hat{x}} D_{adv}(\hat{x}) \parallel_2 -1)^2 ]$

$\max_{G} L_{G_{adv}} = \mathbb{E}_{x, att_{diff}} D_{adv}(G(x, att_{diff}))$

Here $\hat{x}$ is a linear interpolation between real and generated images, as also seen in StarGAN.
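A toy sketch of the interpolation and the gradient penalty follows. To stay runnable without an autodiff library it uses a linear critic $D(x)=w\cdot x$, whose input gradient is simply $w$; a real critic would need automatic differentiation. The critic, the batch shapes and the weight `lam = 10.0` are all assumptions for the demo. The loss is written in the usual WGAN-GP minimisation form, where the penalty is a term to be kept small.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=16)            # toy linear critic D(x) = w . x (assumption)

def critic(x):
    return x @ w

def critic_grad(x):
    # For a linear critic the gradient w.r.t. every input row is just w.
    return np.broadcast_to(w, x.shape)

x_real = rng.normal(size=(8, 16))  # batch of "real" images, flattened
y_fake = rng.normal(size=(8, 16))  # batch of "generated" images

# x_hat: a random point on the line between each real / generated pair.
alpha = rng.uniform(size=(8, 1))
x_hat = alpha * x_real + (1.0 - alpha) * y_fake

grad_norm = np.linalg.norm(critic_grad(x_hat), axis=1)
penalty = np.mean((grad_norm - 1.0) ** 2)

lam = 10.0  # assumed weight, not taken from these notes
# Critic loss to minimise: widen the real/fake gap, keep the gradient norm near 1.
d_loss = -(critic(x_real).mean() - critic(y_fake).mean()) + lam * penalty
```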

Attribute manipulation loss:

$L_{D_{att}}=- \sum_{i=1}^c[ att_s^{(i)}\log D_{att}^{(i)}(x)+(1-att_s^{(i)})\log (1-D_{att}^{(i)}(x)) ]$

$L_{G_{att}}=- \sum_{i=1}^c[ att_t^{(i)}\log D_{att}^{(i)}(\hat{y})+(1-att_t^{(i)})\log (1-D_{att}^{(i)}(\hat{y})) ]$

This should just be a simple classifier.

Question: why is there a second term?
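One way to read it: each attribute is an independent yes/no label, so each summand has the form of a binary cross-entropy. The first term only contributes when the label is 1, and the second only when it is 0, so without the second term a classifier would never be penalised for predicting an attribute that is absent. The sketch below uses made-up labels and made-up classifier outputs to show this.

```python
import math

def attribute_bce(labels, probs):
    """Sum of per-attribute binary cross-entropy terms."""
    total = 0.0
    for a, p in zip(labels, probs):
        # First term is active when a == 1, second term when a == 0.
        total -= a * math.log(p) + (1 - a) * math.log(1 - p)
    return total

def first_term_only(labels, probs):
    # Same loss with the second term dropped, for comparison.
    return -sum(a * math.log(p) for a, p in zip(labels, probs))

att = [1, 0]        # attribute 1 present, attribute 2 absent
good = [0.9, 0.1]   # hypothetical D_att outputs that match the labels
bad = [0.9, 0.9]    # wrongly confident that attribute 2 is present

print(attribute_bce(att, good) < attribute_bce(att, bad))      # True
print(first_term_only(att, good) == first_term_only(att, bad))  # True
```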

Model Objective:

$\min_D L_D= -L_{D_{adv}} + \lambda_1 L_{D_{att}}$

$\min_G L_G= -L_{G_{adv}} + \lambda_2 L_{G_{att}} + \lambda_3 L_{rec}$

4. Experiments

The results are excellent: in both realism and classification accuracy, STGAN is far better than the other GANs.

5. Ablation Study

STGAN: original STGAN
STGAN-dst: target attribute vector, not difference attribute vector
STGAN-conv: conv(encoder feature and difference attribute vector), not STU
STGAN-conv-res: residual learning formulation to learn the convolution operator in STGAN-conv
STGAN-gru: GRU not STU
STGAN-res: residual learning formulation to learn the STU in STGAN

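The authors ran a remarkable number of comparison experiments.

Difference attribute vector vs. target attribute vector: across all three models (AttGAN, StarGAN, STGAN), the difference attribute vector outperforms the target attribute vector.

Selective Transfer Unit vs. its variants: STGAN-conv and STGAN-conv-res perform poorly, while STGAN-gru, STGAN-res and STGAN perform about the same, with possible small differences on individual attributes. Because the authors do not report a single aggregate metric, it is hard to say exactly how large the gap is, but attribute by attribute STGAN is best and the other two are slightly lower.

Question: the working mechanisms of GRU and STU are worth studying further when I get the chance.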

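6. Code

I'm busy running experiments at the moment, so I can't run the paper's code in detail; I'll come back to it later if circumstances allow.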