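The posts linked below collect my notes on mmaction2 (SlowFast action recognition). If you spot a mistake, please point it out and I will correct it as soon as I can. If you would like to discuss the technical details, you can reach me on WeChat: 17575010159. If this post helped you, please leave a like; it is the best encouragement I can get. My official account, with additional resources, is listed at the end of the post.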
Action Recognition 0-00: mmaction2 (SlowFast) - Table of Contents - An Up-to-Date, Comprehensive Walkthrough
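Highly recommended commercial-grade project: this is a behavior analysis project I have deployed in production, consisting of three modules (1. pedestrian detection, 2. pedestrian tracking, 3. action recognition): Behavior Analysis (Commercial Grade) 00 - Table of Contents - An Up-to-Date, Comprehensive Walkthrough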
Preface
In the previous posts we analyzed the SlowFast backbone. Now let's look at its classification head, SlowFastHead. The main code is in /mmaction/models/heads/slowfast_head.py. There you can see class SlowFastHead(BaseHead), so SlowFastHead clearly inherits from BaseHead. We will go through BaseHead and SlowFastHead in turn; my annotated versions follow:
BaseHead
class BaseHead(nn.Module, metaclass=ABCMeta):
    """Base class for head.

    All heads should subclass it.
    All subclasses should override:
    - Methods: ``init_weights``, initializing weights in some modules.
    - Methods: ``forward``, supporting forward for both training and testing.

    Args:
        num_classes (int): Number of classes to be classified, i.e. the
            number of action categories in the training data.
        in_channels (int): Number of channels in input feature.
        loss_cls (dict): Config for building loss.
            Default: dict(type='CrossEntropyLoss').
        multi_class (bool): Determines whether it is a multi-class
            recognition task. Default: False.
        label_smooth_eps (float): Epsilon used in label smooth.
            Reference: arxiv.org/abs/1906.02629. Default: 0.
        loss_factor (float): Factor scalar multiplied on the loss.
            Default: 1.0.
    """

    def __init__(self,
                 num_classes,
                 in_channels,
                 loss_cls=dict(type='CrossEntropyLoss', loss_factor=1.0),
                 multi_class=False,
                 label_smooth_eps=0.0):
        super().__init__()
        self.num_classes = num_classes  # total number of action classes in the dataset
        self.in_channels = in_channels  # number of channels of the input feature
        self.loss_cls = build_loss(loss_cls)  # build the loss function from the loss_cls config
        self.multi_class = multi_class  # whether to train with multi-label targets
        self.label_smooth_eps = label_smooth_eps  # label smoothing factor

    @abstractmethod  # subclasses must implement this method
    def init_weights(self):
        """Initiate the parameters either from existing checkpoint or from
        scratch."""
        pass

    @abstractmethod  # subclasses must implement this method
    def forward(self, x):
        """Defines the computation performed at every call."""
        pass

    def loss(self, cls_score, labels):
        """Calculate the loss given output ``cls_score`` and target ``labels``.

        Args:
            cls_score (torch.Tensor): The output of the model (class scores).
            labels (torch.Tensor): The target output of the model (labels).

        Returns:
            dict: A dict containing field 'loss_cls'(mandatory)
                and 'top1_acc', 'top5_acc'(optional).
        """
        # Dict that will be returned
        losses = dict()
        # A 0-dim label (batch of one after squeeze) is restored to shape [1]
        if labels.shape == torch.Size([]):
            labels = labels.unsqueeze(0)

        # Single-label training: compute top1_acc and top5_acc for the batch
        if not self.multi_class:
            top_k_acc = top_k_accuracy(cls_score.detach().cpu().numpy(),
                                       labels.detach().cpu().numpy(), (1, 5))
            losses['top1_acc'] = torch.tensor(
                top_k_acc[0], device=cls_score.device)
            losses['top5_acc'] = torch.tensor(
                top_k_acc[1], device=cls_score.device)
        # Multi-label training: smooth the labels if label_smooth_eps is non-zero
        elif self.label_smooth_eps != 0:
            labels = ((1 - self.label_smooth_eps) * labels +
                      self.label_smooth_eps / self.num_classes)

        # Compute the loss
        losses['loss_cls'] = self.loss_cls(cls_score, labels)
        return losses
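To make the two branches of loss concrete, here is a minimal pure-Python sketch. It is not mmaction2's implementation: top_k_accuracy_sketch and smooth_labels are hypothetical helper names, plain lists stand in for tensors, and the scores are made-up values. The smoothing formula is the one used in BaseHead.loss above.

```python
def top_k_accuracy_sketch(scores, labels, topk=(1, 5)):
    """Fraction of samples whose true label is among the k highest scores."""
    results = []
    for k in topk:
        hits = 0
        for row, label in zip(scores, labels):
            # Class indices sorted from highest to lowest score.
            ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
            if label in ranked[:k]:
                hits += 1
        results.append(hits / len(labels))
    return results


def smooth_labels(labels, eps, num_classes):
    """Label smoothing: (1 - eps) * y + eps / num_classes for each entry."""
    return [(1 - eps) * y + eps / num_classes for y in labels]


# Illustrative scores for two samples over three classes (assumed values).
scores = [
    [0.1, 0.7, 0.2],  # best class is 1, true class is 1: top-1 hit
    [0.5, 0.2, 0.3],  # best class is 0, true class is 2: top-1 miss, top-2 hit
]
labels = [1, 2]
top1, top2 = top_k_accuracy_sketch(scores, labels, topk=(1, 2))
print(top1, top2)  # 0.5 1.0

# A one-hot multi-label target over four classes, smoothed with eps = 0.1.
smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0], eps=0.1, num_classes=4)
print(smoothed)
```

Note that in BaseHead.loss the smoothing sits in the elif branch, so it only takes effect when multi_class is True.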
SlowFastHead
@HEADS.register_module()
class SlowFastHead(BaseHead):
    """The classification head for Slowfast.

    Args:
        num_classes (int): Number of classes to be classified.
        in_channels (int): Number of channels in input feature.
        loss_cls (dict): Config for building loss.
            Default: dict(type='CrossEntropyLoss').
        spatial_type (str): Pooling type in spatial dimension. Default: 'avg'.
        dropout_ratio (float): Probability of dropout layer. Default: 0.8.
        init_std (float): Std value for Initiation. Default: 0.01.
    """

    def __init__(self,
                 num_classes,  # number of classes in the dataset; I set it to 101
                 in_channels,  # input channels: the two backbone outputs give 2048 + 256 = 2304
                 loss_cls=dict(type='CrossEntropyLoss'),  # use cross-entropy loss
                 spatial_type='avg',  # use average pooling over the spatial dimensions
                 dropout_ratio=0.8,  # default dropout probability
                 init_std=0.01):  # std used to initialize the weights
        super().__init__(num_classes, in_channels, loss_cls)
        self.spatial_type = spatial_type  # pooling type for the spatial dimensions
        self.dropout_ratio = dropout_ratio  # dropout probability
        self.init_std = init_std  # std for weight initialization
        # Apply dropout only if dropout_ratio is non-zero
        if self.dropout_ratio != 0:
            self.dropout = nn.Dropout(p=self.dropout_ratio)
        else:
            self.dropout = None
        # Fully connected layer that performs the classification
        self.fc_cls = nn.Linear(in_channels, num_classes)

        # 3D average pooling
        if self.spatial_type == 'avg':
            self.avg_pool = nn.AdaptiveAvgPool3d((1, 1, 1))
        else:
            self.avg_pool = None

    def init_weights(self):  # override the weight initialization method
        """Initiate the parameters from scratch."""
        normal_init(self.fc_cls, std=self.init_std)

    def forward(self, x):
        """Defines the computation performed at every call.

        Args:
            x (torch.Tensor): The input data.

        Returns:
            torch.Tensor: The classification scores for input samples.
        """
        # ([N, channel_fast, T, H, W], [(N, channel_slow, T, H, W)])
        x_fast, x_slow = x
        # ([N, channel_fast, 1, 1, 1], [N, channel_slow, 1, 1, 1])
        x_fast = self.avg_pool(x_fast)
        x_slow = self.avg_pool(x_slow)
        # [N, channel_fast + channel_slow, 1, 1, 1]
        x = torch.cat((x_slow, x_fast), dim=1)

        if self.dropout is not None:
            x = self.dropout(x)

        # [N x C]
        x = x.view(x.size(0), -1)
        # [N x num_classes], fully connected layer
        cls_score = self.fc_cls(x)

        # Return the scores so that the recognizer can pass them to loss()
        return cls_score
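The shape comments in forward can be followed end to end with a small bookkeeping sketch. It only tracks tensor shapes as tuples and performs no real pooling or matrix multiplication. head_shapes is a hypothetical helper, and the temporal and spatial sizes in the example (4x7x7 and 32x7x7) are illustrative assumptions; only the channel counts 2048 and 256 and the 101 classes come from the post.

```python
def head_shapes(shape_a, shape_b, num_classes):
    """Track tensor shapes through the head's forward pass (shapes only)."""
    n, c_a = shape_a[:2]
    n_b, c_b = shape_b[:2]
    if n != n_b:
        raise ValueError("both pathway features must share the batch size")

    return {
        # AdaptiveAvgPool3d((1, 1, 1)) collapses T, H and W to 1.
        "pooled_a": (n, c_a, 1, 1, 1),
        "pooled_b": (n, c_b, 1, 1, 1),
        # torch.cat(..., dim=1) adds up the channel dimensions.
        "concat": (n, c_a + c_b, 1, 1, 1),
        # view(N, -1) flattens everything after the batch dimension.
        "flat": (n, c_a + c_b),
        # nn.Linear(in_channels, num_classes) maps channels to class scores.
        "cls_score": (n, num_classes),
    }


# Batch of 2; the two pathway outputs carry 2048 and 256 channels.
shapes = head_shapes((2, 2048, 4, 7, 7), (2, 256, 32, 7, 7), num_classes=101)
for name, shape in shapes.items():
    print(name, shape)
```

The flattened width, 2048 + 256 = 2304, is why in_channels must be 2304 for nn.Linear to accept the concatenated feature.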
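Detailed analysis
As shown above, SlowFastHead takes several parameters at initialization. They are mostly set in the cfg file; for example, my configs/recognition/slowfast/my_slowfast_r50_4x16x1_256e_ucf101_rgb.py contains the following: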
cls_head=dict(
    type='SlowFastHead',
    in_channels=2304,
    # num_classes=400,
    num_classes=101,
    spatial_type='avg',
    dropout_ratio=0.5))
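The type key is what selects the class: @HEADS.register_module() stores the class under its name, and the builder looks that name up and passes the remaining keys as constructor arguments. Below is a simplified sketch of this pattern. MiniRegistry, build_from_cfg_sketch and ToyHead are hypothetical stand-ins written for illustration, not the actual mmcv or mmaction2 implementation.

```python
class MiniRegistry:
    """A toy name-to-class registry (simplified illustration)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Used as a decorator: stores the class under its own name.
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def get(self, key):
        return self._modules[key]


HEADS = MiniRegistry('head')


@HEADS.register_module()
class ToyHead:
    """Stand-in for a head; it only records its constructor arguments."""

    def __init__(self, num_classes, in_channels,
                 spatial_type='avg', dropout_ratio=0.8):
        self.num_classes = num_classes
        self.in_channels = in_channels
        self.spatial_type = spatial_type
        self.dropout_ratio = dropout_ratio


def build_from_cfg_sketch(cfg, registry):
    """Look up cfg['type'] in the registry and instantiate it with the rest."""
    args = dict(cfg)  # copy so the caller's config is not modified
    cls = registry.get(args.pop('type'))
    return cls(**args)


cfg = dict(type='ToyHead', in_channels=2304, num_classes=101,
           spatial_type='avg', dropout_ratio=0.5)
head = build_from_cfg_sketch(cfg, HEADS)
print(type(head).__name__, head.num_classes, head.dropout_ratio)
```

Keys omitted from the config fall back to the constructor defaults, which is why the config above overrides dropout_ratio (0.5 instead of the default 0.8) but does not mention init_std.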
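One more detail: the forward pass of SlowFastHead ends with cls_score = self.fc_cls(x), which yields class scores, not a loss. So how is the loss computed and back-propagated during training? You may have noticed that the parent class BaseHead implements a loss function, but where is it called? The cfg file contains the following: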
model = dict(
    type='Recognizer3D',
    backbone=dict(
Recognizer3D is the top-level model that ties the components together. In mmaction/models/recognizers/recognizer3d.py you can find the following code:
@RECOGNIZERS.register_module()
class Recognizer3D(BaseRecognizer):
    """3D recognizer model framework."""

    def forward_train(self, imgs, labels):  # forward pass for training
        """Defines the computation performed at every call when training."""
        # Merge the batch and clip dimensions so the input is in NCTHW layout
        imgs = imgs.reshape((-1, ) + imgs.shape[2:])
        # Extract features with the backbone, ResNet3dSlowFast
        x = self.extract_feat(imgs)
        # Classify with SlowFastHead
        cls_score = self.cls_head(x)
        # Get the ground-truth labels
        gt_labels = labels.squeeze()
        # Call the loss function that SlowFastHead inherits from BaseHead
        loss = self.cls_head.loss(cls_score, gt_labels)
        # Return the loss dict, which is then used for back-propagation
        return loss

    def forward_test(self, imgs):  # forward pass for evaluation and testing
        """Defines the computation performed at every call when evaluation and
        testing."""
        # Merge the batch and clip dimensions so the input is in NCTHW layout
        imgs = imgs.reshape((-1, ) + imgs.shape[2:])
        # Extract features with the backbone, ResNet3dSlowFast
        x = self.extract_feat(imgs)
        # Classify with SlowFastHead
        cls_score = self.cls_head(x)
        # Average the scores over clips; depending on the test config,
        # softmax may be applied before averaging
        cls_score = self.average_clip(cls_score)
        # Return the prediction
        return cls_score.cpu().numpy()

    def forward_dummy(self, imgs):  # used for computing the network FLOPs
        """Used for computing network FLOPs.

        See ``mmaction/tools/get_flops.py``.

        Args:
            imgs (torch.Tensor): Input images.

        Returns:
            Tensor: Class score.
        """
        imgs = imgs.reshape((-1, ) + imgs.shape[2:])
        x = self.extract_feat(imgs)
        outs = (self.cls_head(x), )
        return outs
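Note that forward_train returns the whole dict produced by BaseHead.loss, not a single scalar. Before back-propagation that dict has to be reduced to one value; OpenMMLab-style recognizers typically do this by summing the entries whose key contains 'loss', while metrics such as top1_acc are only logged. The sketch below illustrates that reduction with plain floats. parse_losses_sketch is a hypothetical name, the numbers are made up, and the real code operates on tensors.

```python
def parse_losses_sketch(losses):
    """Reduce a dict of losses and metrics to one scalar plus log values."""
    # Only entries whose key contains 'loss' contribute to the objective.
    total = sum(value for key, value in losses.items() if 'loss' in key)
    # Everything is kept for logging, together with the summed loss.
    log_vars = dict(losses)
    log_vars['loss'] = total
    return total, log_vars


# Illustrative values of the kind BaseHead.loss returns (assumed numbers).
losses = {'top1_acc': 0.5, 'top5_acc': 0.875, 'loss_cls': 1.25}
total, log_vars = parse_losses_sketch(losses)
print(total)     # 1.25
print(log_vars)
```

In real training the summed value is a tensor that still carries the computation graph, so calling backward() on it propagates gradients through SlowFastHead and the backbone.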
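Conclusion
This essentially completes the analysis of SlowFast. The next posts cover how to train on your own data (multi-label or single-label) and how to run inference on videos with the demo.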