Action Recognition 0-12: mmaction2 (SlowFast) - Source Code Walkthrough with No Blind Spots (8) - SlowFastHead - Classification Layer

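The link below collects all of my notes on mmaction2 (SlowFast action recognition). If you spot a mistake, please point it out and I will correct it right away. If you are interested, add me on WeChat (17575010159) to discuss technical details. If this post helped you, please remember to give it a like, because that is the greatest encouragement for me. My official account, with plenty of resources, is listed at the end of the post.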

Action Recognition 0-00: mmaction2 (SlowFast) - Table of Contents - Latest Walkthrough with No Blind Spots

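Highly recommended commercial-grade project: this is a behavior analysis project that I have deployed in production. It consists of three modules (1. pedestrian detection, 2. pedestrian tracking, 3. action recognition): Behavior Analysis (Commercial Grade) 00 - Table of Contents - Latest Walkthrough with No Blind Spots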

Preface

In the previous posts we already analyzed the SlowFast backbone. Now let's look at its classification head, SlowFastHead. The main code lives in /mmaction/models/heads/slowfast_head.py. The class is declared as class SlowFastHead(BaseHead), so SlowFastHead clearly inherits from BaseHead. We will look at BaseHead and SlowFastHead in turn. My annotations are as follows:

BaseHead

class BaseHead(nn.Module, metaclass=ABCMeta):
    """Base class for head.

    All Head should subclass it.
    All subclass should overwrite:
    - Methods:``init_weights``, initializing weights in some modules.
    - Methods:``forward``, supporting to forward both for training and testing.

    Args:
        num_classes (int): Number of classes to be classified, i.e. the
            number of action classes in the training data.
        in_channels (int): Number of channels in input feature.
        loss_cls (dict): Config for building loss.
            Default: dict(type='CrossEntropyLoss').
        multi_class (bool): Determines whether it is a multi-class
            recognition task (multi-label targets). Default: False.
        label_smooth_eps (float): Epsilon used in label smooth.
            Reference: arxiv.org/abs/1906.02629. Default: 0.
        loss_factor (float): Factor scalar multiplied on the loss.
            Default: 1.0. (Annotation: in the signature below it is
            passed inside ``loss_cls`` rather than as its own argument.)
    """

    def __init__(self,
                 num_classes,
                 in_channels,
                 loss_cls=dict(type='CrossEntropyLoss', loss_factor=1.0),
                 multi_class=False,
                 label_smooth_eps=0.0):
        super().__init__()
        self.num_classes = num_classes  # total number of action classes in the dataset
        self.in_channels = in_channels  # number of channels of the input feature
        self.loss_cls = build_loss(loss_cls)  # build the loss function from the loss_cls config
        self.multi_class = multi_class  # whether to train with multi-label targets
        self.label_smooth_eps = label_smooth_eps  # label smoothing epsilon

    @abstractmethod  # subclasses must implement this method
    def init_weights(self):
        """Initiate the parameters either from existing checkpoint or from
        scratch."""
        pass

    @abstractmethod  # subclasses must implement this method
    def forward(self, x):
        """Defines the computation performed at every call."""
        pass

    def loss(self, cls_score, labels):
        """Calculate the loss given output ``cls_score`` and target ``labels``.

        Args:
            cls_score (torch.Tensor): The output of the model, i.e. the
                per-class scores.
            labels (torch.Tensor): The target output of the model, i.e. the
                ground-truth labels.

        Returns:
            dict: A dict containing field 'loss_cls'(mandatory)
            and 'top1_acc', 'top5_acc'(optional).
        """
        # Dict that will be returned
        losses = dict()

        # A 0-dim label tensor (e.g. after squeeze() on a batch of one)
        # is restored to shape [1]
        if labels.shape == torch.Size([]):
            labels = labels.unsqueeze(0)

        # Single-label training: compute top1_acc and top5_acc for the batch
        if not self.multi_class:
            top_k_acc = top_k_accuracy(cls_score.detach().cpu().numpy(),
                                       labels.detach().cpu().numpy(), (1, 5))
            losses['top1_acc'] = torch.tensor(
                top_k_acc[0], device=cls_score.device)
            losses['top5_acc'] = torch.tensor(
                top_k_acc[1], device=cls_score.device)

        # Multi-label training with label smoothing enabled
        elif self.label_smooth_eps != 0:
            labels = ((1 - self.label_smooth_eps) * labels +
                      self.label_smooth_eps / self.num_classes)

        # Compute the loss
        losses['loss_cls'] = self.loss_cls(cls_score, labels)
        return losses
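
The two branches of loss above can be tried on small inputs. The following is a minimal pure-Python sketch, not mmaction2 code: smooth_labels reproduces the smoothing formula from the multi-label branch, and top_k_hit_rate is a hypothetical stand-in that only illustrates what a top-k accuracy measures (it is not the library's top_k_accuracy).

```python
def smooth_labels(labels, eps, num_classes):
    """Apply the smoothing rule from BaseHead.loss to one multi-hot label vector."""
    return [(1 - eps) * y + eps / num_classes for y in labels]


def top_k_hit_rate(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    Illustrative stand-in only; not mmaction2's top_k_accuracy.
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores in this row
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


# Multi-hot target for 4 classes, smoothed with eps = 0.1
smoothed = smooth_labels([1.0, 0.0, 0.0, 1.0], eps=0.1, num_classes=4)
print(smoothed)

# Three samples, three classes
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 0]
print(top_k_hit_rate(scores, labels, k=1))
print(top_k_hit_rate(scores, labels, k=2))
```

With eps = 0.1 and 4 classes, every positive entry moves from 1 toward 0.925 and every negative entry from 0 toward 0.025, which is exactly (1 - eps) * y + eps / num_classes.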

SlowFastHead

@HEADS.register_module()
class SlowFastHead(BaseHead):
    """The classification head for Slowfast.

    Args:
        num_classes (int): Number of classes to be classified.
        in_channels (int): Number of channels in input feature.
        loss_cls (dict): Config for building loss.
            Default: dict(type='CrossEntropyLoss').
        spatial_type (str): Pooling type in spatial dimension. Default: 'avg'.
        dropout_ratio (float): Probability of dropout layer. Default: 0.8.
        init_std (float): Std value for Initiation. Default: 0.01.
    """

    def __init__(self,
                 num_classes,  # number of classes in the dataset; I set it to 101
                 in_channels,  # input channels: x_fast + x_slow from the backbone, 2048 + 256 = 2304
                 loss_cls=dict(type='CrossEntropyLoss'),  # use cross-entropy loss
                 spatial_type='avg',  # use average pooling over the spatial dimensions
                 dropout_ratio=0.8,  # default dropout probability
                 init_std=0.01):  # std used for weight initialization

        super().__init__(num_classes, in_channels, loss_cls)
        self.spatial_type = spatial_type  # use average pooling over the spatial dimensions
        self.dropout_ratio = dropout_ratio  # dropout probability
        self.init_std = init_std  # std used for weight initialization

        # Apply Dropout only if dropout_ratio is non-zero
        if self.dropout_ratio != 0:
            self.dropout = nn.Dropout(p=self.dropout_ratio)
        else:
            self.dropout = None

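        # Fully connected layer that performs the classification
        self.fc_cls = nn.Linear(in_channels, num_classes)

        # 3D adaptive average pooling
        # Note: forward() below calls self.avg_pool unconditionally,
        # so only spatial_type='avg' is usable as written
        if self.spatial_type == 'avg':
            self.avg_pool = nn.AdaptiveAvgPool3d((1, 1, 1))
        else:
            self.avg_pool = None

    def init_weights(self):  # override the weight initialization method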
        """Initiate the parameters from scratch."""
        normal_init(self.fc_cls, std=self.init_std)

    def forward(self, x):
        """Defines the computation performed at every call.

        Args:
            x (torch.Tensor): The input data.

        Returns:
            torch.Tensor: The classification scores for input samples.
        """
        # ([N, channel_fast, T, H, W], [(N, channel_slow, T, H, W)])
        x_fast, x_slow = x
        # ([N, channel_fast, 1, 1, 1], [N, channel_slow, 1, 1, 1])
        x_fast = self.avg_pool(x_fast)
        x_slow = self.avg_pool(x_slow)
        # [N, channel_fast + channel_slow, 1, 1, 1]
        x = torch.cat((x_slow, x_fast), dim=1)

        if self.dropout is not None:
            x = self.dropout(x)

        # [N x C]
        x = x.view(x.size(0), -1)
        # [N x num_classes], fully connected layer
        cls_score = self.fc_cls(x)

        return cls_score
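
To make the tensor shapes in forward concrete, here is a minimal stand-alone sketch. TinySlowFastHead is a hypothetical class written for this illustration (it does not inherit from BaseHead and is not the mmaction2 class), and the temporal and spatial sizes of the two feature maps are assumed values; only the channel counts 2048 and 256 come from the annotation above.

```python
import torch
import torch.nn as nn


class TinySlowFastHead(nn.Module):
    """Hypothetical stand-alone copy of the head's forward logic (no BaseHead)."""

    def __init__(self, num_classes=101, in_channels=2304, dropout_ratio=0.5):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool3d((1, 1, 1))
        self.dropout = nn.Dropout(p=dropout_ratio)
        self.fc_cls = nn.Linear(in_channels, num_classes)

    def forward(self, x):
        first, second = x
        # Each pathway is pooled to [N, C, 1, 1, 1]
        first = self.avg_pool(first)
        second = self.avg_pool(second)
        # Concatenate along channels: [N, 2304, 1, 1, 1]
        x = torch.cat((first, second), dim=1)
        x = self.dropout(x)
        # Flatten to [N, 2304], then classify to [N, num_classes]
        x = x.view(x.size(0), -1)
        return self.fc_cls(x)


head = TinySlowFastHead().eval()
slow_feat = torch.randn(2, 2048, 4, 7, 7)   # assumed slow-pathway feature shape
fast_feat = torch.randn(2, 256, 32, 7, 7)   # assumed fast-pathway feature shape
with torch.no_grad():
    cls_score = head((slow_feat, fast_feat))
print(cls_score.shape)
```

Because the adaptive pooling collapses T, H and W to 1, the two pathways may have different temporal lengths; only the sum of their channel counts has to match in_channels.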

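Details

As you can see, SlowFastHead takes several parameters at initialization. They are mainly set in the cfg file. For example, my configs/recognition/slowfast/my_slowfast_r50_4x16x1_256e_ucf101_rgb.py contains the following configuration: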

    cls_head=dict(
        type='SlowFastHead',
        in_channels=2304,
        #num_classes=400,
        num_classes=101,
        spatial_type='avg',
        dropout_ratio=0.5))
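
How such a dict turns into a constructed head can be illustrated with a toy registry. This is only a sketch of the general idea behind @HEADS.register_module() and a type key; HEADS, register_module, build_head and ToyHead below are all hypothetical names written for this example and are not the mmaction2 or mmcv implementation.

```python
HEADS = {}  # hypothetical registry: class name -> class


def register_module(cls):
    """Record a class under its own name, like a registry decorator would."""
    HEADS[cls.__name__] = cls
    return cls


def build_head(cfg):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)  # copy so the caller's config is not modified
    head_cls = HEADS[args.pop('type')]
    return head_cls(**args)


@register_module
class ToyHead:
    """Stand-in with the same keyword arguments as SlowFastHead."""

    def __init__(self, num_classes, in_channels, spatial_type='avg',
                 dropout_ratio=0.8, init_std=0.01):
        self.num_classes = num_classes
        self.in_channels = in_channels
        self.spatial_type = spatial_type
        self.dropout_ratio = dropout_ratio
        self.init_std = init_std


cfg = dict(type='ToyHead', in_channels=2304, num_classes=101,
           spatial_type='avg', dropout_ratio=0.5)
head = build_head(cfg)
print(head.dropout_ratio, head.init_std)
```

Keys present in the config (here dropout_ratio=0.5) override the constructor defaults, while keys that are left out (here init_std) keep their default values.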

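Another detail worth noting: the forward pass of SlowFastHead ends with cls_score = self.fc_cls(x), which is not a loss. So how is the loss computed and back-propagated during training? Some of you may have noticed that the parent class BaseHead implements a loss function, but where is it called? The cfg file contains the following configuration: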

model = dict(
    type='Recognizer3D',
    backbone=dict(

Recognizer3D here is what assembles the overall model. In mmaction/models/recognizers/recognizer3d.py you can see the following code:

@RECOGNIZERS.register_module()
class Recognizer3D(BaseRecognizer):
    """3D recognizer model framework."""

    def forward_train(self, imgs, labels):  # forward pass for training
        """Defines the computation performed at every call when training."""

        # Reshape the input to NCTHW layout
        imgs = imgs.reshape((-1, ) + imgs.shape[2:])

        # Extract features with the ResNet3dSlowFast backbone
        x = self.extract_feat(imgs)

        # Classify with SlowFastHead
        cls_score = self.cls_head(x)

        # Get the ground truth labels
        gt_labels = labels.squeeze()

        # Call the loss function of SlowFastHead (inherited from BaseHead)
        loss = self.cls_head.loss(cls_score, gt_labels)

        # Return the loss for back-propagation
        return loss

    def forward_test(self, imgs):  # forward pass for testing and evaluation
        """Defines the computation performed at every call when evaluation and
        testing."""
        # Reshape the input to NCTHW layout
        imgs = imgs.reshape((-1, ) + imgs.shape[2:])

        # Extract features with the ResNet3dSlowFast backbone
        x = self.extract_feat(imgs)

        # Classify with SlowFastHead
        cls_score = self.cls_head(x)

        # Post-process the scores: sigmoid for multi-label, softmax for single-label
        cls_score = self.average_clip(cls_score)

        # Return the prediction
        return cls_score.cpu().numpy()

    def forward_dummy(self, imgs):  # used for computing the network FLOPs
        """Used for computing network FLOPs.

        See ``mmaction/tools/get_flops.py``.

        Args:
            imgs (torch.Tensor): Input images.

        Returns:
            Tensor: Class score.
        """
        imgs = imgs.reshape((-1, ) + imgs.shape[2:])
        x = self.extract_feat(imgs)
        outs = (self.cls_head(x), )
        return outs
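
The reshape and squeeze calls used above are easy to check in isolation. The sketch below assumes an input layout of [N, num_clips, C, T, H, W] and labels of shape [N, 1]; both shapes and all sizes are assumptions chosen for illustration. It also shows why BaseHead.loss checks for torch.Size([]).

```python
import torch

# Assumed input layout: [N, num_clips, C, T, H, W]
imgs = torch.zeros(2, 3, 3, 8, 16, 16)
merged = imgs.reshape((-1, ) + imgs.shape[2:])
print(merged.shape)  # batch and clip dimensions are merged into one

# Labels of assumed shape [N, 1] become [N] after squeeze()
labels = torch.tensor([[4], [7]])
squeezed = labels.squeeze()
print(squeezed.shape)

# With a single sample, squeeze() yields a 0-dim tensor,
# which BaseHead.loss turns back into shape [1]
single = torch.tensor([[4]]).squeeze()
if single.shape == torch.Size([]):
    single = single.unsqueeze(0)
print(single.shape)
```

The first two dimensions are folded together, so the backbone always sees a 5-D NCTHW tensor regardless of how many clips were sampled per video.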

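Conclusion

At this point the analysis of SlowFast can be considered complete. Next we will learn how to train on our own data (multi-label or single-label), how to use the demo to run inference on videos, and so on.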
