Action Recognition 0-10: mmaction2 (SlowFast) - In-Depth Source Code Analysis (6) - Model Construction Overview

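The links below collect all of my notes on mmaction2 (SlowFast action recognition). If you spot a mistake, please point it out and I will correct it as soon as possible. If you are interested, feel free to add me on WeChat (17575010159) to discuss technical details. If this post helped you, please remember to give it a like, since that is my greatest encouragement. The end of the post points to my official account, which hosts a large collection of resources.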

Action Recognition 0-00: mmaction2 (SlowFast) - Table of Contents - The Latest In-Depth Walkthrough

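Highly recommended commercial-grade project: this is a behavior analysis project that I have deployed in production. It consists of three modules (1. pedestrian detection, 2. pedestrian tracking, 3. behavior recognition): Behavior Analysis (Commercial Grade) 00 - Table of Contents - The Latest In-Depth Walkthrough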

Preface

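From the previous posts we already know the overall training flow, how the dataset is loaded, and what the data we receive looks like. Next we analyze the model itself: how it is built, how it is trained, how the loss is computed, how testing is run, and so on. We start with the file we modified, configs/recognition/slowfast/my_slowfast_r50_4x16x1_256e_ucf101_rgb.py, which contains the following key code: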

model = dict(
    type='Recognizer3D',  # 3D recognizer (adds a temporal dimension compared with 2D)
    backbone=dict(  # backbone configuration
        type='ResNet3dSlowFast',
        ......
        slow_pathway=dict(  # slow pathway
            type='resnet3d',  # built on the resnet3d network
            ......
        ),
        fast_pathway=dict(  # fast pathway
            type='resnet3d',  # built on the resnet3d network
            ......
        ),
    ),
    cls_head=dict(  # classification head
        type='SlowFastHead',
        ......
    ),
)

All of the parameters above are important, so next we follow them to analyze how the model is built.
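The `type` key in each dict is what selects the class to instantiate. As a rough illustration of that idea, here is a toy registry in plain Python. The `Registry` class and `ToyBackbone` are hypothetical stand-ins written for this post, not the actual mmcv implementation, and the `BACKBONES` object here is a local toy that only borrows the name.

```python
class Registry:
    """Toy registry mapping a class name to the class itself."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        # Return a decorator that records the class under its own name.
        def decorator(cls):
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def build(self, cfg):
        args = dict(cfg)  # copy so the caller's config is not mutated
        cls_name = args.pop('type')
        if cls_name not in self._modules:
            raise KeyError(f'{cls_name} is not registered in {self.name}')
        # Every remaining key becomes a constructor keyword argument.
        return self._modules[cls_name](**args)


BACKBONES = Registry('backbone')  # toy stand-in, assumption for illustration


@BACKBONES.register_module()
class ToyBackbone:
    def __init__(self, depth=50):
        self.depth = depth


backbone = BACKBONES.build(dict(type='ToyBackbone', depth=18))
print(type(backbone).__name__, backbone.depth)
```

The point of the pattern is that a config file only needs strings and plain values; the mapping from the string to the class lives in one place.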

ResNet3dSlowFast

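We first analyze the backbone dict, which contains type='ResNet3dSlowFast'. In mmaction/models/backbones/resnet3d_slowfast.py we can find the class ResNet3dSlowFast(nn.Module). My annotated version follows (a guided reading comes later, so you can read it alongside the comments). For now, please do not dig into the implementation of each function; a rough idea of what each one does is enough: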

# Register the class in the BACKBONES registry
@BACKBONES.register_module()
class ResNet3dSlowFast(nn.Module):
    """Slowfast backbone.

    This module is proposed in `SlowFast Networks for Video Recognition
    <https://arxiv.org/abs/1812.03982>`_

    Args:
        # Depth of the resnet, one of {18, 34, 50, 101, 152}
        depth (int): Depth of resnet, from {18, 34, 50, 101, 152}.
        # Path to the pretrained model file
        pretrained (str): The file path to a pretrained model.
        # tau, the parameter τ in the paper
        resample_rate (int): A large temporal stride ``resample_rate``
            on input frames, corresponding to the :math:`\\tau` in the paper.
            i.e., it processes only one out of ``resample_rate`` frames.
            Default: 8.
        # alpha, the parameter α in the paper
        speed_ratio (int): Speed ratio indicating the ratio between time
            dimension of the fast and slow pathway, corresponding to the
            :math:`\\alpha` in the paper. Default: 8.
        channel_ratio (int): Reduce the channel number of fast pathway
            by ``channel_ratio``, corresponding to :math:`\\beta` in the paper.
            Default: 8.

        slow_pathway (dict): Configuration of slow branch, should contain
            necessary arguments for building the specific type of pathway
            and:
            type (str): type of backbone the pathway bases on.
            lateral (bool): determine whether to build lateral connection
            for the pathway.Default:

            .. code-block:: Python

                dict(type='ResNetPathway',
                lateral=True, depth=50, pretrained=None,
                conv1_kernel=(1, 7, 7), dilations=(1, 1, 1, 1),
                conv1_stride_t=1, pool1_stride_t=1, inflate=(0, 0, 1, 1))

        fast_pathway (dict): Configuration of fast branch, similar to
            `slow_pathway`. Default:

            .. code-block:: Python

                dict(type='ResNetPathway',
                lateral=False, depth=50, pretrained=None, base_channels=8,
                conv1_kernel=(5, 7, 7), conv1_stride_t=1, pool1_stride_t=1)
    """

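    def __init__(self,
                 pretrained,  # path to a pretrained model, or None
                 resample_rate=8,  # the parameter τ in the paper
                 speed_ratio=8,  # the parameter α in the paper
                 channel_ratio=8,  # the reciprocal of β in the paper
                 slow_pathway=dict(  # slow pathway configuration
                     type='resnet3d',  # built on the resnet3d network
                     depth=50,  # network depth of 50
                     pretrained=None,  # path to a pretrained model, or None
                     lateral=True,  # whether to build lateral connections
                     conv1_kernel=(1, 7, 7),  # kernel size of the first conv layer (t, h, w)
                     dilations=(1, 1, 1, 1),
                     conv1_stride_t=1,  # temporal stride of the first conv layer
                     pool1_stride_t=1,  # temporal stride of the first pooling layer
                     inflate=(0, 0, 1, 1)),
                 fast_pathway=dict(  # fast pathway configuration
                     type='resnet3d',  # built on the resnet3d network
                     depth=50,
                     pretrained=None,  # path to a pretrained model, or None
                     lateral=False,  # whether to build lateral connections
                     base_channels=8,  # base number of channels
                     conv1_kernel=(5, 7, 7),  # kernel size of the first conv layer (t, h, w)
                     conv1_stride_t=1,  # temporal stride of the first conv layer
                     pool1_stride_t=1)):  # temporal stride of the first pooling layer
        super().__init__()
        # Store the arguments
        self.pretrained = pretrained
        self.resample_rate = resample_rate
        self.speed_ratio = speed_ratio
        self.channel_ratio = channel_ratio

        # If the slow pathway uses lateral connections, pass it speed_ratio (α in
        # the paper) and channel_ratio (1/β in the paper)
        if slow_pathway['lateral']:
            slow_pathway['speed_ratio'] = speed_ratio
            slow_pathway['channel_ratio'] = channel_ratio

        # Build the slow and fast pathways
        self.slow_path = build_pathway(slow_pathway)
        self.fast_path = build_pathway(fast_pathway)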
    # Initialize the weights, loading the pretrained model if one is given
    def init_weights(self):
        """Initiate the parameters either from existing checkpoint or from
        scratch."""
        if isinstance(self.pretrained, str):
            logger = get_root_logger()
            msg = f'load model from: {self.pretrained}'
            print_log(msg, logger=logger)
            # Directly load 3D model.
            load_checkpoint(self, self.pretrained, strict=True, logger=logger)
        elif self.pretrained is None:
            # Init two branch separately.
            self.fast_path.init_weights()
            self.slow_path.init_weights()
        else:
            raise TypeError('pretrained must be a str or None')

    def forward(self, x):
        """Defines the computation performed at every call.

        Args:
            x (torch.Tensor): The input data, i.e. the video frames
                after preprocessing and augmentation.

        Returns:
            tuple[torch.Tensor]: The feature of the input
            samples extracted by the backbone.
        """
        # Sample frames with stride self.resample_rate (default 8, τ in the paper)
        # x[b,3,clip_len,w,h] --> x_slow[b,3,clip_len/self.resample_rate,w,h]
        # In my setup: x[4,3,16,224,224] --> x_slow[4,3,2,224,224]
        # 3 is the number of input channels per image; w and h are the spatial dimensions
        # The 16 in x[4,3,16,224,224] and the 2 in x_slow[4,3,2,224,224] are the temporal dimension
        x_slow = x[:, :, ::self.resample_rate, :, :]

        # [b,3,clip_len/self.resample_rate,w,h] --> [b,64,clip_len/self.resample_rate,w/2,h/2]
        # In my setup: [4,3,2,224,224] --> [4,64,2,112,112]
        x_slow = self.slow_path.conv1(x_slow)

        # [b,64,clip_len/self.resample_rate,w/2,h/2] --> [b,64,clip_len/self.resample_rate,w/4,h/4]
        # In my setup: [4,64,2,112,112] --> [4,64,2,56,56]
        x_slow = self.slow_path.maxpool(x_slow)


        # Sample frames with stride self.resample_rate // self.speed_ratio
        # (defaults 8 and 8, τ and α in the paper), i.e. stride 1
        # x[b,3,clip_len,w,h] --> x_fast[b,3,clip_len/(self.resample_rate//self.speed_ratio),w,h]
        # In my setup: x[4,3,16,224,224] --> x_fast[4,3,16,224,224]
        # 3 is the number of input channels per image; w and h are the spatial dimensions
        # The 16 in x[b,3,16,224,224] and in x_fast[b,3,16,224,224] is the temporal dimension
        x_fast = x[:, :, ::self.resample_rate // self.speed_ratio, :, :]

        # With T_fast = clip_len/(self.resample_rate//self.speed_ratio):
        # x_fast[b,3,T_fast,w,h] --> x_fast[b,8,T_fast,w/2,h/2]
        # In my setup: [4,3,16,224,224] --> [4,8,16,112,112]
        x_fast = self.fast_path.conv1(x_fast)

        # x_fast[b,8,T_fast,w/2,h/2] --> x_fast[b,8,T_fast,w/4,h/4]
        # In my setup: [4,8,16,112,112] --> [4,8,16,56,56]
        x_fast = self.fast_path.maxpool(x_fast)

        # If the slow pathway uses lateral connections, transform the fast features and fuse them in
        if self.slow_path.lateral:
            # Pass x_fast through the slow pathway's conv1_lateral to get x_fast_lateral
            # x_fast[b,8,16,56,56] --> x_fast_lateral[b,16,2,56,56]
            x_fast_lateral = self.slow_path.conv1_lateral(x_fast)
            # Concatenate along channels: x_slow[b,64,2,56,56] + x_fast_lateral[b,16,2,56,56] --> x_slow[b,80,2,56,56]
            x_slow = torch.cat((x_slow, x_fast_lateral), dim=1)

        # self.slow_path.res_layers = ['layer1', 'layer2', 'layer3', 'layer4']
        for i, layer_name in enumerate(self.slow_path.res_layers):

            # Fetch the slow pathway's res_layer for this iteration
            res_layer = getattr(self.slow_path, layer_name)

            # Feed x_slow through res_layer to get the new x_slow:
            # i=0 : x_slow[b, 80,   2, 56, 56] --> [b, 256,  2, 56, 56]
            # i=1 : x_slow[b, 320,  2, 56, 56] --> [b, 512,  2, 28, 28]
            # i=2 : x_slow[b, 640,  2, 28, 28] --> [b, 1024, 2, 14, 14]
            # i=3 : x_slow[b, 1280, 2, 14, 14] --> [b, 2048, 2, 7,  7 ]
            x_slow = res_layer(x_slow)


            # Fetch the fast pathway's res_layer for this iteration
            res_layer_fast = getattr(self.fast_path, layer_name)

            # Feed x_fast through res_layer_fast to get the new x_fast:
            # i=0 : x_fast[b, 8,   16, 56, 56] --> [b, 32,  16, 56, 56]
            # i=1 : x_fast[b, 32,  16, 56, 56] --> [b, 64,  16, 28, 28]
            # i=2 : x_fast[b, 64,  16, 28, 28] --> [b, 128, 16, 14, 14]
            # i=3 : x_fast[b, 128, 16, 14, 14] --> [b, 256, 16, 7,  7 ]
            x_fast = res_layer_fast(x_fast)

            # If this is not the last stage and the slow pathway uses lateral connections
            if (i != len(self.slow_path.res_layers) - 1
                    and self.slow_path.lateral):
                # No fusion needed in the final stage
                # For every other stage, look up the slow pathway's lateral connection
                lateral_name = self.slow_path.lateral_connections[i]
                conv_lateral = getattr(self.slow_path, lateral_name)

                # Feed x_fast through conv_lateral to get x_fast_lateral
                # i=0 : x_fast[b, 32,  16, 56, 56] --> x_fast_lateral[b, 64,  2, 56, 56]
                # i=1 : x_fast[b, 64,  16, 28, 28] --> x_fast_lateral[b, 128, 2, 28, 28]
                # i=2 : x_fast[b, 128, 16, 14, 14] --> x_fast_lateral[b, 256, 2, 14, 14]
                x_fast_lateral = conv_lateral(x_fast)

                # i=0 : x_slow[b,256, 2,56,56] + x_fast_lateral[b,64, 2,56,56] --> x_slow[b,320, 2,56,56]
                # i=1 : x_slow[b,512, 2,28,28] + x_fast_lateral[b,128,2,28,28] --> x_slow[b,640, 2,28,28]
                # i=2 : x_slow[b,1024,2,14,14] + x_fast_lateral[b,256,2,14,14] --> x_slow[b,1280,2,14,14]
                x_slow = torch.cat((x_slow, x_fast_lateral), dim=1)

        # x_slow[b,2048,2,7,7], x_fast[b,256,16,7,7]
        out = (x_slow, x_fast)

        return out
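
The shape comments in forward can be replayed as plain arithmetic, which is a handy way to check them without loading the model. The sketch below is only an illustration under stated assumptions: a ResNet-50 style layout (stage widths 256/512/1024/2048, spatial strides 1/2/2/2), a stem that reduces the spatial size by 4, and lateral convolutions that output twice the fast pathway's channels. `trace_slowfast_shapes` is a hypothetical helper, not part of mmaction2.

```python
def trace_slowfast_shapes(clip_len=16, size=224, resample_rate=8,
                          speed_ratio=8, channel_ratio=8):
    """Return per-stage (C, T, H, W) shapes for both pathways (batch omitted).

    Each row is (slow_in, slow_out, fast_in, fast_out) for one residual stage.
    """
    # Temporal lengths after strided frame sampling.
    t_slow = len(range(0, clip_len, resample_rate))
    t_fast = len(range(0, clip_len, resample_rate // speed_ratio))

    hw = size // 4                 # conv1 (stride 2) followed by maxpool (stride 2)
    slow_c = 64                    # assumed stem width of the slow pathway
    fast_c = 64 // channel_ratio   # fast pathway is channel_ratio times thinner

    # Stem fusion: the lateral conv is assumed to double the fast channels.
    slow_c += 2 * fast_c

    stage_widths = (256, 512, 1024, 2048)   # assumed ResNet-50 stage outputs
    stage_strides = (1, 2, 2, 2)            # assumed spatial strides
    rows = []
    for i, (width, stride) in enumerate(zip(stage_widths, stage_strides)):
        hw_out = hw // stride
        slow_in = (slow_c, t_slow, hw, hw)
        fast_in = (fast_c, t_fast, hw, hw)
        slow_c, fast_c = width, width // channel_ratio
        rows.append((slow_in, (slow_c, t_slow, hw_out, hw_out),
                     fast_in, (fast_c, t_fast, hw_out, hw_out)))
        if i != len(stage_widths) - 1:
            slow_c += 2 * fast_c   # fuse fast features into the slow pathway
        hw = hw_out
    return rows


for i, row in enumerate(trace_slowfast_shapes()):
    print(i, *row)
```

With the defaults, the last row ends at (2048, 2, 7, 7) for the slow pathway and (256, 16, 7, 7) for the fast pathway, which is the pair returned as out.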

Comparison with the Paper

First, look at the following code (the start of the forward pass):

        x_slow = x[:, :, ::self.resample_rate, :, :]
        x_slow = self.slow_path.conv1(x_slow)
        x_slow = self.slow_path.maxpool(x_slow)

        x_fast = x[:, :, ::self.resample_rate // self.speed_ratio, :, :]
        x_fast = self.fast_path.conv1(x_fast)
        x_fast = self.fast_path.maxpool(x_fast)

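It corresponds to the following part of Table 1 in the paper (circled in red):

[figure: Table 1 of the paper, with the relevant rows circled in red]

The remaining code in def forward(self, x) corresponds to the rest of the table (note that it does not include global average pool, concate, fc, or the classes column).

From the comments, you should notice the following points: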

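1. The initial input has shape x[b,3,clip_len,w,h]. slow_path samples frames with stride resample_rate = τ = 8, while fast_path samples with stride self.resample_rate / self.speed_ratio (α) = 8/8 = 1. fast_path therefore sees 8 times as many frames along the time axis as slow_path.

2. During lateral feature fusion, the shape of the slow_path features is generally left unchanged. The work goes into reshaping the fast_path output so that it matches slow_path. By default the lateral connection is conv_lateral, which is a single convolution layer.

3. fast_path and slow_path always have the same spatial resolution (height and width), but the number of channels in fast_path stays at one eighth of that in slow_path throughout.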
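
Points 1 and 2 can be checked with a few lines of plain Python. This is a minimal sketch: frame indices stand in for real frames, and the lateral convolution is assumed to use kernel 5, stride equal to speed_ratio, and padding 2 along the time axis (the time-strided convolution described in the paper). These values are assumptions here and are not read from the mmaction2 source. `conv_out_len` is a hypothetical helper implementing the standard convolution output-length formula.

```python
def conv_out_len(length, kernel, stride, padding):
    """Output length of a convolution along one axis (dilation 1)."""
    return (length + 2 * padding - kernel) // stride + 1


frames = list(range(16))           # indices of a 16-frame clip
resample_rate, speed_ratio = 8, 8  # tau and alpha

# Point 1: strided slicing, the same pattern as x[:, :, ::stride] in forward.
slow_idx = frames[::resample_rate]
fast_idx = frames[::resample_rate // speed_ratio]

# Point 2: the lateral conv shrinks the fast time axis to the slow one.
# Assumed temporal settings: kernel 5, stride = speed_ratio, padding 2.
lateral_t = conv_out_len(len(fast_idx), kernel=5, stride=speed_ratio, padding=2)

print(slow_idx)        # [0, 8]
print(len(fast_idx))   # 16
print(lateral_t)       # 2
```

Because the lateral output has the same temporal length as the slow features (2 frames here), the two tensors can be concatenated along the channel axis.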

Conclusion

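I trust that after reading the comments above you have a reasonable picture of the overall model architecture. Some implementation details are probably still unclear, for example: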

class ResNet3dSlowFast(nn.Module):
    def __init__(self,.......
        # Build the slow and fast pathways
        self.slow_path = build_pathway(slow_pathway)
        self.fast_path = build_pathway(fast_pathway)
    def forward(self, x):
        x_slow = self.slow_path.conv1(x_slow)
        x_fast = self.fast_path.conv1(x_fast)
        # self.slow_path.res_layers = ['layer1', 'layer2', 'layer3', 'layer4']
        for i, layer_name in enumerate(self.slow_path.res_layers):
            if (i != len(self.slow_path.res_layers) - 1 and self.slow_path.lateral):
                .......
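For instance, how build_pathway and slow_path.conv1 are actually implemented is not yet clear. The following posts analyze these details one by one. Please give this post a like; getting this far was not easy. See you in the next post.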
