Summary | Details of Vocabulary Usage in the ORB_SLAM2 Source Code


Preface

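Recently I have been studying how the vocabulary is trained and used in ORB-SLAM2. I wrote an earlier article on training, "VSLAM | How is the bag-of-words vocabulary for loop closure detection generated?", which briefly explained how to train a vocabulary on our own dataset. The vocabulary provided by the ORB-SLAM authors has 6 levels; we can of course train a vocabulary with fewer levels to reduce the memory used by the program.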

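This article briefly walks through some of the functions in the monocular ORB-SLAM2 source code that use the vocabulary. I have not been working on VSLAM for long, so corrections are welcome wherever something is wrong.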

Note: the code comments below are largely borrowed from the Chinese annotations provided by PaoPaoRobot (泡泡机器人).

By a rough count, the vocabulary is involved in four main places in monocular ORB-SLAM2. The function details are described below.

I. Loading the vocabulary bin or txt file at system initialization

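In the main function of mono_tum.cc, the SLAM system is initialized (this mainly creates the SLAM system and initializes each thread, so that it is ready to process every image frame):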

ORB_SLAM2::System SLAM(argv[1],argv[2],ORB_SLAM2::System::MONOCULAR,true);

The vocabulary file is loaded in the constructor of the System class.

mpVocabulary = new ORBVocabulary();
bool bVocLoad = false;
// choose loading method based on file extension
if (has_suffix(strVocFile, ".txt"))
    bVocLoad = mpVocabulary->loadFromTextFile(strVocFile);
else if (has_suffix(strVocFile, ".bin"))
    bVocLoad = mpVocabulary->loadFromBinaryFile(strVocFile);
else
    bVocLoad = false;
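The excerpt above calls has_suffix() without showing it. A minimal sketch of such a check follows; the name ends_with_suffix is hypothetical and this is not necessarily how the helper in the source tree is written.

```cpp
#include <string>

// Hypothetical helper (not the project's has_suffix): returns true when
// `str` ends with `suffix`, e.g. "ORBvoc.bin" ends with ".bin".
bool ends_with_suffix(const std::string& str, const std::string& suffix) {
    if (suffix.size() > str.size()) return false;  // too short to match
    return str.compare(str.size() - suffix.size(), suffix.size(), suffix) == 0;
}
```

Comparing only the tail of the string avoids false positives such as "ORBvoc.bin.bak", which contains ".bin" but does not end with it.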

Next, we focus on the implementation details of the loadFromBinaryFile() function mentioned above.

To do so, we first need to understand the data format of the ORBvoc.txt file (ORBvoc.bin is a binary file, so its contents cannot be shown here).

10 6 0 0 # in order: branching factor of the tree, depth of the tree, scoring type, weighting type

0 0 252 188 188 242 169 109 85 143 187 191 164 25 222 255 72 27 129 215 237 16 58 111 219 51 219 211 85 127 192 112 134 34 0

...

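# The first 0 is the parent of the node; the second 0 indicates whether the node is a leaf (1 if it is, 0 otherwise); 252 through 34 are the ORB descriptor; the last value is the weight.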
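To make the line format concrete, here is a minimal sketch that parses one node line of the text file. The TextNode struct and parse_text_node function are hypothetical names introduced for illustration; they are not part of DBoW2 or ORB-SLAM2, and the 32-byte descriptor length is taken from the ORB case described above.

```cpp
#include <array>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical record for one non-header line of ORBvoc.txt.
struct TextNode {
    int parent = 0;                             // id of the parent node
    bool is_leaf = false;                       // 1 in the file means leaf
    std::array<std::uint8_t, 32> descriptor{};  // 32 bytes = 256-bit ORB descriptor
    double weight = 0.0;                        // last value on the line
};

// Parses "parent is_leaf b0 ... b31 weight"; returns false on malformed input.
bool parse_text_node(const std::string& line, TextNode& out) {
    std::istringstream iss(line);
    int leaf_flag = 0;
    if (!(iss >> out.parent >> leaf_flag)) return false;
    out.is_leaf = (leaf_flag != 0);
    for (auto& byte : out.descriptor) {
        int value = 0;
        if (!(iss >> value) || value < 0 || value > 255) return false;
        byte = static_cast<std::uint8_t>(value);
    }
    return static_cast<bool>(iss >> out.weight);
}
```

Applied to the sample line shown above, this yields parent 0, a non-leaf node, a descriptor starting with 252 and ending with 34, and weight 0.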

So how was the data in ORBvoc.txt written out? This is where the saveToTextFile() function comes in.

template<class TDescriptor, class F>
void TemplatedVocabulary<TDescriptor,F>::saveToTextFile(const std::string& filename) const
{
    std::ofstream ofs;
    ofs.open(filename.c_str(), std::ios_base::out);
    if(!ofs)
    {
        throw std::string("Could not open file: ") + filename;
    }

    // Note the data types here:
    // m_k is an int and m_L is an int; m_scoring is an enum, where 0 means the
    // L1_NORM scoring method; m_weighting is an enum, where 0 means TF_IDF weighting.
    ofs << m_k << " " << m_L << " " << " " << m_scoring << " " << m_weighting << std::endl;

    for(size_t i = 1; i < m_nodes.size(); ++i)
    {
        const Node& node = m_nodes.at(i);

        // From the second line on, the first number is the parent of the node
        ofs << node.parent << " ";

        // The second number indicates whether this is a leaf node: 1 if so, 0 otherwise
        if(node.isLeaf())
        {
            ofs << 1 << " ";
        }
        else
        {
            ofs << 0 << " ";
        }

        // Next comes the ORB descriptor; the last value is the weight
        ofs << F::toString(node.descriptor) << " " << static_cast<double>(node.weight) << std::endl;
    }

    ofs.close();
}

If your ORB-SLAM2 system uses the bin file instead, the data is saved with the following code.

template<class TDescriptor, class F>
void TemplatedVocabulary<TDescriptor,F>::saveToBinaryFile(const std::string& filename) const
{
    std::ofstream ofs;
    ofs.open(filename.c_str(), std::ios_base::out | std::ios::binary);
    if(!ofs)
    {
        throw std::string("Could not open file: ") + filename;
    }

    const unsigned int n_nodes = m_nodes.size(); // n_nodes=1082073
    const unsigned int node_size = sizeof(m_nodes.at(0).parent) + F::L * sizeof(char) + sizeof(float) + sizeof(bool);
    // sizeof(m_nodes.at(0).parent) == 4
    // F::L * sizeof(char) == 32
    // sizeof(float) == 4
    // sizeof(bool) == 1
    // node_size == 41

    ofs.write((char*)&n_nodes, sizeof(n_nodes));
    ofs.write((char*)&node_size, sizeof(node_size));
    ofs.write((char*)&m_k, sizeof(m_k));
    ofs.write((char*)&m_L, sizeof(m_L));
    ofs.write((char*)&m_scoring, sizeof(m_scoring));
    ofs.write((char*)&m_weighting, sizeof(m_weighting));

    for(size_t i = 1; i < n_nodes; ++i)
    {
        const Node& node = m_nodes.at(i);
        ofs.write((char*)&node.parent, sizeof(node.parent));
        ofs.write((char*)node.descriptor.data, F::L);
        const float weight = node.weight;
        ofs.write((char*)&weight, sizeof(weight));
        const bool is_leaf = node.isLeaf();
        ofs.write((char*)&is_leaf, sizeof(is_leaf));
    }

    ofs.close();
}
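The 41-byte record layout (parent, descriptor, weight, leaf flag) can be checked with a small round trip. This is a sketch under stated assumptions: NodeRecord, pack_node and unpack_node are hypothetical names, the descriptor length is fixed at 32 bytes, and float is assumed to be 4 bytes as in the comments above. It uses memcpy rather than pointer casts, which is a substitution of mine to sidestep alignment concerns, not the approach of the original code.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr std::size_t kDescBytes = 32;  // F::L for ORB descriptors
constexpr std::size_t kNodeSize =
    sizeof(std::uint32_t) + kDescBytes + sizeof(float) + sizeof(std::uint8_t);
static_assert(kNodeSize == 41, "parent(4) + descriptor(32) + weight(4) + leaf(1)");

// Hypothetical in-memory form of one node record.
struct NodeRecord {
    std::uint32_t parent;
    std::uint8_t descriptor[kDescBytes];
    float weight;
    bool is_leaf;
};

// Serializes a record in the same field order as saveToBinaryFile().
std::vector<char> pack_node(const NodeRecord& n) {
    std::vector<char> buf(kNodeSize);
    char* p = buf.data();
    std::memcpy(p, &n.parent, sizeof(n.parent));  p += sizeof(n.parent);
    std::memcpy(p, n.descriptor, kDescBytes);     p += kDescBytes;
    std::memcpy(p, &n.weight, sizeof(n.weight));  p += sizeof(n.weight);
    *p = n.is_leaf ? 1 : 0;                       // leaf flag sits at offset 8 + 32
    return buf;
}

// Reads the fields back at offsets 0, 4, 4 + 32 and 8 + 32.
NodeRecord unpack_node(const std::vector<char>& buf) {
    NodeRecord n{};
    const char* p = buf.data();
    std::memcpy(&n.parent, p, sizeof(n.parent));  p += sizeof(n.parent);
    std::memcpy(n.descriptor, p, kDescBytes);     p += kDescBytes;
    std::memcpy(&n.weight, p, sizeof(n.weight));  p += sizeof(n.weight);
    n.is_leaf = (*p != 0);
    return n;
}
```

The offsets match the ones used when reading the file: the descriptor starts at byte 4, the weight at byte 4 + F::L, and the leaf flag at byte 8 + F::L.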

The ORBvoc.bin file is read by the following function:

template<class TDescriptor, class F>
void TemplatedVocabulary<TDescriptor,F>::loadFromBinaryFile(const std::string& filename)
{
    std::ifstream ifs; // declare the file stream
    ifs.open(filename.c_str(), std::ios_base::in | std::ios::binary); // open the file
    if(!ifs)
    {
        throw std::string("Could not open file: ") + filename;
    } // the file could not be opened

    unsigned int n_nodes, node_size; // n_nodes=1082074 node_size=41
    ifs.read((char*)&n_nodes, sizeof(n_nodes));
    ifs.read((char*)&node_size, sizeof(node_size));
    ifs.read((char*)&m_k, sizeof(m_k));                 // read the branching factor of the tree
    ifs.read((char*)&m_L, sizeof(m_L));                 // read the number of levels of the vocabulary
    ifs.read((char*)&m_scoring, sizeof(m_scoring));     // read the scoring type
    ifs.read((char*)&m_weighting, sizeof(m_weighting)); // read the weighting type

    createScoringObject();

    m_words.clear();
    m_words.reserve(std::pow(static_cast<double>(m_k), static_cast<double>(m_L) + 1.0)); // 10000000

    m_nodes.clear();
    m_nodes.resize(n_nodes);
    m_nodes.at(0).id = 0;

    char* buf = new char[node_size];
    unsigned int n_id = 1;
    while(!ifs.eof())
    {
        ifs.read(buf, node_size);
        m_nodes.at(n_id).id = n_id;

        const int* ptr = (int*)buf;
        m_nodes.at(n_id).parent = *ptr;
        m_nodes.at(m_nodes.at(n_id).parent).children.push_back(n_id);

        m_nodes.at(n_id).descriptor = cv::Mat(1, F::L, CV_8U);
        memcpy(m_nodes.at(n_id).descriptor.data, buf + 4, F::L);              // the descriptor of this record
        m_nodes.at(n_id).weight = *reinterpret_cast<float*>(buf + 4 + F::L);  // the weight stored after the descriptor
        // F::L == 32
        // sizeof(char) == 1
        // sizeof(unsigned int) == 4
        // sizeof(float) == 4

        if(buf[8 + F::L])
        {
            // leaf node: register it as a word
            const int w_id = m_words.size();
            m_words.resize(w_id + 1);
            m_nodes.at(n_id).word_id = w_id;
            m_words.at(w_id) = &m_nodes.at(n_id);
        }
        else
        {
            m_nodes.at(n_id).children.reserve(m_k);
        }

        ++n_id;
        if(n_id == n_nodes)
        {
            break;
        }
    }

    ifs.close();
    delete[] buf;
}
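The node counts quoted in the comments can be put in context with the size bounds of a full k-ary tree. The sketch below computes them; nodes_at_level and max_total_nodes are hypothetical helper names. For k = 10 and L = 6 a full tree has at most 10^6 leaves (words) and 1,111,111 nodes including the root, so the n_nodes value read from the file stays below that bound, and the value passed to reserve(), k^(L+1) = 10^7, is a generous over-estimate of the number of words.

```cpp
#include <cstdint>

// Maximum number of nodes at depth `level` of a k-ary tree: k^level.
std::uint64_t nodes_at_level(std::uint64_t k, unsigned level) {
    std::uint64_t n = 1;
    for (unsigned i = 0; i < level; ++i) n *= k;
    return n;
}

// Maximum total node count, root included: sum of k^i for i = 0..L.
std::uint64_t max_total_nodes(std::uint64_t k, unsigned L) {
    std::uint64_t total = 0;
    for (unsigned i = 0; i <= L; ++i) total += nodes_at_level(k, i);
    return total;
}
```

A trained vocabulary is usually smaller than the full tree because a cluster with too few descriptors is not split further.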

II. Computing the bag of words for the current frame

The function ComputeBoW() is called in several places in ORB-SLAM2.

One call is in Tracking::TrackReferenceKeyFrame():

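/**
 * @brief Track the MapPoints of the reference keyframe
 *
 * 1. Compute the bag of words of the current frame, assigning its feature points to nodes at a specific level
 * 2. Match descriptors that belong to the same node
 * 3. Estimate the pose of the current frame from the matches
 * 4. Reject mismatches based on the pose
 * @return true if the number of matches is greater than 10
 */
bool Tracking::TrackReferenceKeyFrame()
{
    // Compute Bag of Words vector
    // Step 1: convert the descriptors of the current frame into a BoW vector
    mCurrentFrame.ComputeBoW();

    // We perform first an ORB matching with the reference keyframe
    // If enough matches are found we setup a PnP solver
    ORBmatcher matcher(0.7,true);
    vector<MapPoint*> vpMapPointMatches;

    // Step 2: use the BoW of the feature points to speed up matching between the current frame and the reference frame
    // The matching relations of the feature points are maintained by MapPoints
    int nmatches = matcher.SearchByBoW(mpReferenceKF,mCurrentFrame,vpMapPointMatches);

    // ignore something unimportant

}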

Another call is in Tracking::Relocalization():

bool Tracking::Relocalization()
{
    // Compute Bag of Words Vector
    // Step 1: compute the BoW mapping of the feature points of the current frame
    mCurrentFrame.ComputeBoW();

    // Relocalization is performed when tracking is lost
    // Track Lost: Query KeyFrame Database for keyframe candidates for relocalisation
    // Step 2: find candidate keyframes similar to the current frame
    vector<KeyFrame*> vpCandidateKFs = mpKeyFrameDB->DetectRelocalizationCandidates(&mCurrentFrame);

    // No candidate keyframe similar to the current frame was found, so all we can do is return.
    if(vpCandidateKFs.empty())
        return false;

    const int nKFs = vpCandidateKFs.size();

    // We perform first an ORB matching with each candidate
    // If enough matches are found we setup a PnP solver
    ORBmatcher matcher(0.75,true);

    // ignore something unimportant

} // relocalization

Both of the calls above invoke the following function of the Frame class:

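/**
 * @brief Bag of Words Representation
 *
 * Computes the bag-of-words vectors mBowVec and mFeatVec
 * @see CreateInitialMapMonocular() TrackReferenceKeyFrame() Relocalization()
 */
void Frame::ComputeBoW()
{
    // This function only does any work when the bag of words of the current frame is empty.
    if(mBowVec.empty())
    {
        // 1. To fill in the bag-of-words information, the descriptors stored in
        //    OpenCV format are converted into a std::vector<cv::Mat>
        vector<cv::Mat> vCurrentDesc = Converter::toDescriptorVector(mDescriptors);
        mpORBvocabulary->transform(vCurrentDesc, // the vector of current descriptors
                                   mBowVec,      // output: the bag-of-words vector
                                   mFeatVec,     // output: the feature vector, which stores feature point indices
                                   4);           // selects the level whose node ids are recorded
                                                 // @todo the 4 here is the number of levels counted up from the leaves
    } // check whether the bag of words of the current frame is empty
}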

Inside ComputeBoW(), transform() does the core work, so let us next look at transform() in the ORB-SLAM2 source code.

template<class TDescriptor, class F>
void TemplatedVocabulary<TDescriptor,F>::transform(
    const std::vector<TDescriptor>& features,
    BowVector &v, FeatureVector &fv, int levelsup) const
{
    v.clear();
    fv.clear();

    if(empty()) // safe for subclasses
    {
        return;
    }

    // normalize
    LNorm norm;
    bool must = m_scoring_object->mustNormalize(norm);

    typename vector<TDescriptor>::const_iterator fit;

    if(m_weighting == TF || m_weighting == TF_IDF)
    {
        unsigned int i_feature = 0;
        for(fit = features.begin(); fit < features.end(); ++fit, ++i_feature)
        {
            WordId id;
            NodeId nid;
            WordValue w;
            // w is the idf value if TF_IDF, 1 if TF
            // id is the id of the word, w is the value of a word,
            // nid is the id of a node in the vocabulary tree
            // levelsup is the number of levels to go up
            transform(*fit, id, w, &nid, levelsup);

            if(w > 0) // not stopped
            {
                v.addWeight(id, w);
                fv.addFeature(nid, i_feature);
            }
        }

        if(!v.empty() && !must)
        {
            // unnecessary when normalizing
            const double nd = v.size();
            for(BowVector::iterator vit = v.begin(); vit != v.end(); vit++)
                vit->second /= nd;
        }
    }
    else // IDF || BINARY
    {
        unsigned int i_feature = 0;
        for(fit = features.begin(); fit < features.end(); ++fit, ++i_feature)
        {
            WordId id;
            NodeId nid;
            WordValue w;
            // w is idf if IDF, or 1 if BINARY
            transform(*fit, id, w, &nid, levelsup);

            if(w > 0) // not stopped
            {
                v.addIfNotExist(id, w);
                fv.addFeature(nid, i_feature);
            }
        }
    } // if m_weighting == ...

    if(must) v.normalize(norm);
}
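The TF / TF_IDF branch above accumulates the weight of each word into the bag-of-words vector and normalizes the result when the scoring method requires it. A minimal sketch of that accumulate-then-normalize idea follows, with a std::map standing in for the BowVector. ToyBowVector, add_weight and normalize_l1 are hypothetical names, not the DBoW2 API, and the L1 norm is used because the header line shown earlier selects L1_NORM scoring.

```cpp
#include <cmath>
#include <map>

using ToyBowVector = std::map<unsigned, double>;  // word id -> weight

// Accumulates the weight of a word; repeated words add up.
void add_weight(ToyBowVector& v, unsigned word_id, double w) {
    v[word_id] += w;  // a missing entry starts at 0.0
}

// L1 normalization: scale the entries so their absolute values sum to 1.
void normalize_l1(ToyBowVector& v) {
    double norm = 0.0;
    for (const auto& entry : v) norm += std::fabs(entry.second);
    if (norm <= 0.0) return;  // nothing to normalize
    for (auto& entry : v) entry.second /= norm;
}
```

For instance, adding word 5 twice with weight 1.0 and word 9 once with weight 2.0 gives the entries 2.0 and 2.0, which become 0.5 and 0.5 after normalization.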

In our tests, the function that is mainly entered from the code above is transform(*fit, id, w, &nid, levelsup), which is implemented as follows:

template<class TDescriptor, class F>
void TemplatedVocabulary<TDescriptor,F>::transform(const TDescriptor &feature,
    WordId &word_id, WordValue &weight, NodeId *nid, int levelsup) const
{
    // propagate the feature down the tree
    vector<NodeId> nodes;
    typename vector<NodeId>::const_iterator nit;

    // level at which the node must be stored in nid, if given
    const int nid_level = m_L - levelsup;
    if(nid_level <= 0 && nid != NULL) *nid = 0; // root

    NodeId final_id = 0; // root
    int current_level = 0;

    do
    {
        ++current_level;
        nodes = m_nodes[final_id].children;
        final_id = nodes[0];

        double best_d = F::distance(feature, m_nodes[final_id].descriptor);

        for(nit = nodes.begin() + 1; nit != nodes.end(); ++nit)
        {
            NodeId id = *nit;
            double d = F::distance(feature, m_nodes[id].descriptor);
            if(d < best_d)
            {
                best_d = d;
                final_id = id;
            }
        }

        if(nid != NULL && current_level == nid_level)
            *nid = final_id;

    } while( !m_nodes[final_id].isLeaf() );

    // turn node id into word id
    word_id = m_nodes[final_id].word_id;
    weight = m_nodes[final_id].weight;
}
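The descent above can be tried on a toy tree. The sketch below is a simplified illustration rather than the DBoW2 implementation: ToyNode, hamming and descend are hypothetical names, descriptors are shortened to a single byte instead of 256 bits, and the Hamming distance stands in for F::distance. Note that with m_L = 6 and levelsup = 4, as in ComputeBoW(), nid_level evaluates to 2.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical toy node; index 0 of the node vector is the root.
struct ToyNode {
    std::uint8_t descriptor = 0;     // 8-bit stand-in for a 256-bit ORB descriptor
    std::vector<unsigned> children;  // empty for leaves
    unsigned word_id = 0;            // meaningful for leaves only
};

// Hamming distance between two 8-bit descriptors.
int hamming(std::uint8_t a, std::uint8_t b) {
    return static_cast<int>(std::bitset<8>(static_cast<unsigned>(a ^ b)).count());
}

// Walks from the root to a leaf, choosing the closest child at each level.
// Returns the word id of the leaf; `node_at_level` receives the node visited
// at depth `nid_level`, or 0 (the root) if that depth is never reached.
unsigned descend(const std::vector<ToyNode>& nodes, std::uint8_t feature,
                 int nid_level, unsigned& node_at_level) {
    unsigned current = 0;
    int level = 0;
    node_at_level = 0;
    while (!nodes[current].children.empty()) {
        ++level;
        const std::vector<unsigned>& kids = nodes[current].children;
        unsigned best = kids[0];
        int best_d = hamming(feature, nodes[best].descriptor);
        for (std::size_t i = 1; i < kids.size(); ++i) {
            const int d = hamming(feature, nodes[kids[i]].descriptor);
            if (d < best_d) { best_d = d; best = kids[i]; }
        }
        current = best;
        if (level == nid_level) node_at_level = current;
    }
    return nodes[current].word_id;
}
```

Each level costs one distance computation per child, so a lookup needs about k x L comparisons instead of one comparison per word.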

Here is a brief description of the format of several of the variables above:

1) For vector<cv::Mat> vCurrentDesc, running the program shows that its contents look like this:

vCurrentDesc[0]: // an ORB descriptor: 32 x 8 = 256 bits

[49,158,107,235,182,167,111,255,86,235,255,230,115,227,176,96,127,238,22,188,187,189,109,191,254,239,167,192,189,202,240,185]

vCurrentDesc[1]:

[49,142,107,...,168]

vBowVec==

<6890,0.00112707>,<15246,0.0013725>,<18465,0.00143206> ...

vFeatVec==

<11:[308]>,<12:[143,320]>,<13:[719,721,827,828,830,832]>,<14:[107,216,321,475]>,<15:[433,434,441,836,837]>,<18:[831]>,<19:[92,144,181]>, ...

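Note: here 11 is a node id, and [308] lists the indices of the features in the image that fall under that node and are therefore considered similar.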
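The feature vector is what lets a matcher compare only features that fall under the same node, instead of every feature against every other. The sketch below illustrates the idea on two toy feature vectors. ToyFeatureVector and candidate_pairs are hypothetical names, and this is a simplified illustration rather than the logic of SearchByBoW(), which also compares descriptors before accepting a match.

```cpp
#include <map>
#include <utility>
#include <vector>

// node id -> indices of the features assigned to that node
using ToyFeatureVector = std::map<unsigned, std::vector<unsigned>>;

// Returns every (index in a, index in b) pair whose features share a node id.
std::vector<std::pair<unsigned, unsigned>> candidate_pairs(
        const ToyFeatureVector& a, const ToyFeatureVector& b) {
    std::vector<std::pair<unsigned, unsigned>> pairs;
    auto ita = a.begin();
    auto itb = b.begin();
    // Both maps are ordered by node id, so one merge-style pass is enough.
    while (ita != a.end() && itb != b.end()) {
        if (ita->first < itb->first) {
            ++ita;
        } else if (itb->first < ita->first) {
            ++itb;
        } else {
            for (unsigned ia : ita->second)
                for (unsigned ib : itb->second)
                    pairs.emplace_back(ia, ib);
            ++ita;
            ++itb;
        }
    }
    return pairs;
}
```

With the sample output above, features 143 and 320 (both under node 12) would only be compared against features of the other frame that also fall under node 12.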

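Of course, the per-frame bag-of-words computation in the source code provided by the ORB-SLAM2 authors can be optimized and accelerated further, for example with bit-shift operations or modified data structures (to reduce memory). These are not covered in detail here; you are welcome to discuss them in our academic community.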

III. Finding keyframes similar to the current frame during relocalization

The main function that implements this in the ORB-SLAM2 source code is vector<KeyFrame*> KeyFrameDatabase::DetectRelocalizationCandidates(Frame *F):

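/*
 * @brief Find keyframes that may form a loop with this keyframe during loop closure detection
 * 1. Find all keyframes that share words with the current frame (excluding keyframes connected to the current frame)
 * 2. Compute similarity only against keyframes that share many words
 * 3. Group each keyframe with its ten most strongly connected (highest-weight) keyframes and compute the accumulated score
 * 4. Return only the highest-scoring keyframe of the groups with a high accumulated score
 * @param pKF the keyframe for which a loop is sought
 * @param minScore minimum required similarity score
 * @return keyframes that may close a loop
 * @see III-E Bags of Words Place Recognition
 */
vector<KeyFrame*> KeyFrameDatabase::DetectRelocalizationCandidates(Frame *F)
{
    // Extract all KeyFrames connected to this pKF; these connected keyframes are locally
    // connected and are discarded during loop closure detection.
    // map and set are standard associative containers built on a very efficient balanced
    // binary search tree, the red-black tree. Insertion and deletion are cheaper than in
    // sequence containers because no memory has to be copied or moved; only the pointers
    // to the nodes are relinked.
    // The difference between set and vector is that a set holds no duplicate data. The
    // difference between set and map is that a set holds only keys, while a map holds a
    // key together with the value associated with it.
    list<KeyFrame*> lKFsSharingWords; // candidate frames that may form a loop with F (they share a word and are not locally connected frames)
    // "locally connected frames" here means keyframes that are covisible with the current keyframe

    // Search all keyframes that share a word with current keyframes
    // Discard keyframes connected to the query keyframe
    // Step 1: find all keyframes that share words with the current frame (excluding keyframes linked to the current frame)
    {
        unique_lock<mutex> lock(mMutex);

        // Words are the hub for checking whether images match; iterate over every word of this frame
        for(DBoW2::BowVector::const_iterator vit=F->mBowVec.begin(), vend=F->mBowVec.end(); vit != vend; vit++)
        {
            // Fetch all KeyFrames that contain this word
            list<KeyFrame*> &lKFs = mvInvertedFile[vit->first];

            // Then iterate over these keyframes
            for(list<KeyFrame*>::iterator lit=lKFs.begin(), lend= lKFs.end(); lit!=lend; lit++)
            {
                KeyFrame* pKFi=*lit;
                if(pKFi->mnRelocQuery!=F->mnId) // pKFi has not yet been seen for this query frame
                {
                    pKFi->mnRelocWords=0;
                    pKFi->mnRelocQuery=F->mnId; // mark pKFi as a candidate for F so that this branch is skipped from now on
                    lKFsSharingWords.push_back(pKFi);
                }
                pKFi->mnRelocWords++; // count the words that pKFi shares with F
            }
        }
    }

    // If no keyframe shares a word with this frame, return an empty result
    if(lKFsSharingWords.empty())
        return vector<KeyFrame*>();

    // Only compare against those keyframes that share enough words
    // Step 2: find the largest number of words that any candidate shares with the current frame F, and derive the threshold from it
    int maxCommonWords=0;
    for(list<KeyFrame*>::iterator lit=lKFsSharingWords.begin(), lend= lKFsSharingWords.end(); lit!=lend; lit++)
    {
        if((*lit)->mnRelocWords>maxCommonWords)
            maxCommonWords=(*lit)->mnRelocWords;
    }

    int minCommonWords = maxCommonWords*0.8f;

    list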
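The two steps visible in the listing above (voting through the inverted file, then deriving a threshold from the best vote count) can be sketched on toy data as follows. InvertedFile and filter_candidates are hypothetical names, keyframes are represented by plain integer ids, and the strict "greater than" comparison against the threshold is my assumption, since the excerpt ends before the threshold is applied.

```cpp
#include <algorithm>
#include <map>
#include <vector>

using InvertedFile = std::map<unsigned, std::vector<int>>;  // word id -> keyframe ids

// Counts the words each keyframe shares with the query, then keeps the keyframes
// whose count exceeds 80% of the best count (assumed comparison, see lead-in).
std::vector<int> filter_candidates(const InvertedFile& inverted,
                                   const std::vector<unsigned>& query_words) {
    std::map<int, int> shared;  // keyframe id -> number of shared words
    for (unsigned word : query_words) {
        auto it = inverted.find(word);
        if (it == inverted.end()) continue;  // no keyframe contains this word
        for (int kf : it->second) ++shared[kf];
    }

    int max_common = 0;
    for (const auto& entry : shared) max_common = std::max(max_common, entry.second);
    const int min_common = static_cast<int>(max_common * 0.8f);  // truncates like the listing

    std::vector<int> kept;
    for (const auto& entry : shared)
        if (entry.second > min_common) kept.push_back(entry.first);
    return kept;
}
```

Because the inverted file maps each word directly to the keyframes that contain it, only keyframes sharing at least one word with the query are ever touched.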
