Understanding the implementation of the cereal serialization library through an example

Published: 2023-03-22 21:13:53, Author: RayChenCode

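According to its GitHub page, cereal is a header-only C++11 serialization library. It takes arbitrary data types and reversibly turns them into different representations, such as compact binary encodings, XML, or JSON. Its design goals are to be fast, lightweight, and easy to extend.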

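  • Supported types: cereal supports nearly every standard library type, such as std::vector, std::list, std::map, std::shared_ptr, and std::unique_ptr, and it also supports serializing classes that use inheritance and polymorphism. Raw pointers and references, however, are not supported.
  • Supported compilers: g++ 4.7.3, clang++ 3.3, and MSVC 2013 (or newer). Older compilers may work, but this is not guaranteed.
  • Performance: according to the project's simple benchmarks, cereal is generally faster than Boost's serialization library and produces binary output that takes up less space.
  • The code is easier to understand than Boost's.
  • Binary, XML, and JSON archives are provided, and the library can be extended with additional archives and types.
  • The code is covered by unit tests, which gives some assurance of its quality.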

Reference: https://uscilab.github.io/cereal/

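Goal of this article

By reading one sample program closely, this article takes a look at cereal's internals, in the hope of gaining some understanding of how the library works.

Sample analysis

The sample code below is also taken from the cereal GitHub page. It serializes the member data of the struct SomeData and saves it in binary form to the file out.cereal. The member function templates save and load of SomeData, and the member function template serialize of MyRecord, are the typical cereal serialization interfaces. The following sections explain how they end up being called.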

#include <cereal/types/unordered_map.hpp>
#include <cereal/types/memory.hpp>
#include <cereal/archives/binary.hpp>
#include <fstream>
    
struct MyRecord
{
  uint8_t x, y;
  float z;

  template <class Archive>
  void serialize( Archive & ar )
  {
    ar( x, y, z );
  }
};
    
struct SomeData
{
  int32_t id;
  std::shared_ptr<std::unordered_map<uint32_t, MyRecord>> data;

  template <class Archive>
  void save( Archive & ar ) const
  {
    ar( data );
  }
      
  template <class Archive>
  void load( Archive & ar )
  {
    static int32_t idGen = 0;
    id = idGen++;
    ar( data );
  }
};

int main()
{
  std::ofstream os("out.cereal", std::ios::binary);
  cereal::BinaryOutputArchive archive( os );

  SomeData myData;
  archive( myData );

  return 0;
}

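Construction of the cereal::BinaryOutputArchive instance archive

BinaryOutputArchive derives from OutputArchive<BinaryOutputArchive, AllowEmptyClassElision>. At compile time the compiler instantiates the OutputArchive class template with the template arguments BinaryOutputArchive and AllowEmptyClassElision. Constructing a BinaryOutputArchive involves two initializations:

  • Initializing the member itsStream, which refers to a std::ostream
  • Constructing the instantiated base class object, passing it the this pointer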
cereal-1.3.2\include\cereal\archives\binary.hpp
      BinaryOutputArchive(std::ostream & stream) :
        OutputArchive<BinaryOutputArchive, AllowEmptyClassElision>(this),
        itsStream(stream)
      { }
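
The pattern of a derived class handing this to a base class templated on the derived type is the curiously recurring template pattern (CRTP). The following is a minimal sketch of that idea only, not cereal's code: ToyOutputBase, ToyStreamArchive, and write are hypothetical names invented for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical stand-in for the base archive: a CRTP base that keeps a
// pointer to the derived archive so it can call derived-specific code.
template <class Derived>
class ToyOutputBase {
public:
  explicit ToyOutputBase(Derived* derived) : self(derived) {}

  // Forwards to the derived class without any virtual dispatch.
  Derived& operator()(const std::string& text) {
    self->write(text);
    return *self;
  }

private:
  Derived* const self;
};

// Hypothetical stand-in for the concrete archive that owns the stream.
class ToyStreamArchive : public ToyOutputBase<ToyStreamArchive> {
public:
  explicit ToyStreamArchive(std::ostream& stream)
      : ToyOutputBase<ToyStreamArchive>(this),  // the base only stores the pointer
        itsStream(stream) {}

  void write(const std::string& text) { itsStream << text; }

private:
  std::ostream& itsStream;
};

// Usage: operator() returns the derived archive, so calls can be chained.
inline std::string crtp_demo() {
  std::ostringstream os;
  ToyStreamArchive archive(os);
  archive("a")("b");
  assert(os.str() == "ab");
  return os.str();
}
```

Passing this before the derived object is fully constructed is safe here because the base constructor only stores the pointer and does not dereference it.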

Going one level further, constructing the OutputArchive<BinaryOutputArchive, AllowEmptyClassElision> instance initializes the following members:

cereal-1.3.2\include\cereal\cereal.hpp
      OutputArchive(ArchiveType * const derived) : self(derived), itsCurrentPointerId(1), itsCurrentPolymorphicTypeId(1)
      { }
  1. self stores the this pointer of the derived instance
  2. itsCurrentPointerId holds the next pointer id; it is used in the member function registerSharedPointer
          auto ptrId = itsCurrentPointerId++;
          itsSharedPointerMap.insert( {addr, ptrId} );
  3. itsCurrentPolymorphicTypeId holds the next polymorphic type id; it is used in the member function registerPolymorphicType
          auto polyId = itsCurrentPolymorphicTypeId++;
          itsPolymorphicTypeMap.insert( {name, polyId} );

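In addition, the following containers are default-constructed implicitly:

  1. itsSharedPointerMap, of type std::unordered_map<void const *, std::uint32_t>, which maps a pointer address to its id

  2. itsPolymorphicTypeMap, of type std::unordered_map<char const *, std::uint32_t>, which maps a polymorphic type name to its id

  3. itsBaseClassSet, the set of all base classes that have been serialized

      //! A set of all base classes that have been serialized
      std::unordered_set<traits::detail::base_class_id, traits::detail::base_class_id_hash> itsBaseClassSet;
  4. itsVersionedTypes, a set used to keep track of type version information
    std::unordered_set<size_type> itsVersionedTypes;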

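The code uses std::unordered_set and std::unordered_map rather than std::set and std::map, presumably for performance. The unordered containers are hash tables with O(1) average lookup (O(N) in the worst case), whereas std::set and std::map are typically implemented as red-black trees with O(log N) lookup. None of these bookkeeping containers needs ordered traversal, so the hash-based versions are the natural choice.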
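
Bookkeeping sets of this kind only have to answer "has this been seen before?". A minimal sketch of that usage, assuming a hypothetical SeenTypes class keyed by std::type_index (cereal's own set stores a different key type):

```cpp
#include <cassert>
#include <typeindex>
#include <typeinfo>
#include <unordered_set>

// Hypothetical tracker: ordering is irrelevant, only membership matters,
// which is the case a hash set handles in O(1) on average.
class SeenTypes {
public:
  // Returns true only the first time a given type is registered.
  template <class T>
  bool firstTime() {
    // insert() returns a pair whose second member says whether the key was new.
    return seen.insert(std::type_index(typeid(T))).second;
  }

private:
  std::unordered_set<std::type_index> seen;
};

inline void seen_types_demo() {
  SeenTypes tracker;
  const bool a = tracker.firstTime<int>();     // new key
  const bool b = tracker.firstTime<int>();     // already recorded
  const bool c = tracker.firstTime<double>();  // a different type
  assert(a && !b && c);
}
```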

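The functor call archive(myData)

This call goes to the overloaded operator()(...), which makes the archive a function object (functor). Because it is a variadic template, it accepts any number of arguments of any type. std::forward passes each argument on to the member function process with its original value category preserved (perfect forwarding); it does not turn the arguments into lvalues.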

//! Serializes all passed in data
/*! This is the primary interface for serializing data with an archive */
template <class ... Types> inline
ArchiveType & operator()( Types && ... args )
{
  self->process( std::forward<Types>( args )... );
  return *self;
}
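
What std::forward preserves can be seen in a small stand-alone sketch. The functions category, relay, and relayWithoutForward are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Two overloads that report which value category they received.
inline std::string category(int&)  { return "lvalue"; }
inline std::string category(int&&) { return "rvalue"; }

// Same shape as operator() above: a forwarding reference plus std::forward.
template <class T>
std::string relay(T&& arg) {
  return category(std::forward<T>(arg));
}

// Without std::forward, the named parameter is always an lvalue.
template <class T>
std::string relayWithoutForward(T&& arg) {
  return category(arg);
}

inline void forwarding_demo() {
  int value = 1;
  assert(relay(value) == "lvalue");            // T is deduced as int&
  assert(relay(2) == "rvalue");                // T is deduced as int
  assert(relayWithoutForward(2) == "lvalue");  // the category is lost
}
```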

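At compile time, for the call archive( myData ), the parameter pack Types is deduced as SomeData & (myData is an lvalue), so the call expands to self->process( std::forward<SomeData &>( myData ) ). With a single argument, overload resolution picks the one-argument process shown below, which calls self->processImpl( head ). With several arguments, the variadic overload peels them off one at a time:
void process( T && head, Other && ... tail ) --> self->process( std::forward<T>( head ) ) --> self->processImpl( head )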

//! Serializes data after calling prologue, then calls epilogue
template <class T> inline
void process( T && head )
{
  prologue( *self, head );
  self->processImpl( head );
  epilogue( *self, head );
}
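
The head/tail recursion and the prologue/epilogue bracketing can be modelled with a toy archive that records what happens in which order. This is a simplified sketch under stated assumptions: TraceArchive is hypothetical, handles only values accepted by std::to_string, and stands in for the real hooks with plain trace entries.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy archive that records the order of events in a trace.
class TraceArchive {
public:
  template <class... Types>
  TraceArchive& operator()(Types&&... args) {
    process(std::forward<Types>(args)...);
    return *this;
  }

  std::vector<std::string> trace;

private:
  // One argument: wrap the actual work between the two hooks.
  template <class T>
  void process(T&& head) {
    trace.push_back("prologue");
    trace.push_back("value " + std::to_string(head));
    trace.push_back("epilogue");
  }

  // Several arguments: handle the head, then recurse on the tail.
  template <class T, class... Other>
  void process(T&& head, Other&&... tail) {
    process(std::forward<T>(head));
    process(std::forward<Other>(tail)...);
  }
};

inline std::vector<std::string> trace_demo() {
  TraceArchive ar;
  ar(1, 2);  // two arguments, so the variadic overload runs first
  assert(ar.trace.size() == 6);
  return ar.trace;
}
```

Each value ends up between its own prologue and epilogue entry, in argument order.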

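Next, the PROCESS_IF macro, which is built on cereal's traits, comes into play. Because SomeData has a member function save, SFINAE leaves only the member-split (save) overload of processImpl viable, and the compiler selects it.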

//! Helper macro that expands the requirements for activating an overload
/*! Requirements:
      Has the requested serialization function
      Does not have version and unversioned at the same time
      Is output serializable AND
        is specialized for this type of function OR
        has no specialization at all */
#define PROCESS_IF(name)                                                             \
traits::EnableIf<traits::has_##name<T, ArchiveType>::value,                          \
                  !traits::has_invalid_output_versioning<T, ArchiveType>::value,      \
                  (traits::is_output_serializable<T, ArchiveType>::value &&           \
                  (traits::is_specialized_##name<T, ArchiveType>::value ||           \
                    !traits::is_specialized<T, ArchiveType>::value))> = traits::sfinae

//! Member split (save)
template <class T, PROCESS_IF(member_save)> inline
ArchiveType & processImpl(T const & t)
{
  access::member_save(*self, t);
  return *self;
}
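
The selection mechanism can be reproduced in miniature with standard type traits. The sketch below is an assumption-laden simplification: has_member_save, has_member_serialize, chosenOverload, and ToyArchive are hypothetical names, and the real traits check considerably more conditions (versioning, specialization) than this does.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

struct ToyArchive {};  // hypothetical archive type, used only for detection

// Detects a const-callable member save(Archive&) via expression SFINAE.
template <class T, class Archive, class = void>
struct has_member_save : std::false_type {};

template <class T, class Archive>
struct has_member_save<T, Archive,
    decltype(std::declval<const T&>().save(std::declval<Archive&>()), void())>
    : std::true_type {};

// Detects a member serialize(Archive&) the same way.
template <class T, class Archive, class = void>
struct has_member_serialize : std::false_type {};

template <class T, class Archive>
struct has_member_serialize<T, Archive,
    decltype(std::declval<T&>().serialize(std::declval<Archive&>()), void())>
    : std::true_type {};

// For a given T, only the overload whose condition holds survives substitution.
template <class T>
typename std::enable_if<has_member_save<T, ToyArchive>::value, std::string>::type
chosenOverload(const T&) { return "member save"; }

template <class T>
typename std::enable_if<has_member_serialize<T, ToyArchive>::value, std::string>::type
chosenOverload(const T&) { return "member serialize"; }

struct WithSave      { template <class A> void save(A&) const {} };
struct WithSerialize { template <class A> void serialize(A&) {} };

inline void dispatch_demo() {
  assert(chosenOverload(WithSave{}) == "member save");
  assert(chosenOverload(WithSerialize{}) == "member serialize");
}
```

In this sketch a type providing both functions would make the call ambiguous, which is why each type should offer exactly one serialization interface.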

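Expanding access::member_save() leads to a static member function template of the access class. CEREAL_SAVE_FUNCTION_NAME is defined as save, so the call finally reaches the save member function template inside SomeData.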

#ifndef CEREAL_SAVE_FUNCTION_NAME
//! The serialization (save) function name to search for.
/*! You can define @c CEREAL_SAVE_FUNCTION_NAME to be different assuming you do so
    before this file is included. */
#define CEREAL_SAVE_FUNCTION_NAME save
#endif // CEREAL_SAVE_FUNCTION_NAME
template<class Archive, class T> inline
static auto member_save(Archive & ar, T const & t) -> decltype(t.CEREAL_SAVE_FUNCTION_NAME(ar))
{ return t.CEREAL_SAVE_FUNCTION_NAME(ar); }
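
Routing every member call through one helper class has a practical benefit: a type can keep its save function private and befriend only that helper (cereal's documentation describes declaring friend class cereal::access; for this purpose). A minimal sketch of the idea, with toy_access, LogArchive, and Secret as hypothetical names:

```cpp
#include <cassert>
#include <string>

// Hypothetical miniature of an access helper that types can befriend.
class toy_access {
public:
  // The trailing return type removes this overload when t.save(ar) is ill-formed.
  template <class Archive, class T>
  static auto member_save(Archive& ar, const T& t) -> decltype(t.save(ar)) {
    return t.save(ar);
  }
};

// Hypothetical archive that just collects text.
struct LogArchive {
  std::string log;
};

class Secret {
public:
  explicit Secret(int v) : value(v) {}

private:
  friend class toy_access;  // grants the helper access to the private save

  template <class Archive>
  void save(Archive& ar) const {
    ar.log += "Secret(" + std::to_string(value) + ")";
  }

  int value;
};

inline std::string access_demo() {
  LogArchive ar;
  Secret s(7);
  toy_access::member_save(ar, s);  // calling s.save(ar) directly would not compile
  assert(ar.log == "Secret(7)");
  return ar.log;
}
```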

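At this point t has type SomeData const &. Its member function template save is called with the archive as its argument, and inside save the call ar( data ) is made, where data is the member variable of SomeData: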

std::shared_ptr<std::unordered_map<uint32_t, MyRecord>> data;

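archive() on the smart pointer data

Calling the archive functor on the smart pointer expands in much the same way as for SomeData. The differences are:

  1. processImpl resolves to the overload for non-member functions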
    //! Non member split (save)
    template <class T, PROCESS_IF(non_member_save)> inline
    ArchiveType & processImpl(T const & t)
    {
      CEREAL_SAVE_FUNCTION_NAME(*self, t);
      return *self;
    }
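  2. The non-member save function that cereal already provides is selected
    The code is shown below. As its PtrWrapper parameter indicates, this overload receives the shared pointer wrapped in memory_detail::PtrWrapper. It registers the pointer to obtain an id (the archive caches it), serializes ("id", id), and, only when the most significant bit of id is set, serializes the object the pointer refers to as ("data", *ptr).
    NVP stands for name-value pair.
    include\cereal\types\memory.hpp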
    //! Saving std::shared_ptr (wrapper implementation)
    /*! @internal */
    template <class Archive, class T> inline
    void CEREAL_SAVE_FUNCTION_NAME( Archive & ar, memory_detail::PtrWrapper<std::shared_ptr<T> const &> const & wrapper )
    {
      auto & ptr = wrapper.ptr;

      uint32_t id = ar.registerSharedPointer( ptr );
      ar( CEREAL_NVP_("id", id) );

      if( id & detail::msb_32bit )
      {
        ar( CEREAL_NVP_("data", *ptr) );
      }
    }
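
The id scheme can be modelled in a few lines. The sketch below assumes the behaviour the code above suggests: null pointers get id 0, a pointer seen for the first time gets its id with the most significant bit set, and later sightings get the bare id so that the pointee is written only once. PointerRegistry and kFirstTimeFlag are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Assumed flag value: the most significant bit of a 32-bit id.
const std::uint32_t kFirstTimeFlag = 0x80000000u;

// Hypothetical registry that hands out ids per pointer address.
class PointerRegistry {
public:
  // Returns 0 for null, id | flag on first sight, and the bare id afterwards.
  std::uint32_t registerPointer(const std::shared_ptr<const void>& ptr) {
    const void* addr = ptr.get();
    if (addr == nullptr) return 0;

    auto found = ids.find(addr);
    if (found != ids.end()) return found->second;

    const std::uint32_t id = nextId++;
    ids.insert({addr, id});
    return id | kFirstTimeFlag;
  }

private:
  std::uint32_t nextId = 1;
  std::unordered_map<const void*, std::uint32_t> ids;
};

inline void registry_demo() {
  PointerRegistry registry;
  auto p = std::make_shared<int>(42);
  auto alias = p;  // second owner of the same object

  const std::uint32_t first  = registry.registerPointer(p);
  const std::uint32_t second = registry.registerPointer(alias);
  const std::uint32_t null   = registry.registerPointer(std::shared_ptr<int>());

  assert(first == (1u | kFirstTimeFlag));  // pointee would be written here
  assert(second == 1u);                    // only the id would be written
  assert(null == 0u);
}
```

In the sample program myData.data is never assigned, so under this scheme only the id 0 would be emitted for it.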

The std::unordered_map case: ar( CEREAL_NVP_("data", *ptr) )

Skipping ar( CEREAL_NVP_("id", id) ), we dig further into ar( CEREAL_NVP_("data", *ptr) ). The expansion is similar, but this time it ends at the save function template for std::unordered_map:

std::unordered_map<uint32_t, MyRecord>

include\cereal\types\concepts\pair_associative_container.hpp
//! Saving for std-like pair associative containers
template <class Archive, template <typename...> class Map, typename... Args, typename = typename Map<Args...>::mapped_type> inline
void CEREAL_SAVE_FUNCTION_NAME( Archive & ar, Map<Args...> const & map )
{
  ar( make_size_tag( static_cast<size_type>(map.size()) ) );

  for( const auto & i : map )
    ar( make_map_item(i.first, i.second) );
}
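
Two things happen in that function: the defaulted template parameter restricts it to containers exposing mapped_type, and the size is emitted before the items. A small sketch of both, assuming a hypothetical traceMapSave that writes into a vector of strings instead of an archive, and using std::map so that the iteration order is deterministic:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Matches any class template whose instantiation has a mapped_type member
// (std::map, std::unordered_map, ...); other containers drop out via SFINAE.
template <template <typename...> class Map, typename... Args,
          typename = typename Map<Args...>::mapped_type>
void traceMapSave(std::vector<std::string>& out, const Map<Args...>& map) {
  // The size goes first so that a reader knows how many items follow.
  out.push_back("size " + std::to_string(map.size()));
  for (const auto& item : map) {
    out.push_back("key " + std::to_string(item.first));
    out.push_back("value " + std::to_string(item.second));
  }
}

inline std::vector<std::string> map_save_demo() {
  std::map<int, int> m;
  m[1] = 10;
  m[2] = 20;

  std::vector<std::string> out;
  traceMapSave(out, m);
  assert(out.size() == 5);
  return out;
}
```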


include\cereal\details\helpers.hpp
/*! @internal */
template <class Key, class Value>
struct MapItem
{
  using KeyType = typename std::conditional<
    std::is_lvalue_reference<Key>::value,
    Key,
    typename std::decay<Key>::type>::type;

  using ValueType = typename std::conditional<
    std::is_lvalue_reference<Value>::value,
    Value,
    typename std::decay<Value>::type>::type;

  //! Construct a MapItem from a key and a value
  /*! @internal */
  MapItem( Key && key_, Value && value_ ) : key(std::forward<Key>(key_)), value(std::forward<Value>(value_)) {}

  MapItem & operator=( MapItem const & ) = delete;

  KeyType key;
  ValueType value;

  //! Serialize the MapItem with the NVPs "key" and "value"
  template <class Archive> inline
  void CEREAL_SERIALIZE_FUNCTION_NAME(Archive & archive)
  {
    archive( make_nvp<Archive>("key",   key),
            make_nvp<Archive>("value", value) );
  }
};
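
The KeyType and ValueType aliases decide whether the wrapper refers to the caller's object or owns a copy. The sketch below isolates that rule; stored_type, ToyItem, and makeToyItem are hypothetical names used only here.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

// Same selection rule as KeyType/ValueType above: keep lvalue references as
// references, store everything else by value.
template <class T>
struct stored_type {
  using type = typename std::conditional<
      std::is_lvalue_reference<T>::value,
      T,
      typename std::decay<T>::type>::type;
};

static_assert(std::is_same<stored_type<int&>::type, int&>::value,
              "lvalues are kept by reference");
static_assert(std::is_same<stored_type<int&&>::type, int>::value,
              "rvalues are stored by value");

// Hypothetical single-field wrapper built on the rule above.
template <class Value>
struct ToyItem {
  using ValueType = typename stored_type<Value>::type;
  explicit ToyItem(Value&& v) : value(std::forward<Value>(v)) {}
  ValueType value;
};

template <class Value>
ToyItem<Value> makeToyItem(Value&& v) {
  return ToyItem<Value>(std::forward<Value>(v));
}

inline void item_demo() {
  int x = 1;
  auto viewing = makeToyItem(x);  // Value is int&, so the item refers to x
  x = 2;
  assert(viewing.value == 2);

  auto owning = makeToyItem(5);   // Value is int, so the item owns a copy
  assert(owning.value == 5);
}
```

Keeping references for lvalues avoids copying every key and value while a map is being saved.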

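Serialization of primitive types

Going deeper step by step, the call chain eventually reaches MyRecord's serialize function, and from there the serialization of arithmetic types: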

//! Saving for POD types to binary
template<class T> inline
typename std::enable_if<std::is_arithmetic<T>::value, void>::type
CEREAL_SAVE_FUNCTION_NAME(BinaryOutputArchive & ar, T const & t)
{
  ar.saveBinary(std::addressof(t), sizeof(t));
}

include\cereal\archives\binary.hpp 
//! Writes size bytes of data to the output stream
void saveBinary( const void * data, std::streamsize size )
{
    auto const writtenSize = itsStream.rdbuf()->sputn( reinterpret_cast<const char*>( data ), size );

    if(writtenSize != size)
      throw Exception("Failed to write " + std::to_string(size) + " bytes to output stream! Wrote " + std::to_string(writtenSize));
}
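
The last two functions can be exercised in isolation against an in-memory stream. This is a sketch using only the standard library: writeBytes and savePod are hypothetical stand-ins, and std::runtime_error replaces cereal's own exception type.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <sstream>
#include <stdexcept>
#include <string>
#include <type_traits>

// Writes raw bytes through the stream buffer and reports short writes.
inline void writeBytes(std::ostream& os, const void* data, std::streamsize size) {
  const std::streamsize written =
      os.rdbuf()->sputn(reinterpret_cast<const char*>(data), size);
  if (written != size)
    throw std::runtime_error("short write: " + std::to_string(written));
}

// Only arithmetic types are copied byte for byte.
template <class T>
typename std::enable_if<std::is_arithmetic<T>::value, void>::type
savePod(std::ostream& os, const T& value) {
  writeBytes(os, std::addressof(value), sizeof(value));
}

// Writes the three fields of a MyRecord-like value one by one.
inline std::size_t record_bytes_demo() {
  std::ostringstream os;
  const std::uint8_t x = 1, y = 2;
  const float z = 3.0f;
  savePod(os, x);
  savePod(os, y);
  savePod(os, z);
  return os.str().size();
}
```

Because each field is written separately, the output holds exactly the sum of the field sizes, with no struct padding in between.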

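Summary

This article has roughly traced the internal call chain of a cereal serialization. Algorithmically, the code performs a depth-first traversal of the data structure: it starts from the node myData, keeps descending until it reaches primitive data types, which it serializes, and through this recursion produces the final serialized byte stream. In terms of design, the code relies heavily on templates, and the ingenuity of that design is impressive.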