How gccgo implements import & export

Posted 2023-10-30 20:10:41, author: tsecer

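The problem

A common complaint about C++ is compilation speed: large C++ projects (Chrome, for example) take a long time to build, in both the compile and the link steps. This makes the edit-build cycle on large C++ projects painful.

As a newer language, Go was designed with build time in mind and aims for fast compilation.

So how is that speedup achieved?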

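What the GCC documentation says

There are two main Go compiler implementations today: the official gc toolchain from the Go project, and gofrontend, a frontend shared by GCC (gccgo) and LLVM (gollvm) that uses libgo as its runtime library. Since I know GCC better, I use it as the subject of this analysis.

The GCC documentation on importing and exporting is brief, but it states the key point concisely: this information is stored directly in the object file ("the export information will be stored directly in the object file").

When gccgo compiles a package which exports anything, the export information will be stored directly in the object file. When a package is imported, gccgo must be able to find the file.

The analysis below is based roughly on GCC 7.3.0.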

go compiler

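In terms of source layout, Go is organized like the other languages GCC supports (C/C++, Fortran, Ada and so on): the gcc/ directory of the source tree contains a separate go/ directory. Compared with C++, the go directory has far fewer and smaller files (perhaps one reason most people feel intuitively that Go is much simpler than C++), and it contains a gofrontend subdirectory.

To enable Go, add it to the "--enable-languages=" option when running GCC's configure (--enable-languages=go).

After the build, the Go compiler proper is called go1, by analogy with cc1 for C and cc1plus for C++ (like cc1, it only translates source code into assembly).

Compiling a Go source file with go1 produces an assembly file, just as cc1 does; no separate export file or similar artifact is generated.

A first experiment

The test code is simple: helloworld.go (package harry) defines two functions, and use.go imports it and references them.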

  • Definition file
///@file: helloworld.go
package harry

func tsecer(n1, n2 int) int {
    tt := n1
    if tt > 0 {
        _ = n1 + n2
    }
    return tt
}

func Tsecer(n1, n2 int) int {
    tt := n1
    if tt > 0 {
        _ = n1 + n2
    }
    return tt
}
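  • Referencing file
///@file: use.go
package use

import (
    "helloworld" // import path; the imported package's name is harry
)

func ref(n1 int) int {
    return harry.Tsecer(n1, n1)
}

func use(n1 int) int {
    return harry.tsecer(n1, n1) // compile error: tsecer is not exported
}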

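An aside: only identifiers whose names begin with an upper-case letter are exported, so the call to harry.tsecer in use.go is rejected while harry.Tsecer is accepted. This also explains why the hello world example in The Go Programming Language calls fmt.Println with a capital P.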
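The export rule itself fits in a few lines of Go. The sketch below is only an illustration, not gccgo's code: the helper name isExported is hypothetical (the standard library already offers go/token.IsExported for the same check). Following the Go specification, it treats a name as exported when its first character is an upper-case Unicode letter.

```go
package main

import (
	"fmt"
	"unicode"
	"unicode/utf8"
)

// isExported (hypothetical helper) reports whether a Go identifier is
// exported: its first character must be an upper-case Unicode letter.
func isExported(name string) bool {
	r, _ := utf8.DecodeRuneInString(name) // RuneError for "", which is not upper case
	return unicode.IsUpper(r)
}

func main() {
	for _, name := range []string{"Tsecer", "tsecer", "Println"} {
		fmt.Printf("%s exported: %v\n", name, isExported(name))
	}
}
```

Decoding the first rune, rather than looking at the first byte, keeps the check correct for non-ASCII identifiers.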

import

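The Go export/import code is easy to find in the GCC source, because it sits in the conspicuously named export.cc and import.cc. Since import is the downstream half of the export/import pipeline, its implementation is the more informative one to read.

The flow looks almost casual: it matches agreed-upon magic strings in the byte stream, and once a magic string is found it continues by matching and processing a fixed sequence of strings.

Following this flow, the next items, such as "package ", "prefix " and "import", are all parsed as plain strings.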

///@file:gcc\go\gofrontend\import.cc

// Current version magic string.
const char Export::cur_magic[Export::magic_len] =
  {
    'v', '2', ';', '\n'
  };

// Magic string for previous version (still supported)
const char Export::v1_magic[Export::magic_len] =
  {
    'v', '1', ';', '\n'
  };



// Import the data in the associated stream.

Package*
Import::import(Gogo* gogo, const std::string& local_name,
	       bool is_local_name_exported)
{
  // Hold on to the Gogo structure.  Otherwise we need to pass it
  // through all the import functions, because we need it when reading
  // a type.
  this->gogo_ = gogo;

  // A stream of export data can include data from more than one input
  // file.  Here we loop over each input file.
  Stream* stream = this->stream_;
  while (!stream->at_eof() && !stream->saw_error())
    {
      // The vector of types is package specific.
      this->types_.clear();

      // Check magic string / version number.
      if (stream->match_bytes(Export::cur_magic, Export::magic_len))
	{
	  stream->require_bytes(this->location_, Export::cur_magic,
	                        Export::magic_len);
	  this->version_ = EXPORT_FORMAT_CURRENT;
	}
      else if (stream->match_bytes(Export::v1_magic, Export::magic_len))
	{
	  stream->require_bytes(this->location_, Export::v1_magic,
	                        Export::magic_len);
	  this->version_ = EXPORT_FORMAT_V1;
	}
      else
	{
	  go_error_at(this->location_,
		      ("error in import data at %d: invalid magic string"),
		      stream->pos());
	  return NULL;
	}

      this->require_c_string("package ");
      std::string package_name = this->read_identifier();
      this->require_c_string(";\n");

      std::string pkgpath;
      std::string pkgpath_symbol;
      if (this->match_c_string("prefix "))
	{
	  this->advance(7);
	  std::string unique_prefix = this->read_identifier();
	  this->require_c_string(";\n");
	  pkgpath = unique_prefix + '.' + package_name;
	  pkgpath_symbol = (Gogo::pkgpath_for_symbol(unique_prefix) + '.'
			    + Gogo::pkgpath_for_symbol(package_name));
	}
      else
	{
	  this->require_c_string("pkgpath ");
	  pkgpath = this->read_identifier();
	  this->require_c_string(";\n");
	  pkgpath_symbol = Gogo::pkgpath_for_symbol(pkgpath);
	}

      this->package_ = gogo->add_imported_package(package_name, local_name,
						  is_local_name_exported,
						  pkgpath, pkgpath_symbol,
						  this->location_,
						  &this->add_to_globals_);
      if (this->package_ == NULL)
	{
	  stream->set_saw_error();
	  return NULL;
	}

      // Read and discard priority if older V1 export data format.
      if (version() == EXPORT_FORMAT_V1)
	{
	  this->require_c_string("priority ");
	  std::string priority_string = this->read_identifier();
	  int prio;
	  if (!this->string_to_int(priority_string, false, &prio))
	    return NULL;
	  this->require_c_string(";\n");
	}

      while (stream->match_c_string("package"))
	this->read_one_package();

      while (stream->match_c_string("import"))
	this->read_one_import();

      if (stream->match_c_string("init"))
	this->read_import_init_fns(gogo);

      // Loop over all the input data for this package.
      while (!stream->saw_error())
	{
	  if (stream->match_c_string("const "))
	    this->import_const();
	  else if (stream->match_c_string("type "))
	    this->import_type();
	  else if (stream->match_c_string("var "))
	    this->import_var();
	  else if (stream->match_c_string("func "))
	    this->import_func(this->package_);
	  else if (stream->match_c_string("checksum "))
	    break;
	  else
	    {
	      go_error_at(this->location_,
			  ("error in import data at %d: "
			   "expected %<const%>, %<type%>, %<var%>, "
			   "%<func%>, or %<checksum%>"),
			  stream->pos());
	      stream->set_saw_error();
	      return NULL;
	    }
	}

      // We currently ignore the checksum.  In the future we could
      // store the checksum somewhere in the generated object and then
      // verify that the checksum matches at link time or at dynamic
      // load time.
      this->require_c_string("checksum ");
      stream->advance(Export::checksum_len * 2);
      this->require_c_string(";\n");
    }

  return this->package_;
}
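To make the header handling concrete, here is a minimal Go sketch of the part of Import::import that runs before the per-declaration loop: check the magic string, read "package NAME;", then either "prefix P;" (giving pkgpath P.NAME) or "pkgpath P;". The names parseHeader and exportHeader are hypothetical; the sketch assumes the whole stream is already in memory, and it skips the v1 "priority" line and everything after the header.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// exportHeader (hypothetical) holds the fields read from the stream start.
type exportHeader struct {
	Version string // "v1" or "v2"
	Name    string // package name, e.g. "harry"
	PkgPath string // prefix + "." + name, or the explicit pkgpath
}

// parseHeader mirrors the header steps of Import::import in simplified form
// and returns the unparsed remainder of the stream.
func parseHeader(data string) (exportHeader, string, error) {
	var h exportHeader
	switch {
	case strings.HasPrefix(data, "v2;\n"):
		h.Version = "v2"
	case strings.HasPrefix(data, "v1;\n"):
		h.Version = "v1"
	default:
		return h, data, errors.New("invalid magic string")
	}
	rest := data[4:]

	// readField requires keyword at the current position, then reads up to ";\n".
	// On failure, rest is left untouched so another keyword can be tried.
	readField := func(keyword string) (string, bool) {
		if !strings.HasPrefix(rest, keyword) {
			return "", false
		}
		end := strings.Index(rest, ";\n")
		if end < len(keyword) {
			return "", false
		}
		value := rest[len(keyword):end]
		rest = rest[end+2:]
		return value, true
	}

	name, ok := readField("package ")
	if !ok {
		return h, rest, errors.New("expected package clause")
	}
	h.Name = name
	if prefix, ok := readField("prefix "); ok {
		h.PkgPath = prefix + "." + name
	} else if path, ok := readField("pkgpath "); ok {
		h.PkgPath = path
	} else {
		return h, rest, errors.New("expected prefix or pkgpath")
	}
	return h, rest, nil
}

func main() {
	// Beginning of the .go_export data of helloworld (see the .s listing).
	data := "v2;\npackage harry;\nprefix go;\npackage harry go.harry go.harry;\n"
	h, rest, err := parseHeader(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", h)
	fmt.Printf("remaining: %q\n", rest)
}
```

On this input the sketch yields pkgpath "go.harry", matching the "package harry go.harry go.harry" line that gccgo writes next, and it leaves that line as the remainder for a later stage to handle, much as read_one_package does in the real code.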

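.s file

Compiling the file with go1 produces a single .s file. At the top of the file is a .go_export section whose contents are all strings emitted with .ascii directives. Note that only the exported Tsecer appears there; the unexported tsecer does not.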

tsecer@harry: cat helloworld.s 
        .file   "helloworld.go"
        .section        .go_export,"",@progbits
        .ascii  "v2;\n"
        .ascii  "package "
        .ascii  "harry"
        .ascii  ";\n"
        .ascii  "prefix "
        .ascii  "go"
        .ascii  ";\n"
        .ascii  "package "
        .ascii  "harry"
        .ascii  " "
        .ascii  "go.harry"
        .ascii  " "
        .ascii  "go.harry"
        .ascii  ";\n"
        .ascii  "func "
        .ascii  "Tsecer"
        .ascii  " ("
        .ascii  "n1"
        .ascii  " "
        .ascii  "<type -11>"
        .ascii  ", "
        .ascii  "n2"
        .ascii  " "
        .ascii  "<type -11>"
        .ascii  ")"
        .ascii  " "
        .ascii  "<type -11>"
        .ascii  ";\n"
        .ascii  "checksum 167C8CF9A77D750385FD680EC6A1268A73F1F5D1;\n"
        .text

.o file

Unsurprisingly, the generated object file contains the .go_export section defined in the assembly file.

tsecer@harry: readelf -a helloworld.o 
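ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           Advanced Micro Devices X86-64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          2192 (bytes into file)
  Flags:                             0x0
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         24
  Section header string table index: 23

Section Headers:
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align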
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .text             PROGBITS         0000000000000000  00000040
       000000000000008e  0000000000000000  AX       0     0     1
  [ 2] .rela.text        RELA             0000000000000000  000005d0
       0000000000000030  0000000000000018   I      21     1     8
  [ 3] .data             PROGBITS         0000000000000000  000000ce
       0000000000000000  0000000000000000  WA       0     0     1
  [ 4] .bss              NOBITS           0000000000000000  000000ce
       0000000000000000  0000000000000000  WA       0     0     1
  [ 5] .go_export        PROGBITS         0000000000000000  000000ce
       00000000000000a9  0000000000000000           0     0     1
  [ 6] .rodata.go.harry. PROGBITS         0000000000000000  00000178
       0000000000000008  0000000000000000   A       0     0     8
  [ 7] .rela.rodata.go.h RELA             0000000000000000  00000600
       0000000000000018  0000000000000018   I      21     6     8
  [ 8] .note.GNU-split-s PROGBITS         0000000000000000  00000180
       0000000000000000  0000000000000000           0     0     1
  [ 9] .debug_info       PROGBITS         0000000000000000  00000180
       0000000000000064  0000000000000000           0     0     1
  [10] .rela.debug_info  RELA             0000000000000000  00000618
       0000000000000108  0000000000000018   I      21     9     8
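The size of .go_export, 0xa9, can be cross-checked against the .s file: .ascii emits each string without a terminating NUL, so the section is just the concatenation of the .ascii operands. This small Go program (an illustration; the names are hypothetical) adds them up.

```go
package main

import "fmt"

// asciiDirectives lists, in order, the operands of the .ascii directives
// in the .go_export section of helloworld.s.
var asciiDirectives = []string{
	"v2;\n", "package ", "harry", ";\n",
	"prefix ", "go", ";\n",
	"package ", "harry", " ", "go.harry", " ", "go.harry", ";\n",
	"func ", "Tsecer", " (", "n1", " ", "<type -11>", ", ",
	"n2", " ", "<type -11>", ")", " ", "<type -11>", ";\n",
	"checksum 167C8CF9A77D750385FD680EC6A1268A73F1F5D1;\n",
}

// sectionSize returns the bytes the assembler emits for these directives:
// .ascii (unlike .asciz) adds no trailing NUL, so it is a plain sum.
func sectionSize(parts []string) int {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	return n
}

func main() {
	n := sectionSize(asciiDirectives)
	fmt.Printf("%d bytes (0x%x)\n", n, n) // prints "169 bytes (0xa9)"
}
```

The total, 169 bytes, equals the Size column of the .go_export entry in the readelf output, so the section really is nothing but that text.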

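Import::import revisited

The object file shows that the export data lives in the ".go_export" section. Accordingly, the implementation parses the object file, seeks to the position of that section and reads from there. So even if the object file produced by compiling Go code is large, this file-based approach keeps disk reads to a minimum and reads only the data that actually needs to be imported.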

///@file: gcc\go\go-backend.c
/* The segment name we pass to simple_object_start_read to find Go
   export data.  */

#ifndef GO_EXPORT_SEGMENT_NAME
#define GO_EXPORT_SEGMENT_NAME "__GNU_GO"
#endif

/* The section name we use when reading and writing export data.  */

#ifndef GO_EXPORT_SECTION_NAME
#define GO_EXPORT_SECTION_NAME ".go_export"
#endif
///...

/* The go_read_export_data function is called by the Go frontend
   proper to read Go export data from an object file.  FD is a file
   descriptor open for reading.  OFFSET is the offset within the file
   where the object file starts; this will be 0 except when reading an
   archive.  On success this returns NULL and sets *PBUF to a buffer
   allocated using malloc, of size *PLEN, holding the export data.  If
   the data is not found, this returns NULL and sets *PBUF to NULL and
   *PLEN to 0.  If some error occurs, this returns an error message
   and sets *PERR to an errno value or 0 if there is no relevant
   errno.  */

const char *
go_read_export_data (int fd, off_t offset, char **pbuf, size_t *plen,
		     int *perr)
{
  simple_object_read *sobj;
  const char *errmsg;
  off_t sec_offset;
  off_t sec_length;
  int found;
  char *buf;
  ssize_t c;

  *pbuf = NULL;
  *plen = 0;

  sobj = simple_object_start_read (fd, offset, GO_EXPORT_SEGMENT_NAME,
				   &errmsg, perr);
  if (sobj == NULL)
    {
      /* If we get an error here, just pretend that we didn't find any
	 export data.  This is the right thing to do if the error is
	 that the file was not recognized as an object file.  This
	 will ignore file I/O errors, but it's not too big a deal
	 because we will wind up giving some other error later.  */
      return NULL;
    }

  found = simple_object_find_section (sobj, GO_EXPORT_SECTION_NAME,
				      &sec_offset, &sec_length,
				      &errmsg, perr);
  simple_object_release_read (sobj);
  if (!found)
    return errmsg;

  if (lseek (fd, offset + sec_offset, SEEK_SET) < 0)
    {
      *perr = errno;
      return _("lseek failed while reading export data");
    }

  buf = XNEWVEC (char, sec_length);
  if (buf == NULL)
    {
      *perr = errno;
      return _("memory allocation failed while reading export data");
    }

  c = read (fd, buf, sec_length);
  if (c < 0)
    {
      *perr = errno;
      free (buf);
      return _("read failed while reading export data");
    }

  if (c < sec_length)
    {
      free (buf);
      return _("short read while reading export data");
    }

  *pbuf = buf;
  *plen = sec_length;

  return NULL;
}

wrap up

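When GCC compiles Go code, the generated object file contains a copy of the export information in a compact text format: the package name, the names, parameters and results of exported functions, the fields and field types of exported aggregate types, and so on.

Because this information is so compact (a simple format and little content) and contains only declarations such as types, importing a package amounts to opening the object file, using the object file format to locate the export data, seeking to it, and reconstructing the declarations from these simple strings.

Compared with C/C++ #include, this avoids a great deal of text processing (macro expansion, interference and coupling from declarations that are not needed, and so on): an importer reads compact, already-processed declarations instead of re-parsing header text in every translation unit, which makes compilation faster.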
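As an illustration of recovering declarations from these strings, here is a Go sketch that parses the single func line from helloworld.s. In that line, <type -11> appears to refer to a predeclared type by a negative index; since all three slots are int in helloworld.go, -11 stands for int here. parseFuncLine and funcDecl are hypothetical, and the sketch only handles this simple shape (no methods, no nested parentheses, no multiple results); the real parser in import.cc is far more complete.

```go
package main

import (
	"fmt"
	"strings"
)

// funcDecl (hypothetical) is a simplified view of one exported function.
type funcDecl struct {
	Name    string
	Params  []string // "name type" pairs as written in the export data
	Results string
}

// parseFuncLine parses a line shaped like
//   func Tsecer (n1 <type -11>, n2 <type -11>) <type -11>;
// and reports false for anything else.
func parseFuncLine(line string) (funcDecl, bool) {
	var d funcDecl
	if !strings.HasPrefix(line, "func ") || !strings.HasSuffix(line, ";\n") {
		return d, false
	}
	body := strings.TrimSuffix(strings.TrimPrefix(line, "func "), ";\n")
	lparen := strings.Index(body, " (")
	rparen := strings.Index(body, ")")
	if lparen < 0 || rparen < lparen {
		return d, false
	}
	d.Name = body[:lparen]
	if params := body[lparen+2 : rparen]; params != "" {
		d.Params = strings.Split(params, ", ")
	}
	d.Results = strings.TrimSpace(body[rparen+1:])
	return d, true
}

func main() {
	line := "func Tsecer (n1 <type -11>, n2 <type -11>) <type -11>;\n"
	if d, ok := parseFuncLine(line); ok {
		fmt.Printf("%s: params=%q results=%s\n", d.Name, d.Params, d.Results)
	}
}
```

Nothing here needs a preprocessor or a full Go parser: a few string matches recover the name, parameter list and result type, which is the point the paragraph above makes about the cost of importing.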