The four stages from C source to executable: preprocess, compile, assemble, link

Published: 2023-07-12 13:55:12  Author: sinferwu

 

Source: zh_yt's blog on CSDN

 

 

 


References

http://www.thegeekstuff.com/2011/10/c-program-to-an-executable/
http://courses.cms.caltech.edu/cs11/material/c/mike/misc/compiling_c.html
Preprocessor: https://en.wikipedia.org/wiki/Preprocessor
Compiler: https://en.wikipedia.org/wiki/Compiler
Linker: https://en.wikipedia.org/wiki/Linker_(computing)
ELF format: https://en.wikipedia.org/wiki/Executable_and_Linkable_Format

gcc help

$ gcc --help

-save-temps              Do not delete intermediate files
-E                       Preprocess only; do not compile, assemble or link
-S                       Compile only; do not assemble or link
-c                       Compile and assemble, but do not link
-o <file>                Place the output into <file>

To keep the intermediate files when building a C file, run:

$ gcc -Wall -save-temps hello.c

// Files generated:
hello.i
hello.s
hello.o
a.out

Each stage is described below.

(1) Pre-processing

-E    Stop after the preprocessing stage; do not run the compiler proper. The output is in the form of preprocessed source code, which is sent to the standard output.
      Input files that don't require preprocessing are ignored.

The preprocessor's tasks are:

  • Macro substitution (#define)
  • Stripping comments
  • Expanding included files (#include), which pulls in their declarations

What the preprocessor does is convert the source code file you write into another source code file (you can think of it as a “modified” or “expanded” source code file).

With the -save-temps option, the preprocessor output is saved in the .i file.

Include search paths, -Idir and -iquotedir:

Reference: man gcc

  • -Idir

Add the directory dir to the head of the list of directories to be searched for header files. This can be used to override a system header file, substituting your own version, since these directories are searched before the system header file directories.

If you use more than one -I option, the directories are scanned in left-to-right order; the standard system directories come after.

If you really need to change the search order for system directories, see the -nostdinc and/or -isystem options.

  • -iquotedir

Add the directory dir to the head of the list of directories to be searched for header files only for the case of #include "file"; they are not searched for #include <file>, otherwise just like -I.

Search path for quoted includes, #include "file":

Reference: https://gcc.gnu.org/onlinedocs/cpp/Search-Path.html

GCC looks for headers requested with #include "file" first in the directory containing the current file, then in the directories as specified by -iquote options, then in the same places it would have looked for a header requested with angle brackets.

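Summary of the include search paths:

  • Angle brackets, #include <file>: searched in the -I directories first, then in the system default directories.
  • Double quotes, #include "file": searched in (1) the directory containing the current file, (2) the directories given by -iquote, (3) the directories used for #include <file>.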

Include search path, experiment 1

/*
 * Source tree layout:
 * |-main.c
 * |-zzz
 *     |-myfile.h
 */

// main.c source
#include <stdio.h>
#include "myfile.h"

int main(int argc, char **argv)
{
    printf("hello\n");
    return 0;
}

Build 1:

$ gcc -Wall main.c

main.c:2:20: fatal error: myfile.h: No such file or directory
 #include "myfile.h"
                    ^
compilation terminated.

Build 2:

$ gcc -v -I ./zzz main.c

#include "..." search starts here:
#include <...> search starts here:
 ./zzz
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.

Build 3:

$ gcc -v -iquote ./zzz main.c

#include "..." search starts here:
 ./zzz
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.

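The output of experiment 1 shows the difference between -Idir and -iquotedir: -I adds ./zzz to the <...> search list, while -iquote adds it to the "..." search list only.

Include search path, experiment 2

/*
 * Source tree layout:
 * |-main.c
 * |-myfile2.h
 * |-zzz
 *     |-myfile.h
 */

// main.c source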
#include <stdio.h>
#include "myfile2.h"
#include "myfile.h"

int main(int argc, char **argv)
{
    printf("hello\n");
    return 0;
}

Build 4:

$ gcc -Wall main.c

main.c:3:20: fatal error: myfile.h: No such file or directory
 #include "myfile.h"
                    ^
compilation terminated.

Build 5:

$ ls -F
main.c  myfile2.h  zzz/

----------

$ gcc -v -iquote ./zzz  main.c

#include "..." search starts here:
 ./zzz
#include <...> search starts here:
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.

----------

$ ls -F
a.out*  main.c  myfile2.h  zzz/

Build 6:
For this test, first create a new empty directory in the current directory:

$ ls -F
main.c  myfile2.h  zzz/

$ mkdir yyy

$ ls -F
main.c  myfile2.h  yyy/  zzz/
----------

(Here -I is deliberately placed before -iquote on the command line:)
$ gcc -v -I ./yyy -iquote ./zzz  main.c

#include "..." search starts here:
 ./zzz
#include <...> search starts here:
 ./yyy
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include
 /usr/local/include
 /usr/lib/gcc/x86_64-linux-gnu/4.8/include-fixed
 /usr/include/x86_64-linux-gnu
 /usr/include
End of search list.

----------
$ ls -F
a.out*  main.c  myfile2.h  yyy/  zzz/

$ ./a.out 
hello

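The output of experiment 2 shows that, although the directory containing the current file is not listed explicitly in the search list, GCC does search it for quoted headers: myfile2.h is found there. It also shows the search order for quoted headers: -iquote directories first, then -I directories, regardless of the order of the options on the command line.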

(2) Compilation

-S    Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified.
      By default, the assembler file name for a source file is made by replacing the suffix .c, .i, etc., with .s.
      Input files that don't require compilation are ignored.

The compiler's task is:

  • Generate an assembly-language file

With the -save-temps option, the compiler output is saved in the .s file.

(3) Assembly

-c    Compile or assemble the source files, but do not link. The linking stage simply is not done. The ultimate output is in the form of an object file for each source file.
      By default, the object file name for a source file is made by replacing the suffix .c, .i, .s, etc., with .o.
      Unrecognized input files, not requiring compilation or assembly, are ignored.

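The assembler's task is:

  • Translate the assembly instructions into a binary object file (see the ELF format reference)

At this stage only the code that exists in this file is converted into machine language; calls to functions defined elsewhere, such as printf(), are not yet resolved.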

With the -save-temps option, the assembler output is saved in the .o file, that is, the object file.

Note: when the distinction is not needed, the compile and assemble steps are often referred to together simply as compilation.

(4) Linking

-o file
    Place output in file file. This applies to whatever sort of output is being produced, whether it be an executable file, an object file, an assembler file or preprocessed C code.
    If -o is not specified, the default is to put an executable file in a.out, the object file for source.suffix in source.o, its assembler file in source.s, a precompiled header file in source.suffix.gch, and all preprocessed C source on standard output.

The linker's task is:

  • Link the generated object files (.o) with library files (.a, .so) to produce an executable or a shared library.

Until this stage, gcc does not know where functions like printf() are defined. Until it knows exactly where all of these functions are implemented, it simply leaves a placeholder at each call site. During linking, the definition of printf() is resolved and the actual address of the function is plugged in.

Summary

  • preprocessor: cpp

  • compiler and assembler: cc, as

    The compiler turns the C source code into an object code file, a file ending in “.o” that contains the binary version of the source code. Object code is not directly executable, though.
    In order to make an executable, you also have to add code for all of the library functions that were #included into the file (this is not the same as including the declarations, which is what #include does). This is the job of the linker.

  • linker: ld

    The job of the linker is to link together a bunch of object files (.o files) into a binary executable. This includes both the object files that the compiler created from your source code files as well as object files that have been pre-compiled for you and collected into library files. These files have names which end in .a or .so, and you normally don’t need to know about them, as the linker knows where most of them are located and will link them in automatically as needed.

  • The whole process:

sourceFile --[preprocessor]--> newSourceFile 
--[compiler]--> objectFile 
--[linker+library]--> executableFile