Monday, November 16, 2009

Learning x86 Calling Conventions by Examples

The 32-bit x86 architecture has many different calling conventions, and three of the most widely used are __cdecl, __stdcall, and __fastcall. We are going to force the compiler to generate assembly with each of them in order to see how they work.



Here is a simple function in C:

int f(int a, int b)
{
    return a + b;
}

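GCC uses __cdecl by default (it is also the default calling convention of the Microsoft C/C++ compiler), so compiling f without any extra options or attributes gives the same result as the explicit __cdecl version below. These conventions are specific to 32-bit x86; on a 64-bit system, gcc -m32 -S should produce comparable listings.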

Let's declare f explicitly with the __cdecl calling convention and call it:
int __attribute__((cdecl)) f(int a, int b)
{
    return a + b;
}
void caller(void)
{
    f(3, 5);
}


The assembly generated by gcc for __cdecl would be:
 1 f:
 2         pushl  %ebp
 3         movl   %esp, %ebp
 4         movl   12(%ebp), %eax
 5         addl   8(%ebp), %eax
 6         popl   %ebp
 7         ret
 8 caller:
 9         pushl  %ebp
10         movl   %esp, %ebp
11         subl   $8, %esp
12         movl   $5, 4(%esp)
13         movl   $3, (%esp)
14         call   f
15         leave
16         ret


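Walking through the __cdecl call line by line:

Place the arguments on the stack from right to left: 5 first, then 3. Instead of two pushl instructions, GCC reserves the 8 bytes up front (line 11) and stores each argument with movl, which yields the same stack layout.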
12         movl   $5, 4(%esp)
13         movl   $3, (%esp)
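The claim that the two movl stores are equivalent to pushing 5 and then 3 can be checked with a small simulation. This is a minimal C sketch, assuming a toy model in which the stack is a 256-byte array growing downward and %esp is a byte offset into it; Stack, store32, load32, pushl and the two args_by_* functions are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy model (illustration only): a 256-byte stack that grows downward,
 * with %esp stored as a byte offset into mem[]. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp;
} Stack;

static void store32(Stack *s, uint32_t addr, uint32_t v) {
    assert(addr + 4 <= STACK_BYTES);        /* stay inside the toy stack */
    memcpy(&s->mem[addr], &v, sizeof v);
}

static uint32_t load32(const Stack *s, uint32_t addr) {
    uint32_t v;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&v, &s->mem[addr], sizeof v);
    return v;
}

/* pushl: move %esp down by 4 bytes, then store at the new top. */
static void pushl(Stack *s, uint32_t v) {
    s->esp -= 4;
    store32(s, s->esp, v);
}

/* pushl $5; pushl $3 (right to left, one push per argument) */
static Stack args_by_push(void) {
    Stack s = { .esp = STACK_BYTES };
    pushl(&s, 5);
    pushl(&s, 3);
    return s;
}

/* What GCC emitted: subl $8, %esp; movl $5, 4(%esp); movl $3, (%esp) */
static Stack args_by_mov(void) {
    Stack s = { .esp = STACK_BYTES };
    s.esp -= 8;
    store32(&s, s.esp + 4, 5);
    store32(&s, s.esp, 3);
    return s;
}
```

Both variants leave %esp at the same offset with 3 at (%esp) and 5 at 4(%esp): pushing right to left puts the first argument at the lowest address, which is why f later finds a at 8(%ebp) and b at 12(%ebp).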


Call the function f. The call instruction pushes the return address (the address of the instruction right after the call) onto the stack and then jumps to f.
14         call   f


Now that we are inside f, it needs its own stack frame. f pushes the current %ebp (the caller's frame pointer) onto the stack and then copies %esp into %ebp, so %ebp points at the saved value.
 2         pushl  %ebp
 3         movl   %esp, %ebp


With %ebp set up, f refers to its arguments as 8(%ebp) (a = 3) and 12(%ebp) (b = 5). They are added in %eax, which holds the return value for the caller. Note that 0(%ebp) holds the caller's saved base pointer and 4(%ebp) holds the return address pushed by call.
 4         movl   12(%ebp), %eax
 5         addl   8(%ebp), %eax
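The frame layout described above can be reproduced in the same kind of toy model. The sketch below replays lines 11-14 of caller and lines 2-5 of f under assumed starting values: offset 200 for the caller's %esp and %ebp, and a made-up RET_ADDR standing in for the real return address. Cpu, run_into_f and the helpers are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit machine (illustration only): byte-addressed stack growing
 * downward; esp/ebp are byte offsets into mem[]. */
enum { STACK_BYTES = 256 };
#define RET_ADDR 0x08048000u   /* made-up return address value */

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp, ebp, eax;
} Cpu;

static void store32(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&c->mem[addr], &v, sizeof v);
}

static uint32_t load32(const Cpu *c, uint32_t addr) {
    uint32_t v;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&v, &c->mem[addr], sizeof v);
    return v;
}

static void pushl(Cpu *c, uint32_t v) {
    c->esp -= 4;
    store32(c, c->esp, v);
}

/* Runs caller's lines 11-14 and f's lines 2-5, stopping just before
 * f's "popl %ebp". */
static Cpu run_into_f(void) {
    /* State right after caller's lines 9-10: %esp == %ebp (arbitrary 200). */
    Cpu c = { .esp = 200, .ebp = 200 };
    c.esp -= 8;                          /* 11: subl $8, %esp        */
    store32(&c, c.esp + 4, 5);           /* 12: movl $5, 4(%esp)     */
    store32(&c, c.esp, 3);               /* 13: movl $3, (%esp)      */
    pushl(&c, RET_ADDR);                 /* 14: call f (push return address) */
    pushl(&c, c.ebp);                    /*  2: pushl %ebp           */
    c.ebp = c.esp;                       /*  3: movl %esp, %ebp      */
    c.eax = load32(&c, c.ebp + 12);      /*  4: movl 12(%ebp), %eax  */
    c.eax += load32(&c, c.ebp + 8);      /*  5: addl 8(%ebp), %eax   */
    return c;
}
```

Reading the frame back at 0, 4, 8 and 12 bytes above %ebp yields the saved %ebp (200), RET_ADDR, 3 and 5, and %eax ends up holding 8.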


Restore the saved base pointer %ebp.
 6         popl   %ebp


Return from the function: ret pops the return address into %eip, so execution resumes in the caller right after the call instruction.
 7         ret
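To make the caller-cleanup rule concrete, this sketch runs the whole __cdecl sequence in a toy stack model, from caller's prologue to its leave, and records %esp at a few points. The starting offsets (200 and 232) and RET_ADDR are arbitrary assumptions, and run_cdecl and Trace are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit stack (illustration only); esp/ebp are byte offsets. */
enum { STACK_BYTES = 256 };
#define RET_ADDR 0x08048000u   /* made-up return address */

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp, ebp, eax;
} Cpu;

static void store32(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&c->mem[addr], &v, sizeof v);
}

static uint32_t load32(const Cpu *c, uint32_t addr) {
    uint32_t v;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&v, &c->mem[addr], sizeof v);
    return v;
}

static void pushl(Cpu *c, uint32_t v) { c->esp -= 4; store32(c, c->esp, v); }

static uint32_t popl(Cpu *c) {
    uint32_t v = load32(c, c->esp);
    c->esp += 4;
    return v;
}

typedef struct {
    uint32_t eax;        /* value returned by f                  */
    uint32_t ra;         /* address ret jumped back to           */
    uint32_t args_esp;   /* %esp once the arguments are in place */
    uint32_t after_ret;  /* %esp right after f's ret             */
    uint32_t final_esp;  /* %esp after caller's leave            */
    uint32_t final_ebp;  /* %ebp after caller's leave            */
} Trace;

static Trace run_cdecl(void) {
    Cpu c = { .esp = 200, .ebp = 232 };   /* arbitrary starting state */
    Trace t;
    pushl(&c, c.ebp); c.ebp = c.esp;      /*  9-10: caller's prologue */
    c.esp -= 8;                           /* 11: subl $8, %esp        */
    store32(&c, c.esp + 4, 5);            /* 12: movl $5, 4(%esp)     */
    store32(&c, c.esp, 3);                /* 13: movl $3, (%esp)      */
    t.args_esp = c.esp;
    pushl(&c, RET_ADDR);                  /* 14: call f               */
    pushl(&c, c.ebp); c.ebp = c.esp;      /*  2-3: f's prologue       */
    c.eax = load32(&c, c.ebp + 12) + load32(&c, c.ebp + 8); /* 4-5 */
    c.ebp = popl(&c);                     /*  6: popl %ebp            */
    t.ra = popl(&c);                      /*  7: ret (pops into %eip) */
    t.after_ret = c.esp;
    c.esp = c.ebp; c.ebp = popl(&c);      /* 15: leave                */
    t.final_esp = c.esp;
    t.final_ebp = c.ebp;
    t.eax = c.eax;
    return t;
}
```

After f's ret, %esp is back at the address of the arguments, not above them: f left 3 and 5 on the stack, and it is the caller's leave that finally discards them and restores the original %esp and %ebp.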


Note that lines 9-10 are identical to lines 2-3: this pair is the standard function prologue that GCC emits whenever it sets up a frame pointer. Line 11 reserves 8 bytes of stack space for f's arguments. Together, lines 9 to 11 do the same job as the single instruction enter $8, $0. The compiler uses the three simple instructions instead of ENTER for performance, because ENTER is slow on most x86 processors.
 9         pushl  %ebp
10         movl   %esp, %ebp
11         subl   $8, %esp
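The equivalence between lines 9-11 and ENTER can be sketched as two functions over a toy register/stack model. enter_level0 paraphrases only the nesting-level-0 case of ENTER as Intel documents it; the higher nesting levels, which copy outer frame pointers, are deliberately left out. All names and the starting offsets are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit stack (illustration only); esp/ebp are byte offsets. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp, ebp;
} Cpu;

static void pushl(Cpu *c, uint32_t v) {
    c->esp -= 4;
    assert(c->esp + 4 <= STACK_BYTES);
    memcpy(&c->mem[c->esp], &v, sizeof v);
}

/* Lines 9-11 as GCC emitted them. */
static void prologue(Cpu *c, uint32_t size) {
    pushl(c, c->ebp);        /* pushl %ebp        */
    c->ebp = c->esp;         /* movl  %esp, %ebp  */
    c->esp -= size;          /* subl  $size, %esp */
}

/* ENTER size, 0: paraphrase of the nesting-level-0 case only. */
static void enter_level0(Cpu *c, uint32_t size) {
    pushl(c, c->ebp);                /* save the old frame pointer     */
    uint32_t frame_temp = c->esp;    /* new frame starts here          */
    c->ebp = frame_temp;
    c->esp = frame_temp - size;      /* allocate size bytes below it   */
}
```

Starting from the same state, both versions push the old %ebp, point %ebp at it, and lower %esp by 8, leaving identical registers and stack contents.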


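The LEAVE instruction copies %ebp into %esp and then pops %ebp; it is the standard epilogue that undoes the prologue. In f, %esp never moves after the prologue, so it already equals %ebp and popl %ebp alone is enough. In caller, line 11 lowered %esp by 8, so the full LEAVE is needed. Unlike ENTER, LEAVE is both compact (one byte) and fast, so the compiler prefers it to the equivalent two-instruction sequence movl %ebp, %esp; popl %ebp.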
15         leave
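The following sketch shows why popl %ebp is enough in f but not in caller. It models LEAVE as movl %ebp, %esp followed by popl %ebp in a toy stack; frame(0) stands for f's frame and frame(8) for caller's frame after subl $8, %esp. The names and starting offsets are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit stack (illustration only); esp/ebp are byte offsets. */
enum { STACK_BYTES = 256 };

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp, ebp;
} Cpu;

static void pushl(Cpu *c, uint32_t v) {
    c->esp -= 4;
    assert(c->esp + 4 <= STACK_BYTES);
    memcpy(&c->mem[c->esp], &v, sizeof v);
}

static uint32_t popl(Cpu *c) {
    uint32_t v;
    assert(c->esp + 4 <= STACK_BYTES);
    memcpy(&v, &c->mem[c->esp], sizeof v);
    c->esp += 4;
    return v;
}

/* leave: movl %ebp, %esp; popl %ebp */
static void leave(Cpu *c) {
    c->esp = c->ebp;
    c->ebp = popl(c);
}

/* Prologue followed by "subl $locals, %esp"; 200/232 are arbitrary. */
static Cpu frame(uint32_t locals) {
    Cpu c = { .esp = 200, .ebp = 232 };
    pushl(&c, c.ebp);
    c.ebp = c.esp;
    c.esp -= locals;
    return c;
}
```

With no locals, leave and a bare popl %ebp restore the same state. With 8 bytes reserved, a bare popl %ebp would load the bottom argument slot into %ebp instead of the saved frame pointer, while leave still restores the caller's %ebp (232) and %esp (200).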


As we can see, the main characteristics of the __cdecl calling convention are:
  1. Arguments are passed from right to left, and placed on the stack.
  2. Stack cleanup is performed by the caller.

What about calling the function with __stdcall?
int __attribute__((stdcall)) f(int a, int b)
{
    return a + b;
}
void test(void)
{
    f(3, 5);
}


The assembly for __stdcall would be:
 1 f:
 2         pushl  %ebp
 3         movl   %esp, %ebp
 4         movl   12(%ebp), %eax
 5         addl   8(%ebp), %eax
 6         popl   %ebp
 7         ret    $8
 8 test:
 9         pushl  %ebp
10         movl   %esp, %ebp
11         subl   $8, %esp
12         movl   $5, 4(%esp)
13         movl   $3, (%esp)
14         call   f
15         subl   $8, %esp
16         leave
17         ret

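The key difference from __cdecl is that f returns with ret $8 (line 7), which pops the return address and then removes the 8 bytes of arguments, so the callee cleans up the stack. GCC reserves the outgoing argument area once (line 11) and expects %esp to stay put across calls, so after f returns the caller subtracts 8 again (subl $8, %esp, line 15) to restore that layout. Strictly speaking, this is redundant right before leave, which restores %esp from %ebp anyway.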
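Here is a sketch of the __stdcall sequence in a toy stack model, focusing on what ret $8 does to %esp and how line 15 compensates. ret_n is a hypothetical helper that models ret $n as popping the return address and then adding n to %esp; the starting offsets and RET_ADDR are arbitrary.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit stack (illustration only): grows downward, and
 * esp/ebp are byte offsets into mem[]. */
enum { STACK_BYTES = 256 };
#define RET_ADDR 0x08048000u   /* made-up return address */

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp, ebp, eax;
} Cpu;

static void store32(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&c->mem[addr], &v, sizeof v);
}

static uint32_t load32(const Cpu *c, uint32_t addr) {
    uint32_t v;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&v, &c->mem[addr], sizeof v);
    return v;
}

static void pushl(Cpu *c, uint32_t v) { c->esp -= 4; store32(c, c->esp, v); }

static uint32_t popl(Cpu *c) {
    uint32_t v = load32(c, c->esp);
    c->esp += 4;
    return v;
}

/* ret $n: pop the return address, then discard n more bytes. */
static uint32_t ret_n(Cpu *c, uint32_t n) {
    uint32_t ra = popl(c);
    c->esp += n;
    return ra;
}

typedef struct {
    uint32_t eax, ra;
    uint32_t args_esp;   /* %esp once lines 12-13 stored the arguments */
    uint32_t after_ret;  /* %esp right after f's ret $8                */
    uint32_t after_fix;  /* %esp after line 15, subl $8, %esp          */
} Trace;

static Trace run_stdcall(void) {
    Cpu c = { .esp = 200, .ebp = 232 };   /* arbitrary starting state */
    Trace t;
    pushl(&c, c.ebp); c.ebp = c.esp;      /*  9-10: test's prologue */
    c.esp -= 8;                           /* 11: subl $8, %esp       */
    store32(&c, c.esp + 4, 5);            /* 12: movl $5, 4(%esp)    */
    store32(&c, c.esp, 3);                /* 13: movl $3, (%esp)     */
    t.args_esp = c.esp;
    pushl(&c, RET_ADDR);                  /* 14: call f              */
    pushl(&c, c.ebp); c.ebp = c.esp;      /*  2-3: f's prologue      */
    c.eax = load32(&c, c.ebp + 12) + load32(&c, c.ebp + 8); /* 4-5 */
    c.ebp = popl(&c);                     /*  6: popl %ebp           */
    t.ra = ret_n(&c, 8);                  /*  7: ret $8              */
    t.after_ret = c.esp;
    c.esp -= 8;                           /* 15: subl $8, %esp       */
    t.after_fix = c.esp;
    t.eax = c.eax;
    return t;
}
```

After ret $8, %esp sits 8 bytes above the caller's argument area because f removed the arguments itself; the caller's subl $8, %esp moves it back so the area could be reused for another call.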

As we can see, the main characteristics of the __stdcall calling convention are:
  1. Arguments are passed from right to left, and placed on the stack.
  2. Stack cleanup is performed by the called function.

And what about calling the function with __fastcall?
int __attribute__((fastcall)) f(int a, int b)
{
    return a + b;
}
void test(void)
{
    f(3, 5);
}


The assembly generated by gcc for __fastcall would be:
 1 f:
 2         pushl  %ebp
 3         movl   %esp, %ebp
 4         subl   $8, %esp
 5         movl   %ecx, -4(%ebp)
 6         movl   %edx, -8(%ebp)
 7         movl   -8(%ebp), %eax
 8         addl   -4(%ebp), %eax
 9         leave
10         ret
11 test:
12         pushl  %ebp
13         movl   %esp, %ebp
14         movl   $5, %edx
15         movl   $3, %ecx
16         call   f
17         popl   %ebp
18         ret

__fastcall passes arguments in registers rather than on the stack whenever possible. The first argument, 3, is placed in %ecx (line 15) and the second, 5, in %edx (line 14). Because this listing is unoptimized, f copies the arguments from the registers into its own stack frame (lines 5-6) and computes the sum from there. No arguments were placed on the stack, so f returns with a plain ret and the caller has nothing to clean up.
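The register-based hand-off can be traced in a toy model that adds %ecx and %edx to the simulated registers. This sketch follows the __fastcall listing line by line, including f spilling both registers to -4(%ebp) and -8(%ebp); the names, starting offsets and RET_ADDR are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy 32-bit machine (illustration only); esp/ebp are byte offsets. */
enum { STACK_BYTES = 256 };
#define RET_ADDR 0x08048000u   /* made-up return address */

typedef struct {
    uint8_t mem[STACK_BYTES];
    uint32_t esp, ebp, eax, ecx, edx;
} Cpu;

static void store32(Cpu *c, uint32_t addr, uint32_t v) {
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&c->mem[addr], &v, sizeof v);
}

static uint32_t load32(const Cpu *c, uint32_t addr) {
    uint32_t v;
    assert(addr + 4 <= STACK_BYTES);
    memcpy(&v, &c->mem[addr], sizeof v);
    return v;
}

static void pushl(Cpu *c, uint32_t v) { c->esp -= 4; store32(c, c->esp, v); }

static uint32_t popl(Cpu *c) {
    uint32_t v = load32(c, c->esp);
    c->esp += 4;
    return v;
}

typedef struct {
    uint32_t eax, ra;
    uint32_t before_call;  /* %esp just before "call f"      */
    uint32_t after_ret;    /* %esp right after f's plain ret */
} Trace;

static Trace run_fastcall2(void) {
    Cpu c = { .esp = 200, .ebp = 232 };   /* arbitrary starting state */
    Trace t;
    pushl(&c, c.ebp); c.ebp = c.esp;      /* 12-13: test's prologue   */
    c.edx = 5;                            /* 14: movl $5, %edx        */
    c.ecx = 3;                            /* 15: movl $3, %ecx        */
    t.before_call = c.esp;
    pushl(&c, RET_ADDR);                  /* 16: call f               */
    pushl(&c, c.ebp); c.ebp = c.esp;      /*  2-3: f's prologue       */
    c.esp -= 8;                           /*  4: room for two spills  */
    store32(&c, c.ebp - 4, c.ecx);        /*  5: a -> -4(%ebp)        */
    store32(&c, c.ebp - 8, c.edx);        /*  6: b -> -8(%ebp)        */
    c.eax = load32(&c, c.ebp - 8);        /*  7                       */
    c.eax += load32(&c, c.ebp - 4);       /*  8                       */
    c.esp = c.ebp; c.ebp = popl(&c);      /*  9: leave                */
    t.ra = popl(&c);                      /* 10: ret                  */
    t.after_ret = c.esp;
    t.eax = c.eax;
    return t;
}
```

Because no argument ever touched the caller's part of the stack, %esp after f's plain ret is exactly where it was before the call.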

What if a __fastcall function takes more than two arguments?
int __attribute__((fastcall)) f(int a, int b, int c)
{
    return a + b + c;
}
void test(void)
{
    f(3, 5, 7);
}

The assembly generated by gcc would be:
 1 f:
 2         pushl  %ebp
 3         movl   %esp, %ebp
 4         subl   $8, %esp
 5         movl   %ecx, -4(%ebp)
 6         movl   %edx, -8(%ebp)
 7         movl   -8(%ebp), %eax
 8         addl   -4(%ebp), %eax
 9         addl   8(%ebp), %eax
10         leave
11         ret    $4
12 test:
13         pushl  %ebp
14         movl   %esp, %ebp
15         subl   $4, %esp
16         movl   $7, (%esp)
17         movl   $5, %edx
18         movl   $3, %ecx
19         call   f
20         subl   $4, %esp
21         leave
22         ret

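The third argument, 7, is placed on the stack (line 16), where f reads it as 8(%ebp) (line 9). f then returns with ret $4 to remove it, and, as with __stdcall, the caller subtracts 4 from %esp again (line 20). Therefore, we may conclude that the main characteristics of the __fastcall calling convention are: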
  1. The first two function arguments that require 32 bits or less are placed into registers ECX and EDX.
  2. The rest of them are pushed on the stack from right to left.
  3. If any stack-based arguments are present, the callee removes them from the stack.
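The three sets of rules can be summarized in one small table-like function. This is a minimal sketch assuming every argument is a 32-bit int, as in the examples above; plan_args, fastcall_slot and the CC_* names are made up, and wider or floating-point arguments follow extra rules that it does not model.

```c
#include <assert.h>
#include <string.h>

typedef enum { CC_CDECL, CC_STDCALL, CC_FASTCALL } Convention;

typedef struct {
    unsigned in_regs;      /* arguments passed in ECX/EDX               */
    unsigned stack_bytes;  /* bytes of arguments placed on the stack    */
    unsigned callee_pops;  /* n in the callee's "ret $n" (0: plain ret) */
    unsigned caller_pops;  /* bytes the caller must discard afterwards  */
} ArgPlan;

/* Assumes every argument is a 32-bit int; 64-bit, floating-point and
 * struct arguments have extra rules that this sketch does not model. */
static ArgPlan plan_args(Convention cc, unsigned nargs) {
    ArgPlan p = {0, 0, 0, 0};
    if (cc == CC_FASTCALL)
        p.in_regs = nargs < 2 ? nargs : 2;   /* first two go to ECX, EDX */
    p.stack_bytes = 4 * (nargs - p.in_regs);
    if (cc == CC_CDECL)
        p.caller_pops = p.stack_bytes;       /* caller cleans up */
    else
        p.callee_pops = p.stack_bytes;       /* stdcall/fastcall: callee cleans up */
    return p;
}

/* Where the i-th argument (0-based) of a __fastcall function goes. */
static const char *fastcall_slot(unsigned i) {
    return i == 0 ? "ecx" : i == 1 ? "edx" : "stack";
}
```

For example, plan_args(CC_STDCALL, 2) gives callee_pops == 8, matching ret $8, and plan_args(CC_FASTCALL, 3) gives two register arguments and callee_pops == 4, matching ret $4.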
These simple examples only give a basic idea of how the three calling conventions work. For more detail, the following links are recommended reading.

References:
