Sunday, November 8, 2009

Understand Weak Symbols by Examples

Wikipedia defines weak symbols as follows: "In computing, a weak symbol is a symbol definition in an object file or dynamic library that may be overridden by other symbol definitions, its value will be zero if no definition found by loader." In other words, we can declare a symbol that does not have to be resolved at link time, or define one that another definition is allowed to replace. It is a well-known feature and is used a lot in the Linux kernel, glibc, and so on.



Take a look at this example. It cannot be built, because the linker reports an 'undefined reference' error.

$ cat err.c
void f(void);
int main(void)
{
        f();
        return 0;
}

$ gcc err.c
/tmp/ccYx7WNg.o: In function `main':
err.c:(.text+0x12): undefined reference to `f'
collect2: ld returned 1 exit status


Now declare 'f' as a weak symbol, and the program builds without error.
$ cat weak.c
void __attribute__((weak)) f();
int main(void)
{
        if (f)
                f();
        return 0;
}
$ gcc weak.c


Note that the function 'f' is called inside an if statement. 'f' is an undefined weak symbol, so its address is zero when no definition is found. Calling it without the check would jump to address zero and end in a 'Segmentation fault'. With the check in place, 'f' is simply never invoked in the weak.c example.
$ ./a.out
$ nm a.out
...
w f
08048324 T main
...
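
The guarded call can be wrapped in a small helper. The following is only a minimal sketch, assuming GCC or Clang on an ELF platform such as Linux; 'optional_hook' and 'call_hook_if_present' are hypothetical names that do not appear in weak.c.

```c
/* Hypothetical optional hook: declared weak and deliberately not defined
 * in this file. If no other object defines it, its address is zero. */
void optional_hook(void) __attribute__((weak));

/* Call the hook only if some object file provided a definition.
 * Returns 1 if the hook ran, 0 if it was absent. */
int call_hook_if_present(void)
{
        if (optional_hook) {
                optional_hook();
                return 1;
        }
        return 0;
}
```

Built on its own, the helper returns 0. Linking in an object that defines 'optional_hook' should make it return 1 without touching this file.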


Let's define the function 'f' in another file and link the objects together. This time, 'f' is called as expected. (gcc optimizes this printf call into a call to puts, which is why puts rather than printf shows up in the nm output.)
$ cat f.c
#include <stdio.h>
void f(void)
{
        printf("hello from f\n");
}

$ gcc -c weak.c f.c
$ gcc -o weak weak.o f.o
$ ./weak
hello from f

$ nm weak.o
w f
00000000 T main
$ nm f.o
00000000 T f
U puts
$ nm weak
...
08048384 T f
08048354 T main
U puts@@GLIBC_2.0
...


We may even override the original weak symbol (type 'W') with a strong symbol (type 'T').
$ cat orig.c
#include <stdio.h>
void __attribute__((weak)) f()
{
        printf("original f..\n");
}
int main(void)
{
        f();
        return 0;
}
$ gcc orig.c
$ ./a.out
original f..


$ cat ovrd.c
#include <stdio.h>
void f(void)
{
        printf("overridden f!\n");
}
$ gcc -c orig.c ovrd.c
$ gcc -o ovrd orig.o ovrd.o
$ ./ovrd
overridden f!


$ nm orig.o
00000000 W f
00000014 T main
U puts
$ nm ovrd.o
00000000 T f
U puts
$ nm ovrd
...
0804838c T f
08048368 T main
U puts@@GLIBC_2.0
...
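
A related way to provide an overridable default is to combine 'weak' with GCC's 'alias' attribute. This is a sketch under stated assumptions rather than part of the example above: it assumes GCC or Clang on an ELF target, and 'default_handler' and 'handler' are hypothetical names.

```c
/* Default implementation, reachable under its own strong name. */
int default_handler(void)
{
        return 1;       /* 1 stands for "original" in this sketch */
}

/* 'handler' is a weak alias of default_handler: calls to handler() end up
 * in default_handler() unless another object file supplies a strong
 * definition of handler. */
int handler(void) __attribute__((weak, alias("default_handler")));
```

Linking in another object with a strong 'int handler(void)' would redirect every call to the new definition, while 'default_handler' stays available under its own name.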


And of course, we can also override a weak object (type 'V') with a strong object (type 'D').
$ cat orig-obj.c
#include <stdio.h>
int __attribute__((weak)) x = 1;
int __attribute__((weak)) y = 1;
int main(void)
{
        printf("x = %d, y = %d\n", x, y);
        return 0;
}
$ gcc orig-obj.c
$ ./a.out
x = 1, y = 1


$ cat ovrd-obj.c
int x = 2;
void f(void)
{
}
$ gcc -c orig-obj.c ovrd-obj.c
$ gcc -o ovrd-obj orig-obj.o ovrd-obj.o
$ ./ovrd-obj
x = 2, y = 1


$ nm orig-obj.o
00000000 T main
U printf
00000000 V x
00000004 V y
$ nm ovrd-obj.o
00000000 T f
00000000 D x
$ nm ovrd-obj
...
08048394 T f
08048354 T main
U printf@@GLIBC_2.0
080495c8 D x
080495c4 V y
...
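
The same idea gives a simple "default setting" pattern. A minimal sketch, assuming GCC or Clang on an ELF target; 'verbosity' and 'get_verbosity' are hypothetical names chosen for illustration.

```c
/* Weak default: used only if no other object file defines a strong
 * 'verbosity'. The name is hypothetical. */
int __attribute__((weak)) verbosity = 1;

/* Read the setting through the symbol, whichever definition won. */
int get_verbosity(void)
{
        return verbosity;
}
```

On its own, get_verbosity() returns 1. An object file containing a strong 'int verbosity = 2;' would change the result, just as ovrd-obj.c changes 'x' above.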


What if there are multiple definitions of the same symbol? The linker's symbol rules tell us that:
  1. Multiple strong symbols are not allowed
  2. Given a strong symbol and multiple weak symbols --> choose the strong symbol
  3. Given multiple weak symbols --> choose any of those weak symbols (in the runs below, the one named first on the command line wins)

$ cat mul.c
void f(void);
int main(void)
{
        f();
        return 0;
}
$ cat s1.c
#include <stdio.h>
void f(void)
{
        printf("1st strong f from %s\n", __FILE__);
}
$ cat s2.c
#include <stdio.h>
void f(void)
{
        printf("2nd strong f from %s\n", __FILE__);
}
$ cat w1.c
#include <stdio.h>
void __attribute__((weak)) f(void)
{
        printf("1st weak f from %s\n", __FILE__);
}
$ cat w2.c
#include <stdio.h>
void __attribute__((weak)) f(void)
{
        printf("2nd weak f from %s\n", __FILE__);
}
$ gcc -c mul.c s1.c s2.c w1.c w2.c


$ gcc -o test1 mul.o s1.o s2.o
s2.o: In function `f':
s2.c:(.text+0x0): multiple definition of `f'
s1.o:s1.c:(.text+0x0): first defined here
collect2: ld returned 1 exit status


$ gcc -o test2 mul.o s1.o w1.o w2.o
$ ./test2
1st strong f from s1.c


$ gcc -o test3-1 mul.o w1.o w2.o
$ ./test3-1
1st weak f from w1.c
$ gcc -o test3-2 mul.o w2.o w1.o
$ ./test3-2
2nd weak f from w2.c
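
The three rules can be written down as a small model. This is only an illustrative sketch of the behavior seen in the runs above for object files named directly on the command line, not the linker's actual algorithm; 'enum binding' and 'resolve' are hypothetical names, and picking the first weak definition is an assumption based on test3-1 and test3-2.

```c
#include <stddef.h>

enum binding { WEAK, STRONG };

/* Model of the rules for a single symbol name.
 * defs[i] is the binding of the i-th definition in link order.
 * Returns the index of the chosen definition, -1 if there is no
 * definition at all, or -2 if there is more than one strong definition. */
int resolve(const enum binding *defs, size_t n)
{
        int chosen = -1;
        int strong_seen = 0;

        for (size_t i = 0; i < n; i++) {
                if (defs[i] == STRONG) {
                        if (strong_seen)
                                return -2;      /* rule 1: duplicate strong */
                        strong_seen = 1;
                        chosen = (int)i;        /* rule 2: strong beats weak */
                } else if (chosen < 0) {
                        chosen = (int)i;        /* rule 3: first weak seen */
                }
        }
        return chosen;
}
```

For the test2 link (s1.o, w1.o, w2.o) the model picks index 0, the strong definition. For test3-1 (w1.o, w2.o) it also picks index 0, the first weak one.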


Hope these examples help!

References:
  1. Wikipedia, "Weak symbol"
  2. Embedded Bits, "Digging Deeper into Weak Symbols"
  3. GCC manual, "Declaring Attributes of Functions"
  4. GNU Binutils documentation, "nm"
  5. Sandeep Grover, "Linkers & Loaders - A Programmer's Perspective"

10 comments:

Anonymous said...

Very nice. Thanks.

Anonymous said...

Nice!

Anonymous said...

Very clear now, thanks!

Ramprakash Jelari said...

Good One!!!.

ender said...

A very clear explanation. Many thanks.
Ender.

Anonymous said...

Very good.

Gordon said...

It's not so simple.

Let's look at your test2 example,
but this time changing the order
in which the objects are named on
the link line:

$ gcc -o test2 mul.o w1.o s1.o w2.o
$ ./test2
1st strong f from s1.c

Here we gave the weak before the strong,
but the strong definition was selected.
This is as we expect.

Now let's do something a little
different. We're going to package
the weak and strong definitions
into library files:

$ ar qs libw.a w1.o w2.o
$ ar qs libs.a s1.o s2.o

Now we'll link, naming w before s:

$ gcc -o test2 mul.o -L. -lw -ls
$ ./test2
1st weak f from w1.c

Now change the order in which the
libraries are named:

$ gcc -o test2 mul.o -L. -ls -lw
$ ./test2
1st strong f from s1.c

Clearly the rules are different
for libraries. It's choosing the
first it sees. Can you explain?

Winfred said...

Hi Gordon,

It is a very good question.

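GCC (actually the linker) treats libraries differently.
By default, the linker pulls an object out of an archive only when that object defines a symbol that is still undefined. It therefore uses the first definition it finds and never sees the duplicates in objects it did not load.

However, there are options that force it to load every object in the archive.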

$ ar qs libw.a w1.o w2.o
$ ar qs libs.a s1.o s2.o
$ gcc -o test2 mul.o -L. -Wl,--whole-archive -lw -ls -Wl,--no-whole-archive
./libs.a(s2.o): In function `f':
s2.c:(.text+0x0): multiple definition of `f'
./libs.a(s1.o):s1.c:(.text+0x0): first defined here
collect2: error: ld returned 1 exit status


Now, let's re-create the libraries and link them as you wish.
$ rm libw.a libs.a
$ ar qs libw.a w1.o w2.o
$ ar qs libs.a s1.o
$ gcc -o test2 mul.o -L. -Wl,--whole-archive -lw -ls -Wl,--no-whole-archive
$ ./test2
1st strong f from s1.c


See? It loads the whole archive and uses the strong symbol even though it found the weak ones first.

Very simple, isn't it? :)

Unknown said...

nice,thx

Joaquim said...

This code (__attribute__((weak))) isn't compatible with Visual Studio.
Can you advise me further, please?
#if defined __GNUC__
#define EVENT [[gnu::weak]]
#elif defined __clang__
#define EVENT [[llvm::weak]]
#endif
But this isn't ANSI either, so I can't use it in Visual Studio :(