Have you ever looked at the output of 'cat /proc/meminfo' and wondered why the reported MemTotal does not equal the amount of memory installed in the system? The following is an example report from one of my systems. Line 2 shows that the total memory is 60860 kB (59.4 MB):
    1  # cat /proc/meminfo
    2  MemTotal:          60860 kB
    3  MemFree:           47448 kB
    4  Buffers:               0 kB
    5  Cached:             7180 kB
    6  SwapCached:            0 kB
    7  ...
which differs from the amount of memory the bootloader handed to the kernel, 64MB.
    1  ...
    2  ## Transferring control to Linux (at address 800042f0) ...
    3  ## Giving linux memsize in MB, 64
According to line 4 of the kernel log, the total physical memory is 64MB (0x4000000). The available memory (line 20) is 59176k because 6360k is reserved. As we know, available = total - reserved (59176k = 65536k - 6360k).
     1  # dmesg
     2  ...
     3  Determined physical RAM map:
     4   memory: 04000000 @ 00000000 (usable)
     5  Initrd not found or empty - disabling initrd
     6  Zone PFN ranges:
     7    Normal 0x00000000 -> 0x00004000
     8  Movable zone start PFN for each node
     9  early_node_map[1] active PFN ranges
    10    0: 0x00000000 -> 0x00004000
    11  Built 1 zonelists in Zone order, mobility grouping on. Total pages: 16256
    12  Kernel command line: console=ttyS1,57600n8 root=/dev/ram0 rootfstype=squashfs
    13  PID hash table entries: 256 (order: -2, 1024 bytes)
    14  Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
    15  Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
    16  Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
    17  Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
    18  Writing ErrCtl register=00000000
    19  Readback ErrCtl register=00000000
    20  Memory: 59176k/65536k available (2905k kernel code, 6360k reserved, 980k data, 1684k init, 0k highmem)
    21  ...
    22  Freeing unused kernel memory: 1684k freed
    23  ...
However, you may notice that MemTotal and the available memory are still not equal. That is because the kernel can free memory that was reserved for initialization during boot, once that memory will not be used again. Developers mark such code and data with a specific attribute (such as __init), so that it is linked into the .init section. The kernel releases this memory by invoking free_initmem() when the initial bootup is done.
Thus, MemTotal = total - reserved + .init
In the example, 60860k = 65536k - 6360k + 1684k
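The two formulas can be checked against the numbers in the kernel's "Memory:" line. The snippet below is only an arithmetic check using the figures from this example; the variable names are mine.

```python
# Figures from the example above, all in kB.
total_kb = 65536      # 64 MB handed over by the bootloader
reserved_kb = 6360    # "reserved" in the kernel's "Memory:" line
init_kb = 1684        # "init" in the same line, later freed by free_initmem()

# available = total - reserved
available_kb = total_kb - reserved_kb

# MemTotal = total - reserved + .init
mem_total_kb = available_kb + init_kb

print(available_kb)   # 59176
print(mem_total_kb)   # 60860
```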
The question is: what does the kernel reserve that memory for? Most of the reserved memory stores the kernel image. Take a look at the following information read from vmlinux. The sections numbered up to 13 are the ones that occupy memory, including the well-known text, data, and bss sections. The last of them, .bss, ends at 0x805a7b90, so the image takes up the first 0x5a7b90 bytes of physical memory. The minimum unit of reserved memory is 1 page (4K bytes by default). Hence 0x5a7b90 is rounded up to a 4K boundary, which is 0x5a8000.
     1  $ readelf -S vmlinux
     2  There are 19 section headers, starting at offset 0x576694:
     3
     4  Section Headers:
     5    [Nr] Name              Type      Addr     Off    Size   ES Flg Lk   Inf   Al
     6    [ 0]                   NULL      00000000 000000 000000 00      0     0    0
     7    [ 1] .text             PROGBITS  80000000 002000 2d66f8 00  AX  0     0 2048
     8    [ 2] __ex_table        PROGBITS  802d6700 2d8700 001b20 00   A  0     0    4
     9    [ 3] .rodata           PROGBITS  802d9000 2db000 0b09fc 00   A  0     0   32
    10    [ 4] __ksymtab         PROGBITS  803899fc 38b9fc 004d20 00   A  0     0    4
    11    [ 5] __ksymtab_gpl     PROGBITS  8038e71c 39071c 0022c8 00   A  0     0    4
    12    [ 6] __ksymtab_strings PROGBITS  803909e4 3929e4 00f55e 00   A  0     0    1
    13    [ 7] __param           PROGBITS  8039ff44 3a1f44 0010bc 00   A  0     0    4
    14    [ 8] .data             PROGBITS  803a2000 3a4000 0299c0 00  WA  0     0 8192
    15    [ 9] .data..shared_ali PROGBITS  803cb9c0 3cd9c0 000080 00  WA  0     0   32
    16    [10] .init.text        PROGBITS  803cc000 3ce000 024270 00  AX  0     0    4
    17    [11] .init.data        PROGBITS  803f0270 3f2270 17f6f2 00  WA  0     0    8
    18    [12] .exit.text        PROGBITS  8056f964 571964 0015d8 00  AX  0     0    4
    19    [13] .bss              NOBITS    80571000 572f3c 036b90 00  WA  0     0 4096
    20    [14] .mdebug.abi32     PROGBITS  805a7b90 572f3c 000000 00      0     0    1
    21    [15] .comment          PROGBITS  00000000 572f3c 0036a2 00      0     0    1
    22    [16] .shstrtab         STRTAB    00000000 5765de 0000b3 00      0     0    1
    23    [17] .symtab           SYMTAB    00000000 57698c 063290 10     18 18308    4
    24    [18] .strtab           STRTAB    00000000 5d9c1c 077ce3 00      0     0    1
    25  Key to Flags:
    26    W (write), A (alloc), X (execute), M (merge), S (strings)
    27    I (info), L (link order), G (group), x (unknown)
    28    O (extra OS processing required) o (OS specific), p (processor specific)
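The rounding from the end of .bss to a page boundary can be worked through in a few lines. This is a sketch, not kernel code: it assumes the usual MIPS32 KSEG0 layout, where virtual address 0x80000000 maps to physical address 0, and the helper name page_align_up is made up for illustration.

```python
PAGE_SIZE = 0x1000        # 4K pages, as stated in the text
KSEG0_BASE = 0x80000000   # assumption: kernel virtual address of physical address 0

def page_align_up(addr):
    """Round addr up to the next page boundary (hypothetical helper)."""
    return (addr + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)

end_virt = 0x805a7b90                 # end of .bss from the section table
end_phys = end_virt - KSEG0_BASE      # bytes occupied from the start of RAM
aligned = page_align_up(end_phys)
first_free_pfn = aligned // PAGE_SIZE

print(hex(end_phys))        # 0x5a7b90
print(hex(aligned))         # 0x5a8000
print(hex(first_free_pfn))  # 0x5a8
```

So the kernel image accounts for 0x5a8 pages, or 5792k, of the 6360k reserved.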
To dig deeper, let's turn bootmem debugging on by adding "bootmem_debug=1" to the kernel command line. This example focuses on MIPS. The bootmem allocator on other architectures may be implemented slightly differently, but the basic idea is the same. The bootmem allocator uses a bitmap to record which pages are in use, one bit per page. Before the zone allocator is ready, it lets us reserve or free memory simply by setting or clearing the corresponding bits.
     1  # dmesg
     2  ...
     3  Determined physical RAM map:
     4   memory: 04000000 @ 00000000 (usable)
     5  bootmem::init_bootmem_core nid=0 start=0 map=5a8 end=4000 mapsize=800
     6  bootmem::mark_bootmem_node nid=0 start=5a8 end=4000 reserve=0 flags=0
     7  bootmem::__free nid=0 start=5a8 end=4000
     8  bootmem::mark_bootmem_node nid=0 start=5a8 end=5a9 reserve=1 flags=0
     9  bootmem::__reserve nid=0 start=5a8 end=5a9 flags=0
    10  Initrd not found or empty - disabling initrd
    11  Zone PFN ranges:
    12    Normal 0x00000000 -> 0x00004000
    13  Movable zone start PFN for each node
    14  early_node_map[1] active PFN ranges
    15    0: 0x00000000 -> 0x00004000
    16  bootmem::alloc_bootmem_core nid=0 size=80000 [128 pages] align=20 goal=1000000 limit=0
    17  bootmem::__reserve nid=0 start=1000 end=1080 flags=1
    18  bootmem::alloc_bootmem_core nid=0 size=8 [1 pages] align=20 goal=1000000 limit=0
    19  bootmem::__reserve nid=0 start=1080 end=1081 flags=1
    20  bootmem::alloc_bootmem_core nid=0 size=200 [1 pages] align=20 goal=1000000 limit=0
    21  bootmem::__reserve nid=0 start=1081 end=1081 flags=1
    22  bootmem::alloc_bootmem_core nid=0 size=1c [1 pages] align=20 goal=1000000 limit=0
    23  bootmem::__reserve nid=0 start=1081 end=1081 flags=1
    24  bootmem::alloc_bootmem_core nid=0 size=49 [1 pages] align=20 goal=1000000 limit=0
    25  bootmem::__reserve nid=0 start=1081 end=1081 flags=1
    26  bootmem::alloc_bootmem_core nid=0 size=49 [1 pages] align=20 goal=1000000 limit=0
    27  bootmem::__reserve nid=0 start=1081 end=1081 flags=1
    28  Built 1 zonelists in Zone order, mobility grouping on. Total pages: 16256
    29  Kernel command line: console=ttyS1,57600n8 root=/dev/ram0 rootfstype=squashfs bootmem_debug=1
    30  bootmem::alloc_bootmem_core nid=0 size=400 [1 pages] align=20 goal=1000000 limit=0
    31  bootmem::__reserve nid=0 start=1081 end=1081 flags=1
    32  PID hash table entries: 256 (order: -2, 1024 bytes)
    33  bootmem::alloc_bootmem_core nid=0 size=8000 [8 pages] align=20 goal=1000000 limit=0
    34  bootmem::__reserve nid=0 start=1081 end=1089 flags=1
    35  Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
    36  bootmem::alloc_bootmem_core nid=0 size=4000 [4 pages] align=20 goal=1000000 limit=0
    37  bootmem::__reserve nid=0 start=1089 end=108d flags=1
    38  Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
    39  Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
    40  Primary data cache 32kB, 4-way, PIPT, no aliases, linesize 32 bytes
    41  Writing ErrCtl register=00000000
    42  Readback ErrCtl register=00000000
    43  bootmem::free_all_bootmem_core nid=0 start=0 end=4000 aligned=1
    44  bootmem::free_all_bootmem_core nid=0 released=39cb
    45  Memory: 59176k/65536k available (2905k kernel code, 6360k reserved, 980k data, 1684k init, 0k highmem)
    46  ...
    47  Freeing unused kernel memory: 1684k freed
    48  ...
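Before walking through the log, here is a toy model of the bitmap idea described above: one bit per page, set for reserved and cleared for free. It is a sketch for illustration and not the kernel's implementation. The class BootmemBitmap and its methods are hypothetical names. The three calls at the bottom replay lines 5 to 9 of the log.

```python
class BootmemBitmap:
    """Toy one-bit-per-page map: bit set = reserved, bit clear = free."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        # Every page starts out reserved, so every bit is set.
        self.bits = bytearray([0xFF] * ((num_pages + 7) // 8))

    def _mark(self, start, end, reserved):
        # Mark PFNs in [start, end); end is exclusive, as in the log.
        for pfn in range(start, end):
            if reserved:
                self.bits[pfn // 8] |= 1 << (pfn % 8)
            else:
                self.bits[pfn // 8] &= ~(1 << (pfn % 8)) & 0xFF

    def reserve(self, start, end):
        self._mark(start, end, True)

    def free(self, start, end):
        self._mark(start, end, False)

    def is_reserved(self, pfn):
        return bool(self.bits[pfn // 8] & (1 << (pfn % 8)))

    def free_pages(self):
        return sum(1 for pfn in range(self.num_pages) if not self.is_reserved(pfn))


bm = BootmemBitmap(0x4000)   # line 5: 0x4000 pages, all reserved at first
bm.free(0x5a8, 0x4000)       # lines 6-7: everything after the kernel image
bm.reserve(0x5a8, 0x5a9)     # lines 8-9: the page holding the bitmap itself

print(hex(len(bm.bits)))     # 0x800, the mapsize in line 5
print(hex(bm.free_pages()))  # 0x3a57
```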
bootmem_init() initialises the bootmem allocator and sets up initrd-related data if needed. It calls init_bootmem_node(), which calls init_bootmem_core() and marks all the pages as reserved initially. In bootmem_init(), the variable mapstart comes from the symbol _end. It is the first free PFN (page frame number) and gives the location of the bootmem bitmap.
The _end symbol from the symbol table:
     1  $ tail System.map
     2  805a7ab0 b sit_net_id
     3  805a7ac0 b ip6_tnl_net_id
     4  805a7ad0 B br_should_route_hook
     5  805a7ae0 b br_fdb_cache
     6  805a7ae4 b fdb_salt
     7  805a7af0 b br_mac_zero_aligned
     8  805a7b00 B vlan_net_id
     9  805a7b04 b vlan_group_hash
    10  805a7b90 A __bss_stop
    11  805a7b90 A _end
Line 5 of the kernel log tells us that it initialises node ID 0 (this is a UMA system, so only 1 node exists), that the PFN range is 0 to 0x3fff (the end value 0x4000 is exclusive), and that the bootmem bitmap is located at PFN 0x5a8 and is 0x800 bytes in size. 64MB of memory needs 64M/4K = 16K = 0x4000 pages, and 16K pages need a bootmem bitmap of 16K/8 = 2K = 0x800 bytes.
Line 6~7: bootmem_init() then calls free_bootmem(), which calls mark_bootmem(). It marks the pages after the kernel image as usable. The range is PFN 0x5a8 to 0x3fff.
Line 8~9: bootmem_init() then calls reserve_bootmem(), which calls mark_bootmem(). It reserves memory for the bootmem bitmap. The bitmap size is 0x800 bytes, less than 1 page, so the single page at PFN 0x5a8 is enough.
Line 11~15: The kernel initialises the paging system, calls free_area_init_nodes() to initialise all pg_data_t and zone data, and prints information about all zones, the movable zone, and the early node map.
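To read the bootmem debug lines more comfortably, the key=value fields can be pulled apart with a few lines of code. This is a sketch: parse_bootmem_line is a hypothetical helper, and it assumes every field value is printed in hexadecimal, which holds for the lines in this log.

```python
PAGE_SIZE = 0x1000

def parse_bootmem_line(line):
    """Split 'bootmem::func key=val ...' into (func, {key: int}).

    Assumption: all values are hexadecimal. Tokens without '=' such as
    '[128' and 'pages]' are ignored.
    """
    name, *tokens = line.split()
    fields = {}
    for token in tokens:
        if "=" in token:
            key, value = token.split("=", 1)
            fields[key] = int(value, 16)
    return name, fields

name, f = parse_bootmem_line(
    "bootmem::init_bootmem_core nid=0 start=0 map=5a8 end=4000 mapsize=800"
)

pages = f["end"] - f["start"]          # number of page frames in the node
print(name)                            # bootmem::init_bootmem_core
print(hex(pages))                      # 0x4000
print(pages * PAGE_SIZE // (1 << 20))  # 64 (MB)
print(hex(pages // 8))                 # 0x800, one bit per page
```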
Line 16~17: free_area_init_nodes() calls free_area_init_node() for each node. It calculates the total number of pages of the node, and calls alloc_node_mem_map() to allocate memory for the memory map. The size of one page structure is 32 (0x20) bytes, and we need 0x4000 of them in total. So the allocation size is 0x20 * 0x4000 = 0x80000 bytes, which is 128 pages (0x80000 / 0x1000). The goal 0x1000000 means that it allocates memory in the normal zone, skipping the first 16MB of DMA addresses. Because the address 0x1000000 is PFN 0x1000, alloc_bootmem_core() allocates pages starting from PFN 0x1000.
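The memory map numbers in lines 16 and 17 can be reproduced as follows. This is only an arithmetic check. The 32-byte page structure size is the figure for this particular kernel build, and the variable names are mine.

```python
PAGE_SIZE = 0x1000
STRUCT_PAGE_SIZE = 32   # bytes per page structure on this build
num_pages = 0x4000      # 64 MB / 4K
goal = 0x1000000        # allocation goal from the log (16 MB)

mem_map_bytes = STRUCT_PAGE_SIZE * num_pages
mem_map_pages = mem_map_bytes // PAGE_SIZE
start_pfn = goal // PAGE_SIZE
end_pfn = start_pfn + mem_map_pages

print(hex(mem_map_bytes))            # 0x80000  (size= in line 16)
print(mem_map_pages)                 # 128      ([128 pages] in line 16)
print(hex(start_pfn), hex(end_pfn))  # 0x1000 0x1080 (start/end in line 17)
```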
Line 18~21: free_area_init_node() calls free_area_init_core() to build the memory map, and to initialise the freelists and buddy bitmaps of every zone in the node. It calls setup_usemap() to allocate bootmem for pageblock_flags. The zone has 0x4000 / 0x400 = 16 pageblocks with 3 bits each, which is 48 bits. Rounded up to a multiple of 32 bits that is 64 bits, or 8 bytes (line 18). During zone data initialisation, free_area_init_core() also calls zone_wait_table_init() to initialise the wait queue hash table. The size of wait_queue_head_t is 8 bytes and the number of wait queues is 64, so the allocation size is 8 * 0x40 = 0x200 bytes (line 20).
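The usemap and wait table sizes can be worked through step by step. The function below mirrors the calculation described in the text rather than the kernel source. The name usemap_size_bytes and its parameters are chosen for illustration, with defaults taken from this example (0x400 pages per pageblock, 3 bits per pageblock, 32-bit words).

```python
def usemap_size_bytes(zone_pages, pageblock_pages=0x400,
                      bits_per_block=3, word_bits=32):
    """Bytes needed for pageblock flags, following the steps in the text."""
    blocks = -(-zone_pages // pageblock_pages)     # ceiling division
    bits = blocks * bits_per_block
    bits = -(-bits // word_bits) * word_bits       # round up to whole words
    return bits // 8

usemap = usemap_size_bytes(0x4000)
print(usemap)            # 8  (size=8 in line 18)

# Wait queue hash table: 64 heads of 8 bytes each on this build.
wait_table = 8 * 0x40
print(hex(wait_table))   # 0x200  (size=200 in line 20)
```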
Line 21: Why are the start and end PFN the same? The allocation in line 18~19 uses less than 1 page, and alloc_bootmem_core() attempts to use the free fragment of the last allocated page. Each time it is called, it records where the allocation ended in bdata->last_end_off. When last_end_off is not page aligned, a free fragment exists. The next bootmem allocation tries to use that fragment if it is large enough. We can see that lines 21, 23, 25, 27, and 31 all share the same page.
Line 22~23: The kernel allocates bootmem for a resource structure in resource_init(). The size of the resource structure is 28 (0x1c) bytes.
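The fragment reuse can be modelled in a short simulation that reproduces the start and end values of every __reserve line in the log. This is a simplified sketch of how I understand alloc_bootmem_core() to behave, not a copy of it. BootmemModel, hint_idx and merge are names used for illustration, the search through the bitmap is replaced by the assumption that all pages after the previous allocation are free, and limit is ignored.

```python
PAGE_SIZE = 0x1000

def pfn_down(addr):
    return addr // PAGE_SIZE

def pfn_up(addr):
    return (addr + PAGE_SIZE - 1) // PAGE_SIZE

def align_up(value, align):
    return (value + align - 1) // align * align

class BootmemModel:
    """Tracks only where the last allocation ended (byte offset)."""

    def __init__(self, goal):
        self.goal_pfn = pfn_down(goal)
        self.last_end_off = 0   # end of the previous allocation, in bytes
        self.hint_idx = 0       # first PFN not touched by previous allocations

    def alloc(self, size, align=0x20):
        sidx = max(self.goal_pfn, self.hint_idx)
        partial = self.last_end_off % PAGE_SIZE != 0
        if partial and pfn_down(self.last_end_off) + 1 == sidx:
            # Continue inside the partly used page right before sidx.
            start_off = align_up(self.last_end_off, align)
        else:
            start_off = sidx * PAGE_SIZE
        # If we start inside the previous page, that page is already reserved.
        merge = pfn_down(start_off) < sidx
        end_off = start_off + size
        self.last_end_off = end_off
        self.hint_idx = pfn_up(end_off)
        # PFN range that still has to be marked reserved: [start, end)
        return pfn_down(start_off) + int(merge), pfn_up(end_off)

model = BootmemModel(goal=0x1000000)
sizes = [0x80000, 0x8, 0x200, 0x1c, 0x49, 0x49, 0x400, 0x8000, 0x4000]
ranges = [model.alloc(size) for size in sizes]
for size, (start, end) in zip(sizes, ranges):
    print(f"size={size:x} start={start:x} end={end:x}")
```

When start equals end, the allocation fitted completely into a page that was already reserved, so no new page had to be marked.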
Line 24~27: The kernel allocates bootmem for saved_command_line and static_command_line in setup_command_line(). The command line is 72 characters long, so each copy needs 73 (0x49) bytes including the terminating NUL. The command line is also printed in line 29.
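The 0x49-byte allocation size can be checked directly against the command line printed in line 29 of the log. The extra byte is the terminating NUL of the C string.

```python
# Command line exactly as printed in line 29 of the log.
cmdline = ("console=ttyS1,57600n8 root=/dev/ram0 "
           "rootfstype=squashfs bootmem_debug=1")

alloc_size = len(cmdline) + 1   # characters plus the terminating NUL

print(len(cmdline))             # 72
print(hex(alloc_size))          # 0x49
```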
Line 28: The number of pages in this zone is 16384 (0x4000), and 128 of them are used for the memory map. Therefore, the present_pages is 16384 - 128 = 16256.
Line 30~32: The kernel allocates bootmem for the PID hash table in pidhash_init(). Each pid_hash entry is 4 bytes, so the table is 256 * 4 = 1024 (0x400) bytes.
Line 33~38: The kernel allocates bootmem for the dentry cache and inode cache hash tables in vfs_caches_init_early(). One hash list head is 4 bytes. 8192 dentry cache hash entries take 8192 * 4 = 0x8000 bytes, and 4096 inode cache hash entries take 0x4000 bytes.
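The three hash table sizes follow the same rule, entries times the size of one list head. The check below uses the entry counts from the log and assumes a 4-byte list head, as stated above for this 32-bit system.

```python
PAGE_SIZE = 0x1000
HEAD_SIZE = 4   # one pointer per hash list head on this 32-bit system

tables = {"PID": 256, "Dentry cache": 8192, "Inode-cache": 4096}

sizes = {}
for name, entries in tables.items():
    size = entries * HEAD_SIZE
    pages = -(-size // PAGE_SIZE)   # ceiling division
    sizes[name] = (size, pages)
    print(f"{name}: size={size:x} [{pages} pages]")
```

The sizes and page counts match lines 30, 33, and 36 of the log.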
Line 43~44: When the memory subsystem is ready, it calls free_all_bootmem(), which calls free_all_bootmem_core() for each bootmem bitmap to release all free pages to the buddy allocator. It examines the node_bootmem_map for all pages in the node (PFN 0 ~ 0x4000), and calls __free_pages_bootmem() to release either 32 pages or 1 page at a time (depending on whether vec is all 1s or not). The free pages are PFN 0x5a9 ~ 0xfff and 0x108d ~ 0x3fff, 0x39ca pages in total. We can check the size of the reserved memory: (0x4000 - 0x39ca) * 4096 bytes = 6360K. After freeing those pages, the pages storing the bootmem bitmap can also be freed. That is 1 page in this example. Thus, the total number of released pages is 0x39cb (line 44).
Line 47: When the kernel completes the initial bootup and is ready to start user-mode processes, it calls free_initmem() to free the .init section between the symbols __init_begin and __init_end.
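The page counts in this step can be verified from the two free PFN ranges. This is a plain arithmetic check with the numbers from the log. Ranges are written with an exclusive end.

```python
TOTAL_PAGES = 0x4000
PAGE_KB = 4

# Free PFN ranges as [start, end): 0x5a9~0xfff and 0x108d~0x3fff.
free_ranges = [(0x5a9, 0x1000), (0x108d, 0x4000)]

free_pages = sum(end - start for start, end in free_ranges)
reserved_kb = (TOTAL_PAGES - free_pages) * PAGE_KB
released = free_pages + 1   # plus the single page that held the bootmem bitmap

print(hex(free_pages))      # 0x39ca
print(reserved_kb)          # 6360
print(hex(released))        # 0x39cb
```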
In conclusion,
- available memory = total - reserved
- MemTotal = total - reserved + .init
- The bootmem allocator uses a simple bitmap to record which pages are in use.
- Reserved memory is mainly used to store the kernel image.
- The rest of the reserved memory is for the memory map, various hash tables, and some miscellaneous data.
- After the memory subsystem and zone allocator are ready, the kernel releases the memory used for the bootmem bitmap (bootmem retires).
- When the initial bootup is done, the kernel releases the .init section, which will never be used again.