Advanced Programming Techniques

A. DEFINING THE BASIC MEMORY CONCEPTS

In this mini post-series we explore the memory layout of C++ objects, drawing on the knowledge and experience we have gained so far. To begin with, we cover some advanced, very low-level techniques that I do not expect to revisit in many future posts.

For our exploration we need a development environment; in this case we used CLion. We also need the GCC compiler, so that we can combine some of its functionality with debugging in the GDB debugger. We first make sure that the Unix machine we are using has the following installed:

  • Cmake or make
  • GCC - G++
  • Libc++-dev

Figure 1.1

Figure 1.2

Figure 1.3

First, to understand how objects are structured in memory, we need to understand how much space is allocated and what the difference is between a simple and a virtual method call. As the example above shows, the size of NonVirtualClass is 1, because the size of a C++ class cannot be zero. The size of VirtualClass, however, is 8 (4 on a 32-bit architecture). One could reasonably ask why the sizes differ, since both classes have a single method and no data members. The difference comes from a hidden pointer in the object's internal memory layout, which costs one pointer: 8 bytes on a 64-bit architecture and 4 bytes on a 32-bit one. This pointer points to a virtual table (vtable). This static table is created for each class with virtual methods and contains the addresses of the virtual methods of the class. More specifically, it is an implementation of the dynamic dispatch pattern. To get a deeper sense of the role of a vtable and how it is laid out in memory, we will follow the classes under the directory VBasicUnderstand, namely Parent and Derived, through the execution and debugging screens.

We compile our code with the following flags and start debugging using GDB as shown in the Figure 1.4.

clang++ -std=c++14 -stdlib=libc++ -g main.cpp && gdb ./a.out

Figure 1.4

Let's analyze Figure 1.4 and briefly go through the most important conclusions that can be drawn.

  • As we said before, and as the debugger confirms, each object holds a hidden pointer to a vtable, so the size of the class is larger than expected even though the classes have no data members.
  • The vtable is the same for the objects p1 and p2. The reason is that a vtable is static: it is created once per type and shared by every object of that type. The compiler knows only the static type of an expression at compile time. So, when an assignment or a conversion changes the dynamic type behind a pointer, we will see how the pointer is adjusted so that the call is dispatched dynamically.
  • Every vptr points 16 bytes (0x10 in hexadecimal) into its vtable. Let’s not forget that this figure depends on the target architecture, 32-bit or 64-bit. The compiler reserves this space at the start of each vtable for header entries (top_offset and the typeinfo pointer) that precede the method pointers. We will analyze this further in the next section.

Continuing our exploration, we will now use the GDB command

x/<size>xb

to obtain a byte-by-byte view of the first 300 bytes of memory from a given address. This lets us inspect the layout of each object in more detail. For the class Derived we start from the first memory position of its vtable:

(gdb) x/300xb 0x8201d18

We did not choose this starting address by accident. As we can see below, the vptr points 16 bytes (0x10) past it, at 0x8201d28.

(gdb) x/300xb 0x8201d18
$5 = (Derived) {<Parent> = {_vptr.Parent = 0x8201d28 <vtable for Derived+16>}

So the vtable header starts 0x10 bytes before 0x8201d28, that is, at 0x8201d18, as shown in Figure 1.5.

Figure 1.5

From Figure 1.5 we can derive the information presented in Table 1.1 and Table 1.2.

| Address | Value | Description |
|---|---|---|
| 0x8201d18 | 0x0 | top_offset |
| 0x8201d20 | 0x8201d58 | Pointer to typeinfo for Parent |
| 0x8201d28 | 0x8000c8e | Pointer to Parent::Foo(). The vptr points here |
| 0x8201d30 | 0x8000c82 | Pointer to Parent::FooNotOverridden() |

Table 1.1

Here is the memory layout for object Derived:

| Address | Value | Description |
|---|---|---|
| 0x8201d38 | 0x0 | top_offset |
| 0x8201d40 | 0x8201d70 | Pointer to typeinfo for Derived |
| 0x8201d48 | 0x8000c76 | Pointer to Derived::Foo(). The vptr points here |
| 0x8201d50 | 0x8000c82 | Pointer to Parent::FooNotOverridden() (inherited, same value as in Table 1.1) |

Table 1.2

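To read the same values programmatically, we could use a short snippet like the one below.

#include <cstdint>

// Placeholder address: replace it with one reported by GDB for the running process.
std::uintptr_t p = 0x0001FBDC;
// Read one pointer-sized slot (an int would truncate the value on 64-bit).
std::uintptr_t value = *reinterpret_cast<std::uintptr_t *>(p);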

Let’s remember that the vptr of a Derived object points 16 bytes into the vtable. So the third entry in Table 1.2 holds the pointer to the first virtual method of the vtable. For example, to find the slot of the third method in the vtable, we could write something like this:

(char*)vptr + 2 * sizeof(void*)

More simply, we could issue a similar command in the debugger, as shown in Figure 1.6. This is the option we used to find the fields in Table 1.1 and Table 1.2.

Figure 1.6

Figure 1.7 makes this clearer: it shows the generated file that describes the layout of every class in our program. The file is easily created by compiling with the following option (GCC only, since the option is GCC-specific; GCC 8 and later spell it -fdump-lang-class), and it contains equally useful information about the overall memory layout.

g++ -fdump-class-hierarchy -c main.cpp

Figure 1.7

Or through the CMake file (without an -o option, which would make the compiler overwrite main.cpp):

set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11 -Wall -Wno-unused-variable -fdump-class-hierarchy")

B. MULTIPLE INHERITANCE

In the previous section we saw an example with single inheritance. In this section we study an example with multiple inheritance and explore how the object memory layout behaves in this case. The process is the same as before, that is, using GDB. Running the previous commands, we observe in Figure 2.1 the new memory layout for the classes located under the directory MultipleInheritance.

Figure 2.1

Figure 2.2

Here is the memory object layout of Child:

| Address | Value | Description |
|---|---|---|
| 0x8201c80 | 0x0 | top_offset |
| 0x8201c88 | 0x8201d28 | Pointer to typeinfo for Child |
| 0x8201c90 | 0x8000ea2 | Pointer to Mother::MotherMethod(). The vptr points here |
| 0x8201c98 | 0x8000ec6 | Child::ChildMethod() |
| 0x8201ca0 | 0x8000ed2 | Child::FatherFoo() |
| 0x8201ca8 | -16 | top_offset |
| 0x8201cb0 | 0x8201d28 | Pointer to typeinfo for Child |
| 0x8201cb8 | 0x8000eae | Father::FatherMethod(). The vptr points here |
| 0x8201cc0 | 0x8000edd | non-virtual thunk to Child::FatherFoo() |

Table 2.1

Let’s explore objects of type Child. The first difference from single inheritance is that, with multiple inheritance, an object of type Child holds two vptrs, one for each inherited path. More specifically, in our case the two pointers point into two different vtables, one for the class Mother and one for the class Father, as Figure 2.1 shows. One might wonder why we waste this space instead of having a single vptr for both cases.

The answer is that a single vptr cannot work with multiple inheritance. Take, for example, an object of type Child that is passed to a function accepting a pointer to Father or to Mother. Both functions expect the "this" pointer to be adjusted to the proper offset, so that it points at a subobject laid out exactly like a standalone Father or Mother. This is impossible with only one vptr, because a single subobject cannot simultaneously play the role of two different base objects (Father and Mother). The solution is to have two vptrs, so that "this" can be set to the correct offset in each case and the corresponding method is called based on the dynamic type of the object.

Our second observation from Figure 2.1 and Table 2.1, where we have listed the addresses and the values stored at the appropriate offsets, raises a question: why are the methods of Child placed right after those of Mother, and why do they share the same vptr, 0x8201c90?

More specifically, one might wonder why there are not three vptrs, one per class.

The answer is simple. The compiler is smart enough to append the entries of Child directly after the entries of Mother, saving the cost of a pointer. To do this, however, it must lay out the offsets of the vtable entries so that the two parts do not overlap. Hence, we can deduce the following.

  • On the one hand, there is a cost: several extra positions are added, which makes the vtable quite large.
  • On the other hand, the compiler benefits from this spacing, because it can easily merge the methods of several classes into one vtable without overlapping offsets (for example, this may help with faster serialization).

As a last and most important remark, consider the following case. As the source code shows (the classes under the directory MultipleInheritance), the method FatherFoo() of the class Father is overridden by Child. Suppose we have a function that takes a pointer to Father and calls FatherFoo(); normally we would expect Father::FatherFoo() to be called. But suppose the pointer to Father actually refers to a Child, as in the following:

Child child;
Father *father = &child;
father->FatherFoo();
//Child::FatherFoo() will be called

Because FatherFoo() is overridden by Child, we expect Child::FatherFoo() to be called. The key question is how "this" gets adjusted to the proper offset so that the right method of the vtable is finally called.

Given the code above, the compiler statically uses the offset of the vptr for the Father vtable, but instead of reaching Father::FatherFoo() directly the call goes through a small piece of code called a thunk, which adjusts the "this" pointer only when it is needed, as in our example. As Table 2.1 and Figure 2.3 show in the Value column, the code at 0x8000edd jumps, with no significant overhead, to 0x8000ed2 after making the offset adjustment, and so finally calls Child::FatherFoo(), which overrides Father::FatherFoo().

Figure 2.3

Attention! This thunk-based adjustment would not be needed if Child overrode a method of Mother and were called through a pointer to Mother. The compiler places the methods of Child in the same vtable as those of Mother, so "this" already points at the right place and the right method is reached without jumping through an extra piece of code.

C. VIRTUAL INHERITANCE

In the previous sections we saw how objects are laid out in memory in the simpler cases, first with single and then with multiple inheritance. In this section we go deeper and explore what happens in the case of virtual inheritance. The source code for this example is under the subdirectory Virtual_Inheritance. Before we dig in, let's see how to tell the two cases apart intuitively. The difference is the keyword virtual in the inheritance list, as shown below.

Figure 3.1

In practice this means that, in the memory layout of VChild shown in the following figure, we expect a single GrandParent instance that serves both Parent1 and Parent2. This is the opposite of ordinary multiple inheritance, where each parent has its own GrandParent. As a result, the offsets used to reach the virtual methods must be adjusted appropriately, so that the correct methods are reached whether the object acts as Parent1 or as Parent2.

Figure 3.2 shows the memory layout in the case of virtual inheritance. There is a fair amount of new material here that we have not met before.

Our first observation is that we have 3 vptrs, which point to 3 different vtables. The reason is that the class VChild has only one instance of GrandParent, as explained above, and that instance needs its own vptr. We also observe two new kinds of table in the representation of VChild. The first is the VTT, and the second is the construction vtable: one for Parent1-in-Child and one for Parent2-in-Child. We will explain their role later; for now, let’s look at something very interesting in Table 3.1 for the VChild object.

Figure 3.2

Here is the memory object layout of VChild:

| Address | Value | Description |
|---|---|---|
| 0x8201b30 | 0x20 (32) | virtual-base offset |
| 0x8201b38 | 0x0 | top_offset |
| 0x8201b40 | 0x8201ce8 | Pointer to typeinfo for Child |
| 0x8201b48 | 0x8001226 | Pointer to Parent1::parent1_foo(). The vptr points here |
| 0x8201b50 | 0x8001232 | Child::child_foo() |
| 0x8201b58 | 0x10 (16) | virtual-base offset |
| 0x8201b60 | 0xfffffffffffffff0 (-16) | top_offset |
| 0x8201b68 | 0x8201ce8 | Pointer to typeinfo for Child |
| 0x8201b70 | 0x800121a | Pointer to Parent2::parent2_foo(). The vptr points here |
| 0x8201b78 | 0x0 | virtual-base offset |
| 0x8201b80 | 0xffffffffffffffe0 (-32) | top_offset |
| 0x8201b88 | 0x8201ce8 | Pointer to typeinfo for Child |
| 0x8201b90 | 0x800120e | Pointer to GrandParent::grandparent_foo(). The vptr points here |

Table 3.1

Figure 3.3

A notable fact that does not show up in the memory layout of VChild is the following. With virtual inheritance, when Child inherits from a Parent that in turn inherits virtually from GrandParent, the constructor of GrandParent is called directly. This means the GrandParent constructor runs first, and only then the constructors of Parent1 and Parent2. This is unlike ordinary multiple inheritance, where the constructor of Parent1 is entered first, because each Parent then has its own superclass instance. The question that now arises is how the compiler knows where to find the data and methods of the GrandParent instance when it builds a Parent1 or a Parent2, since the GrandParent object has already been built.

The answer is that the compiler has to adjust offsets differently from before. This is where the construction vtable for Parent1-in-Child or Parent2-in-Child comes in (see Table 3.2 for Parent1-in-Child). More specifically, each vtable keeps an extra entry called the virtual-base offset, which is used when we inherit from virtual base classes. While a parent is being constructed, the compiler consults the construction vtable to learn where the GrandParent instance lies relative to the "this" pointer of the parent (Parent1 or Parent2), so that it can move to it and access its methods.

Here is the memory layout for the construction vtable for Parent1 in VTChild (the same applies to Parent2):

| Address | Value | Description |
|---|---|---|
| 0x8201bd0 | 0x20 (32) | virtual-base offset |
| 0x8201bd8 | 0x0 | top_offset |
| 0x8201be0 | 0x8201d20 | typeinfo for Parent1 |
| 0x8201be8 | 0x8001226 | Parent1::parent1_foo() |
| 0x8201bf0 | 0x0 | virtual-base offset |
| 0x8201bf8 | 0xffffffffffffffe0 (-32) | top_offset |
| 0x8201c00 | 0x8201d20 | typeinfo for Parent1 |
| 0x8201c08 | 0x800120e | Grandparent::grandparent_foo() |

Table 3.2

Here is the memory layout for the VTT:

| Address | Value | Description |
|---|---|---|
| 0x8201b98 | 0x8201b48 | vtable for VTChild+24 |
| 0x8201ba0 | 0x8201be8 | construction vtable for Parent1-in-VTChild+24 |
| 0x8201ba8 | 0x8201c08 | construction vtable for Parent1-in-VTChild+56 |
| 0x8201bb0 | 0x8201c28 | construction vtable for Parent2-in-VTChild+24 |
| 0x8201bb8 | 0x8201c48 | construction vtable for Parent2-in-VTChild+56 |
| 0x8201bc0 | 0x8201b90 | vtable for VTChild+96 |
| 0x8201bc8 | 0x8201b70 | vtable for VTChild+64 |

Table 3.3

Figure 3.3 shows the structure of the VTT. It is a table of vtables, and its basic role is to direct each constructor to the right virtual table. To switch to the correct offset in a construction vtable, the compiler first looks at the VTT. The VTT lists all the available vtables, so the correct one can be selected for each case, as shown in Table 3.3. For example, when the constructor of Parent1 is called, it is handed, through the VTT, the construction vtable for Parent1, and from its virtual-base offset it knows how to reach the methods of GrandParent.