Chromium C++ Primer for C++98 Folks

lvalues, rvalues, C++11/14 & Chromium({}, auto, constructor: var_name{}, yada, yada, yada ... )

Introduction

C++ powers most of the V8 engine. The memory management logic, the Ignition interpreter, and TurboFan (the optimizing compiler) are all written in C++. So anyone getting started with V8 exploitation needs to brush up their C++ skills enough to understand V8's source code.

While a lot of us have run into C++ at some point in our lives, if it wasn't in recent production code (code that isn't older than you are), there's a good chance you never ran into the expansive and slightly different world of C++11/14 features. At the time of writing (3rd Sept, 2020), most of Chromium's codebase is written in C++11/14 and is being actively migrated to C++17, with a target of 2021. While a lot has changed, the changes from C++11/14 onwards are fairly incremental. The leap from C++98 to C++11/14, however, is quite large, as we'll see.

Wait, I don't need to read no C++11/14. I know C++:

  • Templates
  • Multiple Inheritance
  • Namespaces
  • .... another 10K stuff

I know that like the back of my hand. I mean,

To my surprise, it was not as easy as I expected it to be. But it becomes almost trivial once you have the right readings.

Motivation

The journey started when I tried to get into V8's source code. I began with @danbev's great repo and hit a roadblock as soon as I looked into TaggedImpl, which is the basis of most memory structures that V8 uses. I cloned the repo, pulled up the code, and saw roughly this:

template <HeapObjectReferenceType kRefType, typename StorageType>
class TaggedImpl {
 public:
  static_assert(std::is_same<StorageType, Address>::value ||
                    std::is_same<StorageType, Tagged_t>::value,
                "StorageType must be either Address or Tagged_t");
                
  static const bool kIsFull = sizeof(StorageType) == kSystemPointerSize;

  static const bool kCanBeWeak = kRefType == HeapObjectReferenceType::WEAK;

  constexpr TaggedImpl() : ptr_{} {}
  explicit constexpr TaggedImpl(StorageType ptr) : ptr_(ptr) {}

  // Make clang on Linux catch what MSVC complains about on Windows:
  operator bool() const = delete;

  template <typename U>
  constexpr bool operator==(TaggedImpl<kRefType, U> other) const {
    static_assert(
        std::is_same<U, Address>::value || std::is_same<U, Tagged_t>::value,
        "U must be either Address or Tagged_t");
    return static_cast<Tagged_t>(ptr_) == static_cast<Tagged_t>(other.ptr());
  }
// yada yada yada.....

In a few words, below are the things I don't quite understand:

  • Line 1: HeapObjectReferenceType kRefType, what's with the fixed-looking enum getting passed into the template?
  • Line 4: static_assert, well that's something I've never seen before. I wonder what that does.
  • Line 12: constexpr TaggedImpl() : ptr_{} {}, what's this weird constructor which is initialized with curly braces? What is this ptr_ thing?
  • Line 23: static_cast<>, what's this weird cast?

Me by this point,

I guess we need to brush up on our C++ skills. Upon more research, most of it turned out to be pretty simple C++11 stuff. We'll cover a bit more than what's required to answer the questions above.

If you're a smarty-pants (@mckade) and already know what's going on in these lines, feel free to skim, skip over and move to the next article. For the rest of us, let's get going. Here's a link you might want to skim through just to sanity check.

Understanding the new parts of C++11/14

I referred to articles and videos from all over the internet which I will link.

Before I start, below are three recommendations (in order):

I highly recommend reading the first two before moving ahead. Now let's take a look at a few new features and then we'll circle back to the things we initially didn't understand well.

Rvalues-Lvalues, Rvalue References & Perfect Forwarding

Rvalues and lvalues have quietly existed in C++ for a long time. While the precise definition of rvalues and lvalues is a little tricky, you can identify them as:

  • Lvalues: Objects that usually have named aliases and live in memory.
  • Rvalues: Temporary objects usually created for computation, assignment, or return (usually unnamed).

I agree that there are a ton of usually(s) here, but the category of an expression depends on several things and is determined at compile time. On top of that, it can be changed with casts like std::move. So when in doubt, ask the compiler.

I went ahead and experimented with different kinds of move semantics. However, they are not strictly related to V8 and are more of a general concept. If you're interested, check out my experiments on Move constructors and assignment operators.

To learn how these constructs are used in C++, I recommend viewing these links (in order):

Constructor{}, initialization lists

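Initializer lists are another interesting thing you'll see all over C++11 code. A member initializer list directly initializes the data members of a class, inline with the constructor. The list itself predates C++11; what C++11 added is the option to write each initializer with curly braces. Example: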

// Try running on cpp.sh
#include <iostream>

using namespace std;

class Line {
    // An initializer list starts after the constructor name
    // and its parameters. The list begins with a colon ( : )
    // and is followed by the list of variables that are to be
    // initialized.

    public:
      int getLength() const { return ref_len; }
      // All of the variables are separated by a comma,
      // with their values in curly brackets.
      Line( int len ): ref_len{len} { cout << "Initializer List constructor called!" << endl; }

    private:
      int ref_len;
};

int main(){
    Line line_1(10);
    cout << line_1.getLength() << endl;
}

Basically, data_member{value} is an initialization shorthand. So much simpler than I initially thought. For a plain int member like this one, the code above has the same effect as:

Line::Line (int len){
    ref_len = len;
    cout << "Initializer List constructor called!" << endl;
}

More details are available at Initialization Lists Intro.

Second Look at TaggedImpl

Let's paste the code again down here and walk through it together.

template <HeapObjectReferenceType kRefType, typename StorageType>
class TaggedImpl {
 public:
  static_assert(std::is_same<StorageType, Address>::value ||
                    std::is_same<StorageType, Tagged_t>::value,
                "StorageType must be either Address or Tagged_t");
                
  static const bool kIsFull = sizeof(StorageType) == kSystemPointerSize;

  static const bool kCanBeWeak = kRefType == HeapObjectReferenceType::WEAK;

  constexpr TaggedImpl() : ptr_{} {}
  explicit constexpr TaggedImpl(StorageType ptr) : ptr_(ptr) {}

  // Make clang on Linux catch what MSVC complains about on Windows:
  operator bool() const = delete;

  template <typename U>
  constexpr bool operator==(TaggedImpl<kRefType, U> other) const {
    static_assert(
        std::is_same<U, Address>::value || std::is_same<U, Tagged_t>::value,
        "U must be either Address or Tagged_t");
    return static_cast<Tagged_t>(ptr_) == static_cast<Tagged_t>(other.ptr());
  }
// yada yada yada.....

  • Line 1: HeapObjectReferenceType kRefType, what's with the fixed-looking enum getting passed into the template?

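kRefType is a value of the enum HeapObjectReferenceType and states whether a reference is weak or strong. It is a non-type template parameter: the template takes a value as well as a type. Templates have supported that since C++98, so that's nothing new.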

  • Line 4: static_assert, well that's something I've never seen before. I wonder what that does.

Aha! So, here we have a static_assert. These checks are evaluated by the compiler and make sure that template arguments, memory sizes and things like that turn out as expected; if the condition is false, compilation fails with the given message. Here, the one at the top of the class makes sure that StorageType is either Address or Tagged_t, and the one inside the == operator does the same for the deduced type U.

  • Line 12: constexpr TaggedImpl() : ptr_{} {}, what's this weird constructor which is initialized with curly braces? What is this ptr_ thing?

Similar to static_assert (yet a little different), constexpr marks things that the compiler may evaluate at compile time when possible. This depends on various things, such as whether all the arguments are known at compile time (among others, which you can find here). Here, it allows the construction and the comparison to happen during compile time! As for the curly braces, ptr_{} is the initializer-list shorthand from earlier with no value, which value-initializes the member (zero for integer types), and ptr_ is simply the class's data member, declared at the end of the class.

This works in a way similar to macros; however, here the compiler can work out the result of the operation itself and place the result in the code instead of, say, a call!

  • Line 23: static_cast<>, what's this weird cast?

Finally, we have a static_cast, which performs an explicit conversion between related types and is checked by the compiler. Here it converts both values to Tagged_t before comparing them. More here.

Finally, towards the end of the class definition, we have:

StorageType ptr_;

If we consider line 183 along with lines 12 and 13 (from above), we see that this class (among a ton of other functions) has two main constructors: one which accepts a StorageType and a default one which takes no arguments.

And it finally makes sense!

Looking back, now you'd be like Hmmm...

That was so obvious and easy. I agree! We didn't need to read 90% of the links in order to make sense of our initial function. However, these articles give us a holistic understanding of the features and places to refer back to!

Initially, nothing made sense but after reading a few articles things fit in!

Where to next?

Once you're good with the basics of C++11/14, I would recommend the following docs from Chromium's and V8's devbase:

More References (In no Particular Order)

  • Pointers VS References -> Use this as a refresher for how references differ from pointers, what's legal and what's not. It also mentions how const functions can accept non-const variables, but not the other way around.
  • Rvalues & Lvalues