Project summary #1

Open
rvagg opened this issue Jun 26, 2014 · 29 comments

Comments

@rvagg
Contributor

rvagg commented Jun 26, 2014

Huh?

goingnative is a workshopper for learning how to write native Node.js addons, it's slated for release & use at NodeConf (next week!). I was supposed to start this long ago but ... you know how it goes.

Who?

I've added a bunch of people to this repo, as far as I'm concerned this will be an open project (well, it's technically "closed" until next week) and I'm happy to share ownership and leadership with anyone that has significant contributions to make.

However, I know I've just thrown a bunch of you in here without asking so please if you don't have time or interest in this at the moment and don't want to be spammed then I'd be happy to take you off the collaborators list, just let me know.

@ceejbot and @tjfontaine are here because they are down as running this workshop at NodeConf with me, unfortunately I know they are both super busy with their respective employment, I'm hoping to be able to squeeze some blood out of that stone though!

@wblankenship and @expr are here because I know, through my work with them at NodeSource that they are awesome and should have a lot to offer here in terms of the learning experience.

@trevnorris and @thlorenz are here because I know they have a ton to offer on the C++ side, I don't know realistically how much I can expect to squeeze out of them in terms of time though and @thlorenz is scheduled to help with another workshop at NodeConf anyway.

@TooTallNate and @kkoopa are here because I know from experience that they have a ton of skill and knowledge in this area that would be hugely valuable if they are able to contribute.

What?

The goal here is pretty ambitious, particularly for NodeConf. We need to teach Node/JS programmers how to write native addons. Unfortunately, there is a strong allergy to C++ amongst that cohort in general so we'll have to do lots of handholding and provide lots of up-front boilerplate to point people in the right direction. The success of this will come from the right balance of boilerplate and documentation to actual code challenges.

learnyounode is the foundational example for building this type of thing, it's basically a terminal-based self-driven programming learning tool that provides you with challenges that ramp up in complexity and you have to complete those challenges successfully to proceed. The expectation is not that people will be able to finish this in a sitting, or even want to finish it all, it's just to give them a taste and get them started and provide enough of a learning path for those wanting to dig deeper.

I'm also thinking of ditching hopes of Windows support, for NodeConf at least. I have a docker build script that can build a dev environment that I can spin up enough instances of and hand out ssh access to separate containers to people in the workshop (@ NodeConf) who don't have suitable dev environments, or who would waste the whole time trying to get their dev environment ready. Unfortunately this assumes you're comfortable with dev over SSH and using a unixy code editor!

Another complication is the V8 variability, so I'm going to lean on nan to get this done in a way that doesn't involve having to explain all that junk. Unfortunately this is quite heavy so it'll almost make it a "nan workshopper" but I don't see a good way around that, and besides, a good portion of current native addons are using nan so it's not an isolated ecosystem. It might be possible to offer a raw 0.10 and another 0.12 version that doesn't use nan without too much hassle, but that's too complex for now.

Structure

Here are my current thoughts on an exercise structure. This is really rough and will likely change as we develop, and I'm hoping that you all have ideas on how to improve it:

  1. AM I READY?: There's no coding involved in this one, typing goingnative verify for this will invoke a checker that will determine if you have: a compiler, node-gyp, python 2.x, and anything else that might be needed to get started.
  2. LET'S MAKE SOMETHING COMPILE!: Provide the user with a directory containing the basics needed to make an addon, including a package.json, binding.gyp and even an addon.cc that is partially complete but won't compile without adding some really basic code. We might even want to have package.json and binding.gyp incomplete too and they have to complete it all to pass the exercise. I'm thinking that the code can just use a simple printf to print something to stdout so there's very little complex C++ or V8 involved in making this work. Validation will be tricky but it'll have to involve at least invoking a build and running the resulting addon to make sure it works as expected.
  3. IT'S A TWO-WAY STREET: Get the user to extend the previous exercise to accept a method argument and then return something back to the calling function. This could be split up into two exercises ("IT'S A ONE-WAY STREET"?).
  4. IT'S ALL ABOUT SCOPE: It might be interesting to leave NanScope() (HandleScope) off the previous exercises completely, perhaps leaving a note in the boilerplate file saying something like // this code is intentionally incomplete and requires a Scope and point them to a later exercise that will introduce the concept of a scope. This exercise could provide them with an addon that has no scope and, when run, is observed leaking memory. Their job is to stop the leak by adding a NanScope(). Maybe too simple?
  5. CALL ME MAYBE: Invoking a callback argument using MakeCallback (well, NanMakeCallback anyway)
  6. OBJECTIFICATION: Create a v8::Object and populate it with something, a String property, a Number property and maybe even a Function?
  7. OBJECTIFY ALL THE THINGS: Using ObjectWrap to wrap up a C++ object for JS use.
  8. OFFLOADING THE GRUNT WORK: A precursor to the next exercise, I'm thinking that we could get them to do some CPU intensive work in C++ and pass the result back to JS.
  9. TEAM GRUNT WORK: Take their previous exercise and split the work off into a worker thread and provide the result via a callback. Pi estimation is the example I keep on using for "CPU intensive" work and could be applicable here.

Validation in most existing workshoppers is done via running the solution code in parallel to the submission code and using stdout to compare the results. Sometimes this involves hijacking stdout to replace it with some kind of reporter. The latest workshopper incarnation (v1) doesn't force this as a requirement and I don't imagine we'll have use for it here. Validation will take the form of a script that performs a series of actions to confirm that individual components of the work are complete and correct. The user gets feedback in the form of ticks and crosses to their console for each of these so they can clearly see where they messed up if they get a failure.

Action

I'm diving right in to this, but I'm going to be doing a rough job of each of them as I go and come back and perfect later. Help would be appreciated with coding and also wording the questions, coding the solutions and making the validations fine-grained enough so that the user gets clear feedback about what they've done wrong and it's very difficult to cheat the system.

Could you please let me know if you don't want to be involved, and if you do want to be involved, in what capacity do you think you could be helpful and what would you like to try and tackle?

@retrohacker
Contributor

The layout of the course is great. As you go through and start getting rough outlines of code put together, I can come behind and clean things up.

My C++ game is really rusty, writing code from scratch may take me a bit.

@osslate

osslate commented Jun 26, 2014

I can also help clean things up; unfortunately, I don't know much C++.

@rvagg
Contributor Author

rvagg commented Jun 26, 2014

@expr I know you're more than capable of picking up enough of this to be able to add value!

@kkoopa

kkoopa commented Jun 26, 2014

Sounds nice. I'll help out, starting with some thoughts I had straight away:

Exercise 2 should also include cpplint (or some other linter, but I don't know of any). Whitespace nonsense aside, following a strict style guide is essential for producing readable, maintainable and less faulty C++. This should be instilled from the beginning, as bad habits are hard to unlearn.

Exercise 4 might not work too well if using NAN, as all v8-exposed methods require NanScope for old versions in order to use NanReturnValue. Scopes can only be missing in internal methods and functions, else it won't compile on 0.10.

Exercise 9 is a good idea, but I don't know if that pi calculator is the best way of showing it, mostly because it seems to do the same work n_worker times. If you compare the sync and async output, the async one is way off from the true value. This is because it does less computation than the sync method per worker and the results of workers don't add up to give more precision. It's akin to an oranges to grapefruits comparison. I'm thinking the standard map-reduce word count sample might work better, provided that the text corpus is large enough so it actually takes some time.
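A minimal sketch of what "the results of workers add up" could look like for a pi estimate. It assumes the Leibniz series (the thread does not say which method the existing pi calculator uses), and the names leibniz_partial and estimate_pi are hypothetical. Each worker gets a disjoint range of terms, so more workers means the same total work split up, not the same work repeated. The ranges are evaluated sequentially here to keep the sketch dependency-free; each call is independent and could be handed to its own thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum terms [begin, end) of the Leibniz series: pi/4 = 1 - 1/3 + 1/5 - ...
double leibniz_partial(std::size_t begin, std::size_t end) {
  double sum = 0.0;
  for (std::size_t k = begin; k < end; ++k) {
    double term = 1.0 / (2.0 * static_cast<double>(k) + 1.0);
    sum += (k % 2 == 0) ? term : -term;
  }
  return sum;
}

// Split `terms` series terms into `workers` disjoint ranges and combine them.
// Every leibniz_partial() call is independent of the others.
double estimate_pi(std::size_t terms, std::size_t workers) {
  assert(workers > 0);
  std::vector<double> partials;
  std::size_t chunk = terms / workers;
  for (std::size_t w = 0; w < workers; ++w) {
    std::size_t begin = w * chunk;
    // The last worker also takes whatever is left over.
    std::size_t end = (w + 1 == workers) ? terms : begin + chunk;
    partials.push_back(leibniz_partial(begin, end));
  }
  double total = 0.0;
  for (double p : partials) total += p;  // partial sums add up to the full series
  return 4.0 * total;
}
```

With this split, estimate_pi(1000000, 4) and estimate_pi(1000000, 1) should agree to within floating-point rounding, which is the apples-to-apples comparison the sync and async versions need.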

General debugging of an addon is a tough problem in itself, but essential for any non-trivial development. What I usually end up doing as a first resort is adding fprintf(stderr, ...) statements to get some trace output. The (sort of) tricky part here is that you have to write to stderr to actually see any output, as stdout doesn't show up, so a normal printf() won't do you any good. This ties in with exercise 2 a bit too, as the suggested printf() won't produce any visible output.
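A small sketch of the stderr tracing idea, using only the C standard library. The helper name trace is hypothetical and not part of Node, V8 or NAN. Whether stdout is really swallowed depends on how the addon is run, so treat that as the commenter's observation and not a guarantee.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>

// Hypothetical helper: printf-style tracing that always goes to stderr
// and is flushed immediately, so output is not lost if the process crashes.
int trace(const char* fmt, ...) {
  assert(fmt != nullptr);
  std::va_list ap;
  va_start(ap, fmt);
  int written = std::vfprintf(stderr, fmt, ap);
  va_end(ap);
  std::fputc('\n', stderr);
  std::fflush(stderr);
  // Characters produced by the format string (newline not counted),
  // or a negative value on error.
  return written;
}
```

A call such as trace("len=%d", len) can then be dropped into any addon function while debugging and removed afterwards.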

@rvagg
Contributor Author

rvagg commented Jun 26, 2014

fantastic input @kkoopa! I might have to get you to explain the map-reduce word count sample though, I don't think I'm familiar with that and tbh finding good simple and understandable exercises to do for "cpu intensive" work is tricky, so ideas like this are more than welcome.

@kkoopa

kkoopa commented Jun 26, 2014

You have a bunch of text and want to count how many times each word occurs (a histogram). Easiest way is to split the input domain (say there are 100 lines of roughly 80 columns, words are separated by spaces, no punctuation or weird abbreviations or other crap; with 10 workers, each gets 10 lines). Each worker makes a dictionary of word:count pairs of their subset, then these partial counts are combined into the final dictionary.

This is a common example, because it is trivially parallelizable, as there are no dependencies among computations; it is also easy to grasp intuitively.

function map(String name, String document):
  // name: document name
  // document: document contents
  for each word w in document:
    emit (w, 1)

function reduce(String word, Iterator partialCounts):
  // word: a word
  // partialCounts: a list of aggregated partial counts
  sum = 0
  for each pc in partialCounts:
    sum += ParseInt(pc)
  emit (word, sum)
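The pseudocode above could be sketched in plain C++ roughly as follows. The names count_chunk, merge_counts and word_count are hypothetical, words are assumed to be whitespace-separated with no punctuation (as described above), and the chunks are processed sequentially here. Each count_chunk() call only reads its own slice, so handing each one to a worker thread would be the natural next step.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

using Counts = std::map<std::string, int>;

// "Map" step: build a word:count dictionary for lines [begin, end).
Counts count_chunk(const std::vector<std::string>& lines,
                   std::size_t begin, std::size_t end) {
  assert(begin <= end && end <= lines.size());
  Counts counts;
  for (std::size_t i = begin; i < end; ++i) {
    std::istringstream words(lines[i]);
    std::string w;
    while (words >> w) ++counts[w];  // splits on whitespace
  }
  return counts;
}

// "Reduce" step: combine the partial dictionaries into the final one.
Counts merge_counts(const std::vector<Counts>& partials) {
  Counts total;
  for (const Counts& p : partials)
    for (const auto& kv : p) total[kv.first] += kv.second;
  return total;
}

// Split the input into at most `workers` chunks, count each, then merge.
Counts word_count(const std::vector<std::string>& lines, std::size_t workers) {
  assert(workers > 0);
  std::vector<Counts> partials;
  std::size_t chunk = (lines.size() + workers - 1) / workers;  // round up
  for (std::size_t begin = 0; begin < lines.size(); begin += chunk) {
    std::size_t end = std::min(begin + chunk, lines.size());
    partials.push_back(count_chunk(lines, begin, end));
  }
  return merge_counts(partials);
}
```

Because the partial counts are simply added, the result does not depend on how many chunks the input was split into.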

@rvagg
Contributor Author

rvagg commented Jun 26, 2014

great, and that's simple to understand so we'll roll with that

@rvagg
Contributor Author

rvagg commented Jun 26, 2014

first "exercise" implemented and even compiles a test package, needs attention to wording in problem.md, it's just a skeleton atm if someone wants to have a go at it while I move on.

@ceejbot
Contributor

ceejbot commented Jun 26, 2014

I'll take a crack at wordsmithing.

@trevnorris

TBH I have never used nan. Instead I prefer the masochistic approach and use the V8 API directly.

Only a few things I can think of.

  • Properly define when a user does/doesn't need a HandleScope/EscapableHandleScope. I think this has to be clearly explained. IMO it's abstract and subtle enough that the user would easily miss it w/o a very clear explanation.
  • When to use ->To*() and .As<T>(), and how that affects the need to use HandleScope/EscapableHandleScope.
  • How to extract data off an Object (e.g. a Buffer), work with it, create a new Buffer and return that from the function.

If I think of any others I'll bring them up.

@rvagg
Contributor Author

rvagg commented Jun 27, 2014

Great points @trevnorris! I wasn't going to delve too deeply into HandleScope and EscapableHandleScope mainly because I still don't have a handle (har har) on the rules myself, I just have some vague reasoning in my head that tells me when I need to use them. But I agree that being able to communicate the rules for when to use them and the differences between the two would be fantastic.

Buffers are something I overlooked and would be good to get in if we have time now, if we don't have time for NodeConf we should try and schedule something afterwards at least.

@TooTallNate

Off topic: would anyone happen to have a spare ticket to nodeconf? I'd love to attend :D

@rvagg
Contributor Author

rvagg commented Jun 27, 2014

@TooTallNate Mikeal says if you can camp he'll open up a spot for you. Then you could come and help us deliver this workshop! DM him if you're serious.

@trevnorris

@rvagg The rule for when you need to use {Escapable}HandleScope is actually pretty simple, but also directly relates to the point about when to use ->To*() and .As<T>().

Here are a couple examples to clarify:

First, let's say we allow the user to call directly into C++. This will require type checking and possibly coercion.

void DirectCall(const FunctionCallbackInfo<Value>& args) {
  HandleScope scope(args.GetIsolate());

  // Check if the second argument is a function.
  assert(args[1]->IsFunction());
  // Now force first argument to be a string, and grab the function.
  Local<String> str = args[0]->ToString();
  // v8::Value has no ToFunction(); the IsFunction() check above makes a cast safe.
  Local<Function> fn = args[1].As<Function>();
}

HandleScope is needed above because ->To*() creates a new Local<T> handle, even if the argument already has the expected type, and that handle needs to be cleaned up at the end of the function call.

Now let's instead create a JS wrapper around the C++ functions that does all the proper type checking.

function jsCall(str, fn) {
  if (typeof fn !== 'function')
    throw new Error('Expected function');
  // Calling the internal C++ class and forcing proper types.
  internal.indirectCall(''+str, fn);
}

Now the C++ side can look like the following:

void IndirectCall(const FunctionCallbackInfo<Value>& args) {
  Local<String> str = args[0].As<String>();
  Local<Function> fn = args[1].As<Function>();
}

This is because the .As<T>() is no more than a reinterpret_cast<T>() so no new handles are actually being created.

There are additional rules when it comes to the use of Persistent<T>'s and such, but I'd say that covers the general part.
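To make the rule concrete, here is a toy model in plain C++. It is NOT V8's implementation, and every name in it (g_handles, to_string_handle, as_handle, ToyScope) is made up for illustration. It only mimics the bookkeeping described above: a conversion allocates a new handle slot, a cast reuses an existing one, and a scope releases whatever was allocated during its lifetime.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in for the handle stack. Not how V8 actually stores handles.
static std::vector<std::string> g_handles;

// Stand-in for a ->To*() conversion: always allocates a new handle slot.
std::size_t to_string_handle(int value) {
  g_handles.push_back(std::to_string(value));
  return g_handles.size() - 1;
}

// Stand-in for .As<T>(): reinterprets an existing slot, allocates nothing.
std::size_t as_handle(std::size_t existing) {
  assert(existing < g_handles.size());
  return existing;
}

// Stand-in for HandleScope: releases every slot created during its lifetime.
class ToyScope {
 public:
  ToyScope() : mark_(g_handles.size()) {}
  ~ToyScope() { g_handles.resize(mark_); }
  ToyScope(const ToyScope&) = delete;
  ToyScope& operator=(const ToyScope&) = delete;

 private:
  std::size_t mark_;  // handle count when the scope was opened
};

// Converting inside a scope leaves no handles behind.
std::size_t live_after_scoped_calls(int calls) {
  std::size_t before = g_handles.size();
  for (int i = 0; i < calls; ++i) {
    ToyScope scope;
    to_string_handle(i);
  }
  return g_handles.size() - before;
}

// Converting without a scope leaves one handle per call in the outer scope.
std::size_t live_after_unscoped_calls(int calls) {
  std::size_t before = g_handles.size();
  for (int i = 0; i < calls; ++i) to_string_handle(i);
  return g_handles.size() - before;
}
```

In this model, a function that only casts never needs a scope of its own, while one that converts does, which matches the DirectCall / IndirectCall split above.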

Hence why doing arithmetic operations in C++ is actually so fast. Using the new smalloc API, we'll write an example that shows how fast it can be to call into C++ and perform simple stuff.

var smalloc = require('smalloc');
// Create an array of external uint32_t.
var data = smalloc.alloc(1024, {}, smalloc.Types.Uint32);
// Sum them up.
var sum = internal.sum(data);

Since we know the types it is simple and fast to extract the data:

void Sum(const FunctionCallbackInfo<Value>& args) {
  Local<Object> obj = args[0].As<Object>();
  // The external data comes back as void*, so cast it to the element type.
  uint32_t* data = static_cast<uint32_t*>(
      obj->GetIndexedPropertiesExternalArrayData());
  size_t len = obj->GetIndexedPropertiesExternalArrayDataLength();
  uint64_t dsum = 0;
  for (size_t i = 0; i < len; i++)
    dsum += data[i];
  // Slight conversion here since uint64_t isn't supported.
  args.GetReturnValue().Set(static_cast<double>(dsum));
}

Here we never created a new handle, and since the new V8 API allows the return of several natives (bool, double, int32_t and uint32_t) it's unnecessary to create a new handle for the return value.

You'll find the above example to be extremely efficient. So much so, in fact, that it'll outperform a JS sum loop with as few as a hundred elements.
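The V8-independent core of a Sum()-style function can be pulled out and tested on its own. This is a sketch with a hypothetical name (sum_u32), not code from the project. It also spells out what the "slight conversion" implies: the total is accumulated in a uint64_t and handed back as a double, which represents integers exactly only up to 2^53.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sum `len` unsigned 32-bit values and return the total as a double,
// the type a JS number would carry.
double sum_u32(const std::uint32_t* data, std::size_t len) {
  assert(data != nullptr || len == 0);
  std::uint64_t total = 0;  // wide accumulator so 32-bit inputs cannot overflow it
  for (std::size_t i = 0; i < len; ++i) total += data[i];
  // Exact as long as the total stays at or below 2^53.
  return static_cast<double>(total);
}
```

For 1024 elements, as in the smalloc example, the largest possible total is well below 2^53, so nothing is lost in the conversion.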

@rvagg
Contributor Author

rvagg commented Jun 29, 2014

That final Sum() example without a need for a HandleScope is pretty impressive given the complexity of work. Thanks for the info @trevnorris, I'll see if I can come up with something that we can work in, obviously it has to be an order of magnitude simpler than your Sum()!

@rvagg
Contributor Author

rvagg commented Jun 29, 2014

@trevnorris, @kkoopa (and others?)

I've split up the 3rd exercise into two so that the first of them deals with receiving arguments and the second will return values from C++.

The new exercise, called FOR THE SAKE OF ARGUMENT requires that you take process.argv[2] and pass it down into C++ for a printf() of the string: https://github.com/rvagg/goingnative/blob/master/exercises/for_the_sake_of_argument/solution/myaddon.cc

I'm doing a *String::Utf8Value(args[0].As<String>()) so shouldn't need a new HandleScope, it might be neat to find an extension point for this for the exercise dealing with HandleScope that requires you to do a conversion with ->To*(). Any thoughts on something really simple that would force a new handle from an argument that can't be done with an As<T>() cast?

@kkoopa

kkoopa commented Jun 29, 2014

parseInt or such?


@rvagg
Contributor Author

rvagg commented Jun 29, 2014

Force them to use printf("A string: [%s], an integer: [%d]\n", ..., ...); perhaps? Can that be achieved just using As<String>() and As<Number>() or can we force a segfault or similar error that makes it necessary to use another construct? i.e. code where we could say: "here's the code to do this, but it crashes under these conditions, you have to fix it by ....".

@kkoopa

kkoopa commented Jun 30, 2014

Something else to point out: Scope, lifetime, garbage collection, memory management, persistent, weak and other handles.

@rvagg
Contributor Author

rvagg commented Jun 30, 2014

ack, I forgot about persistent handles, that's kind of important but also kind of a mess

@rvagg
Contributor Author

rvagg commented Jun 30, 2014

all: I don't know how I'm going to do HandleScope as a separate exercise, as much as I'd love to. Would really appreciate help or suggestions but right now I've removed it from the list and am moving on.

Exercise 3, FOR THE SAKE OF ARGUMENT, gets away with this:

NAN_METHOD(Print) {
  printf("%s\n", *String::Utf8Value(args[0].As<String>()));
  NanReturnUndefined();
}

And then to deal with return values in exercise 4, IT'S A TWO WAY STREET, I have to move up to this:

NAN_METHOD(Length) {
  NanEscapableScope();

  int len = strlen(*String::Utf8Value(args[0].As<String>()));
  Local<Number> v8len = NanNew(len);

  NanReturnValue(v8len);
}

Perhaps something could go in between to explain the scope but it needs something that we can force the use of scope, by demonstrating a memory leak or something like that which demonstrates why we have scopes in the code. I just don't know what to do for this that doesn't introduce too many other complexities that need explaining.

@kkoopa

kkoopa commented Jun 30, 2014

That's an exposed method, so you should not use an escapable scope.


@rvagg
Contributor Author

rvagg commented Jun 30, 2014

@ceejbot I'm unlikely to be at the pre-NodeConf thing at your office so you're probably going to have to represent this workshop if anyone actually wants to run it (I have my doubts!). Is that going to work for you?

@ceejbot
Contributor

ceejbot commented Jun 30, 2014

I will do my best! I plan to invest tomorrow in getting up to speed with it.

@rvagg
Contributor Author

rvagg commented Jul 2, 2014

FYI this is open source now and it's in npm. Far from complete but it's usable. The docs are either way too verbose or kind of missing for each of the problem statements but that's being worked on (you're welcome to contribute to that!!).

@trevnorris

@rvagg First excuse my ignorance of nan. I'm following nan.h and quickly surmising what's going on.

In the following:

NAN_METHOD(Length) {
  NanEscapableScope();

  int len = strlen(*String::Utf8Value(args[0].As<String>()));
  Local<Number> v8len = NanNew(len);

  NanReturnValue(v8len);
}

This is a call from JS to C++. As such, there's no need for an EscapableHandleScope since that's guaranteed by ReturnValue. Only need a HandleScope.

Next, and I only bring this up because I don't see the accompanying JS, is that .As<String>() should only be used if you know the incoming value is a string (i.e. the value has already been coerced in JS before passed to C++).

Lastly, in latest v8 at least, there's no need to create a Local<Number> for the return value. Instead you can simply call NanReturnValue(static_cast<uint32_t>(len)). This is a large performance win in terms of native code performance.

And in reality it could have been boiled down to this:

NAN_METHOD(Length) {
  NanReturnValue(static_cast<uint32_t>((args[0].As<String>())->Utf8Length()));
}
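One caveat when swapping strlen() for a length call: the two only agree when the string has no embedded NUL characters. The sketch below shows this with a plain std::string holding UTF-8 bytes. The function names are hypothetical, and it assumes the length call counts every byte of the UTF-8 encoding, which is how Utf8Length() is generally understood to behave.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Length as strlen() sees it: stops at the first NUL byte.
std::size_t c_string_length(const std::string& utf8) {
  return std::strlen(utf8.c_str());
}

// Length as a byte-count API sees it: every byte, embedded NULs included.
std::size_t byte_length(const std::string& utf8) {
  return utf8.size();
}

// True when both approaches report the same length for this input.
bool lengths_agree(const std::string& utf8) {
  std::size_t c_len = c_string_length(utf8);
  assert(c_len <= byte_length(utf8));  // strlen can only come up short
  return c_len == byte_length(utf8);
}
```

For ordinary text, including multi-byte characters, both report the same byte count. For a string like "a\0b" the strlen() version reports 1 and the byte count reports 3. Note also that both are byte lengths, not character counts.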

@trevnorris

I guess for backwards compatibility users are still required to do a NanReturnUndefined()?

NAN_METHOD(Print) {
  printf("%s\n", *String::Utf8Value(args[0].As<String>()));
  NanReturnUndefined();
}

@rvagg
Contributor Author

rvagg commented Jul 3, 2014

There's a significant amount of polish in there now, this is what we have so far:

[Screenshot: goingnative]

It's pretty rough for people who haven't had much C++ experience, even if you have, there's V8 to get used to! Even if you've done V8 you have NAN to get used to! So I'm confident that it's not too little for NodeConf but it really does need more and I'll be working on more over the next couple of days at the conference.

If you have time, please sudo npm install goingnative -g and run through it, file any bug reports or suggestions for improvement back on this repo!

@rvagg
Contributor Author

rvagg commented Jul 4, 2014

I just split the mammoth second exercise into 3 separate exercises, so it's a 3 step process to building their first addon. Thanks to @ceejbot for the suggestion.

Unfortunately I'm dead tired and need to go to sleep and this has to be delivered tomorrow at NodeConf. If you have time, please, someone, test it and see if it works and that the wording makes sense!
