add balloon statsq support, implement a retry timer when num_pages cannot be fully met on inflation, and add some simple usage documentation
wjhun committed Mar 23, 2021
1 parent 91f86ef commit 8895774
Showing 10 changed files with 321 additions and 20 deletions.
138 changes: 138 additions & 0 deletions doc/virtio-balloon.txt
@@ -0,0 +1,138 @@
The physical memory footprint of a Nanos instance can be managed through the
use of a balloon driver. Here are some quick notes to get the virtio-balloon
device up and running under Nanos.

The virtio balloon driver is built into the Nanos kernel by default. When
starting qemu, enable the device by specifying "-device virtio-balloon-pci"
(or ENABLE_BALLOON=1 on the command line if booting via make).

To manually manage the balloon properties and inspect memory statistics
reported through the balloon "statsq", enable the QEMU Machine Protocol (QMP)
interface by specifying "-qmp unix:qmp-sock,server,nowait" if using a unix
socket interface (which we'll use in the example below) or "-qmp
tcp:localhost:<port>,server,nowait" if using the telnet interface. Specifying
ENABLE_QMP=1 on the command line will invoke qemu with the former option.

The following example will use the "qmp-shell" utility provided with qemu. You
can find it in scripts/qmp/qmp-shell in the qemu source tree. You may need to
first install the prerequisite qemu python package (in python/qemu). Or you
may wish to forgo qmp-shell and instead use the aforementioned telnet
interface - see docs/virtio-balloon-stats.txt in the QEMU tree.
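
If you would rather script these queries than type them into qmp-shell, a raw
QMP client is small. QMP exchanges one JSON message per line: QEMU sends a
greeting, the client must issue qmp_capabilities, and only then are other
commands accepted. The following is a minimal Python sketch under those
assumptions. The helper names (encode_command, read_reply, run_commands,
query_balloon) are made up for illustration and are not part of QEMU or Nanos,
and error replies are returned as-is rather than raised.

```python
import json
import socket


def encode_command(name, **arguments):
    """Serialize one QMP command as a newline-terminated JSON message."""
    message = {"execute": name}
    if arguments:
        message["arguments"] = arguments
    return (json.dumps(message) + "\n").encode("utf-8")


def read_reply(stream):
    """Return the next message that is not an asynchronous event."""
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("QMP connection closed")
        message = json.loads(line)
        if "event" not in message:
            return message


def run_commands(sock, commands):
    """Negotiate capabilities, then run (name, arguments) pairs in order."""
    stream = sock.makefile("rwb")
    read_reply(stream)  # greeting banner sent by QEMU on connect
    replies = []
    for name, arguments in [("qmp_capabilities", {})] + list(commands):
        stream.write(encode_command(name, **arguments))
        stream.flush()
        replies.append(read_reply(stream))
    return replies[1:]  # drop the qmp_capabilities reply


def query_balloon(path="./qmp-sock"):
    """Connect to the unix socket (path is an assumption) and query the balloon."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(path)
        return run_commands(sock, [("query-balloon", {})])[0]
```

With a VM started as described below, query_balloon() should return the same
dictionary that qmp-shell prints for query-balloon.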

First run webg on Nanos with virtio-balloon and QMP enabled:

$ make ENABLE_BALLOON=1 ENABLE_QMP=1 TARGET=webg run

[...]
en1: assigned 10.0.2.15
Server started on port 8080

Then start qmp-shell:

$ qmp-shell ./qmp-sock
Welcome to the QMP low-level shell!
Connected to QEMU 3.1.0

(QEMU)

Query the balloon device:

(QEMU) query-balloon
{"return": {"actual": 2147483648}}

This reports the entire 2GB allocated for the VM on initialization, as the
balloon is currently empty. Before we inflate the balloon, let's instruct QEMU
to begin polling for memory stats from the balloon device.

First validate the path of the virtio-balloon device:

(QEMU) qom-list path=/machine/peripheral-anon/

{"return": [{"name": "type", "type": "string"}, {"name": "device[0]",
"type": "child<virtio-balloon-pci>"}, {"name": "device[1]", "type":
"child<scsi-hd>"}, {"name": "device[2]", "type": "child<isa-debug-exit>"},
{"name": "device[3]", "type": "child<virtio-net-pci>"}]}

Here we see the path is "/machine/peripheral-anon/device[0]". Now enable
polling at 2 second intervals:

(QEMU) qom-set path=/machine/peripheral-anon/device[0] \
property=guest-stats-polling-interval value=2
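
The device index depends on the order of the -device options, so a script
should not hard-code device[0]. As a rough sketch, the path can be picked out
of a decoded qom-list reply by matching on the child type. The function name
find_child_paths is hypothetical, and the reply literal is copied from the
output above.

```python
def find_child_paths(qom_list_reply, type_name, parent="/machine/peripheral-anon"):
    """Return QOM paths of children of `parent` whose type is child<type_name>."""
    wanted = "child<%s>" % type_name
    return [
        "%s/%s" % (parent.rstrip("/"), entry["name"])
        for entry in qom_list_reply["return"]
        if entry["type"] == wanted
    ]


reply = {"return": [
    {"name": "type", "type": "string"},
    {"name": "device[0]", "type": "child<virtio-balloon-pci>"},
    {"name": "device[1]", "type": "child<scsi-hd>"},
    {"name": "device[2]", "type": "child<isa-debug-exit>"},
    {"name": "device[3]", "type": "child<virtio-net-pci>"},
]}

print(find_child_paths(reply, "virtio-balloon-pci"))
# ['/machine/peripheral-anon/device[0]']
```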

And let's see a snapshot of the latest stats:

(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats

{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 2053791744, "stat-htlb-pgfail": 0,
"stat-free-memory": 2053791744, "stat-minor-faults": 212,
"stat-major-faults": 22, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 9216000}, "last-update": 1616532297}}

Now let's alter the balloon value and look at the effects:

(QEMU) balloon value=1000000000
{"return": {}}
(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats
{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 902307840, "stat-htlb-pgfail": 0,
"stat-free-memory": 902307840, "stat-minor-faults": 212,
"stat-major-faults": 22, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 9216000}, "last-update": 1616532413}}

We can see here that the available / free memory shrank accordingly. If we set
the balloon value back to its original value, we should see the effects of the
balloon deflating:

(QEMU) balloon value=2147483648
{"return": {}}

(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats
{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 2051547136, "stat-htlb-pgfail": 0,
"stat-free-memory": 2051547136, "stat-minor-faults": 212,
"stat-major-faults": 22, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 9216000}, "last-update": 1616532581}}

The available memory is back to the original value, save for some balloon page
structures which have been cached in the virtio_balloon driver.
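
To put numbers on the three snapshots above, here is a small worked example
using only the stat-free-memory values already shown. It just does the
subtraction; the function name free_memory_deltas is hypothetical, and the
sketch does not try to explain why the observed drop differs slightly from the
requested balloon size.

```python
VM_MEMORY = 2147483648       # "actual" reported while the balloon was empty
BALLOON_TARGET = 1000000000  # value passed to the "balloon" command


def free_memory_deltas(initial, inflated, deflated):
    """Return (drop while inflated, bytes not returned after deflating)."""
    return initial - inflated, initial - deflated


requested = VM_MEMORY - BALLOON_TARGET
drop, not_recovered = free_memory_deltas(2053791744, 902307840, 2051547136)

print(requested)      # 1147483648 bytes asked of the guest
print(drop)           # 1151483904 bytes of free memory lost
print(not_recovered)  # 2244608 bytes still held after deflation
```

The last figure, a little over 2MB, is the amount the text attributes to
cached balloon page structures.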

Let's try something more aggressive:

(QEMU) balloon value=1
{"return": {}}
(QEMU) query-balloon
{"return": {"actual": 115343360}}
(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats
{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 17760256, "stat-htlb-pgfail": 0,
"stat-free-memory": 17760256, "stat-minor-faults": 214,
"stat-major-faults": 20, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 1384448}, "last-update": 1616533279}}

The balloon is now inflated to the maximum extent, save for a minimum amount
of free memory as defined by BALLOON_MEMORY_MINIMUM in src/config.h.

If we apply some pressure on the memory system by sending web requests, we can
see the effects of Nanos deflating the balloon to maintain a minimum amount of
free memory:

$ ab -n 1000 -c 100 http://127.0.0.1:8080/
[...]

(QEMU) query-balloon
{"return": {"actual": 121634816}}
(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats
{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 18685952, "stat-htlb-pgfail": 0,
"stat-free-memory": 18685952, "stat-minor-faults": 1009,
"stat-major-faults": 20, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 1384448}, "last-update": 1616533311}}

Note the increase in "actual" memory to maintain BALLOON_DEFLATE_THRESHOLD
amount of free memory.
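
For reference, the "actual" value is the VM's memory size minus the current
balloon size, so the balloon size can be recovered by subtraction. The sketch
below applies that to the two query-balloon replies above; balloon_size is a
hypothetical helper name, and the 2GB total is taken from the first
query-balloon reply in this document.

```python
MIB = 1024 * 1024
VM_MEMORY = 2147483648  # "actual" reported while the balloon was empty


def balloon_size(actual, vm_memory=VM_MEMORY):
    """Bytes currently held by the balloon, given query-balloon's "actual"."""
    return vm_memory - actual


before_load = balloon_size(115343360)
after_load = balloon_size(121634816)

print(before_load // MIB)                 # 1938
print(after_load // MIB)                  # 1932
print((before_load - after_load) // MIB)  # 6
```

In this run, Nanos handed about 6MB back to the guest while serving the
requests.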

Related links:

https://wiki.qemu.org/Documentation/QMP
https://github.com/qemu/qemu/blob/master/docs/virtio-balloon-stats.txt
7 changes: 5 additions & 2 deletions platform/pc/Makefile
@@ -361,14 +361,17 @@ endif
QEMU_TAP= -netdev tap,id=n0,ifname=tap0,script=no,downscript=no
QEMU_NET= -device $(NETWORK)$(NETWORK_BUS),mac=7e:b8:7e:87:4a:ea,netdev=n0 $(QEMU_TAP)
QEMU_USERNET= -device $(NETWORK)$(NETWORK_BUS),netdev=n0 -netdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309
ifneq ($(ENABLE_BALLOON),)
QEMU_BALLOON= -device virtio-balloon-pci
endif
ifneq ($(ENABLE_QMP),)
QEMU_QMP= -qmp unix:$(ROOTDIR)/qmp-sock,server
QEMU_QMP= -qmp unix:$(ROOTDIR)/qmp-sock,server,nowait
#QEMU_QMP= -qmp tcp:localhost:4444,server,nowait
endif
#QEMU_USERNET+= -object filter-dump,id=filter0,netdev=n0,file=/tmp/nanos.pcap
QEMU_FLAGS=
#QEMU_FLAGS+= -smp 4
QEMU_FLAGS+= -d trace:balloon_event,trace:virtio_balloon_bad_addr,trace:virtio_balloon_get_config,trace:virtio_balloon_handle_output,trace:virtio_balloon_set_config,trace:virtio_balloon_to_target -D $(ROOTDIR)/trace
#QEMU_FLAGS+= -d int -D int.log
#QEMU_FLAGS+= -s -S

QEMU_COMMON= $(QEMU_MACHINE) $(QEMU_MEMORY) $(QEMU_BALLOON) $(QEMU_DISPLAY) $(QEMU_PCI) $(QEMU_SERIAL) $(QEMU_STORAGE) -device isa-debug-exit -no-reboot $(QEMU_FLAGS) $(QEMU_QMP)
5 changes: 4 additions & 1 deletion platform/virt/Makefile
@@ -297,9 +297,12 @@ QEMU_TAP= -netdev tap,id=n0,ifname=tap0,script=no,downscript=no
#QEMU_NET= -device $(NETWORK)$(NETWORK_BUS),mac=7e:b8:7e:87:4a:ea,netdev=n0,modern-pio-notify $(QEMU_TAP)
QEMU_NET= -device $(NETWORK)$(NETWORK_BUS),mac=7e:b8:7e:87:4a:ea,netdev=n0 $(QEMU_TAP)
QEMU_USERNET= -device $(NETWORK)$(NETWORK_BUS),netdev=n0 -netdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309 -object filter-dump,id=filter0,netdev=n0,file=/tmp/nanos.pcap
ifneq ($(ENABLE_BALLOON),)
QEMU_BALLOON= -device virtio-balloon-pci
endif
ifneq ($(ENABLE_QMP),)
QEMU_QMP= -qmp unix:$(ROOTDIR)/qmp-sock,server
QEMU_QMP= -qmp unix:$(ROOTDIR)/qmp-sock,server,nowait
#QEMU_QMP= -qmp tcp:localhost:4444,server,nowait
endif

# for enabling the ARM Angel interface and passing exit codes from the program
1 change: 1 addition & 0 deletions src/kernel/kernel.c
@@ -11,6 +11,7 @@
resuming a kernel context is the exception, not the norm. */

static kernel_context spare_kernel_context;
struct mm_stats mm_stats;

context allocate_frame(heap h)
{
18 changes: 18 additions & 0 deletions src/kernel/kernel.h
@@ -34,6 +34,14 @@ typedef struct cpuinfo {

extern struct cpuinfo cpuinfos[];

/* subsume with introspection */
struct mm_stats {
word minor_faults;
word major_faults;
};

extern struct mm_stats mm_stats;

static inline cpuinfo cpuinfo_from_id(int cpu)
{
assert(cpu >= 0 && cpu < MAX_CPUS);
@@ -87,6 +95,16 @@ static inline __attribute__((always_inline)) void *stack_from_kernel_context(ker
return ((void*)c->stackbase) + KERNEL_STACK_SIZE - STACK_ALIGNMENT;
}

static inline void count_minor_fault(void)
{
fetch_and_add(&mm_stats.minor_faults, 1);
}

static inline void count_major_fault(void)
{
fetch_and_add(&mm_stats.major_faults, 1);
}

void runloop_internal() __attribute__((noreturn));

static inline boolean this_cpu_has_kernel_lock(void)
7 changes: 6 additions & 1 deletion src/kernel/pagecache.c
@@ -1268,11 +1268,16 @@ void *pagecache_get_zero_page(void)
return global_pagecache->zero_page;
}

int pagecache_get_page_order()
int pagecache_get_page_order(void)
{
return global_pagecache->page_order;
}

u64 pagecache_get_occupancy(void)
{
return global_pagecache->total_pages << pagecache_get_page_order();
}

pagecache_volume pagecache_allocate_volume(u64 length, int block_order)
{
pagecache pc = global_pagecache;
6 changes: 4 additions & 2 deletions src/kernel/pagecache.h
@@ -12,9 +12,11 @@ void pagecache_sync_node(pagecache_node pn, status_handler complete);

void pagecache_sync_volume(pagecache_volume pv, status_handler complete);

void *pagecache_get_zero_page();
void *pagecache_get_zero_page(void);

int pagecache_get_page_order();
int pagecache_get_page_order(void);

u64 pagecache_get_occupancy(void);

u64 pagecache_drain(u64 drain_bytes);

1 change: 0 additions & 1 deletion src/kernel/stage3.c
@@ -180,7 +180,6 @@ closure_function(3, 0, void, startup,
rprintf("Debug http server started on port 9090\n");
}
#endif

value p = table_find(root, sym(program));
assert(p);
tuple pro = resolve_path(root, split(general, p, '/'));
4 changes: 4 additions & 0 deletions src/unix/mmap.c
@@ -92,6 +92,7 @@ boolean do_demand_page(u64 vaddr, vmap vm, context frame)
}

map_and_zero(vaddr & ~MASK(PAGELOG), paddr, PAGESIZE, pageflags_from_vmflags(vm->flags));
count_minor_fault();
} else if (mmap_type == VMAP_MMAP_TYPE_FILEBACKED) {
u64 page_addr = vaddr & ~PAGEMASK;
u64 node_offset = vm->node_offset + (page_addr - vm->node.r.start);
@@ -131,6 +132,7 @@ boolean do_demand_page(u64 vaddr, vmap vm, context frame)
true /* complete on bhqueue */);
if (kernel_demand_page_completed) {
pf_debug(" immediate completion\n");
count_minor_fault();
return true;
}
faulting_kernel_context = suspend_kernel_context();
@@ -140,6 +142,7 @@ boolean do_demand_page(u64 vaddr, vmap vm, context frame)
page, but we can't allocate anything, fill a page or start a storage operation. */
if (pagecache_map_page_if_filled(vm->cache_node, node_offset, page_addr, flags)) {
pf_debug(" immediate completion\n");
count_minor_fault();
return true;
}

@@ -151,6 +154,7 @@ boolean do_demand_page(u64 vaddr, vmap vm, context frame)
init_closure(&t->demand_file_page_complete, thread_demand_file_page_complete, t, frame, vaddr);
enqueue(runqueue, &t->demand_file_page);
}
count_major_fault();

/* suspending */
context f = frame_from_kernel_context(get_kernel_context(ci));