virtio-balloon driver #1439

Merged
merged 2 commits into from
Mar 29, 2021
138 changes: 138 additions & 0 deletions doc/virtio-balloon.txt
@@ -0,0 +1,138 @@
The physical memory footprint of a Nanos instance can be managed through the
use of a balloon driver. Here are some quick notes to get the virtio-balloon
device up and running under Nanos.

The virtio balloon driver is built into the Nanos kernel by default. When
starting qemu, enable the device by specifying "-device virtio-balloon-pci"
(or ENABLE_BALLOON=1 on the command line if booting via make).

To manually manage the balloon properties and inspect memory statistics
reported through the balloon "statsq", enable the QEMU Machine Protocol (QMP)
interface by specifying "-qmp unix:qmp-sock,server,nowait" to use a unix
socket (as in the example below) or "-qmp tcp:localhost:<port>,server,nowait"
to use a TCP socket that can be reached with telnet. Specifying ENABLE_QMP=1
on the command line will invoke qemu with the former option.

The following example uses the "qmp-shell" utility provided with qemu. You
can find it in scripts/qmp/qmp-shell in the qemu source tree. You may first
need to install the prerequisite qemu python package (in python/qemu).
Alternatively, you can forgo qmp-shell and use the aforementioned telnet
interface instead; see docs/virtio-balloon-stats.txt in the QEMU tree.

First run webg on Nanos with virtio-balloon and QMP enabled:

$ make ENABLE_BALLOON=1 ENABLE_QMP=1 TARGET=webg run

[...]
en1: assigned 10.0.2.15
Server started on port 8080

Then start qmp-shell:

$ qmp-shell ./qmp-sock
Welcome to the QMP low-level shell!
Connected to QEMU 3.1.0

(QEMU)

Query the balloon device:

(QEMU) query-balloon
{"return": {"actual": 2147483648}}

This reports the entire 2GB allocated for the VM on initialization, as the
balloon is currently empty. Before we inflate the balloon, let's instruct QEMU
to begin polling for memory stats from the balloon device.

First validate the path of the virtio-balloon device:

(QEMU) qom-list path=/machine/peripheral-anon/

{"return": [{"name": "type", "type": "string"}, {"name": "device[0]",
"type": "child<virtio-balloon-pci>"}, {"name": "device[1]", "type":
"child<scsi-hd>"}, {"name": "device[2]", "type": "child<isa-debug-exit>"},
{"name": "device[3]", "type": "child<virtio-net-pci>"}]}

Here we see the path is "/machine/peripheral-anon/device[0]". Now enable
polling at 2 second intervals:

(QEMU) qom-set path=/machine/peripheral-anon/device[0] \
property=guest-stats-polling-interval value=2

And let's see a snapshot of the latest stats:

(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats

{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 2053791744, "stat-htlb-pgfail": 0,
"stat-free-memory": 2053791744, "stat-minor-faults": 212,
"stat-major-faults": 22, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 9216000}, "last-update": 1616532297}}
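
As a rough cross-check of the figures above, the sketch below converts between
a page count and a byte count the same way pagecache_get_occupancy() does in
this change (total_pages << page_order). Two things here are assumptions and
not confirmed by this text: that "stat-disk-caches" is fed from that function,
and that the pagecache page order is 12 (4 KB pages). The helper names are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4 KB pagecache pages, i.e. a page order of 12. */
#define ASSUMED_PAGE_ORDER 12

/* Hypothetical stand-in for pagecache_get_occupancy(): pages to bytes. */
uint64_t occupancy_bytes(uint64_t total_pages, int page_order)
{
    return total_pages << page_order;
}

/* Inverse direction: how many whole pages a reported byte count covers. */
uint64_t pages_from_bytes(uint64_t bytes, int page_order)
{
    /* the byte count must be page-aligned for the result to be exact */
    assert((bytes & ((1ull << page_order) - 1)) == 0);
    return bytes >> page_order;
}
```

Under those assumptions, the 9216000 bytes reported as "stat-disk-caches"
would correspond to 2250 cached pages.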

Now let's alter the balloon value and look at the effects:

(QEMU) balloon value=1000000000
{"return": {}}
(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats
{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 902307840, "stat-htlb-pgfail": 0,
"stat-free-memory": 902307840, "stat-minor-faults": 212,
"stat-major-faults": 22, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 9216000}, "last-update": 1616532413}}

We can see here that the available / free memory shrank accordingly. If we set
the balloon value back to its original value, we should see the effects of the
balloon deflating:

(QEMU) balloon value=2147483648
{"return": {}}

(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats
{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 2051547136, "stat-htlb-pgfail": 0,
"stat-free-memory": 2051547136, "stat-minor-faults": 212,
"stat-major-faults": 22, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 9216000}, "last-update": 1616532581}}

The available memory is back to the original value, save for some balloon page
structures which have been cached in the virtio_balloon driver.
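
The readings so far can be checked with a little arithmetic. The sketch below
is only an illustration: the constants are copied from the QMP output in this
walkthrough, and the helper names are hypothetical. It relies on the QMP
"balloon" value being the target size of the VM, so the balloon itself is the
VM size minus that target.

```c
#include <assert.h>
#include <stdint.h>

/* Readings copied from the QMP output in this walkthrough. */
#define VM_BYTES       2147483648ull /* query-balloon before inflating */
#define BALLOON_TARGET 1000000000ull /* "balloon value=..." */
#define FREE_INITIAL   2053791744ull /* stat-free-memory, balloon empty */
#define FREE_INFLATED   902307840ull /* stat-free-memory, after inflating */
#define FREE_DEFLATED  2051547136ull /* stat-free-memory, after deflating */

/* Bytes the host asks the guest to give up for a given target size. */
uint64_t requested_balloon_bytes(uint64_t vm_bytes, uint64_t target)
{
    assert(target <= vm_bytes);
    return vm_bytes - target;
}

/* Drop in guest free memory between two stats snapshots. */
uint64_t free_memory_drop(uint64_t before, uint64_t after)
{
    assert(after <= before);
    return before - after;
}
```

requested_balloon_bytes(VM_BYTES, BALLOON_TARGET) gives 1147483648, while
free_memory_drop(FREE_INITIAL, FREE_INFLATED) gives 1151483904, so free memory
fell by roughly the size of the balloon. After deflating,
free_memory_drop(FREE_INITIAL, FREE_DEFLATED) gives 2244608 bytes that were
not returned to the free pool.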

Let's try something more aggressive:

(QEMU) balloon value=1
{"return": {}}
(QEMU) query-balloon
{"return": {"actual": 115343360}}
(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats
{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 17760256, "stat-htlb-pgfail": 0,
"stat-free-memory": 17760256, "stat-minor-faults": 214,
"stat-major-faults": 20, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 1384448}, "last-update": 1616533279}}

The balloon is now inflated to the maximum extent, save for a minimum amount
of free memory as defined by BALLOON_MEMORY_MINIMUM in src/config.h.
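
One way to picture this floor is as a clamp on each inflate request. The
sketch below is not the virtio_balloon driver code, which is not shown here;
it is a minimal illustration that assumes the driver refuses to take pages
once free memory would fall below BALLOON_MEMORY_MINIMUM. The function name
inflate_limit is hypothetical; the 16 MB value comes from src/config.h in this
change.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ull * 1024ull)
/* Value from the src/config.h hunk in this change. */
#define BALLOON_MEMORY_MINIMUM (16 * MB)

/* Hypothetical helper: how many of the requested bytes may be moved into
   the balloon without free memory dropping below the minimum. */
uint64_t inflate_limit(uint64_t free_bytes, uint64_t requested)
{
    if (free_bytes <= BALLOON_MEMORY_MINIMUM)
        return 0; /* already at or below the floor: take nothing */

    uint64_t headroom = free_bytes - BALLOON_MEMORY_MINIMUM;
    uint64_t granted = requested < headroom ? requested : headroom;

    /* invariant: the floor is still respected after inflating */
    assert(free_bytes - granted >= BALLOON_MEMORY_MINIMUM);
    return granted;
}
```

With 100 MB free, a 200 MB request would be cut to 84 MB under this model,
leaving exactly the 16 MB minimum.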

If we apply some pressure on the memory system by sending web requests, we can
see the effects of Nanos deflating the balloon to maintain a minimum amount of
free memory:

$ ab -n 1000 -c 100 http://127.0.0.1:8080/
[...]

(QEMU) query-balloon
{"return": {"actual": 121634816}}
(QEMU) qom-get path=/machine/peripheral-anon/device[0] property=guest-stats
{"return": {"stats": {"stat-htlb-pgalloc": 0, "stat-swap-out": 0,
"stat-available-memory": 18685952, "stat-htlb-pgfail": 0,
"stat-free-memory": 18685952, "stat-minor-faults": 1009,
"stat-major-faults": 20, "stat-total-memory": 2139226112, "stat-swap-in":
0, "stat-disk-caches": 1384448}, "last-update": 1616533311}}

Note the increase in "actual" memory, from 115343360 to 121634816 bytes
(6 MB): Nanos deflated the balloon to keep free memory at or above
BALLOON_DEFLATE_THRESHOLD.
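
The size of each deflate request follows the arithmetic in mm_service() from
this change: when free physical memory is below the threshold, the shortfall
is requested from the registered deflater. The sketch below restates just that
arithmetic as a standalone function; the name deflate_request is hypothetical,
and the real code also drains the pagecache first and re-reads the free count
before this check.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1024ull * 1024ull)
/* Value from the src/config.h hunk in this change. */
#define BALLOON_DEFLATE_THRESHOLD (16 * MB)

/* Hypothetical helper: bytes to request from the balloon deflater for a
   given amount of free physical memory; 0 means no request is made. */
uint64_t deflate_request(uint64_t free_bytes)
{
    if (free_bytes >= BALLOON_DEFLATE_THRESHOLD)
        return 0;

    uint64_t request = BALLOON_DEFLATE_THRESHOLD - free_bytes;

    /* if fully honored, the request brings free memory up to the threshold */
    assert(free_bytes + request == BALLOON_DEFLATE_THRESHOLD);
    return request;
}
```

For example, with 15 MB free the request would be 1 MB, and with 16 MB or more
free no request is made.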

Related links:

https://wiki.qemu.org/Documentation/QMP
https://github.com/qemu/qemu/blob/master/docs/virtio-balloon-stats.txt
10 changes: 9 additions & 1 deletion platform/pc/Makefile
@@ -67,6 +67,7 @@ SRCS-kernel.elf= \
$(SRCDIR)/unix/vdso.c \
$(SRCDIR)/unix/pipe.c \
$(SRCDIR)/virtio/virtio.c \
$(SRCDIR)/virtio/virtio_balloon.c \
$(SRCDIR)/virtio/virtio_mmio.c \
$(SRCDIR)/virtio/virtio_net.c \
$(SRCDIR)/virtio/virtio_pci.c \
@@ -360,13 +361,20 @@ endif
QEMU_TAP= -netdev tap,id=n0,ifname=tap0,script=no,downscript=no
QEMU_NET= -device $(NETWORK)$(NETWORK_BUS),mac=7e:b8:7e:87:4a:ea,netdev=n0 $(QEMU_TAP)
QEMU_USERNET= -device $(NETWORK)$(NETWORK_BUS),netdev=n0 -netdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309
ifneq ($(ENABLE_BALLOON),)
QEMU_BALLOON= -device virtio-balloon-pci
endif
ifneq ($(ENABLE_QMP),)
QEMU_QMP= -qmp unix:$(ROOTDIR)/qmp-sock,server,nowait
#QEMU_QMP= -qmp tcp:localhost:4444,server,nowait
endif
#QEMU_USERNET+= -object filter-dump,id=filter0,netdev=n0,file=/tmp/nanos.pcap
QEMU_FLAGS=
#QEMU_FLAGS+= -smp 4
#QEMU_FLAGS+= -d int -D int.log
#QEMU_FLAGS+= -s -S

QEMU_COMMON= $(QEMU_MACHINE) $(QEMU_MEMORY) $(QEMU_DISPLAY) $(QEMU_PCI) $(QEMU_SERIAL) $(QEMU_STORAGE) -device isa-debug-exit -no-reboot $(QEMU_FLAGS)
QEMU_COMMON= $(QEMU_MACHINE) $(QEMU_MEMORY) $(QEMU_BALLOON) $(QEMU_DISPLAY) $(QEMU_PCI) $(QEMU_SERIAL) $(QEMU_STORAGE) -device isa-debug-exit -no-reboot $(QEMU_FLAGS) $(QEMU_QMP)

run: image
$(QEMU) $(QEMU_COMMON) $(QEMU_USERNET) $(QEMU_ACCEL) || exit $$(($$?>>1))
4 changes: 2 additions & 2 deletions platform/pc/pci.c
@@ -147,9 +147,9 @@ void pci_bar_write_8(struct pci_bar *b, u64 offset, u64 val)
out64(b->addr + offset, val);
}

void pci_setup_non_msi_irq(pci_dev dev, int idx, thunk h, const char *name)
void pci_setup_non_msi_irq(pci_dev dev, thunk h, const char *name)
{
pci_plat_debug("%s: idx %d, h %F, name %s\n", __func__, idx, h, name);
pci_plat_debug("%s: h %F, name %s\n", __func__, h, name);

/* For maximum portability, the GSI should be retrieved via the ACPI _PRT method. */
unsigned int gsi = pci_cfgread(dev, PCIR_INTERRUPT_LINE, 1);
2 changes: 2 additions & 0 deletions platform/pc/service.c
@@ -525,4 +525,6 @@ void detect_devices(kernel_heaps kh, storage_attach sa)

/* misc / platform */
init_acpi(kh);

init_virtio_balloon(kh);
}
16 changes: 12 additions & 4 deletions platform/virt/Makefile
@@ -64,11 +64,12 @@ SRCS-kernel.elf= \
$(SRCDIR)/unix/pipe.c \
$(SRCDIR)/unix/vdso.c \
$(SRCDIR)/virtio/virtio.c \
$(SRCDIR)/virtio/virtio_storage.c \
$(SRCDIR)/virtio/virtio_scsi.c \
$(SRCDIR)/virtio/virtio_net.c \
$(SRCDIR)/virtio/virtio_balloon.c \
$(SRCDIR)/virtio/virtio_mmio.c \
$(SRCDIR)/virtio/virtio_net.c \
$(SRCDIR)/virtio/virtio_pci.c \
$(SRCDIR)/virtio/virtio_scsi.c \
$(SRCDIR)/virtio/virtio_storage.c \
$(SRCDIR)/virtio/virtqueue.c \
$(SRCDIR)/virtio/scsi.c \
$(VDSO_OBJDIR)/vdso-image.c \
@@ -296,6 +297,13 @@ QEMU_TAP= -netdev tap,id=n0,ifname=tap0,script=no,downscript=no
#QEMU_NET= -device $(NETWORK)$(NETWORK_BUS),mac=7e:b8:7e:87:4a:ea,netdev=n0,modern-pio-notify $(QEMU_TAP)
QEMU_NET= -device $(NETWORK)$(NETWORK_BUS),mac=7e:b8:7e:87:4a:ea,netdev=n0 $(QEMU_TAP)
QEMU_USERNET= -device $(NETWORK)$(NETWORK_BUS),netdev=n0 -netdev user,id=n0,hostfwd=tcp::8080-:8080,hostfwd=tcp::9090-:9090,hostfwd=udp::5309-:5309 -object filter-dump,id=filter0,netdev=n0,file=/tmp/nanos.pcap
ifneq ($(ENABLE_BALLOON),)
QEMU_BALLOON= -device virtio-balloon-pci
endif
ifneq ($(ENABLE_QMP),)
QEMU_QMP= -qmp unix:$(ROOTDIR)/qmp-sock,server,nowait
#QEMU_QMP= -qmp tcp:localhost:4444,server,nowait
endif

# for enabling the ARM Angel interface and passing exit codes from the program
QEMU_FLAGS+= -semihosting
@@ -309,7 +317,7 @@ QEMU_FLAGS+= -semihosting

#QEMU_FLAGS+= -monitor telnet:127.0.0.1:9999,server,nowait

QEMU_COMMON= $(QEMU_MACHINE) $(QEMU_MEMORY) $(QEMU_KERNEL) $(QEMU_DISPLAY) $(QEMU_PCI) $(QEMU_SERIAL) $(QEMU_STORAGE) -no-reboot $(QEMU_FLAGS)
QEMU_COMMON= $(QEMU_MACHINE) $(QEMU_MEMORY) $(QEMU_BALLOON) $(QEMU_KERNEL) $(QEMU_DISPLAY) $(QEMU_PCI) $(QEMU_SERIAL) $(QEMU_STORAGE) -no-reboot $(QEMU_FLAGS) $(QEMU_QMP)

run: image
$(QEMU) $(QEMU_COMMON) $(QEMU_USERNET) $(QEMU_ACCEL)
5 changes: 2 additions & 3 deletions platform/virt/pci.c
@@ -97,12 +97,11 @@ MK_PCI_BAR_WRITE(1, 8)
MK_PCI_BAR_WRITE(2, 16)
MK_PCI_BAR_WRITE(4, 32)

void pci_setup_non_msi_irq(pci_dev dev, int idx, thunk h, const char *name)
void pci_setup_non_msi_irq(pci_dev dev, thunk h, const char *name)
{
/* queue index ignored; virtio ints are shared */
u64 v = GIC_SPI_INTS_START + VIRT_PCIE_IRQ_BASE + (dev->slot % VIRT_PCIE_IRQ_NUM);
pci_plat_debug("%s: dev %p, idx %d, irq %d, handler %F, name %s\n",
__func__, dev, idx, v, h, name);
pci_plat_debug("%s: dev %p, irq %d, handler %F, name %s\n", __func__, dev, v, h, name);
register_interrupt(v, h, name);
}

1 change: 1 addition & 0 deletions platform/virt/service.c
@@ -293,4 +293,5 @@ void detect_devices(kernel_heaps kh, storage_attach sa)
init_virtio_network(kh);
init_virtio_blk(kh, sa);
init_virtio_scsi(kh, sa);
init_virtio_balloon(kh);
}
6 changes: 6 additions & 0 deletions src/config.h
@@ -49,6 +49,12 @@
#define PAGECACHE_DRAIN_CUTOFF (64 * MB)
#define PAGECACHE_SCAN_PERIOD_SECONDS 5

/* don't go below this minimum amount of physical memory when inflating balloon */
#define BALLOON_MEMORY_MINIMUM (16 * MB)

/* attempt to deflate balloon when physical memory is below this threshold */
#define BALLOON_DEFLATE_THRESHOLD (16 * MB)

/* must be large enough for vendor code that use malloc/free interface */
#define MAX_MCACHE_ORDER 16

18 changes: 17 additions & 1 deletion src/kernel/init.c
@@ -117,17 +117,33 @@ closure_function(3, 2, void, fsstarted,
#define mm_debug(x, ...) do { } while(0)
#endif

static balloon_deflater mm_balloon_deflater;

void mm_register_balloon_deflater(balloon_deflater deflater)
{
mm_balloon_deflater = deflater;
}

void mm_service(void)
{
heap phys = (heap)heap_physical(init_heaps);
u64 free = heap_total(phys) - heap_allocated(phys);
u64 free = heap_free(phys);
mm_debug("%s: total %ld, alloc %ld, free %ld\n", __func__,
heap_total(phys), heap_allocated(phys), free);
if (free < PAGECACHE_DRAIN_CUTOFF) {
u64 drain_bytes = PAGECACHE_DRAIN_CUTOFF - free;
u64 drained = pagecache_drain(drain_bytes);
if (drained > 0)
mm_debug(" drained %ld / %ld requested...\n", drained, drain_bytes);
free = heap_free(phys);
}

if (mm_balloon_deflater && free < BALLOON_DEFLATE_THRESHOLD) {
u64 deflate_bytes = BALLOON_DEFLATE_THRESHOLD - free;
mm_debug(" requesting %ld bytes from deflater\n", deflate_bytes);
u64 deflated = apply(mm_balloon_deflater, deflate_bytes);
mm_debug(" deflated %ld bytes\n", deflated);
(void)deflated;
}
}

1 change: 1 addition & 0 deletions src/kernel/kernel.c
@@ -11,6 +11,7 @@
resuming a kernel context is the exception, not the norm. */

static kernel_context spare_kernel_context;
struct mm_stats mm_stats;

context allocate_frame(heap h)
{
21 changes: 21 additions & 0 deletions src/kernel/kernel.h
@@ -34,6 +34,14 @@ typedef struct cpuinfo {

extern struct cpuinfo cpuinfos[];

/* subsume with introspection */
struct mm_stats {
word minor_faults;
word major_faults;
};

extern struct mm_stats mm_stats;

static inline cpuinfo cpuinfo_from_id(int cpu)
{
assert(cpu >= 0 && cpu < MAX_CPUS);
@@ -87,6 +95,16 @@ static inline __attribute__((always_inline)) void *stack_from_kernel_context(ker
return ((void*)c->stackbase) + KERNEL_STACK_SIZE - STACK_ALIGNMENT;
}

static inline void count_minor_fault(void)
{
fetch_and_add(&mm_stats.minor_faults, 1);
}

static inline void count_major_fault(void)
{
fetch_and_add(&mm_stats.major_faults, 1);
}

void runloop_internal() __attribute__((noreturn));

static inline boolean this_cpu_has_kernel_lock(void)
@@ -219,6 +237,9 @@ void kern_unlock(void);
void init_scheduler(heap);
void mm_service(void);

typedef closure_type(balloon_deflater, u64, u64);
void mm_register_balloon_deflater(balloon_deflater deflater);

kernel_heaps get_kernel_heaps(void);

tuple get_root_tuple(void);
7 changes: 6 additions & 1 deletion src/kernel/pagecache.c
@@ -1268,11 +1268,16 @@ void *pagecache_get_zero_page(void)
return global_pagecache->zero_page;
}

int pagecache_get_page_order()
int pagecache_get_page_order(void)
{
return global_pagecache->page_order;
}

u64 pagecache_get_occupancy(void)
{
return global_pagecache->total_pages << pagecache_get_page_order();
}

pagecache_volume pagecache_allocate_volume(u64 length, int block_order)
{
pagecache pc = global_pagecache;
6 changes: 4 additions & 2 deletions src/kernel/pagecache.h
@@ -12,9 +12,11 @@ void pagecache_sync_node(pagecache_node pn, status_handler complete);

void pagecache_sync_volume(pagecache_volume pv, status_handler complete);

void *pagecache_get_zero_page();
void *pagecache_get_zero_page(void);

int pagecache_get_page_order();
int pagecache_get_page_order(void);

u64 pagecache_get_occupancy(void);

u64 pagecache_drain(u64 drain_bytes);

2 changes: 1 addition & 1 deletion src/kernel/pci.h
@@ -170,7 +170,7 @@ void pci_enable_io_and_memory(pci_dev dev);
u64 pci_setup_msix(pci_dev dev, int msi_slot, thunk h, const char *name);
void pci_teardown_msix(pci_dev dev, int msi_slot);
void pci_disable_msix(pci_dev dev);
void pci_setup_non_msi_irq(pci_dev dev, int idx, thunk h, const char *name);
void pci_setup_non_msi_irq(pci_dev dev, thunk h, const char *name);

static inline u64 pci_msix_table_addr(pci_dev dev)
{
5 changes: 5 additions & 0 deletions src/runtime/heap/heap.h
@@ -22,6 +22,11 @@ static inline u64 heap_total(heap h)
return h->total ? h->total(h) : INVALID_PHYSICAL;
}

static inline u64 heap_free(heap h)
{
return heap_total(h) - heap_allocated(h);
}

heap wrap_freelist(heap meta, heap parent, bytes size);
heap allocate_objcache(heap meta, heap parent, bytes objsize, bytes pagesize);
boolean objcache_validate(heap h);