Merge pull request #2287 from frappe/feat-server-volumes

feat: Support multiple volumes on virtual machines

adityahase authored Dec 9, 2024
2 parents 37d240f + 89ffd7e commit 5f66677

Showing 41 changed files with 1,239 additions and 165 deletions.
3 changes: 2 additions & 1 deletion dashboard/src2/components/server/ServerPlansDialog.vue

@@ -101,7 +101,8 @@ export default {
 				url: 'press.api.server.plans',
 				params: {
 					name: this.serverType,
-					cluster: this.$server.doc.cluster
+					cluster: this.$server.doc.cluster,
+					platform: this.$server.doc.current_plan.platform
 				},
 				auto: true,
 				initialData: []
9 changes: 7 additions & 2 deletions press/api/server.py

@@ -386,7 +386,7 @@ def options():
 
 
 @frappe.whitelist()
-def plans(name, cluster=None):
+def plans(name, cluster=None, platform="x86_64"):
 	return Plan.get_plans(
 		doctype="Server Plan",
 		fields=[
@@ -401,7 +401,12 @@ def plans(name, cluster=None):
 			"instance_type",
 			"premium",
 		],
-		filters={"server_type": name, "cluster": cluster} if cluster else {"server_type": name},
+		filters={"server_type": name, "platform": platform, "cluster": cluster}
+		if cluster
+		else {
+			"server_type": name,
+			"platform": platform,
+		},
 	)
 
 
96 changes: 96 additions & 0 deletions press/infrastructure/doctype/virtual_machine_migration/README.md
@@ -0,0 +1,96 @@
# Explaining Choices

Most commits/comments already explain the decisions. Just putting them here for sanity.

## Mounts

Going forward, the data (mostly machine-independent directories) will be kept on a separate volume.

For the migration we:

1. Shut down the machine
2. Start a new machine (with a new ARM image)
3. Attach the root volume from the old machine to the new machine (see the sketch below)
4. Do some mount magic so all services find data where they expect it to be
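
A minimal sketch of step 3, assuming boto3; the region, device name, and IDs are illustrative:

```python
import boto3

ec2 = boto3.client("ec2", region_name="ap-south-1")  # region is illustrative


def attach_old_root_volume(instance_id: str, volume_id: str) -> None:
    # AWS quirk: the volume can't be attached at launch; wait until the new
    # machine is actually running before calling attach_volume.
    ec2.get_waiter("instance_running").wait(InstanceIds=[instance_id])
    ec2.attach_volume(
        InstanceId=instance_id,
        VolumeId=volume_id,  # root volume detached from the old machine
        Device="/dev/sdf",  # requested name only; see "AWS Quirks" below
    )
    ec2.get_waiter("volume_in_use").wait(VolumeIds=[volume_id])
```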

### AWS Quirks

1. We can't attach a volume at boot. The VM must be in the Running state.
2. We can't rely on the device_name provided during run_instances.
3. The device will have an alias that looks something like "/dev/disk/by-id/...<volume-id>..." (see the sketch below)
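
A sketch of how one could resolve the real block device from that alias (on Nitro instances EBS volumes show up as NVMe devices; names below are illustrative):

```python
import glob
import os


def find_device(volume_id: str) -> str:
    # /dev/disk/by-id holds a symlink whose name embeds the volume id
    # without the dash, e.g. nvme-Amazon_Elastic_Block_Store_vol0abc123...
    links = glob.glob(f"/dev/disk/by-id/*{volume_id.replace('-', '')}*")
    if not links:
        raise FileNotFoundError(f"no device found for {volume_id}")
    # Resolve the symlink to the kernel device name, e.g. /dev/nvme1n1
    return os.path.realpath(links[0])
```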

### Bind Mounts

Instead of directly mounting the volume to the target mount point, we

1. Mount the volume at /opt/volumes/<mariadb|docker>/
2. Bind mount the relative location from this path onto the real target, e.g. /opt/volumes/mariadb/a/b/c onto /a/b/c (sketched below)

This gives us the ability to

1. Have two different mounts (/etc/mysql and /var/lib/mysql) from a single data volume
2. Use the same old volumes as-is without any custom mounting scheme.
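
A minimal sketch of the scheme, assuming the data volume is already formatted; the device, service, and target paths are illustrative:

```python
import os
import subprocess


def mount_data_volume(device: str, service: str, targets: list[str]) -> None:
    base = f"/opt/volumes/{service}"  # e.g. /opt/volumes/mariadb
    os.makedirs(base, exist_ok=True)
    subprocess.run(["mount", device, base], check=True)
    for target in targets:  # e.g. ["/etc/mysql", "/var/lib/mysql"]
        source = base + target  # e.g. /opt/volumes/mariadb/var/lib/mysql
        os.makedirs(source, exist_ok=True)
        subprocess.run(["mount", "--bind", source, target], check=True)
```

Usage would look like `mount_data_volume("/dev/nvme1n1", "mariadb", ["/etc/mysql", "/var/lib/mysql"])`.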

### Mount Dependency

We don't want MariaDB / Docker to start unless the data volume is mounted correctly.
We add a systemd mount dependency (BindsTo) so the services start if and only if the data volume is mounted; a sketch follows below.

Note: We define the dependency only on the bind mount; /opt/volumes/... is left out for convenience.
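
A sketch of such a dependency, assuming a MariaDB server whose data is bind-mounted at /var/lib/mysql (systemd names mount units after the mount path, so that unit is var-lib-mysql.mount; all names are illustrative):

```python
from pathlib import Path
from textwrap import dedent


def add_mount_dependency(service: str = "mariadb", mount_unit: str = "var-lib-mysql.mount") -> None:
    # Drop-in so the service starts only after the bind mount is active and
    # stops if the mount ever goes away (BindsTo).
    dropin = Path(f"/etc/systemd/system/{service}.service.d/mount-dependency.conf")
    dropin.parent.mkdir(parents=True, exist_ok=True)
    dropin.write_text(
        dedent(
            f"""\
            [Unit]
            BindsTo={mount_unit}
            After={mount_unit}
            """
        )
    )
    # A `systemctl daemon-reload` is needed for the drop-in to take effect.
```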

### Relabeling

The base images are configured to mount partitions labeled UEFI and cloudimg-rootfs.

1. We change these labels so the new machine doesn't accidentally boot from these partitions (sketched below)
2. We update fstab so the old machine can still boot with the modified labels

Note: EFI partitions have a dirty bit set on them, and fatlabel messes this up; we need to run fsck to fix it.
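
A sketch of the relabeling, with illustrative partition names and labels:

```python
import subprocess

# Illustrative device names; the real partitions differ per machine.
EFI_PARTITION = "/dev/nvme0n1p15"
ROOT_PARTITION = "/dev/nvme0n1p1"


def relabel_old_root_volume() -> None:
    # Rename the labels so the new machine never picks the old volume at boot.
    subprocess.run(["fatlabel", EFI_PARTITION, "OLDUEFI"], check=True)
    # fatlabel leaves the FAT filesystem marked dirty; fsck clears it
    # (-a: repair automatically, non-zero exit when it fixed something).
    subprocess.run(["fsck.fat", "-a", EFI_PARTITION], check=False)
    subprocess.run(["e2label", ROOT_PARTITION, "old-rootfs"], check=True)
    # fstab on the old volume must then be rewritten to reference the new
    # labels so the old machine can still boot if it's ever started again.
```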

### UUID

When we spawn a new machine from the base image, all volumes get their own volume-id. If we rely on the volume-id to determine the data volume, then we'll have to do some extra work after the first boot (to tell the machine about the volume).

When we format the data volume we get a new UUID. This UUID remains the same across boots (since it's part of the data itself), unless we reformat the volume. This is the easiest way to recognize a volume in fstab.

During the migration, we need to do the extra step of updating fstab to use the old UUID (from the old root volume); see the sketch at the end of this section.

We could have modified the UUID of the old root volume instead (so we don't need to do any work after the migration). But:

1. e2label needs a freshly checked disk (fsck)
2. fsck needs an unmounted partition, and we can't unmount the root partition
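
A sketch of the fstab side of this, assuming blkid and an ext4 data volume; the mount options are illustrative:

```python
import subprocess


def fstab_entry(device: str, mount_point: str) -> str:
    # blkid -s UUID -o value /dev/nvme1n1  ->  e.g. "2f6b0c2e-..."
    uuid = subprocess.run(
        ["blkid", "-s", "UUID", "-o", "value", device],
        check=True,
        capture_output=True,
        text=True,
    ).stdout.strip()
    # The UUID lives in the filesystem itself, so this entry stays valid
    # across reboots, re-attachments, and machine migrations.
    return f"UUID={uuid} {mount_point} ext4 defaults 0 2"
```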

## Misc

### Hardcoded values

This is only going to be used for app and db servers; the expected layout is sketched below.

- App servers will have /home/frappe/benches stored on the data volume
- DB servers will have /var/lib/mysql and /etc/mysql stored on the data volume
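
In other words, something like the following mapping is assumed (names are illustrative, not the actual schema):

```python
# Which directories live on the data volume, per server type.
DATA_VOLUME_MOUNTS = {
    "Server": ["/home/frappe/benches"],
    "Database Server": ["/etc/mysql", "/var/lib/mysql"],
}
```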

### Wait for ping + cloud init

During the first boot we:

1. Delete old host keys (to avoid collisions between multiple hosts)
2. Update SSH config
3. Restart SSHD

During this restart, for a short period, we can't start a new SSH session (sometimes we get lucky).
To avoid this, we explicitly wait for cloud-init to finish (and then check that sshd is running); a sketch follows.
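
A sketch of that wait, assuming SSH access to the new machine; the retry counts and the sshd unit name are illustrative (on Ubuntu the unit may be called ssh):

```python
import subprocess
import time


def wait_for_first_boot(host: str, retries: int = 30, delay: int = 10) -> None:
    # Retry because cloud-init restarts sshd during first boot and connections
    # can be refused for a short window.
    for _ in range(retries):
        result = subprocess.run(
            [
                "ssh",
                "-o", "ConnectTimeout=5",
                host,
                "cloud-init status --wait && systemctl is-active sshd",
            ],
            capture_output=True,
        )
        if result.returncode == 0:
            return
        time.sleep(delay)
    raise TimeoutError(f"{host} did not finish its first boot")
```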

---

## TODO

### Disk Usage Alerts

We'll need to add alerts for the modified mount points (the old alerts rely on /).

### Disk Resize

The resize logic resizes the first volume listed in the volumes table.

1. This ordering isn't guaranteed to be [root, data]
2. We need a way to specify exactly which volume we want to resize
@@ -4,6 +4,8 @@
frappe.ui.form.on('Virtual Machine Migration', {
refresh(frm) {
[
[__('Start'), 'execute', frm.doc.status === 'Pending'],
[__('Force Continue'), 'force_continue', frm.doc.status === 'Failure'],
[__('Force Continue'), 'force_continue', frm.doc.status === 'Failure'],
[__('Force Fail'), 'force_fail', frm.doc.status === 'Running'],
].forEach(([label, method, condition]) => {
Expand Down
@@ -7,6 +7,7 @@
"field_order": [
"virtual_machine",
"status",
"new_plan",
"column_break_pega",
"virtual_machine_image",
"machine_type",
@@ -17,6 +18,9 @@
"duration",
"section_break_pplo",
"volumes",
"mounts",
"raw_devices",
"parsed_devices",
"section_break_mjhg",
"steps"
],
@@ -27,6 +31,7 @@
"in_list_view": 1,
"in_standard_filter": 1,
"label": "Virtual Machine",
"link_filters": "[[\"Virtual Machine\",\"status\",\"not in\",[\"Draft\",\"Terminated\",null]]]",
"options": "Virtual Machine",
"reqd": 1,
"set_only_once": 1
@@ -80,6 +85,7 @@
"fieldtype": "Link",
"in_list_view": 1,
"label": "Virtual Machine Image",
"link_filters": "[[\"Virtual Machine Image\",\"status\",\"=\",\"Available\"]]",
"options": "Virtual Machine Image",
"reqd": 1,
"set_only_once": 1
@@ -113,11 +119,40 @@
"fieldtype": "Duration",
"label": "Duration",
"read_only": 1
},
{
"fieldname": "raw_devices",
"fieldtype": "Code",
"hidden": 1,
"label": "Raw Devices",
"options": "JSON",
"read_only": 1
},
{
"fieldname": "parsed_devices",
"fieldtype": "Code",
"hidden": 1,
"label": "Parsed Devices",
"options": "JSON",
"read_only": 1
},
{
"fieldname": "mounts",
"fieldtype": "Table",
"label": "Mounts",
"options": "Virtual Machine Migration Mount"
},
{
"fieldname": "new_plan",
"fieldtype": "Link",
"label": "New Plan",
"options": "Server Plan",
"read_only": 1
}
],
"index_web_pages_for_search": 1,
"links": [],
"modified": "2024-09-20 15:27:50.984335",
"modified": "2024-12-09 16:31:29.250443",
"modified_by": "Administrator",
"module": "Infrastructure",
"name": "Virtual Machine Migration",