This document explains the steps needed to build an RPi 4 Kubernetes cluster from scratch using Alpine on an f2fs root filesystem.
This part is a little tricky. Basic steps:
- Get the RPi 4 image from Alpine. Be sure to get the aarch64 version!
- Go to this link, which shows the basics of getting things working.
- Follow it up to the 'date' part. I didn't bother silencing the fan.
- Add the f2fs utils to the image.
- Make sure the image boots after everything is done.
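Before moving on, a quick check on the booted Pi can confirm the two things the steps above depend on: an aarch64 system and the f2fs tools. This is only a minimal sketch. The function name check_f2fs_ready is made up for illustration, and it assumes the tools come from Alpine's f2fs-tools package so that mkfs.f2fs ends up on the PATH.

```sh
#!/bin/sh
# Hypothetical helper: report whether this system looks ready for the f2fs conversion.
# $1 = machine name (defaults to `uname -m`), $2 = tool to look for (defaults to mkfs.f2fs).
check_f2fs_ready() {
    arch="${1:-$(uname -m)}"
    tool="${2:-mkfs.f2fs}"
    if [ "$arch" != "aarch64" ]; then
        echo "wrong architecture: $arch (expected aarch64)"
        return 1
    fi
    if ! command -v "$tool" >/dev/null 2>&1; then
        # Assumption: on Alpine the tool is installed with 'apk add f2fs-tools'.
        echo "$tool not found"
        return 1
    fi
    echo "ok: $arch with $tool"
}

# Usage on the Pi:
#   check_f2fs_ready
```

Both arguments are optional; they exist only so the check can be tried on a machine that is not a Pi.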
At this point you should have an image ready for conversion to f2fs (it's entirely optional, but WAY better than ext4).
Basic steps:
- On a dedicated Debian system, you will need to do these steps:
- Create a raw image of the working SD card from part 1 (this will take about 20 minutes or so)
- Copy the image to a new SD card (to avoid erasing your original)
- Make a new f2fs partition on the new card using mkfs.f2fs /dev/mmcblk0p2 (overwriting the ext4 version)
- Loopback mount the 'original' image that you created with dd, via losetup:
losetup -v -f /mnt/myimage.img
losetup -a
partx -v --add /dev/loop0
mkdir /media/from
mkdir /media/to
mount /dev/loop0p2 /media/from
mount /dev/mmcblk0p2 /media/to
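# the trailing /. copies the contents of /media/from, not the directory itself
cp -a /media/from/. /media/to/
apt-get install vim
# edit the fstab on the copy, not the one on the Debian host
vi /media/to/etc/fstab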
Make sure you edit the fstab on the copy (under /media/to), not the one on the Debian host!
Your fstab should look like this:
/dev/mmcblk0p2 / f2fs rw,noatime,discard 0 1
/dev/cdrom /media/cdrom iso9660 noauto,ro 0 0
/dev/usbdisk /media/usb vfat noauto 0 0
/dev/mmcblk0p1 /media/mmcblk0p1 vfat defaults 0 0
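The fstab change on the copy can also be scripted rather than done by hand. The following is a minimal sketch, assuming the root entry is the line whose second field is "/". The helper name fstab_root_to_f2fs is hypothetical, and the mount options are simply the ones from the fstab shown above.

```sh
#!/bin/sh
# Hypothetical helper: print an fstab in which the root entry uses f2fs.
# $1 = path to the fstab on the COPY (for example /media/to/etc/fstab).
fstab_root_to_f2fs() {
    awk '
        # Leave comments and blank lines untouched.
        /^[ \t]*(#|$)/ { print; next }
        # Field 2 is the mount point; rewrite only the root entry.
        $2 == "/" { print $1, "/", "f2fs", "rw,noatime,discard", "0", "1"; next }
        { print }
    ' "$1"
}

# Usage (write to a temporary file first, then replace the original):
#   fstab_root_to_f2fs /media/to/etc/fstab > /tmp/fstab.new && \
#       cp /tmp/fstab.new /media/to/etc/fstab
```

Printing to standard output instead of editing in place makes it easy to review the result before it replaces the real file.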
There is a very nasty bug with fsck that I had to iron out (I found the fix in a single post!): for fsck to run, you need to add the forcefsck parameter to the boot command line (cmdline.txt). For quicker (if somewhat less safe) reboots you could just set the last two fstab fields to 0 0, which would avoid fsck altogether. I like to be 'safe', and since I don't reboot that often, it's not a problem.
mkdir /media/dos
mount /dev/mmcblk0p1 /media/dos
vim /media/dos/cmdline.txt
Your cmdline.txt needs to look like this with the fix in it:
modules=loop,squashfs,sd-mod,usb-storage quiet console=tty1 root=/dev/mmcblk0p2 forcefsck
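If you would rather not edit cmdline.txt by hand, the parameter can be appended with a small script. This is a sketch under the assumption that cmdline.txt is a single line, as in the example above. The helper name add_forcefsck is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: add "forcefsck" to a single-line cmdline.txt if it is missing.
# $1 = path to cmdline.txt (for example /media/dos/cmdline.txt).
add_forcefsck() {
    file="$1"
    # -w matches forcefsck only as a whole word, so running this twice is harmless.
    if grep -qw 'forcefsck' "$file"; then
        return 0
    fi
    # Strip trailing whitespace from line 1 and append the parameter.
    sed '1s/[[:space:]]*$/ forcefsck/' "$file" > "$file.new" && mv "$file.new" "$file"
}

# Usage:
#   add_forcefsck /media/dos/cmdline.txt
```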
The Pi has a terrible default: the ondemand governor uses a 95% threshold before it raises the CPU clock. This means that your CPUs will stay at 600 MHz until load hits 95% for a sustained period of time. By lowering the ondemand threshold to a more reasonable 10%, we can still idle at 600 MHz when we aren't busy, but as soon as we get to 10% CPU the system kicks into overdrive.
The default has the rather terrible effect of limiting the Pi to about 20 MBps instead of 80 MBps or more when doing network traffic. When the CPU is clocked properly, it will do 80-90 MBps. So here is the script.
#!/usr/bin/env ash
logger "Starting CPU FIX"
echo "ondemand" | tee /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor | logger &
logger "Sleeping for 3 seconds"
sleep 3
logger "Starting on demand threshold to 10%"
echo "10" | tee /sys/devices/system/cpu/cpufreq/ondemand/up_threshold | logger &
For this to work, the script must be placed in the `/etc/local.d` directory. In addition, it has to be made executable and the 'local' service has to be enabled so that things 'start' at boot; the commands for that are given further down.
Create this file with your favorite editor. The filename should be something like fixcpu.sh.start; the important part is the .start suffix on the end of the filename, as this tells the system to run it at 'start'. The logger calls send nice logs to our syslog server to tell us that things are O.K.
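After a reboot it is worth checking that the script actually took effect. The sketch below prints the three sysfs values involved. The helper name show_cpufreq and its optional directory argument are assumptions made for illustration; the paths are the ones used in the script above, plus scaling_cur_freq for the current clock.

```sh
#!/bin/sh
# Hypothetical helper: print the cpufreq settings the fix script is meant to change.
# $1 = sysfs cpu directory (defaults to /sys/devices/system/cpu).
show_cpufreq() {
    base="${1:-/sys/devices/system/cpu}"
    for name in cpu0/cpufreq/scaling_governor \
                cpu0/cpufreq/scaling_cur_freq \
                cpufreq/ondemand/up_threshold; do
        if [ -r "$base/$name" ]; then
            echo "$name=$(cat "$base/$name")"
        else
            # up_threshold is normally only present while ondemand is the active governor.
            echo "$name=unavailable"
        fi
    done
}

# Usage on the Pi:
#   show_cpufreq
```

If the fix worked, the governor should read ondemand and up_threshold should read 10.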
Unmount the disks and you now have a 'pristine' image ready to clone. Be sure to test it. For the truly impatient (and not afraid of heat death) the default governor can be set to 'performance', but this is a topic for a different day.
echo "Turning on burn your processor mode"
echo performance | tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
Just use these lines in your script in place of the ondemand lines at the top.
Finally, to enable things to work after editing is done:
cd /etc/local.d
chmod +x fixcpu.sh.start
rc-update add local default
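A forgotten chmod or a missing .start suffix fails silently, so a quick listing helps. The following sketch shows which files in a local.d directory the local service would pick up. The helper name list_start_scripts is hypothetical, and it relies only on the two conditions described above (the .start suffix and the executable bit).

```sh
#!/bin/sh
# Hypothetical helper: list the *.start files in a local.d directory and
# say whether each one is executable.
# $1 = directory to inspect (defaults to /etc/local.d).
list_start_scripts() {
    dir="${1:-/etc/local.d}"
    for f in "$dir"/*.start; do
        [ -e "$f" ] || continue          # the glob matched nothing
        if [ -x "$f" ]; then
            echo "will run: ${f##*/}"
        else
            echo "not executable: ${f##*/}"
        fi
    done
}

# Usage:
#   list_start_scripts
```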
Now you are ready to shut down the system in an orderly fashion:
poweroff
As you probably noticed from the first 'dd', it takes a really long time to copy an entire 32 GB card. I found a way to get a 'pristine' image burned (assuming you have an SD card writer on your Linux box!) using clonezilla. I am not going to go through everything, just the tools needed to get clonezilla working on our Debian machine. I also include a 'fix' for SD cards not properly detecting ejects on my duplicator appliance.
These are the minimum prerequisites for things to work:
- SD card reader recognized by the system
- Large disk attached for images to live on (can be network or otherwise)
- Debian installed and working
Install the following packages:
apt-get install clonezilla
apt-get install f2fs-utils
apt-get install f2fs-tools
apt-get install ncdu
apt-get install parted
apt-get install dosfstools
apt-get install lzop
apt-get install zstd
apt-get install bmap-tools
apt-get install partclone
apt-get install apt-file
apt-get install lvm2
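Once the packages are installed, a short loop can confirm that the tools you will actually call are on the PATH. This is a minimal sketch; the helper name missing_tools is made up, and the command names in the usage comment are assumptions about what those packages install rather than a verified list.

```sh
#!/bin/sh
# Hypothetical helper: print every command from the argument list that is NOT on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage (command names are assumptions, adjust to what you rely on):
#   missing_tools clonezilla mkfs.f2fs parted partclone.f2fs lzop zstd ncdu
```

An empty result means everything in the list was found.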
Before you clone, it's wise to 'check' the DOS partition for errors on a Windows box or a Mac using the 'repair' option. If the filesystem is marked as 'unclean', it will not clone properly and you will get an error.
Assuming that everything is up, you need to run clonezilla. I am not going to dictate what command you should use, just offer some basic pointers.
- You should have a large mount point called /home/partimag where all images will be stored. This can be an attached disk, NFS share, etc.
- Run clonezilla to get a nice gui. I am going to assume that you have already mounted the disk on /home/partimag; not doing so will fill up root and you will be sad.
- Choose "device/image" when imaging. This will get you a full disk. Also, the thing to be duplicated (/dev/mmcblk0) should NOT be mounted.
- I then choose 'skip' for the /home/partimag location; you could optionally mount a remote share here, etc.
- Choose 'beginner'.
- Choose 'save disk'.
- Give it a name.
- Choose the appropriate unmounted disk (in my case mmcblk0).
- Skip checking.
- Yes, check the saved image.
- Choose not to encrypt the image.
- SAVE the command if you ever want to do this again! In my case:
/usr/sbin/ocs-sr -q2 -c -j2 -z1p -i 4096 -p true savedisk masterdiskf2fsalpinerpi4 mmcblk0
- My nice one with compression etc.:
/usr/sbin/ocs-sr -q2 -j2 -z3 -i 10000 -fsck-src-part-y -p true savedisk interactivetestclonef2fswithtimecorrected mmcblk0
Just say yes, as it's going to image now.
- More advanced options are possible (such as automatic compression etc.), but I won't cover them here.
- Now that the image has been duplicated, restoring is just a matter of pressing keys. I am not going to cover this. Just one thing: if you find that MMC cards won't eject properly without a reboot, I have a fix (for fitlets at least):
#!/usr/bin/env bash
rmmod sdhci_pci
modprobe sdhci_pci
Create this file as 'fixsd' in /bin and make it executable. Remove the card, put in the new card, and run it. It should work without a reboot. For reference, a restore command:
/usr/sbin/ocs-sr -g auto -e1 auto -e2 -r -j2 -p true restoredisk rpi-alpine-f2fs-master mmcblk0
You could optionally install a clonezilla server CD and use that. This was the 'manual' method with straight Debian.
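Since saving an image without the big disk mounted on /home/partimag fills up root, it can help to guard the ocs-sr call with a mount check. The sketch below reads the mount table. The helper name is_mounted and the optional second argument are assumptions made for illustration.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if a directory is listed as a mount point.
# $1 = directory, $2 = mount table to read (defaults to /proc/mounts).
is_mounted() {
    # Field 2 of each line in the mount table is the mount point.
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Usage before imaging:
#   is_mounted /home/partimag || { echo "mount the image disk first" >&2; exit 1; }
```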
This section describes how to set up an NFS v4.2 server with optimal networking and other settings. I am able to get line rate (about 95 megabytes/sec) with a $54.00 HC2 from Hardkernel. I am not going to describe everything, just the highlights.
- Get yourself the 'Armbian' variant for this board (you could, of course, use ANY computer capable of running a Linux server)
- Get yourself a nice fast SSD of 512GB or 1TB and format it with f2fs
- Install the NFS utils and disable the portmapper:
apt-get update
apt-get install nfs-kernel-server
portmap
systemctl mask rpcbind.socket
systemctl mask rpcbind.service
- I found an EVEN BETTER way to get rid of all the nasty daemons from this link. The file is /etc/default/nfs-kernel-server.
RPCNFSDCOUNT=8
# Runtime priority of server (see nice(1))
RPCNFSDPRIORITY=0
# Options for rpc.mountd.
# If you have a port-based firewall, you might want to set up
# a fixed port here using the --port option. For more information,
# see rpc.mountd(8) or http://wiki.debian.org/SecuringNFS
# To disable NFSv4 on the server, specify '--no-nfs-version 4' here
RPCMOUNTDOPTS="--no-nfs-version 2 --no-nfs-version 3 --nfs-version 4 --no-udp"
RPCNFSDOPTS="--no-nfs-version 2 --no-nfs-version 3 --nfs-version 4 --no-udp"
# Do you want to start the svcgssd daemon? It is only required for Kerberos
# exports. Valid alternatives are "yes" and "no"; the default is "no".
NEED_SVCGSSD=""
# Options for rpc.svcgssd.
RPCSVCGSSDOPTS=""
The important bits are the RPCMOUNTDOPTS (and RPCNFSDOPTS) settings. When we do this we only have to worry about port 2049, with no nasty mountd or nfslock crap. Very nice.
- The next file is /etc/default/nfs-common. This one may or may not be necessary; I am adding it in case the world ends:
# If you do not set values for the NEED_ options, they will be attempted
# autodetected; this should be sufficient for most people. Valid alternatives
# for the NEED_ options are "yes" and "no".
# Do you want to start the statd daemon? It is not needed for NFSv4.
NEED_STATD="no"
# Options for rpc.statd.
# Should rpc.statd listen on a specific port? This is especially useful
# when you have a port-based firewall. To use a fixed port, set this
# this variable to a statd argument like: "--port 4000 --outgoing-port 4001".
# For more information, see rpc.statd(8) or http://wiki.debian.org/SecuringNFS
STATDOPTS=
# Do you want to start the idmapd daemon? It is only needed for NFSv4.
NEED_IDMAPD="yes"
# Do you want to start the gssd daemon? It is required for Kerberos mounts.
NEED_GSSD=
RPCNFSDOPTS="-N 2 -N 3"
RPCMOUNTDOPTS="--manage-gids -N 2 -N 3"
After I do this I am getting 95 megabytes/sec on $59.00 hardware. It's really quite impressive. I may or may not add Kerberos authentication at some point, but I am relatively happy with this.
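To confirm that the settings above really left only NFSv4 enabled, the kernel's own view can be checked. On a running server /proc/fs/nfsd/versions holds a string such as "-2 -3 +4 +4.1 +4.2". The sketch below tests a string in that format; the helper name only_nfs4 is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper: succeed if a versions string enables NFSv4 and nothing older.
# $1 = text in the format of /proc/fs/nfsd/versions, e.g. "-2 -3 +4 +4.1 +4.2".
only_nfs4() {
    v4=no
    for v in $1; do
        case "$v" in
            +4|+4.*) v4=yes ;;        # some NFSv4 version is enabled
            +*)      return 1 ;;      # an older protocol version is still enabled
        esac
    done
    [ "$v4" = yes ]
}

# Usage on the server (the proc file is only present while nfsd is running):
#   only_nfs4 "$(cat /proc/fs/nfsd/versions)" && echo "NFSv4 only"
```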
In order to get better network performance, it's important to increase the size of our receive and send buffers for NFS:
net.core.wmem_max=262144
net.core.rmem_max=262144
net.core.wmem_default=26144
net.core.rmem_default=26144
Simply append these values to /etc/sysctl.conf and reboot (or run sysctl -p to apply them without a reboot).
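Appending blindly leaves duplicate entries if the step is run twice. A small replace-or-append helper avoids that. This is a sketch; the function name set_sysctl_line is hypothetical, and it assumes plain key=value lines as shown above.

```sh
#!/bin/sh
# Hypothetical helper: set key=value in a sysctl-style file,
# replacing an existing entry for the key or appending a new one.
# $1 = file, $2 = key, $3 = value.
set_sysctl_line() {
    file="$1"; key="$2"; value="$3"
    touch "$file"
    # Copy every line except existing assignments of this key.
    awk -v key="$key" '{
        line = $0
        sub(/^[ \t]+/, "", line)          # ignore leading whitespace
        split(line, parts, /[ \t]*=/)     # text before the first "=" is the key
        if (parts[1] != key) print
    }' "$file" > "$file.new"
    echo "$key=$value" >> "$file.new"
    mv "$file.new" "$file"
}

# Usage:
#   set_sysctl_line /etc/sysctl.conf net.core.rmem_max 262144
```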
The ansible scripts are basically set up to fix a lot of this automatically. They are mostly for the 'cpu' issues mentioned. Here are the approximate purposes of each, and an explanation of the directory structure.
docs=Documentation
nfs=nfs server setup
scripts=ansible scripts
scripts/binary = k0s binary for creating our cluster
fixcpucopy.yml= fixes the cpu issues by changing the default governor to ondemand and changing the threshold to 10% if the cpu is busy, also copies the k0s binary
inventory.file= inventory file for ansible, populate appropriate node ip / names here. I assume you have working DNS. Node ips necessary for 'groupings'
pi1/2/3/4.yml = change hostnames as appropriate, pi4m is the 'conductor node'
fixcpu.sh.start = rc script to 'fix' ondemand governor.
You need to download the current k0s release and put it in 'scripts/binary' with the name k0s. Be sure to get the arm64 version. Also, for a slightly smaller binary (about 40 megs or so) you can run upx --lzma k0s. UPX is the 'ultimate packer for executables'; it's completely optional.
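Grabbing the wrong architecture is an easy mistake, and it only shows up once the binary has been copied to the nodes. The sketch below reads the machine field from the ELF header, where 183 is the value for AArch64 (arm64). The helper name elf_machine is made up, and it assumes a little-endian ELF file, which is the case for arm64 Linux binaries.

```sh
#!/bin/sh
# Hypothetical helper: print the ELF e_machine number of a binary (183 means AArch64).
# $1 = path to the binary.
elf_machine() {
    # e_machine is the 16-bit field at byte offset 18 of the ELF header;
    # read its two bytes as unsigned decimals (low byte first).
    set -- $(od -An -tu1 -j18 -N2 "$1")
    echo $(( $1 + 256 * $2 ))
}

# Usage:
#   [ "$(elf_machine scripts/binary/k0s)" = 183 ] || echo "not an arm64 binary" >&2
```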
ansible-playbook -i inventory.file fixcpucopyk0s.yml -K
The -K flag makes ansible prompt for the 'sudo' password.
You will need to set up passwordless ssh access by adding your public key to the authorized_keys file in the appropriate user's home directory. This user will also need sudo permission.
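Adding the key can be scripted so that it is safe to repeat on every node. This is a minimal sketch; the helper name add_authorized_key is hypothetical, and it assumes a public key file containing a single key. Run it as the target user (or fix ownership afterwards), since ssh is strict about who owns these files.

```sh
#!/bin/sh
# Hypothetical helper: add a public key to a user's authorized_keys without duplicating it.
# $1 = home directory of the login user, $2 = path to the public key file.
add_authorized_key() {
    ssh_dir="$1/.ssh"
    keys="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir"
    touch "$keys" && chmod 600 "$keys"
    # -F: fixed string, -x: whole line, so the key is appended only once.
    grep -qxF "$(cat "$2")" "$keys" || cat "$2" >> "$keys"
}

# Usage:
#   add_authorized_key "$HOME" ./id_ed25519.pub
```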
1.2.3.4 pi1 pi1.your.domain
1.2.3.5 pi2 pi2.your.domain
1.2.3.6 pi3 pi3.your.domain
1.2.3.7 pi4m pi4m.your.domain
First three are worker nodes, pi4m is the main node. It does the orchestration etc. I generally make sure my DNS is working with these, but this will make sure the ansible scripts are happy.
This serves as the 'storage' option inside of kubernetes using the 'nfs' driver. I use NFSv4 because it's WAY faster than the SD cards (and doesn't wear them down as fast!). It also separates my 'data' from my 'system'. I use a Hardkernel HC2 running Armbian.
Once everything is done, simply follow this, but skip the 'download' part since you have already done it with the 'ansible' prepare script. I like having 'everything' done without running curl | sh as root. It scares me (I am an old security nerd, so I just can't do that!)
If you aren't fanatical about filesystem performance, you can skip the f2fs conversion. I just like it because it makes 'life' in the system 'seem faster', since it's a log-structured (non-journaling) filesystem designed for flash.
Be sure to 'set' and not forget your root password. Also create a non-privileged user who can log in as 'non root' and then use sudo/doas to perform privileged ops.
Copy your 'authorized' public ssh key to the user you create PRIOR to cloning! This will save you doing the setup 4 times! Make sure your user has 'sudo' permissions to become 'root'.
Fix your 'ssh' so it's securely configured; I use ssh-audit to do this. You also might want to enable 'auto patching' to keep your system up to date.
#!/bin/sh
# SPDX-License-Identifier: GPL-2.0-only
#
# cron job for automatic software updates
# Copyright (c) 2014 Kaarle Ritvanen
set -eu
sleep $(expr $RANDOM % 7200)
exec apk -U upgrade
- Put this file in /etc/periodic/daily
- Reboot once per week:
#!/usr/bin/env sh
/sbin/reboot
- Put this file in /etc/periodic/weekly
- This is also in the ansible scripts. It is here more as a reference if anyone is interested.
- Rack for Pi4
- Cables for pi4 hdmi lan passthrough
- 4x POE Hats for pi4
- 3x PI4 with 8gb (worker)
- 1x pi4 with 4gb (boss)
- cheap 8 port rack-mountable kvm switch
- Nice managed switch with POE (and after fan replacement quiet!)
- Hardkernel HC2 nfs server
- good sd card
- sd card 'usb-c/usb-3 reader/writer'
- good burning program (light)
- good burning program (fat)