diff --git a/docs/age_of_empires.md b/docs/age_of_empires.md
index 0e7396a27ca..5a7fbd322f5 100644
--- a/docs/age_of_empires.md
+++ b/docs/age_of_empires.md
@@ -185,6 +185,8 @@ Build new houses when you're 2 of population down to the limit
 ## Tournaments
 
+- 2024 Hidden Cup 5:
+  - [Semifinal Viper vs Lierey](https://yewtu.be/watch?v=Ol-mqMeQ7OQ)
 - 2023 Masters of Arena 7 Final Tatoh vs Vinchester:
   - [Casted by T90](https://www.youtube.com/watch?v=3qg4Xwm8CAo&t=1211s)
   - [Pov by Tatoh](https://www.youtube.com/watch?v=AI_JRA_nCpw&t=8854)
diff --git a/docs/ansible_snippets.md b/docs/ansible_snippets.md
index 087beb76bb0..a62bc4e2195 100644
--- a/docs/ansible_snippets.md
+++ b/docs/ansible_snippets.md
@@ -4,6 +4,123 @@ date: 20220119
 author: Lyz
 ---
 
+# [Filter json data](https://docs.ansible.com/ansible/latest/collections/community/general/docsite/filter_guide_selecting_json_data.html)
+
+To select a single element or a data subset from a complex data structure in JSON format (for example, Ansible facts), use the `community.general.json_query` filter. It lets you query a complex JSON structure and iterate over it using a loop structure.
+
+This filter is built upon jmespath, and you can use the same syntax. For examples, see [jmespath examples](http://jmespath.org/examples.html).
+
+A complex example would be:
+
+```yaml
+"{{ ec2_facts | json_query('instances[0].block_device_mappings[?device_name!=`/dev/sda1` && device_name!=`/dev/xvda`].{device_name: device_name, id: ebs.volume_id}') }}"
+```
+
+This snippet:
+
+- Gets all dictionaries under the `block_device_mappings` list whose `device_name` is not equal to `/dev/sda1` or `/dev/xvda`.
+- From those results it extracts and flattens only the desired values: in this case `device_name` and the `id`, which lives at the key `ebs.volume_id` of each item of the `block_device_mappings` list. A worked example is shown below.
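+For illustration, given a hypothetical `ec2_facts` value like this one (trimmed to the relevant keys):
+
+```yaml
+instances:
+  - block_device_mappings:
+      - device_name: /dev/sda1
+        ebs:
+          volume_id: vol-0aaaaaaaaaaaaaaaa
+      - device_name: /dev/xvdf
+        ebs:
+          volume_id: vol-0bbbbbbbbbbbbbbbb
+```
+
+the query would return:
+
+```yaml
+- device_name: /dev/xvdf
+  id: vol-0bbbbbbbbbbbbbbbb
+```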
+ +# [Do asserts](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/assert_module.html) +```yaml +- name: After version 2.7 both 'msg' and 'fail_msg' can customize failing assertion message + ansible.builtin.assert: + that: + - my_param <= 100 + - my_param >= 0 + fail_msg: "'my_param' must be between 0 and 100" + success_msg: "'my_param' is between 0 and 100" +``` + +# [Split a variable in ansible ](https://www.middlewareinventory.com/blog/ansible-split-examples/) + +```yaml +{{ item | split ('@') | last }} + +``` + +# Get a list of EC2 volumes mounted on an instance an their mount points +Assuming that each volume has a tag `mount_point` you could: + +```yaml +- name: Gather EC2 instance metadata facts + amazon.aws.ec2_metadata_facts: + +- name: Gather info on the mounted disks + delegate_to: localhost + block: + - name: Gather information about the instance + amazon.aws.ec2_instance_info: + instance_ids: + - "{{ ansible_ec2_instance_id }}" + register: ec2_facts + + - name: Gather volume tags + amazon.aws.ec2_vol_info: + filters: + volume-id: "{{ item.id }}" + # We exclude the root disk as they are already mounted and formatted + loop: "{{ ec2_facts | json_query('instances[0].block_device_mappings[?device_name!=`/dev/sda1` && device_name!=`/dev/xvda`].{device_name: device_name, id: ebs.volume_id}') }}" + register: volume_tags_data + + - name: Save the required volume data + set_fact: + volumes: "{{ volume_tags_data | json_query('results[0].volumes[].{id: id, mount_point: tags.mount_point}') }}" + + - name: Display volumes data + debug: + msg: "{{ volumes }}" + + - name: Make sure that all volumes have a mount point + assert: + that: + - item.mount_point is defined + - item.mount_point|length > 0 + fail_msg: "Configure the 'mount_point' tag on the volume {{ item.id }} on the instance {{ ansible_ec2_instance_id }}" + success_msg: "The volume {{ item.id }} has the mount_point tag well set" + loop: "{{ volumes }}" +``` +# [Create a list of dictionaries using ansible ](https://www.middlewareinventory.com/blog/ansible-dict/) + +```yaml +- name: Create and Add items to dictionary + set_fact: + userdata: "{{ userdata | default({}) | combine ({ item.key : item.value }) }}" + with_items: + - { 'key': 'Name' , 'value': 'SaravAK'} + - { 'key': 'Email' , 'value': 'sarav@gritfy.com'} + - { 'key': 'Location' , 'value': 'Coimbatore'} + - { 'key': 'Nationality' , 'value': 'Indian'} +``` +# [Merge two dictionaries on a key ](https://stackoverflow.com/questions/70627339/merging-two-list-of-dictionaries-according-to-a-key-value-in-ansible) + +If you have these two lists: + +```yaml +"list1": [ + { "a": "b", "c": "d" }, + { "a": "e", "c": "f" } +] + +"list2": [ + { "a": "e", "g": "h" }, + { "a": "b", "g": "i" } +] +``` +And want to merge them using the value of key "a": + +```yaml +"list3": [ + { "a": "b", "c": "d", "g": "i" }, + { "a": "e", "c": "f", "g": "h" } +] +``` + +If you can install the collection community.general use the filter lists_mergeby. 
The expression below gives the same result + +```yaml +list3: "{{ list1|community.general.lists_mergeby(list2, 'a') }}" +``` # [Avoid arbitrary disk mount](https://forum.ansible.com/t/aws-determine-ebs-volume-physical-name-in-order-to-format-it/2510) Instead of using `/dev/sda` use `/dev/disk/by-id/whatever` diff --git a/docs/anticolonialism.md b/docs/anticolonialism.md index 6544ed681f9..f57fd4c99fe 100644 --- a/docs/anticolonialism.md +++ b/docs/anticolonialism.md @@ -1,5 +1,9 @@ # References +## Poems + + +- [Rafeef Ziadah - "Nosotros enseñamos vida, señor"](https://www.youtube.com/watch?v=neYO0kJ-6XQ) ## Music - [We Will Not Go Down (Song for Gaza Palestine) - Michael Heart](https://yewtu.be/watch?v=dlfhoU66s4Y) diff --git a/docs/bash_snippets.md b/docs/bash_snippets.md index fe2414739a7..e0f5bb651e4 100644 --- a/docs/bash_snippets.md +++ b/docs/bash_snippets.md @@ -4,6 +4,27 @@ date: 20220827 author: Lyz --- +# [Compare two semantic versions](https://www.baeldung.com/linux/compare-dot-separated-version-string) + +[This article](https://www.baeldung.com/linux/compare-dot-separated-version-string) gives a lot of ways to do it. For my case the simplest is to use `dpkg` to compare two strings in dot-separated version format in bash. + +```bash +Usage: dpkg --compare-versions +``` + +If the condition is `true`, the status code returned by `dpkg` will be zero (indicating success). So, we can use this command in an `if` statement to compare two version numbers: +```bash +$ if $(dpkg --compare-versions "2.11" "lt" "3"); then echo true; else echo false; fi +true +``` + +# [Exclude list of extensions from find command ](https://stackoverflow.com/questions/44030071/exclude-list-of-file-extensions-from-find-in-bash-shell) + + +```bash +find . -not \( -name '*.sh' -o -name '*.log' \) +``` + # [Self delete shell script ](https://stackoverflow.com/questions/8981164/self-deleting-shell-script) Add at the end of the script diff --git a/docs/coding/python/pydantic.md b/docs/coding/python/pydantic.md index ca2b8262181..5788f31d6df 100644 --- a/docs/coding/python/pydantic.md +++ b/docs/coding/python/pydantic.md @@ -735,5 +735,5 @@ Or if it fails, add to the line `# pylint: extension-pkg-whitelist`. # References - [Docs](https://pydantic-docs.helpmanual.io/) -- [Git](https://github.com/samuelcolvin/pydantic/) +- [Source](https://github.com/samuelcolvin/pydantic/) [![](not-by-ai.svg){: .center}](https://notbyai.fyi) diff --git a/docs/coding/python/python_snippets.md b/docs/coding/python/python_snippets.md index 4d650cd5b92..249b617c415 100644 --- a/docs/coding/python/python_snippets.md +++ b/docs/coding/python/python_snippets.md @@ -4,6 +4,32 @@ date: 20200717 author: Lyz --- +# [Investigate a class attributes](https://docs.python.org/3/library/inspect.html) + +# [Expire the cache of the lru_cache](https://stackoverflow.com/questions/31771286/python-in-memory-cache-with-time-to-live) + +The `lru_cache` decorator caches forever, a way to prevent it is by adding one more parameter to your expensive function: `ttl_hash=None`. This new parameter is so-called "time sensitive hash", its the only purpose is to affect lru_cache. For example: + +```python +from functools import lru_cache +import time + + +@lru_cache() +def my_expensive_function(a, b, ttl_hash=None): + del ttl_hash # to emphasize we don't use it and to shut pylint up + return a + b # horrible CPU load... 
+ + +def get_ttl_hash(seconds=3600): + """Return the same value withing `seconds` time period""" + return round(time.time() / seconds) + + +# somewhere in your code... +res = my_expensive_function(2, 2, ttl_hash=get_ttl_hash()) +# cache will be updated once in an hour +``` # [Fix variable is unbound pyright error](https://github.com/microsoft/pyright/issues/3041) You may receive these warnings if you set variables inside if or try/except blocks such as the next one: diff --git a/docs/coding/sql/sql.md b/docs/coding/sql/sql.md index 501bb1d02f0..1cfe107fb43 100644 --- a/docs/coding/sql/sql.md +++ b/docs/coding/sql/sql.md @@ -89,9 +89,13 @@ DELETE FROM table_name WHERE condition; ``` -## [Update host permissions for a mysql user](https://serverfault.com/questions/483339/changing-host-permissions-for-mysql-users) - +## [Get the last row of a table ](https://stackoverflow.com/questions/5191503/how-to-select-the-last-record-of-a-table-in-sql) +```sql +SELECT * FROM Table ORDER BY ID DESC LIMIT 1 +``` +## [Get all columns but one in SELECT](url) +## [Update host permissions for a mysql user](https://serverfault.com/questions/483339/changing-host-permissions-for-mysql-users) # [Table relationships](https://launchschool.com/books/sql/read/table_relationships) ## [One to One](https://launchschool.com/books/sql/read/table_relationships#onetoone) @@ -192,7 +196,6 @@ CREATE TABLE checkouts ( FOREIGN KEY (book_id) REFERENCES books(id) ON DELETE CASCADE ); ``` - # [Joins](https://www.w3schools.com/sql/sql_join.asp) A `JOIN` clause is used to combine rows from two or more tables, based on diff --git a/docs/ecc.md b/docs/ecc.md new file mode 100644 index 00000000000..f351ff461a5 --- /dev/null +++ b/docs/ecc.md @@ -0,0 +1,63 @@ +[Error Correction Code](https://www.memtest86.com/ecc.htm) (ECC) is a mechanism used to detect and correct errors in memory data due to environmental interference and physical defects. ECC memory is used in high-reliability applications that cannot tolerate failure due to corrupted data. + +# Installation + Due to additional circuitry required for ECC protection, specialized ECC hardware support is required by the CPU chipset, motherboard and DRAM module. This includes the following: + +- Server-grade CPU chipset with ECC support (Intel Xeon, AMD Ryzen) +- Motherboard supporting ECC operation +- ECC RAM + +Consult the motherboard and/or CPU documentation for the specific model to verify whether the hardware supports ECC. Use vendor-supplied list of certified ECC RAM, if provided. + +Most ECC-supported motherboards allow you to configure ECC settings from the BIOS setup. They are usually on the Advanced tab. The specific option depends on the motherboard vendor or model such as the following: + +- DRAM ECC Enable (American Megatrends, ASUS, ASRock, MSI) +- ECC Mode (ASUS) + +# Monitorization + +The mechanism for how ECC errors are logged and reported to the end-user depends on the BIOS and operating system. In most cases, corrected ECC errors are written to system/event logs. Uncorrected ECC errors may result in kernel panic or blue screen. + +The Linux kernel supports reporting ECC errors for ECC memory via the EDAC (Error Detection And Correction) driver subsystem. Depending on the Linux distribution, ECC errors may be reported by the following: + +- [`rasdaemon`](rasdaemon.md): monitor ECC memory and report both correctable and uncorrectable memory errors on recent Linux kernels. +- `mcelog` (Deprecated): collects and decodes MCA error events on x86. 
+- `edac-utils` (Deprecated): fills DIMM label data and summarizes memory errors.
+
+To configure rasdaemon follow [this article](rasdaemon.md).
+
+# Confusion on boards supporting ECC
+
+I've read that even if some motherboards claim to "Support ECC", some of them don't actually do anything with it.
+
+[This post](https://forums.servethehome.com/index.php?threads/has-anyone-gotten-ecc-logging-rasdaemon-edac-whea-etc-to-work-on-xeon-w-1200-or-w-1300-or-core-12-or-13-gen-processors.39257/) and the [kernel docs](https://www.kernel.org/doc/html/latest/firmware-guide/acpi/apei/einj.html) show that you should see references to ACPI/WHEA in the specs manual. Ideally ACPI5 support.
+
+From the kernel docs: EINJ provides a hardware error injection mechanism. It is very useful for debugging and testing APEI and RAS features in general.
+
+You need to check whether your BIOS supports EINJ first. For that, look for early boot messages similar to this one:
+
+```
+ACPI: EINJ 0x000000007370A000 000150 (v01 INTEL 00000001 INTL 00000001)
+```
+
+Which shows that the BIOS is exposing an EINJ table - it is the mechanism through which the injection is done.
+
+Alternatively, look in `/sys/firmware/acpi/tables` for an "EINJ" file, which is a different representation of the same thing.
+
+If neither of those exist, it doesn't necessarily mean that EINJ is not supported: before you give up, go into the BIOS setup to see if it has an option to enable error injection. Look for something called `WHEA` or similar. Often, you need to enable an `ACPI5` support option first in order to see the `APEI`, `EINJ`, ... functionality supported and exposed by the BIOS menu.
+
+To use `EINJ`, make sure the following options are enabled in your kernel configuration:
+
+```
+CONFIG_DEBUG_FS
+CONFIG_ACPI_APEI
+CONFIG_ACPI_APEI_EINJ
+```
+
+One way to test it can be to run [memtest](memtest.md), as it sometimes [shows ECC errors](https://forum.level1techs.com/t/asrock-taichi-x570-ecc-options-no-longer-in-bios/178045) such as `** Warning** ECC injection may be disabled for AMD Ryzen (70h-7fh)`.
+
+Other people ([1](https://www.memtest86.com/ecc.htm), [2](https://www.reddit.com/r/ASRock/comments/jlsw5z/x570_pro4_correctable_ecc_errors_no_response_from/)) say that there are a lot of motherboards that NEVER report any corrected errors to the OS. In order to see corrected errors, PFEH (Platform First Error Handling) has to be disabled. On some motherboards and firmware versions this setting is hidden from the user and always enabled, thus resulting in zero correctable errors getting reported.
+
+[They also suggest](https://www.memtest86.com/ecc.htm) disabling "Quick Boot". In order to initialize ECC, memory has to be written before it can be used. Usually this is done by the BIOS, but with some motherboards this step is skipped if "Quick Boot" is enabled.
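+As a quick sanity check from a running system, you can look for the EINJ table and for registered EDAC memory controllers through the standard sysfs paths (a sketch, not a full diagnostic):
+
+```bash
+# Does the firmware expose an EINJ table?
+ls /sys/firmware/acpi/tables | grep -i einj
+
+# Did the kernel register an EDAC memory controller?
+ls /sys/devices/system/edac/mc/
+
+# Corrected error count per memory controller (should normally be 0)
+cat /sys/devices/system/edac/mc/mc*/ce_count
+```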
+ +The people behind [memtest](memtest.md) have a [paid tool to test ECC](https://www.passmark.com/products/ecc-tester/index.php) diff --git a/docs/fastapi.md b/docs/fastapi.md index aba723008cc..1d1155f593c 100644 --- a/docs/fastapi.md +++ b/docs/fastapi.md @@ -60,9 +60,17 @@ pip install uvicorn[standard] - Run the server: - ```bash - uvicorn main:app --reload - ``` + - From the command line: + + ```bash + uvicorn main:app --reload + ``` + - Or [from python](https://stackoverflow.com/questions/73908734/how-to-run-uvicorn-fastapi-server-as-a-module-from-another-python-file): + ```python + import uvicorn + if __name__ == "__main__": + uvicorn.run("main:app", host="0.0.0.0", port=8000, reload=True) + ``` - Open your browser at http://127.0.0.1:8000/items/5?q=somequery. You will see the JSON response as: @@ -842,6 +850,23 @@ app = FastAPI() # rest of the application... ``` +For more information on changing the logging read [1](https://nuculabs.dev/p/fastapi-uvicorn-logging-in-production) + +To set the datetime of the requests [use this configuration](https://stackoverflow.com/questions/62894952/fastapi-gunicon-uvicorn-access-log-format-customization) + +```python +@asynccontextmanager +async def lifespan(api: FastAPI): + logger = logging.getLogger("uvicorn.access") + console_formatter = uvicorn.logging.ColourizedFormatter( + "{asctime} {levelprefix} : {message}", style="{", use_colors=True + ) + logger.handlers[0].setFormatter(console_formatter) + yield + +api = FastAPI(lifespan=lifespan) +``` + ## Logging to Sentry FastAPI can diff --git a/docs/ffmpeg.md b/docs/ffmpeg.md index 72712dbd3cc..92f7e74bc7d 100644 --- a/docs/ffmpeg.md +++ b/docs/ffmpeg.md @@ -320,6 +320,17 @@ This will rotate the video 180° counter-clockwise. ffmpeg -i input.mp4 -filter:v 'transpose=2,transpose=2' rotated-video.mp4 ``` + +For the transpose parameter you can pass: + +- 0 = 90° counterclockwise and vertical flip (default) +- 1 = 90° clockwise +- 2 = 90° counterclockwise +- 3 = 90° clockwise and vertical flip + +Note that this will re-encode the audio and video parts. You can usually copy the audio without touching it, by using -c:a copy + +You can't overwrite the file directly, you need to move it to a temp and then to move it. [This python script does it for you](https://github.com/laurentperrinet/photoscripts/blob/master/rotate_video.py) ## Speed up or Slow down the video You can change the speed of your video using the setpts (set presentation time diff --git a/docs/git.md b/docs/git.md index c8145b75116..5c4c2c398e7 100644 --- a/docs/git.md +++ b/docs/git.md @@ -557,6 +557,21 @@ another repository into your project and keep your commits separate. ## Submodule tips +### [Update all git submodules](https://stackoverflow.com/questions/1030169/pull-latest-changes-for-all-git-submodules) + +If it's the first time you check-out a repo you need to use `--init` first: + +```bash +git submodule update --init --recursive +``` + +To update to latest tips of remote branches use: + +```bash +git submodule update --recursive --remote + +``` + ### Submodule Foreach There is a foreach submodule command to run some arbitrary command in each diff --git a/docs/gitea.md b/docs/gitea.md index 322f1902ae6..d40986d378d 100644 --- a/docs/gitea.md +++ b/docs/gitea.md @@ -16,6 +16,143 @@ opinion. Gitea provides automatically updated Docker images within its Docker Hub organisation. 
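+If you deploy it with Docker, a minimal `docker-compose.yaml` sketch could look like the following (the image tag, ports and volume path are assumptions you'll want to adapt):
+
+```yaml
+version: "3"
+
+services:
+  gitea:
+    image: gitea/gitea:latest
+    restart: unless-stopped
+    volumes:
+      - ./gitea-data:/data
+    ports:
+      - "3000:3000" # Web interface
+      - "2222:22" # SSH
+```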
+## [Disable the regular login, use only Oauth](https://discourse.gitea.io/t/solved-removing-default-login-interface/2740/2) + +Inside your [`custom` directory](https://docs.gitea.io/en-us/customizing-gitea/) which may be `/var/lib/gitea/custom`: + +* Create the directories `templates/user/auth`, +* Create the `signin_inner.tmpl` file with the next contents. If it fails check [the latest version of the file](https://raw.githubusercontent.com/go-gitea/gitea/main/templates/user/auth/signin_inner.tmpl) and tweak it accordingly: + ```jinja2 + {{if or (not .LinkAccountMode) (and .LinkAccountMode .LinkAccountModeSignIn)}} + {{template "base/alert" .}} + {{end}} +

+	{{if .LinkAccountMode}}
+		{{ctx.Locale.Tr "auth.oauth_signin_title"}}
+	{{else}}
+		{{ctx.Locale.Tr "auth.login_userpass"}}
+	{{end}}
+	{{if .OAuth2Providers}}
+	{{end}}
+ ``` + +## [Configure it with terraform](https://registry.terraform.io/providers/Lerentis/gitea/latest/docs) + +Gitea can be configured through terraform too. There is an [official provider](https://gitea.com/gitea/terraform-provider-gitea/src/branch/main) that doesn't work, there's a [fork that does though](https://registry.terraform.io/providers/Lerentis/gitea/latest/docs). Sadly it doesn't yet support configuring Oauth Authentication sources. Be careful [`gitea_oauth2_app`](https://registry.terraform.io/providers/Lerentis/gitea/latest/docs/resources/oauth2_app) looks to be the right resource to do that, but instead it configures Gitea to be the Oauth provider, not a consumer. + +To configure the provider you need to specify the url and a Gitea API token, keeping in mind that whoever gets access to this information will have access and full permissions on your Gitea instance it's critical that [you store this information well](terraform.md#sensitive-information). We'll use [`sops` to encrypt the token with GPG.](#sensitive-information-in-the-terraform-source-code). + +First create a Gitea user under `Site Administration/User Accounts/` with the `terraform` name (use your Oauth2 provider if you have one!). + +Then log in with that user and create a token with name `Terraform` under `Settings/Applications`, copy it to your clipboard. + +Configure `sops` by defining the gpg keys in a `.sops.yaml` file at the top of your repository: + +```yaml +--- +creation_rules: + - pgp: >- + 2829BASDFHWEGWG23WDSLKGL323534J35LKWERQS, + 2GEFDBW349YHEDOH2T0GE9RH0NEORIG342RFSLHH +``` + +Then create the secrets file with the command `sops secrets.enc.json` somewhere in your terraform repository. For example: + +```json +{ + "gitea_token": "paste the token here" +} +``` + +```hcl +terraform { + required_providers { + gitea = { + source = "Lerentis/gitea" + version = "~> 0.12.1" + } + sops = { + source = "carlpett/sops" + version = "~> 0.5" + } + } +} + +provider "gitea" { + base_url = "https://gitea.your-domain.org" + token = data.sops_file.secrets.data["gitea_token"] +} +``` + +### [Create an organization](https://registry.terraform.io/providers/Lerentis/gitea/latest/docs/resources/team) + +If you manage your users externally for example with an Oauth2 provider like [Authentik](authentik.md) you don't need to create a resource for the users, use a `data` instead: + +```terraform +resource "gitea_org" "docker_compose" { + name = "docker-compose" +} + +resource "gitea_team" "docker_compose" { + name = "Developers" + organisation = gitea_org.docker_compose.name + permission = "owner" + members = [ + data.gitea_user.lyz.username, + ] +} +``` + +If you have many organizations that share the same users you can use variables. + +```terraform + +resource "gitea_org" "docker_compose" { + name = "docker-compose" +} + +resource "gitea_team" "docker_compose" { + name = "Developers" + organisation = gitea_org.docker_compose.name + permission = "owner" + members = [ + data.gitea_user.lyz.username, + ] +} +``` + +To import organisations and teams you need to use their `ID`. You can see the ID of the organisations in the Administration panel. To get the Teams ID you need to use the API. Go to https://your.gitea.com/api/swagger#/organization/orgListTeams and enter the organisation name. 
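+For example, assuming the resources defined above and hypothetical IDs `3` (organisation) and `5` (team), the imports would look like:
+
+```bash
+terraform import gitea_org.docker_compose 3
+terraform import gitea_team.docker_compose 5
+```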
+ +## Create an admin user through the command line + +```bash +gitea --config /etc/gitea/app.ini admin user create --admin --email email --username user_name --password password +``` + +Or you can change [the admin's password](https://discourse.gitea.io/t/how-to-change-gitea-admin-password-from-the-command-terminal-line/1930): + +```bash +gitea --config /etc/gitea/app.ini admin user change-password -u username -p password +``` + +# Actions ## [Configure gitea actions](https://blog.gitea.io/2023/03/hacking-on-gitea-actions/) We've been using [Drone](drone.md) as CI runner for some years now as Gitea didn't have their native runner. On [Mar 20, 2023](https://blog.gitea.io/2023/03/gitea-1.19.0-is-released/) however Gitea released the version 1.19.0 which promoted to stable the Gitea Actions which is a built-in CI system like GitHub Actions. With Gitea Actions, you can reuse your familiar workflows and Github Actions in your self-hosted Gitea instance. While it is not currently fully compatible with GitHub Actions, they intend to become as compatible as possible in future versions. The typical procedure is as follows: @@ -402,141 +539,11 @@ This is useful to send notifications if any of the jobs failed. ${{ github.repository }}: [${{ github.ref }}@${{ github.sha }}](${{ github.server_url }}/${{ github.repository }}/actions) ``` -## [Disable the regular login, use only Oauth](https://discourse.gitea.io/t/solved-removing-default-login-interface/2740/2) - -Inside your [`custom` directory](https://docs.gitea.io/en-us/customizing-gitea/) which may be `/var/lib/gitea/custom`: - -* Create the directories `templates/user/auth`, -* Create the `signin_inner.tmpl` file with the next contents. If it fails check [the latest version of the file](https://raw.githubusercontent.com/go-gitea/gitea/main/templates/user/auth/signin_inner.tmpl) and tweak it accordingly: - ```jinja2 - {{if or (not .LinkAccountMode) (and .LinkAccountMode .LinkAccountModeSignIn)}} - {{template "base/alert" .}} - {{end}} -

- {{if .LinkAccountMode}} - {{ctx.Locale.Tr "auth.oauth_signin_title"}} - {{else}} - {{ctx.Locale.Tr "auth.login_userpass"}} - {{end}} -

-
-
- {{if .OAuth2Providers}} - - {{end}} -
-
- ``` - -## [Configure it with terraform](https://registry.terraform.io/providers/Lerentis/gitea/latest/docs) - -Gitea can be configured through terraform too. There is an [official provider](https://gitea.com/gitea/terraform-provider-gitea/src/branch/main) that doesn't work, there's a [fork that does though](https://registry.terraform.io/providers/Lerentis/gitea/latest/docs). Sadly it doesn't yet support configuring Oauth Authentication sources. Be careful [`gitea_oauth2_app`](https://registry.terraform.io/providers/Lerentis/gitea/latest/docs/resources/oauth2_app) looks to be the right resource to do that, but instead it configures Gitea to be the Oauth provider, not a consumer. - -To configure the provider you need to specify the url and a Gitea API token, keeping in mind that whoever gets access to this information will have access and full permissions on your Gitea instance it's critical that [you store this information well](terraform.md#sensitive-information). We'll use [`sops` to encrypt the token with GPG.](#sensitive-information-in-the-terraform-source-code). - -First create a Gitea user under `Site Administration/User Accounts/` with the `terraform` name (use your Oauth2 provider if you have one!). - -Then log in with that user and create a token with name `Terraform` under `Settings/Applications`, copy it to your clipboard. - -Configure `sops` by defining the gpg keys in a `.sops.yaml` file at the top of your repository: - -```yaml ---- -creation_rules: - - pgp: >- - 2829BASDFHWEGWG23WDSLKGL323534J35LKWERQS, - 2GEFDBW349YHEDOH2T0GE9RH0NEORIG342RFSLHH -``` - -Then create the secrets file with the command `sops secrets.enc.json` somewhere in your terraform repository. For example: - -```json -{ - "gitea_token": "paste the token here" -} -``` - -```hcl -terraform { - required_providers { - gitea = { - source = "Lerentis/gitea" - version = "~> 0.12.1" - } - sops = { - source = "carlpett/sops" - version = "~> 0.5" - } - } -} - -provider "gitea" { - base_url = "https://gitea.your-domain.org" - token = data.sops_file.secrets.data["gitea_token"] -} -``` - -### [Create an organization](https://registry.terraform.io/providers/Lerentis/gitea/latest/docs/resources/team) - -If you manage your users externally for example with an Oauth2 provider like [Authentik](authentik.md) you don't need to create a resource for the users, use a `data` instead: +## Create your own actions -```terraform -resource "gitea_org" "docker_compose" { - name = "docker-compose" -} +Note: Using private actions is not yet supported. Look at [1](https://github.com/go-gitea/gitea/issues/27935), [2](https://github.com/go-gitea/gitea/issues/24635), [3](https://github.com/go-gitea/gitea/issues/26032). Even though [there is a workaround](https://github.com/go-gitea/gitea/issues/25929) that didn't work for me. -resource "gitea_team" "docker_compose" { - name = "Developers" - organisation = gitea_org.docker_compose.name - permission = "owner" - members = [ - data.gitea_user.lyz.username, - ] -} -``` - -If you have many organizations that share the same users you can use variables. - -```terraform - -resource "gitea_org" "docker_compose" { - name = "docker-compose" -} - -resource "gitea_team" "docker_compose" { - name = "Developers" - organisation = gitea_org.docker_compose.name - permission = "owner" - members = [ - data.gitea_user.lyz.username, - ] -} -``` - -To import organisations and teams you need to use their `ID`. You can see the ID of the organisations in the Administration panel. 
To get the Teams ID you need to use the API. Go to https://your.gitea.com/api/swagger#/organization/orgListTeams and enter the organisation name. - -## Create an admin user through the command line - -```bash -gitea --config /etc/gitea/app.ini admin user create --admin --email email --username user_name --password password -``` - -Or you can change [the admin's password](https://discourse.gitea.io/t/how-to-change-gitea-admin-password-from-the-command-terminal-line/1930): - -```bash -gitea --config /etc/gitea/app.ini admin user change-password -u username -p password -``` +Follow [this simple tutorial](https://docs.github.com/en/actions/creating-actions/creating-a-docker-container-action) # [Gitea client command line tool](https://gitea.com/gitea/tea) diff --git a/docs/goodconf.md b/docs/goodconf.md index 4736fe4e9b0..8f839619e48 100644 --- a/docs/goodconf.md +++ b/docs/goodconf.md @@ -60,7 +60,58 @@ For more details see Pydantic's docs for examples of loading: - [Dotenv (.env) files](https://pydantic-docs.helpmanual.io/usage/settings/#dotenv-env-support). - [Docker secrets](https://pydantic-docs.helpmanual.io/usage/settings/#secret-support). +## Initialize the config with a default value if the file doesn't exist + +```python + def load(self, filename: Optional[str] = None) -> None: + self._config_file = filename + if not self.store_dir.is_dir(): + log.warning("The store directory doesn't exist. Creating it") + os.makedirs(str(self.store_dir)) + if not Path(self.config_file).is_file(): + log.warning("The yaml store file doesn't exist. Creating it") + self.save() + super().load(filename) + +``` +## Config saving + +So far [`goodconf` doesn't support saving the config](https://github.com/lincolnloop/goodconf/issues/12). Until it's ready you can use the next snippet: + +```python +class YamlStorage(GoodConf): + """Adapter to store and load information from a yaml file.""" + + @property + def config_file(self) -> str: + """Return the path to the config file.""" + return str(self._config_file) + + @property + def store_dir(self) -> Path: + """Return the path to the store directory.""" + return Path(self.config_file).parent + + def reload(self) -> None: + """Reload the contents of the authentication store.""" + self.load(self.config_file) + + def load(self, filename: Optional[str] = None) -> None: + """Load a configuration file.""" + if not filename: + filename = f"{self.store_dir}/data.yaml" + super().load(self.config_file) + + def save(self) -> None: + """Save the contents of the authentication store.""" + with open(self.config_file, "w+", encoding="utf-8") as file_cursor: + yaml = YAML() + yaml.default_flow_style = False + yaml.dump(self.dict(), file_cursor) +``` + # References -- [Git](https://github.com/lincolnloop/goodconf/) +- [Source](https://github.com/lincolnloop/goodconf/) + [![](not-by-ai.svg){: .center}](https://notbyai.fyi) diff --git a/docs/immich.md b/docs/immich.md index 9ea6dbdfa1c..d39c3820919 100644 --- a/docs/immich.md +++ b/docs/immich.md @@ -151,13 +151,11 @@ Example: https://localhost:22283/api cd /home/user/albums im --api-key YOUR_API_KEY --api-host YOUR_API_HOST --original-path /home/user/albums --replace-path /mnt/albums . - ## Edit an image metadata You can't do it directly through the interface yet, use [exiftool](linux_snippets.md#Remove-image-metadata) instead. 
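+For example, to strip the geolocation tags of a picture (a sketch; `image.jpg` is a placeholder):
+
+```bash
+exiftool -gps:all= -overwrite_original image.jpg
+```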
This is interesting to remove the geolocation of the images that are not yours - ## [Fix a file date on a external library](https://immich.app/docs/features/libraries/#external-libraries) You can change the dates directly by `touching` the file. @@ -226,10 +224,11 @@ __import__("pdb").set_trace() ``` ## Rotate pictures or videos -Until [Image rotation is supported](https://github.com/immich-app/immich/discussions/1695) you have to manually rotate the media. The best way I've found to do this is: +Until [Image rotation is supported](https://github.com/immich-app/immich/discussions/1695) you have to manually rotate the media. For videos the best way I've found is to: -- Select the elements in the web interface and download them -- +- Annotate the file paths on two lists clockwise and counterclockwise. +- Run [the `rotate_video.py`](https://raw.githubusercontent.com/laurentperrinet/photoscripts/master/rotate_video.py) script on those paths (`-c` for counterclockwise and without argument for clockwise) +- Run the script to correct the dates of the archivesj # Not there yet There are some features that are still lacking: diff --git a/docs/linux/google_chrome.md b/docs/linux/google_chrome.md index dc2026ec4f8..e4c289c5d93 100644 --- a/docs/linux/google_chrome.md +++ b/docs/linux/google_chrome.md @@ -28,4 +28,12 @@ to use that service. apt-get update apt-get install google-chrome-stable ``` +# Usage +## [Open a specific profile](https://superuser.com/questions/377186/how-do-i-start-chrome-using-a-specified-user-profile) +```bash +google-chrome --profile-directory="Profile Name" +``` + +Where `Profile Name` is one of the profiles listed under `ls ~/.config/chromium | grep -i profile`. + [![](not-by-ai.svg){: .center}](https://notbyai.fyi) diff --git a/docs/linux/zfs.md b/docs/linux/zfs.md index 8193fb8b5c1..0abf19fe822 100644 --- a/docs/linux/zfs.md +++ b/docs/linux/zfs.md @@ -653,6 +653,21 @@ To do it: zpool create cold-backup-01 /dev/sde2 ``` +# Monitorization + +If you use [loki](loki.md) remember to monitor the `/proc/spl/kstat/zfs/dbgmsg` file: + +```yaml +- job_name: zfs + static_configs: + - targets: + - localhost + labels: + job: zfs + __path__: /proc/spl/kstat/zfs/dbgmsg +``` + + # [Troubleshooting](https://openzfs.github.io/openzfs-docs/Basic%20Concepts/Troubleshooting.html) To debug ZFS errors you can check: @@ -667,7 +682,24 @@ Likely cause: kernel thread hung or panic If a kernel thread is stuck, then a backtrace of the stuck thread can be in the logs. In some cases, the stuck thread is not logged until the deadman timer expires. -The only way I've yet found to solve this is rebooting the machine (not ideal). I even have to use the magic keys -.- . +The only way I've yet found to solve this is rebooting the machine (not ideal). I even have to use the magic keys -.- . A solution may be to [Reboot server on kernel panic ](linux_snippets.md#reboot-server-on-kernel-panic). 
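+To look for the backtrace of the stuck thread you can search the kernel logs for the standard hung-task message (a sketch):
+
+```bash
+dmesg | grep -B 2 -A 20 'blocked for more than'
+```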
+ +You can monitor this issue with loki using the next alerts: + +```yaml +groups: + - name: zfs + rules: + - alert: SlowSpaSyncZFSError + expr: | + count_over_time({job="zfs"} |~ `spa_deadman.*slow spa_sync` [5m]) + for: 1m + labels: + severity: critical + annotations: + summary: "Slow sync traces found in the ZFS debug logs at {{ $labels.hostname}}" + message: "This usually happens before the ZFS becomes unresponsible" +``` ## [kernel NULL pointer dereference in zap_lockdir](https://github.com/openzfs/zfs/issues/11804) diff --git a/docs/linux_resilience.md b/docs/linux_resilience.md new file mode 100644 index 00000000000..64bf009b7b6 --- /dev/null +++ b/docs/linux_resilience.md @@ -0,0 +1,28 @@ +Increasing the resilience of the servers is critical when hosting services for others. This is the roadmap I'm following for my servers. + +# Autostart services if the system reboots +Using init system services to manage your services +# Get basic metrics traceability and alerts +Set up [Prometheus](prometheus.md) with: + +- The [blackbox exporter](blackbox_exporter.md) to track if the services are available to your users and to monitor SSL certificates health. +- The [node exporter](node_exporter.md) to keep track on the resource usage of your machines and set alerts to get notified when concerning events happen (disks are getting filled, CPU usage is too high) + +# Get basic logs traceability and alerts + +Set up [Loki](loki.md) and clear up your system log errors. + +# Improve the resilience of your data +If you're still using `ext4` for your filesystems instead of [`zfs`](zfs.md) you're missing a big improvement. To set it up: + +- [Plan your zfs storage architecture](zfs_storage_planning.md) +- [Install ZFS](zfs.md) +- [Create ZFS local and remote backups](sanoid.md) +- [Monitor your ZFS ] + +# Automatically react on system failures +- [Kernel panics](https://www.supertechcrew.com/kernel-panics-and-lockups/) +- [watchdog](watchdog.md) + +# Future undeveloped improvements +- Handle the system reboots after kernel upgrades diff --git a/docs/linux_snippets.md b/docs/linux_snippets.md index 398f8426d58..025c2d75b3e 100644 --- a/docs/linux_snippets.md +++ b/docs/linux_snippets.md @@ -4,6 +4,80 @@ date: 20200826 author: Lyz --- +# [Send multiline messages with notify-send](https://stackoverflow.com/questions/35628702/display-multi-line-notification-using-notify-send-in-python) +The title can't have new lines, but the body can. + +```bash +notify-send "Title" "This is the first line.\nAnd this is the second.") +``` +# [Find BIOS version](https://www.cyberciti.biz/faq/check-bios-version-linux/) + +```bash +dmidecode | less +``` +# [Reboot server on kernel panic ](https://unix.stackexchange.com/questions/29567/how-to-early-configure-linux-kernel-to-reboot-on-panic ) +The `proc/sys/kernel/panic` file gives read/write access to the kernel variable `panic_timeout`. If this is zero, the kernel will loop on a panic; if nonzero it indicates that the kernel should autoreboot after this number of seconds. When you use the software watchdog device driver, the recommended setting is `60`. 
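+You can check the current value with:
+
+```bash
+cat /proc/sys/kernel/panic
+```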
+ +To set the value add the next contents to the `/etc/sysctl.d/99-panic.conf` + +``` +kernel.panic = 60 +``` + +Or with an ansible task: + +```yaml +- name: Configure reboot on kernel panic + become: true + lineinfile: + path: /etc/sysctl.d/99-panic.conf + line: kernel.panic = 60 + create: true + state: present +``` + +# [Share a calculated value between github actions steps](https://docs.github.com/en/actions/using-workflows/workflow-commands-for-github-actions#setting-an-output-parameter) + +You need to set a step's output parameter. Note that the step will need an `id` to be defined to later retrieve the output value. + +```bash +echo "{name}={value}" >> "$GITHUB_OUTPUT" +``` + +For example: + +```yaml +- name: Set color + id: color-selector + run: echo "SELECTED_COLOR=green" >> "$GITHUB_OUTPUT" +- name: Get color + env: + SELECTED_COLOR: ${{ steps.color-selector.outputs.SELECTED_COLOR }} + run: echo "The selected color is $SELECTED_COLOR" +``` + +# [Split a zip into sizes with restricted size ](https://unix.stackexchange.com/questions/198982/zip-files-with-size-limit) +Something like: + +```bash +zip -9 myfile.zip * +zipsplit -n 250000000 myfile.zip +``` + +Would produce `myfile1.zip`, `myfile2.zip`, etc., all independent of each other, and none larger than 250MB (in powers of ten). `zipsplit` will even try to organize the contents so that each resulting archive is as close as possible to the maximum size. +# [find files that were modified between dates](https://unix.stackexchange.com/questions/29245/how-to-list-files-that-were-changed-in-a-certain-range-of-time) +The best option is the `-newerXY`. The m and t flags can be used. + +- `m` The modification time of the file reference +- `t` reference is interpreted directly as a time + +So the solution is + +```bash +find . -type f -newermt 20111222 \! -newermt 20111225 +``` + +The lower bound in inclusive, and upper bound is exclusive, so I added 1 day to it. And it is recursive. # [Rotate image with the command line ](https://askubuntu.com/questions/591733/rotate-images-from-terminal) If you want to overwrite in-place, `mogrify` from the ImageMagick suite seems to be the easiest way to achieve this: diff --git a/docs/loki.md b/docs/loki.md index 81125d6a9bb..726abc5c7c8 100644 --- a/docs/loki.md +++ b/docs/loki.md @@ -217,19 +217,24 @@ This example configuration sources rules from a local disk. ```yaml ruler: + alertmanager_url: http://alertmanager:9093 storage: type: local local: - directory: /tmp/rules - rule_path: /tmp/scratch - alertmanager_url: http://localhost + directory: /etc/loki/rules + rule_path: /tmp/rules ring: kvstore: store: inmemory enable_api: true + enable_alertmanager_v2: true ``` -There are two kinds of rules: alerting rules and recording rules. +If you only have one Loki instance you need to save the rule yaml files in the `/etc/loki/rules/fake/` otherwise Loki will silently ignore them (it took me a lot of time to figure this out `-.-`). + +Surprisingly I haven't found any compilation of Loki alerts. I'll gather here the ones I create. + +There are two kinds of rules: alerting rules and recording rules. 
### Alerting rules @@ -262,6 +267,11 @@ groups: labels: severity: critical ``` +More examples of alert rules can be found in the next articles: + +- [ECC error alerts](rasdaemon.md#monitorization) +- [ZFS errors](zfs.md#zfs-pool-is-stuck) +- [Sanoid errors](sanoid.md#monitorization) ### Recording rules Recording rules allow you to precompute frequently needed or computationally expensive expressions and save their result as a new set of time series. @@ -296,16 +306,11 @@ ruler: client: url: http://localhost:9090/api/v1/write ``` -# Usage - ## [Build dashboards](https://grafana.com/blog/2020/04/08/loki-quick-tip-how-to-create-a-grafana-dashboard-for-searching-logs-using-loki-and-prometheus/) -## [Creating alerts](https://grafana.com/docs/loki/latest/alert/) - -Surprisingly I haven't found any compilation of Loki alerts. I'll gather here the ones I create. - -### Generic docker errors -To catch the errors shown in docker (assuming you're using my same [promtail configuration](promtail.md#scrape-docker-logs)) you can use the next rule (that needs to go into your Loki configuration). +# Troubleshooting +[This stackoverflow answer](https://stackoverflow.com/questions/74329564/how-configure-recording-and-alerting-rules-with-loki) has some insights on how to debug broken loki rules # References - [Docs](https://grafana.com/docs/loki/latest/) + [![](not-by-ai.svg){: .center}](https://notbyai.fyi) diff --git a/docs/luddites.md b/docs/luddites.md new file mode 100644 index 00000000000..7eb53cd20eb --- /dev/null +++ b/docs/luddites.md @@ -0,0 +1,3 @@ +# References + +- [Comic about luddites](https://www.technologyreview.com/2024/02/28/1088262/luddites-resisting-automated-future-technology/) diff --git a/docs/magic_keys.md b/docs/magic_keys.md new file mode 100644 index 00000000000..badc6aa9b65 --- /dev/null +++ b/docs/magic_keys.md @@ -0,0 +1,67 @@ +--- +Title: Docs for Magic Keys +Author: Lyz +Date: 20170417 +Keywords: magic keys + rescue + freeze + kernel panic +Tags: publish +--- + +The magic SysRq key is a key combination understood by the Linux kernel, which +allows the user to perform various low-level commands regardless of the system's +state. It is often used to recover from freezes, or to reboot a computer without +corrupting the filesystem.[1] Its effect is similar to the computer's hardware +reset button (or power switch) but with many more options and much more control. + +This key combination provides access to powerful features for software +development and disaster recovery. In this sense, it can be considered a form of +escape sequence. Principal among the offered commands are means to forcibly +unmount file systems, kill processes, recover keyboard state, and write +unwritten data to disk. With respect to these tasks, this feature serves as +a tool of last resort. + +The magic SysRq key cannot work under certain conditions, such as a kernel +panic[2] or a hardware failure preventing the kernel from running properly. + +The key combination consists of Alt+Sys Req and another key, which controls the +command issued. + +On some devices, notably laptops, the Fn key may need to be pressed to use the +magic SysRq key. + +# Reboot the machine + +A common use of the magic SysRq key is to perform a safe reboot of a Linux +computer which has otherwise locked up (abbr. REISUB). This can prevent a fsck +being required on reboot and gives some programs a chance to save emergency +backups of unsaved work. 
The QWERTY (or AZERTY) mnemonics: "Raising Elephants Is +So Utterly Boring", "Reboot Even If System Utterly Broken" or simply the word +"BUSIER" read backwards, are often used to remember the following SysRq-keys +sequence: + +* unRaw (take control of keyboard back from X), +* tErminate (send SIGTERM to all processes, allowing them to terminate gracefully), +* kIll (send SIGKILL to all processes, forcing them to terminate immediately), +* Sync (flush data to disk), +* Unmount (remount all filesystems read-only), +* reBoot. + +When magic SysRq keys are used to kill a frozen graphical program, the program +has no chance to restore text mode. This can make everything unreadable. The +commands textmode (part of SVGAlib) and the reset command can restore text mode +and make the console readable again. + +On distributions that do not include a textmode executable, the key command +Ctrl+Alt+F1 may sometimes be able to force a return to a text console. (Use F1, +F2, F3,..., F(n), where n is the highest number of text consoles set up by the +distribution. Ctrl+Alt+F(n+1) would normally be used to reenter GUI mode on +a system on which the X server has not crashed.) + +# [Interact with the sysrq through the commandline](https://unix.stackexchange.com/questions/714910/what-is-a-good-way-to-test-watchdog-script-or-command-to-deliberately-overload) +It can also be used by echoing letters to `/proc/sysrq-trigger`, for example to trigger a system crash and take a crashdump you can: + +```bash +echo c > /proc/sysrq-trigger +``` diff --git a/docs/memtest.md b/docs/memtest.md new file mode 100644 index 00000000000..a7f73fddd02 --- /dev/null +++ b/docs/memtest.md @@ -0,0 +1,16 @@ +[memtest86](https://www.memtest86.com/) is a testing software for RAM. + +# Installation +```bash +apt-get install memtest86+ +``` + +After the installation you'll get Memtest entries in grub which you can spawn. + +For some unknown reason the memtest of the boot menu didn't work for me. So I [downloaded the latest free version of memtest](https://www.memtest86.com/download.htm) (It's at the bottom of the screen), burnt it in a usb and booted from there. + +# Usage +It will run by itself. For 64GB of ECC RAM it took aproximately 100 minutes to run all the tests. + +## [Check ECC errors](https://www.memtest86.com/ecc.htm) +MemTest86 directly polls ECC errors logged in the chipset/memory controller registers and displays it to the user on-screen. In addition, ECC errors are written to the log and report file. diff --git a/docs/nas.md b/docs/nas.md index 5ff60320599..ebf86198c57 100644 --- a/docs/nas.md +++ b/docs/nas.md @@ -99,6 +99,10 @@ Registered ECC ram, but only Unregistered ECC ram. ## Motherboard +When choosing a motherboard make sure that: + +- If you want [ECC](ecc.md) that it [truly supports ECC](ecc.md#confusion-on-boards-supporting-ecc). + After reading these reviews([1](https://reviewsgarage.com/best-motherboards-for-nas/), [2](https://pcper.com/2020/03/asrock-x570m-pro4-micro-atx-motherboard-review/)) @@ -122,6 +126,8 @@ And it gives me room enough to grow: - I'm only going to use 2 slots of RAM giving me 32GB, but I could grow 32 more easily. 
+The only downside so far is that [it doesn't look to be IPMI compliant, so it doesn't have hardware watchdog support](watchdog.md#watchdog-hardware is-disabled-error-on-boot) + ## CPU After doing some [basic research](cpu.md) I've chosen the diff --git a/docs/pass.md b/docs/pass.md new file mode 100644 index 00000000000..e5a6a54d46b --- /dev/null +++ b/docs/pass.md @@ -0,0 +1,8 @@ +[pass](http://www.passwordstore.org/) is a command line password store + +# Installation + +## Configure rofi launcher + +- Save [this script](https://raw.githubusercontent.com/carnager/rofi-pass/master/rofi-pass) somewhere in your `$PATH` +- Configure your window manager to launch it whenever you need a password. diff --git a/docs/pdm.md b/docs/pdm.md index 00aef5d7a6b..0de76ad7af0 100644 --- a/docs/pdm.md +++ b/docs/pdm.md @@ -351,7 +351,7 @@ array. Sometimes `pdm` is not able to [locate the best package combination](https://github.com/pdm-project/pdm/issues/1354), -or it does too many loops, so to help it you can update your version constrains +or it does too many loops, so to help it you can updatehtml. your version constrains so that it has the minimum number of candidates. To solve circular dependencies we first need to locate what are the conflicting diff --git a/docs/process_exporter.md b/docs/process_exporter.md new file mode 100644 index 00000000000..fd9e9b2fc93 --- /dev/null +++ b/docs/process_exporter.md @@ -0,0 +1,5 @@ +[`process_exporter`](https://github.com/ncabatoff/process-exporter?tab=readme-ov-file) is a rometheus exporter that mines /proc to report on selected processes. + +# References +- [Source](https://github.com/ncabatoff/process-exporter?tab=readme-ov-file ) +- [Grafana dashboard](https://grafana.com/grafana/dashboards/249-named-processes/) diff --git a/docs/promtail.md b/docs/promtail.md index 65c4050aeb7..5047632fd67 100644 --- a/docs/promtail.md +++ b/docs/promtail.md @@ -10,6 +10,18 @@ It primarily: Attaches labels to log streams Pushes them to the Loki instance. +# Installation +Use [patrickjahns ansible role](https://github.com/patrickjahns/ansible-role-promtail). Some interesting variables are: + +```yaml +loki_url: localhost +promtail_system_user: root + +promtail_config_clients: + - url: "http://{{ loki_url }}:3100/loki/api/v1/push" + external_labels: + hostname: "{{ ansible_hostname }}" +``` # [Configuration](https://grafana.com/docs/loki/latest/send-data/promtail/configuration/) Promtail is configured in a YAML file (usually referred to as config.yaml) which contains information on the Promtail server, where positions are stored, and how to scrape logs from files. @@ -43,13 +55,13 @@ If you're going to use `journald` for your logs you can skip this section. ```yaml scrape_configs: -- job_name: system - static_configs: - - targets: - - localhost - labels: - job: varlogs - __path__: /var/log/*log + - job_name: system + static_configs: + - targets: + - localhost + labels: + job: varlogs + __path__: /var/log/*log ``` ### [Scrape journald logs](https://grafana.com/docs/loki/latest/send-data/promtail/scraping/#journal-scraping-linux-only) @@ -204,7 +216,11 @@ Promtail features an embedded web server exposing a web console at `/` and the f - GET `/ready`: This endpoint returns 200 when Promtail is up and running, and there’s at least one working target. - GET `/metrics`: This endpoint returns Promtail metrics for Prometheus. 
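+Assuming the default HTTP listen port (`9080`), a quick check of both endpoints would be:
+
+```bash
+curl -s http://localhost:9080/ready
+curl -s http://localhost:9080/metrics | head
+```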
+# [Troubleshooting](https://grafana.com/docs/loki/latest/send-data/promtail/troubleshooting/) + +Find where is the `positions.yaml` file and see if it evolves. +Sometimes if you are not seeing the logs in loki it's because the query you're running is not correct. # References - [Docs](https://grafana.com/docs/loki/latest/send-data/promtail/) diff --git a/docs/python_protocols.md b/docs/python_protocols.md new file mode 100644 index 00000000000..22530403fac --- /dev/null +++ b/docs/python_protocols.md @@ -0,0 +1,53 @@ +The Python type system supports two ways of deciding whether two objects are compatible as types: nominal subtyping and structural subtyping. + +Nominal subtyping is strictly based on the class hierarchy. If class Dog inherits class `Animal`, it’s a subtype of `Animal`. Instances of `Dog` can be used when `Animal` instances are expected. This form of subtyping subtyping is what Python’s type system predominantly uses: it’s easy to understand and produces clear and concise error messages, and matches how the native `isinstance` check works – based on class hierarchy. + +Structural subtyping is based on the operations that can be performed with an object. Class `Dog` is a structural subtype of class `Animal` if the former has all attributes and methods of the latter, and with compatible types. + +Structural subtyping can be seen as a static equivalent of duck typing, which is well known to Python programmers. See [PEP 544](https://peps.python.org/pep-0544/) for the detailed specification of protocols and structural subtyping in Python. +# Usage + +You can define your own protocol class by inheriting the special Protocol class: + +```python +from typing import Iterable +from typing_extensions import Protocol + +class SupportsClose(Protocol): + # Empty method body (explicit '...') + def close(self) -> None: ... + +class Resource: # No SupportsClose base class! + + def close(self) -> None: + self.resource.release() + + # ... other methods ... + +def close_all(items: Iterable[SupportsClose]) -> None: + for item in items: + item.close() + +close_all([Resource(), open('some/file')]) # OK +``` + +`Resource` is a subtype of the `SupportsClose` protocol since it defines a compatible close method. Regular file objects returned by `open()` are similarly compatible with the protocol, as they support `close()`. + +If you want to define a docstring on the method use the next syntax: + +```python + def load(self, filename: Optional[str] = None) -> None: + """Load a configuration file.""" + ... +``` + +## [Make protocols work with `isinstance`](https://mypy.readthedocs.io/en/stable/protocols.html#using-isinstance-with-protocols) +To check an instance against the protocol using `isinstance`, we need to decorate our protocol with `@runtime_checkable` + +## [Make a protocol property variable](https://mypy.readthedocs.io/en/stable/protocols.html#invariance-of-protocol-attributes) + +## [Make protocol of functions](https://mypy.readthedocs.io/en/stable/protocols.html#callback-protocols) + +# References +- [Mypy article on protocols](https://mypy.readthedocs.io/en/stable/protocols.html) +- [Predefined protocols reference](https://mypy.readthedocs.io/en/stable/protocols.html#predefined-protocol-reference) diff --git a/docs/questionary.md b/docs/questionary.md index 9c97ad53e6c..04ff0e62696 100644 --- a/docs/questionary.md +++ b/docs/questionary.md @@ -15,7 +15,6 @@ interfaces. It makes it very easy to query your user for input. 
```bash pip install questionary ``` - ## [Usage](https://questionary.readthedocs.io/en/stable/pages/quickstart.html) ### [Asking a single question](https://questionary.readthedocs.io/en/stable/pages/quickstart.html#asking-a-single-question) @@ -98,8 +97,6 @@ as usual and the default value will be ignored. If you want the question to exit when it receives a `KeyboardInterrupt` event, use `unsafe_ask` instead of `ask`. - - ## [Question types](https://questionary.readthedocs.io/en/stable/pages/types.html) The different question types are meant to cover different use cases. The diff --git a/docs/ram.md b/docs/ram.md index 8ff4bd18be4..0b357552e4a 100644 --- a/docs/ram.md +++ b/docs/ram.md @@ -60,7 +60,7 @@ performance. RAM latency (lower the better) = (CAS Latency (CL) x 2000 ) / Frequency (MHz) ``` -### [ECC](https://en.wikipedia.org/wiki/ECC_memory) +### [ECC](ecc.md) Error correction code memory (ECC memory) is a type of computer data storage that uses an error correction code to detect and correct n-bit data corruption diff --git a/docs/rasdaemon.md b/docs/rasdaemon.md new file mode 100644 index 00000000000..a929be9ba69 --- /dev/null +++ b/docs/rasdaemon.md @@ -0,0 +1,103 @@ +[`rasdaemon`](https://github.com/mchehab/rasdaemon) is a RAS (Reliability, Availability and Serviceability) logging tool. It records memory errors, using the EDAC tracing events. EDAC is a Linux kernel subsystem with handles detection of ECC errors from memory controllers for most chipsets on i386 and x86_64 architectures. EDAC drivers for other architectures like arm also exists. + +# Installation +```bash +apt-get install rasdaemon +``` + +The output will be available via syslog but you can show it to the foreground (`-f`) or to an sqlite3 database (`-r`) + +To post-process and decode received MCA errors on AMD SMCA systems, run: + +```bash +rasdaemon -p --status --ipid --smca --family --model --bank +``` + +Status and IPID Register values (in hex) are mandatory. The smca flag with family and model are required if not decoding locally. Bank parameter is optional. + +You may also start it via systemd: + +```bash +systemctl start rasdaemon +``` + +The rasdaemon will then output the messages to journald. + +# [Usage](https://www.setphaserstostun.org/posts/monitoring-ecc-memory-on-linux-with-rasdaemon/) +At this point `rasdaemon` should already be running on your system. You can now use the `ras-mc-ctl` tool to query the errors that have been detected. If everything is well configured you'll see something like: + +```bash +$: ras-mc-ctl --error-count +Label CE UE +mc#0csrow#2channel#0 0 0 +mc#0csrow#2channel#1 0 0 +mc#0csrow#3channel#1 0 0 +mc#0csrow#3channel#0 0 0 +``` + +If it's not you'll see: + +```bash +ras-mc-ctl: Error: No DIMMs found in /sys or new sysfs EDAC interface not found. +``` + +The `CE` column represents the number of corrected errors for a given DIMM, `UE` represents uncorrectable errors that were detected. The label on the left shows the EDAC path under `/sys/devices/system/edac/mc/` of every DIMM. This is not very readable, if you wish to improve the labeling [read this article](https://www.setphaserstostun.org/posts/monitoring-ecc-memory-on-linux-with-rasdaemon/) + +More ways to check is to run: + +```bash +$: ras-mc-ctl --status +ras-mc-ctl: drivers are loaded. +``` + +You can also see a summary of the state with: + +```bash +$: ras-mc-ctl --summary +No Memory errors. + +No PCIe AER errors. + +No Extlog errors. 
DBD::SQLite::db prepare failed: no such table: devlink_event at /usr/sbin/ras-mc-ctl line 1183.
Can't call method "execute" on an undefined value at /usr/sbin/ras-mc-ctl line 1184.
```

The trailing `DBD::SQLite` lines seem to be `ras-mc-ctl` querying a table that rasdaemon hasn't created yet because no event of that type has been recorded; they don't point to actual memory errors.

# Monitorization

You can use [loki](loki.md) to monitor the ECC errors shown in the logs with the following alerts:

```yaml
groups:
  - name: ecc
    rules:
      - alert: ECCError
        expr: |
          count_over_time({job="systemd-journal", unit="rasdaemon.service", level="error"} [5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Possible ECC error detected in {{ $labels.hostname}}"

      - alert: ECCWarning
        expr: |
          count_over_time({job="systemd-journal", unit="rasdaemon.service", level="warning"} [5m]) > 0
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "Possible ECC warning detected in {{ $labels.hostname}}"

      - alert: ECCAlert
        expr: |
          count_over_time({job="systemd-journal", unit="rasdaemon.service", level!~"info|error|warning"} [5m]) > 0
        for: 1m
        labels:
          severity: info
        annotations:
          summary: "ECC log trace with unknown severity level detected in {{ $labels.hostname}}"
```

# References

- [Source](https://github.com/mchehab/rasdaemon)

diff --git a/docs/rofi.md b/docs/rofi.md
new file mode 100644
index 00000000000..d66f45d9a68
--- /dev/null
+++ b/docs/rofi.md
@@ -0,0 +1,77 @@
[Rofi](https://github.com/davatorium/rofi?tab=readme-ov-file) is a window switcher, application launcher and dmenu replacement.

# [Installation](https://github.com/davatorium/rofi/blob/next/INSTALL.md)

```bash
sudo apt-get install rofi
```

# [Usage](https://github.com/davatorium/rofi?tab=readme-ov-file#usage)

To launch rofi directly in a certain mode, specify the mode with `rofi -show <mode>`. To show the run dialog:

```bash
rofi -show run
```

Or get the options from a script:

```bash
~/my_script.sh | rofi -dmenu
```

`-modes` takes an ordered, comma-separated list of modes to enable. Enabled modes can be changed at runtime (the default key is Ctrl+Tab). If no modes are specified, all configured modes will be enabled. To only show the run and ssh launchers:

```bash
rofi -modes "run,ssh" -show run
```

`-combi-modes` sets the modes to combine in combi mode; its syntax is the same as `-modes`. To get one merged view of window, run and ssh:

```bash
rofi -show combi -combi-modes "window,run,ssh" -modes combi
```

# [Configuration](https://github.com/davatorium/rofi/blob/next/CONFIG.md)

The configuration lives at `~/.config/rofi/config.rasi`. To create this file with the default configuration, run:

```bash
rofi -dump-config > ~/.config/rofi/config.rasi
```

## Use fzf to do the matching

To run once:

```bash
rofi -show run -sorting-method fzf -matching fuzzy
```

To persist it, change those same values in the configuration.

## Theme changing

To change the theme:

- Choose the one you like most by browsing [the theme gallery](https://davatorium.github.io/rofi/themes/themes/)
- Run `rofi-theme-selector` to select it
- Accept it with `Alt + a`

## [Keybindings change](https://davatorium.github.io/rofi/current/rofi-keys.5/)

# [Plugins](https://github.com/davatorium/rofi/wiki/User-scripts)

You can write your own plugins. If you're on Python, [`python-rofi`](https://github.com/bcbnz/python-rofi) seems to be the best option, although it looks unmaintained.
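For instance, a minimal menu script with `python-rofi` might look like this. This is a sketch based on the library's README (the `Rofi().select()` call and its return convention are taken from there; the option list is made up):

```python
from rofi import Rofi

r = Rofi()
options = ["Reboot", "Shutdown", "Lock"]
# select() opens a rofi menu and returns the chosen index and the key used
index, key = r.select("Choose an action", options)
if key == 0:  # 0 means the selection was accepted with Enter, -1 is cancel
    print(f"You chose {options[index]}")
```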
Some interesting examples are:

- [Python based plugin](https://framagit.org/Daguhh/naivecalendar/-/tree/master?ref_type=heads)
- [Creation of nice menus](https://gitlab.com/vahnrr/rofi-menus/-/tree/master?ref_type=heads)
- [Nice collection of possibilities](https://github.com/adi1090x/rofi/tree/master)
- [Date picker](https://github.com/DMBuce/i3b/blob/master/bin/pickdate)
- [Orgmode capture](https://github.com/wakatara/rofi-org-todo/blob/master/rofi-org-todo.py)

Other interesting references are:

- [List of key bindings](https://davatorium.github.io/rofi/current/rofi-keys.5/)
- [Theme guide](https://davatorium.github.io/rofi/current/rofi-theme.5/#examples)

# References
- [Source](https://github.com/davatorium/rofi?tab=readme-ov-file)
- [Docs](https://davatorium.github.io/rofi/)
- [Plugins](https://github.com/davatorium/rofi/wiki/User-scripts)

diff --git a/docs/sanoid.md b/docs/sanoid.md
index 7b7c542127c..9a07a0d7530 100644
--- a/docs/sanoid.md
+++ b/docs/sanoid.md
@@ -231,7 +231,23 @@ In this case this should work:

```bash
/sbin/syncoid --recursive --force-delete --sendoptions="Rw" zpool/backups zfs-recv@10.29.3.27:zpool/backups
```

# Monitorization

You can monitor these errors with [loki](loki.md) using the following alerts:

```yaml
groups:
  - name: zfs
    rules:
      - alert: ErrorInSanoidLogs
        expr: |
          count_over_time({job="systemd-journal", syslog_identifier="sanoid"} |= `ERROR` [5m]) > 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Errors found on sanoid log at {{ $labels.hostname}}"
```

# Troubleshooting

## [Syncoid no tty present and no askpass program specified](https://sidhion.com/blog/posts/zfs-syncoid-slow/)

diff --git a/docs/signal.md b/docs/signal.md
index ba21550a6a1..8286c8bfdea 100644
--- a/docs/signal.md
+++ b/docs/signal.md
@@ -40,6 +40,36 @@ Cons:

# Installation

## [Use the Molly FOSS android client](https://molly.im/)

Molly is an independent Signal fork for Android. The advantages are:

- Contains no proprietary blobs, unlike Signal.
- Protects the database with passphrase encryption.
- Locks down the app automatically when you are gone for a set period of time.
- Securely shreds sensitive data from RAM.
- Automatic backups on a daily or weekly basis.
- Supports SOCKS proxy and Tor via Orbot.

### [Migrate from Signal](https://github.com/mollyim/mollyim-android/wiki/Migrating-From-Signal)

Note: the migration should be done when the available Molly version is equal to or later than the currently installed Signal app version.

- Verify your Signal backup passphrase. In the Signal app: Settings > Chats > Chat backups > Verify backup passphrase.
- Optionally, put your phone offline (enable airplane mode or disable data services) until after Signal is uninstalled in step 5. This will prevent the possibility of losing any Signal messages that are received during or after the backup is created.
- Create a Signal backup. In the Signal app, go to Settings > Chats > Chat backups > Create backup.
- Uninstall the Signal app. Now you can put your phone back online (disable airplane mode or re-enable data services).
- Install the Molly or Molly-FOSS app.
- Open the Molly app. Enable database encryption if desired. As soon as the option is given, tap Transfer or restore account. Answer any permissions questions.
- Choose to Restore from backup and tap Choose backup. Navigate to your Signal backup location (Signal/Backups/, by default) and choose the backup that was created in step 3.
- Check the backup details and then tap Restore backup to confirm. Enter the backup passphrase when requested.
- If asked, choose a new folder for backup storage. Or choose Not Now and do it later.

Consider also:

- Any previously linked devices will need to be re-linked. Go to Settings > Linked devices in the Molly app. If Signal Desktop is not detecting that it is no longer linked, try restarting it.
- Verify your Molly backup settings and passphrase at Settings > Chats > Chat backups (to change the backup folder, disable and then enable backups). Tap Create backup to create your first Molly backup.
- When you are satisfied that Molly is working, you may want to delete the old Signal backups (in Signal/Backups, by default).

## Install the Signal app

These instructions only work for 64 bit Debian-based Linux distributions such as Ubuntu, Mint etc.

* Install our official public software signing key

diff --git a/docs/time_management_abstraction_levels.md b/docs/time_management_abstraction_levels.md
index f00324e02b5..0fafa5ca88e 100644
--- a/docs/time_management_abstraction_levels.md
+++ b/docs/time_management_abstraction_levels.md
@@ -3,7 +3,7 @@ To be able to manage the complexity of the life roadmap we can use models for different levels of abstraction with different purposes. In increasing level of abstraction:

- [Step](#step)
-- [Task](#task)
+- [Action](#action)
- [Project](#project)
- [Area](#area)
- [Goal](#goal)
@@ -12,7 +12,7 @@

## Step

-Is the smallest unit in our model, it's a clear representation of an action you need to do. It needs to fit a phrase and usually starts in a verb. The scope of the action has to be narrow enough so that you can follow it without investing thinking energies. In orgmode they are represented as checklists:
+It's the smallest unit in our model: a clear representation of an action you need to do. It needs to fit in a phrase and usually starts with a verb. The scope of the action has to be narrow enough so that you can follow it without ambiguity. In orgmode they are represented as checklists:

```orgmode
- [ ] Go to the green grocery store
@@ -26,7 +26,7 @@ Sometimes is useful to add more context to the steps, you can use an indented li
- [2023-12-12] He told me to call him tomorrow
```

-This is useful when you update waiting tasks.
+This is useful when you update waiting actions.

There are cases where it's also interesting to record when you've completed a step; you can append the date at the end.

@@ -34,7 +34,7 @@
- [x] Completed step [2023-12-12]
```

-## Task
+## Action

Model an action that is defined by a list of steps that need to be completed. It has two possible representations in orgmode:

@@ -66,15 +66,15 @@ Nested lists can also be found inside todo items:
- [ ] Go to the green grocery store
```

-This is fine as long as it's manageable, once you start seeing many levels of indentation is a great sign that you need to divide your task in different tasks.
+This is fine as long as it's manageable; once you start seeing many levels of indentation, it's a great sign that you need to split your action into different actions.

-### Adding more context to the task
+### Adding more context to the action

-Sometimes a task title is not enough. You need to register more context to be able to deal with the task. In those cases we need the task to be represented as a todo element.
Between the title and the step list we can add the description.
+Sometimes an action title is not enough. You need to register more context to be able to deal with the action. In those cases we need the action to be represented as a todo element. Between the title and the step list we can add the description.

```orgmode
-* TODO Task title
-  This is the description of the task to add more context
+* TODO Action title
+  This is the description of the action to add more context

- [ ] Step 1
- [ ] Step 2
@@ -83,8 +83,8 @@ If you need to use a list in the context, add a Steps section below to avoid err

```orgmode
-* TODO Task title
-  This is the description of the task to add more context:
+* TODO Action title
+  This is the description of the action to add more context:

- Context 1
- Context 2
@@ -95,20 +95,20 @@ If you need to use a list in the context, add a Steps section below to avoid err

- [ ] Step 2
```

-### Preventing the closing of a task without reading the step list
+### Preventing the closing of an action without reading the step list

-If you manage your tasks from an agenda or only reading the task title, there may be cases where you feel that the task is done, but if you see the step list you may realize that there is still stuff to do. A measure that can prevent this case is to add a mark in the task title that suggest you to check the steps. For example:
+If you manage your actions from an agenda or only read the action title, there may be cases where you feel that the action is done, but if you look at the step list you may realize that there is still stuff to do. A measure that can prevent this case is to add a mark in the action title that suggests you check the steps. For example:

```orgmode
-* TODO Task title (CHECK)
+* TODO Action title (CHECK)
- [ ] ...
```

-This is specially useful on recurring tasks that have a defined workflow that needs to be followed, or on tasks that have a defined validation criteria.
+This is especially useful on recurring actions that have a defined workflow that needs to be followed, or on actions that have defined validation criteria.

## Project

-Model an action that gathers a list of tasks towards a common greater outcome.
+Model an action that gathers a list of actions towards a common greater outcome.

```orgmode
* TODO Guarantee you eat well this week
@@ -122,7 +122,7 @@

## Area

-Model a group of projects and tasks that follow the same interest, roles or accountabilities. These are not things to finish but rather to use as criteria for analyzing, defining a specific aspect of your life and to prioritize the projects to reach a higher outcome. We'll use areas to maintain balance and sustainability on our responsibilities as we operate in the world.
+Model a group of projects and actions that follow the same interest, roles or accountabilities. These are not things to finish but rather criteria for analyzing and defining a specific aspect of your life, and for prioritizing the projects that reach towards a higher outcome. We'll use areas to maintain balance and sustainability in our responsibilities as we operate in the world.
I use specific orgmode files with the following structure:

@@ -157,7 +157,7 @@ An [objective] is an idea of the future or desired result that a person or a gro

[Strategy](strategy.md) is a general plan to achieve one or more long-term or overall objectives under conditions of uncertainty. They can be used to define the direction of the [areas](#area)

## Tactic
-A [tactic](https://en.wikipedia.org/wiki/Tactic_(method)) is a conceptual action or short series of actions with the aim of achieving a short-term goal. This action can be implemented as one or more specific tasks.
+A [tactic](https://en.wikipedia.org/wiki/Tactic_(method)) is a conceptual action or short series of actions with the aim of achieving a short-term goal. This action can be implemented as one or more specific actions.

## Life path
@@ -182,7 +182,7 @@ The structure of the [orgmode](orgmode.md) document is as follows:
- [ ] ...
```

-Where the principles are usually links to principle documents and the objectives links to tasks.
+Where the principles are usually links to principle documents and the objectives are links to actions.

## Goal
Model what you want to be experiencing in various areas of your life one or two years from now. A `goals.org` file with a list of headings may work.

@@ -193,7 +193,7 @@ Aggregate group of goals under a three to five year time span common outcome. Th

## Purpose and principles

-The purpose defines the reason and meaning of your existence, principles define your morals, the parameters of action and the criteria for excellence of conduct. These are the core definition of what you really are. Visions, goals, objectives, projects and tasks derive and lead towards them.
+The purpose defines the reason and meaning of your existence; principles define your morals, the parameters of action and the criteria for excellence of conduct. These are the core definition of what you really are. Visions, goals, objectives, projects and actions derive from and lead towards them.

As we increase the level of abstraction we need more time and energy (both mental and willpower) to adjust the path. It may also mean that the invested efforts so far are not aligned with the new direction, so we may need to throw away some of the advances made. That's why we need to support those changes with higher levels of analysis and thought.

diff --git a/docs/watchdog.md b/docs/watchdog.md
new file mode 100644
index 00000000000..f7a8001fd08
--- /dev/null
+++ b/docs/watchdog.md
@@ -0,0 +1,126 @@
A [watchdog timer](https://en.wikipedia.org/wiki/Watchdog_timer) (WDT, or simply a watchdog), sometimes called a computer operating properly timer (COP timer), is an electronic or software timer that is used to detect and recover from computer malfunctions. Watchdog timers are widely used in computers to facilitate automatic correction of temporary hardware faults, and to prevent errant or malevolent software from disrupting system operation.

During normal operation, the computer regularly restarts the watchdog timer to prevent it from elapsing, or "timing out". If, due to a hardware fault or program error, the computer fails to restart the watchdog, the timer will elapse and generate a timeout signal. The timeout signal is used to initiate corrective actions. The corrective actions typically include placing the computer and associated hardware in a safe state and invoking a computer reboot.

Microcontrollers often include an integrated, on-chip watchdog.
In other computers the watchdog may reside in a nearby chip that connects directly to the CPU, or it may be located on an external expansion card in the computer's chassis.

# Hardware watchdog

Before you start using the hardware watchdog, you need to check whether your hardware actually supports it.

If you see the [Watchdog hardware is disabled error on boot](#watchdog-hardware-is-disabled-error-on-boot), things are not looking good.

## Check if the hardware watchdog is enabled

You can see if the hardware watchdog is loaded by running `wdctl`. For example, on a machine that has it enabled you'll see:

```
Device:        /dev/watchdog0
Identity:      iTCO_wdt [version 0]
Timeout:       30 seconds
Pre-timeout:    0 seconds
Timeleft:      30 seconds
FLAG           DESCRIPTION               STATUS BOOT-STATUS
KEEPALIVEPING  Keep alive ping reply          1           0
MAGICCLOSE     Supports magic close char      0           0
SETTIMEOUT     Set timeout (in seconds)       0           0
```

On a machine that doesn't, you'll see:

```
wdctl: No default device is available.: No such file or directory
```

Another option is to run `dmesg | grep wd` or `dmesg | grep watc -i`. For example, on a machine that has the hardware watchdog enabled you'll see something like:

```
[   20.708839] iTCO_wdt: Intel TCO WatchDog Timer Driver v1.11
[   20.708894] iTCO_wdt: Found a Intel PCH TCO device (Version=4, TCOBASE=0x0400)
[   20.709009] iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
```

On one that doesn't:

```
[    1.934999] sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
[    1.935057] sp5100-tco sp5100-tco: Using 0xfed80b00 for watchdog MMIO address
[    1.935062] sp5100-tco sp5100-tco: Watchdog hardware is disabled
```

If you're out of luck and your hardware doesn't support it, you can delegate the task to the software watchdog or get a [usb watchdog](https://github.com/zatarra/usb-watchdog).

# [Systemd watchdog](https://0pointer.de/blog/projects/watchdog.html)

Starting with version 183, systemd provides full support for hardware watchdogs (as exposed in `/dev/watchdog` to userspace), as well as supervisor (software) watchdog support for individual system services. The basic idea is the following: if enabled, systemd will regularly ping the watchdog hardware. If systemd or the kernel hang, this ping will not happen anymore and the hardware will automatically reset the system. This way systemd and the kernel are protected from boundless hangs – by the hardware. To make the chain complete, systemd then exposes a software watchdog interface for individual services so that they can also be restarted (or some other action taken) if they begin to hang. This software watchdog logic can be configured individually for each service, both the ping frequency and the action to take. Putting both parts together (i.e. hardware watchdogs supervising systemd and the kernel, as well as systemd supervising all other services) we have a reliable way to watchdog every single component of the system.

# [Configuring the watchdog](https://0pointer.de/blog/projects/watchdog.html)

To make use of the hardware watchdog it is sufficient to set the `RuntimeWatchdogSec=` option in `/etc/systemd/system.conf`. It defaults to `0` (i.e. no hardware watchdog use). Set it to a value like `20s` and the watchdog is enabled. After 20s of no keep-alive pings the hardware will reset itself. Note that `systemd` will send a ping to the hardware at half the specified interval, i.e. every 10s.

Note that the hardware watchdog device (`/dev/watchdog`) is single-user only.
That means that you can either enable this functionality in systemd, or use a separate external watchdog daemon, such as the aptly named `watchdog`. That said, the built-in hardware watchdog support of systemd does not conflict with other watchdog software: systemd does not make use of `/dev/watchdog` by default, and you are welcome to use external watchdog daemons in conjunction with systemd if this better suits your needs.

`ShutdownWatchdogSec=` is another option that can be configured in `/etc/systemd/system.conf`. It controls the watchdog interval to use during reboots. It defaults to 10min, and adds extra reliability to the system reboot logic: if a clean reboot is not possible and shutdown hangs, we rely on the watchdog hardware to reset the system abruptly, as an extra safety net.

Now, let's have a look at how to add watchdog logic to individual services.

First of all, to make software watchdog-supervisable it needs to be patched to send out "I am alive" signals at regular intervals in its event loop. Patching this is relatively easy. First, a daemon needs to read the `WATCHDOG_USEC=` environment variable. If it is set, it will contain the watchdog interval in usec, formatted as an ASCII text string, as it is configured for the service. The daemon should then issue `sd_notify("WATCHDOG=1")` calls every half of that interval. A daemon patched this way transparently supports watchdog functionality by checking whether the environment variable is set and honouring the value it is set to (see the sketch after the list below).

To enable the software watchdog logic for a service (which has been patched to support the logic pointed out above) it is sufficient to set the `WatchdogSec=` option to the desired failure latency. See `systemd.service(5)` for details on this setting. This causes `WATCHDOG_USEC=` to be set for the service's processes and will cause the service to enter a failure state as soon as no keep-alive ping is received within the configured interval.

The next step is to configure whether the service shall be restarted and how often, and what to do if it then still fails. To enable automatic service restarts on failure, set `Restart=on-failure` for the service. To configure how many times a service shall be attempted to be restarted, use the combination of `StartLimitBurst=` and `StartLimitInterval=`, which allow you to configure how often a service may restart within a time interval. If that limit is reached, a special action can be taken. This action is configured with `StartLimitAction=`. The default is `none`, i.e. no further action is taken and the service simply remains in the failure state without any further attempted restarts. The other three possible values are `reboot`, `reboot-force` and `reboot-immediate`:

- `reboot` attempts a clean reboot, going through the usual, clean shutdown logic.
- `reboot-force` is more abrupt: it will not actually try to cleanly shut down any services, but immediately kills all remaining services and unmounts all file systems and then forcibly reboots (this way all file systems will be clean but the reboot will still be very fast).
- `reboot-immediate` does not attempt to kill any process or unmount any file systems. Instead it just hard reboots the machine without delay. `reboot-immediate` hence comes closest to a reboot triggered by a hardware watchdog. All these settings are documented in `systemd.service(5)`.
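As a minimal sketch of the patching described above, a Python daemon honouring `WATCHDOG_USEC=` could look like this. The `sd_notify` helper here is hand-rolled over the notify-socket protocol (you could equally use the `sdnotify` or `systemd-python` packages), and the 1-second sleep stands in for the daemon's real work:

```python
import os
import socket
import time


def sd_notify(message: bytes) -> None:
    """Send a notification datagram to the systemd notify socket, if present."""
    addr = os.environ.get("NOTIFY_SOCKET")
    if not addr:
        return  # Not running under systemd supervision
    if addr.startswith("@"):
        addr = "\0" + addr[1:]  # Abstract namespace socket
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        sock.connect(addr)
        sock.send(message)


# WATCHDOG_USEC holds the configured interval in microseconds
watchdog_usec = int(os.environ.get("WATCHDOG_USEC", "0"))
# Ping at half the configured interval, as recommended above
ping_interval = watchdog_usec / 2_000_000 if watchdog_usec else None

last_ping = 0.0
while True:
    time.sleep(1)  # placeholder for one iteration of the daemon's real event loop
    now = time.monotonic()
    if ping_interval is not None and now - last_ping >= ping_interval:
        sd_notify(b"WATCHDOG=1")
        last_ping = now
```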
Putting this all together we now have pretty flexible options to watchdog-supervise a specific service and configure automatic restarts of the service if it hangs, plus take ultimate action if that doesn't help.

Here's an example unit file:

```ini
[Unit]
Description=My Little Daemon
Documentation=man:mylittled(8)

[Service]
ExecStart=/usr/bin/mylittled
WatchdogSec=30s
Restart=on-failure
StartLimitInterval=5min
StartLimitBurst=4
StartLimitAction=reboot-force
```

This service will automatically be restarted if it hasn't pinged the system manager for longer than 30s or if it fails otherwise. If it is restarted this way more often than 4 times in 5 minutes, action is taken and the system is quickly rebooted, with all file systems clean when it comes up again.

To write the code of the watchdog service you can follow one of these guides:

- [Python based watchdog](https://sleeplessbeastie.eu/2022/08/15/how-to-create-watchdog-for-systemd-service/)
- [Bash based watchdog](https://www.medo64.com/2019/01/systemd-watchdog-for-any-service/)

# [Testing a watchdog](https://serverfault.com/questions/375220/how-to-check-what-if-hardware-watchdogs-are-available-in-linux)

One simple way to test a watchdog is to trigger a kernel panic. This can be done as root with:

```bash
echo c > /proc/sysrq-trigger
```

The kernel will stop responding to the watchdog pings, so the watchdog will trigger.

SysRq is a 'magical' key combo you can hit which the kernel will respond to regardless of whatever else it is doing, unless it is completely locked up. It can also be used by echoing letters to /proc/sysrq-trigger, like we're doing here.

In this case, the letter c means perform a system crash and take a crashdump if configured.

# Troubleshooting

## Watchdog hardware is disabled error on boot

According to the discussion at [the kernel mailing list](https://lore.kernel.org/linux-watchdog/20220509163304.86-1-mario.limonciello@amd.com/T/#u) it means that the system contains a hardware watchdog but it has been disabled (probably by the BIOS) and Linux cannot enable it.

If your BIOS doesn't have a switch to enable it, consider the watchdog hardware broken for your system.

Some people blacklist the module so that it's not loaded and the error is therefore not raised ([1](https://www.reddit.com/r/openSUSE/comments/a3nmg5/watchdog_hardware_is_disabled_on_boot/), [2](https://bbs.archlinux.org/viewtopic.php?id=239075)).
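A sketch of that workaround, assuming the offending driver is the `sp5100_tco` module from the boot log above (use whatever module name your own `dmesg` shows):

```bash
# Prevent the sp5100_tco watchdog module from being loaded at boot
echo "blacklist sp5100_tco" | sudo tee /etc/modprobe.d/blacklist-watchdog.conf

# On Debian-based systems, rebuild the initramfs so the blacklist
# also applies early in the boot process
sudo update-initramfs -u
```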
# References

- [0pointer post on systemd watchdogs](https://0pointer.de/blog/projects/watchdog.html)
- [Heckel post on how to reboot using watchdogs](https://blog.heckel.io/2020/10/08/reliably-rebooting-ubuntu-using-watchdogs/)

diff --git a/mkdocs.yml b/mkdocs.yml
index c549f2721ad..d33c367b10c 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -26,6 +26,7 @@ nav:
     - Environmentalism: environmentalism.md
     - Laboral: laboral.md
     - Collaborating tools: collaborating_tools.md
+    - Luddites: luddites.md
     - Life Management:
       - life_management.md
       - Time management:
@@ -197,6 +198,7 @@ nav:
       - Code Styling: coding/python/python_code_styling.md
       - Docstrings: coding/python/docstrings.md
       - Properties: python_properties.md
+      - Protocols: python_protocols.md
      - Package Management:
        - python_package_management.md
        - PDM: pdm.md
@@ -430,6 +432,10 @@ nav:
       - OpenZFS storage planning: zfs_storage_planning.md
       - Sanoid: sanoid.md
       - ZFS Prometheus exporter: zfs_exporter.md
+    - Resilience:
+      - linux_resilience.md
+      - Memtest: memtest.md
+      - watchdog: watchdog.md
     - Monitoring:
       - Monitoring Comparison: monitoring_comparison.md
       - Prometheus:
@@ -441,6 +447,7 @@ nav:
         - Blackbox Exporter: devops/prometheus/blackbox_exporter.md
         - Elasticsearch Exporter: elasticsearch_exporter.md
         - Node Exporter: devops/prometheus/node_exporter.md
+        - Process Exporter: process_exporter.md
         - Python Prometheus: python-prometheus.md
         - Instance sizing analysis: devops/prometheus/instance_sizing_analysis.md
         - Prometheus Troubleshooting: >-
@@ -474,7 +481,11 @@ nav:
       - Refinement Template: refinement_template.md
   - Hardware:
       - CPU: cpu.md
-      - RAM: ram.md
+      - RAM:
+        - ram.md
+        - ECC RAM:
+          - ecc.md
+          - rasdaemon: rasdaemon.md
       - Power Supply Unit: psu.md
       - GPU: gpu.md
       - Pedal PC: pedal_pc.md
@@ -518,6 +529,7 @@ nav:
       - Kodi: kodi.md
       - Koel: koel.md
       - LUKS: linux/luks/luks.md
+      - Magic keys: magic_keys.md
      - Matrix: matrix.md
      - Matrix Highlight: matrix_highlight.md
      - mbsync: mbsync.md
@@ -530,11 +542,13 @@ nav:
       - nodejs: linux/nodejs.md
       - Oracle Database: oracle_database.md
       - Outrun: outrun.md
+      - Pass: pass.md
       - Peek: peek.md
       - Pipx: pipx.md
       - Profanity: profanity.md
       - retroarch: retroarch.md
       - rm: linux/rm.md
+      - rofi: rofi.md
       - Rocketchat: rocketchat.md
       - sed: sed.md
       - Syncthing: linux/syncthing.md