
#44 Line interpolation for fan speed. #60

Draft
wants to merge 46 commits into base: master

7Adrian

@7Adrian 7Adrian commented Jul 23, 2023

Hi, I created changes that let users use line interpolation for the fan speed. We discussed this upgrade in #44.

How the interpolation works, with these example settings:

  • ENABLE_LINE_INTERPOLATION = true
  • FAN_SPEED = 10
  • HIGH_FAN_SPEED = 50
  • CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION = 30
  • CPU_TEMPERATURE_THRESHOLD = 70
CPU temperature   Fan speed
15 °C             10 %
30 °C             10 %
35 °C             15 %
50 °C             30 %
69 °C             49 %
70 °C             Dell fan control
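The table follows from a straight line between (CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION, FAN_SPEED) and (CPU_TEMPERATURE_THRESHOLD, HIGH_FAN_SPEED). Below is a minimal sketch of that rule in shell integer arithmetic; `interpolated_fan_speed` is a hypothetical helper, not the PR's actual code. The boundary handling is an assumption: the table hands 70 °C to Dell, while later logs in this thread show a reading equal to CPU_TEMPERATURE_THRESHOLD still being interpolated (60 °C → 50 %), so the real script may switch only above the threshold.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not the PR's code. Variable names mirror the PR's
# environment variables; values are the example settings from the table.
FAN_SPEED=10
HIGH_FAN_SPEED=50
CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION=30
CPU_TEMPERATURE_THRESHOLD=70

# Prints the fan speed (%) for a CPU temperature, or "dell" when control
# should go back to Dell's default profile (assumed: at or above threshold).
interpolated_fan_speed() {
  local temperature=$1
  local start=$CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION
  local threshold=$CPU_TEMPERATURE_THRESHOLD

  if [ "$temperature" -ge "$threshold" ]; then
    echo "dell"
  elif [ "$temperature" -le "$start" ]; then
    echo "$FAN_SPEED"
  else
    # Multiply before dividing: shell arithmetic is integer-only.
    echo $(( FAN_SPEED + (HIGH_FAN_SPEED - FAN_SPEED) * (temperature - start) / (threshold - start) ))
  fi
}

for t in 15 30 35 50 69 70; do
  echo "$t °C -> $(interpolated_fan_speed "$t")"
done
```

Running it reproduces every row of the table above.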

By default, the application works the same as before. If you want, you can change it so that line interpolation is enabled by default.

Main changes:

  • New parameters: ENABLE_LINE_INTERPOLATION, CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION and HIGH_FAN_SPEED.
  • Updated README
  • The new values are displayed in the console on startup and in the fan profile.

PS: Changes tested on a Dell server with one CPU.

@7Adrian 7Adrian mentioned this pull request Jul 23, 2023
@barnhill

I'd be glad to test this out for you if you have it in a Docker image on Docker Hub.

@7Adrian
Author

7Adrian commented Jul 24, 2023

@barnhill I needed to create a Docker Hub account, but now you can test it:

docker-compose.yaml example:

version: '3'

services:
  Dell_iDRAC_fan_controller:
    image: 7adrian/dell_idrac_fan_controller_with_line_interpolation:latest
    container_name: Dell_iDRAC_fan_controller
    restart: unless-stopped
    environment:
      - IDRAC_HOST=<your.IP>
      - IDRAC_USERNAME=<login>
      - IDRAC_PASSWORD=<password>
      - ENABLE_LINE_INTERPOLATION=true
      - FAN_SPEED=10
      - HIGH_FAN_SPEED=45
      - CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION=42
      - CPU_TEMPERATURE_THRESHOLD=60
      - CHECK_INTERVAL=3
      - DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE=false

PS: In this image, the default values are changed to the ones above.

@barnhill

Running this now... will see how it works. This is a much-needed change that lets the fan curve scale with utilization.

@barnhill

barnhill commented Jul 24, 2023

With CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION set to 40, it appears that it's not ramping the fans up until the cooler CPU (CPU1 in this case) passes this threshold. Also, for a 1-degree intrusion into the interpolation range, it looks a bit aggressive, adding 12% to the fan speed with:

FAN_SPEED=10
HIGH_FAN_SPEED=45

[Screenshot 2023-07-24 at 11:53:39 AM]

@7Adrian
Author

7Adrian commented Jul 24, 2023

You're right. I set it up to increase the fan speed only when CPU1 is above CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION, but once that happens, the program picks the higher of the CPU temperatures and calculates the fan speed from it. That's the reason it added 12%. I will fix it in a moment :)
After fixing that, I'll write up how I could improve this functionality even more.

# Check if TEMP_WINDOW is greater than 0 (avoids dividing by zero)
if [ "$TEMP_WINDOW" -gt 0 ];
then
FAN_VALUE_TO_ADD="$((FAN_WINDOW * TEMPERATURE_ABOVE_LOWER_THRESHOLD / TEMP_WINDOW))"
fi
Author

@7Adrian 7Adrian Jul 24, 2023

Bash doesn't natively support floating-point numbers, so if you calculate TEMPERATURE_ABOVE_LOWER_THRESHOLD / TEMP_WINDOW first, you will always get 0...
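To make the integer-division point concrete, here is a small sketch using the numbers from the PR description's example (a CPU at 35 °C, a 30 to 70 °C window and a 10 to 50 % fan range). The variable names follow the review snippet; the values are illustrative.

```shell
#!/usr/bin/env bash
# Illustrative values from the PR description's example.
FAN_WINDOW=40                          # HIGH_FAN_SPEED - FAN_SPEED = 50 - 10
TEMP_WINDOW=40                         # threshold - start = 70 - 30
TEMPERATURE_ABOVE_LOWER_THRESHOLD=5    # 35 - 30

# Dividing first truncates 5 / 40 to 0, so nothing is ever added.
divide_first=$(( TEMPERATURE_ABOVE_LOWER_THRESHOLD / TEMP_WINDOW * FAN_WINDOW ))
# Multiplying first keeps the precision: 40 * 5 = 200, then 200 / 40 = 5.
multiply_first=$(( FAN_WINDOW * TEMPERATURE_ABOVE_LOWER_THRESHOLD / TEMP_WINDOW ))

echo "divide first:   $divide_first"    # prints 0
echo "multiply first: $multiply_first"  # prints 5
```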

if $ENABLE_LINE_INTERPOLATION
then
CURRENT_FAN_SPEED=$FAN_SPEED
if [ $CPU1_TEMPERATURE -gt $CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION ];

@barnhill barnhill Jul 24, 2023

I think this needs an OR:

$CPU2_TEMPERATURE -gt $CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION

or else it will never trigger interpolation when only CPU2's temperature is over the interpolation threshold, which is what I was seeing.
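The OR suggested here is equivalent to comparing the hotter of the two CPUs against the start temperature, and the hotter CPU is also the temperature the fan speed should then be interpolated from. A minimal sketch; `should_interpolate` and `hottest_cpu_temperature` are hypothetical helper names, and CPU2 is treated as optional, passed as "true"/"false" like the script's sensor-presence flag.

```shell
#!/usr/bin/env bash
CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION=40

# Hypothetical helper: succeeds if either CPU is above the start temperature.
# CPU2 only counts when its sensor is present ("true"/"false").
should_interpolate() {
  local cpu1=$1 cpu2=$2 cpu2_present=$3
  local start=$CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION
  if [ "$cpu1" -gt "$start" ]; then
    return 0
  fi
  if [ "$cpu2_present" = true ] && [ "$cpu2" -gt "$start" ]; then
    return 0
  fi
  return 1
}

# Equivalent check on the hotter CPU; this is also the temperature the
# interpolated fan speed should be calculated from.
hottest_cpu_temperature() {
  local cpu1=$1 cpu2=$2 cpu2_present=$3
  if [ "$cpu2_present" = true ] && [ "$cpu2" -gt "$cpu1" ]; then
    echo "$cpu2"
  else
    echo "$cpu1"
  fi
}

# Example: CPU1 at 40 °C, CPU2 at 47 °C, start at 40 °C.
if should_interpolate 40 47 true; then
  echo "interpolate from $(hottest_cpu_temperature 40 47 true) °C"
fi
```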

@7Adrian
Author

7Adrian commented Jul 24, 2023

I pushed a fixed version; you can check it now.

@7Adrian
Author

7Adrian commented Jul 24, 2023

@barnhill The reason you probably got 12% for a 1-degree change is this:
The program started calculating the fan speed when CPU1 was above the lower threshold. After that, it took the higher temperature, in your case CPU2's, and calculated the fan speed from that. You had a 7-degree difference, which works out to 12%.
It should calculate that correctly now.
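The 12% figure can be checked against the interpolation formula. This sketch assumes barnhill's FAN_SPEED=10, HIGH_FAN_SPEED=45 and start temperature of 40, plus CPU_TEMPERATURE_THRESHOLD=60 taken from the docker-compose example above, since the threshold barnhill actually used isn't stated. `fan_increment` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Assumed settings (the threshold comes from the docker-compose example).
FAN_SPEED=10
HIGH_FAN_SPEED=45
START=40
THRESHOLD=60

# Percentage added on top of FAN_SPEED for a given number of degrees above START.
fan_increment() {
  echo $(( (HIGH_FAN_SPEED - FAN_SPEED) * $1 / (THRESHOLD - START) ))
}

echo "1 degree above start:  +$(fan_increment 1)%"   # 35 * 1 / 20 = 1 (truncated)
echo "7 degrees above start: +$(fan_increment 7)%"   # 35 * 7 / 20 = 12 (truncated)
```

Under the same assumptions, CPU2 at 46 °C gives 10 + 35 × 6 / 20 = 20 % after truncation, which matches the 20 % reported after the fix, so the assumed threshold looks plausible.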

if $ENABLE_LINE_INTERPOLATION
then
CURRENT_FAN_SPEED=$DECIMAL_FAN_SPEED
if [ $CPU1_TEMPERATURE -gt $CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION ] || [$IS_CPU2_TEMPERATURE_SENSOR_PRESENT] && [$CPU2_TEMPERATURE -gt $CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION];
Copy link

@barnhill barnhill Jul 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think order of operation is a problem here now because now CPU2 is over the threshold and CPU1 isnt and its not ramping fans up.

Threshold set to start interpolation set at 40

Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile 
26°C   40°C   47°C     38°C    Interpolated fan control profile (10%) 

Actually, I'm getting this error in the script:

/Dell_iDRAC_fan_controller.sh: line 243: [true]: command not found

[$IS_CPU2_TEMPERATURE_SENSOR_PRESENT = true] ?

I see this changed and will pull again in a few minutes to see if it fixes that issue.
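Both the `[true]: command not found` error and the ordering problem come from shell syntax in the condition quoted above. `[` is a command, so it needs spaces around its arguments: `[$IS_CPU2_TEMPERATURE_SENSOR_PRESENT]` expands to the single word `[true]`, and `[$IS_CPU2_TEMPERATURE_SENSOR_PRESENT = true]` has the same problem. Also, `&&` and `||` have equal precedence and group left to right, so `A || B && C` runs as `(A || B) && C`. A corrected condition might look like this; the sample readings from the report above are hard-coded as an assumption.

```shell
#!/usr/bin/env bash
# Sample readings from the report above (hard-coded for illustration).
CPU1_TEMPERATURE=40
CPU2_TEMPERATURE=47
IS_CPU2_TEMPERATURE_SENSOR_PRESENT=true
CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION=40

# Spaces inside [ ] are required, and the braces group the CPU2 checks so
# the condition reads: CPU1 is hot OR (CPU2 exists AND CPU2 is hot).
if [ "$CPU1_TEMPERATURE" -gt "$CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION" ] ||
   { [ "$IS_CPU2_TEMPERATURE_SENSOR_PRESENT" = true ] &&
     [ "$CPU2_TEMPERATURE" -gt "$CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION" ]; }
then
  ACTIVE_PROFILE="interpolated"
else
  ACTIVE_PROFILE="base fan speed"
fi
echo "$ACTIVE_PROFILE"
```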

@7Adrian
Author

7Adrian commented Jul 24, 2023

I changed the order of the if conditions and pushed a new Docker image. You can check now.

@barnhill

Looks like it's working as expected so far:

27°C 39°C 46°C 38°C Interpolated fan control profile (20%)

@barnhill

I'm not sure which sensor this script was originally using for the CPU temp, but it reads slightly hotter than what TrueNAS reports as the hottest core temp, so it's apparently using different sensors. That's something outside this PR, though.

@7Adrian
Author

7Adrian commented Jul 24, 2023

Great! :)
I'm thinking about adding better functionality, such as:

  • adding a PID controller
  • letting users create their own profile by entering (CPU_TEMP, FAN_SPEED) pairs, and interpolating between the points
  • recording the Dell default response (e.g. when only Dell hardware is installed), storing that data, and then, when you install a non-compatible disk, having the program recreate the Dell fan profile as closely as possible
  • creating PID settings and temperature tolerances for different modes such as ultra quiet, quiet and normal (Dell hardware only); this could help not only with noise but also reduce power consumption in some cases
  • trying to add more temperature sensors, so the fan speed is calculated not only from the CPU temperature but also from exhaust, disk, etc.

With PID, you can tweak the values to get different behavior when the temperature gets too high:
Proportional - the fan changes speed based on the temperature difference
Integral - the fan changes speed based on how long the temperature stays above the high value
Derivative - the fan changes speed based on the temperature trend (going up or down)

Tell me what you think about that. Is it overkill, or could it be very useful?

Edit: I'm also thinking about adding a night-time mode to set a different maximum fan speed during the day and at night.
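The PID idea described above could be sketched roughly as follows. This is only an illustration: the gains, target and limits (KP, KI, KD, TARGET_TEMPERATURE, MIN_FAN_SPEED, MAX_FAN_SPEED) and the `pid_step` function are all hypothetical, and it uses shell integer arithmetic to match the existing script, with gains scaled by 100.

```shell
#!/usr/bin/env bash
# Hypothetical integer PID step (not part of this PR). Gains are scaled by 100
# because shell arithmetic is integer-only.
KP=200               # 2.00 % fan per °C above target
KI=10                # 0.10 % fan per accumulated °C (one sample per check)
KD=100               # 1.00 % fan per °C change since the previous check
TARGET_TEMPERATURE=50
MIN_FAN_SPEED=10
MAX_FAN_SPEED=50

integral=0
previous_error=0

# Usage: pid_step <cpu_temperature>; sets FAN_SPEED_PERCENT.
# No anti-windup: the integral keeps accumulating while the output is clamped.
pid_step() {
  local error=$(( $1 - TARGET_TEMPERATURE ))        # P: how far above target
  integral=$(( integral + error ))                  # I: how long it has been hot
  local derivative=$(( error - previous_error ))    # D: rising (+) or falling (-)
  previous_error=$error

  FAN_SPEED_PERCENT=$(( MIN_FAN_SPEED + (KP * error + KI * integral + KD * derivative) / 100 ))
  if [ "$FAN_SPEED_PERCENT" -lt "$MIN_FAN_SPEED" ]; then FAN_SPEED_PERCENT=$MIN_FAN_SPEED; fi
  if [ "$FAN_SPEED_PERCENT" -gt "$MAX_FAN_SPEED" ]; then FAN_SPEED_PERCENT=$MAX_FAN_SPEED; fi
}

for temperature in 55 55 45; do
  pid_step "$temperature"
  echo "$temperature °C -> $FAN_SPEED_PERCENT %"
done
```

Setting KI=0 and KD=0 leaves a pure proportional controller; in a real loop, `pid_step` would run once per CHECK_INTERVAL.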

@barnhill

I think proportional would be the most useful... integral might let the CPUs run hot for too long.
Night mode sounds good, but with proportional control, would it do anything if the fan setting is already super low (10%)?

I think it might be over-engineering the solution a bit. A complicated config would be a barrier to entry, and we want people using it.

@7Adrian
Author

7Adrian commented Jul 24, 2023

I think the program uses the same CPU temperature you can find in iDRAC 8 -> Power / Thermal -> Temperatures -> CPUX Temp.
In my case, TrueNAS shows a lower temperature too.

@barnhill

I set my thresholds slightly higher to prevent the fans from ramping up while the CPU temps are still tolerable according to TrueNAS.

@barnhill

Also, I think first things first: we should get this merged with approval from the repo owner. I'm going to keep an eye on it for a few days and see what happens.

@7Adrian
Author

7Adrian commented Jul 24, 2023

For day/night, my idea is that during the day you could set it to start increasing the fan speed when the temperature is above 35 degrees and ramp up really fast when the temperature is 10 degrees below the warning/critical point. During the night, the starting temperature would go up to 45 degrees, and the fast ramp would start 5 degrees below the warning/critical point.

To simplify usage for entry-level users, I'm thinking about creating profiles (the program would automatically read the CPU warning/critical temperature), so the user would only need to set an environment variable (ultra_quiet, quiet). The program would then calculate the fan speed to stay at a healthy level.
But that's only speculation for now :)
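The day/night idea could be expressed as a profile selected from the current hour. A rough sketch: every variable and function name here is hypothetical, the 35/45 °C start temperatures and 10/5 °C margins come from the comment above, and the 22:00 to 07:00 night window is an assumption. Reading the CPU's warning/critical temperature, as proposed, is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical day/night profile selection; none of these variables exist in the PR.
DAY_START_TEMPERATURE=35
DAY_FAST_RAMP_MARGIN=10       # °C below the warning/critical point
NIGHT_START_TEMPERATURE=45
NIGHT_FAST_RAMP_MARGIN=5
NIGHT_FROM_HOUR=22            # assumed night window: 22:00 to 06:59
NIGHT_UNTIL_HOUR=7

# Usage: select_profile [hour]; defaults to the current hour (00-23).
# Sets START_TEMPERATURE and FAST_RAMP_MARGIN.
select_profile() {
  local hour=${1:-$(date +%H)}
  if [ "$hour" -ge "$NIGHT_FROM_HOUR" ] || [ "$hour" -lt "$NIGHT_UNTIL_HOUR" ]; then
    START_TEMPERATURE=$NIGHT_START_TEMPERATURE
    FAST_RAMP_MARGIN=$NIGHT_FAST_RAMP_MARGIN
  else
    START_TEMPERATURE=$DAY_START_TEMPERATURE
    FAST_RAMP_MARGIN=$DAY_FAST_RAMP_MARGIN
  fi
}

select_profile
echo "start at ${START_TEMPERATURE} °C, fast ramp ${FAST_RAMP_MARGIN} °C before the warning point"
```

Calling `select_profile` before each temperature check would let the rest of the loop stay unchanged, using START_TEMPERATURE as the interpolation start.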

@barnhill

Right now, at 15% fan speed, mine is super quiet in the closet. I don't know that I would even use night mode, since I can't hear it outside the closet anyway; it's that quiet. I would like to figure out how to read local IPMI data instead of going over the network, if I can.

@7Adrian
Author

7Adrian commented Jul 25, 2023

If you're using TrueNAS SCALE and want to use local IPMI, you need to:

  1. Set the container environment variable IDRAC_HOST to local
  2. Add a storage host path volume (Host Path: /dev/ipmi0, Mount Path: /dev/ipmi0)
  3. Enable Privileged Mode in the Security Context

[Screenshots: Environment, Storage and Security settings]

@barnhill

Worked like a champ. I had to enable privileged mode first, then save... then come back in and add the device.

@barnhill

@tigerblue77 Just wondering if you're interested in this PR. It adds some capabilities to configure the fan curves further, and I have verified @7Adrian's work: it works as expected.

@kyle-blair

I'm going to give this a test right now. My server keeps jumping from my set fan speed to the obnoxiously loud Dell default, so I could really use a ramp response like the one this PR provides. @7Adrian, using a PID algorithm is an interesting idea. It might be overkill, but it would definitely be fun... I probably wouldn't want to implement PID in bash, though; that doesn't sound fun.

@kyle-blair

Ran the below values on a Dell R720XD. I don't have much time on this server and I think it might've taken a bad fall in shipping, so the heatsinks might need to be reapplied.

ENABLE_LINE_INTERPOLATION=true
HIGH_FAN_SPEED=50
CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION=30
FAN_SPEED=10
CPU_TEMPERATURE_THRESHOLD=60
CHECK_INTERVAL=5
DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE=true

These values were not great. The CPU temperature seemed to jump all over the place, and the Dell default kicked in a couple of times. Output:

27-07-2023 03:36:02   26°C   50°C   54°C     38°C    Interpolated fan control profile (42%)                                             Disabled   -
27-07-2023 03:36:07   26°C   50°C   54°C     38°C    Interpolated fan control profile (42%)                                             Disabled   -
27-07-2023 03:36:15   26°C   50°C   54°C     38°C    Interpolated fan control profile (42%)                                             Disabled   -
27-07-2023 03:36:18   26°C   51°C   55°C     38°C    Interpolated fan control profile (43%)                                             Disabled   -
27-07-2023 03:36:22   26°C   50°C   55°C     38°C    Interpolated fan control profile (43%)                                             Disabled   -
27-07-2023 03:36:27   26°C   50°C   54°C     38°C    Interpolated fan control profile (42%)                                             Disabled   -
27-07-2023 03:36:32   26°C   49°C   53°C     38°C    Interpolated fan control profile (40%)                                             Disabled   -
27-07-2023 03:36:40   26°C   50°C   54°C     38°C    Interpolated fan control profile (42%)                                             Disabled   -
                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
27-07-2023 03:36:44   26°C   50°C   55°C     38°C    Interpolated fan control profile (43%)                                             Disabled   -
27-07-2023 03:36:48   26°C   53°C   60°C     38°C    Interpolated fan control profile (50%)                                             Disabled   -
27-07-2023 03:36:52   26°C   53°C   58°C     38°C    Interpolated fan control profile (47%)                                             Disabled   -
27-07-2023 03:36:58   26°C   50°C   55°C     38°C    Interpolated fan control profile (43%)                                             Disabled   -
27-07-2023 03:37:03   26°C   55°C   57°C     38°C    Interpolated fan control profile (46%)                                             Disabled   -
27-07-2023 03:37:12   26°C   52°C   61°C     38°C  Dell default dynamic fan control profile                                             Disabled   -
27-07-2023 03:37:14   26°C   52°C   56°C     38°C    Interpolated fan control profile (44%)                                             Disabled   -
27-07-2023 03:37:19   26°C   50°C   54°C     38°C    Interpolated fan control profile (42%)                                             Disabled   -
27-07-2023 03:37:24   26°C   59°C   59°C     38°C    Interpolated fan control profile (48%)                                             Disabled   -
27-07-2023 03:37:29   26°C   56°C   61°C     38°C  Dell default dynamic fan control profile                                             Disabled   -
                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
27-07-2023 03:37:36   26°C   54°C   59°C     38°C    Interpolated fan control profile (48%)                                             Disabled   -
27-07-2023 03:37:39   26°C   59°C   62°C     38°C  Dell default dynamic fan control profile                                             Disabled   -

It seemed like something odd was going on with the sensor or the reading. You can see the CPU temps jump 9 °C in less than 10 seconds. I can't necessarily attribute that behavior to these changes, since line interpolation seemed to perform correctly: fan speeds went up as the CPU temp went up, and when the threshold was hit, the Dell default kicked in.

Then I tried these values which worked amazingly well:

ENABLE_LINE_INTERPOLATION=true
HIGH_FAN_SPEED=50
CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION=40
FAN_SPEED=10
CPU_TEMPERATURE_THRESHOLD=60
CHECK_INTERVAL=60
DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE=true

27-07-2023 03:51:55   25°C   52°C   55°C     37°C    Interpolated fan control profile (40%)                                             Disabled   -
27-07-2023 03:52:54   25°C   52°C   60°C     37°C    Interpolated fan control profile (50%)                                             Disabled   -
27-07-2023 03:53:54   25°C   50°C   55°C     37°C    Interpolated fan control profile (40%)                                             Disabled   -
27-07-2023 03:54:54   25°C   48°C   53°C     37°C    Interpolated fan control profile (36%)                                             Disabled   -
27-07-2023 03:55:54   25°C   48°C   52°C     37°C    Interpolated fan control profile (34%)                                             Disabled   -
27-07-2023 03:56:54   25°C   48°C   52°C     37°C    Interpolated fan control profile (34%)                                             Disabled   -
27-07-2023 03:57:54   25°C   48°C   52°C     37°C    Interpolated fan control profile (34%)                                             Disabled   -
27-07-2023 03:58:54   25°C   48°C   52°C     37°C    Interpolated fan control profile (34%)                                             Disabled   -
27-07-2023 03:59:55   25°C   48°C   51°C     37°C    Interpolated fan control profile (32%)                                             Disabled   -
                     ------- Temperatures -------
    Date & time      Inlet  CPU 1  CPU 2  Exhaust          Active fan speed profile          Third-party PCIe card Dell default cooling response  Comment
27-07-2023 04:00:54   25°C   48°C   52°C     37°C    Interpolated fan control profile (34%)                                             Disabled   -
27-07-2023 04:01:54   25°C   50°C   55°C     37°C    Interpolated fan control profile (40%)                                             Disabled   -
27-07-2023 04:02:57   25°C   57°C   60°C     37°C    Interpolated fan control profile (50%)                                             Disabled   -

It seems like the algorithm is sensitive to the CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION value. CPU temps are much more stable and so are fan speeds. I plan to use this over the default static single fan speed. I'll leave it running and maybe try to do some stress tests.
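One way to quantify what changes with CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION is the slope of the line, (HIGH_FAN_SPEED - FAN_SPEED) / (CPU_TEMPERATURE_THRESHOLD - start). The sketch below computes it for the two configurations above; `slope_tenths_per_degree` is a hypothetical helper. Raising the start from 30 to 40 makes the line steeper, so each degree of jitter moves the fans more. Note that the calmer second log also coincides with CHECK_INTERVAL going from 5 to 60 seconds, which this arithmetic does not capture.

```shell
#!/usr/bin/env bash
# Fan-speed change per °C inside the interpolation range, in tenths of a
# percent so integer arithmetic can be used.
# Arguments: FAN_SPEED HIGH_FAN_SPEED START_TEMPERATURE THRESHOLD
slope_tenths_per_degree() {
  echo $(( 10 * ($2 - $1) / ($4 - $3) ))
}

first_run=$(slope_tenths_per_degree 10 50 30 60)   # start at 30 °C
second_run=$(slope_tenths_per_degree 10 50 40 60)  # start at 40 °C

printf 'start 30 °C: %d.%d %% per °C\n' $(( first_run / 10 )) $(( first_run % 10 ))
printf 'start 40 °C: %d.%d %% per °C\n' $(( second_run / 10 )) $(( second_run % 10 ))
```

This prints roughly 1.3 % per °C for the first configuration and 2.0 % per °C for the second.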

Haven't had a chance to look at the code yet, been looking at code all day already.

May have spoken too soon, as I was writing this the CPU temp spiked again. I think there's either something wrong with my CPUs or with the temp readings. Let me revert back to the original docker image and see if the temperature readings are stable.

@kyle-blair

kyle-blair commented Jul 27, 2023

Yeah, my CPU temps are just jumpy for some reason. I reverted to the original docker image/settings and same behavior. Also verified the same temps in the idrac browser interface. Looks like I'll be redoing my CPU heatsinks.

Edit:
Turns out my computer had started a backup to TrueNAS, so that virtual machine was jumping between 20% and 100% CPU utilization across 4 cores, which likely contributed to the jumpy readings. Sorry, I'm just getting my first homelab set up and getting used to things. Because the fan response is jumpy under load, I think PID just might be the right answer (unless it's a heatsink issue).

@7Adrian
Author

7Adrian commented Jul 27, 2023

@kyle-blair What's your CPU TDP? In your case, I think new high-grade thermal paste could help: your exhaust temperature doesn't change much at different fan speeds, so it looks like the air isn't picking up heat from the CPU.
In your case (for now), I would probably experiment with something like:

ENABLE_LINE_INTERPOLATION=true
HIGH_FAN_SPEED=60
CPU_TEMPERATURE_FOR_START_LINE_INTERPOLATION=35
FAN_SPEED=15
CPU_TEMPERATURE_THRESHOLD=60
CHECK_INTERVAL=1

A short check interval should protect your CPU from overheating, and you definitely need a higher HIGH_FAN_SPEED if you get Dell fan control from time to time.

For the PID controller I mentioned above, I'm planning to start a new project from scratch, mostly in C++.

@tigerblue77 tigerblue77 force-pushed the master branch 2 times, most recently from 0568f14 to ae625ab Compare December 18, 2024 20:55
@tigerblue77
Owner

tigerblue77 commented Dec 18, 2024

@7Adrian I'm not a lawyer, but from my point of view it only concerns the application (here the Docker container) and its use. In concrete terms, I don't want the work of the people who participate in this repository to be taken over, copied, repackaged or reused in any way whatsoever for profit. As far as I'm concerned, I won't rule out adding a donation link to the README.md file in the future, but I want this tool to remain free 😊

I'm stopping my work here for today; here are a few notes:

  • I rebased a few times
  • I took out some code which was not linked to the feature to make reviews easier (some of the code I already added to the master branch and another part I put in a branch of your fork)
  • I still need to :
    • review the main functions
    • check that we don't do any useless hex/dec conversions by passing the wrong variable to a function
  • then I will merge
  • then I'd like to :
    • improve the code structure; I think it's getting hard to understand. Maybe use more functions? Maybe a UI_function file for all printf functions and another functions file for the "backend", let's say. Splitting basic mode and interpolation mode into two script files doesn't seem like a good idea as long as they share most of the code.
    • Add 4 CPU support for buddies like @ctark
    • Add unit tests, based on the work you started and I put aside in a branch of your fork

Any help is welcome ! Try to split in different branches/PR when possible :)
See you !

@jcastro

jcastro commented Dec 28, 2024

Hi @tigerblue77, I'm going to get an R740 on lease to create another episode of my Superserver series, and I just wanted to ask whether you want me to test anything. I know iDRAC 9 up to the .30 version is okay, but I thought I should ask in case you need testing or feedback! Thanks for your great work; I use it every day <3

@tigerblue77
Owner

Hello @jcastro,
Thanks a lot for your message, videos and test offers. I'd be happy if some of you could confirm that my changes to @7Adrian's fork code didn't break anything. I can't test on my side at the moment, as I'm not at home for the holidays and I'm working on a lot of non-IT stuff (new position + leaving the current one, signing several real estate investments launched in the last 6 months, etc...).
For future posts not directly related to this PR, please open a dedicated "discussion" or "issue" 😊
I wish you all good times with your relatives 🎄🎅🎆

@jcastro

jcastro commented Dec 29, 2024

For sure! I'm just a bit lost on which Docker image from @7Adrian's fork I need to test. Thanks!

@7Adrian
Author

7Adrian commented Dec 29, 2024

I'm going to test the changes, and I'll publish an image to the Docker repository with a different tag once I've finished my tests.

Edit:
@tigerblue77 there are breaking changes: "/!\ Your server isn't a Dell product. Exiting.". I will try to fix it.

@tigerblue77
Owner

tigerblue77 commented Dec 29, 2024

@7Adrian Oops, sorry about that...
I'm free right now; if you want, you can contact me on Discord to work on this (tigerblue77#XXXX)

@tigerblue77 tigerblue77 force-pushed the master branch 4 times, most recently from fa62b41 to 931c80c Compare December 29, 2024 17:18
Labels
enhancement (New feature or request), good first issue (Good for newcomers)

Successfully merging this pull request may close these issues:

2nd Threshold?