#44 Line interpolation for fan speed. #60

Draft
wants to merge 46 commits into
base: master
fb3e72e
Adding optional line interpolation to dynamically control fan speed.
7Adrian Dec 18, 2024
e975b11
Update README.md
7Adrian Dec 18, 2024
8a93f90
Change checking order.
7Adrian Dec 18, 2024
b2389e2
Minor improvements and variables renaming
tigerblue77 Dec 18, 2024
6c82e89
Minor fixes (thanks @CommanderBubble)
tigerblue77 Dec 18, 2024
534b478
Removed useless quotes
tigerblue77 Dec 18, 2024
217902e
Set some variables readonly
tigerblue77 Dec 18, 2024
81f3ae3
Improved fan speed interpolation check
tigerblue77 Dec 18, 2024
01a9508
Improved code structure
tigerblue77 Dec 18, 2024
ee390e1
Improved code structure
tigerblue77 Dec 18, 2024
6d133a4
Renamed apply_fan_speed_interpolation_fan_control_profile function
tigerblue77 Dec 18, 2024
3b794f7
temp
tigerblue77 Dec 18, 2024
1cd2832
Adding options for 14gen Dell servers via env var
Dec 18, 2024
fcc8bfb
Cant start env variable with numbers
Dec 18, 2024
752cfa0
Renamed variable
tigerblue77 Dec 18, 2024
316ed48
Added automatic Gen 14 + check
tigerblue77 Dec 18, 2024
78e1b50
Minor code structure improvements
tigerblue77 Dec 18, 2024
4d9594d
Adding optional line interpolation to dynamically control fan speed.
7Adrian Dec 18, 2024
404b0f9
Merge. Minor improvements and variables renaming
tigerblue77 Dec 18, 2024
a32f5e4
Minor fixes (thanks @CommanderBubble)
tigerblue77 Dec 18, 2024
849414e
Merge. Improved code structure
tigerblue77 Dec 18, 2024
86cca58
Add log for gracefully exit.
Dec 18, 2024
1d69e28
Remove accidental duplicate of convert_decimal_value_to_hexadecimal
Dec 18, 2024
e016981
Create new function `apply_fan_control_to_specified_value`. Function …
Dec 18, 2024
2ea2650
Refactor - remove commented code.
Dec 18, 2024
8c5cf8b
Move line interpolation calculations to functions.sh. Add print funct…
Dec 18, 2024
154bc08
Refactor - remove commented code.
Dec 18, 2024
98a0895
Adding "function" before name "calculate_interpolated_fan_speed" for …
Dec 18, 2024
f539469
Add clamping for calculate_interpolated_fan_speed.
Dec 18, 2024
cc4b91a
Minor improvements
tigerblue77 Dec 18, 2024
939c0f2
Added max function
tigerblue77 Dec 18, 2024
a39e9b1
Variable renaming
tigerblue77 Dec 18, 2024
5847015
Code refactor and improvement
tigerblue77 Dec 18, 2024
3894c16
Created convert_hexadecimal_value_to_decimal function
tigerblue77 Dec 18, 2024
8e684c4
Minor improvements
tigerblue77 Dec 18, 2024
bc401ed
Added pre check
tigerblue77 Dec 29, 2024
e5c1793
minor improvement
tigerblue77 Dec 29, 2024
96adff1
Variable renaming
tigerblue77 Dec 29, 2024
2a430ca
v1
tigerblue77 Dec 29, 2024
06dd51e
v2
tigerblue77 Dec 29, 2024
0ae1f4d
v3
tigerblue77 Dec 29, 2024
14571bf
v4
tigerblue77 Dec 29, 2024
dda1646
v5
tigerblue77 Dec 29, 2024
f4ebd60
v6
tigerblue77 Dec 29, 2024
5de4b30
v7
tigerblue77 Dec 29, 2024
f204fd9
temp
tigerblue77 Dec 29, 2024
90 changes: 71 additions & 19 deletions Dell_iDRAC_fan_controller.sh
@@ -15,11 +15,38 @@ trap 'graceful_exit' SIGINT SIGQUIT SIGTERM

# Check if FAN_SPEED variable is in hexadecimal format. If not, convert it to hexadecimal
if [[ $FAN_SPEED == 0x* ]]; then
readonly DECIMAL_FAN_SPEED=$(printf '%d' $FAN_SPEED)
readonly HEXADECIMAL_FAN_SPEED=$FAN_SPEED
readonly DECIMAL_LOW_FAN_SPEED_OBJECTIVE=$(convert_hexadecimal_value_to_decimal "$FAN_SPEED")
# Unused
# readonly HEXADECIMAL_FAN_SPEED=$FAN_SPEED
else
readonly DECIMAL_FAN_SPEED=$FAN_SPEED
readonly HEXADECIMAL_FAN_SPEED=$(convert_decimal_value_to_hexadecimal $FAN_SPEED)
readonly DECIMAL_LOW_FAN_SPEED_OBJECTIVE=$FAN_SPEED
# Unused
# readonly HEXADECIMAL_FAN_SPEED=$(convert_decimal_value_to_hexadecimal "$FAN_SPEED")
fi

# Check if fan speed interpolation is enabled
if [ -z "$HIGH_FAN_SPEED" ] || [ -z "$CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION" ] || [ "$CPU_TEMPERATURE_THRESHOLD" -eq "$CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION" ]; then
readonly FAN_SPEED_INTERPOLATION_ENABLED=false

# Set these variables to the same values as the user fan control profile
readonly HIGH_FAN_SPEED=$FAN_SPEED
readonly CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION=$CPU_TEMPERATURE_THRESHOLD
elif [[ "$FAN_SPEED" -gt "$HIGH_FAN_SPEED" ]]; then
echo 'Error: $FAN_SPEED must be less than or equal to $HIGH_FAN_SPEED. Exiting.'
exit 1
else
readonly FAN_SPEED_INTERPOLATION_ENABLED=true
fi

# Check if HIGH_FAN_SPEED variable is in hexadecimal format. If not, convert it to hexadecimal
if [[ $HIGH_FAN_SPEED == 0x* ]]; then
readonly DECIMAL_HIGH_FAN_SPEED_OBJECTIVE=$(convert_hexadecimal_value_to_decimal "$HIGH_FAN_SPEED")
# Unused
# readonly HEXADECIMAL_HIGH_FAN_SPEED=$HIGH_FAN_SPEED
else
readonly DECIMAL_HIGH_FAN_SPEED_OBJECTIVE=$HIGH_FAN_SPEED
# Unused
# readonly HEXADECIMAL_HIGH_FAN_SPEED=$(convert_decimal_value_to_hexadecimal "$HIGH_FAN_SPEED")
fi

# Check if the iDRAC host is set to 'local' or not then set the IDRAC_LOGIN_STRING accordingly
@@ -45,23 +72,34 @@ fi

# If server model is Gen 14 (*40) or newer
if [[ $SERVER_MODEL =~ .*[RT][[:space:]]?[0-9][4-9]0.* ]]; then
DELL_POWEREDGE_GEN_14_OR_NEWER=true
CPU1_TEMPERATURE_INDEX=2
CPU2_TEMPERATURE_INDEX=4
readonly DELL_POWEREDGE_GEN_14_OR_NEWER=true
readonly CPU1_TEMPERATURE_INDEX=2
readonly CPU2_TEMPERATURE_INDEX=4
else
DELL_POWEREDGE_GEN_14_OR_NEWER=false
CPU1_TEMPERATURE_INDEX=1
CPU2_TEMPERATURE_INDEX=2
readonly DELL_POWEREDGE_GEN_14_OR_NEWER=false
readonly CPU1_TEMPERATURE_INDEX=1
readonly CPU2_TEMPERATURE_INDEX=2
fi

# Log main information
echo "Server model: $SERVER_MANUFACTURER $SERVER_MODEL"
echo "iDRAC/IPMI host: $IDRAC_HOST"

# Log the fan speed objective, CPU temperature threshold and check interval
echo "Fan speed objective: $DECIMAL_FAN_SPEED%"
echo "CPU temperature threshold: $CPU_TEMPERATURE_THRESHOLD°C"
# Log the check interval, fan speed objective and CPU temperature threshold
echo "Check interval: ${CHECK_INTERVAL}s"
echo "Fan speed interpolation enabled: $FAN_SPEED_INTERPOLATION_ENABLED"
if $FAN_SPEED_INTERPOLATION_ENABLED; then
echo "Fan speed lower value: $DECIMAL_LOW_FAN_SPEED_OBJECTIVE%"
echo "Fan speed higher value: $DECIMAL_HIGH_FAN_SPEED_OBJECTIVE%"
echo "CPU lower temperature threshold: $CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION°C"
echo "CPU higher temperature threshold: $CPU_TEMPERATURE_THRESHOLD°C"
echo ""
# Print interpolated fan speeds for demonstration
print_interpolated_fan_speeds "$CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION" "$CPU_TEMPERATURE_THRESHOLD" "$DECIMAL_LOW_FAN_SPEED_OBJECTIVE" "$DECIMAL_HIGH_FAN_SPEED_OBJECTIVE"
else
echo "Fan speed objective: $DECIMAL_LOW_FAN_SPEED_OBJECTIVE%"
echo "CPU temperature threshold: $CPU_TEMPERATURE_THRESHOLD°C"
fi
echo ""

# Define the interval for printing
@@ -73,7 +111,7 @@ IS_DELL_FAN_CONTROL_PROFILE_APPLIED=true
# Check present sensors
IS_EXHAUST_TEMPERATURE_SENSOR_PRESENT=true
IS_CPU2_TEMPERATURE_SENSOR_PRESENT=true
retrieve_temperatures $IS_EXHAUST_TEMPERATURE_SENSOR_PRESENT $IS_CPU2_TEMPERATURE_SENSOR_PRESENT
retrieve_temperatures "$IS_EXHAUST_TEMPERATURE_SENSOR_PRESENT" "$IS_CPU2_TEMPERATURE_SENSOR_PRESENT"
if [ -z "$EXHAUST_TEMPERATURE" ]; then
echo "No exhaust temperature sensor detected."
IS_EXHAUST_TEMPERATURE_SENSOR_PRESENT=false
@@ -93,35 +131,49 @@ while true; do
sleep $CHECK_INTERVAL &
SLEEP_PROCESS_PID=$!

retrieve_temperatures $IS_EXHAUST_TEMPERATURE_SENSOR_PRESENT $IS_CPU2_TEMPERATURE_SENSOR_PRESENT
retrieve_temperatures "$IS_EXHAUST_TEMPERATURE_SENSOR_PRESENT" "$IS_CPU2_TEMPERATURE_SENSOR_PRESENT"

# Initialize a variable to store the comments displayed when the fan control profile changed
COMMENT=" -"
# Check if CPU 1 is overheating then apply Dell default dynamic fan control profile if true
if CPU1_OVERHEAT; then
if CPU1_OVERHEATING; then
apply_Dell_fan_control_profile

if ! $IS_DELL_FAN_CONTROL_PROFILE_APPLIED; then
IS_DELL_FAN_CONTROL_PROFILE_APPLIED=true

# If CPU 2 temperature sensor is present, check if it is overheating too.
# Do not apply Dell default dynamic fan control profile as it has already been applied before
if $IS_CPU2_TEMPERATURE_SENSOR_PRESENT && CPU2_OVERHEAT; then
if $IS_CPU2_TEMPERATURE_SENSOR_PRESENT && CPU2_OVERHEATING; then
COMMENT="CPU 1 and CPU 2 temperatures are too high, Dell default dynamic fan control profile applied for safety"
else
COMMENT="CPU 1 temperature is too high, Dell default dynamic fan control profile applied for safety"
fi
fi
# If CPU 2 temperature sensor is present, check if it is overheating then apply Dell default dynamic fan control profile if true
elif $IS_CPU2_TEMPERATURE_SENSOR_PRESENT && CPU2_OVERHEAT; then
elif $IS_CPU2_TEMPERATURE_SENSOR_PRESENT && CPU2_OVERHEATING; then
apply_Dell_fan_control_profile

if ! $IS_DELL_FAN_CONTROL_PROFILE_APPLIED; then
IS_DELL_FAN_CONTROL_PROFILE_APPLIED=true
COMMENT="CPU 2 temperature is too high, Dell default dynamic fan control profile applied for safety"
fi
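# Group the CPU 2 checks: in Bash, || and && have equal precedence and are evaluated left to right
elif CPU1_HEATING || { $IS_CPU2_TEMPERATURE_SENSOR_PRESENT && CPU2_HEATING; }; then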
HIGHEST_CPU_TEMPERATURE=$CPU1_TEMPERATURE
if $IS_CPU2_TEMPERATURE_SENSOR_PRESENT; then
HIGHEST_CPU_TEMPERATURE=$(max $CPU1_TEMPERATURE $CPU2_TEMPERATURE)
fi

# F1 - lower fan speed
# F2 - higher fan speed
# T_CPU - highest temperature of both CPUs (if only one exists that will be CPU1 temp value)
# T1 - lower temperature threshold
# T2 - higher temperature threshold
# Fan speed = F1 + ( ( F2 - F1 ) * ( T_CPU - T1 ) / ( T2 - T1 ) )
# Multiply before dividing: Bash arithmetic is integer-only, so dividing first would truncate the ratio to 0
DECIMAL_FAN_SPEED_TO_APPLY=$((DECIMAL_LOW_FAN_SPEED_OBJECTIVE + (DECIMAL_HIGH_FAN_SPEED_OBJECTIVE - DECIMAL_LOW_FAN_SPEED_OBJECTIVE) * (HIGHEST_CPU_TEMPERATURE - CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION) / (CPU_TEMPERATURE_THRESHOLD - CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION)))
apply_user_fan_control_profile 2 $DECIMAL_FAN_SPEED_TO_APPLY
else
apply_user_fan_control_profile
apply_user_fan_control_profile 1 $DECIMAL_LOW_FAN_SPEED_OBJECTIVE

# Check if user fan control profile is applied then apply it if not
if $IS_DELL_FAN_CONTROL_PROFILE_APPLIED; then
4 changes: 3 additions & 1 deletion Dockerfile
@@ -22,7 +22,9 @@ ENV IDRAC_HOST local
# ENV IDRAC_USERNAME root
# ENV IDRAC_PASSWORD calvin
ENV FAN_SPEED 5
ENV CPU_TEMPERATURE_THRESHOLD 50
ENV HIGH_FAN_SPEED 40
ENV CPU_TEMPERATURE_THRESHOLD 60
ENV CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION 50
ENV CHECK_INTERVAL 60
ENV DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE false
ENV KEEP_THIRD_PARTY_PCIE_CARD_COOLING_RESPONSE_STATE_ON_EXIT false
38 changes: 35 additions & 3 deletions README.md
@@ -84,6 +84,8 @@ docker run -d \
-e CPU_TEMPERATURE_THRESHOLD=<decimal temperature threshold> \
-e CHECK_INTERVAL=<seconds between each check> \
-e DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE=<true or false> \
-e CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION=<decimal temperature lower threshold> \
-e HIGH_FAN_SPEED=<decimal or hexadecimal fan speed> \
-e KEEP_THIRD_PARTY_PCIE_CARD_COOLING_RESPONSE_STATE_ON_EXIT=<true or false> \
--device=/dev/ipmi0:/dev/ipmi0:rw \
tigerblue77/dell_idrac_fan_controller:latest
@@ -102,6 +104,8 @@ docker run -d \
-e CPU_TEMPERATURE_THRESHOLD=<decimal temperature threshold> \
-e CHECK_INTERVAL=<seconds between each check> \
-e DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE=<true or false> \
-e CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION=<decimal temperature lower threshold> \
-e HIGH_FAN_SPEED=<decimal or hexadecimal fan speed> \
-e KEEP_THIRD_PARTY_PCIE_CARD_COOLING_RESPONSE_STATE_ON_EXIT=<true or false> \
tigerblue77/dell_idrac_fan_controller:latest
```
@@ -124,6 +128,8 @@ services:
- CPU_TEMPERATURE_THRESHOLD=<decimal temperature threshold>
- CHECK_INTERVAL=<seconds between each check>
- DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE=<true or false>
- CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION=<decimal temperature lower threshold>
- HIGH_FAN_SPEED=<decimal or hexadecimal fan speed>
- KEEP_THIRD_PARTY_PCIE_CARD_COOLING_RESPONSE_STATE_ON_EXIT=<true or false>
devices:
- /dev/ipmi0:/dev/ipmi0:rw
@@ -147,6 +153,8 @@ services:
- CPU_TEMPERATURE_THRESHOLD=<decimal temperature threshold>
- CHECK_INTERVAL=<seconds between each check>
- DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE=<true or false>
- CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION=<decimal temperature lower threshold>
- HIGH_FAN_SPEED=<decimal or hexadecimal fan speed when interpolation enabled>
- KEEP_THIRD_PARTY_PCIE_CARD_COOLING_RESPONSE_STATE_ON_EXIT=<true or false>
```

@@ -161,19 +169,43 @@ All parameters are optional as they have default values (including default iDRAC
- `IDRAC_USERNAME` parameter is only necessary if you're addressing a distant iDRAC. **Default** value is "root".
- `IDRAC_PASSWORD` parameter is only necessary if you're addressing a distant iDRAC. **Default** value is "calvin".
- `FAN_SPEED` parameter is the speed you want to set the fans to, as a decimal (from 0 to 100%) or hexadecimal (from 0x00 to 0x64) value. **Default** value is 5(%).
- `CPU_TEMPERATURE_THRESHOLD` parameter is the T°junction (junction temperature) threshold beyond which the Dell fan mode defined in your BIOS will become active again (to protect the server hardware against overheat). **Default** value is 50(°C).
- `CPU_TEMPERATURE_THRESHOLD` parameter is the T°junction (junction temperature) threshold beyond which the Dell fan mode defined in your BIOS will become active again (to protect the server hardware against overheat). **Default** value is 60(°C).
- `CHECK_INTERVAL` parameter is the time (in seconds) between each temperature check and potential profile change. **Default** value is 60(s).
- `DISABLE_THIRD_PARTY_PCIE_CARD_DELL_DEFAULT_COOLING_RESPONSE` parameter is a boolean that disables the third-party PCIe card Dell default cooling response when set to true. **Default** value is false.

If you want to enable fan speed interpolation, add the following parameters:
- `CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION` parameter is the lower temperature threshold above which fan speed interpolation starts. Between this threshold and **CPU_TEMPERATURE_THRESHOLD**, the fan speed increases linearly from **FAN_SPEED** toward **HIGH_FAN_SPEED**. This parameter must be less than or equal to **CPU_TEMPERATURE_THRESHOLD**; if both are equal, interpolation is disabled. **Default** value is 50(°C).
- `HIGH_FAN_SPEED` parameter is the fan speed that is reached at `CPU_TEMPERATURE_THRESHOLD` when interpolation mode is enabled. In other words, it defines the maximum fan speed before switching back to the Dell default dynamic fan control profile (see `CPU_TEMPERATURE_THRESHOLD` parameter). It must be greater than or equal to `FAN_SPEED`. **Default** value is 40(%).

Example of how interpolation works:
- `FAN_SPEED` = 10
- `HIGH_FAN_SPEED` = 50
- `CPU_TEMPERATURE_THRESHOLD_FOR_FAN_SPEED_INTERPOLATION` = 30
- `CPU_TEMPERATURE_THRESHOLD` = 70

| CPU temperature | Fan speed |
| --------------- | ---------------------------------------- |
| 15 °C | 10 % |
| 30 °C | 10 % |
| 35 °C | 15 % |
| 50 °C | 30 % |
| 69 °C | 49 % |
| 70 °C | Dell default dynamic fan control profile |
| 80 °C | Dell default dynamic fan control profile |
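
The table rows below 70 °C can be reproduced with a few lines of shell. This is only a sketch of the linear formula, not the script's own code: the function name `interpolate_fan_speed` and its argument order are assumptions made for illustration, and the upper clamp stands in for the point where the real script hands control back to the Dell default dynamic fan control profile.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the project): linear interpolation using
# integer arithmetic only.
# Arguments: T_CPU T1 T2 F1 F2
interpolate_fan_speed() {
  local t_cpu="$1" t1="$2" t2="$3" f1="$4" f2="$5"

  # At or below the lower threshold, keep the low fan speed
  if [ "$t_cpu" -le "$t1" ]; then
    echo "$f1"
    return
  fi

  # At or above the upper threshold, clamp to the high fan speed
  # (the real script applies the Dell profile at this point instead)
  if [ "$t_cpu" -ge "$t2" ]; then
    echo "$f2"
    return
  fi

  # Multiply before dividing so integer division does not truncate the ratio to 0
  echo $(( f1 + (f2 - f1) * (t_cpu - t1) / (t2 - t1) ))
}

# Same parameters as the example above: T1=30, T2=70, F1=10, F2=50
for temperature in 15 30 35 50 69; do
  echo "${temperature} °C -> $(interpolate_fan_speed "$temperature" 30 70 10 50) %"
done
```

Because shell arithmetic is integer-only, the order of operations matters: dividing `(T_CPU - T1)` by `(T2 - T1)` first would yield 0 for every temperature below the upper threshold.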

When using fan speed interpolation, we recommend decreasing **CHECK_INTERVAL**, for example "3" (seconds), to avoid the noise nuisance associated with a sudden increase in fan speed.
- `KEEP_THIRD_PARTY_PCIE_CARD_COOLING_RESPONSE_STATE_ON_EXIT` parameter is a boolean that keeps the third-party PCIe card Dell default cooling response state upon exit when set to true. **Default** value is false, so that the third-party PCIe card cooling response is reset to the Dell default on exit.

<p align="right">(<a href="#top">back to top</a>)</p>

<!-- TROUBLESHOOTING -->
## Troubleshooting

If your server frequently switches back to the default Dell fan mode:
If your server frequently switches back to the Dell default dynamic fan control profile:
1. Check `Tcase` (case temperature) of your CPU on Intel Ark website and then set `CPU_TEMPERATURE_THRESHOLD` to a slightly lower value. Example with my CPUs ([Intel Xeon E5-2630L v2](https://www.intel.com/content/www/us/en/products/sku/75791/intel-xeon-processor-e52630l-v2-15m-cache-2-40-ghz/specifications.html)) : Tcase = 63°C, I set `CPU_TEMPERATURE_THRESHOLD` to 60(°C).
2. If it's already good, adapt your `FAN_SPEED` value to increase the airflow and thus further decrease the temperature of your CPU(s)
2. If it's already good, either:
- adapt your `FAN_SPEED` value to increase the airflow and thus further decrease the temperature of your CPU(s)
- enable and experiment with fan speed interpolation mode
3. If neither increasing the fan speed nor increasing the threshold solves your problem, then it may be time to replace your thermal paste

<p align="right">(<a href="#top">back to top</a>)</p>