Power On on MSI z690 ddr5 not powering on reliably #603

Open
philipandag opened this issue Nov 21, 2024 · 2 comments
Labels
bug Something isn't working

Comments

@philipandag
Contributor

philipandag commented Nov 21, 2024

Device

MSI z690 ddr5

RTE version

OSFV version

540-reset-to-defaults-restore-serial

Affected component(s) or functionality

dasharo-compatibility/cpu-core-count.robot: CCC test cases. Most likely any other test case that reboots the platform multiple times is affected too.

Brief summary

The platform sometimes simply does not power on after the Power On keyword.

How reproducible

~50% for a single test case. A full run of the CCC suite will almost certainly hit at least one failure.
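
As a rough check on the suite-level estimate: if each power-on attempt fails independently with probability p, the chance of at least one failure in n attempts is 1 - (1 - p)^n. The sketch below takes p = 0.5 from the figure above. The attempt counts are hypothetical, since the number of power cycles in the CCC suite is not stated here, and independence between attempts is an assumption.

```python
def prob_at_least_one_failure(p_fail: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent power-ons fails."""
    if not 0.0 <= p_fail <= 1.0:
        raise ValueError("p_fail must be between 0 and 1")
    if attempts < 0:
        raise ValueError("attempts must be non-negative")
    return 1.0 - (1.0 - p_fail) ** attempts


if __name__ == "__main__":
    # p_fail = 0.5 is the rough per-test failure rate reported above;
    # the attempt counts are hypothetical.
    for n in (1, 3, 5, 10):
        print(f"{n:2d} attempts: {prob_at_least_one_failure(0.5, n):.3f}")
```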

How to reproduce

Run the CCC test suite

Expected behavior

The platform should always power on after the Power On keyword, as multiple tests rely on that.

Actual behavior

It sometimes simply does not power on. In the failed test cases the platform remains off after the Power On keyword, as determined by the lack of video output on PiKVM. When the keyword fails to power the platform on, powering it on manually with osfv_cli rte pwr on makes the platform boot normally and the tests continue.
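
The manual recovery described above could in principle be automated as a retry loop around the power-on step. The following is only a sketch: `power_on` and `is_powered` are hypothetical callables supplied by the caller (for example, wrappers around the RTE power control and a video or serial check). They are not existing OSFV functions, and the attempt count and settle time are placeholder values.

```python
import time
from typing import Callable


def power_on_with_retry(
    power_on: Callable[[], None],
    is_powered: Callable[[], bool],
    attempts: int = 3,
    settle_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call power_on until is_powered reports True; return attempts used.

    Both callables are hypothetical hooks supplied by the caller.
    Raises RuntimeError if the platform is still off after all attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        power_on()
        sleep(settle_seconds)  # give the platform time to react
        if is_powered():
            return attempt
    raise RuntimeError(f"platform did not power on after {attempts} attempts")
```

A retry like this would hide the underlying timing problem rather than fix it, so it is probably more useful for collecting data (how many attempts were needed) than as a permanent change to the keyword.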

Link to screenshots or logs

Logs from two runs of the CCC test:
cpu-cores-count.robot_log.zip

Additional context

Looking at the implementation of the keyword in msi-z690-common, I have no clue why it wouldn't work. Maybe the sleep times are just too short?

Power On
    [Documentation]    Keyword clears telnet buffer and sets Device Under Test
    ...    into Power On state using RTE OC buffers. Implementation
    ...    must be compatible with the theory of operation of a
    ...    specific platform.
    Restore Initial DUT Connection Method
    IF    '${DUT_CONNECTION_METHOD}' == 'SSH'    RETURN
    Sleep    2s
    Rte Power Off    ${6}
    Sleep    5s
    # read the old output
    Telnet.Read
    Rte Power On

Solutions you've tried

No response

@philipandag philipandag added the bug Something isn't working label Nov 21, 2024
@philipandag
Contributor Author

I re-ran the suite with the sleep durations in the Power On keyword increased significantly (10s and 15s), and the whole suite passed. Solving the issue is just a matter of choosing less overshot sleep durations.
cpu-cores-count.robot_log.zip
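
One way to narrow down the sleep durations would be to try candidate values from longest to shortest and keep the shortest one that still passes a number of consecutive trials. This is a sketch under assumptions: `run_trial` is a hypothetical callable that runs one power cycle with the given sleep value and returns whether the platform came up, and the search assumes that a longer sleep is never less reliable than a shorter one.

```python
from typing import Callable, Optional, Sequence


def shortest_reliable_sleep(
    candidates: Sequence[float],
    run_trial: Callable[[float], bool],
    trials_per_candidate: int = 10,
) -> Optional[float]:
    """Return the shortest candidate that passes every trial, or None.

    Candidates are tried from longest to shortest, and the search stops
    at the first candidate with a failed trial.
    """
    best = None
    for seconds in sorted(candidates, reverse=True):
        # all() stops at the first failed trial for this candidate.
        if all(run_trial(seconds) for _ in range(trials_per_candidate)):
            best = seconds
        else:
            break
    return best
```

Because the failure is intermittent, a small number of passing trials does not prove a value is safe, so some margin on top of the returned value would still be prudent.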

@miczyg1
Contributor

miczyg1 commented Nov 22, 2024

Looks related/similar: #578
