Linkstation LS-WVL is unreachable when booted with daily images #17
I should add the following observations. If Debian Stretch is installed on a Linkstation using images from http://ftp.debian.org/debian/dists/stable/main/installer-armel/current/images/kirkwood/network-console/buffalo/ls-wvl/ and then dist-upgraded to Buster, the Linkstation becomes unreachable after a reboot. After the upgrade but before the reboot, I edited /etc/rsyslog.conf so that kernel logs would be written to /var/log/dmesg.log and could be analysed if the Linkstation did not come online. Well, the Linkstation did not come online. At that point, I powered off the Linkstation, took out the disk, connected it to my workstation and examined the contents of /var/log/. No log files had been written during the reboot after the upgrade; the only file in /var/log/ that appeared to have been touched on reboot was /var/log/wtmp. It appears that the current Debian version in trunk has an issue that bricks the Linkstation on reboot. The last Debian kernel I was able to boot my Linkstation into successfully was 4.14.0-3 (circa February 2018). Note to all current users of OpenLinkstation: if you installed Debian from daily images per the official instructions in this Git repo, do NOT upgrade Debian to the current version in trunk (kernel 4.16), as it will likely brick your Linkstation, at least until the problem is resolved.
@val-kulkov thanks for your report! I tried upgrading my Stretch box to the latest 4.16 kernel from backports, and it boots fine.
I guess it's just a d-i image issue.
@rogers0 : thank you very much for looking into it! I hope it is just the d-i image issue, as you indicated. There may be something else, however. I tried running the same upgrade using a very old HDD and, unexpectedly, the upgrade completed successfully! Then I tried the other HDDs and got the same result as before:
Hitachi HDS725050KLA360, 500 GB (Feb 2006): success
Could it be some race condition on the first boot after the upgrade? I look forward to testing your update. Hopefully it is just the d-i image issue. Thank you!
I just checked the daily images to see if the problem is still there. I ran "sudo parted /dev/sdi" and then:
After that:
Then I insert the prepared disk into the LinkStation and turn the LinkStation on. The LED lights come on green on the network hub, indicating a gigabit link. However, the Linkstation does not even respond to ping requests:
I am going to leave it powered on for a few hours and see if anything changes. If it starts responding to ping requests, I'll post an update here.
Is there any news? I have that problem too.
Greetings from a thankful user! I experienced a similar issue with the Debian Installer daily D-I builds last week. In my case, I wanted to run Stretch, and it is running, flawlessly if I may add! 👍 I presume this might be related to something on the … Please don't hesitate to contact me for any kind of tests you might want to run, @rogers0. I'm running a Kirkwood LS-WXL.
I can confirm that updating to both the latest Debian sid 4.18.0-2-marvell kernel and flash-kernel 3.95 appears to work as expected. Maybe there's an issue with the partitioning? I'll try to build a UART interface to see what U-Boot says about the D-I installation.
Small typo: I'm in fact running an LS-WVL, not an LS-WXL. Sorry about that.
I installed Debian on my Buffalo NAS successfully.
Confirming a successful installation of the latest daily Debian image on the LS-WVL: 4.18.0-2-marvell #1 Debian 4.18.10-2 (2018-11-03) armv5tel GNU/Linux. But there is one very important "BUT": it takes over 40 minutes for the LinkStation to finish booting. Here is the output of "dmesg | tail -12":
It may be that the LinkStation now performs fsck on every reboot. I have no way to confirm that because I cannot set up a serial connection to the LinkStation's serial interface, and the dmesg output does not provide the necessary details. I have two ext3-formatted WD Red 4TB disks assembled as RAID1 (see the details of my partition setup earlier in this thread). The size of my data partition is almost 4TB. If fsck runs on every reboot, that would explain the delay. I should make it clear that the boot delay occurs not only on the first boot after installation, but on every subsequent reboot. Since my LinkStation was not coming online for a long time after Debian installation, it looked to me like the installs were failing. It was only when I got distracted and left the LinkStation powered on for a couple of hours after a Debian install that I discovered it was finally online. It is possible that all my "unsuccessful" attempts to install Debian were in fact successful. I should note that the RAID setup probably makes no difference to the boot delay. When I tried to install Debian on a system with just one WD Red 4TB, the installation process seemed to complete successfully, but the LinkStation did not come online within about 5 minutes, which I took as a sign that the installation had failed. What causes the boot delay is a mystery to me. Again, without a serial connection I can't do much to investigate it. Any suggestions will be much appreciated.
My hunch about fsck was probably wrong. When I ssh into the LS-WVL as root and run fsck on the unmounted 3.6 TB data partition, it takes about 17 seconds to complete. So something else must be causing the 40+ minute boot delay.
When booting with installer@IP …
@nhhuayt : what do you mean by "anything on disk that prevent boot"? A broken MBR? A broken partition?
@val-kulkov, I finally tried the installer image on my LS-WVL box.
I can ping and ssh to the box.
It shouldn't have a broken file system if you use a new HDD or re-partition the whole disk.
@rogers0 : yes, as of 2018-11-16 (maybe earlier; I did not try earlier installer images), ssh installer@ works again. Please see my 2018-11-16 post above. The problem now is that it takes more than 40 minutes to reboot an LS-WVL. I wrote a script that pings the LS-WVL every minute while it reboots, to see how long it takes to come online. My LS-WVL starts responding to ping requests after about 43 minutes. This behaviour is consistent. I wish I had access to the serial console to see what is going on; 'dmesg' does not provide useful information.
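A reboot timer like the one described above could be sketched in Python as follows. This is a minimal sketch, not the script used in this thread: it assumes the Linux `ping` from iputils (`-c` count, `-W` timeout in seconds) and the hostname `linkstation`; the `probe` and `sleep` parameters are there only so the loop can be exercised without a network.

```python
import subprocess
import time
from typing import Callable, Optional


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send one ICMP echo request with the system ping (iputils flags assumed)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def intervals_until_reachable(
    probe: Callable[[], bool],
    interval_s: float = 60.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Probe once per interval; return how many intervals passed before the
    first successful probe, or None if the host never answered."""
    for attempt in range(max_attempts):
        if probe():
            return attempt
        sleep(interval_s)
    return None
```

Started right after power-on, `intervals_until_reachable(lambda: ping_once("linkstation"))` with the default 60-second interval returns roughly the number of minutes the box stayed offline (it ignores the few seconds each ping takes), or `None` after two hours without an answer.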
@val-kulkov, I think a serial console is not easy to set up on the LS-WVL. But you can try netconsole, which lets you get the dmesg output over the network. You need to:
@val-kulkov, I see I already described the netconsole setup in the slides, pages 18 and 19:
@rogers0 : thank you for the reminder about netconsole. I considered using it when I reported the issue, but back then I was unable to log in to the LS-WVL at all, so I could not edit /etc/initramfs-tools/modules and do … With 4.18.0-3-marvell #1 Debian 4.18.20-2 (2018-11-23), I can log in to the LS-WVL and enable netconsole, but I am not getting any output from it at all. Apparently netconsole is loaded too early, when eth0 does not yet exist: see line 164 in the dmesg output. The dmesg output shows that eth0 is created 9 seconds after boot (line 213). However, the Ethernet link is not up until about 44 minutes after boot. During this time, the LED for the LS-WVL's port on the network switch is off and the LS-WVL does not respond to ping requests. I wonder if there is a way to make U-Boot more verbose and capture its output to a log file during the boot process? Alternatively, is it possible to delay netconsole loading by about 10 seconds?
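For reference, the netconsole option follows the kernel's documented syntax `[src-port]@[src-ip]/[<dev>],[tgt-port]@<tgt-ip>/[tgt-macaddr]`. The helper below is a sketch that builds that string, assuming the kernel's default ports 6665 (source) and 6666 (target), and assuming the module name and its option can share one line in /etc/initramfs-tools/modules, as they can in /etc/modules. The IP addresses in the usage note are placeholders.

```python
from typing import Optional


def netconsole_param(
    target_ip: str,
    src_ip: str = "",
    dev: str = "eth0",
    src_port: Optional[int] = 6665,
    target_port: Optional[int] = 6666,
    target_mac: str = "",
) -> str:
    """Build 'netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-mac]'.
    An empty string or None leaves a field blank so the kernel uses its default."""
    src_port_s = "" if src_port is None else str(src_port)
    tgt_port_s = "" if target_port is None else str(target_port)
    return f"netconsole={src_port_s}@{src_ip}/{dev},{tgt_port_s}@{target_ip}/{target_mac}"


def initramfs_modules_line(param: str) -> str:
    """One line for /etc/initramfs-tools/modules: module name, then its option."""
    return f"netconsole {param}"
```

For example, `initramfs_modules_line(netconsole_param("192.168.1.10", src_ip="192.168.1.20"))` yields `netconsole netconsole=6665@192.168.1.20/eth0,6666@192.168.1.10/`. After editing the file, the initramfs has to be rebuilt (`update-initramfs -u`) before the change takes effect.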
It turns out that the clock reset on reboot is the root cause of the boot delay. Since the LinkStation has no hardware RTC, the system clock is set to the Unix epoch on reboot, but early in the boot process systemd advances the clock to the OS build time. Here is the record of it from dmesg:
In my case, the system build time is 'Dec 21 13:53'. Later in the boot process when systemd gets to perform file system checks, the system time is weeks in the past because the NTP time synchronization is performed after the file system checks have been completed. This trips fsck because it checks the drive's superblock last mount time and last write time and finds that both are in the future. fsck sees this as a problem with the file system and therefore decides to perform a full (forced) file system check that takes much more time than a regular file system check. In my case, the full file system check takes 11 minutes for a 1 TB drive and 44 minutes for a 4 TB drive. A regular fsck of the 1 TB drive takes about 6 seconds.
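The sequence described above can be modelled in a few lines. This is a simplified sketch, not systemd's or e2fsck's code: `boot_clock` assumes systemd only ever moves a too-early clock forward to its build time, and `forces_full_check` reduces e2fsck's decision to "a superblock timestamp lies in the future". The `tolerance_s` parameter is a hypothetical stand-in for the clock-skew allowance e2fsck can be configured with, not a real option name.

```python
from datetime import datetime, timezone

UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def boot_clock(rtc_time: datetime, build_time: datetime) -> datetime:
    """Early-boot clock: with no hardware RTC the kernel starts at the epoch,
    and systemd moves the clock forward (never back) to its build time."""
    return max(rtc_time, build_time)


def forces_full_check(
    now: datetime,
    last_mount: datetime,
    last_write: datetime,
    tolerance_s: float = 0.0,
) -> bool:
    """Simplified fsck rule: a superblock timestamp later than the current
    clock (plus an optional tolerance) is treated as an inconsistency."""
    limit = now.timestamp() + tolerance_s
    return last_mount.timestamp() > limit or last_write.timestamp() > limit
```

With a build time of Dec 21 13:53 (the year, 2018, is an assumption) and a superblock last mounted and written on 15 January 2019, `forces_full_check(boot_clock(UNIX_EPOCH, build), last, last)` returns `True`; with a `now` from after NTP synchronization, say 16 January 2019, the same check returns `False`.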
Installing the "fake-hwclock" package seems to solve this problem. How fake-hwclock does it is a bit of a mystery to me, because fake-hwclock.service appears to depend on local-fs.target, which involves running fsck: see the output of … The problem initially reported here, where the LinkStation was unreachable for hours, could be related to fsck, too. Debian Bug report log #878843 described a problem somewhat similar to the one initially reported here.
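The idea behind fake-hwclock can be sketched as follows. This is a simplified model, not the package's actual scripts: it assumes the time is saved as UTC text of the form `YYYY-MM-DD HH:MM:SS`, and that at boot the saved time is applied only if it is later than the current clock. The file path passed to `save_clock`/`load_saved` is hypothetical. The sketch shows why a restored clock avoids "future" superblock timestamps; it does not answer the service-ordering question raised above.

```python
from datetime import datetime, timezone
from typing import Optional

FMT = "%Y-%m-%d %H:%M:%S"  # assumed on-disk format, always UTC


def format_saved(now: datetime) -> str:
    """Serialise an aware datetime as UTC text."""
    return now.astimezone(timezone.utc).strftime(FMT)


def parse_saved(text: str) -> datetime:
    """Parse the saved text back into an aware UTC datetime."""
    return datetime.strptime(text.strip(), FMT).replace(tzinfo=timezone.utc)


def restored_clock(saved_text: Optional[str], current: datetime) -> datetime:
    """At boot, move the clock forward to the saved time but never backwards.
    With no saved time (e.g. first boot), keep the current clock."""
    if not saved_text:
        return current
    return max(parse_saved(saved_text), current)


def save_clock(path: str, now: datetime) -> None:
    """Persist the time, e.g. periodically and from a shutdown hook."""
    with open(path, "w") as f:
        f.write(format_saved(now) + "\n")


def load_saved(path: str) -> Optional[str]:
    """Read the saved time, or None if nothing has been saved yet."""
    try:
        with open(path) as f:
            return f.read()
    except FileNotFoundError:
        return None
```

Because the time saved at shutdown is later than the previous mount, a clock restored from it is no longer behind the superblock timestamps, so the "in the future" condition no longer triggers.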
I have the same issue with my LS-WVL: a boot loop after the first step (debootstrap). I use Wireshark as a netconsole receiver and see Debian start to boot, then after that only garbled output.
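As an alternative to Wireshark, netconsole messages can be read with a small UDP listener on the target machine. This is a minimal sketch, assuming the target port is the kernel's default 6666 and that each datagram carries plain log text; undecodable bytes are replaced instead of raising an error.

```python
import socket
from typing import Tuple


def open_listener(host: str = "0.0.0.0", port: int = 6666) -> socket.socket:
    """Bind a UDP socket on the netconsole target port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, port))
    return sock


def receive_message(sock: socket.socket, bufsize: int = 4096) -> Tuple[str, str]:
    """Return (sender IP, message text) for one datagram."""
    data, (addr, _port) = sock.recvfrom(bufsize)
    return addr, data.decode("utf-8", errors="replace").rstrip("\n")
```

After `sock = open_listener()`, a loop such as `while True: print(*receive_message(sock))` prints each kernel message prefixed with the sender's address.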
@val-kulkov @nwizard74 @rogers0 @nhhuayt I believe the issue with the system taking so long to come up is due to a lack of entropy for the RNG. For security reasons, Debian has been disabling CONFIG_RANDOM_TRUST_CPU in its kernels since around the time you started seeing the issue. I've resolved this on my devices by installing haveged. This issue is discussed in detail here: You can read more about haveged here:
This has also been the cause of the installer images failing for me on armhf devices. When the netconsole installer starts, it tries to generate a private key for the ssh server, fails because of the lack of entropy, and then hangs forever. For the armhf installer, I added a call to rngd from the rng-tools package (easier to embed than haveged) to generate additional entropy before the installer tries to start sshd. I bet the same thing is happening with the armel installer.
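One way to check for the entropy starvation described above is to read the kernel's entropy estimate and test whether a non-blocking getrandom() call would fail. This is a diagnostic sketch only, assuming Linux with /proc mounted and Python 3.6+ (for `os.getrandom`); it reports the state but does not add entropy the way haveged or rngd do.

```python
import os
from pathlib import Path
from typing import Optional

ENTROPY_AVAIL = Path("/proc/sys/kernel/random/entropy_avail")


def parse_entropy(text: str) -> int:
    """entropy_avail holds a single integer: the kernel's estimate in bits."""
    return int(text.strip())


def read_entropy(path: Path = ENTROPY_AVAIL) -> Optional[int]:
    """Return the entropy estimate, or None if the file cannot be read."""
    try:
        return parse_entropy(path.read_text())
    except OSError:
        return None


def getrandom_would_block() -> Optional[bool]:
    """True if the kernel RNG is not yet initialised, i.e. a blocking
    getrandom() call would wait; None where getrandom is unavailable."""
    if not hasattr(os, "getrandom"):
        return None
    try:
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        return True
    except OSError:
        return None  # e.g. kernel without the getrandom syscall
    return False
```

Run from a boot-time script, `getrandom_would_block()` returning `True` while sshd is still starting would be consistent with the entropy explanation.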
@1000001101000 : your observations may explain why the LinkStation is unreachable by ssh for a long time or forever. But the network subsystem should not depend on sshd. Therefore the LinkStation should pick up an IP address from a DHCP server and respond to ping requests, correct?
I just uninstalled haveged on my ls421de and rebooted. It's behaving more or less as you expected: it grabbed an IP address via DHCP and responds to ping, but still hasn't finished starting sshd after 15+ minutes. I tried booting an installer image without the rngd modification and it never came online. It's possible I messed something up, but it matches my memory of when I was originally testing this. I believe the script that generates the private key for sshd runs before the network interfaces are started, which results in the device never joining the network. If I get some time in the next couple of days, I'll pull out my ls-wvl and try this out. If I get the same failure, I'll try adding the same rngd modification to the armel installer and see if that solves it.
I just loaded the daily installer image on an ls-wxl and it booted without issue. It failed to load most of the modules needed for detecting disks, which is rather frustrating; unfortunately, the installer images for testing often have issues like that. It seems the issue I experienced when first testing the installer under 4.19 somehow doesn't apply to the current one.
Attention: @rogers0
Booting a Linkstation LS-WVL with daily images from https://d-i.debian.org/daily-images/armel/daily/kirkwood/network-console/buffalo/ in its first ext3-formatted partition (/dev/sda1) does not bring the Linkstation to a point where one can connect to it using "ssh installer@linkstation" to continue the Debian installation.
The Linkstation does not respond to ping requests. The LEDs on the network switch to which the Linkstation is connected are glowing green and therefore a gigabit connection appears to have been established. Nonetheless, the Linkstation is unreachable.
Specifically, these images were used in the tests described above:
I repeated the process with daily images for May 1, 2018, going back in time as far as I could: https://d-i.debian.org/daily-images/armel/20180501-01:11/kirkwood/network-console/buffalo/ls-wvl/ The result was the same. Linkstation did not respond to ping requests.
The hard disk I used in the tests was a WD Red 4 TB, with a 1 GB ext3-formatted first partition.
I do not have the means to establish a serial connection to the Linkstation console. I am not sure what other tests I can run to investigate this issue further.
Notably, the same Linkstation successfully boots with initrd.buffalo and uImage.buffalo from http://ftp.debian.org/debian/dists/stable/main/installer-armel/current/images/kirkwood/network-console/buffalo/ls-wvl/ (Debian Stretch).