Program freezes after approx 35 hours #4

Open
pinjan opened this issue Aug 7, 2013 · 8 comments

@pinjan

pinjan commented Aug 7, 2013

Program freezes after approx 35 hours. Log can be found:
http://www.filehostfree.com/?d=52027E8A1

@zeronickname
Owner

Hmm... there really does not seem to be much in there. Everything's running along fine and then... nothing.
Was the Raspberry Pi responding? Nothing on the terminal, or in dmesg or /var/log/syslog?

I saw some odd crashes when I first started working on this, and a new/beefier power supply fixed them. I have no idea if this is the same issue.

@pinjan
Author

pinjan commented Aug 7, 2013

Yes, weird. The Raspberry Pi was not responding; I needed to unplug the power supply to reboot it. /var/log/syslog had the following lines at the same time:
Aug 7 11:56:23 raspberrypi kernel: [134045.316360] INFO: task kworker/u:2:10981 blocked for more than 120 seconds.
Aug 7 11:56:37 raspberrypi kernel: [134045.316384] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 7 11:56:47 raspberrypi kernel: [134045.316400] kworker/u:2 D c0399658 0 10981 2 0x00000000
Aug 7 11:56:58 raspberrypi kernel: [134045.316487] from
Aug 7 11:57:07 raspberrypi kernel: [134045.316545] from
Aug 7 11:57:14 raspberrypi kernel: [134045.316576] from
Aug 7 11:57:20 raspberrypi kernel: [134045.316604] from
Aug 7 11:57:28 raspberrypi kernel: [134045.316629] from
Aug 7 11:57:41 raspberrypi kernel: [134045.316663] from
Aug 7 11:57:54 raspberrypi kernel: [134045.316715] from

pinjan closed this as completed Aug 7, 2013
pinjan reopened this Aug 7, 2013
@zeronickname
Owner

My kernel debugging is not great, but the timestamps line up, so it must be related.

One thing I notice (looking again at the logs you uploaded earlier) is that the time to take a full-size picture seems to increase as time goes on. I was using Python's tempfile module (which uses /tmp as the temporary file store) while uploading pictures to Picasa, but I was not deleting the files once I was done with them, expecting the OS to free the space as needed -- perhaps that was a bad assumption. I am now explicitly calling os.unlink on the files after uploading them.
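
Roughly, the change looks like this (just a sketch; upload_to_picasa here is a stand-in for the real upload call, not the actual function name in the script):

import os
import tempfile

def save_and_upload(image_data):
    # Write the captured image to a temporary file under /tmp.
    # delete=False because the uploader needs a plain path it can reopen.
    tmp = tempfile.NamedTemporaryFile(suffix='.jpg', delete=False)
    try:
        tmp.write(image_data)
        tmp.close()
        upload_to_picasa(tmp.name)  # stand-in for the real Picasa upload call
    finally:
        # The new bit: remove the temp file ourselves instead of hoping
        # the OS cleans /tmp before it fills up.
        os.unlink(tmp.name)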

Would you be able to run another test with the latest script please?
Remember, setting hrs_to_loop = 0 makes it run indefinitely.
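
(In other words -- a simplified sketch of the idea, not the actual code -- the loop condition works like this:)

import time

hrs_to_loop = 0          # 0 = keep going until the script is killed
poll_interval = 60       # seconds between iterations (example value)

start = time.time()
while hrs_to_loop == 0 or (time.time() - start) < hrs_to_loop * 3600:
    do_one_iteration()   # stand-in for the real per-iteration work (capture, queue upload, ...)
    time.sleep(poll_interval)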

@pinjan
Author

pinjan commented Aug 8, 2013

Hi, I will start a new test run today or tomorrow morning at the latest. Hopefully we will have some results by Sunday/Monday. I will be running over an Ethernet cable instead of WLAN.

@zeronickname
Owner

OK, I had the same problem. Weird.

INFO: task kworker/u:0:6 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
kworker/u:0     D c0399628     0     6      2 0x00000000
[<c0399628>] (__schedule+0x2c4/0x5b0) from [<c02b808c>] (__mmc_claim_host+0xa8/0x1a8)
[<c02b808c>] (__mmc_claim_host+0xa8/0x1a8) from [<c02bde9c>] (mmc_sd_detect+0x20/0x74)
[<c02bde9c>] (mmc_sd_detect+0x20/0x74) from [<c02ba42c>] (mmc_rescan+0x1c4/0x344)
[<c02ba42c>] (mmc_rescan+0x1c4/0x344) from [<c0035048>] (process_one_work+0x150/0x40c)
[<c0035048>] (process_one_work+0x150/0x40c) from [<c0035628>] (worker_thread+0x150/0x498)
[<c0035628>] (worker_thread+0x150/0x498) from [<c003a7ec>] (kthread+0x88/0x94)
[<c003a7ec>] (kthread+0x88/0x94) from [<c000e9fc>] (kernel_thread_exit+0x0/0x8)

Looks like there are some issues with SD card access? Not sure.
I also spotted that there are loads of

smsc95xx 1-1.1:1.0: eth0: kevent 2 may have been dropped

around the "kworker blocked" messages. A little googling led me to a page which suggested increasing vm.min_free_kbytes to 16 MB.
Perhaps the script is using too much RAM and causing a kernel crash? I'm speculating here.

Anyway, I'm re-running the script with the settings suggested in the previous link:

vm.min_free_kbytes = 16384
smsc95xx.turbo_mode=N
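
If you want to try the same tweak, the sysctl half can be checked and bumped on the fly like this (just a sketch; needs root, and the smsc95xx.turbo_mode=N half is a kernel parameter that goes on the kernel command line -- /boot/cmdline.txt on Raspbian -- instead):

# Check and raise vm.min_free_kbytes (run as root).
MIN_FREE = '/proc/sys/vm/min_free_kbytes'

with open(MIN_FREE) as f:
    print('current vm.min_free_kbytes: ' + f.read().strip())

with open(MIN_FREE, 'w') as f:
    f.write('16384')   # 16 MB; add "vm.min_free_kbytes = 16384" to /etc/sysctl.conf to make it stick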

@pinjan
Author

pinjan commented Aug 10, 2013

Could this kworker issue be due to a dynamic IP? Are you using a static or a dynamic IP? I guess I have a dynamic one. See: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/975445

@pinjan
Author

pinjan commented Aug 11, 2013

My last run started on 9.8.2013 at 07:45 and ended on 10.8.2013 at 19:57 with the following messages:

2013-08-10 19:57:06 DEBUG: it's been 900.017170191 seconds since last pic taken. Force it!
2013-08-10 19:57:07 DEBUG: hirez queue push
2013-08-10 19:57:07 DEBUG: FullUploader: popped one off the queue
2013-08-10 19:57:07 DEBUG: Current queue size FullSize:0 ThumbSize:0
2013-08-10 22:24:30 CRITICAL: InsertPhotoSimple barfed! Exception: [Errno 8] _ssl.c:504: EOF occurred in violation of protocol
2013-08-11 01:21:06 CRITICAL: Picasa Login failed! Exception: [Errno -2] Name or service not known
2013-08-11 02:23:50 ERROR: Re-Login failure. Try again in 5s...
2013-08-11 05:20:10 CRITICAL: Picasa Login failed! Exception: [Errno -2] Name or service not known
2013-08-11 06:19:13 ERROR: Re-Login failure. Try again in 5s...
2013-08-11 09:20:08 CRITICAL: Picasa Login failed! Exception: [Errno -2] Name or service not known
2013-08-11 11:03:08 ERROR: Re-Login failure. Try again in 5s...
2013-08-11 14:42:26 CRITICAL: Picasa Login failed! Exception: [Errno -2] Name or service not known
2013-08-11 16:27:43 ERROR: Re-Login failure. Try again in 5s...

I am leaning towards a network issue due to DHCP. I am now trying the next run with a static IP.
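
The "[Errno -2] Name or service not known" errors above come from getaddrinfo, i.e. name resolution failing, so before blaming the Picasa login itself I want to watch DNS separately. Something like this (a sketch; the hostname and retry interval are just examples):

import socket
import time

def wait_for_dns(host='picasaweb.google.com', retry_secs=30):
    # "[Errno -2] Name or service not known" is raised as socket.gaierror,
    # so retrying the login is pointless until resolution works again.
    while True:
        try:
            socket.getaddrinfo(host, 443)
            return
        except socket.gaierror as exc:
            print('DNS still failing (%s), retrying in %ds' % (exc, retry_secs))
            time.sleep(retry_secs)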

@zeronickname
Owner

Hi pinjan,

I am not sure about the last trace you uploaded. It seems to be a different issue to the original crash:

INFO: task kworker/u:0:6 blocked for more than 120 seconds.

With the aforementioned tweak of vm.min_free_kbytes = 16384, I managed to replicate the issue, but this time, instead of a Pi that was completely hung and required a reboot, I could still access it and poke around. From what I could tell, the mmcqd process was using a lot of CPU. I ran some of the tests detailed here, and it looked like the SD card read speeds had plummeted. That lines up with the logs, which show it taking an extraordinarily long time to save each image to disk.
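
(For comparison, the kind of read test I mean boils down to something like this -- a rough sketch, needs root, and it reads straight from the SD block device:)

import time

# Rough sequential read test of the SD card (/dev/mmcblk0 on the Pi).
# Drop the page cache first (echo 3 > /proc/sys/vm/drop_caches) before
# re-running it, otherwise later runs are served from RAM.
CHUNK = 1024 * 1024          # 1 MB
TOTAL = 64 * CHUNK           # read 64 MB

start = time.time()
done = 0
with open('/dev/mmcblk0', 'rb', 0) as dev:
    while done < TOTAL:
        buf = dev.read(CHUNK)
        if not buf:
            break
        done += len(buf)
elapsed = time.time() - start
print('read %d MB in %.1fs -> %.1f MB/s' % (done // CHUNK, elapsed, done / float(CHUNK) / elapsed))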

Perhaps I should try a different SD card? Or perhaps not save images to the SD card at all (which was the original implementation, but I noticed you very quickly run out of RAM if lots of photos are being taken and they can't be uploaded quickly enough).
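
If I do go back to keeping images in RAM, it would need to sit behind a bounded queue so a slow upload can't eat all the memory -- something along these lines (a sketch, not what the script does today; upload_from_memory is a stand-in):

try:
    import queue            # Python 3
except ImportError:
    import Queue as queue   # Python 2

# Cap how many captured images can wait in RAM for the uploader.
pending = queue.Queue(maxsize=8)

def on_capture(image_bytes):
    try:
        pending.put(image_bytes, timeout=1)
    except queue.Full:
        # Uploads are falling behind: drop the frame (or spill to the SD card here).
        print('upload backlog full, dropping frame')

def uploader_loop():
    while True:
        image_bytes = pending.get()
        upload_from_memory(image_bytes)   # stand-in for the real upload call
        pending.task_done()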

I might post this to the Raspberry Pi forums and see if someone else has any ideas about the mmcqd issue.
