Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

日志异常,太多“too many open files; retrying in 1s” #45

Open
void285 opened this issue Dec 6, 2019 · 2 comments
Open

日志异常,太多“too many open files; retrying in 1s” #45

void285 opened this issue Dec 6, 2019 · 2 comments

Comments

@void285
Copy link

void285 commented Dec 6, 2019

环境:服务端、Ubuntu 16、版本v0.102.3 Beta。

偶然输出日志,发现文件过大,查了下发现很多“too many open files; retrying in 1s”错误和“unsupported value: NaN”错误,下面是日志异常的具体情况:

异常行举例:

2019/09/13 10:43:25 http: Accept error: accept tcp [::]:15944: accept4: too many open files; retrying in 1s

2019-12-06 12:26:23 [ERROR] get stat failed: json: unsupported value: NaN

异常统计:

root@vps:/usr/local/stathub# ll -h log/stathub.log
-rw-r--r-- 1 nobody nogroup 683M Dec  6 12:24 log/stathub.log

root@vps:/usr/local/stathub# grep many log/stathub.log | wc -c
707339852

root@vps:/usr/local/stathub# wc -l log/stathub.log
6660045 log/stathub.log

root@vps:/usr/local/stathub# grep many log/stathub.log | wc -l
6549406

root@vps:/usr/local/stathub# grep many log/stathub.log | awk '{print $1}' | sort | uniq -c
      9 2019-09-05
  51805 2019/09/05
     12 2019-09-06
  78531 2019/09/06
  86372 2019/09/07
  86367 2019/09/08
  86365 2019/09/09
  86366 2019/09/10
  86367 2019/09/11
  86369 2019/09/12
  86366 2019/09/13
  86368 2019/09/14
  86371 2019/09/15
  86513 2019/09/16
  86347 2019/09/17
  86352 2019/09/18
  86355 2019/09/19
  86316 2019/09/20
  86211 2019/09/21
  86170 2019/09/22
  86266 2019/09/23
  86489 2019/09/24
  86357 2019/09/25
  86359 2019/09/26
  86357 2019/09/27
  86368 2019/09/28
  86370 2019/09/29
  86363 2019/09/30
  86366 2019/10/01
  86361 2019/10/02
  86355 2019/10/03
  86360 2019/10/04
  86361 2019/10/05
  86362 2019/10/06
  86347 2019/10/07
  86348 2019/10/08
  86288 2019/10/09
  86283 2019/10/10
  86320 2019/10/11
  86323 2019/10/12
  86325 2019/10/13
  86357 2019/10/14
  86319 2019/10/15
  85938 2019/10/16
  86342 2019/10/17
  86342 2019/10/18
  86309 2019/10/19
  86357 2019/10/20
  86358 2019/10/21
  86368 2019/10/22
  86524 2019/10/23
  86356 2019/10/24
  86350 2019/10/25
  86357 2019/10/26
  86365 2019/10/27
  86361 2019/10/28
  86370 2019/10/29
  86330 2019/10/30
  86347 2019/10/31
  86366 2019/11/01
  86368 2019/11/02
  86359 2019/11/03
  86368 2019/11/04
  86365 2019/11/05
  86365 2019/11/06
  86368 2019/11/07
  86369 2019/11/08
  86368 2019/11/09
  86359 2019/11/10
  86370 2019/11/11
  86363 2019/11/12
  86370 2019/11/13
  86357 2019/11/14
  86367 2019/11/15
  86357 2019/11/16
  86351 2019/11/17
  86365 2019/11/18
  86367 2019/11/19
  17019 2019/11/20
     12 2019-12-06
  12272 2019/12/06

root@vps:/usr/local/stathub# grep -v many log/stathub.log > a.log

root@vps:/usr/local/stathub# ll -h a.log
-rw-r--r--  1 root   root    7.9M Dec  6 12:34 a.log

root@vps:/usr/local/stathub# tail -n 3 a.log
2019-12-06 12:24:23 [ERROR] get stat failed: json: unsupported value: NaN
2019-12-06 12:25:23 [ERROR] get stat failed: json: unsupported value: NaN
2019-12-06 12:26:23 [ERROR] get stat failed: json: unsupported value: NaN

root@vps:/usr/local/stathub# wc -l a.log
110477 a.log

root@vps:/usr/local/stathub# grep "unsupported value: NaN" a.log | wc -l
109200

root@vps:/usr/local/stathub# grep -v "unsupported value: NaN" a.log > b.log

root@vps:/usr/local/stathub# ll -h b.log
-rw-r--r-- 1 root root 182K Dec  6 12:35 b.log

root@vps:/usr/local/stathub# tail b.log
| Thank you for your using, By Li Kexian           |
| StatHub, Apache License, Version 2.0             |
----------------------------------------------------
panic: listen tcp :15944: bind: address already in use

goroutine 1 [running]:
main.HttpService()
        /Users/likexian/www/github/stathub-go/src/server.go:41 +0x2f2
main.main()
        /Users/likexian/www/github/stathub-go/src/main.go:160 +0x896

剩余的大部分就是大约1千行" TLS handshake error"错误了,还算正常。

@void285
Copy link
Author

void285 commented Dec 7, 2019

新情况报告,
发现一台海外的客户机总是不出现在客户端列表中,查看其日志,发现也大量报告了上面的两个错误,因为“too many open files”的错误是每秒记录一次,"nan"错误是每分记录一次,所以很多天都是第一个错误出现86300多行,第二个出现1440行,于是日志文件很大。如:

awk '{print $1}' log/stathub.log | sort | uniq -c
节选
……
   1440 2019-10-03
  86348 2019/10/03
……

(两个错误行的日志记录的时间格式也有差异)

所以这个情况在客户端、服务器都有出现。

可能是因为不断重试,top列表中可间歇性看到stathub进程出现在前几名:

image

@likexian
Copy link
Owner

likexian commented Dec 7, 2019

收到,我后面优化一下重试的逻辑。

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants