Out of memory disconnected. #60

Open
PedroSFreitas opened this issue May 14, 2017 · 4 comments

Comments

@PedroSFreitas
Contributor

We are currently seeing what looks like a memory leak when running smart_module.py. I think it is best to open an issue so that others can send their suggestions. Any feedback would be great!

Here is a small log from the last run:

2017-05-14 08:19:09.598386 - alert.log - INFO - Fetching alert param. from database
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
2017-05-14 08:19:11.291427 - smartmodule.log - INFO - Wrote to analytic database: [{'fields': {'unit': 'C', 'value': '22.25'}, 'tags': {'site': u'HPF-0', 'asset': 'Indoor Temperature'}, 'time': '2017-05-14 08:19:10.437610', 'measurement': 'Environment'}].
STATUS/QUERY I might need to know how you are!
ASSET/QUERY/1234567890987654 Is it warm here?
Device file: /sys/bus/w1/devices/28-03168af288ff/w1_slave
STATUS/RESPONSE [{'memory': {'cached': 125579264, 'used': 49889280, 'free': 235081728}, 'disk': {'total': 7622344704L, 'free': 2562277376L, 'used': 4689182720L}, 'network': {'packet_recv': 685138, 'packet_sent': 616943}, 'time': 1494749952.018644, 'hostname': 'RTU278768', 'boot': '2017-05-11 11:22:57', 'cpu': {'percentage': 1.4}, 'clients': 1}]
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
[Errno 32] Broken pipe
2017-05-14 08:21:25.400334 - communicator.log - INFO - Disconnected: Out of memory..
2017-05-14 08:21:26.412194 - communicator.log - INFO - Connected with result code 0
$SYS/broker/clients/total 0
Running command self.smart_module.on_query_status()
Running command self.smart_module.on_check_alert()
STATUS/QUERY I might need to know how you are!
ASSET/QUERY/1234567890987654 Is it warm here?
Device file: /sys/bus/w1/devices/28-03168af288ff/w1_slave
Running command self.smart_module.on_query_status()
STATUS/QUERY I might need to know how you are!
Running command self.smart_module.on_check_alert()

After the disconnect, it appears to connect again, but queries are no longer answered.

Here you can see free memory dropping steadily.
image

And here you can see that used memory stays quite stable.
image

Cached memory could account for it, but its growth doesn't match the drop in free memory.
image

@TylerReedMC
Collaborator

After weeks of testing and tweaking, we believe the primary leak is in the schedule library. Another leak may occur after unintended disconnects from the MQTT broker. We are looking at another scheduling package.

@james-prior
Contributor

What is the smallest program that can provoke the problem?
10 to 30 lines is a good size for a demo.
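
One possible shape for such a demo, as a minimal sketch rather than a reproduction of the actual leak: it uses the standard-library tracemalloc module (Python 3) to measure how much memory remains allocated after a job has been called many times. The names leaky_job, _leak and measure_growth are hypothetical and do not come from smart_module.py; the real scheduled job would be swapped in for leaky_job.

```python
import tracemalloc

_leak = []  # hypothetical stand-in for state that a leaking job never releases


def leaky_job():
    """Hypothetical job that retains about 1 KiB every time it runs."""
    _leak.append(bytearray(1024))


def measure_growth(job, iterations=1000):
    """Return the net bytes still allocated after calling job() repeatedly."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()
    for _ in range(iterations):
        job()
    after, _ = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return after - before


if __name__ == "__main__":
    print("net growth in bytes:", measure_growth(leaky_job))
```

If the number keeps growing in proportion to the iteration count, the job is retaining memory; a job that frees everything it allocates should stay close to zero.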

@moritz89
Contributor

Have you run the system using pdb? This is an example analysis:

(Pdb) objgraph.show_most_common_types(limit=20)
dict                       378631
list                       184791
builtin_function_or_method 57542
tuple                      55478
Message                    48129
function                   45575
instancemethod             31949
NonBlockingSocket          31876
NonBlockingConnection      31876
_socketobject              31876
_Condition                 28320
AMQPReader                 14900
cell                       9678
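
A similar per-type count can be approximated without objgraph. The sketch below is an assumption-laden stand-in, not objgraph's own code: it uses only the standard-library gc module and collections.Counter (Python 3), and the helper names count_types and growth are made up for illustration. Taking one snapshot before and one after a few scheduler cycles shows which types are accumulating.

```python
import gc
from collections import Counter


def count_types():
    """Count live objects by type name.

    Only objects tracked by the garbage collector are seen, so plain
    ints and strings do not appear in the result.
    """
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=20):
    """Return (type name, increase) pairs, largest increase first."""
    delta = after - before  # Counter subtraction keeps only positive counts
    return delta.most_common(limit)


if __name__ == "__main__":
    snapshot = count_types()
    retained = [dict() for _ in range(1000)]  # simulate objects piling up
    for name, increase in growth(snapshot, count_types()):
        print(name, increase)
```

Types that keep climbing between snapshots, like Message or NonBlockingSocket in the listing above, are the ones worth tracing back to whatever still references them.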

@PedroSFreitas
Contributor Author

@moritz89 to be honest, it has been quite some time since I last tested it.
I think the best approach now would be a completely fresh test.

What would be best of all, in my humble opinion, is a rewrite of the smart module part.

@PedroSFreitas PedroSFreitas removed their assignment Jan 5, 2018