-
Notifications
You must be signed in to change notification settings - Fork 107
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Amazon SSA not working #670
Comments
cc @jerryk55 |
@agrare was this ever addressed? I recall you making some fixes for SmartState Analysis, but cannot figure out if this is still an issue |
There was an issue with the agent_coordinator worker that we fixed, not sure if that fixes this or not though |
This issue has been automatically marked as stale because it has not been updated for at least 3 months. If you can still reproduce this issue on the current release or on Thank you for all your contributions! More information about the ManageIQ triage process can be found in the triage process documentation. |
Hello,
Customer is trying to run SmartState Analysis on an instance of AWS running CentOS.
In the user tasks this analysis fails with: "job timed out after 3060.770851409 seconds of inactivity. Inactivity threshold [3000 seconds]"
This is Infrastructure Management running as virtual appliance on VMWare.
In the provided logs (TS004217864/2020-09-27/ dir on ecurep) I can see:
--> process agent coordinator running
12190 root 21 1 395768 204608 14740 R 15.4 0.8 0:09.24 MIQ: Amazon::CloudManager::EventCatcher id: 76, queue: ems_3
12188 root 30 10 361160 202108 13100 R 15.3 0.8 0:09.19 MIQ: Amazon::AgentCoordinatorWorker id: 75, queue: ems_agent_coordinator
12194 root 23 3 361408 195428 13128 R 14.7 0.8 0:08.85 MIQ: Amazon::CloudManager::MetricsCollectorWorker id: 78, queue: amazon
--> The related task info with GUID
[----] I, [2020-09-27T15:03:16.993192 #2228:2abfbf325970] INFO -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinatorWorker#start) Worker started: ID [75], PID [12188], GUID [5b493b88-02ff-4570-8697-6e393122260b]
--> I have looked at ews.log for GUID but, the only timeout error I found is the following:
[----] I, [2020-09-27T15:59:18.098238 #12188:2aeeb017595c] INFO -- : MIQ(ManageIQ::Providers::Amazon::AgentCoordinatorWorker::Runner#do_work) Alive agents in EMS(guid=1c9078f4-1231-4f90-9b63-29fbc205ce72): [].
.....
and then:
[----] I, [2020-09-27T15:59:23.405619 #2684:2ad3ca51b968] INFO -- : MIQ(MiqQueue#deliver) Message id: [5879], Delivering...
[----] W, [2020-09-27T15:59:23.409676 #2684:2ad3ca51b968] WARN -- : MIQ(ManageIQ::Providers::Amazon::CloudManager::Scanning::Job#timeout!) Job: guid: [17167f79-8f9d-428f-a382-4cb29ee8e498], job timed out after 3052.271594948 seconds of inactivity. Inactivity threshold [3000 seconds], aborting
[----] I, [2020-09-27T15:59:23.416496 #2684:2ad3ca51b968] INFO -- : MIQ(MiqQueue.put) Message id: [5882], id: [], Zone: [default], Role: [smartstate], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [1], Task id: [], Command: [Job.signal_abort], Timeout: [600], Priority: [100], State: [ready], Deliver On: [], Data: [], Args: ["job timed out after 3052.271594948 seconds of inactivity. Inactivity threshold [3000 seconds]", "error"]
[----] I, [2020-09-27T15:59:23.416693 #2684:2ad3ca51b968] INFO -- : MIQ(MiqQueue#delivered) Message id: [5879], State: [ok], Delivered in [0.01107655] seconds
[----] I, [2020-09-27T15:59:23.422359 #2684:2ad3ca51b968] INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [5881], MiqWorker id: [64], Zone: [], Role: [], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqQueue.check_for_timeout], Timeout: [600], Priority: [90], State: [dequeue], Deliver On: [], Data: [], Args: [], Dequeued in: [3.302457898] seconds
[----] I, [2020-09-27T15:59:23.422617 #2684:2ad3ca51b968] INFO -- : MIQ(MiqQueue#deliver) Message id: [5881], Delivering...
[----] I, [2020-09-27T15:59:23.423784 #2684:2ad3ca51b968] INFO -- : MIQ(MiqQueue#delivered) Message id: [5881], State: [ok], Delivered in [0.001177885] seconds
[----] I, [2020-09-27T15:59:28.470684 #2228:2abfbf325970] INFO -- : MIQ(MiqServer#populate_queue_messages) Fetched 1 miq_queue rows for queue_name=generic, wcount=4, priority=200
[----] I, [2020-09-27T15:59:29.506630 #2684:2ad3ca51b968] INFO -- : MIQ(MiqGenericWorker::Runner#get_message_via_drb) Message id: [5882], MiqWorker id: [64], Zone: [default], Role: [smartstate], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [1], Task id: [], Command: [Job.signal_abort], Timeout: [600], Priority: [100], State: [dequeue], Deliver On: [], Data: [], Args: ["job timed out after 3052.271594948 seconds of inactivity. Inactivity threshold [3000 seconds]", "error"], Dequeued in: [6.094388538] seconds
[----] I, [2020-09-27T15:59:29.506939 #2684:2ad3ca51b968] INFO -- : MIQ(MiqQueue#deliver) Message id: [5882], Delivering...
[----] I, [2020-09-27T15:59:29.656524 #2684:2ad3ca51b968] INFO -- : MIQ(MiqQueue.put) Message id: [5883], id: [], Zone: [default], Role: [automate], Server: [], MiqTask id: [], Ident: [generic], Target id: [], Instance id: [], Task id: [], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [20], State: [ready], Deliver On: [], Data: [], Args: [{:object_type=>"ManageIQ::Providers::Amazon::CloudManager::Vm", :object_id=>34, :attrs=>{:event_type=>"vm_scan_abort", "VmOrTemplate::vm"=>34, :vm_id=>34, :host=>nil, "MiqEvent::miq_event"=>308, :miq_event_id=>308, "EventStream::event_stream"=>308, :event_stream_id=>308}, :instance_name=>"Event", :user_id=>1, :miq_group_id=>1, :tenant_id=>1, :automate_message=>nil}]
[----] E, [2020-09-27T15:59:29.656639 #2684:2ad3ca51b968] ERROR -- : MIQ(ManageIQ::Providers::Amazon::CloudManager::Scanning::Job#process_abort) job aborting, job timed out after 3052.271594948 seconds of inactivity. Inactivity threshold [3000 seconds]
[----] E, [2020-09-27T15:59:29.662790 #2684:2ad3ca51b968] ERROR -- : MIQ(MiqQueue#deliver) Message id: [5882], Error: [finish is not permitted at state aborting]
[----] E, [2020-09-27T15:59:29.662909 #2684:2ad3ca51b968] ERROR -- : [RuntimeError]: finish is not permitted at state aborting Method:[block (2 levels) in class:LogProxy]
[----] E, [2020-09-27T15:59:29.662994 #2684:2ad3ca51b968] ERROR -- : /var/www/miq/vmdb/app/models/job/state_machine.rb:50:in `signal'
Not sure if this is related to the issue.
The text was updated successfully, but these errors were encountered: