Version

24.10

Which installation method(s) does this occur on?

No response

Describe the bug.
After switching from Triton 23.06 to 24.09, I now get a segfault when running the Sensitive Information Detection (SID) example:
https://github.com/nv-morpheus/Morpheus/tree/branch-24.10/examples/nlp_si_detection
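For context, Triton is launched per the example README's instructions. A minimal sketch of that launch, assuming the stock `tritonserver` image and the repo's model directory mounted at `/models` (the mount path and extra flags on the affected machine are assumptions, not captured output):

```bash
# Sketch only: the image tag matches the Triton version named above; the
# mount path and model-repo layout are assumptions based on the example README.
docker run --rm -ti --gpus=all \
    -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v $PWD/models:/models \
    nvcr.io/nvidia/tritonserver:24.09-py3 \
    tritonserver --model-repository=/models/triton-model-repo \
    --exit-on-error=false \
    --model-control-mode=explicit \
    --load-model sid-minibert-onnx
```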
I am also seeing this in the Triton logs:
```
I1031 19:52:26.979265 1 grpc_server.cc:2558] "Started GRPCInferenceService at 0.0.0.0:8001"
I1031 19:52:26.979522 1 http_server.cc:4704] "Started HTTPService at 0.0.0.0:8000"
I1031 19:52:27.020947 1 http_server.cc:362] "Started Metrics Service at 0.0.0.0:8002"
2024-10-31 19:53:20.292330528 [W:onnxruntime:log, tensorrt_execution_provider.h:90 log] [2024-10-31 19:53:20 WARNING] Detected layernorm nodes in FP16.
2024-10-31 19:53:20.292396299 [W:onnxruntime:log, tensorrt_execution_provider.h:90 log] [2024-10-31 19:53:20 WARNING] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
2024-10-31 19:53:56.917212111 [W:onnxruntime:log, tensorrt_execution_provider.h:90 log] [2024-10-31 19:53:56 WARNING] Detected layernorm nodes in FP16.
2024-10-31 19:53:56.917239452 [W:onnxruntime:log, tensorrt_execution_provider.h:90 log] [2024-10-31 19:53:56 WARNING] Running layernorm after self-attention with FP16 Reduce or Pow may cause overflow. Forcing Reduce or Pow Layers in FP32 precision, or exporting the model to use INormalizationLayer (available with ONNX opset >= 17) can help preserving accuracy.
2024-10-31 19:53:56.966697260 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:56 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:56.966738118 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:56.992547855 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:56 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:56.992578951 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.014066591 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.014098015 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.030573573 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.030602777 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.058039747 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.058074218 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.077752653 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.077786224 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.102153869 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.102185109 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.122471509 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.122501052 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.137861868 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.137897524 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.153740337 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.153768012 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
2024-10-31 19:53:57.180863558 [E:onnxruntime:log, tensorrt_execution_provider.h:88 log] [2024-10-31 19:53:57 ERROR] IExecutionContext::enqueueV3: Error Code 1: Myelin ([exec_instruction.cpp:exec:905] CUDA error 400 launching __myl_RepGatGatGatResResAddResAddResMeaSubMulMea kernel.)
2024-10-31 19:53:57.180897473 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
```
Minimum reproducible example

Follow the steps in the example README. The error is seen when running the pipeline with the CLI, as sketched below.
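The invocation is the `morpheus` CLI pipeline from the example README. The following is a sketch reconstructed from that README: the stage names and options are standard Morpheus CLI usage, but the exact file paths and tuning values below are assumptions; consult the README for the precise command.

```bash
# Sketch of the SID pipeline run; file paths and tuning flags are assumptions.
# The inf-triton stage points at the Triton server started earlier.
morpheus --log_level=INFO \
    run --num_threads=8 --pipeline_batch_size=1024 --model_max_batch_size=32 \
    pipeline-nlp --model_seq_length=256 \
    from-file --filename=examples/data/pcap_dump.jsonlines \
    deserialize \
    preprocess --vocab_hash_file=data/bert-base-uncased-hash.txt --truncation=True --do_lower_case=True --add_special_tokens=False \
    inf-triton --model_name=sid-minibert-onnx --server_url=localhost:8001 --force_convert_inputs=True \
    monitor --description "Inference Rate" --unit inf \
    add-class \
    serialize \
    to-file --filename=detections.jsonlines --overwrite
```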
Relevant log output

```
====Building Segment Complete!====
Inference Rate: 0 inf [00:40, ? inf/s]
E20241031 19:39:15.559962 140437595592256 triton_inference.cpp:75] Triton Error while executing 'results->Shape(model_output.name, &output_shape)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(447)
W20241031 19:39:15.560433 140437595592256 inference_client_stage.cpp:284] Exception while processing message for InferenceClientStage, attempting retry. ex.what(): Triton Error while executing 'results->Shape(model_output.name, &output_shape)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(447)
E20241031 19:39:15.560933 140437595592256 triton_inference.cpp:75] Triton Error while executing 'm_client.async_infer( [this, handle](triton::client::InferResult* result) { m_result.reset(result); handle(); }, m_options, m_inputs, m_outputs)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(114)
*** Aborted at 1730403555 (unix time) try "date -d @1730403555" if you are using GNU date ***
PC: @ 0x7fbada196af5 morpheus::TritonInferenceClientSession::infer(morpheus::TritonInferenceClientSession::infer(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, morpheus::TensorObject, std::less<std::__cxx11::basic_string<char, std::ch...
*** SIGSEGV (@0x8) received by PID 5182 (TID 0x7fba2cff9640) from PID 8; stack trace: ***
    @     0x7fbb54dd8ee8 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x99ee7)
    @     0x7fbae6fe4917 google::(anonymous namespace)::FailureSignalHandler(int, siginfo*, void*)
E20241031 19:39:15.596625 140437553628736 triton_inference.cpp:75] Triton Error while executing 'results->Shape(model_output.name, &output_shape)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(447)
W20241031 19:39:15.596770 140437553628736 inference_client_stage.cpp:284] Exception while processing message for InferenceClientStage, attempting retry. ex.what(): Triton Error while executing 'results->Shape(model_output.name, &output_shape)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(447)
E20241031 19:39:15.596795 140437553628736 triton_inference.cpp:75] Triton Error while executing 'm_client.async_infer( [this, handle](triton::client::InferResult* result) { m_result.reset(result); handle(); }, m_options, m_inputs, m_outputs)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(114)
    @     0x7fbb54d81520 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x4251f)
    @     0x7fbada196af5 morpheus::TritonInferenceClientSession::infer(morpheus::TritonInferenceClientSession::infer(std::map<std::__cxx11::basic_string<char, std::char_traits, std::allocator >, morpheus::TensorObject, std::less<std::__cxx11::basic_string<char, std::ch...
    @     0x7fbada306410 pybind11::cpp_function::initialize<mrc::pymrc::AsyncioScheduler::resume(mrc::pymrc::PyObjectHolder, std::__n4861::coroutine_handle)::{lambda()#1}, void>(mrc::pymrc::AsyncioScheduler::resume(mrc::pymrc::PyObjectHolder, std::__n4861::coroutine_handle<v...
    @     0x7fbada1c86f6 pybind11::cpp_function::dispatcher(_object*, _object*, _object*)
    @     0x55b0727a7c46 cfunction_call
    @     0x55b0727a0f73 _PyObject_MakeTpCall.localalias
    @     0x55b072763850 context_run
    @     0x55b07279f703 cfunction_vectorcall_FASTCALL_KEYWORDS
    @     0x55b07279d8ec _PyEval_EvalFrameDefault
    @     0x55b0727a80cc _PyFunction_Vectorcall
    @     0x55b072798680 _PyEval_EvalFrameDefault
    @     0x55b0727a80cc _PyFunction_Vectorcall
    @     0x55b072798680 _PyEval_EvalFrameDefault
    @     0x55b0727a80cc _PyFunction_Vectorcall
    @     0x55b072798680 _PyEval_EvalFrameDefault
    @     0x55b0727b3638 method_vectorcall
    @     0x7fbada1ee0f5 pybind11::detail::simple_collector<(pybind11::return_value_policy)1>::call(_object*) const
    @     0x7fbada30a1a8 mrc::pymrc::AsyncioRunnable<std::shared_ptr<morpheus::ControlMessage>, std::shared_ptr<morpheus::ControlMessage> >::run(mrc::runnable::Context&)
    @     0x7fbada1b00ef mrc::runnable::RunnableWithContext<mrc::runnable::Context>::main(mrc::runnable::Context&)
E20241031 19:39:15.649539 140437587199552 triton_inference.cpp:75] Triton Error while executing 'results->Shape(model_output.name, &output_shape)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(447)
W20241031 19:39:15.649679 140437587199552 inference_client_stage.cpp:284] Exception while processing message for InferenceClientStage, attempting retry. ex.what(): Triton Error while executing 'results->Shape(model_output.name, &output_shape)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(447)
E20241031 19:39:15.649712 140437587199552 triton_inference.cpp:75] Triton Error while executing 'm_client.async_infer( [this, handle](triton::client::InferResult* result) { m_result.reset(result); handle(); }, m_options, m_inputs, m_outputs)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(114)
    @     0x7fbae13e1fae std::_Function_handler<void (), mrc::runnable::Runner::enqueue(std::shared_ptr<mrc::runnable::IEngines>, std::vector<std::shared_ptr<mrc::runnable::Context>, std::allocator<std::shared_ptr<mrc::runnable::Context> > >&&)::{lambda()#1}>::_M_invoke(std::_Any_...
    @     0x7fbae12f5fc8 std::thread::_State_impl<std::thread::_Invoker<std::tuple<mrc::system::ThreadResources::make_thread<boost::fibers::packaged_task<void ()> >(std::__cxx11::basic_string<char, std::char_traits, std::allocator >, mrc::CpuSet, boost::fibers::package...
    @     0x7fbae7de7b65 execute_native_thread_routine
    @     0x7fbb54dd3ac3 (/usr/lib/x86_64-linux-gnu/libc.so.6+0x94ac2)
E20241031 19:39:15.690391 140437637555776 triton_inference.cpp:75] Triton Error while executing 'results->Shape(model_output.name, &output_shape)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(447)
    @     0x7fbb54e64a04 clone
W20241031 19:39:15.690531 140437637555776 inference_client_stage.cpp:284] Exception while processing message for InferenceClientStage, attempting retry. ex.what(): Triton Error while executing 'results->Shape(model_output.name, &output_shape)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(447)
E20241031 19:39:15.690560 140437637555776 triton_inference.cpp:75] Triton Error while executing 'm_client.async_infer( [this, handle](triton::client::InferResult* result) { m_result.reset(result); handle(); }, m_options, m_inputs, m_outputs)'. Error: onnx runtime error 1: Non-zero status code returned while running TRTKernel_graph_torch_jit_3139280210422962738_0 node. Name:'TensorrtExecutionProvider_TRTKernel_graph_torch_jit_3139280210422962738_0_0' Status Message: TensorRT EP execution context enqueue failed.
../python/morpheus/morpheus/_lib/src/stages/triton_inference.cpp(114)
Segmentation fault (core dumped)
```
Full env printout

```
***git***
commit de72aafc98ffb298195224fc77c2a31eac9efda2 (HEAD -> branch-24.10, origin/pull-request/1965, origin/branch-24.10)
Author: Yuchen Zhang <[email protected]>
Date:   Thu Oct 31 11:14:58 2024 -0700

    Fix `log_parsing` example pipeline null output issue (#2024)

    This bug is caused by the transition from `MultiMessage` to `ControlMessage`.
    `inference_stage.py::InferenceStage::_build_single` calls `_convert_one_response` in a loop
    for a batch, and the argument it passes is the same for the whole batch, but inside
    `_convert_one_response` we're grabbing the tensors and assigning starting at zero, so the
    tensors overwrite each other and cause the issue. Added a `batch_offset` variable to keep
    track of where the next incoming tensor should start writing to the output message.

    Closes #2019

    ## By Submitting this PR I confirm:
    - I am familiar with the [Contributing Guidelines](https://github.com/nv-morpheus/Morpheus/blob/main/docs/source/developer_guide/contributing.md).
    - When the PR is ready for review, new or existing tests cover these changes.
    - When the PR is ready for review, the documentation is up to date with these changes.

    Authors:
      - Yuchen Zhang (https://github.com/yczhang-nv)
      - David Gardner (https://github.com/dagardner-nv)

    Approvers:
      - David Gardner (https://github.com/dagardner-nv)
      - Christopher Harris (https://github.com/cwharris)

    URL: https://github.com/nv-morpheus/Morpheus/pull/2024

***git submodules***
f69a1fa8f5977b02a70436d92febfd4db1e0ad4d external/morpheus-visualizations (v24.10.00a-1-gf69a1fa)
87b33dd0b7fd3d7460742bc5ad13d77e0d722c3c external/utilities (v24.10.00a-10-g87b33dd)

***OS Information***
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.5 LTS"
PRETTY_NAME="Ubuntu 22.04.5 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.5 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
Linux EFAJARDO-DT 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux

***GPU Information***
Thu Oct 31 23:03:56 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  Quadro RTX 8000                Off |   00000000:15:00.0 Off |                  Off |
| 33%   38C    P8             11W /  260W |     547MiB /  49152MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

***CPU***
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Gold 6128 CPU @ 3.40GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
Stepping: 4
CPU max MHz: 3700.0000
CPU min MHz: 1200.0000
BogoMIPS: 6800.00
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd mba ibrs ibpb stibp fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm mpx rdt_a avx512f avx512dq rdseed adx smap clflushopt clwb intel_pt avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req pku ospke md_clear flush_l1d arch_capabilities
L1d cache: 192 KiB (6 instances)
L1i cache: 192 KiB (6 instances)
L2 cache: 6 MiB (6 instances)
L3 cache: 19.3 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-11
Vulnerability Gather data sampling: Mitigation; Microcode
Vulnerability Itlb multihit: KVM: Mitigation: VMX unsupported
Vulnerability L1tf: Mitigation; PTE Inversion
Vulnerability Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Meltdown: Mitigation; PTI
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Vulnerability Retbleed: Mitigation; IBRS
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; IBRS, IBPB conditional, STIBP conditional, RSB filling, PBRSB-eIBRS Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable

***CMake***
/opt/conda/envs/morpheus/bin/cmake
cmake version 3.27.9

CMake suite maintained and supported by Kitware (kitware.com/cmake).

***g++***
/opt/conda/envs/morpheus/bin/g++
g++ (conda-forge gcc 12.1.0-17) 12.1.0
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

***nvcc***
/opt/conda/envs/morpheus/bin/nvcc
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Thu_Jun__6_02:18:23_PDT_2024
Cuda compilation tools, release 12.5, V12.5.82
Build cuda_12.5.r12.5/compiler.34385749_0

***Python***
/opt/conda/envs/morpheus/bin/python
Python 3.10.15

***Environment Variables***
PATH : /opt/conda/envs/morpheus/bin:/opt/conda/condabin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/conda/bin
LD_LIBRARY_PATH : /usr/local/nvidia/lib:/usr/local/nvidia/lib64
NUMBAPRO_NVVM :
NUMBAPRO_LIBDEVICE :
CONDA_PREFIX : /opt/conda/envs/morpheus
PYTHON_PATH :
```