
Regression Issue: Server Crashing with semtech/mu-search:0.10.0-beta.5 in docker-compose.yml #72

Open
cedricdcc opened this issue Oct 31, 2024 · 11 comments

@cedricdcc
Contributor

After upgrading the semtech/mu-search image from 0.10.0-beta.3 to 0.10.0-beta.5 in the docker-compose.yml file, our previously functioning server setup is now crashing. Specifically, the mu-search container in the beta.5 version restarts continuously after completing indexing, while the beta.3 version does not exhibit this issue.

Context and Setup:

  1. Repository link to docker-compose line: docker-compose.yml#L85.
  2. Configuration files: the same configuration files are used across both setups.
  3. Volume and Network Setup: All container data, networks, images, and data from linked volumes (Elasticsearch and Virtuoso) were completely reset/pruned between setups.

Problem Details:

  • Behavior in beta.5: The mu-search container restarts after indexing is complete.
  • Behavior in beta.3: The mu-search container does not restart and remains stable after indexing.
  • Environment Consistency: Aside from the mu-search image upgrade, all other environment variables, volumes, configurations, and settings remain unchanged between the two setups.

Expected Behavior:
The server setup should maintain stability post-indexing in beta.5, consistent with its behavior in beta.3.

Steps to Reproduce:

  1. Set up the environment using the docker-compose.yml with the semtech/mu-search:0.10.0-beta.5 image (a minimal sketch of the relevant service definition follows this list).
  2. Load the configuration files linked above.
  3. Monitor the mu-search container behavior after indexing.
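
For context, a minimal sketch of the relevant service definition is shown below. Only the image tag comes from this report; the service name, restart policy, and config mount path are assumptions for illustration and may differ from the linked docker-compose.yml.

# Hypothetical excerpt from docker-compose.yml; service name and mounts are assumed
services:
  search:
    image: semtech/mu-search:0.10.0-beta.5  # bumping this from 0.10.0-beta.3 triggers the restart loop
    restart: always
    volumes:
      - ./config/search:/config             # assumed location of the search configuration files

Once the stack is up, the restart loop should be visible via docker compose ps and docker compose logs -f on the search service (assuming the service is named search) after indexing completes.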
cedricdcc added the bug label on Oct 31, 2024
@cedricdcc
Contributor Author

UPDATE: when testing this on the docker-dev server, mu-search does not crash. It does, however, output some UPDATE ERRORS in the logs about the maximum number of YAML code points:

2024-11-06T10:21:19.391924044Z INFO [#1] UPDATE HANDLER -- Persisting update queue to disk (length: 0)
2024-11-06T10:21:19.638663437Z ERROR [#1] UPDATE HANDLER -- Failed to persist update queue to disk
2024-11-06T10:21:19.639079436Z ERROR [#1] UPDATE HANDLER -- org.snakeyaml.engine.v2.scanner.ScannerImpl.fetchMoreTokens(ScannerImpl.java:289): The incoming YAML document exceeds the limit: 3145728 code points. (Java::OrgSnakeyamlEngineV2Exceptions::YamlEngineException)

vocabserver-app-search-1_logs.txt

@nvdk
Contributor

nvdk commented Nov 8, 2024

Hi, on the last comment:
the error indicates a very large update queue. The YAML store is configured by default not to store a document larger than 3MB. So the error suggests that mu-search is not able to keep up with the amount of changes, and that a very large number of changes happened in a short time.

On the first issue: we have not been able to reproduce this.

@cedricdcc
Contributor Author

Thanks for the follow-up and for confirming the potential issue with large update queues and the default 3MB YAML storage limit.
Is there a way to increase the YAML storage size to accommodate larger updates, or would you recommend a configuration change to handle high-change volumes more gracefully?

@nvdk
Contributor

nvdk commented Nov 12, 2024

Hi @cedricdcc,

Your report triggered some further investigation, and we found an issue with beta.5 due to an underlying upgrade. Previously we had no size limit, but the YAML parsing library we use introduced one as a safety measure. We've just created a PR that bumps the default size limit to 20MB in mu-search. I expect this branch to be merged later this week.

@nvdk
Contributor

nvdk commented Nov 14, 2024

@cedricdcc this should be resolved by bumping mu-search to 0.10.0.

@nvdk
Contributor

nvdk commented Nov 18, 2024

If the issue is resolved, please close the issue to let us know :)

@cedricdcc
Contributor Author

We are currently running a big test on our docker-dev server. In a couple of days we will know for sure whether this specific issue has been resolved :)

@cedricdcc
Contributor Author

This zip file includes the following to aid in analyzing the logs and identifying the root cause of the issue:

  1. Docker Logs:

    • Logs from various containers involved in the system are included for review.
    • Notable logs are from vocabserver-app-search and other relevant services.
  2. Log Analysis Script (log_analysis.py):

    • This script processes the logs to:
      • Extract the latest log dates for each container.
      • Identify and count error occurrences, including UPDATE ERRORS.
    • It is configurable for further log investigations if needed (a minimal sketch of this approach follows this list).
  3. Container Information:

    • Metadata about the containers is included (container_info.csv) to provide context for the logs.
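
A minimal sketch of the approach log_analysis.py takes is shown below; the directory layout, file naming, and error markers here are assumptions for illustration, and the actual script in the zip may differ.

# Hypothetical sketch: scan per-container log files, record the latest timestamp
# seen in each, and count UPDATE HANDLER error occurrences.
import re
from pathlib import Path

TIMESTAMP = re.compile(r"^(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})")  # assumed ISO-8601 prefix, as in the logs above

def analyse(log_dir: str) -> None:
    for log_file in sorted(Path(log_dir).glob("*.txt")):  # assumed one log file per container
        latest, errors = None, 0
        for line in log_file.read_text(errors="replace").splitlines():
            match = TIMESTAMP.match(line)
            if match:
                latest = max(latest or match.group(1), match.group(1))
            if "ERROR" in line and "UPDATE HANDLER" in line:  # counts the UPDATE ERRORS discussed above
                errors += 1
        print(f"{log_file.name}: latest entry {latest}, update errors {errors}")

analyse("logs")  # assumed directory holding the exported container logs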

From these logs, I have found that the YAML storage size issue has not been resolved.
I hope these files help you investigate the issue further.
vocabserver_log_checks.zip

@nvdk
Contributor

nvdk commented Nov 26, 2024

@cedricdcc can you confirm you bumped search to 0.10.0? This was not updated in the repository.

@cedricdcc
Contributor Author

I can indeed confirm that in our docker-dev stack the mu-search component was updated to 0.10.0.

@nvdk
Contributor

nvdk commented Nov 26, 2024

This is also confirmed by the logs you provided, since they mention exceeding the 20MB size. I see it does actually manage to index everything aside from the warnings, so while not ideal, I guess it is manageable at the moment?

Obviously we will look into how best to resolve this issue; at the very least we can be smarter when trying to write the queue (it currently errors because we try to read it before writing it).
