
Performance with Scientific Spec Sheets #656

Open
istranic opened this issue Nov 1, 2024 · 1 comment
Labels
question Further information is requested

Comments


istranic commented Nov 1, 2024

Has anyone tried this tool with scientific spec sheets, like those provided for manufacturing tools, probes, measurement devices, etc.?

Is there anything in the way the models are prompted or the way the data is processed that would make that not work well?

@dosubot dosubot bot added the question Further information is requested label Nov 1, 2024

dosubot bot commented Nov 1, 2024

Hey there, @istranic! I'm a bot here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's tackle this issue together while we wait for a human maintainer.

The tool has specific configuration options for processing scientific or technical documents, which can help in handling scientific spec sheets effectively. These include:

  • Parsing Configuration:

    • use_doc_details: Set to true to utilize document details during parsing.
    • chunk_size: Set to 7000 to define the size of each chunk of the document.
    • overlap: Set to 250 to specify the overlap between chunks.
  • Answer Configuration:

    • evidence_k: Set to 8 to determine the number of evidence pieces to consider.
    • answer_max_sources: Set to 3 to limit the maximum number of sources for an answer.
    • max_concurrent_requests: Set to 8 to control the number of concurrent requests.
  • Prompts Configuration:

    • use_json: Set to true to enable JSON format for prompts.
  • LLM Configuration:

    • Various rate limits are specified for different models, such as gpt-4o, gpt-4-turbo, and gpt-3.5-turbo, with limits ranging from 40,000 to 2,000,000 per minute.
  • Embedding Configuration:

    • rate_limit: Set to 1,000,000 per minute.

These configurations are designed to manage resources and performance effectively, which should help in processing scientific spec sheets. However, there is no specific information on limitations or issues with highly technical or domain-specific language in the repository [1][2].
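The options listed above can be gathered into a single settings mapping. Below is a minimal sketch in Python of what such a configuration might look like; the key names and values come from the list above, but the nested-dict layout is illustrative and is not the tool's actual settings API.

```python
# Hypothetical settings mapping mirroring the configuration values listed above.
# Key names (use_doc_details, chunk_size, etc.) come from the bot's answer;
# the surrounding dict structure is an illustration, not the tool's real API.
settings = {
    "parsing": {
        "use_doc_details": True,  # use document details during parsing
        "chunk_size": 7000,       # size of each document chunk
        "overlap": 250,           # overlap between adjacent chunks
    },
    "answer": {
        "evidence_k": 8,               # evidence pieces to consider
        "answer_max_sources": 3,       # max sources cited per answer
        "max_concurrent_requests": 8,  # concurrent request cap
    },
    "prompts": {
        "use_json": True,  # enable JSON format for prompts
    },
    "embedding": {
        "rate_limit": 1_000_000,  # embedding rate limit per minute
    },
}
```

For dense technical documents such as spec sheets, the values most worth experimenting with are `chunk_size` and `overlap` (so tables and parameter lists are not split mid-entry) and `evidence_k` (so enough candidate passages are retrieved from terminology-heavy text).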

To continue talking to Dosu, mention @dosu.
