Hey there, @istranic! I'm a bot here to help you with bug fixes, answer questions, and guide you on becoming a contributor. Let's tackle this issue together while we wait for a human maintainer.
The tool has specific configuration options for processing scientific or technical documents, which can help in handling scientific spec sheets effectively. These include:
**Parsing Configuration:**
- `use_doc_details`: set to `true` to use document details during parsing.
- `chunk_size`: set to `7000` to define the size of each document chunk.
- `overlap`: set to `250` to specify the overlap between consecutive chunks.

**Answer Configuration:**
- `evidence_k`: set to `8` to determine the number of evidence pieces to consider.
- `answer_max_sources`: set to `3` to cap the number of sources cited in an answer.
- `max_concurrent_requests`: set to `8` to control the number of concurrent requests.

**Prompts Configuration:**
- `use_json`: set to `true` to enable JSON-formatted prompts.

**LLM Configuration:**
- Per-model rate limits are specified for models such as `gpt-4o`, `gpt-4-turbo`, and `gpt-3.5-turbo`, ranging from 40,000 to 2,000,000 per minute.

**Embedding Configuration:**
- `rate_limit`: set to 1,000,000 per minute.
These configurations are designed to manage resources and performance, which should help when processing scientific spec sheets. However, the repository does not document any specific limitations or issues with highly technical or domain-specific language [1][2].
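For concreteness, here is a minimal sketch of how the values listed above could be applied, assuming a paper-qa-style `Settings`/`ask` API that accepts nested settings as dictionaries; the field names mirror the options above, while the import path, the `paper_directory` argument, the folder name, and the example query are assumptions and may differ in your installed version.

```python
# Minimal sketch (assumptions: the `paperqa` package exposes `Settings` and `ask`,
# nested settings can be passed as dicts, and `paper_directory` points at the PDFs).
from paperqa import Settings, ask

settings = Settings(
    paper_directory="spec_sheets",  # hypothetical folder of spec-sheet PDFs
    parsing={
        "use_doc_details": True,  # pull in document details during parsing
        "chunk_size": 7000,       # size of each document chunk
        "overlap": 250,           # overlap between consecutive chunks
    },
    answer={
        "evidence_k": 8,               # number of evidence pieces to consider
        "answer_max_sources": 3,       # maximum sources cited in the final answer
        "max_concurrent_requests": 8,  # concurrent requests to the LLM
    },
    prompts={
        "use_json": True,  # request JSON-formatted responses
    },
)

# Hypothetical query against the indexed spec sheets.
answer = ask(
    "What is the rated accuracy of the XYZ-100 probe?",
    settings=settings,
)
print(answer)
```

In practice, the large `chunk_size` with a modest `overlap` is likely the setting that matters most for spec sheets, since it determines whether a table or parameter block stays within a single chunk rather than being split across retrieval units.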
Has anyone tried this tool with scientific spec sheets, like those provided for manufacturing tools, probes, measurement devices, etc.?
Is there anything in the way the models are prompted, or the way the data is processed, that would make that not work well?