4-hour incomplete run of: query_engine = AutoQueryEngine.from_parameters(documents=documents) #170
-
Stepping through https://abrahimzaman360.medium.com/introduction-to-autollm-c8cd31be2a5f in Google Colab, at this step, using a text of 31K words, I killed the process after 4 hours.

I can say the text is oddly formatted:

> “And for me,” said Thorin. “And raspberry jam and apple-tart,” said “And mince-pies and cheese,” said Bofur.

If I'm reading the OpenAI billing correctly, I was only charged $0.01, which I find hard to believe, but anything is possible.

A newbie question, I know, but is the excessive spacing in the text having an impact on the parsing? It wouldn't in awk, or at least not noticeably, but I don't know about here. The query engine was 3.5 turbo, if I'm reading the source correctly. Thanks!
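Not an answer from the thread, but one quick way to test whether the excessive spacing is the problem is to normalize the whitespace before handing the file to the reader. A minimal standard-library sketch (the function name is mine, not part of autollm):

```python
import re

def normalize_whitespace(text: str) -> str:
    """Collapse runs of spaces, tabs, and newlines into single spaces."""
    return re.sub(r"\s+", " ", text).strip()

sample = "“And for me,”   said Thorin.\n\n\n“And raspberry jam and apple-tart,” said"
print(normalize_whitespace(sample))
# Runs of whitespace collapse to single spaces; the word content is unchanged.
```

If the run time changes noticeably on the normalized file, the spacing mattered; if not, the bottleneck is elsewhere.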
Replies: 5 comments 5 replies
-
Hi @pdurusau,

Thanks for trying our package! Could you share your code for better support?

Note: `AutoQueryEngine.from_parameters` is deprecated. Please use `from_defaults` now.

Great news: our new release is out today! It includes updates to enhance user experience, including a progress bar for improved feedback during parsing. This should help in situations like yours. Update the package with `pip install --upgrade autollm[readers]` to access these enhancements.

Looking forward to your feedback!

Best,
Talha
-
It's the most basic script. By the way, I copied it out of another Colab window, not the one that failed, so I did use `.from_defaults(documents=documents)`. Embarrassingly simple:

```python
import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass("Enter your OpenAI API key:")

!pip install autollm

from google.colab import files
from autollm import AutoQueryEngine, read_files_as_documents

documents = read_files_as_documents(input_dir="/content")
query_engine = AutoQueryEngine.from_defaults(documents=documents)
```

Ah, the file is attached. Thanks for the quick response!

Patrick
-
Update: same code and file as before. It ran 21 hours, 15 minutes before reporting Embedding Token Usage, then kept running until it crashed at 12+ GB of memory in Colab. Would this have something to do with `query_engine = AutoQueryEngine.from_defaults(`...? Not likely the best setting for a default?
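As an aside, embedding a 31K-word file as a single document can be memory-hungry; splitting it up first keeps the working set small. A hypothetical pre-chunking step in plain Python (this is not autollm's API; `chunk_words` and the 500-word limit are my own assumptions for illustration):

```python
def chunk_words(text: str, max_words: int = 500):
    """Yield successive chunks of at most max_words whitespace-separated words."""
    words = text.split()
    for i in range(0, len(words), max_words):
        yield " ".join(words[i:i + max_words])

# A ~31K-word document would yield roughly 62 chunks of <= 500 words each,
# which could then be embedded a chunk at a time instead of all at once.
chunks = list(chunk_words("word " * 31000))
print(len(chunks))  # 62
```

Each chunk could then be passed to the reader or embedder separately, so peak memory scales with the chunk size rather than the whole file.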
-
Hi @pdurusau,

Thank you for reaching out and sharing the details of the issue you're experiencing with `AutoQueryEngine.from_defaults` in your Google Colab environment. To better understand and assist you, could you please provide some additional information? Any screenshots, log messages, error messages, etc. I don't expect the `llm_max_token` parameter to be relevant to the current crash. Also, you can share reproducible code if you want; that would very much help me investigate the issue.
-
Thanks! Running the suggestions now! Appreciate the help! |