gpt-researcher as python module - messy file structure when installed in environment? #766

Open
danieldekay opened this issue Aug 16, 2024 · 9 comments

Comments

@danieldekay
Contributor

danieldekay commented Aug 16, 2024

I installed the package into my environment:

(gpt-researcher-3.12.4) ➜  gpt-researcher-3.12.4 pip show gpt-researcher
Name: gpt-researcher
Version: 0.8.7
Summary: GPT Researcher is an autonomous agent designed for comprehensive online research on a variety of tasks.
Home-page: https://github.com/assafelovic/gpt-researcher
Author: Assaf Elovic
Author-email: [email protected]
License: MIT
Location: /home/username/Envs/gpt-researcher-3.12.4/lib/python3.12/site-packages
Requires: aiofiles, arxiv, beautifulsoup4, colorama, htmldocx, json-repair, langchain, langchain-community, langchain-openai, lxml-html-clean, markdown, md2pdf, mistune, pydantic, PyMuPDF, python-docx, python-dotenv, python-multipart, pyyaml, requests, tiktoken, unstructured, websockets
Required-by:

on inspection, something seems off:
(gpt-researcher-3.12.4) ➜ gpt_researcher-0.8.7.dist-info cat RECORD

yields lines like:

backend/websocket_manager.py,sha256=so4A_3bgQz_W6QfPxVO2xeXoyuyT6JTHW6izAdFbJfg,3790
gpt_researcher-0.8.7.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4
gpt_researcher/config/__init__.py,sha256=2HBJ7lmGyZg8yjXwlK9TS1s5g_vSFtxpPNbXm9yRqmY,48
multi_agents/agents/__pycache__/publisher.cpython-312.pyc,,

To me, that looks like package files are installed directly into site-packages rather than into a site-packages/gpt_researcher subdirectory.
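One way to check this is to reduce each RECORD path to its first component and see which top-level names the distribution claims. The sketch below is only an illustration: the helper name `top_level_entries` is hypothetical, and the sample rows are the four RECORD lines quoted above.

```python
import csv
from pathlib import PurePosixPath


def top_level_entries(record_lines):
    """Return the sorted first path component of every RECORD row.

    RECORD is a CSV file whose rows are: path, hash, size.
    """
    names = set()
    for row in csv.reader(record_lines):
        if not row or not row[0]:
            continue  # skip blank lines
        names.add(PurePosixPath(row[0]).parts[0])
    return sorted(names)


# Sample rows copied from the RECORD excerpt in this issue.
sample = [
    "backend/websocket_manager.py,sha256=so4A_3bgQz_W6QfPxVO2xeXoyuyT6JTHW6izAdFbJfg,3790",
    "gpt_researcher-0.8.7.dist-info/INSTALLER,sha256=zuuue4knoyJ-UwPPXg8fezS7VCrXJQrAP7zeNuwvFQg,4",
    "gpt_researcher/config/__init__.py,sha256=2HBJ7lmGyZg8yjXwlK9TS1s5g_vSFtxpPNbXm9yRqmY,48",
    "multi_agents/agents/__pycache__/publisher.cpython-312.pyc,,",
]

entries = top_level_entries(sample)
print(entries)
```

Any name in the result other than gpt_researcher and its .dist-info folder is a separate top-level package in site-packages. For an installed distribution, `importlib.metadata.files("gpt-researcher")` should give the same paths without reading RECORD by hand.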

@0x11c11e
Contributor

The paths shown are correct for the RECORD file, which lists all the files installed by the package. It is normal that the paths are not fully qualified; RECORD entries are relative paths, resolved against the directory that contains the .dist-info folder (here, site-packages).

@danieldekay
Contributor Author

In my env these files are actually installed at the library root level, e.g. backend is next to gpt_researcher and not inside it.

@craig-matadeen

in my env these files are actually installed on the library root level, e.g. backend is next to gpt-researcher and not inside it.

I was a little concerned about this as well when looking through the examples. I think it would be better for backend to be qualified by gpt_researcher rather than sit outside of it, to limit potential clashes with another package named "backend".

@poornagurram

@assafelovic Is this intentional?

backend could potentially conflict with other packages. If this was not intentional, I would like to take this up and fix it.

@assafelovic
Owner

Hey @poornagurram gpt_researcher is a pip package and is leveraged within the backend service. You can check it out for examples: https://docs.gptr.dev/docs/gpt-researcher/getting-started/how-to-choose

lmk if you have any improvement suggestions

@poornagurram

poornagurram commented Oct 18, 2024

I see. We may need to change the overall structure, or perhaps rename backend to gptr-backend and multi_agents to gptr-multiagents. The conflict that arises with the current structure is shown below.

@assafelovic I quickly tried importing backend and multi_agents both with gpt-researcher installed and without it.

GPTR Installed

from backend import *
from multi_agents import *

works (not expected). This might conflict if someone has their own folders with these names.

Expected behaviour with GPTR Installed

from gpt_researcher.backend import *
from gpt_researcher.multi_agents import *

GPTR Uninstalled

from backend import *
from multi_agents import *

doesn't work. (expected)

Let me know what you think!
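The import experiment above can also be run without star imports by asking Python where a top-level name would be loaded from. This is a minimal sketch; the helper name `locate` is hypothetical, and the result depends entirely on what is installed in the current environment.

```python
import importlib.util
from typing import Optional


def locate(name: str) -> Optional[str]:
    """Return where a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # the name is not importable in this environment
    if spec.origin:
        return spec.origin
    # Namespace packages have no origin; report their search locations.
    locations = list(spec.submodule_search_locations or [])
    return ", ".join(locations) or "(unknown)"


for candidate in ("backend", "multi_agents", "gpt_researcher"):
    print(candidate, "->", locate(candidate))
```

With gpt-researcher installed, backend and multi_agents would be expected to resolve to paths directly under site-packages; without it, they should print None unless another package or a local folder provides them. On Python 3.10+, `importlib.metadata.packages_distributions()` additionally maps each top-level import name to the distributions that provide it.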

@assafelovic
Owner

I'm still a bit confused. The naming conflicts with existing directories? That's the main issue?

@poornagurram

Apologies for the confusion caused. Yes, that's the issue.

@craig-matadeen

Hey @assafelovic, basically everything related to gpt-researcher, except for external dependencies, should ideally be encapsulated within gpt_researcher, because otherwise you can end up in a situation where a package with a similar name to backend clashes with the backend module of gpt-researcher.

For example, I created a new environment. If I ls the site-packages folder, I get the following:

drwxr-xr-x 3 root root 4096 Oct 28 22:23 _distutils_hack
-rw-r--r-- 6 root root 151 Sep 18 18:45 distutils-precedence.pth
drwxr-xr-x 5 root root 4096 Oct 28 22:23 pip
drwxr-xr-x 2 root root 4096 Oct 28 22:23 pip-24.2.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:23 pkg_resources
-rw-r--r-- 3 root root 119 Oct 3 09:44 README.txt
drwxr-xr-x 9 root root 4096 Oct 28 22:23 setuptools
drwxr-xr-x 2 root root 4096 Oct 28 22:23 setuptools-75.1.0-py3.11.egg-info
drwxr-xr-x 5 root root 4096 Oct 28 22:23 wheel
drwxr-xr-x 2 root root 4096 Oct 28 22:23 wheel-0.44.0.dist-info

Then I install gpt-researcher:

pip install gpt-researcher

After installing gpt-researcher, the site-packages folder becomes:

drwxr-xr-x 5 root root 4096 Oct 28 22:25 aiofiles
drwxr-xr-x 3 root root 4096 Oct 28 22:25 aiofiles-24.1.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 aiohappyeyeballs
drwxr-xr-x 2 root root 4096 Oct 28 22:25 aiohappyeyeballs-2.4.3.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 aiohttp
drwxr-xr-x 2 root root 4096 Oct 28 22:25 aiohttp-3.10.10.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 aiosignal
drwxr-xr-x 2 root root 4096 Oct 28 22:25 aiosignal-1.3.1.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 annotated_types
drwxr-xr-x 3 root root 4096 Oct 28 22:25 annotated_types-0.7.0.dist-info
drwxr-xr-x 7 root root 4096 Oct 28 22:25 anyio
drwxr-xr-x 2 root root 4096 Oct 28 22:25 anyio-4.6.2.post1.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 arxiv
drwxr-xr-x 2 root root 4096 Oct 28 22:25 arxiv-2.1.3.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 attr
drwxr-xr-x 3 root root 4096 Oct 28 22:25 attrs
drwxr-xr-x 3 root root 4096 Oct 28 22:25 attrs-24.2.0.dist-info
drwxr-xr-x 6 root root 4096 Oct 28 22:26 backend
drwxr-xr-x 3 root root 4096 Oct 28 22:25 backoff
drwxr-xr-x 2 root root 4096 Oct 28 22:25 backoff-2.2.1.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 beautifulsoup4-4.12.3.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 Brotli-1.1.0.dist-info
-rwxr-xr-x 1 root root 7455112 Oct 28 22:25 _brotli.cpython-311-x86_64-linux-gnu.so
-rw-r--r-- 1 root root 1866 Oct 28 22:25 brotli.py
drwxr-xr-x 5 root root 4096 Oct 28 22:25 bs4
drwxr-xr-x 3 root root 4096 Oct 28 22:25 certifi
drwxr-xr-x 2 root root 4096 Oct 28 22:25 certifi-2024.8.30.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 cffi
drwxr-xr-x 2 root root 4096 Oct 28 22:25 cffi-1.17.1.dist-info
-rwxr-xr-x 1 root root 1068624 Oct 28 22:25 _cffi_backend.cpython-311-x86_64-linux-gnu.so
drwxr-xr-x 5 root root 4096 Oct 28 22:25 chardet
drwxr-xr-x 2 root root 4096 Oct 28 22:25 chardet-5.2.0.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 charset_normalizer
drwxr-xr-x 2 root root 4096 Oct 28 22:25 charset_normalizer-3.4.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 click
drwxr-xr-x 2 root root 4096 Oct 28 22:25 click-8.1.7.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 colorama
drwxr-xr-x 3 root root 4096 Oct 28 22:25 colorama-0.4.6.dist-info
drwxr-xr-x 5 root root 4096 Oct 28 22:25 cryptography
drwxr-xr-x 3 root root 4096 Oct 28 22:25 cryptography-43.0.3.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 cssselect2
drwxr-xr-x 2 root root 4096 Oct 28 22:25 cssselect2-0.7.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 dataclasses_json
drwxr-xr-x 2 root root 4096 Oct 28 22:25 dataclasses_json-0.6.7.dist-info
drwxr-xr-x 6 root root 4096 Oct 28 22:25 dateutil
drwxr-xr-x 3 root root 4096 Oct 28 22:25 distro
drwxr-xr-x 2 root root 4096 Oct 28 22:25 distro-1.9.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:23 _distutils_hack
-rw-r--r-- 6 root root 151 Sep 18 18:45 distutils-precedence.pth
drwxr-xr-x 2 root root 4096 Oct 28 22:25 docopt-0.6.2.dist-info
-rw-r--r-- 1 root root 19946 Oct 28 22:25 docopt.py
drwxr-xr-x 13 root root 4096 Oct 28 22:25 docx
drwxr-xr-x 3 root root 4096 Oct 28 22:25 dotenv
drwxr-xr-x 4 root root 4096 Oct 28 22:25 emoji
drwxr-xr-x 2 root root 4096 Oct 28 22:25 emoji-2.14.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 eval_type_backport
drwxr-xr-x 2 root root 4096 Oct 28 22:25 eval_type_backport-0.2.0.dist-info
drwxr-xr-x 6 root root 4096 Oct 28 22:25 feedparser
drwxr-xr-x 2 root root 4096 Oct 28 22:25 feedparser-6.0.11.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 filetype
drwxr-xr-x 2 root root 4096 Oct 28 22:25 filetype-1.2.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 fitz
drwxr-xr-x 24 root root 4096 Oct 28 22:25 fontTools
drwxr-xr-x 2 root root 4096 Oct 28 22:25 fonttools-4.54.1.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 frozenlist
drwxr-xr-x 2 root root 4096 Oct 28 22:25 frozenlist-1.5.0.dist-info
drwxr-xr-x 14 root root 4096 Oct 28 22:26 gpt_researcher
drwxr-xr-x 2 root root 4096 Oct 28 22:26 gpt_researcher-0.10.2.dist-info
drwxr-xr-x 5 root root 4096 Oct 28 22:25 greenlet
drwxr-xr-x 2 root root 4096 Oct 28 22:25 greenlet-3.1.1.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 h11
drwxr-xr-x 2 root root 4096 Oct 28 22:25 h11-0.14.0.dist-info
drwxr-xr-x 8 root root 4096 Oct 28 22:25 html5lib
drwxr-xr-x 2 root root 4096 Oct 28 22:25 html5lib-1.1.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 htmldocx
drwxr-xr-x 2 root root 4096 Oct 28 22:25 htmldocx-0.0.6.dist-info
drwxr-xr-x 6 root root 4096 Oct 28 22:25 httpcore
drwxr-xr-x 3 root root 4096 Oct 28 22:25 httpcore-1.0.6.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 httpx
drwxr-xr-x 3 root root 4096 Oct 28 22:25 httpx-0.27.2.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 idna
drwxr-xr-x 2 root root 4096 Oct 28 22:25 idna-3.10.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 iso639
drwxr-xr-x 3 root root 4096 Oct 28 22:25 jiter
drwxr-xr-x 2 root root 4096 Oct 28 22:25 jiter-0.6.1.dist-info
drwxr-xr-x 5 root root 4096 Oct 28 22:25 joblib
drwxr-xr-x 2 root root 4096 Oct 28 22:25 joblib-1.4.2.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 json5
drwxr-xr-x 2 root root 4096 Oct 28 22:25 json5-0.9.25.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 jsonpatch-1.33.dist-info
-rw-r--r-- 1 root root 29778 Oct 28 22:25 jsonpatch.py
drwxr-xr-x 3 root root 4096 Oct 28 22:25 jsonpath
drwxr-xr-x 2 root root 4096 Oct 28 22:25 jsonpath_python-1.0.6.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 jsonpointer-3.0.0.dist-info
-rw-r--r-- 1 root root 10601 Oct 28 22:25 jsonpointer.py
drwxr-xr-x 3 root root 4096 Oct 28 22:25 json_repair
drwxr-xr-x 2 root root 4096 Oct 28 22:25 json_repair-0.30.0.dist-info
drwxr-xr-x 32 root root 4096 Oct 28 22:25 langchain
drwxr-xr-x 2 root root 4096 Oct 28 22:25 langchain-0.3.4.dist-info
drwxr-xr-x 31 root root 4096 Oct 28 22:25 langchain_community
drwxr-xr-x 2 root root 4096 Oct 28 22:26 langchain_community-0.3.3.dist-info
drwxr-xr-x 23 root root 4096 Oct 28 22:25 langchain_core
drwxr-xr-x 2 root root 4096 Oct 28 22:25 langchain_core-0.3.13.dist-info
drwxr-xr-x 7 root root 4096 Oct 28 22:25 langchain_openai
drwxr-xr-x 2 root root 4096 Oct 28 22:25 langchain_openai-0.2.4.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 langchain_text_splitters
drwxr-xr-x 2 root root 4096 Oct 28 22:25 langchain_text_splitters-0.3.0.dist-info
drwxr-xr-x 6 root root 4096 Oct 28 22:25 langdetect
drwxr-xr-x 2 root root 4096 Oct 28 22:25 langdetect-1.0.9.dist-info
drwxr-xr-x 9 root root 4096 Oct 28 22:25 langsmith
drwxr-xr-x 2 root root 4096 Oct 28 22:25 langsmith-0.1.137.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 loguru
drwxr-xr-x 2 root root 4096 Oct 28 22:25 loguru-0.7.2.dist-info
drwxr-xr-x 6 root root 4096 Oct 28 22:25 lxml
drwxr-xr-x 2 root root 4096 Oct 28 22:25 lxml-5.3.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 lxml_html_clean
drwxr-xr-x 2 root root 4096 Oct 28 22:25 lxml_html_clean-0.3.1.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 magic
drwxr-xr-x 4 root root 4096 Oct 28 22:25 markdown
drwxr-xr-x 2 root root 4096 Oct 28 22:25 markdown2-2.5.1.dist-info
-rw-r--r-- 1 root root 161240 Oct 28 22:25 markdown2.py
drwxr-xr-x 2 root root 4096 Oct 28 22:25 Markdown-3.7.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 marshmallow
drwxr-xr-x 2 root root 4096 Oct 28 22:25 marshmallow-3.23.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 md2pdf
drwxr-xr-x 2 root root 4096 Oct 28 22:25 md2pdf-1.0.1.dist-info
drwxr-xr-x 6 root root 4096 Oct 28 22:25 mistune
drwxr-xr-x 2 root root 4096 Oct 28 22:25 mistune-3.0.2.dist-info
drwxr-xr-x 5 root root 4096 Oct 28 22:26 multi_agents
drwxr-xr-x 3 root root 4096 Oct 28 22:25 multidict
drwxr-xr-x 2 root root 4096 Oct 28 22:25 multidict-6.1.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 multipart
drwxr-xr-x 2 root root 4096 Oct 28 22:25 mypy_extensions-1.0.0.dist-info
-rw-r--r-- 1 root root 6227 Oct 28 22:25 mypy_extensions.py
drwxr-xr-x 2 root root 4096 Oct 28 22:25 nest_asyncio-1.6.0.dist-info
-rw-r--r-- 1 root root 7490 Oct 28 22:25 nest_asyncio.py
drwxr-xr-x 26 root root 4096 Oct 28 22:25 nltk
drwxr-xr-x 2 root root 4096 Oct 28 22:25 nltk-3.9.1.dist-info
drwxr-xr-x 23 root root 4096 Oct 28 22:25 numpy
drwxr-xr-x 2 root root 4096 Oct 28 22:25 numpy-1.26.4.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 numpy.libs
drwxr-xr-x 4 root root 4096 Oct 28 22:25 olefile
drwxr-xr-x 2 root root 4096 Oct 28 22:25 olefile-0.47.dist-info
-rw-r--r-- 1 root root 1350 Oct 28 22:25 OleFileIO_PL.py
drwxr-xr-x 9 root root 4096 Oct 28 22:25 openai
drwxr-xr-x 3 root root 4096 Oct 28 22:25 openai-1.52.2.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 orjson
drwxr-xr-x 3 root root 4096 Oct 28 22:25 orjson-3.10.10.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 oxmsg
drwxr-xr-x 3 root root 4096 Oct 28 22:25 packaging
drwxr-xr-x 2 root root 4096 Oct 28 22:25 packaging-24.1.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 PIL
drwxr-xr-x 2 root root 4096 Oct 28 22:25 pillow-11.0.0.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 pillow.libs
drwxr-xr-x 5 root root 4096 Oct 28 22:23 pip
drwxr-xr-x 2 root root 4096 Oct 28 22:23 pip-24.2.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:23 pkg_resources
drwxr-xr-x 3 root root 4096 Oct 28 22:25 propcache
drwxr-xr-x 2 root root 4096 Oct 28 22:25 propcache-0.2.0.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 psutil
drwxr-xr-x 2 root root 4096 Oct 28 22:25 psutil-6.1.0.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 __pycache__
drwxr-xr-x 4 root root 4096 Oct 28 22:25 pycparser
drwxr-xr-x 2 root root 4096 Oct 28 22:25 pycparser-2.22.dist-info
drwxr-xr-x 8 root root 4096 Oct 28 22:25 pydantic
drwxr-xr-x 3 root root 4096 Oct 28 22:25 pydantic-2.9.2.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 pydantic_core
drwxr-xr-x 3 root root 4096 Oct 28 22:25 pydantic_core-2.23.4.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 pydantic_settings
drwxr-xr-x 3 root root 4096 Oct 28 22:25 pydantic_settings-2.6.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 pydyf
drwxr-xr-x 2 root root 4096 Oct 28 22:25 pydyf-0.11.0.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 pymupdf
drwxr-xr-x 2 root root 4096 Oct 28 22:25 PyMuPDF-1.24.12.dist-info
drwxr-xr-x 8 root root 4096 Oct 28 22:25 pypdf
drwxr-xr-x 2 root root 4096 Oct 28 22:25 pypdf-5.1.0.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 pyphen
drwxr-xr-x 2 root root 4096 Oct 28 22:25 pyphen-0.16.0.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 python_dateutil-2.8.2.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 python_docx-1.1.2.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 python_dotenv-1.0.1.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 python_iso639-2024.10.22.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 python_magic-0.4.27.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 python_multipart
drwxr-xr-x 3 root root 4096 Oct 28 22:25 python_multipart-0.0.16.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 python_oxmsg-0.0.1.dist-info
-rw-r--r-- 1 root root 59 Oct 28 22:25 py.typed
drwxr-xr-x 2 root root 4096 Oct 28 22:25 PyYAML-6.0.2.dist-info
drwxr-xr-x 5 root root 4096 Oct 28 22:25 rapidfuzz
drwxr-xr-x 3 root root 4096 Oct 28 22:25 rapidfuzz-3.10.1.dist-info
-rw-r--r-- 3 root root 119 Oct 3 09:44 README.txt
drwxr-xr-x 3 root root 4096 Oct 28 22:25 regex
drwxr-xr-x 2 root root 4096 Oct 28 22:25 regex-2024.9.11.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 requests
drwxr-xr-x 2 root root 4096 Oct 28 22:25 requests-2.32.3.dist-info
drwxr-xr-x 10 root root 4096 Oct 28 22:25 requests_toolbelt
drwxr-xr-x 2 root root 4096 Oct 28 22:25 requests_toolbelt-1.0.0.dist-info
drwxr-xr-x 9 root root 4096 Oct 28 22:23 setuptools
drwxr-xr-x 2 root root 4096 Oct 28 22:23 setuptools-75.1.0-py3.11.egg-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 sgmllib3k-1.0.0.dist-info
-rw-r--r-- 1 root root 17788 Oct 28 22:25 sgmllib.py
drwxr-xr-x 2 root root 4096 Oct 28 22:25 six-1.16.0.dist-info
-rw-r--r-- 1 root root 34549 Oct 28 22:25 six.py
drwxr-xr-x 4 root root 4096 Oct 28 22:25 sniffio
drwxr-xr-x 2 root root 4096 Oct 28 22:25 sniffio-1.3.1.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 soupsieve
drwxr-xr-x 3 root root 4096 Oct 28 22:25 soupsieve-2.6.dist-info
drwxr-xr-x 15 root root 4096 Oct 28 22:25 sqlalchemy
drwxr-xr-x 2 root root 4096 Oct 28 22:25 SQLAlchemy-2.0.36.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 tenacity
drwxr-xr-x 2 root root 4096 Oct 28 22:25 tenacity-9.0.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 tests
drwxr-xr-x 13 root root 4096 Oct 28 22:25 test_unstructured
drwxr-xr-x 3 root root 4096 Oct 28 22:25 tiktoken
drwxr-xr-x 2 root root 4096 Oct 28 22:25 tiktoken-0.8.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 tiktoken_ext
drwxr-xr-x 3 root root 4096 Oct 28 22:25 tinycss2
drwxr-xr-x 2 root root 4096 Oct 28 22:25 tinycss2-1.4.0.dist-info
drwxr-xr-x 4 root root 4096 Oct 28 22:25 tqdm
drwxr-xr-x 2 root root 4096 Oct 28 22:25 tqdm-4.66.6.dist-info
drwxr-xr-x 2 root root 4096 Oct 28 22:25 typing_extensions-4.12.2.dist-info
-rw-r--r-- 1 root root 134451 Oct 28 22:25 typing_extensions.py
drwxr-xr-x 2 root root 4096 Oct 28 22:25 typing_inspect-0.9.0.dist-info
-rw-r--r-- 1 root root 28269 Oct 28 22:25 typing_inspect.py
drwxr-xr-x 15 root root 4096 Oct 28 22:25 unstructured
drwxr-xr-x 2 root root 4096 Oct 28 22:25 unstructured-0.16.3.dist-info
drwxr-xr-x 7 root root 4096 Oct 28 22:25 unstructured_client
drwxr-xr-x 2 root root 4096 Oct 28 22:25 unstructured_client-0.26.2.dist-info
drwxr-xr-x 6 root root 4096 Oct 28 22:25 urllib3
drwxr-xr-x 3 root root 4096 Oct 28 22:25 urllib3-2.2.3.dist-info
drwxr-xr-x 9 root root 4096 Oct 28 22:25 weasyprint
drwxr-xr-x 2 root root 4096 Oct 28 22:25 weasyprint-62.3.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 webencodings
drwxr-xr-x 2 root root 4096 Oct 28 22:25 webencodings-0.5.1.dist-info
drwxr-xr-x 7 root root 4096 Oct 28 22:25 websockets
drwxr-xr-x 2 root root 4096 Oct 28 22:25 websockets-13.1.dist-info
drwxr-xr-x 5 root root 4096 Oct 28 22:23 wheel
drwxr-xr-x 2 root root 4096 Oct 28 22:23 wheel-0.44.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 wrapt
drwxr-xr-x 2 root root 4096 Oct 28 22:25 wrapt-1.16.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 _yaml
drwxr-xr-x 3 root root 4096 Oct 28 22:25 yaml
drwxr-xr-x 3 root root 4096 Oct 28 22:25 yarl
drwxr-xr-x 2 root root 4096 Oct 28 22:25 yarl-1.17.0.dist-info
drwxr-xr-x 3 root root 4096 Oct 28 22:25 zopfli
drwxr-xr-x 2 root root 4096 Oct 28 22:25 zopfli-0.2.3.post1.dist-info

The main modules of the gpt-researcher Python package are these two entries from the listing above:

  • backend
  • gpt_researcher

If I then ls the backend folder, we have the following files:

-rw-r--r-- 1 root root 31 Oct 28 22:26 __init__.py
drwxr-xr-x 3 root root 4096 Oct 28 22:26 memory
drwxr-xr-x 2 root root 4096 Oct 28 22:26 __pycache__
drwxr-xr-x 5 root root 4096 Oct 28 22:26 report_type
drwxr-xr-x 3 root root 4096 Oct 28 22:26 server
-rw-r--r-- 1 root root 2810 Oct 28 22:26 utils.py

But if someone then installs a package named backend (pip install backend), pip may not raise an error saying that a package named backend already exists. It will install that package, and its files will end up in the existing backend folder created by the backend module of the gpt-researcher package.

If we then ls the backend folder, we'll get the following:

drwxr-xr-x 3 root root 4096 Oct 28 22:27 fs
-rw-r--r-- 1 root root 2 Oct 28 22:27 __init__.py
drwxr-xr-x 3 root root 4096 Oct 28 22:26 memory
drwxr-xr-x 2 root root 4096 Oct 28 22:27 __pycache__
drwxr-xr-x 5 root root 4096 Oct 28 22:26 report_type
drwxr-xr-x 3 root root 4096 Oct 28 22:26 server
drwxr-xr-x 4 root root 4096 Oct 28 22:27 util
-rw-r--r-- 1 root root 2810 Oct 28 22:26 utils.py

Installing the package backend overwrites any identically named files with those of the newly installed package. For example, before I installed the package named backend, __init__.py contained from multi_agents import agents (from the gpt-researcher package). After installing backend, this was replaced with just #, which breaks any call to from backend import agents.
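The merge-and-overwrite behaviour described here can be reproduced in a throwaway directory. The sketch below is a simulation only, not pip's actual installation logic: `unpack` is a hypothetical helper that writes files without any conflict check, and the file contents are stand-ins based on what was observed above.

```python
import tempfile
from pathlib import Path


def unpack(site_packages: Path, files: dict) -> None:
    """Write each relative path under site_packages, overwriting existing files."""
    for rel_path, content in files.items():
        target = site_packages / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)


def simulate_collision() -> dict:
    # Stand-in for the backend files shipped by gpt-researcher.
    gptr_files = {
        "backend/__init__.py": "from multi_agents import agents\n",
        "backend/utils.py": "# gpt-researcher helper code\n",
    }
    # Stand-in for an unrelated distribution that also ships "backend".
    other_files = {
        "backend/__init__.py": "#\n",
        "backend/fs/__init__.py": "",
    }
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp)
        unpack(site, gptr_files)
        unpack(site, other_files)
        # Snapshot the merged tree before the temp directory is removed.
        return {
            path.relative_to(site).as_posix(): path.read_text()
            for path in sorted(site.rglob("*.py"))
        }


result = simulate_collision()
for name, content in result.items():
    print(name, repr(content))
```

The resulting tree contains files from both distributions, and the shared backend/__init__.py holds only the second package's content, which matches the broken import described above.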

Ideally, everything related to the gpt-researcher package should be qualified by the package name, as is typical for packages installed via pip, such as:

from gpt_researcher.backend import agents
from gpt_researcher import ....

While it's probably unlikely that someone will install both gpt-researcher and a package named backend, it is a potential point of failure if someone does.
