
✨ Add needimport caching and needs_import_cache_size configuration #1297

Open

chrisjsewell wants to merge 2 commits into master
Conversation

@chrisjsewell (Member) commented Sep 12, 2024

This PR introduces lazy, size-bounded caching for reading needs.json files in the needimport directive.

Reads and writes go through an in-memory cache, keyed on the file's path and mtime. The cache is bounded by needs_import_cache_size (configurable by the user), which sets the maximum number of needs allowed in the cache, so that build memory usage cannot grow without limit.
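
In outline, the mechanism is something like the following sketch (illustrative only, assuming an LRU-style eviction; the class name and cache field follow the review discussion below, and the exact size metric is debated there as well):

```python
from __future__ import annotations

from collections import OrderedDict
from typing import Any


class _ImportCache:
    """Illustrative sketch of a lazy, size-bounded needs.json cache."""

    def __init__(self, max_needs: int) -> None:
        # keyed on (path, mtime), so a modified file never serves stale data
        self._cache: OrderedDict[tuple[str, float], dict[str, Any]] = OrderedDict()
        self._max_needs = max_needs  # needs_import_cache_size

    def get(self, path: str, mtime: float) -> dict[str, Any] | None:
        key = (path, mtime)
        if key in self._cache:
            self._cache.move_to_end(key)  # mark as most recently used
        return self._cache.get(key)

    def add(self, path: str, mtime: float, needs: dict[str, Any]) -> None:
        self._cache[(path, mtime)] = needs
        # evict least recently used files while the total need count exceeds
        # the bound (the size metric here is an assumption, see review below)
        while (
            sum(len(n) for n in self._cache.values()) > self._max_needs
            and len(self._cache) > 1
        ):
            self._cache.popitem(last=False)
```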


Note, in #1148 there was discussion of "centralised pre-caching"; however, that is problematic because:

  1. all import sources would have to be read in for every build/re-build, irrespective of whether they are actually used
  2. this can noticeably increase memory usage and re-build time
  3. for parallel builds, all of this data would be copied to every process, irrespective of whether that process actually uses it, again potentially multiplying memory usage

codecov bot commented Sep 12, 2024

Codecov Report

Attention: Patch coverage is 90.90909% with 4 lines in your changes missing coverage. Please review.

Project coverage is 86.99%. Comparing base (4e10030) to head (3997aee).
Report is 53 commits behind head on master.

Files with missing lines                 Patch %   Lines
sphinx_needs/directives/needimport.py    90.47%    4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1297      +/-   ##
==========================================
+ Coverage   86.87%   86.99%   +0.11%     
==========================================
  Files          56       60       +4     
  Lines        6532     6998     +466     
==========================================
+ Hits         5675     6088     +413     
- Misses        857      910      +53     
Flag      Coverage Δ
pytests   86.99% <90.90%> (+0.11%) ⬆️

Flags with carried forward coverage won't be shown.


chrisjsewell marked this pull request as draft on September 12, 2024, 21:04
arwedus (Contributor) commented Sep 13, 2024

Hi @chrisjsewell, first of all, thanks for tackling this topic, even before I had conducted the performance measurement :-)

> Note, in #1148 there was discussion of "centralised pre-caching"; however, that is problematic because:
>
> 1. all import sources would have to be read in for every build/re-build, irrespective of whether they are actually used

How do we handle the case where the needs.json import source has changed? Sphinx will not consider the importing document changed, so do we need to trigger a full re-build in this case? Or does sphinx-needs have a way to handle this?

Is this maybe a general problem of needimport?

chrisjsewell (Member, Author) commented Sep 13, 2024

> How do we handle the case where the needs.json import source has changed? Sphinx will not consider the importing document changed

@arwedus It informs Sphinx of the document's dependency on the file (similar to e.g. literalinclude):

self.env.note_dependency(correct_need_import_path)

so yes, Sphinx will check whether its mtime has changed and, if so, re-build all dependent documents
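
For context, this is roughly how a directive registers such a dependency (a hypothetical, stripped-down sketch, not the actual needimport code):

```python
from docutils import nodes
from sphinx.util.docutils import SphinxDirective


class NeedimportSketch(SphinxDirective):
    """Hypothetical sketch of the dependency registration."""

    required_arguments = 1

    def run(self) -> list[nodes.Node]:
        import_path = self.arguments[0]
        # Register the file as a dependency of the current document:
        # on the next build, Sphinx checks its mtime and re-reads every
        # dependent document if it changed (as literalinclude does).
        self.env.note_dependency(import_path)
        return []
```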

chrisjsewell marked this pull request as ready for review on September 13, 2024, 18:47
ubmarco (Member) left a comment

This should really help with many needimports of the same file.

sphinx_needs/directives/needimport.py (outdated)
data = needs_import_list["versions"][version]

# TODO this is not exactly NeedsInfoType, because the export removes/adds some keys
needs_list: dict[str, NeedsInfoType] = data["needs"]
ubmarco: don't know how to feel about calling a dictionary a list :)

- needs_list = needs_list_filtered
+ # note we need to deepcopy here, as we are going to modify the data,
+ # but we want to ensure data referenced from the cache is not modified
+ needs_list = deepcopy(needs_list_filtered)
ubmarco: maybe there's a code-logic option that combines the two deepcopy calls
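
For illustration, a toy example (not from the PR) of why the copy is needed: mutating data served straight from the cache would corrupt it for every later import.

```python
from copy import deepcopy

cached = {"REQ_1": {"id": "REQ_1", "tags": ["a"]}}  # stands in for a cache entry

needs_list = deepcopy(cached)  # detach from the cache before modifying
needs_list["REQ_1"]["tags"].append("imported")

assert cached["REQ_1"]["tags"] == ["a"]  # the cached data stays untouched
```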

ubmarco (Member) left a comment

One more request from my side.


.. versionadded:: 3.1.0

Sets the maximum number of needs cached by the :ref:`needimport` directive,
ubmarco:

_ImportCache stores

self._cache: OrderedDict[tuple[str, float], dict[str, Any]] = OrderedDict()

where tuple[str, float] is the path and mtime. Items are popped until the dict size is less than needs_import_cache_size. So it is not the number of needs that is relevant, but the number of distinct needs.json path (+ mtime) combinations. If I'm right, please rephrase to make that clearer.
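
If that reading is right, the effect can be shown with a toy example (assuming the eviction loop described above):

```python
from collections import OrderedDict

cache: OrderedDict[tuple[str, float], dict] = OrderedDict()
cache[("a/needs.json", 1.0)] = {f"REQ_{i}": {} for i in range(1000)}
cache[("b/needs.json", 2.0)] = {f"REQ_{i}": {} for i in range(1000)}

needs_import_cache_size = 1
# whole (path, mtime) entries are popped: the bound counts cached files,
# not the individual needs they contain
while len(cache) > needs_import_cache_size:
    cache.popitem(last=False)

assert len(cache) == 1                        # one file was evicted ...
assert sum(map(len, cache.values())) == 1000  # ... 1000 needs remain cached
```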
