Test Conda components harvest #568

Closed
qtomlinson opened this issue Apr 13, 2024 · 11 comments · Fixed by #575

Comments

@qtomlinson
Collaborator

qtomlinson commented Apr 13, 2024

This documents the QA work required for the implementation of Conda component harvest:

  • Trigger a harvest and verify that the component is harvested correctly:
    case 1: coordinates with the version specified,
    case 2: latest version of the component when the version is not specified.

  • Verify that the related source component is correctly identified and harvested.

  • Verify that the definition is correctly computed:
    case 1: license detected in the scancode package information,
    case 2: if no license is detected at the scancode package level, determine the license from the license files in the /info directory.

  • Add an integration test:
    add a test case (coordinates) in testConfig.js, plus a fixture for Conda, in the operations repo.

@qtomlinson
Collaborator Author

@lisahoong Please feel free to add any other cases you can think of. Appreciate your help!

@qtomlinson
Collaborator Author

qtomlinson commented Apr 15, 2024

Following our code review discussion, I would like to suggest testing the impact of implementing caching indexes (specifically repodata.json and channeldata.json) on the service's performance. This can be done by calling the /originConda endpoint to query Conda components, and can currently be tested on the service dev deployment.

@qtomlinson
Collaborator Author

The Integration Test Readme can be helpful. Please also feel free to put up a PR to enhance it for others. The sample PR might be helpful for adding an integration test case. All the test components are tested in definitionTest.js and attachmentTest.js. When there are attachments in your test definition, more fixtures may need to be added and loaded for attachmentTest.js.

@lamarrr

lamarrr commented Apr 29, 2024

the tests have been added to clearlydefined/operations#72 and verified

@qtomlinson
Collaborator Author

Following our code review discussion, I would like to suggest testing the impact of implementing caching indexes (specifically repodata.json and channeldata.json) on the service's performance.

@lamarrr any update?

@lamarrr

lamarrr commented May 4, 2024

I sent the memory impact of the crawler via email yesterday.
The service shows a similar memory increase. Perhaps you need it documented somewhere?
There are no existing tests to measure it, so I had to look at the overall memory consumption of the crawler process in the Docker container.

@qtomlinson
Collaborator Author

qtomlinson commented May 6, 2024

This is to document the APIs that can be used during testing:
POST call to trigger a harvest, with the payload below:
https://dev-api.clearlydefined.io/harvest

[
	{
		"tool": "component",
		"coordinates": "conda/conda-forge/-/numpy/1.13.0"
	},
	{
		"tool": "component",
		"coordinates": "conda/conda-forge/linux-aarch64/numpy/-py36"
	},
	{
		"tool": "component",
		"coordinates": "conda/conda-forge/-/tensorflow"
	}
]
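For anyone scripting this step, the POST call could be sketched roughly as below. This is only an illustration: `buildHarvestPayload` and `triggerHarvest` are hypothetical helper names, and the sketch assumes a runtime with a global `fetch` (Node 18+) and that the dev endpoint accepts the request without extra authentication headers. Only the URL and the payload shape come from the comment above.

```javascript
// URL taken from the comment above; everything else here is a hypothetical sketch.
const HARVEST_URL = 'https://dev-api.clearlydefined.io/harvest'

// Build one harvest entry per coordinates string, using the "component" tool
// exactly as in the payload shown above.
function buildHarvestPayload(coordinatesList) {
  return coordinatesList.map(coordinates => ({ tool: 'component', coordinates }))
}

// POST the payload as JSON. Assumes a global fetch and no auth requirement.
async function triggerHarvest(coordinatesList, url = HARVEST_URL) {
  const response = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildHarvestPayload(coordinatesList))
  })
  if (!response.ok) {
    throw new Error(`Harvest request failed with status ${response.status}`)
  }
  return response.status
}
```

It could then be called as `triggerHarvest(['conda/conda-forge/-/numpy/1.13.0', 'conda/conda-forge/-/tensorflow'])`.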

Verify harvest results

Verify the definition

Verify the /origins/conda API; the following calls should succeed:

  • GET call to /origins/conda/conda-forge/linux-64/tensorflow/revisions
  • GET call to /origins/conda/conda-forge/tensorflow
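The two calls listed above could be checked with a small script along these lines. It is a sketch under assumptions: the helper names are hypothetical, the base URL is assumed to be the same dev host used for the harvest call, and a global `fetch` (Node 18+) is assumed. The path shapes are copied from the two examples above.

```javascript
// Assumed base: the same dev host as the harvest endpoint.
const API_BASE = 'https://dev-api.clearlydefined.io'

// Path listing revisions of a package in a given channel and subdir,
// e.g. /origins/conda/conda-forge/linux-64/tensorflow/revisions
function condaRevisionsPath(channel, subdir, name) {
  return `/origins/conda/${channel}/${subdir}/${name}/revisions`
}

// Path querying a package by name within a channel,
// e.g. /origins/conda/conda-forge/tensorflow
function condaSearchPath(channel, name) {
  return `/origins/conda/${channel}/${name}`
}

// Issue a GET for each path and record whether it succeeded.
async function checkOrigins(paths, base = API_BASE) {
  const results = []
  for (const path of paths) {
    const response = await fetch(base + path)
    results.push({ path, ok: response.ok, status: response.status })
  }
  return results
}
```

For example, `checkOrigins([condaRevisionsPath('conda-forge', 'linux-64', 'tensorflow'), condaSearchPath('conda-forge', 'tensorflow')])` resolves to one result per path, and every `ok` field should be `true` if the calls succeed.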

@lamarrr

lamarrr commented May 7, 2024

I have tested and verified the integration tests requested in the issue.
I've also measured the impact of the caches by looking at the process memory usage and found it to be minimal (128 MB idle -> 1.4 GB when running -> 450 MB for the linux-64 cache).
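Since there is no existing test for this, one possible in-process approximation is to compare the resident set size before and after loading a cache. This is not how the figures above were obtained (those came from watching the process in the container); it is a minimal sketch using Node's `process.memoryUsage()`, with hypothetical helper names, and it only handles a synchronous load function.

```javascript
// Convert a byte count to whole megabytes (1 MB = 1024 * 1024 bytes here).
function toMegabytes(bytes) {
  return Math.round(bytes / (1024 * 1024))
}

// Run a synchronous load function and report RSS before and after it.
// RSS is process-wide, so the delta is only a rough indicator and can
// even be negative if memory is released in between.
function measureRssDelta(loadFn) {
  const before = process.memoryUsage().rss
  const result = loadFn()
  const after = process.memoryUsage().rss
  return {
    beforeMB: toMegabytes(before),
    afterMB: toMegabytes(after),
    deltaMB: toMegabytes(after - before),
    result
  }
}
```

A call such as `measureRssDelta(() => JSON.parse(cachedIndexText))`, where `cachedIndexText` is a placeholder for the raw index contents, would give a rough per-cache figure.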

@qtomlinson
Collaborator Author

Exception encountered during harvest on dev deployment

2024-05-04T00:07:45.890455016Z Error: ENOENT: no such file or directory, open '/tmp/cd/conda-forge-channelDataFile.json'
2024-05-04T00:07:45.890458517Z Emitted 'error' event on WriteStream instance at:
2024-05-04T00:07:45.890461917Z     at WriteStream.onerror (node:internal/streams/legacy:62:12)
2024-05-04T00:07:45.890465217Z     at WriteStream.emit (node:events:517:28)
2024-05-04T00:07:45.890468417Z     at WriteStream.emit (node:domain:489:12)
2024-05-04T00:07:45.890471717Z     at emitErrorNT (node:internal/streams/destroy:151:8)
2024-05-04T00:07:45.890474817Z     at emitErrorCloseNT (node:internal/streams/destroy:116:3)
2024-05-04T00:07:45.890478617Z     at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
2024-05-04T00:07:45.890482017Z   errno: -2,
2024-05-04T00:07:45.890485417Z   code: 'ENOENT',
2024-05-04T00:07:45.890488517Z   syscall: 'open',
2024-05-04T00:07:45.890491717Z   path: '/tmp/cd/conda-forge-channelDataFile.json'
2024-05-04T00:07:45.890495017Z }
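An ENOENT on `open` for a write stream commonly means the parent directory (here `/tmp/cd`) does not exist yet. Assuming that is the cause, which the log alone does not confirm, a defensive pattern is to create the directory before opening the stream. The sketch below is not the project's code; `openCacheWriteStream` is a hypothetical helper.

```javascript
const fs = require('node:fs')
const path = require('node:path')

// Create the parent directory if needed, then open the write stream.
// mkdirSync with { recursive: true } does not throw when the directory
// already exists.
function openCacheWriteStream(filePath) {
  fs.mkdirSync(path.dirname(filePath), { recursive: true })
  return fs.createWriteStream(filePath)
}
```

With this, `openCacheWriteStream('/tmp/cd/conda-forge-channelDataFile.json')` would no longer depend on `/tmp/cd` having been created elsewhere.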

@qtomlinson
Collaborator Author

@lamarrr Please provide feedback on PR. Thanks!

@lamarrr

lamarrr commented May 8, 2024

@qtomlinson what was the purpose of replacing the directory slash with the hyphen?
