Test Conda components harvest #568

qtomlinson · 2024-04-13T02:13:53Z

This issue documents the QA work required for the Conda component harvest implementation:

- Trigger a harvest and verify that the component is harvested correctly:
  - Case 1: coordinates with a version specified.
  - Case 2: the latest version of the component when no version is specified.
  - The related source component is correctly identified and harvested.
- Verify that the definition is correctly computed:
  - Case 1: the license is detected in the ScanCode package information.
  - Case 2: if no license is detected at the ScanCode package level, the license is determined from the license files in the /info directory.
- Add an integration test:
  - Add a test case (coordinates) in testConfig.js, plus a fixture for Conda, in the operations repo.
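The two definition cases in the checklist describe a fallback rule. Below is a minimal sketch of that rule, assuming a simplified ScanCode result shape (`packages[].declared_license_expression`, plus `files[].path` and `files[].detected_license_expression`). `resolveCondaLicense` is a hypothetical name, and this is not the harvester's actual code. Joining several info/ licenses with AND is also an assumption of the sketch.

```javascript
// Hypothetical helper illustrating the license fallback described above.
// Assumed (simplified) ScanCode output shape:
//   { packages: [{ declared_license_expression }],
//     files: [{ path, detected_license_expression }] }
function resolveCondaLicense(scan) {
  // Case 1: use a license declared at the package level, if any.
  const declared = (scan.packages || [])
    .map((pkg) => pkg.declared_license_expression)
    .filter(Boolean);
  if (declared.length > 0) return declared[0];

  // Case 2: fall back to licenses detected in files under an info/ directory.
  const fromInfo = (scan.files || [])
    .filter((f) => /(^|\/)info\//.test(f.path) && f.detected_license_expression)
    .map((f) => f.detected_license_expression);
  if (fromInfo.length === 0) return null;

  // Combine distinct expressions with AND (an assumption of this sketch).
  return [...new Set(fromInfo)].join(" AND ");
}
```

Files outside info/ (for example bundled sources) are deliberately ignored in case 2, so that only the package's own license files drive the fallback.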

qtomlinson · 2024-04-13T02:26:25Z

@lisahoong Please feel free to add any other cases you can think of. Appreciate your help!

qtomlinson · 2024-04-15T17:38:44Z

Following our code review discussion, I would like to suggest testing the impact of caching the indexes (specifically repodata.json and channeldata.json) on the service's performance. This can be done by calling the /originConda endpoint to query Conda components, and it can currently be tested on the service's dev deployment.
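One way to picture the caching under discussion: keep each downloaded index (repodata.json or channeldata.json) in memory and refetch it only after a time-to-live expires. This is a minimal sketch. `IndexCache`, the injected `fetchIndex` function, and the TTL are hypothetical and do not describe the service's implementation.

```javascript
// Hypothetical in-memory cache for Conda channel indexes with a time-to-live.
class IndexCache {
  constructor(fetchIndex, ttlMs, now = () => Date.now()) {
    this.fetchIndex = fetchIndex; // (key) => index JSON or a promise of it
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map(); // key -> { promise, fetchedAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    // Serve the cached promise while it is fresh; concurrent callers share it.
    if (entry && this.now() - entry.fetchedAt < this.ttlMs) return entry.promise;
    const promise = Promise.resolve().then(() => this.fetchIndex(key));
    this.entries.set(key, { promise, fetchedAt: this.now() });
    // Drop failed fetches so the next call retries instead of caching the error.
    promise.catch(() => {
      if (this.entries.get(key)?.promise === promise) this.entries.delete(key);
    });
    return promise;
  }
}
```

Holding whole indexes in memory trades memory for fewer downloads, which is why the memory cost of the cache is worth measuring alongside response times.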

qtomlinson · 2024-04-19T16:08:04Z

The Integration Test Readme can be helpful; please also feel free to put up a PR to improve it for others. A sample PR may be helpful for adding an integration test case. All the test components are tested in definitionTest.js and attachmentTest.js. When your test definition has attachments, more fixtures may need to be added and loaded for attachmentTest.js.
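Conceptually, a definition test compares the definition computed by the service with a stored fixture. Below is a minimal sketch of such a comparison. It assumes that volatile metadata (such as update timestamps) sits under a `_meta` key and should be skipped; treat that key as an assumption. `compareToFixture` is a hypothetical helper, not the code in definitionTest.js.

```javascript
// Hypothetical comparison of a computed definition against a stored fixture.
// Keys listed in `ignore` (assumed volatile, e.g. timestamps) are skipped at any depth.
function compareToFixture(actual, expected, ignore = ["_meta"], path = "") {
  const diffs = [];
  const isObject = (v) => v !== null && typeof v === "object";
  if (!isObject(actual) || !isObject(expected)) {
    if (actual !== expected) {
      diffs.push(`${path || "<root>"}: ${JSON.stringify(actual)} !== ${JSON.stringify(expected)}`);
    }
    return diffs;
  }
  // Walk the union of keys so missing and extra fields are both reported.
  const keys = new Set([...Object.keys(actual), ...Object.keys(expected)]);
  for (const key of keys) {
    if (ignore.includes(key)) continue;
    diffs.push(...compareToFixture(actual[key], expected[key], ignore, path ? `${path}.${key}` : key));
  }
  return diffs;
}
```

Returning a list of differing paths, rather than a boolean, makes a failing test point straight at the field that changed.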

lamarrr · 2024-04-29T10:00:25Z

The tests have been added to clearlydefined/operations#72 and verified.

qtomlinson · 2024-05-03T23:34:20Z

Following our code review discussion, I would like to suggest testing the impact of implementing caching indexes (specifically repodata.json and channeldata.json) on the service's performance.

@lamarrr any update?

lamarrr · 2024-05-04T00:06:30Z

I sent the crawler's memory impact by email yesterday.
The service shows a similar increase in memory usage. Perhaps you need it documented somewhere?
There are no existing tests that measure it; I had to look at the overall memory consumption of the crawler process in the Docker container.
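Without a dedicated test, memory usage can still be sampled from inside a Node.js process. Below is a minimal sketch using Node's `process.memoryUsage()`. The `sampleMemory` helper and its interval are hypothetical. It measures one process's resident set size, not the whole container.

```javascript
// Hypothetical helper: sample the resident set size (RSS) of this Node.js
// process at a fixed interval and track the peak, in megabytes.
function sampleMemory(intervalMs = 1000) {
  const toMB = (bytes) => Math.round(bytes / (1024 * 1024));
  let peakMB = toMB(process.memoryUsage().rss);
  const timer = setInterval(() => {
    peakMB = Math.max(peakMB, toMB(process.memoryUsage().rss));
  }, intervalMs);
  timer.unref(); // don't keep the process alive just for sampling
  return {
    current: () => toMB(process.memoryUsage().rss),
    peak: () => Math.max(peakMB, toMB(process.memoryUsage().rss)),
    stop: () => clearInterval(timer),
  };
}
```

Calling `sampleMemory()` before the caches load and logging `peak()` afterwards gives an idle-versus-loaded comparison. Container-level tools such as `docker stats` give the whole-container view described in the comment.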

qtomlinson · 2024-05-06T22:24:51Z

This documents the APIs that can be used during testing:
POST call to trigger a harvest, with the payload below:
https://dev-api.clearlydefined.io/harvest

[
	{
		"tool": "component",
		"coordinates": "conda/conda-forge/-/numpy/1.13.0"
	},
	{
		"tool": "component",
		"coordinates": "conda/conda-forge/linux-aarch64/numpy/-py36"
	},
	{
		"tool": "component",
		"coordinates": "conda/conda-forge/-/tensorflow"
	}
]
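The same request can be scripted. Below is a minimal sketch that builds the payload above from a list of coordinates and POSTs it using the global `fetch` (Node.js 18 or later). The helper names are hypothetical, and the JSON content-type header is an assumption.

```javascript
// Hypothetical helpers for the harvest POST described above.
const HARVEST_URL = "https://dev-api.clearlydefined.io/harvest";

function buildHarvestPayload(coordinatesList) {
  // One entry per coordinate, using the "component" tool as in the payload above.
  return coordinatesList.map((coordinates) => ({ tool: "component", coordinates }));
}

async function triggerHarvest(coordinatesList, url = HARVEST_URL) {
  const response = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildHarvestPayload(coordinatesList)),
  });
  if (!response.ok) throw new Error(`Harvest request failed: ${response.status}`);
  return response;
}
```

The coordinates use ClearlyDefined's type/provider/namespace/name/revision form, with `-` standing in for an absent namespace, as in `conda/conda-forge/-/numpy/1.13.0`.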

Verify harvest results

Getting harvest results:
- List harvest results: https://dev-api.clearlydefined.io/harvest/conda/conda-forge/linux-64/tensorflow?form=list
- Get the raw harvest results: https://dev-api.clearlydefined.io/harvest/conda/conda-forge/linux-64/tensorflow/2.15.0-cuda120py311h5cbd639_3?form=raw
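Both URLs follow one pattern: the coordinates are appended to /harvest, and the revision is left out for the list form. Below is a small sketch that builds them. `harvestUrl` is a hypothetical helper, and the base URL is the dev deployment used above.

```javascript
// Hypothetical URL builder for the harvest result endpoints shown above.
const API_BASE = "https://dev-api.clearlydefined.io";

function harvestUrl({ type, provider, namespace = "-", name, revision }, form) {
  // Leave out the revision for the list form, as in the first URL above.
  const parts = [type, provider, namespace, name, revision].filter((p) => p !== undefined);
  return `${API_BASE}/harvest/${parts.map(encodeURIComponent).join("/")}?form=${form}`;
}
```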

Verify the definition

GET call to /definitions, e.g.:
https://dev-api.clearlydefined.io/definitions/conda/conda-forge/linux-64/tensorflow/2.15.0-cuda120py311h5cbd639_3
The definition can also be verified in the web UI using the coordinates.
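A quick programmatic check of a fetched definition can confirm that a declared license was computed, which is what the definition cases in the checklist are about. This sketch assumes the definition JSON exposes that license at `licensed.declared` and uses `NOASSERTION` when it is unknown; treat both as assumptions. The helper names are hypothetical.

```javascript
// Hypothetical check on a definition returned by GET /definitions/{coordinates}.
// Assumes the declared license lives at definition.licensed.declared.
function checkDefinition(definition) {
  const problems = [];
  const declared = definition?.licensed?.declared;
  if (!declared) problems.push("no declared license");
  else if (declared === "NOASSERTION") problems.push("declared license is NOASSERTION");
  return problems;
}

async function fetchAndCheckDefinition(coordinates, base = "https://dev-api.clearlydefined.io") {
  const response = await fetch(`${base}/definitions/${coordinates}`);
  if (!response.ok) throw new Error(`GET /definitions failed: ${response.status}`);
  return checkDefinition(await response.json());
}
```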

Verify the /origins/conda API: the following calls should succeed:

- GET call to /origins/conda/conda-forge/linux-64/tensorflow/revisions
- GET call to /origins/conda/conda-forge/tensorflow
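The two origin calls can be smoke-tested together. In this sketch the HTTP function is injected so it can be stubbed. `checkOrigins` and the default base URL are assumptions, and "succeed" is taken to mean an HTTP 2xx status.

```javascript
// Hypothetical smoke test for the /origins/conda calls listed above.
const ORIGIN_PATHS = [
  "/origins/conda/conda-forge/linux-64/tensorflow/revisions",
  "/origins/conda/conda-forge/tensorflow",
];

async function checkOrigins(base = "https://dev-api.clearlydefined.io", fetchFn = fetch) {
  const results = [];
  for (const path of ORIGIN_PATHS) {
    const response = await fetchFn(base + path);
    // response.ok is true for any 2xx status.
    results.push({ path, ok: response.ok, status: response.status });
  }
  return results;
}
```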

lamarrr · 2024-05-07T13:57:14Z

I have tested and verified the integration tests requested in the issue.
I've also measured the impact of the caches by looking at the process memory usage and found it to be minimal (128MB idle -> 1.4 GB when running -> 450MB for the linux-64 cache).

qtomlinson · 2024-05-07T17:01:16Z

Exception encountered during harvest on the dev deployment:

2024-05-04T00:07:45.890455016Z Error: ENOENT: no such file or directory, open '/tmp/cd/conda-forge-channelDataFile.json'
2024-05-04T00:07:45.890458517Z Emitted 'error' event on WriteStream instance at:
2024-05-04T00:07:45.890461917Z     at WriteStream.onerror (node:internal/streams/legacy:62:12)
2024-05-04T00:07:45.890465217Z     at WriteStream.emit (node:events:517:28)
2024-05-04T00:07:45.890468417Z     at WriteStream.emit (node:domain:489:12)
2024-05-04T00:07:45.890471717Z     at emitErrorNT (node:internal/streams/destroy:151:8)
2024-05-04T00:07:45.890474817Z     at emitErrorCloseNT (node:internal/streams/destroy:116:3)
2024-05-04T00:07:45.890478617Z     at process.processTicksAndRejections (node:internal/process/task_queues:82:21) {
2024-05-04T00:07:45.890482017Z   errno: -2,
2024-05-04T00:07:45.890485417Z   code: 'ENOENT',
2024-05-04T00:07:45.890488517Z   syscall: 'open',
2024-05-04T00:07:45.890491717Z   path: '/tmp/cd/conda-forge-channelDataFile.json'
2024-05-04T00:07:45.890495017Z }
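The trace shows a WriteStream failing to open /tmp/cd/conda-forge-channelDataFile.json with ENOENT, which is what happens when the parent directory does not exist. The error was also emitted with no 'error' listener attached. Below is a general sketch of a defensive pattern: create the directory first, then handle stream errors. It is not necessarily the change made in #575, and `openCacheWriteStream` is a hypothetical name.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: open a write stream for a cache file without crashing
// the process when the target directory is missing or unwritable.
function openCacheWriteStream(filePath, onError) {
  // Create the parent directory (e.g. /tmp/cd) if needed; no-op when it exists.
  fs.mkdirSync(path.dirname(filePath), { recursive: true });
  const stream = fs.createWriteStream(filePath);
  // Without an 'error' listener, a failed open is thrown as an uncaught error.
  stream.on("error", onError);
  return stream;
}
```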

qtomlinson · 2024-05-07T21:46:58Z

@lamarrr Please provide feedback on PR. Thanks!

lamarrr · 2024-05-08T14:59:01Z

@qtomlinson what was the purpose of replacing the directory slash with the hyphen?

qtomlinson mentioned this issue May 7, 2024 in: Fix ENOENT error during harvesting Conda components #575 (Merged)

elrayle closed this as completed in #575 May 8, 2024
