Replies: 4 comments
-
Thanks for bringing this up!
For 1. We may want to consider prefixing job ids with the provider, then we can do away with this check entirely. For 2. I think we just need better tests which potentially ensure that we are able to always get a result, and which test independently the failure cases |
Beta Was this translation helpful? Give feedback.
-
I think your solution for 1 can work really well 👍. For 2, yes, we need more testing. I've been trying to write more unit tests, but have been held back by issues. And I don't want to start writing tests for things that have pending issues, but we'll get there one day 😅. |
Beta Was this translation helpful? Give feedback.
-
Ahah yea there is a lot of testing ground to cover, in an ideal world we have 100% coverage but this is open source after all 🤓. To me priority ought to be getting our large city issue resolved and all else can take a back seat. |
Beta Was this translation helpful? Give feedback.
-
I think the majority of this issue should be resolved by changes within #136, however we may have a limitation of relying on the provider URIs here. We may want to add duplicate jobs to the CSV with a special annotation since it will still be possible if the Provider does not supply us independant URIs. |
Beta Was this translation helpful? Give feedback.
-
Hello Everyone,
Hope you are all doing well.
Just wanted to start a discussion about some things as I much rather discuss them first before making a PR.
I want to discuss mainly two issues.
This is on
jobfunnel.py:218
. This is currently what is holding #132 from being merged in. And I wonder if this is even the proper way of handling inter-scraper key conflicts? I could be missing something, but I think this function should not be throwing an exception. I say this because I'd think it is perfectly possible that two different sites may have the same job id and still have different jobs, right? This is what my intuition tells me, I don't have anything concrete to support this. And even if it so happened that two ids that are the same from two different sites means the same job, then I still don't think this function should throw an exception. It could just return abool
and log a warning. And if it must return an exception, it should at the very least be caught, which it is not at the moment. I hope these points make sense.This looks intentional and the purpose is clear: If there are no jobs in the dictionary, then return a non-zero code which means failure in most systems. The issue I see with this is that at least to me it seems that just because there are no jobs, it doesn't that there is a bug. This could very much be because of forces outside of JobFunnel; a CAPTCHA, the sites are down at the moment in time; the network is down at that moment in time, etc. I totally understand the intent of this, but I'm not sure if it is the best way of handling it because our CI could "fail" because of a CAPTCHA or because JobFunnel was't able to find any remote jobs, which is currently the case with our last GitHub Actions build.
Just wanted to start some discussion around this before I attempt to make any changes.
Cheers!
Beta Was this translation helpful? Give feedback.
All reactions