For solving error for cases when github repo is found however subdir is None #13

Open: wants to merge 3 commits into main
2 changes: 1 addition & 1 deletion package_locator/common.py
@@ -69,7 +69,7 @@ def get_base_repo_url(repo_url):

 def search_for_github_repo(data):
     urls = set()
-    url_pattern = re.compile(r"""https?:\/\/(?:www\.)?github\.com[^\s)|<|>]+""")
+    url_pattern = re.compile(r"""https?:\/\/(?:www\.)?github\.com[^\s)|<|>\"]+""")
Owner @nasifimtiazohi, Mar 7, 2022

Can you show me a test case where a backslash would help?
Also, are we sure that a backslash cannot legitimately appear within a URL?
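
A quick check of what the changed character class actually matches may help answer this. The following is a minimal sketch using the two patterns from the diff and made-up sample strings (the `owner/repo` URLs are hypothetical). As far as Python's `re` module is concerned, `\"` inside a character class is read as a literal double quote, so the new pattern appears to stop at `"` while still accepting a backslash.

```python
import re

# Patterns copied from the diff in this pull request.
OLD_PATTERN = re.compile(r"""https?:\/\/(?:www\.)?github\.com[^\s)|<|>]+""")
NEW_PATTERN = re.compile(r"""https?:\/\/(?:www\.)?github\.com[^\s)|<|>\"]+""")

# Hypothetical sample: a URL embedded in a JSON-like string.
quoted = '{"homepage": "https://github.com/owner/repo"}'
# Hypothetical sample: a URL-like string containing a backslash.
backslashed = r"https://github.com/owner\repo"

# The old pattern runs past the closing quote and brace.
old_match = OLD_PATTERN.search(quoted).group(0)
# The new pattern stops at the double quote.
new_match = NEW_PATTERN.search(quoted).group(0)
# The backslash is not excluded by either pattern.
backslash_match = NEW_PATTERN.search(backslashed).group(0)

print(old_match)
print(new_match)
print(backslash_match)
```

If this reading is right, the change concerns double quotes rather than backslashes, and a test case would be a URL followed directly by `"`.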

     data = flatten(data)
     for k in data.keys():
         if isinstance(data[k], str) and re.search(url_pattern, data[k]) and " " not in data[k]:
6 changes: 5 additions & 1 deletion package_locator/locator.py
@@ -12,6 +12,8 @@ def search_github_url_in_json_data(ecosystem, package, json_data):
             subdir = locate_subdir(ecosystem, package, repo_url)
             return repo_url, subdir
         except Exception as e:
+            if repo_url!= None:
+                return repo_url, None
Owner

In search_github_url_in_json_data, we only return the repo_url if we could validate the repo by locating the package directory; otherwise, we don't return anything.

The rationale is that any data can be passed to this function, e.g., a package homepage, which may contain various GitHub URLs for whatever reason. A validation step here is therefore a must.

Let me know if I'm missing anything.
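
The validate-before-return behavior described above can be sketched roughly as follows. This is an illustration under assumptions, not the project's code: `first_validated_repo`, `fake_locate`, and the sample URLs are hypothetical, and `locate` stands in for a call such as locate_subdir that raises when the package cannot be found in a repo.

```python
from typing import Callable, Iterable, Optional, Tuple


def first_validated_repo(
    candidate_urls: Iterable[str],
    locate: Callable[[str], str],
) -> Tuple[Optional[str], Optional[str]]:
    """Return the first (repo_url, subdir) whose subdir lookup succeeds."""
    for repo_url in candidate_urls:
        try:
            subdir = locate(repo_url)  # raises if the package is not in this repo
        except Exception:
            continue  # unvalidated URL: skip it rather than return it
        return repo_url, subdir
    return None, None


def fake_locate(repo_url: str) -> str:
    # Hypothetical stand-in: only one repo "contains" the package.
    if repo_url != "https://github.com/owner/pkg":
        raise LookupError("package not found in " + repo_url)
    return "src/pkg"


candidates = ["https://github.com/someone/unrelated", "https://github.com/owner/pkg"]
result = first_validated_repo(candidates, fake_locate)
no_result = first_validated_repo(["https://github.com/someone/unrelated"], fake_locate)
```

Returning the URL from the `except` branch, as the proposed change does, would hand back the unrelated repo in the second call instead of `(None, None)`.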

             continue
     return None, None

@@ -123,4 +125,6 @@ def get_repository_url_and_subdir(ecosystem, package):
     elif ecosystem == CARGO:
         repo_url, subdir = get_cargo_location(package)

-    return get_base_repo_url(repo_url), postprocess_subdir(subdir)
+    if subdir != None:
+        return get_base_repo_url(repo_url), postprocess_subdir(subdir)
+    return get_base_repo_url(repo_url), None
Owner

Good catch. This is indeed a bug. It seems I introduced it in the last commit on this file, though I think I processed my data set before introducing it. We definitely need to fix this and add a test case for it.

Owner

On second thought, the change needs to happen within postprocess_subdir. That function requires an input validation check. I'll do that myself. But thanks a lot for catching the bug.
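
An input check of the kind described here might look like the sketch below. The actual body of postprocess_subdir is not shown in this pull request, so `postprocess_subdir_sketch` and its slash-trimming step are assumptions made only for illustration; the point is the early return on None.

```python
from typing import Optional


def postprocess_subdir_sketch(subdir: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in for postprocess_subdir with a None guard."""
    if subdir is None:
        return None  # nothing to normalize when no subdir was located
    # Assumed normalization, for illustration only: trim surrounding slashes.
    return subdir.strip("/")


guarded = postprocess_subdir_sketch(None)
normalized = postprocess_subdir_sketch("/packages/core/")
```

With the guard inside the helper, get_repository_url_and_subdir could keep its single return statement and every other caller would get the same protection.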

Owner

I fixed it.