Custom Projects List Feature #14

Open
wants to merge 57 commits into
base: master
Changes from 4 commits
57 commits
d26d6b1
Add fork:false to Github queries
mrthankyou Feb 11, 2021
2e58640
Initial work setting up custom LGTM project list curation
mrthankyou Feb 17, 2021
d4098d3
Clean up code and get basic cache parsing file setup
mrthankyou Feb 17, 2021
e69003a
Continued work on custom project lists feature
mrthankyou Feb 17, 2021
02f16a3
Fix misc issues
mrthankyou Feb 17, 2021
d3898cb
Add comment and ignore cache files
mrthankyou Feb 17, 2021
8e839a8
Refactor code
mrthankyou Feb 17, 2021
5c996fc
Reword text
mrthankyou Feb 17, 2021
7bd1bf0
Revert stars to accurate count
mrthankyou Feb 17, 2021
3f8d336
Remove comment
mrthankyou Feb 17, 2021
ca2bbc6
Update README.md
mrthankyou Feb 17, 2021
69543fb
Add custom project list feature to search term script
mrthankyou Feb 17, 2021
a687d35
Save only real projects to LGTM project lists
mrthankyou Feb 17, 2021
a4133cc
Remove unnecessary modules
mrthankyou Feb 17, 2021
b5ecc8a
Create cache folder if it already doesn't exist
mrthankyou Feb 18, 2021
c770a2f
Add draft for build in progress guard clause
mrthankyou Feb 18, 2021
b83bacf
Accept both proto and real projects
mrthankyou Feb 18, 2021
1b2982a
Add ProjectBuild and ProjectBuilds classes
mrthankyou Feb 19, 2021
0bdd4cc
Remove logs and add new request for proto projects
mrthankyou Feb 19, 2021
88e3793
Save more project data to cache files
mrthankyou Feb 19, 2021
f515563
Refactor how we move repos to LGTM lists
mrthankyou Feb 19, 2021
a853b98
Update README with LGTM build process info
mrthankyou Feb 19, 2021
01842f7
Add Python documentation for functions
mrthankyou Feb 21, 2021
2313cc3
Add comment
mrthankyou Feb 22, 2021
32d4fd9
Remove unnecessary comments
mrthankyou Feb 22, 2021
6c825f6
Add guard clauses and improved project filtering
mrthankyou Feb 22, 2021
3922973
Increase timer
mrthankyou Feb 22, 2021
a3cf8e2
Uncomment code
mrthankyou Feb 22, 2021
50fc91e
Remove unnecessary comment
mrthankyou Feb 22, 2021
2cd04b5
Add HTTP retries
mrthankyou Feb 22, 2021
c8e33ae
Remove unnecessary prints
mrthankyou Feb 22, 2021
58b4d1e
Fix various issues with moving repos to lists
mrthankyou Feb 23, 2021
08f1b7c
Add HTTP retries when retrieving a project
mrthankyou Feb 24, 2021
429c9ba
Add check for protoprojects
mrthankyou Feb 24, 2021
ba0e6f4
Handle exceptions from LGTM
mrthankyou Mar 3, 2021
85b368e
Delete test.py
mrthankyou Mar 3, 2021
cbe5fa5
Clarify API call to LGTM
mrthankyou Mar 3, 2021
1e40f13
Refactor how we build SimpleProjects
mrthankyou Mar 3, 2021
aa14305
Rename method
mrthankyou Mar 3, 2021
bedc587
Remove useless code
mrthankyou Mar 3, 2021
e362ef8
Rename ProjectBuild#name and refactor code
mrthankyou Mar 3, 2021
fda2a9f
Add SimpleProject#project_type method
mrthankyou Mar 3, 2021
b51fced
Continue refactoring how we determine LGTM project types
mrthankyou Mar 3, 2021
0b182b9
Rename ProjectBuild#id to #key
mrthankyou Mar 3, 2021
c6db487
Update comment on refactoring
mrthankyou Mar 3, 2021
9eb9c4a
Refactor SimpleProject to store the project type
mrthankyou Mar 3, 2021
0ba1576
Simplify logic in determining project state
mrthankyou Mar 3, 2021
476faeb
Add comments
mrthankyou Mar 3, 2021
2c0d44b
Refactor logic with guard clauses
mrthankyou Mar 3, 2021
574d0f6
Add unfollow_all_followed_projects.py script
mrthankyou Mar 3, 2021
803ebd7
Convert ProjectBuild to a subclass of SimpleProject
mrthankyou Mar 3, 2021
69b6614
Refactor simple project build to not raise error
mrthankyou Mar 3, 2021
89a25d7
Add checks confirming LGTM project is valid
mrthankyou Mar 3, 2021
f44b959
Fix misc errors
mrthankyou Mar 3, 2021
9ba28cf
Reword comment
mrthankyou Mar 4, 2021
2e1d595
Remove unnecessary code
mrthankyou Mar 4, 2021
5ea2b9a
Remove comments
mrthankyou Mar 4, 2021
31 changes: 30 additions & 1 deletion README.md
@@ -66,9 +66,38 @@ python3 follow_repos_by_search_term_via_code_instances.py <LANGUAGE> <SEARCH_TER
python3 follow_repos_by_search_term.py <LANGUAGE> <SEARCH_TERM>

# Finds top repositories that have a minimum 500 stars and use the provided programming language.
python3 follow_top_repos_by_star_count.py <LANGUAGE>
python3 follow_top_repos_by_star_count.py <LANGUAGE> <CUSTOM_LIST_NAME>(optional)
```

## The Custom Projects Lists Feature
While developing this collection of scripts, we realized that when a user follows thousands of repos in their LGTM account, there is a chance that the LGTM account will break. You won't be able to use the query console, and some API
calls will fail.

To resolve this, we created an opt-in feature called "Custom Projects Lists". It does the
following:

- Follows all repos (aka projects) in your LGTM account.
- Stores every project you follow in a txt file.
- At a later date (we suggest 24 hours), the user may run a follow-up command that takes the followed repos, adds them to an LGTM custom list, and finally unfollows the projects in the user's LGTM account.
Owner

Big fan of this!


Although these steps are tedious, this is the best workaround we've found. Placing projects in custom lists avoids bricking the LGTM account. We typically wait 24 hours because LGTM must first process any project that is new to it, and projects that are still being processed can't be added to custom lists.

Finally, we hope that custom lists will make it easier for security researchers to pick which repos they want to test.

### How To Run The Custom Projects Lists Feature
Some of the commands above accept a <CUSTOM_LIST_NAME> argument, which is always optional.
CUSTOM_LIST_NAME is the name of an LGTM project list that will be created and that projects will be added to. Any projects found by that command are then added to the LGTM custom list. The example below shows how this works:

1. Run a command, passing in the name of the custom list. The command below follows JavaScript repos and generates a cache file of every repo you follow for the project list called "cool_javascript_projects".

`python3 follow_top_repos_by_star_count.py javascript cool_javascript_projects`

2. Wait 1 - 24 hours.

3. Run the command below. It takes the cached file you created earlier, creates an LGTM custom project list, adds the projects to that project list, and finally unfollows the repositories in your LGTM account.

`python3 move_repos_to_lgtm_lists.py`

## Legal

The author of this script assumes no liability for your use of this project, including,
Empty file added cache/test.txt
Empty file.
2 changes: 1 addition & 1 deletion follow_repos_by_search_term.py
@@ -22,7 +22,7 @@ def find_and_save_projects_to_lgtm(language: str, search_term: str):
site = LGTMSite.create_from_file()

for date_range in utils.github_dates.generate_dates():
repos = github.search_repositories(query=f'language:{language} created:{date_range} {search_term}')
repos = github.search_repositories(query=f'language:{language} fork:false created:{date_range} {search_term}')

for repo in repos:
# Github has rate limiting in place hence why we add a sleep here. More info can be found here:
50 changes: 44 additions & 6 deletions follow_top_repos_by_star_count.py
@@ -1,28 +1,57 @@
# ## How the script currently works:
# - We first get all the github repos.
# - We then take each repo and follow the repository in lgtm
#
#
# ## changes that need to be made
# - Since we are adding lists to lgtm, we also need to store someplace every
# repo that we added to lgtm.
# - once the script is done the list of lgtm saved projects will be stored in a txt file
# - after a period of time, say 24 hrs, we then run a companion script that moves
# lgtm followed projects into their own lists. this script will take the text file name
# and use that to create a list. it will then move the lgtm projects into that list and
# unfollow them from the lgtm list. this script can be used universally.
#
# - explicit changes:
# - current scripts:
# - each script must now accept a list arg that represents the list name that you want
# your repos to be saved to.
# - each script must now add the lgtm project id to a file that stores repos (txt file)
# - new script:
# - we need a script that will take a text file, loop through the text file,
# and for each item in the text file add the item to the lgtm list (the list name
# is derived from the name of the text file)
#

from typing import List
from lgtm import LGTMSite

import utils.github_dates
import utils.github_api
import utils.cacher # utils.cacher.write_project_ids_to_file
import sys
import time

def save_project_to_lgtm(site: 'LGTMSite', repo_name: str):
def save_project_to_lgtm(site: 'LGTMSite', repo_name: str) -> dict:
print("Adding: " + repo_name)
# Another throttle. Considering we are sending a request to Github
# owned properties twice in a small time-frame, I would prefer for
# this to be here.
time.sleep(1)

repo_url: str = 'https://github.com/' + repo_name
site.follow_repository(repo_url)
project = site.follow_repository(repo_url)

print("Saved the project: " + repo_name)
return project

def find_and_save_projects_to_lgtm(language: str):
def find_and_save_projects_to_lgtm(language: str, custom_list_name: str) -> List[str]:
github = utils.github_api.create()
site = LGTMSite.create_from_file()
saved_project_ids: List[str] = []

for date_range in utils.github_dates.generate_dates():
repos = github.search_repositories(query=f'stars:>500 created:{date_range} sort:stars language:{language}')
repos = github.search_repositories(query=f'stars:>500 created:{date_range} fork:false sort:stars language:{language}')

for repo in repos:
# Github has rate limiting in place hence why we add a sleep here. More info can be found here:
@@ -32,7 +61,11 @@ def find_and_save_projects_to_lgtm(language: str):
if repo.archived or repo.fork:
continue

save_project_to_lgtm(site, repo.full_name)
saved_project = save_project_to_lgtm(site, repo.full_name)
saved_project_id = saved_project['realProject'][0]['key']
saved_project_ids.append(saved_project_id)

return saved_project_ids

if len(sys.argv) < 2:
print("Please provide a language you want to search")
@@ -41,4 +74,9 @@ def find_and_save_projects_to_lgtm(language: str):
language = sys.argv[1].capitalize()

print('Following the top repos for %s' % language)
find_and_save_projects_to_lgtm(language)
# If the user provided a second arg then they want to create a custom list.
# An empty name means no custom list was requested.
custom_list_name = sys.argv[2] if len(sys.argv) >= 3 else ''
saved_project_ids = find_and_save_projects_to_lgtm(language, custom_list_name)

if custom_list_name:
    utils.cacher.write_project_ids_to_file(saved_project_ids, custom_list_name)
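
The optional second argument could also be handled with the standard library's `argparse`, which reports usage errors automatically. This is only a sketch, not what the script does; `parse_args` is a hypothetical helper name.

```python
import argparse
from typing import List


def parse_args(argv: List[str]) -> argparse.Namespace:
    """Parse <LANGUAGE> and an optional <CUSTOM_LIST_NAME> (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Follow top repos for a language, optionally caching them for a custom list."
    )
    parser.add_argument("language", help="programming language to search for")
    # nargs="?" makes the positional optional; it is None when omitted.
    parser.add_argument("custom_list_name", nargs="?", default=None,
                        help="name of the LGTM custom list (optional)")
    return parser.parse_args(argv)


args = parse_args(["javascript", "cool_javascript_projects"])
print(args.language, args.custom_list_name)
```

In a real script the call would be `parse_args(sys.argv[1:])`, and the cache file would be written only when `custom_list_name` is not `None`.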
9 changes: 7 additions & 2 deletions lgtm.py
@@ -35,6 +35,7 @@ def _make_lgtm_get(self, url: str) -> dict:
)
return r.json()

# Retrieves a user's projects
Owner

Can you use the python standard for API doc comments?

Contributor Author mrthankyou Feb 17, 2021

Can you point me in that direction? I can google it but want to make sure we are on the same page. I'm not native to python :/

Contributor Author

@JLLeitschuh

Can you send that link to me? I tried looking online but can't find any defacto API doc comment rules.
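
For reference, the usual Python convention for API doc comments is the docstring described in PEP 257: a string literal placed as the first statement of a function, class, or module. Below is a minimal sketch of how the `# Retrieves a user's projects` comment might look as a docstring. The `ExampleSite` class and its empty return value are hypothetical stand-ins, not the real `LGTMSite`, and the `Returns:` section follows the Google docstring style, which is one common layout rather than a requirement.

```python
from typing import List


class ExampleSite:
    """Hypothetical stand-in for LGTMSite, used only to illustrate docstrings."""

    def get_my_projects(self) -> List[dict]:
        """Retrieve the projects the current user follows.

        Returns:
            A list of dicts, one per followed project.
        """
        # Placeholder body; the real method would call the LGTM API.
        return []


# Docstrings are available at runtime via __doc__ and help().
print(ExampleSite.get_my_projects.__doc__.splitlines()[0])
```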

Owner

def get_my_projects(self) -> List[dict]:
url = 'https://lgtm.com/internal_api/v0.2/getMyProjects?apiVersion=' + self.api_version
data = self._make_lgtm_get(url)
@@ -43,10 +44,12 @@ def get_my_projects(self) -> List[dict]:
else:
raise LGTMRequestException('LGTM GET request failed with response: %s' % str(data))

# Given an org name, retrieve a user's projects under that org
def get_my_projects_under_org(self, org: str) -> List['SimpleProject']:
projects_sorted = LGTMDataFilters.org_to_ids(self.get_my_projects())
return LGTMDataFilters.extract_project_under_org(org, projects_sorted)

# This method handles making a POST request to the LGTM server
def _make_lgtm_post(self, url: str, data: dict) -> dict:
api_data = {
'apiVersion': self.api_version
@@ -74,6 +77,7 @@ def _make_lgtm_post(self, url: str, data: dict) -> dict:
else:
raise LGTMRequestException('LGTM POST request failed with response: %s' % str(data_returned))

# Given a project list id and a list of project ids, add the projects to the project list.
def load_into_project_list(self, into_project: int, lgtm_project_ids: List[str]):
url = "https://lgtm.com/internal_api/v0.2/updateProjectSelection"
# Because LGTM uses some wacky format for its application/x-www-form-urlencoded data
@@ -104,14 +108,15 @@ def force_rebuild_project(self, simple_project: 'SimpleProject'):
except LGTMRequestException:
print('Failed rebuilding project. This may be because it is already being built. `%s`' % simple_project)

def follow_repository(self, repository_url: str):
def follow_repository(self, repository_url: str) -> dict:
Contributor Author

I checked and can't find any case where returning data in the function breaks the code.

url = "https://lgtm.com/internal_api/v0.2/followProject"
data = {
'url': repository_url,
'apiVersion': self.api_version
}
self._make_lgtm_post(url, data)
return self._make_lgtm_post(url, data)

# Given a project id, unfollow the project
def unfollow_repository_by_id(self, project_id: str):
url = "https://lgtm.com/internal_api/v0.2/unfollowProject"
data = {
40 changes: 40 additions & 0 deletions move_repos_to_lgtm_lists.py
@@ -0,0 +1,40 @@
# - new script:
#   - we need a script that will take a text file, loop through the text file,
#     and for each item in the text file add the item to the lgtm list (the list name
#     is derived from the name of the text file)
#

from typing import List
from lgtm import LGTMSite

import sys
import time
import os

cached_files = os.listdir("cache")
site = LGTMSite.create_from_file()

for cached_file in cached_files:
    # Build the path once with os.path.join instead of concatenating strings.
    cached_file_path = os.path.join("cache", cached_file)

    # The list name is the file name without its extension.
    project_list_name = os.path.splitext(cached_file)[0]

    # We want to find or create a project list based on the name of
    # the text file that holds all of the projects we are currently following.
    project_list_data = site.get_or_create_project_list(project_list_name)
    project_list_id = project_list_data['realProject'][0]['key']

    # Read one project id per line, skipping blank lines.
    with open(cached_file_path, "r") as file:
        project_ids: List[str] = [line.strip() for line in file if line.strip()]

    # With the project list id and the project ids, we now want to save the repos
    # we currently follow to the project list
    site.load_into_project_list(project_list_id, project_ids)

    for project_id in project_ids:
        print(project_id)
        # The last thing we need to do is tidy up and unfollow all the repositories
        # we just added to our project list.
        site.unfollow_repository_by_id(project_id)

    os.remove(cached_file_path)
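
Two details of reading the cache are easy to get wrong: `str.split(".")[0]` on a path keeps the directory prefix, and iterating over the string returned by `read()` yields single characters rather than ids. The sketch below shows one way to handle both, assuming one project id per line. `parse_cache_file` and the sample ids are hypothetical and not part of this PR.

```python
import os
import tempfile
from typing import List, Tuple


def parse_cache_file(path: str) -> Tuple[str, List[str]]:
    """Return (list name, project ids) for a cache file (hypothetical helper)."""
    # The list name is the file name without directory or extension.
    list_name = os.path.splitext(os.path.basename(path))[0]
    with open(path, "r") as cache:
        # splitlines() yields one entry per line; skip blank lines.
        project_ids = [line.strip() for line in cache.read().splitlines() if line.strip()]
    return list_name, project_ids


# Demonstrate with a throwaway cache directory and made-up ids.
with tempfile.TemporaryDirectory() as cache_dir:
    sample_path = os.path.join(cache_dir, "cool_javascript_projects.txt")
    with open(sample_path, "w") as sample:
        sample.write("1001\n1002\n")
    print(parse_cache_file(sample_path))
```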
36 changes: 36 additions & 0 deletions test.py
@@ -0,0 +1,36 @@
from typing import List
Contributor Author

Will remove this file once the code is more polished.

import sys
import os

print(os.listdir("cache"))
mrthankyou marked this conversation as resolved.


#
# projects: List[str] = []
#
# print(sys.argv)

# from lgtm import LGTMSite
# lgtm_site = LGTMSite.create_from_file()
#
# repo_url: str = 'https://github.com/google/jax'
#
# result = lgtm_site.follow_repository(repo_url)
# print("1111111111")
# print("1111111111")
# print("1111111111")
# print("1111111111")
# print("1111111111")
# project_id = result['realProject'][0]['key']
# print(project_id)
#
# print("1111111111")
# print("1111111111")
# print("1111111111")
# print("1111111111")
# project_list_id = lgtm_site.get_or_create_project_list("test_project_16")
# print(project_list_id)
#
# lgtm_site.load_into_project_list(project_list_id, [project_id])
#
# lgtm_site.unfollow_repository_by_id(project_id)
7 changes: 7 additions & 0 deletions utils/cacher.py
@@ -0,0 +1,7 @@
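from typing import List

def write_project_ids_to_file(project_ids: List[str], file_name: str):
    # Append one project id per line to cache/<file_name>.txt.
    # The with-block closes the file even if a write fails.
    with open("cache/" + file_name + ".txt", "a") as file:
        for project_id in project_ids:
            file.write(project_id + "\n")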
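
Opening `cache/<name>.txt` raises `FileNotFoundError` if the `cache` folder does not exist. One way to guard against that is `os.makedirs(..., exist_ok=True)`. The sketch below is an assumption about how that could look; the `cache_dir` parameter and the returned path are hypothetical additions so the function can be tried against a temporary folder.

```python
import os
import tempfile
from typing import List


def write_project_ids_to_file(project_ids: List[str], file_name: str,
                              cache_dir: str = "cache") -> str:
    """Append project ids, one per line, to <cache_dir>/<file_name>.txt."""
    # Create the cache folder if it doesn't already exist.
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, file_name + ".txt")
    with open(path, "a") as cache:
        for project_id in project_ids:
            cache.write(project_id + "\n")
    return path


# Try it in a temporary folder with made-up ids.
with tempfile.TemporaryDirectory() as tmp:
    written = write_project_ids_to_file(["1001", "1002"], "demo", os.path.join(tmp, "cache"))
    with open(written) as cache:
        print(cache.read().splitlines())
```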