Add type annotations #37

Open: wants to merge 17 commits into base: master
5 changes: 4 additions & 1 deletion .github/workflows/python-package.yml
@@ -30,13 +30,16 @@ jobs:
python -m pip install --upgrade pip
python -m pip install flake8 pytest pytest-cov
python -m pip install --upgrade setuptools setuptools_scm wheel
python setup.py install
python -m pip install .
#- name: Lint with flake8
# run: |
# # stop the build if there are Python syntax errors or undefined names
# flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
# # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
# flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
- name: Check with Mypy
run: |
MYPYPATH=src python -m mypy --strict src tests
- name: Test with pytest
env:
DUMMY_GITHUBAPI_TOKEN: ${{ secrets.DUMMY_WORKFLOW_GITHUB_TOKEN }}
95 changes: 73 additions & 22 deletions README.md
@@ -13,25 +13,10 @@ The diagram above illustrates the overall architecture of the Reposcanner toolki

## How to Install

First, clone the repository from GitHub:

```
git clone https://github.com/bssw-psip/reposcanner.git
```

Then install Reposcanner and run the test suite:

```
cd reposcanner
python3 -m venv ../repo-env # create a new virtual environment
. ../repo-env/bin/activate # activate the new virtual env
pip install -e . # create editable install
tox # run tests
pip install git+https://github.com/bssw-psip/reposcanner.git
```

If all tests pass, the installation was successful, and you are ready to go!


# How to Run

## Setup input files
@@ -48,14 +33,80 @@ reposcanner --credentials tutorial/inputs/credentials.yml --config tutorial/inpu

3. examine the output files written to `tutorial/outputs`


# How to extend functionality

1. Create a new source file, `src/reposcanner/<routine.py>`, including a class
based on the `ContributorAccountListRoutine`. See `stars.py` as an
example of the kind of modifications required.
In the early days, the only way to extend Reposcanner was to modify its source; it can now be extended externally as well. We recommend the external method for future projects, so that each new analysis does not require its own fork of Reposcanner.

1. Create a new source file, `my_module.py` or `my_package/my_module.py`.

2. Import `reposcanner.requests` and either `reposcanner.routines` or `reposcanner.analyses`, depending on whether you want to write a routine or an analysis.

3. Locate the most relevant subclass of `reposcanner.requests.BaseRequestModel` and one of {`reposcanner.routines.DataMiningRoutine` or `reposcanner.analyses.DataAnalysis`}. E.g., for a routine that requires GitHub API access, one would subclass `OnlineRoutineRequest` and `OnlineRepositoryRoutine`. Reference the minimal blank example in `reposcanner.dummy` or real-world examples in `reposcanner.contrib`.

4. Write a config file that refers to your routines and analyses. See the next section on configuration files.

5. Check that `my_module` or `my_package.my_module` is importable. E.g., `python -c 'from my_module import MyRoutineOrAnalysis'`.
   - The current working directory is implicitly in the `PYTHONPATH`, so your module or package will be importable if you run Python and Reposcanner from the directory that contains your module or package.
   - If your module or package does not reside in the current working directory, you need to add its location to your `$PYTHONPATH` for it to be importable: `export PYTHONPATH=/path/to/proj:$PYTHONPATH`. This only has to be done once per shell session. Note that `$PYTHONPATH` should hold the path to the directory containing your module or package, not the path to the module or package itself. E.g., if you have `/path/to/proj/my_module.py` or `/path/to/proj/my_package/my_module.py`, set the `PYTHONPATH` to `/path/to/proj`.

6. Run Reposcanner.
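
The importability check in step 5 can also be done from Python itself. The sketch below is illustrative only and is not part of Reposcanner: it writes a throwaway module (the name `reposcanner_demo_module` is hypothetical) into a temporary directory and shows that it becomes importable once the directory that contains it is on `sys.path`, which is what `PYTHONPATH` populates at interpreter start-up.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def is_importable(module_name: str) -> bool:
    """Return True if `module_name` can be imported with the current sys.path."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


with tempfile.TemporaryDirectory() as proj_dir:
    # Hypothetical stand-in for /path/to/proj/my_module.py.
    module_name = "reposcanner_demo_module"
    Path(proj_dir, f"{module_name}.py").write_text(
        "class MyRoutineOrAnalysis:\n    pass\n"
    )

    # The containing directory is not on sys.path yet, so the import fails.
    before = is_importable(module_name)

    # Add the directory that CONTAINS the module, not the module file itself.
    sys.path.insert(0, proj_dir)
    importlib.invalidate_caches()
    after = is_importable(module_name)

    # Clean up so the demonstration leaves the interpreter state unchanged.
    sys.path.remove(proj_dir)
    sys.modules.pop(module_name, None)
```

Here `before` is `False` and `after` is `True`; the only thing that changed in between is the search path.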

# Input files


## config.yaml

The config file contains a list of routines and a list of analyses. Each routine or analysis is identified as `my_module:ClassName` or `my_package.my_module:ClassName`.
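
To make the `module:ClassName` notation concrete, here is a minimal sketch of how such an identifier could be resolved with the standard library. This is an assumption about the general technique, not Reposcanner's actual loader; `resolve_spec` is a hypothetical helper name.

```python
import importlib
from typing import Any


def resolve_spec(spec: str) -> Any:
    """Resolve a 'module.path:ClassName' string to the named attribute."""
    module_path, sep, class_name = spec.partition(":")
    if not sep or not module_path or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {spec!r}")
    # The module must be importable, i.e. reachable through sys.path.
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# Demonstrated with a standard-library class so the snippet runs anywhere.
ordered_dict_cls = resolve_spec("collections:OrderedDict")
```

The part before the colon is an ordinary dotted import path, which is why the module has to be importable before Reposcanner can find the class.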

2. Add the new class name (for example `- StarGazersRoutine`) to the end of `config.yml`.
Within each routine, one can put a dictionary of keyword parameters that will get passed to that routine.

3. Run the test scan and inspect output to ensure your scan worked as intended.
```
routines:
  - my_module:NameOfOneRoutine
  - routine: my_module:NameOfAnotherOneRoutine
    arg0: "foo"
    arg1: [1, 2, 3]
analysis:
  - my_module:NameOfOneRoutine
  - my_module:NameOfAnotherOneRoutine
    arg0: "foo"
    arg1: [1, 2, 3]
```
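
The `routines` list mixes two entry shapes: a bare identifier string, and a mapping with a `routine` key plus keyword parameters. The following sketch shows one way the parsed entries could be normalized into `(identifier, parameters)` pairs. It is an assumption for illustration, not Reposcanner's implementation; `normalize_entry` is a hypothetical helper, and the list literal stands in for what a YAML parser would typically return.

```python
from typing import Any, Dict, List, Tuple, Union

Entry = Union[str, Dict[str, Any]]


def normalize_entry(entry: Entry, key: str = "routine") -> Tuple[str, Dict[str, Any]]:
    """Split one config entry into (class identifier, keyword parameters)."""
    if isinstance(entry, str):
        # Bare identifier: no parameters were given.
        return entry, {}
    params = dict(entry)    # copy so the caller's mapping is left untouched
    spec = params.pop(key)  # raises KeyError if the identifying key is missing
    return spec, params


# Stand-in for the parsed `routines` list from the config example.
routines: List[Entry] = [
    "my_module:NameOfOneRoutine",
    {"routine": "my_module:NameOfAnotherOneRoutine", "arg0": "foo", "arg1": [1, 2, 3]},
]
normalized = [normalize_entry(entry) for entry in routines]
```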

# Contributing

## How to install in development mode

```
git clone https://github.com/bssw-psip/reposcanner.git
cd reposcanner
python3 -m venv ../repo-env # create a new virtual environment
. ../repo-env/bin/activate # activate the new virtual env
pip install -e . # create editable install
```

Note you will need to run `. ../repo-env/bin/activate` every time you start a
new shell to get this environment back.

You can use a type-checker like [mypy] to catch errors before runtime. Mypy can
catch variable name errors, type errors, function arity mismatches, and many
others.

[mypy]: https://www.mypy-lang.org/

To run mypy:

```
export MYPYPATH=${PWD}/src:$MYPYPATH
mypy --strict tests src
```
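
As a small illustration of what the type checker reports, consider the annotated function below. The function name and the data are made up for this example and do not come from Reposcanner. The code runs as written; the commented-out lines are the kind of mistakes mypy flags without running anything.

```python
from typing import Dict, Optional


def lookup_stars(stars: Dict[str, int], repo: str) -> Optional[int]:
    """Return the star count for `repo`, or None if it is unknown."""
    return stars.get(repo)


counts = {"example/repo": 3}  # made-up data
found = lookup_stars(counts, "example/repo")
missing = lookup_stars(counts, "no/such-repo")

# mypy reports each of the following statically:
#   lookup_stars(counts)        # arity mismatch: missing argument "repo"
#   lookup_stars(counts, 42)    # type error: int passed where str is expected
#   found + 1                   # Optional[int] may be None
```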

One can also use tests to build confidence in the correctness of the code.

```
export PYTHONPATH=${PWD}/src:$PYTHONPATH
pytest
```

You can pass `--exitfirst` to exit after the first failing test and
`--failed-first` to run the tests which failed last time first.
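
For reference, a pytest test is just a function whose name starts with `test_` and which uses plain `assert` statements. The sketch below is a hypothetical example, not taken from the Reposcanner test suite; `owner_and_name` is an invented helper.

```python
from typing import Tuple


def owner_and_name(url: str) -> Tuple[str, str]:
    """Extract (owner, repository) from a GitHub clone URL (hypothetical helper)."""
    path = url.rstrip("/")
    if path.endswith(".git"):
        path = path[: -len(".git")]
    owner, name = path.split("/")[-2:]
    return owner, name


def test_owner_and_name_strips_git_suffix() -> None:
    url = "https://github.com/bssw-psip/reposcanner.git"
    assert owner_and_name(url) == ("bssw-psip", "reposcanner")


def test_owner_and_name_without_suffix() -> None:
    url = "https://github.com/bssw-psip/reposcanner/"
    assert owner_and_name(url) == ("bssw-psip", "reposcanner")
```

Saved under a file name matching `test_*.py`, these functions are collected automatically when `pytest` runs.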
6 changes: 5 additions & 1 deletion setup.cfg
@@ -57,9 +57,13 @@ install_requires =
pydot
tqdm
numpy
windows-curses;platform_system=='Windows'
pytest-mock
pytest-cov
windows-curses;platform_system=='Windows'
mypy
types-tqdm
types-PyYAML
pandas-stubs

[options.packages.find]
where = src
Empty file added src/reposcanner/__init__.py
Empty file.
17 changes: 10 additions & 7 deletions src/reposcanner/analyses.py
@@ -1,11 +1,14 @@
from abc import ABC, abstractmethod
from reposcanner.requests import BaseRequestModel
from reposcanner.response import ResponseModel
from typing import Dict, Optional, Any, Type


class DataAnalysis(ABC):
"""The abstract base class for all data analyses. Methods cover
the execution of analyses, rendering, and exporting of data."""

def canHandleRequest(self, request):
def canHandleRequest(self, request: BaseRequestModel) -> bool:
"""
Returns True if the routine is capable of handling the request (i.e. the
RequestModel is of the type that the analysis expects), and False otherwise.
@@ -16,14 +19,14 @@ def canHandleRequest(self, request):
return False

@abstractmethod
def getRequestType(self):
def getRequestType(self) -> Type[BaseRequestModel]:
"""
Returns the class object for the routine's companion request type.
"""
pass

@abstractmethod
def execute(self, request):
def execute(self, request: BaseRequestModel) -> ResponseModel:
"""
Contains the code for processing data generated by mining routines and/or from external
databases.
@@ -35,15 +38,15 @@ def execute(self, request):
"""
pass

def run(self, request):
def run(self, request: BaseRequestModel) -> ResponseModel:
"""
Encodes the workflow of a DataAnalysis object. The client only needs
to run this method in order to get results.
"""
response = self.execute(request)
return response

def hasConfigurationParameters(self):
def hasConfigurationParameters(self) -> bool:
"""
Checks whether the analysis object was passed configuration parameters,
whether valid or not. Routines are not required to do anything with parameters
@@ -55,7 +58,7 @@ def hasConfigurationParameters(self):
except BaseException:
return False

def getConfigurationParameters(self):
def getConfigurationParameters(self) -> Optional[Dict[str, Any]]:
"""
Returns the configuration parameters assigned to the analysis.
"""
@@ -65,7 +68,7 @@ def getConfigurationParameters(self):
except BaseException:
return None

def setConfigurationParameters(self, configParameters):
def setConfigurationParameters(self, configParameters: Dict[str, Any]) -> None:
"""
Assigns configuration parameters to a newly created analysis.
"""
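
To show how the annotated `DataAnalysis` signatures fit together from a subclass author's point of view, here is a self-contained sketch. Everything in it is a stand-in so that it runs without Reposcanner installed: the reduced `BaseRequestModel`, `ResponseModel` (including its `attachments` attribute) and `DataAnalysis` classes only mirror the signatures visible in this diff, the `isinstance` check in `canHandleRequest` is an assumption because the real body is not shown, and `LineCountRequest` and `LineCountAnalysis` are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class BaseRequestModel:
    """Stand-in for reposcanner.requests.BaseRequestModel (assumption)."""


class ResponseModel:
    """Stand-in for reposcanner.response.ResponseModel (assumption)."""

    def __init__(self, attachments: Dict[str, Any]) -> None:
        self.attachments = attachments


class DataAnalysis(ABC):
    """Reduced stand-in mirroring the annotated signatures in the diff."""

    def canHandleRequest(self, request: BaseRequestModel) -> bool:
        # Assumed behaviour: check the request against the companion type.
        return isinstance(request, self.getRequestType())

    @abstractmethod
    def getRequestType(self) -> Type[BaseRequestModel]:
        ...

    @abstractmethod
    def execute(self, request: BaseRequestModel) -> ResponseModel:
        ...

    def run(self, request: BaseRequestModel) -> ResponseModel:
        return self.execute(request)


class LineCountRequest(BaseRequestModel):  # hypothetical request type
    def __init__(self, text: str) -> None:
        self.text = text


class LineCountAnalysis(DataAnalysis):  # hypothetical analysis
    def getRequestType(self) -> Type[BaseRequestModel]:
        return LineCountRequest

    def execute(self, request: BaseRequestModel) -> ResponseModel:
        # Narrow the base type so a strict type checker accepts `.text`.
        assert isinstance(request, LineCountRequest)
        return ResponseModel({"lines": len(request.text.splitlines())})


analysis = LineCountAnalysis()
request = LineCountRequest("a\nb\nc")
response = analysis.run(request)
```

Because `execute` is annotated with the base request type, a subclass has to narrow the argument (here with an `isinstance` assertion) before using fields that only its own request class defines.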