bssw-psip · charmoniumQ · Jun 12, 2023
diff --git a/.github/workflows/python-package.yml b/.github/workflows/python-package.yml
@@ -30,13 +30,16 @@ jobs:
         python -m pip install --upgrade pip
         python -m pip install flake8 pytest pytest-cov
         python -m pip install --upgrade setuptools setuptools_scm wheel
-        python setup.py install
+        python -m pip install .
     #- name: Lint with flake8
     #  run: |
     #    # stop the build if there are Python syntax errors or undefined names
     #    flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
     #    # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
     #    flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
+    - name: Check with Mypy
+      run: |
+        MYPYPATH=src python -m mypy --strict src tests
     - name: Test with pytest
       env:
         DUMMY_GITHUBAPI_TOKEN: ${{ secrets.DUMMY_WORKFLOW_GITHUB_TOKEN }}

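As a rough illustration of what the new `Check with Mypy` step is meant to catch, the sketch below uses a hypothetical helper, `count_commits`, that is not part of Reposcanner. The file itself runs normally. The commented-out calls are the kinds of mistakes a checker run with `mypy --strict` would be expected to reject before runtime.

```python
from typing import Dict, List


def count_commits(commits_by_author: Dict[str, List[str]]) -> Dict[str, int]:
    """Return the number of commits per author (hypothetical helper)."""
    return {author: len(shas) for author, shas in commits_by_author.items()}


history = {"alice": ["a1", "b2"], "bob": ["c3"]}
totals = count_commits(history)
print(totals)

# Each commented-out line below is a mistake a type checker should flag:
# count_commits(["a1", "b2"])      # wrong argument type (list instead of dict)
# count_commits(history, "extra")  # too many arguments (arity mismatch)
# print(totls)                     # misspelled variable name
```

Uncommenting any one of the last three lines and re-running the checker is a quick way to see the corresponding error message locally.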
diff --git a/README.md b/README.md
@@ -13,25 +13,10 @@ The diagram above illustrates the overall architecture of the Reposcanner toolki
 
 ## How to Install
 
-First, clone the repository from GitHub:
-
-```
-git clone https://github.com/bssw-psip/reposcanner.git
-```
-
-Then install Reposcanner and run the test suite:
-
 ```
-cd reposcanner
-python3 -m venv ../repo-env # create a new virtual environment
-. ../repo-env/bin/activate  # activate the new virtual env
-pip install -e .            # create editable install
-tox                         # run tests
+pip install git+https://github.com/bssw-psip/reposcanner.git
 ```
 
-If all tests pass, the installation was successful, and you are ready to go!
-
-
 # How to Run
 
 ## Setup input files
@@ -48,14 +33,80 @@ reposcanner --credentials tutorial/inputs/credentials.yml --config tutorial/inpu
 
 3. examine the output files written to `tutorial/outputs`
 
-
 # How to extend functionality
 
-1. Create a new source file, `src/reposcanner/<routine.py>`, including a class
-   based on the `ContributorAccountListRoutine`.  See `stars.py` as an
-   example of the kind of modifications required.
+Originally, the only way to extend Reposcanner was to modify its source. Reposcanner can now also be extended externally, and we recommend this method for new projects so that each new analysis does not require its own fork of Reposcanner.
+
+1. Create a new source file, `my_module.py` or `my_package/my_module.py`.
+
+2. Import `reposcanner.requests` and one of {`reposcanner.routines` or `reposcanner.analyses`}, depending on whether you want to write a routine or an analysis.
+
+3. Locate the most relevant subclass of `reposcanner.requests.BaseRequestModel` and one of {`reposcanner.routines.DataMiningRoutine` or `reposcanner.analyses.DataAnalysis`}. E.g., for a routine that requires GitHub API access, one would subclass `OnlineRoutineRequest` and `OnlineRepositoryRoutine`. Reference the minimal blank example in `reposcanner.dummy` or real-world examples in `reposcanner.contrib`.
+
+4. Write a config file that refers to your routines and analyses. See the next section on configuration files.
+
+5. Check that `my_module` or `my_package.my_module` is importable. E.g., `python -c 'from my_module import MyRoutineOrAnalysis'`.
+  - The current working directory is implicitly on the `PYTHONPATH`, so your module or package will be importable if you run Python and Reposcanner from the directory that contains your module or package.
+  - If your module or package does not reside in the current working directory, you need to add its parent directory to your `$PYTHONPATH` for it to be importable: `export PYTHONPATH=/path/to/proj:$PYTHONPATH`. This only has to be done once per shell session. Note that `$PYTHONPATH` should hold the path to the directory containing your module or package, not the path to the module or package itself. E.g., if you have `/path/to/proj/my_module.py` or `/path/to/proj/my_package/my_module.py`, set the `PYTHONPATH` to `/path/to/proj`.
+
+6. Run Reposcanner.
+
+# Input files
+
+
+## config.yaml
+
+The config file contains a list of routines and a list of analyses. Each routine or analysis is identified as `my_module:ClassName` or `my_package.my_module:ClassName`.
 
-2. Add the new class name (for example `- StarGazersRoutine`) to the end of `config.yml`.
+Within each routine, one can put a dictionary of keyword parameters that will get passed to that routine.
 
-3. Run the test scan and inspect output to ensure your scan worked as intended.
+```
+routines:
+  - my_module:NameOfOneRoutine
+  - routine: my_module:NameOfAnotherOneRoutine
+    arg0: "foo"
+    arg1: [1, 2, 3]
+analysis:
+  - my_module:NameOfOneAnalysis
+  - analysis: my_module:NameOfAnotherAnalysis  # "analysis" key assumed by analogy with "routine" above
+    arg0: "foo"
+    arg1: [1, 2, 3]
+```
+
+# Contributing
+
+## How to install in development mode
+
+```
+git clone https://github.com/bssw-psip/reposcanner.git
+cd reposcanner
+python3 -m venv ../repo-env # create a new virtual environment
+. ../repo-env/bin/activate  # activate the new virtual env
+pip install -e .            # create editable install
+```
+
+Note you will need to run `. ../repo-env/bin/activate` every time you start a
+new shell to get this environment back.
+
+You can use a type-checker like [mypy] to catch errors before runtime. Mypy can
+catch variable name errors, type errors, function arity mismatches, and many
+others.
+
+[mypy]: https://www.mypy-lang.org/
+
+To run mypy:
+
+```
+export MYPYPATH=${PWD}/src:$MYPYPATH
+mypy --strict tests src
+```
+
+One can also use tests to build confidence in the correctness of the code.
+
+```
+export PYTHONPATH=${PWD}/src:$PYTHONPATH
+pytest
+```
 
+You can pass `--exitfirst` to stop after the first failing test and
+`--failed-first` to run the tests that failed in the previous run first.
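The README identifies routines and analyses by strings of the form `my_module:ClassName`. The sketch below shows one way such a string could be resolved to a class with the standard library. It is an assumption about the general technique, not a copy of Reposcanner's own loader, and `load_class` is a hypothetical name. Standard-library modules stand in for a user's module so the example runs anywhere.

```python
import importlib
from typing import Any


def load_class(entry: str) -> Any:
    """Resolve a 'package.module:ClassName' string to the class object."""
    module_name, sep, class_name = entry.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {entry!r}")
    # import_module searches sys.path, which includes PYTHONPATH entries.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# Standard-library names stand in for a user's module and class here.
ordered_dict_cls = load_class("collections:OrderedDict")
decoder_cls = load_class("json.decoder:JSONDecoder")
print(ordered_dict_cls.__name__, decoder_cls.__name__)
```

Because the lookup goes through `importlib.import_module`, an entry only resolves when its module is importable from the current working directory or from a directory listed in `PYTHONPATH`, which is why the importability check in step 5 matters.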
diff --git a/setup.cfg b/setup.cfg
@@ -57,9 +57,13 @@ install_requires =
     pydot
     tqdm
 	numpy
+    windows-curses;platform_system=='Windows'
     pytest-mock
     pytest-cov
-    windows-curses;platform_system=='Windows'
+    mypy
+    types-tqdm
+    types-PyYAML
+    pandas-stubs
 
 [options.packages.find]
 where = src

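The `windows-curses;platform_system=='Windows'` requirement uses an environment marker, so the package is only installed where the marker evaluates to true. The sketch below mimics that comparison in plain Python. `marker_applies` is a hypothetical helper, and the installer's real marker evaluation covers more than this single check.

```python
import platform


def marker_applies(required_system: str) -> bool:
    """Mimic the comparison made by a `platform_system == '...'` marker."""
    # PEP 508 defines platform_system as the value of platform.system().
    return platform.system() == required_system


if marker_applies("Windows"):
    print("windows-curses would be installed on this machine")
else:
    print("windows-curses would be skipped on this machine")
```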
diff --git a/src/reposcanner/__init__.py b/src/reposcanner/__init__.py
diff --git a/src/reposcanner/analyses.py b/src/reposcanner/analyses.py
@@ -1,11 +1,14 @@
 from abc import ABC, abstractmethod
+from reposcanner.requests import BaseRequestModel
+from reposcanner.response import ResponseModel
+from typing import Dict, Optional, Any, Type
 
 
 class DataAnalysis(ABC):
     """The abstract base class for all data analyses. Methods cover
     the execution of analyses, rendering, and exporting of data."""
 
-    def canHandleRequest(self, request):
+    def canHandleRequest(self, request: BaseRequestModel) -> bool:
         """
         Returns True if the routine is capable of handling the request (i.e. the
         RequestModel is of the type that the analysis expects), and False otherwise.
@@ -16,14 +19,14 @@ def canHandleRequest(self, request):
             return False
 
     @abstractmethod
-    def getRequestType(self):
+    def getRequestType(self) -> Type[BaseRequestModel]:
         """
         Returns the class object for the routine's companion request type.
         """
         pass
 
     @abstractmethod
-    def execute(self, request):
+    def execute(self, request: BaseRequestModel) -> ResponseModel:
         """
         Contains the code for processing data generated by mining routines and/or from external
         databases.
@@ -35,15 +38,15 @@ def execute(self, request):
         """
         pass
 
-    def run(self, request):
+    def run(self, request: BaseRequestModel) -> ResponseModel:
         """
         Encodes the workflow of a DataAnalysis object. The client only needs
         to run this method in order to get results.
         """
         response = self.execute(request)
         return response
 
-    def hasConfigurationParameters(self):
+    def hasConfigurationParameters(self) -> bool:
         """
         Checks whether the analysis object was passed configuration parameters,
         whether valid or not. Routines are not required to do anything with parameters
@@ -55,7 +58,7 @@ def hasConfigurationParameters(self):
         except BaseException:
             return False
 
-    def getConfigurationParameters(self):
+    def getConfigurationParameters(self) -> Optional[Dict[str, Any]]:
         """
         Returns the configuration parameters assigned to the analysis.
         """
@@ -65,7 +68,7 @@ def getConfigurationParameters(self):
         except BaseException:
             return None
 
-    def setConfigurationParameters(self, configParameters):
+    def setConfigurationParameters(self, configParameters: Dict[str, Any]) -> None:
         """
         Assigns configuration parameters to a newly created analysis.
         """
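To show how the annotated `DataAnalysis` interface might be used by an external extension, here is a minimal sketch. The three base classes are simplified stand-ins so the example runs without Reposcanner installed. A real extension would import them from `reposcanner.requests`, `reposcanner.response`, and `reposcanner.analyses`, and the real `ResponseModel` constructor may differ. `WordCountRequest` and `WordCountAnalysis` are hypothetical names.

```python
from abc import ABC, abstractmethod
from typing import Type


# Stand-ins so the sketch runs without Reposcanner installed.
class BaseRequestModel:
    pass


class ResponseModel:
    def __init__(self, message: str) -> None:
        self.message = message


class DataAnalysis(ABC):
    def canHandleRequest(self, request: BaseRequestModel) -> bool:
        # Assumed behaviour: accept only the companion request type.
        return isinstance(request, self.getRequestType())

    @abstractmethod
    def getRequestType(self) -> Type[BaseRequestModel]: ...

    @abstractmethod
    def execute(self, request: BaseRequestModel) -> ResponseModel: ...

    def run(self, request: BaseRequestModel) -> ResponseModel:
        return self.execute(request)


# Hypothetical external extension: a request type and its analysis.
class WordCountRequest(BaseRequestModel):
    def __init__(self, text: str) -> None:
        self.text = text


class WordCountAnalysis(DataAnalysis):
    def getRequestType(self) -> Type[BaseRequestModel]:
        return WordCountRequest

    def execute(self, request: BaseRequestModel) -> ResponseModel:
        # Narrow the base type to the companion type before using its fields.
        assert isinstance(request, WordCountRequest)
        return ResponseModel(f"{len(request.text.split())} words")


analysis = WordCountAnalysis()
request = WordCountRequest("mining software repositories")
if analysis.canHandleRequest(request):
    print(analysis.run(request).message)
```

Since `execute` is annotated with the base request type, the `isinstance` check is what lets a strict type checker accept the access to `request.text`.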