changes for v0.6 #86

Merged: 38 commits, Oct 17, 2024

Commits:
dd000b4 first version of the dependency lock based mode (Jul 26, 2024)
b65ba5b fix proxy (Sep 3, 2024)
4258b98 add integration tests again (Sep 3, 2024)
bb8bbda fixing testcases (Sep 3, 2024)
705a926 remove 2 testcases (Sep 3, 2024)
6f1138b keep 2 integration testcases (Sep 3, 2024)
adfe2c5 finalize the wrapper and restructure the code (Sep 7, 2024)
a302bf4 implement the functionalities discussed during the last meeting (Sep 8, 2024)
1bc4e89 fixed Dockerfie to support cmd parameters easily (JJ-Author, Sep 9, 2024)
56c4d7a only add the parameters for the certificate parameters if https inter… (Sep 9, 2024)
700022e fix log (Sep 10, 2024)
9966b5e adding -v to run the tests (Sep 10, 2024)
dbb96ca adding -v to run the tests (Sep 10, 2024)
a5f872f fix testcases (Sep 10, 2024)
3711225 start poetry from workflow (Sep 10, 2024)
f6814ab fixes (Sep 10, 2024)
415fceb fixing some comments (Sep 27, 2024)
2bf7e6d add poetry instalation to README (Oct 4, 2024)
7b4c919 rename get_ontology_from_request funtion (Oct 4, 2024)
2d3ceaa transform config to dict from tuple (Oct 4, 2024)
4c98e27 Create Config dataclass and some cleanups (Oct 6, 2024)
b284e74 Add testcases (Oct 6, 2024)
b4057d4 fix startup command in README (Oct 15, 2024)
945a6fe modify wrapper function for get_request host and path (Oct 15, 2024)
5f00b27 use enum for config (Oct 15, 2024)
2110f4d use enum for config (Oct 15, 2024)
af082eb update proxy logic function def and add do_intercept hook (Oct 15, 2024)
8e232fd update proxy logic function def and add do_intercept hook (Oct 15, 2024)
995033c Merge branch 'fixing_pr_comments' of https://github.com/kuefmz/ontolo… (Oct 15, 2024)
3d8435d fix wrapper for host and path (Oct 15, 2024)
5a4c00a fix downlaod archivo (Oct 15, 2024)
93fb8be fix tests (Oct 15, 2024)
28c5b08 fix do_intercept (Oct 15, 2024)
18b16e0 Merge pull request #1 from kuefmz/fixing_pr_comments (kuefmz, Oct 16, 2024)
7d4d3a9 move depencency.ttl to tests (Oct 16, 2024)
de9b6cd remove prints (Oct 16, 2024)
abeebc8 Merge branch 'main' of https://github.com/kuefmz/ontology-time-machine (Oct 16, 2024)
a1c47cf rename block function (Oct 16, 2024)
5 changes: 4 additions & 1 deletion .github/workflows/pytest.yml
@@ -29,6 +29,9 @@ jobs:
echo "$CA_CERT" > ca-cert.pem
echo "$CA_KEY" > ca-key.pem
echo "$CA_SIGNING_KEY" > ca-signing-key.pem
- name: Start the proxy
run: |
poetry run python ontologytimemachine/custom_proxy.py &
- name: Test with pytest
run: |
poetry run pytest
poetry run pytest -v
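The workflow starts the proxy in the background with `&` and then runs pytest right away, so the first tests can race the proxy's startup. A minimal sketch of a readiness check, assuming the tests only need the proxy's TCP port to accept connections; `wait_for_port` is a hypothetical helper, not part of the repository:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll until a TCP connection to (host, port) succeeds or the timeout expires.

    Returns True as soon as something is listening, False if the deadline passes first.
    (Hypothetical helper, not taken from the ontology-time-machine code base.)
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError (e.g. ConnectionRefusedError) while nothing listens
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

It could be called from a pytest session fixture, for example `assert wait_for_port("127.0.0.1", 8899)` (8899 being the `PORT` constant in custom_proxy.py), before any test sends traffic through the proxy.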
4 changes: 1 addition & 3 deletions Dockerfile
@@ -21,6 +21,4 @@ RUN pip install poetry==$POETRY_VERSION
RUN poetry config virtualenvs.create false
RUN poetry install --no-dev && rm pyproject.toml


CMD python3 -m proxy --ca-key-file ca-key.pem --ca-cert-file ca-cert.pem --ca-signing-key-file ca-signing-key.pem --hostname 0.0.0.0 --port $PORT --plugins ontologytimemachine.custom_proxy.OntologyTimeMachinePlugin

ENTRYPOINT ["python3", "ontologytimemachine/custom_proxy.py"]
3 changes: 3 additions & 0 deletions README.md
@@ -35,3 +35,6 @@ cp ca-signing-key.pem ~/ontology-time-machine/ca-signing-key.pem
### Not working:
- curl -x http://0.0.0.0:8899 -H "Accept: text/turtle" --cacert ca-cert.pem http://ontologi.es/days#



python3 -m proxy --ca-key-file ca-key.pem --ca-cert-file ca-cert.pem --ca-signing-key-file ca-signing-key.pem --hostname IP --port 8899 --plugins ontologytimemachine.custom_proxy.OntologyTimeMachinePlugin --ontoFormat ntriples --ontoVersion originalFailoverLive --ontoPrecedence enforcedPriority
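The README command mixes proxy.py's own options (`--hostname`, `--port`, `--plugins`, the CA files) with the time machine's `--ontoFormat`, `--ontoVersion` and `--ontoPrecedence`. A sketch of one way to split them with the standard library's `argparse`; `split_proxy_args` is hypothetical, and its defaults are assumptions copied from the hard-coded config tuple in custom_proxy.py rather than from the project's `parse_arguments`:

```python
import argparse
from typing import List, Tuple


def split_proxy_args(argv: List[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Separate the time-machine flags from everything else on the command line.

    Hypothetical helper: flag names come from the README command, defaults from
    the config tuple in custom_proxy.py.
    """
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--ontoFormat", default="turtle")
    parser.add_argument("--ontoVersion", default="originalFailoverLiveLatest")
    parser.add_argument("--ontoPrecedence", default="enforcedPriority")
    # parse_known_args leaves unrecognised options (e.g. --port, --plugins) in the second value
    onto_args, proxy_args = parser.parse_known_args(argv)
    return onto_args, proxy_args
```

Because `parse_known_args` returns the remaining argument strings, the second value could be handed to proxy.py as its argument list, instead of resetting `sys.argv` to `[sys.argv[0]]` after `parse_arguments()` as the `__main__` block does.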
69 changes: 39 additions & 30 deletions ontologytimemachine/custom_proxy.py
@@ -1,11 +1,12 @@
from proxy.http.proxy import HttpProxyBasePlugin
from proxy.http.parser import HttpParser, httpParserTypes
from proxy.http.parser import HttpParser
from proxy.common.utils import build_http_response
from proxy.http.methods import HttpMethods
from ontologytimemachine.utils.utils import proxy_logic, parse_arguments
from ontologytimemachine.utils.utils import check_if_archivo_ontology_requested
from ontologytimemachine.utils.utils import parse_arguments
from ontologytimemachine.utils.mock_responses import mock_response_403
from requests.exceptions import SSLError, Timeout, ConnectionError, RequestException
from ontologytimemachine.proxy_wrapper import HttpRequestWrapper
from ontologytimemachine.utils.proxy_logic import proxy_logic, is_ontology_request_only_ontology
from ontologytimemachine.utils.proxy_logic import is_archivo_ontology_request
from ontologytimemachine.utils.proxy_logic import if_intercept_host
from http.client import responses
import proxy
import sys
@@ -15,69 +16,69 @@
IP = '0.0.0.0'
PORT = '8899'

config = ({'format': 'turtle', 'precedence': 'enforcedPriority', 'patchAcceptUpstream': False}, 'originalFailoverLiveLatest', False, 'all', False, True, None, None)

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

class OntologyTimeMachinePlugin(HttpProxyBasePlugin):
def __init__(self, *args, **kwargs):
logger.info('Init')
super().__init__(*args, **kwargs)
(self.ontoFormat, self.ontoVersion, self.only_ontologies,
self.https_intercept, self.inspect_redirects, self.forward_headers,
self.subject_binary_search_threshold) = parse_arguments()

(self.ontoFormat, self.ontoVersion, self.restrictedAccess,
self.httpsInterception, self.disableRemovingRedirects,
self.forward_headers, self.timestamp, self.manifest) = config

def before_upstream_connection(self, request: HttpParser):
logger.info('Before upstream connection hook')
logger.info(f'Request method: {request.method} - Request host: {request.host} - Request path: {request.path} - Request headers: {request.headers}')
wrapped_request = HttpRequestWrapper(request)

if request.method == b'CONNECT':
logger.info(f'HTTPS interception mode: {self.https_intercept}')
if wrapped_request.is_connect_request():
logger.info(f'HTTPS interception mode: {self.httpsInterception}')
# Only intercept if interception is enabled
if self.https_intercept in ['all', 'archivo']:
# Move this to the utils
if if_intercept_host(self.httpsInterception):
logger.info('HTTPS interception is on, forwarding the request')
return request
else:
logger.info('HTTPS interception is turned off')
return None


ontology_request = check_if_archivo_ontology_requested(request)
# If only ontology mode, return None in all other cases
if self.only_ontologies and not ontology_request:
if is_ontology_request_only_ontology(wrapped_request, self.restrictedAccess):
logger.warning('Request denied: not an ontology request and only ontologies mode is enabled')
self.queue_response(mock_response_403)
return None

if ontology_request:
if is_archivo_ontology_request(wrapped_request):
logger.debug('The request is for an ontology')
response = proxy_logic(request, self.ontoFormat, self.ontoVersion)
response = proxy_logic(wrapped_request, self.ontoFormat, self.ontoVersion, self.disableRemovingRedirects, self.timestamp, self.manifest)
self.queue_response(response)
return None
return request


def handle_client_request(self, request: HttpParser):
logger.info('Handle client request hook')
logger.info(f'Request method: {request.method} - Request host: {request.host} - Request path: {request.path} - Request headers: {request.headers}')

logger.debug(request.method)
if request.method == b'CONNECT':
wrapped_request = HttpRequestWrapper(request)
if wrapped_request.is_connect_request():
return request

ontology_request = check_if_archivo_ontology_requested(request)
if not ontology_request:
is_ontology_request = is_archivo_ontology_request(wrapped_request)
if not is_ontology_request:
logger.info('The requested IRI is not part of DBpedia Archivo')
return request

response = proxy_logic(request, self.ontoFormat, self.ontoVersion)
response = proxy_logic(wrapped_request, self.ontoFormat, self.ontoVersion, self.disableRemovingRedirects, self.timestamp, self.manifest)
self.queue_response(response)

return None


def handle_upstream_chunk(self, chunk: memoryview):
return chunk


def queue_response(self, response):
self.client.queue(
build_http_response(
@@ -93,15 +94,23 @@ def queue_response(self, response):

if __name__ == '__main__':

sys.argv += [
'--ca-key-file', 'ca-key.pem',
'--ca-cert-file', 'ca-cert.pem',
'--ca-signing-key-file', 'ca-signing-key.pem',
]
config = parse_arguments()

sys.argv = [sys.argv[0]]

# check if https interception is enabled
if config[3] != 'none':
sys.argv += [
'--ca-key-file', 'ca-key.pem',
'--ca-cert-file', 'ca-cert.pem',
'--ca-signing-key-file', 'ca-signing-key.pem',
]

sys.argv += [
'--hostname', IP,
'--port', PORT,
'--plugins', __name__ + '.OntologyTimeMachinePlugin'
]

logger.info("Starting OntologyTimeMachineProxy server...")
proxy.main()
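Commits 2d3ceaa, 4c98e27 and 5f00b27 mention moving the configuration from a tuple to a dataclass and enums, while the custom_proxy.py diff above still unpacks a positional tuple and has `config[3]` stand for the HTTPS interception mode. A hypothetical sketch of that shape: the names `HttpsInterception`, `OntoConfig` and `Config` are illustrative, not necessarily the ones the PR introduced; the defaults are copied from the tuple, the enum values from the `'all'`, `'archivo'` and `'none'` checks, and the types of `timestamp` and `manifest` are assumed to be optional strings.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class HttpsInterception(Enum):
    # Values mirror the strings compared against in custom_proxy.py
    ALL = "all"
    ARCHIVO = "archivo"
    NONE = "none"


@dataclass
class OntoConfig:
    # Corresponds to the dict in the first slot of the config tuple
    format: str = "turtle"
    precedence: str = "enforcedPriority"
    patch_accept_upstream: bool = False


@dataclass
class Config:
    # Field order follows the tuple unpacking in OntologyTimeMachinePlugin.__init__
    onto: OntoConfig = field(default_factory=OntoConfig)
    onto_version: str = "originalFailoverLiveLatest"
    restricted_access: bool = False
    https_interception: HttpsInterception = HttpsInterception.ALL
    disable_removing_redirects: bool = False
    forward_headers: bool = True
    timestamp: Optional[str] = None  # assumed type
    manifest: Optional[str] = None  # assumed type

    def intercepts_https(self) -> bool:
        """Named replacement for the positional check `config[3] != 'none'`."""
        return self.https_interception is not HttpsInterception.NONE
```

With this shape, an unknown mode passed as `HttpsInterception("...")` raises `ValueError` up front instead of flowing through the proxy as an arbitrary string.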
101 changes: 101 additions & 0 deletions ontologytimemachine/proxy_wrapper.py
@@ -0,0 +1,101 @@
from abc import ABC, abstractmethod
from proxy.http.parser import HttpParser
import logging


logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)


class AbstractRequestWrapper(ABC):
def __init__(self, request):
self.request = request

@abstractmethod
def is_get_request(self) -> bool:
pass

@abstractmethod
def is_connect_request(self) -> bool:
pass

@abstractmethod
def is_head_request(self) -> bool:
pass

@abstractmethod
def is_https_request(self) -> bool:
pass

@abstractmethod
def get_request(self):
pass

@abstractmethod
def get_request_headers(self):
pass

@abstractmethod
def get_request_accept_header(self):
pass

@abstractmethod
def set_request_accept_header(self, mime_type):
pass

@abstractmethod
def get_ontology_from_request(self):
pass


class HttpRequestWrapper(AbstractRequestWrapper):
def __init__(self, request: HttpParser):
super().__init__(request)

def is_get_request(self) -> bool:
return self.request.method == b'GET'

def is_connect_request(self):
return self.request.method == b'CONNECT'

def is_head_request(self):
return self.request.method == b'HEAD'

def is_https_request(self):
return self.request.method == b'CONNECT' or self.request.headers.get(b'Host', b'').startswith(b'https')

def get_request(self):
return self.request

def get_request_headers(self):
headers = {}
for k, v in self.request.headers.items():
headers[v[0].decode('utf-8')] = v[1].decode('utf-8')
return headers

def get_request_accept_header(self):
logger.info('Wrapper - get_request_accept_header')
return self.request.headers[b'accept'][1].decode('utf-8')

def set_request_accept_header(self, mime_type):
self.request.headers[b'accept'] = (b'Accept', mime_type.encode('utf-8'))
logger.info(f'Accept header set to: {self.request.headers[b"accept"][1]}')

def get_ontology_from_request(self):
logger.info('Get ontology from request')
print(f'Request protocol: {self.request.protocol}')
print(f'Request host: {self.request.host}')
print(f'Request _url: {self.request._url}')
print(f'Request path: {self.request.path}')
if (self.request.method == b'GET' or self.request.method == b'HEAD') and not self.request.host:
for k, v in self.request.headers.items():
if v[0].decode('utf-8') == 'Host':
host = v[1].decode('utf-8')
path = self.request.path.decode('utf-8')
ontology = 'https://' + host + path
else:
host = self.request.host.decode('utf-8')
path = self.request.path.decode('utf-8')
ontology = str(self.request._url)
logger.info(f'Ontology: {ontology}')
return ontology, host, path
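In `get_ontology_from_request`, the GET/HEAD branch assigns `host`, `path` and `ontology` only inside the loop, and only when a header named exactly `Host` (compared case-sensitively) is found. A request without it reaches the logging and `return` lines with unbound names and raises `UnboundLocalError`. A sketch of a more defensive lookup, assuming (as the wrapper's own `headers[b'accept']` access suggests) that proxy.py keys headers by lower-cased name and stores `(original name, value)` tuples; both function names are hypothetical:

```python
from typing import Dict, Optional, Tuple

# Assumed header layout, mirroring how the wrapper reads proxy.py headers:
# lower-cased header name -> (original header name, header value), all bytes.
Headers = Dict[bytes, Tuple[bytes, bytes]]


def host_from_headers(headers: Headers) -> Optional[str]:
    """Return the Host header value, or None if the request carries none."""
    entry = headers.get(b"host")
    if entry is None:
        return None
    return entry[1].decode("utf-8")


def iri_from_host_header(headers: Headers, path: bytes) -> Optional[Tuple[str, str, str]]:
    """Build (iri, host, path) from the Host header, or None if it is missing."""
    host = host_from_headers(headers)
    if host is None:
        return None  # caller decides how to handle it, instead of an UnboundLocalError
    path_str = path.decode("utf-8")
    # "https://" keeps the wrapper's assumption about requests that lack request.host
    return "https://" + host + path_str, host, path_str
```

Returning `None` lets `before_upstream_connection` treat such a request as "not an ontology request" explicitly rather than crashing inside the hook.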
25 changes: 25 additions & 0 deletions ontologytimemachine/utils/dependency.ttl
@@ -0,0 +1,25 @@
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex-version: <https://example.org/versioning/> .
<https://example.org/ontology/> owl:imports <http://xmlns.com/foaf/spec/>, <http://purl.org/dc/terms/> ;
ex-version:current <https://example.org/ontology/2024-01-24> ;
ex-version:version
<https://example.org/ontology/2024-01-24> ,
[
ex-version:snapshot <https://databus.dbpedia.org/ontologies/w3.org/2020--example/2023.02.11-215415> ;
ex-version:file <https://archivo.dbpedia.org/download?o=https%3A//example.org/ontology/&f=ttl&v=2023.02.01-215415> ;
ex-version:dependency <http://xmlns.com/foaf/spec/20100101.html>, <https://databus.dbpedia.org/ontologies/w3.org/2020--dct/2020.05.23-215415> ;
] .

<https://example.org/ontology/2024-01-24>
ex-version:snapshot <https://databus.dbpedia.org/ontologies/w3.org/2020--example/2024.01.24-215415> ;
ex-version:file <https://archivo.dbpedia.org/download?o=https%3A//example.org/ontology/&f=ttl&v=2024.01.24-215415> ;
ex-version:dependency <http://xmlns.com/foaf/spec/20140114.html>, <https://databus.dbpedia.org/ontologies/w3.org/2020--dct/2020.05.23-215415> ;
.

<http://xmlns.com/foaf/spec/20100101.html> ex-version:snapshot <http://xmlns.com/foaf/spec/20100101.html> ;
ex-version:file <http://xmlns.com/foaf/spec/20100101.rdf> .

<http://xmlns.com/foaf/spec/20140114.html> ex-version:snapshot <http://xmlns.com/foaf/spec/20140114.html> ;
ex-version:file <http://xmlns.com/foaf/spec/20140114.rdf> .

<https://databus.dbpedia.org/ontologies/w3.org/2020--dct/2020.05.23-215415> ex-version:snapshot <https://databus.dbpedia.org/ontologies/w3.org/2020--dct/2020.05.23-215415> ;
ex-version:file <https://archivo.dbpedia.org/download?o=http%3A//purl.org/dc/terms/&f=ttl&v=2020.05.23-215415> .
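dependency.ttl gives each ontology an `ex-version:current` version whose entry pins an Archivo file plus the exact snapshots of its dependencies, each of which has its own `ex-version:file`. To illustrate how a dependency-lock mode might turn that into concrete downloads, here is a sketch over a hand-copied subset of those triples (current version only). The dict encoding and `locked_files` are hypothetical; a real implementation would parse the Turtle itself.

```python
from typing import Dict, List, Set

# Hand-copied subset of dependency.ttl (illustrative encoding, not parsed from the file).
CURRENT: Dict[str, str] = {
    "https://example.org/ontology/": "https://example.org/ontology/2024-01-24",
}

# node -> its ex-version:file and ex-version:dependency objects
SNAPSHOTS: Dict[str, dict] = {
    "https://example.org/ontology/2024-01-24": {
        "file": "https://archivo.dbpedia.org/download?o=https%3A//example.org/ontology/&f=ttl&v=2024.01.24-215415",
        "dependencies": [
            "http://xmlns.com/foaf/spec/20140114.html",
            "https://databus.dbpedia.org/ontologies/w3.org/2020--dct/2020.05.23-215415",
        ],
    },
    "http://xmlns.com/foaf/spec/20140114.html": {
        "file": "http://xmlns.com/foaf/spec/20140114.rdf",
        "dependencies": [],
    },
    "https://databus.dbpedia.org/ontologies/w3.org/2020--dct/2020.05.23-215415": {
        "file": "https://archivo.dbpedia.org/download?o=http%3A//purl.org/dc/terms/&f=ttl&v=2020.05.23-215415",
        "dependencies": [],
    },
}


def locked_files(ontology: str) -> List[str]:
    """Return the current version's file, then its dependency files (depth-first, no repeats)."""
    result: List[str] = []
    seen: Set[str] = set()

    def visit(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        entry = SNAPSHOTS[node]
        result.append(entry["file"])
        for dep in entry["dependencies"]:
            visit(dep)

    visit(CURRENT[ontology])
    return result
```

The `seen` set keeps shared dependencies, such as the DCT snapshot that both example versions reference, from being fetched twice once more versions are added.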