Commit

Initial implementation.
Roman Lehner committed Mar 30, 2021
0 parents commit 81844f4
Showing 9 changed files with 308 additions and 0 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
__pycache__
.pytest_cache
env
.coverage
.vscode
115 changes: 115 additions & 0 deletions README.md
@@ -0,0 +1,115 @@
# DoTproxy (DNS over TLS)
DoTproxy is an experimental project that encrypts DNS queries with TLS. It originated from a challenge I got in a job interview and turned out to be a great opportunity to get more involved in networking and coding topics.

# Run DoTproxy
## With Docker Compose
Use [docker-compose](docker-compose.yaml) to build and run the DoTproxy container in the background:

    docker-compose up -d

To check the status of the container or shut it down:

    docker-compose ps
    docker-compose down

> **NOTE:** I decided to run the unit tests as part of the [Docker build](dockerfile) to force myself to never build a container without tested code. This might not be common practice, but I found it useful for local development. When building with docker-compose, you should see the test output in the console.

## Test DoTproxy with dig or nslookup
Use `dig` or `nslookup` to send a test query by specifying the host and port of DoTproxy:

    nslookup -port=53 google.com localhost
    dig google.com @localhost -p 53
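
If `dig` or `nslookup` are not at hand, a query can also be sent from Python. The following is a minimal sketch; `build_query` and `send_query` are hypothetical helpers, not part of DoTproxy. It encodes an A-record query following RFC 1035 section 4.1, and for `google.com` with the ID `0xbd5b` it yields the same bytes as the static message used in `unit_test.py` and `e2e_client_test.py`.

```python
import socket
import struct


def build_query(name: str, query_id: int) -> bytes:
    """Encode a DNS query for the A record of `name` (RFC 1035, section 4.1)."""
    # Header: ID, flags (0x0100 = recursion desired), QDCOUNT=1, ANCOUNT=NSCOUNT=ARCOUNT=0
    header = struct.pack('!HHHHHH', query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; the name ends with a zero byte
    labels = name.rstrip('.').split('.')
    qname = b''.join(bytes([len(label)]) + label.encode('ascii') for label in labels) + b'\x00'
    # QTYPE=1 (A), QCLASS=1 (IN)
    return header + qname + struct.pack('!HH', 1, 1)


def send_query(name: str, host: str = '127.0.0.1', port: int = 53,
               query_id: int = 0xbd5b, timeout: float = 5.0) -> bytes:
    """Send the query to DoTproxy over UDP and return the raw response."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)  # raise socket.timeout instead of blocking forever
        sock.sendto(build_query(name, query_id), (host, port))
        response, _ = sock.recvfrom(512)
        return response
```

With DoTproxy running, `send_query('google.com')` should return the raw response bytes; unlike the e2e client, it raises a timeout instead of blocking forever if a datagram gets lost.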

## Run from Source (UNIX)
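DoTproxy was developed using `Python 3.9.1`. Create a virtual environment and run the Python script with sudo (binding to port 53, like any port below 1024, requires root privileges). Call the virtual environment's interpreter directly, since `sudo` typically does not keep the activated environment's `PATH`:

    pip install virtualenv
    virtualenv --python /usr/bin/python3.9 env
    source env/bin/activate
    sudo env/bin/python dot_proxy.py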

## Run Unit Tests
To run the [unit tests](unit_test.py) with coverage, install the required libraries and run pytest:

    pip install -r requirements.txt
    pytest --cov dot_proxy unit_test.py

## Run E2E Tests
To run the [e2e test](e2e_client_test.py), first start DoTproxy, then run it with pytest:

    docker-compose up -d
    pytest e2e_client_test.py -o log_cli=true -o log_level=INFO
    docker-compose down

## Run Load Test
To run the [load test](load_test.py), first start DoTproxy, then run it with pytest. The container stats can be viewed with `docker stats` (run it in a separate terminal, as it keeps streaming until interrupted):

    docker-compose up -d
    docker stats dot-proxy
    pytest load_test.py -o log_cli=true -o log_level=INFO
    docker-compose down

> **NOTE:** As of the current version, the load testing script can only perform up to 250 concurrent requests. Beyond that, the test gets stuck because the threads are not processed (cause yet to be explored).

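
The Todo in `load_test.py` suggests trying a thread pool. A minimal sketch with `concurrent.futures.ThreadPoolExecutor` could look like this; `run_concurrently` is a hypothetical helper, and whether a bounded pool avoids the hang described above has not been tested. `pool.map` caps the number of live threads at `max_workers` and re-raises a failed assertion from any query, whereas an exception raised inside a plain `threading.Thread` does not fail the test that started it.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, TypeVar

T = TypeVar('T')


def run_concurrently(query_fn: Callable[[int], T], total: int, max_workers: int = 50) -> List[T]:
    """Call query_fn(1), ..., query_fn(total) on a bounded pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() submits every call up front but runs at most max_workers at once.
        # Results come back in submission order; an exception raised by any call
        # (e.g. a failed assert) is re-raised here while collecting the results.
        return list(pool.map(query_fn, range(1, total + 1)))
```

In `test_rapid_fire`, the thread loop could then be replaced by `run_concurrently(e2e_client_test.test_when_google_dns_is_queried_then_the_response_should_contain_google, CONCURRENT_QUERIES)`.
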
# Design Choices
The first draft of DoTproxy listens for UDP datagrams on port 53 (bound to `0.0.0.0`, so it is reachable on `localhost:53`) and forwards each DNS query to `Cloudflare DNS` on `1.1.1.1:853` over a TLS-encrypted TCP connection. Every request creates a new TCP socket and closes it after the response has been received.
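
The TCP leg uses the 2-byte length prefix from [RFC1035 section 4.2.2](https://tools.ietf.org/html/rfc1035#section-4.2.2). On a stream socket a single `recv()` call may return only part of a message, while `get_dns` currently reads the response with one `recv(MAX_BYTE_SIZE)`. A minimal sketch of framing helpers that keep reading until the announced length has arrived follows; the helper names are hypothetical, and they should work on a plain socket or an `ssl.SSLSocket` alike, since both provide `sendall()` and `recv()`.

```python
import socket
import struct


def send_framed(sock: socket.socket, message: bytes) -> None:
    """Prefix the DNS message with its length as a 2-byte big-endian integer."""
    sock.sendall(struct.pack('!H', len(message)) + message)


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes; a single recv() may return fewer on a stream socket."""
    data = b''
    while len(data) < size:
        chunk = sock.recv(size - len(data))
        if not chunk:
            raise ConnectionError('connection closed before the full message arrived')
        data += chunk
    return data


def recv_framed(sock: socket.socket) -> bytes:
    """Read the 2-byte length prefix, then return the DNS message it announces."""
    (length,) = struct.unpack('!H', recv_exact(sock, 2))
    return recv_exact(sock, length)
```
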

> **Version v0.0.1:** The system is blocking and can serve only a single request at a time. Based on the simplified [load test](#run-load-test), this version handles 100 concurrent requests in about 8 seconds (roughly 12 queries per second).
>
> **Version v0.0.2:** Spins up a new thread with its own TCP socket for each DNS query, so that the main thread can immediately process the next incoming request on the UDP socket on port 53. Implemented with the Python [threading library](https://docs.python.org/3/library/threading.html). Based on the simplified [load test](#run-load-test), this version handles 100 concurrent requests in about 0.5 seconds (200 queries per second).

More considerations and improvement points can be found in the [room for improvements](#room-for-improvements) section.

## Security Concerns
While DNS-over-TLS is encrypted, all of its queries go to a single dedicated port, 853 ([RFC7858](https://tools.ietf.org/html/rfc7858#section-3.1)), just as plain DNS uses port 53 ([RFC1035](https://tools.ietf.org/html/rfc1035#section-4.2.1)). This gives attackers a precise target, for example to block or single out DoT traffic. DNS-over-HTTPS, in contrast, runs on port 443, where it blends in with other HTTPS traffic on the network and might therefore be more difficult to identify.

Depending on the system architecture, there might be other attack surfaces, such as a cache or the still unprotected network connection between the DNS requester and DoTproxy. This might be especially significant for attacks from within the system.

## Integration to Distributed Systems
A production-ready DoT proxy could be deployed in multiple ways. It could act as a centralized DNS gateway that any internal service can query. If the query volume is high, DoTproxy would need to be performant enough, scalable, or both. We would also have to think about the availability and resilience of the system, as the proxy becomes a single point of failure.

Going in the opposite direction, we could use a container sidecar pattern by attaching DoTproxy to running containerized services, forcing those services to perform DNS queries through DoTproxy. In that case the load on a single instance might be significantly lower than in the centralized architecture, and we eliminate the single point of failure. While the design complexity of DoTproxy might be reduced, we might see an increase in operational overhead, both in the overall CPU, memory and disk resources required and in the effort of maintaining and updating the system.

## Room for Improvements
Here are some improvements I could think of for the current design:
- SSL certificate verification
- DNS query and response verification
- Caching
- Health and Readiness probing
- Error Handling
- Allow querying multiple DNS servers

### SSL certificate verification
While the current system performs a TLS handshake and encrypts the traffic to the server, it does not verify the certificate the server presents. The server might deliver an expired or untrusted certificate. At this point I would want to make sure that the certificate doesn't come from an attacker who might either read the data or redirect to malicious IP addresses.
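
In Python 3.9, `ssl.SSLContext(ssl.PROTOCOL_TLS)`, as used in `get_dns`, defaults to `verify_mode=ssl.CERT_NONE` and `check_hostname=False`, so no verification takes place. A minimal sketch of a verifying setup uses `ssl.create_default_context()`, which loads the system's trusted CA certificates and enables both checks. The function names are hypothetical, and the sketch assumes the system CA store trusts the certificate served for `cloudflare-dns.com`.

```python
import socket
import ssl

DNS_SERVER_HOST = '1.1.1.1'
DNS_SERVER_PORT = 853
DNS_SERVER_HOST_NAME = 'cloudflare-dns.com'


def make_verifying_context() -> ssl.SSLContext:
    """Client-side TLS context that validates the certificate chain and hostname."""
    # create_default_context() loads the system's trusted CA certificates and sets
    # verify_mode=CERT_REQUIRED and check_hostname=True.
    return ssl.create_default_context()


def open_verified_connection(timeout: float = 5.0) -> ssl.SSLSocket:
    """Connect to the upstream resolver; a bad certificate raises ssl.SSLCertVerificationError."""
    tcp_sock = socket.create_connection((DNS_SERVER_HOST, DNS_SERVER_PORT), timeout=timeout)
    # server_hostname is sent via SNI and checked against the certificate
    return make_verifying_context().wrap_socket(tcp_sock, server_hostname=DNS_SERVER_HOST_NAME)
```

`get_dns` could then open its upstream connection with `with open_verified_connection() as ssl_sock:`, so that a forged or expired certificate makes the handshake fail instead of being silently accepted.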

### DNS query and response verification
The DNS query should be checked for correctness and formatting before the connection to the DNS server is established. The server response should also be verified before it is returned to the requester.
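
A minimal sketch of such checks, based only on the DNS header layout from RFC 1035 section 4.1.1: the query must be a well-formed query with one question, and the response must carry the same ID, have the QR (response) bit set and repeat the question unchanged. The function names and the one-question policy are assumptions for illustration, and the answer records themselves are not parsed.

```python
import struct

# DNS header (RFC 1035, section 4.1.1): ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
HEADER = struct.Struct('!HHHHHH')
QR_BIT = 0x8000  # 0 = query, 1 = response


def question_end(message: bytes) -> int:
    """Offset just past the first question (QNAME labels + QTYPE + QCLASS).

    Assumes an uncompressed QNAME, as clients send in queries.
    Raises IndexError if the message is truncated.
    """
    offset = HEADER.size
    while message[offset] != 0:       # each label: length byte + label bytes
        offset += 1 + message[offset]
    end = offset + 1 + 4              # root label + QTYPE + QCLASS
    if end > len(message):
        raise IndexError('question section truncated')
    return end


def is_valid_query(query: bytes) -> bool:
    """Check a client query before forwarding it upstream."""
    try:
        _, flags, qdcount, _, _, _ = HEADER.unpack_from(query)
        question_end(query)
    except (struct.error, IndexError):
        return False
    return not (flags & QR_BIT) and qdcount == 1


def is_valid_response(query: bytes, response: bytes) -> bool:
    """Check that the upstream response answers the forwarded query."""
    try:
        q_id, _, _, _, _, _ = HEADER.unpack_from(query)
        r_id, r_flags, r_qdcount, _, _, _ = HEADER.unpack_from(response)
        end = question_end(query)
    except (struct.error, IndexError):
        return False
    return (
        r_id == q_id
        and bool(r_flags & QR_BIT)
        and r_qdcount == 1
        and response[HEADER.size:end] == query[HEADER.size:end]
    )
```

In `get_dns`, `is_valid_query(udp_request)` would run before connecting upstream, and `is_valid_response(udp_request, udp_response)` before `sendto()`.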

### Caching
A cache reduces unnecessary network exposure as well as query time. Caching might, however, introduce new complexities such as expiry, concurrency or invalid DNS entries.
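
A minimal in-memory cache could look like the sketch below. It keys entries on the query without its 2-byte ID, because clients choose a new ID for every query, and rewrites the ID when answering from the cache. The class name and the single fixed TTL are assumptions; a real cache would honour the TTL of each record in the response, and cached answers here still carry their original TTL values.

```python
import threading
import time
from typing import Dict, Optional, Tuple


class DnsCache:
    """Thread-safe response cache keyed on the query without its 2-byte ID.

    Assumption: one fixed TTL for every entry instead of the records' own TTLs.
    """

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self._ttl = ttl_seconds
        self._entries: Dict[bytes, Tuple[float, bytes]] = {}
        self._lock = threading.Lock()  # get_dns runs in one thread per request

    @staticmethod
    def _key(query: bytes) -> bytes:
        # Clients pick a new ID per query, so the ID must not be part of the key.
        return query[2:]

    def get(self, query: bytes) -> Optional[bytes]:
        key = self._key(query)
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None
            expires_at, response = entry
            if time.monotonic() >= expires_at:
                del self._entries[key]  # expired: drop it and report a miss
                return None
        # Rewrite the cached response's ID so it matches the new query.
        return query[:2] + response[2:]

    def put(self, query: bytes, response: bytes) -> None:
        with self._lock:
            self._entries[self._key(query)] = (time.monotonic() + self._ttl, response)
```
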

### Health and Readiness probing
Currently DoTproxy doesn't expose any dedicated port for checking its functional state. While port 53 itself could be used, probe queries would also keep actual DNS requests from being processed.
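
One option is a separate HTTP endpoint served from a background thread, so probes never touch port 53. The handler, the path `/healthz` and the port `8080` below are hypothetical. Such an endpoint only shows that the process is alive; a readiness check could additionally try to reach the upstream DNS server. The port would also have to be published in `docker-compose.yaml`.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Answers GET /healthz with 200 OK; any other path gets 404."""

    def do_GET(self) -> None:
        if self.path == '/healthz':
            body = b'ok'
            self.send_response(200)
            self.send_header('Content-Type', 'text/plain')
            self.send_header('Content-Length', str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args) -> None:
        pass  # keep probe requests out of the DNS logs


def start_health_server(port: int = 8080) -> HTTPServer:
    """Serve the health endpoint from a daemon thread next to the UDP loop on port 53."""
    server = HTTPServer(('0.0.0.0', port), HealthHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

`main()` could call `start_health_server()` once before entering its `while True` loop.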

### Error Handling
As mentioned in previous sections, the current version does not have any specific error handling. If something fails, the socket is supposed to close the connection and the client has to make a new request. There are no verification steps or recovery routines when something goes wrong.
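
A minimal sketch of one possible error-handling routine: instead of leaving the client waiting, the proxy answers with a SERVFAIL response (RCODE 2, RFC 1035 section 4.1.1) when the upstream lookup fails. `build_servfail` and `answer_or_servfail` are hypothetical helpers, and `resolve` stands for whatever function performs the TLS round trip. The sketch copies the rest of the query, including any additional records, back unchanged.

```python
import logging
import struct
from typing import Callable, Tuple

RCODE_SERVFAIL = 2  # "Server failure", RFC 1035 section 4.1.1


def build_servfail(query: bytes) -> bytes:
    """Turn a client query into a SERVFAIL response with the same ID.

    (flags & 0x7900) keeps the Opcode (bits 14-11) and RD (bit 8) of the query;
    QR (bit 15) marks the message as a response and RA (bit 7) is set.
    """
    query_id, flags = struct.unpack('!HH', query[:4])
    response_flags = 0x8000 | (flags & 0x7900) | 0x0080 | RCODE_SERVFAIL
    return struct.pack('!HH', query_id, response_flags) + query[4:]


def answer_or_servfail(resolve: Callable[[bytes], bytes], udp_sock,
                       client_addr: Tuple[str, int], udp_request: bytes) -> None:
    """Always reply to the client: the upstream answer, or SERVFAIL if the lookup failed."""
    try:
        udp_response = resolve(udp_request)
    except OSError as error:  # also covers socket.timeout and ssl.SSLError
        logging.error(f'Upstream query for {client_addr} failed: {error}')
        udp_response = build_servfail(udp_request)
    udp_sock.sendto(udp_response, client_addr)
```
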

### Allow querying multiple DNS servers
DoTproxy only queries Cloudflare DNS on `1.1.1.1:853`. We could use environment variables to select different DNS servers and, if necessary, adapt the query to the requirements of the provider.
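
A minimal sketch of reading the upstream server from environment variables, falling back to the current Cloudflare defaults. The variable names `DOT_UPSTREAM_HOST`, `DOT_UPSTREAM_PORT` and `DOT_UPSTREAM_HOSTNAME` and the `UpstreamConfig` type are hypothetical.

```python
import os
from typing import Mapping, NamedTuple


class UpstreamConfig(NamedTuple):
    host: str
    port: int
    hostname: str  # name used for SNI and, once added, certificate verification


def load_upstream_config(env: Mapping[str, str] = os.environ) -> UpstreamConfig:
    """Read the upstream resolver from the environment, defaulting to Cloudflare."""
    return UpstreamConfig(
        host=env.get('DOT_UPSTREAM_HOST', '1.1.1.1'),
        port=int(env.get('DOT_UPSTREAM_PORT', '853')),
        hostname=env.get('DOT_UPSTREAM_HOSTNAME', 'cloudflare-dns.com'),
    )
```

The variables could then be set per deployment, for example through an `environment:` section in `docker-compose.yaml`.
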

# Resources
- [SPKI Certificate Theory](https://tools.ietf.org/html/rfc2693)
- [Domain Name Specifications - UDP and TCP usage](https://tools.ietf.org/html/rfc1035#section-4.2.1)
- [Python Socket Library](https://docs.python.org/3/library/socket.html#socket.socket.accept)
- [Socket Programming in Python - Real Python](https://realpython.com/python-sockets/)
- [Socket Programming in Python - Educative.io](https://www.educative.io/courses/grokking-computer-networking/N73706w7Br6)
- [Python SSL Library](https://docs.python.org/3/library/ssl.html)
- [Cloudflare 1.1.1.1 - Documentation](https://developers.cloudflare.com/1.1.1.1/)
- [What is an on-path attacker - Cloudflare blog](https://www.cloudflare.com/en-gb/learning/security/threats/on-path-attack/)
- [Zero Trust Security - Cloudflare blog](https://www.cloudflare.com/en-gb/learning/security/glossary/what-is-zero-trust/)
- [DoT vs DoH - Cloudflare blog](https://www.cloudflare.com/en-gb/learning/dns/dns-over-tls/)
- [OpenSSL client for SSL testing](https://docs.pingidentity.com/bundle/solution-guides/page/iqs1569423823079.html)
- [TLS Handshake - Cloudflare blog](https://www.cloudflare.com/en-gb/learning/ssl/what-happens-in-a-tls-handshake/)
- [Hexadecimal definition - Wikipedia](https://simple.wikipedia.org/wiki/Hexadecimal#:~:text=The%20hexadecimal%20numeral%20system%2C%20often,numbers%20and%20six%20extra%20symbols.)
- [Concurrency in Python - Real Python](https://realpython.com/python-concurrency/)
- [Let localhost be localhost - IETF draft](https://tools.ietf.org/html/draft-ietf-dnsop-let-localhost-be-localhost-02)
10 changes: 10 additions & 0 deletions docker-compose.yaml
@@ -0,0 +1,10 @@
version: "3.9"
services:
  dot-proxy:
    build: .
    image: dot-proxy:0.0.2
    container_name: dot-proxy
    ports:
      - target: 53
        published: 53
        protocol: udp
13 changes: 13 additions & 0 deletions dockerfile
@@ -0,0 +1,13 @@
FROM python:3.9.1-alpine AS test
WORKDIR /dns-proxy
COPY requirements.txt .
RUN pip3 install -r requirements.txt
COPY dot_proxy.py .
COPY unit_test.py .
RUN pytest --cov dot_proxy unit_test.py

FROM python:3.9.1-alpine
EXPOSE 53/udp
WORKDIR /dns_proxy
COPY --from=test /dns-proxy/dot_proxy.py .
CMD ["python", "dot_proxy.py"]
74 changes: 74 additions & 0 deletions dot_proxy.py
@@ -0,0 +1,74 @@
import socket
import ssl
import logging
import threading

logging.basicConfig(level=logging.DEBUG)

PROXY_HOST = '0.0.0.0'
PROXY_PORT = 53

# cloudflare-dns.com
DNS_SERVER_HOST = '1.1.1.1'
DNS_SERVER_PORT = 853
DNS_SERVER_HOST_NAME = 'cloudflare-dns.com'

MAX_BYTE_SIZE = 512  # UDP messages are restricted to 512 bytes according to https://tools.ietf.org/html/rfc1035#section-4.2.1

# DNS calls over TCP require a 2-byte prefix holding the length of the UDP message according to the RFC standard https://tools.ietf.org/html/rfc1035#section-4.2.2.
def convert_udp_to_tcp(udp_message):
    length_udp = len(udp_message)
    # Encode the length as a 2-byte big-endian (network byte order) integer: e.g. 28 -> b'\x00\x1c'.
    # Unlike binascii.unhexlify(hex(length)[2:]), this also works for lengths with an odd number
    # of hex digits (e.g. 300 -> 0x12c) and needs no manual zero padding.
    tcp_prefix = length_udp.to_bytes(2, byteorder='big')

    tcp_request = tcp_prefix + udp_message
    logging.debug(f'TCP prefix: {tcp_prefix} for UDP message length: {length_udp}')
    logging.debug(f'TCP message: {tcp_request}')
    return tcp_request

# Remove the first 2 bytes (the length prefix) from the TCP message
def convert_tcp_to_udp(tcp_message):
    udp_message = tcp_message[2:]
    logging.debug(f'UDP message: {udp_message} for removed TCP prefix: {tcp_message[:2]}')
    return udp_message

# Creates a new TCP socket and establishes a TLS encrypted connection.
# Once the DNS query response is received, the response is returned to the client in UDP format
# and the socket connection is closed.
def get_dns(udp_sock, client_addr, udp_request):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tcp_sock:
        tcp_sock.connect((DNS_SERVER_HOST, DNS_SERVER_PORT))

        context = ssl.SSLContext(ssl.PROTOCOL_TLS)
        # wrap_socket() detaches tcp_sock, so the TLS socket needs its own `with` to be closed
        with context.wrap_socket(tcp_sock, server_hostname=DNS_SERVER_HOST_NAME) as ssl_sock:
            # Todo: Verify SSL certificate
            ssl_sock.sendall(convert_udp_to_tcp(udp_request))

            tcp_response = ssl_sock.recv(MAX_BYTE_SIZE)
            logging.info(f'Server {ssl_sock.getpeername()} Response: {tcp_response}')
            # Todo: Verify DNS query response

            udp_response = convert_tcp_to_udp(tcp_response)
            udp_sock.sendto(udp_response, client_addr)
            logging.info(f'Replied Client {client_addr} with {udp_response}')

# Waits for a client to send a DNS query and spins up a thread to serve the client,
# so that the main thread can take care of the next request.
def main():
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as udp_sock:
        udp_sock.bind((PROXY_HOST, PROXY_PORT))

        while True:
            logging.info('Wait for request...')
            udp_request, client_addr = udp_sock.recvfrom(MAX_BYTE_SIZE)

            logging.info(f'Serving Client: {client_addr}')
            logging.debug(f'Client: {client_addr} Requests: {udp_request}')
            query = threading.Thread(target=get_dns, args=(udp_sock, client_addr, udp_request))
            query.start()

if __name__ == "__main__":
    main()
23 changes: 23 additions & 0 deletions e2e_client_test.py
@@ -0,0 +1,23 @@
# This client was used for testing and troubleshooting connectivity to the DoTproxy server.
# It sends a static DNS request asking for `google.com` and checks the raw byte response
# from the DoTproxy server.
import socket
import logging

logging.basicConfig(level=logging.INFO)

MAX_BYTE_SIZE = 512

# Not a very accurate test case: the response is quite dynamic, but it should contain the keyword if successful.
def test_when_google_dns_is_queried_then_the_response_should_contain_google(query=1):
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        logging.debug('Connecting...')
        sock.connect(('127.0.0.1', 53))
        logging.debug(f'Query: {query} Connected...')

        udp_message = b'\xbd[\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\x00\x00\x01\x00\x01'
        sock.send(udp_message)

        response = sock.recv(MAX_BYTE_SIZE)
        logging.debug(f'Query {query} Server responded with: {response}')
        assert b'google' in response
28 changes: 28 additions & 0 deletions load_test.py
@@ -0,0 +1,28 @@
import logging
import threading
import pytest
import e2e_client_test

logging.basicConfig(level=logging.INFO)

CONCURRENT_QUERIES = 200  # on my machine the test gets stuck for more than about 280 threads

# Todo: Try thread pools with concurrent.futures ThreadPoolExecutor
def test_rapid_fire():

    logging.info(f'Generating {CONCURRENT_QUERIES} queries...')
    queries = []
    for num in range(CONCURRENT_QUERIES):
        thread = threading.Thread(target=e2e_client_test.test_when_google_dns_is_queried_then_the_response_should_contain_google, args=(num + 1,))
        thread.start()
        logging.info(f'Fired Query {thread}')
        queries.append(thread)

    logging.info(f'Concurrent Queries Active: {threading.active_count() - 1}')

    for thread in queries:
        thread.join()
        logging.info(f'Queries left: {threading.active_count() - 1}')

    logging.info(f'Processed {len(queries)} queries... Done!')

10 changes: 10 additions & 0 deletions requirements.txt
@@ -0,0 +1,10 @@
attrs==20.3.0
coverage==5.4
iniconfig==1.1.1
packaging==20.9
pluggy==0.13.1
py==1.10.0
pyparsing==2.4.7
pytest==6.2.2
pytest-cov==2.11.1
toml==0.10.2
30 changes: 30 additions & 0 deletions unit_test.py
@@ -0,0 +1,30 @@
import pytest
from dot_proxy import *

UDP_MESSAGE = b'\xbd[\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\x00\x00\x01\x00\x01'
TCP_MESSAGE = b'\x00\x1c\xbd[\x01\x00\x00\x01\x00\x00\x00\x00\x00\x00\x06google\x03com\x00\x00\x01\x00\x01'

def test_when_requesting_dns_for_google_dot_com_then_tcp_message_should_prefix_2_bytes_with_length_of_udp_message_in_hex():
    udp_message = UDP_MESSAGE
    tcp_message = convert_udp_to_tcp(udp_message)
    assert tcp_message == TCP_MESSAGE

def test_when_converting_from_udp_to_tcp_message_then_tcp_should_be_2_bytes_longer():
    udp_message = UDP_MESSAGE
    tcp_message = convert_udp_to_tcp(udp_message)
    assert len(tcp_message) - len(udp_message) == 2

def test_when_converting_from_udp_to_tcp_should_be_of_type_byte():
    udp_message = UDP_MESSAGE
    tcp_message = convert_udp_to_tcp(udp_message)
    assert type(tcp_message) == bytes

def test_when_converting_a_udp_message_with_a_length_of_16_to_the_power_of_4_minus_1_then_the_conversion_should_prefix_0xFFFF_in_escape_characters_for_hex():
    udp_message = b'w' * (65536 - 1)
    tcp_message = convert_udp_to_tcp(udp_message)
    assert tcp_message == b'\xff\xff' + udp_message

def test_when_tcp_message_converts_to_udp_message_then_the_prefixed_message_length_bytes_should_be_removed():
    tcp_message = TCP_MESSAGE
    udp_message = convert_tcp_to_udp(tcp_message)
    assert udp_message == tcp_message[2:]
