GitHub - FANFANFAN2506/HTTP_Proxy_Server: Used C++ to create a robust HTTP/HTTPS proxy that processes simultaneous GET, POST, and CONNECT requests following the RFC7231 Protocol, forwarding to the origin server and facilitating correct reception of chunked data.

#README

General:

The standard send() and recv() functions take a raw buffer pointer and a length. Storing the socket data in a string and passing str.c_str() caused problems for us: c_str() returns a const char * into the string's own storage, so recv() cannot write through it, the pointer becomes invalid once the string is modified or destroyed, and any code that treats the data as a C string stops at the first '\0' byte. Instead, the received data is stored in a vector, which can be resized, accessed directly, and converted to a string when convenient.
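
A minimal sketch of the vector-based receive described above. The helper names `recvOnce` and `toString` and the 64 KiB buffer size are assumptions for illustration, not the names used in the repository.

```cpp
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

// Hypothetical helper: one recv() call into a resizable byte buffer.
// An empty result means the peer closed the connection or recv() failed.
std::vector<char> recvOnce(int fd, std::size_t capacity = 65536) {
    std::vector<char> buf(capacity);
    ssize_t n = recv(fd, buf.data(), buf.size(), 0);
    buf.resize(n > 0 ? static_cast<std::size_t>(n) : 0);
    return buf;
}

// Building the string from the iterator range keeps embedded '\0' bytes.
std::string toString(const std::vector<char>& buf) {
    return std::string(buf.begin(), buf.end());
}
```

Because the vector owns a writable buffer, `buf.data()` can be handed to recv() directly, and the size is trimmed to the number of bytes actually read.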

Request:

The Host header of a request gives the target server's host name and port number. Some requests do not specify the port explicitly, in which case the proxy uses 80, the default port for HTTP. The parser therefore has to check whether a port number is present.
The Cache-Control header of a request defines several fields that we need to parse in order to decide on the caching and validation policy. Not every request carries every field, so each one must be initialized with an appropriate default value and handled specially in later checks.
Fields such as max-age, min-fresh, and max-stale are followed by an equal sign and a number that has to be converted from a string to an int. This is abstracted into a function called findNumber, which looks up the field name in the Cache-Control value and extracts the number up to its terminator, which may be , or \r\n. If the field is absent, the function returns -1. Callers must check whether the return value is valid, keeping in mind that 0 is also a valid value in the request's Cache-Control field.
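
The port check can be sketched as follows. `splitHostPort` is a hypothetical helper name, and the sketch assumes the Host value is a plain host name or IPv4 address (no bracketed IPv6 literal).

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: split a Host header value into (host, port).
// Falls back to port "80" when no port is given.
// Assumption: no bracketed IPv6 literals such as "[::1]:8080".
std::pair<std::string, std::string> splitHostPort(const std::string& hostValue) {
    std::size_t colon = hostValue.rfind(':');
    if (colon == std::string::npos || colon + 1 == hostValue.size()) {
        // No colon, or a trailing colon with nothing after it.
        return {hostValue.substr(0, colon), "80"};
    }
    return {hostValue.substr(0, colon), hostValue.substr(colon + 1)};
}
```

The port is kept as a string here because functions such as getaddrinfo() accept the service as text.
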
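
One possible shape for findNumber, written as a sketch. The signature and the `long` return type are assumptions; only the name and the -1 convention come from the description above.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Illustrative findNumber: returns the value of `name=<digits>` inside a
// Cache-Control value, or -1 when the directive is missing or has no digits.
long findNumber(const std::string& cacheControl, const std::string& name) {
    const std::string key = name + "=";
    std::size_t pos = cacheControl.find(key);
    if (pos == std::string::npos) return -1;
    pos += key.size();
    std::size_t end = pos;
    // Reading digits only means ',' and "\r\n" both terminate the number.
    while (end < cacheControl.size() &&
           std::isdigit(static_cast<unsigned char>(cacheControl[end]))) {
        ++end;
    }
    if (end == pos) return -1;
    return std::stol(cacheControl.substr(pos, end - pos));
}
```

A caller would compare against -1 rather than testing for a non-zero value, since `max-age=0` is a meaningful directive.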

Response:

A response may carry a Transfer-Encoding header that defines how the body is encoded, for example chunked. In that case the server does not send the data all at once, so we have to decide whether multiple receives are needed. The receive loop also needs a well-defined end condition. One solution is a timeout: if nothing arrives within the timeout, we stop receiving. Another solution uses the same loop but checks the return value of recv(): a return value of 0 means the peer has closed the connection, so we can stop receiving.
The time-related fields in the HTTP response headers are in UTC, which differs from the local time zone. To turn a time_t into a string for logging, we therefore convert twice: first to GMT, then to asctime format.
The Last-Modified field is also a date-time value. We first converted it to time_t and stored it, but later found that the field is only used to build the conditional request for validation, where it is appended as a string. Storing the original string is enough and avoids an extra conversion.
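
A sketch of the second approach, where the loop ends once recv() returns 0. The names `isChunked` and `recvUntilClosed` are hypothetical, and the header check is a simple case-insensitive substring search rather than a full header parser.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <sys/socket.h>
#include <sys/types.h>
#include <vector>

// Hypothetical check: does the header block declare a chunked transfer coding?
bool isChunked(const std::string& headers) {
    std::string lower(headers.size(), '\0');
    std::transform(headers.begin(), headers.end(), lower.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    std::size_t pos = lower.find("transfer-encoding:");
    if (pos == std::string::npos) return false;
    std::size_t eol = lower.find("\r\n", pos);
    return lower.substr(pos, eol - pos).find("chunked") != std::string::npos;
}

// Keep calling recv() until it returns 0 (peer closed) or a negative value (error).
std::vector<char> recvUntilClosed(int fd) {
    std::vector<char> all;
    char piece[4096];
    for (;;) {
        ssize_t n = recv(fd, piece, sizeof(piece), 0);
        if (n <= 0) break;
        all.insert(all.end(), piece, piece + n);
    }
    return all;
}
```

If the server keeps the connection open after the last chunk, recv() does not return 0, so a timeout or a check for the final zero-length chunk would still be needed in that case.
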
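
The two-step conversion might look like this. `utcTimeString` is a hypothetical helper name; std::gmtime and std::asctime use shared static buffers, so a multi-threaded proxy would need locking or the POSIX `_r` variants.

```cpp
#include <ctime>
#include <string>

// Hypothetical helper: format a time_t as UTC in asctime() style,
// without the trailing newline that asctime() appends.
std::string utcTimeString(std::time_t t) {
    std::tm* utc = std::gmtime(&t);      // step 1: time_t -> broken-down UTC time
    if (utc == nullptr) return "";
    std::string s = std::asctime(utc);   // step 2: broken-down time -> text
    if (!s.empty() && s.back() == '\n') s.pop_back();
    return s;
}
```

For example, `utcTimeString(0)` yields `Thu Jan  1 00:00:00 1970` regardless of the local time zone.
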

Proxy:

In the original design, one Proxy instance handles one request, including receiving the request, caching, fetching the response, and replying to the client. We wanted the class to hold one Request pointer and one Response pointer, but later found that the response must not be deleted when the request finishes, because it is stored in the Cache object for later use. In the end, the class keeps only a pointer to the Request object.
To follow the RAII principle, we use unique_ptr to own every new Proxy and Response object, so we never delete them explicitly. When execution leaves the stack frame, unique_ptr deletes them automatically. This also holds when the frame is left because of an exception that unwinds the stack, which avoids memory leaks.
To follow C++ OO design, we use classes to store the data and perform the operations, as mentioned above, and we declare the fields private. Fields are read from outside the class through return_function accessors declared const, which prevents accidental modification of the field. Most of the work happens in the Proxy class: almost all functions are written as its member functions.
The socket setup sets the SO_REUSEADDR option. Without it, the port can still be treated as occupied after a restart, and we would not be able to bind the socket to it and listen.
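
A small sketch of that ownership pattern. `DemoProxy`, `liveCount`, and `serveOne` are stand-ins invented for illustration; they are not the repository's classes.

```cpp
#include <memory>
#include <stdexcept>

// Stand-in for the Proxy class; liveCount exists only to observe destruction.
struct DemoProxy {
    static int liveCount;
    DemoProxy() { ++liveCount; }
    ~DemoProxy() { --liveCount; }
    void handle(bool fail) {
        if (fail) throw std::runtime_error("malformed request");
    }
};
int DemoProxy::liveCount = 0;

// Returns true if the request was handled, false if handling threw.
bool serveOne(bool fail) {
    try {
        std::unique_ptr<DemoProxy> proxy(new DemoProxy());
        proxy->handle(fail);
        return true;   // proxy is deleted when it goes out of scope here
    } catch (const std::exception&) {
        return false;  // proxy was already deleted during stack unwinding
    }
}
```

In both paths `DemoProxy::liveCount` returns to 0 without any explicit delete.
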
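
A minimal listener setup with SO_REUSEADDR, as a sketch. `makeListener` and the backlog value are assumptions; the sketch binds IPv4 on all interfaces for brevity.

```cpp
#include <arpa/inet.h>
#include <cstdint>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: create a TCP listening socket with SO_REUSEADDR set.
// Returns the socket fd, or -1 on failure.
int makeListener(std::uint16_t port, int backlog = 100) {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int yes = 1;
    sockaddr_in addr{};
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);
    // The option must be set before bind() to have any effect on it.
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &yes, sizeof(yes)) < 0 ||
        bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor can then be passed to accept() in the main loop.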

CONNECT:

For CONNECT, when we receive the first request we connect to the server, then reply to the client with a 200 OK response, and from then on continuously forward the data received from both ends using select(). If we detect that either end has closed, we close the tunnel.
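
The forwarding loop can be sketched roughly as below. `runTunnel` and `sendAll` are hypothetical names, the sockets are assumed to be blocking and already connected, and the 200 OK reply is assumed to have been sent before the loop starts.

```cpp
#include <algorithm>
#include <cstddef>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

// Send the whole buffer, looping over partial writes. Returns false on error.
static bool sendAll(int fd, const char* data, std::size_t len) {
    while (len > 0) {
        ssize_t n = send(fd, data, len, 0);
        if (n <= 0) return false;
        data += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Hypothetical tunnel loop: copy bytes in both directions until either side
// closes or an error occurs. The caller still owns and closes both fds.
void runTunnel(int clientFd, int serverFd) {
    char buf[8192];
    for (;;) {
        fd_set readable;
        FD_ZERO(&readable);
        FD_SET(clientFd, &readable);
        FD_SET(serverFd, &readable);
        int maxFd = std::max(clientFd, serverFd);
        if (select(maxFd + 1, &readable, nullptr, nullptr, nullptr) <= 0) return;
        const int pairs[2][2] = {{clientFd, serverFd}, {serverFd, clientFd}};
        for (const auto& p : pairs) {
            if (!FD_ISSET(p[0], &readable)) continue;
            ssize_t n = recv(p[0], buf, sizeof(buf), 0);
            if (n <= 0) return;  // this end closed, or an error occurred
            if (!sendAll(p[1], buf, static_cast<std::size_t>(n))) return;
        }
    }
}
```

The proxy never inspects the bytes it relays, which is why the same loop works for TLS traffic.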

POST:

POST performs the same work as a first-time GET request, so the fetch operation is abstracted into a shared function. Chunked responses need special handling, however, because our parser fails when the Content-Length does not match the amount of data actually received.

GET:

For GET, we check the cache first. If the response is not cached, we fetch it from the server exactly as for POST, with one more step that checks whether it can be cached for future use.
If the requested response is cached, we check whether it is still fresh or needs validation, based on the fields of both the request and the cached response. If validation is needed, we build a conditional request: the Last-Modified and ETag values from the cached response are sent to the server in If-Modified-Since and If-None-Match headers. The server then replies either with 200 OK, in which case we update the cached response, or with 304 Not Modified, in which case we send the cached response back directly.
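
Building the conditional request could look like the sketch below. `makeConditionalRequest` is a hypothetical helper, and it assumes the original request is available as one string whose header block ends with an empty line.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: add validators taken from a cached response to a
// request whose header block ends with "\r\n\r\n".
std::string makeConditionalRequest(const std::string& request,
                                   const std::string& etag,
                                   const std::string& lastModified) {
    std::size_t end = request.find("\r\n\r\n");
    if (end == std::string::npos) return request;  // not a complete header block
    std::string extra;
    if (!etag.empty()) extra += "\r\nIf-None-Match: " + etag;
    if (!lastModified.empty()) extra += "\r\nIf-Modified-Since: " + lastModified;
    std::string out = request;
    out.insert(end, extra);  // insert just before the terminating empty line
    return out;
}
```

Passing the stored Last-Modified string through unchanged is what makes a time_t round trip unnecessary for this field.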

chunk:

For chunked data, we want to reduce the delay before the client sees the response. Instead of receiving all the data and sending it at once, we forward the data while we are still receiving it, which has the same effect as a direct connection. We still store the complete chunked response for later caching.

cache:

For the cache, we first used a vector for storage. As the cache grew, lookups became slow. We then looked at LeetCode problem 146 and developed an LRU cache based on a linked list and a map. Both get and put are now O(1) operations.
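
The list-plus-map structure can be sketched as follows. `LruCache` and its string-to-string interface are assumptions for illustration; this version uses std::unordered_map, so the O(1) bound is the average case.

```cpp
#include <cstddef>
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// Illustrative LRU cache keyed by URL. The most recently used entry sits at
// the front of the list; the map points each key at its list node.
class LruCache {
public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    // Returns a pointer to the cached value, or nullptr on a miss.
    const std::string* get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        items_.splice(items_.begin(), items_, it->second);  // mark as most recent
        return &it->second->second;
    }

    void put(const std::string& key, std::string value) {
        auto it = index_.find(key);
        if (it != index_.end()) {                 // update an existing entry
            it->second->second = std::move(value);
            items_.splice(items_.begin(), items_, it->second);
            return;
        }
        if (capacity_ == 0) return;
        if (items_.size() == capacity_) {         // evict the least recently used
            index_.erase(items_.back().first);
            items_.pop_back();
        }
        items_.emplace_front(key, std::move(value));
        index_[key] = items_.begin();
    }

private:
    using Item = std::pair<std::string, std::string>;
    std::size_t capacity_;
    std::list<Item> items_;
    std::unordered_map<std::string, std::list<Item>::iterator> index_;
};
```

std::list::splice moves a node without invalidating iterators, which is what lets the map keep pointing at the right entry after a reorder.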

Others:

For the daemon process, we use Docker to run a bash file that contains a while true loop. Even if the process is killed by an interrupt, it restarts and keeps running.
