Update README.md
Signed-off-by: Rahul Gopathi <[email protected]>
RahulGopathi committed Sep 16, 2023
1 parent f8e4a3c commit 95ff703
Showing 4 changed files with 118 additions and 4 deletions.
112 changes: 112 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,2 +1,114 @@
# AWS-Price-Database
A FastAPI service backed by a MySQL database built from the AWS Price List Bulk API.

## **Objective**

Create a MySQL database that stores the metadata, pricing, and product
types of all AWS services across all regions, and expose an API that
accepts service attributes as input and returns the matching product
details and their pricing.

## **Database Schema**

Because AWS offers many services across many regions, the schema is designed around the queries users are most likely to run.

### Possible queries

- Know all product families
- Know all the services under a product family
- Know all the services available in a region
- Know all the products of a particular service (e.g., Amazon S3)
- Know the price in different regions of a particular service
- Know all the products and their prices with a particular product attribute value under a service

Here is the schema that I have used to create the database:

- `product_family`
  - `id`
  - `name`
- `product`
  - `id`
  - `sku`
  - FOREIGN KEY `product_family.id`
  - `service_code` - Indexed
  - `location` - Indexed
  - `region_code` - Indexed
  - `product_attributes` - Indexed on a particular attribute (e.g., `memory` in RDS)
- `price`
  - FOREIGN KEY `product.id`
  - `pricePerUnit`
  - `unit`
  - `description`

To target a maximum query execution time of 50 ms, several indexes have been added, as noted above.
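
As a rough illustration of the schema and indexes described above, here is a minimal sketch. It assumes SQLite in place of MySQL so that it runs without a server; the column types, the index names, the foreign-key column name `product_family_id`, and the demo rows (placeholder values, not real AWS prices) are all assumptions and are not taken from the repository.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL so the sketch runs without a server.
# Column types, index names, and product_family_id are illustrative only.
SCHEMA = """
CREATE TABLE product_family (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
CREATE TABLE product (
    id                 INTEGER PRIMARY KEY,
    sku                TEXT NOT NULL,
    product_family_id  INTEGER REFERENCES product_family(id),
    service_code       TEXT,
    location           TEXT,
    region_code        TEXT,
    product_attributes TEXT  -- JSON document stored as text in this sketch
);
CREATE TABLE price (
    id           INTEGER PRIMARY KEY,
    product_id   INTEGER REFERENCES product(id),
    pricePerUnit REAL,
    unit         TEXT,
    description  TEXT
);
CREATE INDEX idx_service_code ON product (service_code);
CREATE INDEX idx_location ON product (location);
CREATE INDEX idx_region_code ON product (region_code);
"""


def build_demo_db():
    """Create an in-memory database with a few placeholder rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO product_family (id, name) VALUES (1, 'Storage')")
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "SKU-A", 1, "AmazonS3", "US East (N. Virginia)", "us-east-1", "{}"),
            (2, "SKU-B", 1, "AmazonS3", "EU (Ireland)", "eu-west-1", "{}"),
        ],
    )
    conn.executemany(
        "INSERT INTO price VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 0.023, "GB-Mo", "demo"), (2, 2, 0.024, "GB-Mo", "demo")],
    )
    return conn


def prices_by_region(conn, service_code):
    """Answer 'know the price in different regions of a particular service'."""
    return conn.execute(
        """
        SELECT p.region_code, pr.pricePerUnit, pr.unit
        FROM product AS p
        JOIN price AS pr ON pr.product_id = p.id
        WHERE p.service_code = ?
        ORDER BY p.region_code
        """,
        (service_code,),
    ).fetchall()
```

The filter on `service_code` is the kind of lookup the indexed columns are meant to serve; whether MySQL picks a given index depends on the actual data and query plan.
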

## **Creating the Database**
Follow these steps to create the database and load the data into it.
- Clone the repository
```
git clone https://github.com/RahulGopathi/AWS-Price-Database.git
```
- Create the virtual environment
```
python -m venv env
```
- Activate the virtual environment
```
source env/bin/activate
```
- Install the dependencies
```
pip install -r requirements.txt
```
- Create the `.env` file from the provided example
```
cp .env.example .env
```
> Change the variables accordingly in your `.env` file
- Create and load the database
```
cd load-data/
python main.py index.json
```
> Note: Pass the offer index file as the argument; the script uses it to download the offer files and load their data into the database. The offer index file is available [here](https://pricing.us-east-1.amazonaws.com/offers/v1.0/aws/index.json).
- Run the FastAPI app
```
cd ../
uvicorn main:app --reload
```
> Note: The API will be running on `http://localhost:8000` and the docs can be found at `http://localhost:8000/docs`. You can find the details of all the endpoints in the docs. That's it! You are good to go.
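
The loading step above starts from the offer index file. As a sketch of what that involves, the snippet below resolves per-service offer file URLs from an already parsed index. It assumes the index has an `offers` mapping whose entries carry a `currentVersionUrl` path; the function name `offer_file_urls` and the trimmed sample document are hypothetical and are not the repository's loader.

```python
import json

# Host taken from the offer index link in this README.
BASE_URL = "https://pricing.us-east-1.amazonaws.com"


def offer_file_urls(index, service_codes=None):
    """Map each service code to the absolute URL of its current offer file.

    `index` is the parsed offer index. Assumption: each entry under
    "offers" has a "currentVersionUrl" path relative to BASE_URL.
    """
    urls = {}
    for code, offer in index.get("offers", {}).items():
        if service_codes is not None and code not in service_codes:
            continue  # skip services the caller did not ask for
        urls[code] = BASE_URL + offer["currentVersionUrl"]
    return urls


# Trimmed, hand-written sample in the assumed shape of the offer index.
sample_index = json.loads(
    """
    {
      "offers": {
        "AmazonS3": {
          "offerCode": "AmazonS3",
          "currentVersionUrl": "/offers/v1.0/aws/AmazonS3/current/index.json"
        },
        "AmazonRDS": {
          "offerCode": "AmazonRDS",
          "currentVersionUrl": "/offers/v1.0/aws/AmazonRDS/current/index.json"
        }
      }
    }
    """
)

if __name__ == "__main__":
    for code, url in offer_file_urls(sample_index, {"AmazonS3"}).items():
        print(code, url)
```

Restricting `service_codes` to a handful of services keeps the download small, which matters because the full set of offer files is large.
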
## **Evaluating the Performance**
To see how the schema performs, I created `query_exec_times.py`, which runs all the queries listed above and prints the execution time of each.
You can run the file with the following command:
```
cd load-data/
python query_exec_times.py
```
> Note: The command runs every query and prints its execution time. To run a single query, comment out the others.
Here is the output of the above command:
```
+----------------------------------------------------------------------------------+-----------------------+-----------------------+
| Query Description                                                                |   Execution Time (ms) |   No of Rows returned |
+==================================================================================+=======================+=======================+
| Know all product families                                                        |               16.4297 |                    18 |
+----------------------------------------------------------------------------------+-----------------------+-----------------------+
| Know all the services under a product family                                     |                0.7792 |                     2 |
+----------------------------------------------------------------------------------+-----------------------+-----------------------+
| Know all the services available in a region                                      |                5.945  |                     2 |
+----------------------------------------------------------------------------------+-----------------------+-----------------------+
| Know all the products of a particular service                                    |               21.7781 |                  5691 |
+----------------------------------------------------------------------------------+-----------------------+-----------------------+
| Know the price in different regions of a particular service                      |               36.1409 |                  5691 |
+----------------------------------------------------------------------------------+-----------------------+-----------------------+
| Know all the products and their prices with a particular product attribute value |              113.66   |                   647 |
+----------------------------------------------------------------------------------+-----------------------+-----------------------+
```
> Note: These results were obtained on a machine with 8 GB of RAM and a 2.3 GHz dual-core 8th-gen Intel Core i5 processor. Timings may vary with the system configuration.
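
For readers who want to reproduce this kind of measurement, here is a minimal sketch of timing a single query. It assumes a DB-API cursor and uses an in-memory SQLite table so that it runs anywhere; the helper name `time_query` and its signature are hypothetical and differ from the repository's `measure_query_execution_time`.

```python
import sqlite3
import time


def time_query(cursor, query, parameters=None):
    """Run one query and return (elapsed milliseconds, number of rows).

    Hypothetical helper: fetching the rows is included in the timing so
    that the database cannot defer the work until after the clock stops.
    """
    start = time.perf_counter()
    cursor.execute(query, parameters or ())
    rows = cursor.fetchall()
    elapsed_ms = (time.perf_counter() - start) * 1000
    return elapsed_ms, len(rows)


def demo():
    # Assumption: a tiny SQLite table stands in for the MySQL `product` table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE product (sku TEXT, service_code TEXT)")
    conn.executemany(
        "INSERT INTO product VALUES (?, ?)",
        [("A", "AmazonS3"), ("B", "AmazonS3"), ("C", "AmazonRDS")],
    )
    return time_query(
        conn.cursor(),
        "SELECT sku FROM product WHERE service_code = ?",
        ("AmazonS3",),
    )


if __name__ == "__main__":
    elapsed_ms, row_count = demo()
    print(f"{elapsed_ms:.4f} ms, {row_count} rows")
```

A single run is noisy; repeating each query several times and reporting the median would give steadier numbers than one measurement.
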
## **Future Scope**
- The database can be optimized further by analyzing the queries that are actually run against it and adding indexes accordingly.
- Returning only the columns a client needs, rather than every column, would make the results easier to read and may also improve performance.
- Because all offer files are currently downloaded and held in RAM, it is not feasible to load the data for every service, so for now only the services that users actually need can be loaded. This could be solved by downloading the offer files to disk and reading them from there while loading the database. Multi-threading could also speed up loading by processing several offer files at the same time.
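
The last point can be sketched as follows. This is a minimal illustration only, assuming the offer files are already saved on disk as JSON with `offerCode` and `products` keys; the function names are hypothetical, and counting products stands in for the real database inserts.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_products(path):
    """Read one offer file from disk and return (offer code, product count).

    In a real loader this is where rows would be inserted into the database.
    """
    path = Path(path)
    with open(path, encoding="utf-8") as fh:
        offer = json.load(fh)
    return offer.get("offerCode", path.stem), len(offer.get("products", {}))


def load_offers(paths, max_workers=4):
    """Process several offer files concurrently, one file per worker task."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(count_products, paths))


def demo():
    # Write two small placeholder offer files, then load them in parallel.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for code, n in [("AmazonS3", 2), ("AmazonRDS", 3)]:
            path = Path(tmp) / f"{code}.json"
            products = {f"SKU{i}": {"sku": f"SKU{i}"} for i in range(n)}
            path.write_text(
                json.dumps({"offerCode": code, "products": products}),
                encoding="utf-8",
            )
            paths.append(path)
        return load_offers(paths)


if __name__ == "__main__":
    print(demo())
```

With this layout, at most `max_workers` files are held in memory at once instead of every offer file. Note that `json.load` still reads each file whole, so very large offer files would need a streaming parser.
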
2 changes: 1 addition & 1 deletion load-data/create_database.py
Original file line number Diff line number Diff line change
@@ -54,7 +54,7 @@ def create_database_and_tables(cursor):
# Create product_attributes index on the 'product' table
cursor.execute(
"""
- CREATE INDEX idx_product_attributes ON product ((CAST(product_attributes->>'$.operation' AS CHAR(255))));
+ CREATE INDEX idx_product_attributes ON product ((CAST(product_attributes->>'$.memory' AS CHAR(255))));
"""
)

6 changes: 3 additions & 3 deletions load-data/query_exec_times.py
Original file line number Diff line number Diff line change
@@ -98,9 +98,9 @@ def measure_query_execution_time(query_description, query, parameters=None):
"Know all the products and their prices with a particular product attribute value",
query_6,
{
- "attribute_name": "operation",
- "service_code": "AmazonS3",
- "attribute_value": "MRAP-Dtransfer",
+ "attribute_name": "memory",
+ "service_code": "AmazonRDS",
+ "attribute_value": "1024 GiB",
},
)

2 changes: 2 additions & 0 deletions requirements.txt
Original file line number Diff line number Diff line change
@@ -16,10 +16,12 @@ pycodestyle==2.11.0
pydantic==2.3.0
pydantic_core==2.6.3
pyflakes==3.1.0
python-decouple==3.8
requests==2.31.0
sniffio==1.3.0
SQLAlchemy==2.0.20
starlette==0.27.0
tabulate==0.9.0
typing_extensions==4.7.1
urllib3==2.0.4
uvicorn==0.23.2
