Youtube Fetch API

An API that returns the latest YouTube videos for a given topic (fetched from the YouTube Data API).

Design

Tech Stack

Backend : Flask
Database : PostgreSQL, Redis, Elasticsearch
Tools : Celery, Celery Beat, Docker & Docker-Compose

Why Is This System Scalable?

Celery is preferred over cron jobs because its workers can easily be distributed across machines that share a centralised cache (like Amazon ElastiCache).
The cache also stores the status of exhausted API keys as part of multi-key support, which saves network calls when Celery is deployed on multiple instances.
Elasticsearch is a widely used open source search engine. It is built on Apache Lucene and uses an inverted index for full-text search.
Bulk inserts into the DB write a large number of items in a single operation.
The APIs use the cache to reduce network I/O when fetching data from Elasticsearch or the DB.
More ways to optimize the system are listed in the Further Optimizations section below.
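The exhausted-key bookkeeping mentioned above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the `KeyPool` class, the `exhausted:` key prefix, and the cooldown value are hypothetical, and a plain dict stands in for the shared Redis cache.

```python
import time


class KeyPool:
    """Rotate over API keys, skipping keys marked exhausted in a shared cache."""

    def __init__(self, keys, cache, cooldown_seconds=3600, clock=time.time):
        self.keys = list(keys)
        self.cache = cache  # stand-in for Redis: maps cache key -> expiry timestamp
        self.cooldown = cooldown_seconds
        self.clock = clock

    def mark_exhausted(self, key):
        # Record when the key may be tried again, so other workers can skip it.
        self.cache[f"exhausted:{key}"] = self.clock() + self.cooldown

    def is_exhausted(self, key):
        expiry = self.cache.get(f"exhausted:{key}")
        return expiry is not None and expiry > self.clock()

    def next_key(self):
        # Return the first usable key, or None if every key is exhausted.
        for key in self.keys:
            if not self.is_exhausted(key):
                return key
        return None
```

Because every worker consults the same cache, a key that one worker found exhausted is not retried by the others until the cooldown passes. With a real Redis instance, the expiry could be handled by a key TTL instead of a stored timestamp.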

Project Structure

youtube_fetch
├── docker-compose.yml                    # Docker Compose file
├── .gitignore                            # Gitignore file to stop tracking unnecessary files
└── services
    └── yt-api
        ├── Dockerfile                    # Dockerfile
        ├── entrypoint.sh                 # Entrypoint for the Docker container
        ├── requirements.txt              # Requirements file for the project
        └── yt-api
            └── project
                ├── __init__.py           # Initialization file for all services of the YouTube API service
                ├── .env                  # Environment file
                ├── celerybeat.py         # Celery Beat (scheduler) configuration file
                ├── config.py             # YouTube service project configuration file
                ├── es_utils.py           # Elasticsearch utilities
                ├── models.py             # Database models
                ├── tasks.py              # Asynchronous background tasks
                └── utils.py              # Utilities

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

First, make sure the Docker daemon is running. Then clone the repository and make the entrypoint script executable:

git clone https://github.com/saket13/youtube_fetch
cd youtube_fetch
chmod +x services/yt-api/entrypoint.sh

Running Docker Containers, Creating DB and Elastic Search Index

docker-compose up -d --build
docker-compose exec web python manage.py create_db
docker-compose exec web python manage.py create_es_index

Screenshots

Search API: [screenshot]

Paginated Videos API: [screenshots: Paginated View - Page-1, Paginated View - Page-2]

Containers: [screenshots: Containers, Scheduler]

Testing

Use Postman (or any other HTTP client) to send a GET request:

Query Params
URL_1 = http://127.0.0.1:5000/videos?page=1&limit=5
URL_2 = http://127.0.0.1:5000/search?q=lanka

In URL_1, page is the page number and limit is the number of results per page.
In URL_2, q is the query string to search for.
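To make the pagination parameters concrete, here is a small sketch of how page and limit are commonly turned into an offset and a slice. The `paginate` function and the placeholder data are hypothetical; the README does not show how the server actually applies these parameters.

```python
def paginate(items, page=1, limit=5):
    """Return the slice of `items` for a 1-based `page` of size `limit`."""
    if page < 1 or limit < 1:
        raise ValueError("page and limit must be positive integers")
    offset = (page - 1) * limit  # number of items to skip
    return items[offset:offset + limit]


videos = [f"video-{n}" for n in range(1, 13)]  # 12 placeholder videos
first_page = paginate(videos, page=1, limit=5)   # video-1 .. video-5
third_page = paginate(videos, page=3, limit=5)   # video-11, video-12
```

A page past the end of the data simply yields an empty list, which is one reasonable way for an API to signal that there are no more results.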

Progress

Async worker that fetches the latest videos every minute and stores them in the DB with an index
Paginated GET API to fetch videos in descending order of published date time
Basic search API to search the stored videos using their title and description
Dockerize the Project
Multi Key Support
Optimize search API for partial search in title or description
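One way the partial search item above could be expressed is an Elasticsearch `multi_match` query of type `phrase_prefix`, which treats the last term typed as a prefix. This is only a sketch under assumptions: the `build_search_query` helper is hypothetical, and the field names `title` and `description` are guessed from the description of the search API, not taken from the project's index mapping.

```python
def build_search_query(text):
    """Build an Elasticsearch request body for a partial-match search."""
    return {
        "query": {
            "multi_match": {
                "query": text,
                # Assumed field names; the real index mapping may differ.
                "fields": ["title", "description"],
                # phrase_prefix matches the words in order and treats the
                # final term as a prefix.
                "type": "phrase_prefix",
            }
        }
    }


body = build_search_query("sri lank")
```

The resulting dict would be passed as the search body to an Elasticsearch client. Other approaches, such as n-gram analyzers defined at index time, trade more index space for faster partial matching.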

Further Optimizations (For this Use Case - To the best of my knowledge)

Application Level

Use asyncio and its libraries to handle HTTP requests asynchronously with an event loop and coroutines.
Implement payload compression to reduce the amount of data transferred.
Decouple fetching videos from the YouTube API from saving them to the DB, using Redis and Celery as a simple pub-sub, so that each stage can scale independently.
Use a faster Python runtime, such as one with a JIT compiler (for example, PyPy).
Share frequently accessed in-memory data across application instances.
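As a rough illustration of the payload compression point, the sketch below gzips a JSON response body using only the standard library. The `compress_payload` helper and the sample data are hypothetical; in a real deployment the compression is often delegated to the web server or a framework extension rather than done by hand.

```python
import gzip
import json


def compress_payload(payload):
    """Serialize `payload` to JSON and gzip it; returns (raw, compressed) bytes."""
    raw = json.dumps(payload).encode("utf-8")
    return raw, gzip.compress(raw)


# Placeholder response resembling a page of video results.
videos = [
    {"title": f"Sample video {n}", "description": "Placeholder description text"}
    for n in range(100)
]
raw, compressed = compress_payload({"videos": videos})
print(len(raw), len(compressed))  # compare the sizes before and after gzip
```

Repetitive JSON such as a list of similar objects tends to compress well. Over HTTP, the client advertises support with the `Accept-Encoding: gzip` header and the server marks a compressed body with `Content-Encoding: gzip`.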

Infra Level

Use a load balancer in front of multiple instances to distribute load evenly and improve API throughput.
Use Nginx as a reverse proxy and Gunicorn to manage multiple worker processes of the app on the same instance.
The database (for example, on RDS) should be centralized too, and a master-slave (primary-replica) architecture can be used to distribute the load.
Use Redis Cluster instead of a single Redis node so that the failure of one node does not take the cache down.
Use the ELK stack for unified logging across the product.
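For the Gunicorn point, a configuration file might look something like the sketch below. Gunicorn reads its settings from a Python file, so this is ordinary Python. The file name and values are assumptions and not part of this repository; port 5000 is taken from the URLs in the Testing section, and the worker count follows the (2 x cores) + 1 starting point suggested in the Gunicorn documentation.

```python
# gunicorn.conf.py (hypothetical file, not part of this repository)
import multiprocessing

# Address and port the workers listen on; Nginx would proxy requests here.
bind = "0.0.0.0:5000"

# Starting point suggested by the Gunicorn docs: (2 x CPU cores) + 1.
workers = multiprocessing.cpu_count() * 2 + 1

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 30
```

The right worker count depends on the workload, so this value is best treated as a baseline to tune under load testing.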

And many more.
