SJEC_SESSION1_4SO21AI039_Preetham #84

Open: wants to merge 2 commits into base: main
17 changes: 0 additions & 17 deletions Dockerfile

This file was deleted.

14 changes: 14 additions & 0 deletions README.md
@@ -65,3 +65,17 @@ One Day workshop on understanding Docker, Web Scrapping, Regular Expressions, Po
| 04:00 - 04:30 | [`Introduction to Github`](/docs/introduction_to_git_commands.md)
| 04:30 - 04:45 | `Q & A`
| 04:45 - 05:00 | [`Wrapping Up`](/docs/workshop1_home_work.md)


The project extracts specific information (date, title, author, and content) from the "https://blog.python.org/" website using Beautiful Soup, a Python web scraping library, and stores the extracted data in a PostgreSQL database. The whole setup is containerized with Docker so that it runs the same way in every environment. Docker Compose defines and runs the two parts of the application, the PostgreSQL database and the web scraping script, each in its own isolated container. Because every dependency is declared in the image and the Compose file, the environment is portable and reproducible: the scraper or the database can be updated independently, without manual dependency management or environment configuration.
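A minimal sketch of the storage step, assuming the scraper has already turned each post into the four fields listed above. The `BlogPost` dataclass, the `save_posts` helper and the column layout are hypothetical; only the table name `scrapped_contents` comes from this README. The connection can be any object following the DB-API cursor/commit pattern, such as one returned by `psycopg2.connect`.

```python
from dataclasses import dataclass
from typing import Iterable


# Hypothetical record for one scraped post (the four fields named in the README).
@dataclass
class BlogPost:
    date: str
    title: str
    author: str
    content: str


# Hypothetical schema; only the table name is taken from the README.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS scrapped_contents (
    id SERIAL PRIMARY KEY,
    date TEXT,
    title TEXT,
    author TEXT,
    content TEXT
)
"""

# psycopg2 uses %s placeholders; values are passed separately, never formatted into the SQL.
INSERT_SQL = (
    "INSERT INTO scrapped_contents (date, title, author, content) "
    "VALUES (%s, %s, %s, %s)"
)


def save_posts(conn, posts: Iterable[BlogPost]) -> int:
    """Create the table if needed, insert all posts, commit, and return the row count."""
    rows = [(p.date, p.title, p.author, p.content) for p in posts]
    cur = conn.cursor()
    try:
        cur.execute(CREATE_TABLE_SQL)
        cur.executemany(INSERT_SQL, rows)
    finally:
        cur.close()
    conn.commit()
    return len(rows)
```

Keeping the SQL parameterized means post titles or bodies containing quotes cannot break the statement.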

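Steps to follow:
1. Build the image: `sudo docker-compose build`

2. Run the containers: `sudo docker-compose up`

3. From a second terminal, log in to PostgreSQL: `sudo docker exec -it <container id> psql -U postgres -d postgres` (the container name `python-blog-db` from `docker-compose.yml` also works in place of `<container id>`)

4. Query the scraped data: `SELECT * FROM scrapped_contents;`

5. Stop and remove the containers: `sudo docker-compose down`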
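The steps above can also be scripted. This is a hypothetical Python wrapper (`workshop_commands` and `run_all` are illustrative names), assuming `docker-compose` is on the PATH and using the container name `python-blog-db` from `docker-compose.yml`. It differs from the manual steps in two ways: it starts the stack with `up -d` so the script is not blocked, and it runs `psql -c` without `-it` because no interactive terminal is needed. The query may return no rows if the scraper has not finished yet.

```python
import shlex
import subprocess
from typing import List

# Container name taken from container_name in docker-compose.yml.
DB_CONTAINER = "python-blog-db"
QUERY = "SELECT * FROM scrapped_contents;"


def workshop_commands(use_sudo: bool = True) -> List[List[str]]:
    """Return the README steps as argument lists (hypothetical helper)."""
    prefix = ["sudo"] if use_sudo else []
    return [
        prefix + ["docker-compose", "build"],
        # -d runs the stack detached so the next command can execute
        prefix + ["docker-compose", "up", "-d"],
        # No -it: psql runs a single query via -c and then exits
        prefix + ["docker", "exec", DB_CONTAINER,
                  "psql", "-U", "postgres", "-d", "postgres", "-c", QUERY],
        prefix + ["docker-compose", "down"],
    ]


def run_all(use_sudo: bool = True, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for cmd in workshop_commands(use_sudo):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Calling `run_all()` only prints the commands; `run_all(dry_run=False)` executes them and stops at the first failing step because of `check=True`. Note that `docker-compose down` keeps the named `db-data` volume unless `-v` is added, so scraped rows survive a restart.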
26 changes: 0 additions & 26 deletions docker-compose.yaml

This file was deleted.

31 changes: 31 additions & 0 deletions docker-compose.yml
@@ -0,0 +1,31 @@
version: '3'

services:
  db:
    image: postgres:latest
    container_name: python-blog-db
    environment:
      POSTGRES_DB: postgres
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres
    ports:
      - "5434:5432"
    volumes:
      - db-data:/var/lib/postgresql/data

  web:
    build:
      context: .
    container_name: python-blog-scraper
    depends_on:
      - db
    environment:
      DB_NAME: postgres
      DB_USER: postgres
      DB_PASSWORD: postgres
      DB_HOST: db
      DB_PORT: "5432"

volumes:
  db-data:
    driver: local
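The `web` service receives its connection settings through the `DB_*` environment variables above. A sketch of how the scraper might read them, assuming psycopg2-style keyword arguments; `db_config_from_env` and `connect_with_retry` are hypothetical names, not taken from `web_scraping.py`. The retry loop is there because `depends_on` only controls start order: Compose starts `db` before `web` but does not wait for PostgreSQL to accept connections, so the first attempt can fail.

```python
import os
import time
from typing import Any, Callable, Dict, Mapping, Optional, Tuple


def db_config_from_env(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build connect() keyword arguments from the DB_* variables (hypothetical defaults)."""
    env = os.environ if env is None else env
    return {
        "dbname": env.get("DB_NAME", "postgres"),
        "user": env.get("DB_USER", "postgres"),
        "password": env.get("DB_PASSWORD", "postgres"),
        # Defaults mirror the web service's environment in docker-compose.yml
        "host": env.get("DB_HOST", "db"),
        "port": int(env.get("DB_PORT", "5432")),
    }


def connect_with_retry(
    connect: Callable[..., Any],
    config: Dict[str, Any],
    attempts: int = 10,
    delay: float = 2.0,
    retry_on: Tuple[type, ...] = (Exception,),
) -> Any:
    """Call connect(**config) until it succeeds or the attempts run out."""
    last_error: Optional[BaseException] = None
    for attempt in range(1, attempts + 1):
        try:
            return connect(**config)
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                time.sleep(delay)  # give PostgreSQL time to finish starting
    raise RuntimeError(f"could not connect after {attempts} attempts") from last_error
```

In the scraper this might be called as `connect_with_retry(psycopg2.connect, db_config_from_env(), retry_on=(psycopg2.OperationalError,))`. When running the script on the host instead of inside Compose, the `ports` mapping exposes PostgreSQL on 5434, so `DB_HOST=localhost` and `DB_PORT=5434` would be needed.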
14 changes: 14 additions & 0 deletions dockerfile
@@ -0,0 +1,14 @@
# Use an official Python runtime as a parent image
FROM python:3.9-slim

# Set the working directory in the container
WORKDIR /app

# Copy only the necessary files into the container
COPY web_scraping.py /app/

# Install the required packages directly (no requirements.txt is copied into the image)
RUN pip install --no-cache-dir requests bs4 psycopg2-binary

# Run the Python script when the container launches
CMD ["python", "./web_scraping.py"]
Binary file removed docs/gitflow.png
Binary file not shown.
131 changes: 0 additions & 131 deletions docs/introduction_to_docker.md

This file was deleted.

58 changes: 0 additions & 58 deletions docs/introduction_to_git_commands.md

This file was deleted.
