Data_Analysis_LinkedIn_Job_Postings
This documentation outlines the usage of LinkedIn_Job_Posting_Scrape, a project set up as a tutorial for demonstration or learning purposes.
The project is intended to show how data can be obtained through HTML text mining/scraping, loaded into a database such as PostgreSQL, and used for data analysis.
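As a rough illustration of that pipeline, the sketch below fetches a page, parses the HTML, and writes rows into PostgreSQL. This is a minimal sketch only: the URL, tag selection, table name, and connection settings are illustrative assumptions, not the notebook's actual code.

```python
# Minimal sketch of the scrape -> store pattern. The URL, tag selection,
# table name, and connection settings are assumptions for illustration.
import requests
from bs4 import BeautifulSoup
import psycopg2

html = requests.get(
    "https://www.linkedin.com/jobs/search/?keywords=data%20analyst",
    timeout=30,
).text
soup = BeautifulSoup(html, "html.parser")

# Collect job titles from the parsed HTML (the tag choice is a guess).
titles = [h.get_text(strip=True) for h in soup.find_all("h3")]

conn = psycopg2.connect(host="172.17.0.2", dbname="postgres",
                        user="postgres", password="postgres")
with conn, conn.cursor() as cur:
    cur.execute("CREATE TABLE IF NOT EXISTS job_postings (title TEXT);")
    cur.executemany("INSERT INTO job_postings (title) VALUES (%s);",
                    [(t,) for t in titles])
conn.close()
```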
Programs used in this project:
- Jupyter Notebook
- PostgreSQL
- Docker
- Python programming language, plus libraries
Skills used in this project:
- HTML text mining/scraping
- Object-oriented programming
- Data cleaning
- Applied Statistical Analysis
- Interpretation
Requirements for running this container:
- Docker program
- Dockerfile - included
- docker-compose.yml - included
- Data_analysis_scraping_linkedin_job_posting-docker.ipynb - included
Directions for creating and running the container:
- Create a new directory
- Put the supplied Dockerfile, docker-compose.yml, and Data_analysis_scraping_linkedin_job_posting-docker.ipynb inside the new directory
- On the command line, navigate to the directory
- Run the command: docker-compose up --build
- The output will include a URL with an access token, for example: http://0.0.0.0:8888/lab?token=################################################
- Open this URL in a web browser; it will launch Jupyter.
- Before you run the notebook you will need the IP address of the PostgreSQL container. To get it, run on the command line: docker inspect containerID | grep "IPAddress" | egrep -o '[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}'. You can get the Postgres Docker container ID by running on the command line: docker ps -a | grep "postgres:latest" | awk '{ print $1 }'.
- In the third code cell, replace IP_Address = 'YOUR IP ADDRESS' with the address found above, for example: IP_Address = '172.17.0.2' (see the connection sketch after this list).
- Run each code cell in order.
- The code will generate a scraped_table.txt file for demonstration, but the data analysis is run on the supplied scraped_table-Docker.txt for reproducibility; that file was generated from a previous run (see the loading sketch after this list).
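To clarify the IP_Address step above, here is a hedged sketch of the kind of connection code the third cell presumably contains; the port, database name, user, and password are assumptions and should be matched to the supplied docker-compose.yml.

```python
import psycopg2

# Replace with the address printed by the docker inspect pipeline above.
IP_Address = '172.17.0.2'

# Port and credentials are assumptions; match them to docker-compose.yml.
conn = psycopg2.connect(host=IP_Address, port=5432,
                        dbname="postgres", user="postgres",
                        password="postgres")
print("connected:", conn.closed == 0)  # closed == 0 means the connection is open
conn.close()
```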
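For the final step, loading the pre-generated file for analysis might look like the sketch below; the tab delimiter and header row are assumptions, since the file layout isn't described here.

```python
import pandas as pd

# Load the pre-generated table shipped with the project. The tab delimiter
# and header row are assumptions; adjust them to the file's actual layout.
df = pd.read_csv("scraped_table-Docker.txt", sep="\t")
print(df.head())      # peek at the first rows
print(df.describe())  # summary statistics as a starting point for analysis
```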