Google News Content Scrape and Analyze with Azure OpenAI Service (GPT-3)

This demo repository shows how to use Python to scrape news articles from Google News for a given keyword. The scraped articles are passed to a GPT-3 model hosted on Azure OpenAI Service (AOAI), which generates concise summaries of their main points. The summaries are then formatted and sent by email through the Mailjet API.
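The summarize-and-email half of that pipeline can be sketched as below. This is a minimal illustration, not the repository's actual code: `summarize_article`, `build_digest_html`, `build_mailjet_payload`, the prompt wording, and the `complete` callable (any function that sends a prompt to your AOAI deployment and returns the generated text) are all hypothetical names introduced here. The payload shape follows Mailjet's Send API v3.1 message format.

```python
from html import escape
from typing import Callable, Dict, List


def summarize_article(chunks: List[str], complete: Callable[[str], str]) -> str:
    """Summarize each chunk with `complete` (prompt -> text) and join the results.

    `complete` is a hypothetical wrapper around a call to your AOAI deployment.
    """
    # Assumed prompt wording; tune it for your own deployment.
    prompt = "Summarize the main points of the following news text in 2-3 sentences:\n\n{}"
    return " ".join(complete(prompt.format(chunk)).strip() for chunk in chunks)


def build_digest_html(items: List[Dict[str, str]]) -> str:
    """Render one linked heading and one summary paragraph per article."""
    parts = []
    for item in items:
        parts.append(
            '<h3><a href="{}">{}</a></h3><p>{}</p>'.format(
                escape(item["url"], quote=True),  # escape to keep the markup valid
                escape(item["title"]),
                escape(item["summary"]),
            )
        )
    return "\n".join(parts)


def build_mailjet_payload(sender: str, recipient: str, subject: str, html: str) -> dict:
    """Build a request body in the Mailjet Send API v3.1 message format."""
    return {
        "Messages": [
            {
                "From": {"Email": sender},
                "To": [{"Email": recipient}],
                "Subject": subject,
                "HTMLPart": html,
            }
        ]
    }
```

Assuming the `mailjet_rest` package and your own API key pair, the payload could then be sent with `Client(auth=(api_key, api_secret), version="v3.1").send.create(data=payload)`. Keeping the model call behind the `complete` parameter means the formatting logic can be exercised without any credentials.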

The demo uses two Python libraries to find the latest news articles and extract their full text. The first is GoogleNews, which searches Google News for a keyword and returns each article's title and URL. The second is Newspaper3k, which downloads each article's HTML page and parses out the text content. For this demonstration, I chose to scrape news about GPT, the family of large language models developed by OpenAI. The topic is popular and hard for a person to keep up with, because new developments and applications of GPT appear every day.
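A minimal sketch of this search-then-download step might look like the following. The function names, the `period` and `limit` defaults, and the error handling are assumptions made for illustration, not the repository's code; the sketch relies on the `title` and `link` keys that GoogleNews returns for each result and on Newspaper3k's `Article` class.

```python
from typing import Callable, Dict, List


def search_google_news(keyword: str, period: str = "1d") -> List[Dict[str, str]]:
    """Search Google News with the GoogleNews package (pip install GoogleNews)."""
    from GoogleNews import GoogleNews  # lazy import: only needed for live searches

    client = GoogleNews(lang="en", period=period)
    client.search(keyword)
    # Each result is a dict; this sketch uses only its "title" and "link" keys.
    return client.results()


def download_text(url: str) -> str:
    """Download one article and return its body text (pip install newspaper3k)."""
    from newspaper import Article  # lazy import: only needed for live downloads

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def collect_articles(
    keyword: str,
    search: Callable[[str], List[Dict[str, str]]] = search_google_news,
    fetch: Callable[[str], str] = download_text,
    limit: int = 5,
) -> List[Dict[str, str]]:
    """Return up to `limit` articles as dicts with title, url, and text."""
    articles: List[Dict[str, str]] = []
    for hit in search(keyword):
        if len(articles) >= limit:
            break
        try:
            text = fetch(hit["link"])
        except Exception:
            continue  # skip pages that fail to download or parse
        if text.strip():
            articles.append({"title": hit["title"], "url": hit["link"], "text": text})
    return articles
```

Calling `collect_articles("GPT")` would run a live search and download. Passing `search` and `fetch` as parameters is a design choice of this sketch that lets the loop be tried with stub functions and no network access.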

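This demo also shows how to use the Natural Language Toolkit (NLTK) library for chunking, which here means dividing long articles into smaller segments along linguistic boundaries such as sentences. Chunking works around the roughly 4,000-token limit of GPT-3, the maximum amount of text the model can handle in a single request, shared between the prompt and the generated completion. The limit is counted in tokens, not words: a token is a sub-word unit, and in English a token is often shorter than a word.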
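The chunking idea can be sketched as a greedy packer that fills each chunk with whole sentences until a token budget is reached. Everything named here is an assumption for illustration: `chunk_text`, the 1,500-token default budget, the regex-based `split_sentences` stand-in, and the rule of thumb of about four characters per token. With NLTK installed and its Punkt tokenizer data downloaded, `nltk.tokenize.sent_tokenize` could be passed as `splitter` instead.

```python
import re
from typing import Callable, List


def split_sentences(text: str) -> List[str]:
    """Naive stand-in splitter: break after ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def estimate_tokens(text: str) -> int:
    """Rough estimate (assumption): about 4 characters per token in English."""
    return max(1, len(text) // 4)


def chunk_text(
    text: str,
    max_tokens: int = 1500,
    splitter: Callable[[str], List[str]] = split_sentences,
    count_tokens: Callable[[str], int] = estimate_tokens,
) -> List[str]:
    """Greedily pack whole sentences into chunks of at most `max_tokens`.

    A single sentence larger than the budget is kept as its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for sentence in splitter(text):
        cost = count_tokens(sentence)
        if current and used + cost > max_tokens:
            chunks.append(" ".join(current))  # close the full chunk
            current, used = [], 0
        current.append(sentence)
        used += cost
    if current:
        chunks.append(" ".join(current))
    return chunks
```

The budget is set well below the model limit because the prompt instructions and the generated summary count against the same limit. A real tokenizer would give more accurate counts than the character-based estimate.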


Enjoy!