This repository contains the source code for Build a Simple Telegram Bot Using Scraped Data from a Website, a session conducted on Sunday, 26th of July 2020 for the web.dev event.
- Code Editor (Visual Studio Code)
- Telegram Account
1. Git clone this repository to your local machine
2. `cd` to the folder
3. Run `pipenv shell` to start the virtual environment
4. Run `pipenv install` to install the dependencies in `Pipfile.lock`. This will install the `BeautifulSoup4`, `requests`, and `Python Telegram Bot` modules
5. Get the Telegram Bot Token by consulting BotFather
6. Copy the content of the `.env.example` file and paste it into a new file, `.env`
7. Copy your Telegram Bot Token and assign it to the `BOT_API_KEY` variable in `.env`
8. Look for your bot in Telegram and start a chat with it. It should be available in Telegram if you followed step 5 correctly
9. In the project root folder, run `python bot.py` to start the bot
10. Send `/start` to your bot!
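The `.env` steps above only help if the bot reads `BOT_API_KEY` when it starts. How `bot.py` does this is not shown in this README, so the following is only a minimal standard-library sketch of the idea. The helper names `parse_env` and `load_token` are hypothetical and are not taken from the repository.

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_token(env_path=".env"):
    """Return BOT_API_KEY from the environment, falling back to the .env file."""
    token = os.environ.get("BOT_API_KEY")
    if not token:
        path = Path(env_path)
        if path.exists():
            token = parse_env(path.read_text()).get("BOT_API_KEY")
    if not token:
        raise RuntimeError("BOT_API_KEY is not set; check your .env file")
    return token
```

Failing early with a clear error is usually easier to debug than passing an empty token to Telegram.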
- The `python`, `pip`, and `pipenv` commands should be available in your terminal
- Add the above commands to your System Environment Path (Windows) if they are not available
- Make sure you have the correct Python version (3.8)
- Identify the link of the website you want to scrape. In this project, the link is https://www.jobcentrebrunei.gov.bn/web/guest/search-job?q={keyword}
- Identify the HTML elements you want to obtain from the website, and take note of their tags, classes, and ids
- Take a look at `jobCentre.py` to learn how to extract the relevant data from the website

      company = div.find_all("div", class_="jp_job_post_right_cont")[0].find_all("p")[0].find_all("a")[0].text

- For example, the code above extracts the text in the first `<a>` tag, in the first `<p>` tag, in the first `<div>` tag with a class of `jp_job_post_right_cont`
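To see the chained lookup in action without hitting the real website, it can be run against a small inline snippet. The HTML below is invented for illustration and reuses only the class name `jp_job_post_right_cont`, so the real page structure may differ. `extract_company` is a hypothetical helper showing one way to avoid an `IndexError` when a tag is missing.

```python
from bs4 import BeautifulSoup

# Invented sample markup: only the class name comes from the line above
SAMPLE_HTML = """
<div class="job">
  <div class="jp_job_post_right_cont">
    <h4>Software Developer</h4>
    <p><a href="#">Example Company Sdn Bhd</a></p>
  </div>
</div>
"""


def extract_company(div):
    """Return the company name, or None if any expected tag is missing."""
    container = div.find("div", class_="jp_job_post_right_cont")
    if container is None:
        return None
    paragraph = container.find("p")
    link = paragraph.find("a") if paragraph else None
    return link.get_text(strip=True) if link else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
div = soup.find("div", class_="job")

# Same chain as above: first matching <div>, then first <p>, then first <a>
company = div.find_all("div", class_="jp_job_post_right_cont")[0].find_all("p")[0].find_all("a")[0].text
print(company)
print(extract_company(div))
```

The chained `[0]` lookups are compact but raise an error as soon as the page layout changes, which is why the helper checks each step.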
The theory of the Telegram Bot API is quite simple.
- When a user sends a text to your bot, the text is stored on the Telegram server
- Your bot server continuously sends requests to the Telegram server to ask for updates
- If new text has been sent to your bot, Telegram sends that text to your bot
- The new text is then put into an `Updater` object
- The `Updater` object then creates a `Dispatcher` object
- The `Dispatcher` object then dispatches the update to `Handler` objects, which tell your bot what function to run based on the user's text
- Based on the user's action, you can instruct the bot to enter a particular state
- When the bot is in a state, it can only run the `Handlers` defined for that state
- After the bot has run the necessary functions, you can tell it which state to enter next

tl;dr - Your bot fetches new text from the Telegram server, then runs the necessary functions to handle it. Once it's done, it enters a state
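The flow above can be imitated in plain Python without Telegram at all. The toy sketch below is not how python-telegram-bot is implemented: `FakeServer`, `ToyBot`, and the two callbacks are hypothetical stand-ins, and the real library talks to Telegram over HTTP and passes richer objects around.

```python
from collections import deque

INITIAL, SEARCH = range(2)


class FakeServer:
    """Stands in for the Telegram server: stores texts until the bot asks."""

    def __init__(self):
        self._pending = deque()

    def user_sends(self, text):
        self._pending.append(text)

    def get_updates(self):
        updates = list(self._pending)
        self._pending.clear()
        return updates


def start(text):
    # Each callback returns a reply and the next state
    return "Please enter your keyword", SEARCH


def search(text):
    return f"Searching for {text!r}", INITIAL


class ToyBot:
    def __init__(self, server):
        self.server = server
        self.state = INITIAL
        # Each state lists the (predicate, callback) pairs it understands
        self.handlers = {
            INITIAL: [(lambda t: t == "/start", start)],
            SEARCH: [(lambda t: True, search)],
        }

    def poll_once(self):
        """Fetch pending texts and dispatch each one by the current state."""
        replies = []
        for text in self.server.get_updates():
            for matches, callback in self.handlers[self.state]:
                if matches(text):
                    reply, self.state = callback(text)
                    replies.append(reply)
                    break
            else:
                # No handler in the current state matched: fallback
                replies.append("Sorry, I did not understand that")
        return replies


server = FakeServer()
bot = ToyBot(server)
server.user_sends("/start")
server.user_sends("developer")
print(bot.poll_once())
```

Note that the same text is handled differently depending on the state: `/start` sent while the toy bot is in `SEARCH` would be treated as a search keyword.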
I want to bring your attention to the `main()` function in the `bot.py` file.

    from telegram.ext import (CommandHandler, ConversationHandler, Filters,
                              MessageHandler, Updater)

    # token, the state constants, and the callbacks are defined or imported elsewhere in bot.py

    def main():
        # put the updates in an Updater object
        updater = Updater(token, use_context=True)

        # dispatch the updates
        dp = updater.dispatcher

        # defining the handler
        conv_handler = ConversationHandler(
            # what to do when the user starts the program
            entry_points=[
                CommandHandler('start', start),
            ],
            # defining the states your bot can assume
            states={
                # the INITIAL state: in this state, the bot only understands the /start command
                INITIAL: [
                    # run the start() function when the user sends /start
                    MessageHandler(Filters.regex('/start'), start),
                ],
                # in this state, the bot only understands a keyword sent by the user
                # sending /start in this state makes the bot search the job centre
                # for the keyword "/start"
                SEARCH_VACANCIES_REPLY: [
                    # run the search_vacancies() function when the user replies with a keyword
                    MessageHandler(Filters.text, search_vacancies)
                ],
            },
            # what to do if the bot doesn't understand anything
            fallbacks=[MessageHandler(Filters.text, fallback)]
        )

        # add the handler to the Dispatcher object
        dp.add_handler(conv_handler)

        # poll the Telegram server and continuously request updates
        updater.start_polling()
        updater.idle()

Now, if you take a look at `MessageHandler(Filters.regex('/start'), start)`, you can see that I have instructed the bot to run the `start(update, context)` function when it is in the `INITIAL` state and a user sends a `/start` message to the bot. Any function that is passed to a handler is called a callback function. Every callback function needs to accept the `update` and `context` arguments. The one we are interested in is the `update` argument. Through the `update` argument, we can access any data the user sent when responding to the bot while it was in a particular state. You can refer here to see what else you can access in the `update` argument. The `update` argument can also be used to send something to the user.
To send a message to the user, you can call `update.message.reply_markdown(string)`.

    def start(update, context):
        response = "Please enter your keyword"
        # ask the user to enter the keyword
        update.message.reply_markdown(response)
        # return the state of the bot
        return SEARCH_VACANCIES_REPLY

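Because a callback is an ordinary function of `update` and `context`, it can be tried out without a running bot. The sketch below repeats `start()` and calls it with a stand-in object built with `unittest.mock`. The state values from `range(2)` and the `make_fake_update` helper are assumptions for illustration; `bot.py` defines its own constants.

```python
from unittest.mock import MagicMock

# Assumed state constants; bot.py defines its own values
INITIAL, SEARCH_VACANCIES_REPLY = range(2)


def start(update, context):
    response = "Please enter your keyword"
    update.message.reply_markdown(response)
    return SEARCH_VACANCIES_REPLY


def make_fake_update(text):
    """Build a stand-in that records calls made on update.message."""
    update = MagicMock()
    update.message.text = text
    return update


update = make_fake_update("/start")
next_state = start(update, context=None)

# The callback replied exactly once and asked to move to the next state
update.message.reply_markdown.assert_called_once_with("Please enter your keyword")
print(next_state == SEARCH_VACANCIES_REPLY)
```

This kind of check is handy for catching a wrong return state before testing the conversation by hand in Telegram.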
Let's take a look at `search_vacancies(update, context)`.

    def search_vacancies(update, context):
        # This is where we get the text the user keyed in
        searched_keyword = update.message.text
        # perform web scraping here
        # reply to the user
        update.message.reply_markdown(formatted_jobs_string)
        return INITIAL

You can see that I retrieved the text the user entered using `update.message.text` and put it into the `searched_keyword` variable.
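The excerpt above leaves out how `formatted_jobs_string` is built. One possible approach, sketched here under assumptions, is to turn the scraped results into a single Markdown string. The job fields `title` and `company` and the helpers `escape_legacy_markdown` and `format_jobs` are hypothetical and not taken from the repository. The escaping assumes Telegram's legacy Markdown mode, where `_`, `*`, a backtick, and `[` are treated as markup.

```python
import re


def escape_legacy_markdown(text):
    """Backslash-escape characters that legacy Markdown treats as markup."""
    return re.sub(r"([_*`\[])", r"\\\1", text)


def format_jobs(jobs, keyword):
    """Turn a list of job dicts into one Markdown reply string."""
    if not jobs:
        return f"No vacancies found for {escape_legacy_markdown(keyword)}"
    lines = [f"*Vacancies for {escape_legacy_markdown(keyword)}*"]
    for number, job in enumerate(jobs, start=1):
        title = escape_legacy_markdown(job["title"])
        company = escape_legacy_markdown(job["company"])
        lines.append(f"{number}. {title} at {company}")
    return "\n".join(lines)


# Invented sample data standing in for the scraping results
jobs = [
    {"title": "Web Developer", "company": "Example_Co"},
    {"title": "QA Engineer", "company": "Sample Sdn Bhd"},
]
formatted_jobs_string = format_jobs(jobs, "developer")
print(formatted_jobs_string)
```

Escaping matters because scraped text is not under your control: an unmatched `_` or `*` in a company name can make Telegram reject the message as malformed Markdown.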
- Send options that the user can select from by using `ReplyKeyboardMarkup`

    # Defining the state
    # The int you pass to range must be the count of the available states for your bot
    STATE_ONE, STATE_TWO, SELECT_OPTION_STATE = range(3)

    # Define your keyboard layout
    options_keyboard = [
        ['Option 1'],
        ['Option 2'],
        ['Option 3']
    ]

    # Define a ReplyKeyboardMarkup object
    options_keyboard_markup = ReplyKeyboardMarkup(options_keyboard, one_time_keyboard=True)

    # return the keyboard in your action
    def some_function(update, context):
        theText = update.message.text
        # perform something
        update.message.reply_markdown("Please select some options", reply_markup=options_keyboard_markup)
        return SELECT_OPTION_STATE

    def option_one_selected(update, context):
        # do something
        return INITIAL

    def option_two_selected(update, context):
        # do something
        return INITIAL

    def option_three_selected(update, context):
        # do something
        return INITIAL

    # add new state in your main() function
    def main():
        ......
        conv_handler = ConversationHandler(
            ......
            states = {
                .......
                SELECT_OPTION_STATE: [
                    MessageHandler(Filters.regex('^Option 1$'), option_one_selected),
                    MessageHandler(Filters.regex('^Option 2$'), option_two_selected),
                    MessageHandler(Filters.regex('^Option 3$'), option_three_selected),
                ],
                .......
            }
            ......
        )

This bot is actually quite simple, and I haven't explored many of the features outlined in the official documentation. I recommend that everyone have a look at the repository and learn from some of its examples. Personally, I referred to this example when developing this bot. You can have a look here to view other examples.
Have Fun!