This is a template for making reddit bots in python, and deploying them to AWS Lambda.
This bot template has some nice features:
- Everything is already set up, ready to go. You just:
  - run the cookiecutter template (explained below)
  - set up Amazon cloud credentials on your computer
  - run the deployment script
Bam! Now you have a working reddit bot. This particular reddit bot looks at r/bottest and responds to any self post that mentions the word potato. Now you can just go look at that specific potato logic, see how it works, and change it into what you want your bot to do. That's easier than writing everything yourself.
This bot also checks up on comments after it makes them. If your bot's comment gets downvoted to negative infinity, you'd want to know. This system sends you an email when one of your comments drops below zero.
After commenting, this bot also checks back on the post it commented on. Perhaps the submitter read your comment and changed their post. You may want to update your comment to avoid looking silly.
It checks up on new comments very frequently, and on old comments less frequently. This is a compromise between low-latency updates and not using the reddit API too much. (You'll get throttled if you try to check up on all your bot's posts every minute.)
Running your own server can be a lot of hassle, and brings downtime and hardware costs. I initially ran my bot on a beaglebone server, but a lightning strike in my street broke the server. For most bots, you can run using serverless Lambda functions in Amazon's cloud, and you won't have to pay. (They have a free usage tier.)
If you don't know what Amazon's "Lambda functions" are: you give them code, and they run it. You don't need to worry about an operating system (`sudo apt-get install blah`). You don't need to worry about scaling. If your reddit bot wants to comment on a thousand posts per second today, and nothing tomorrow, Amazon will happily handle that. (Reddit will definitely throttle that, but my point is that you don't need to worry about scaling, and you only pay for the seconds of computation that you use.)
You could run a simple script as a cron job on your own server instead, but as mentioned earlier a physical server has downsides, and a virtual machine is more expensive than lambda functions.
This tooling is very customisable, and can be extended to use any AWS resource. You want to send an SMS whenever your bot spots a certain type of post? That's easy, just modify the cloudformation template. You want to use an image recognition API? Same again, it's far easier than with other tools.
This tooling parallelises all aspects of lambda deployment:
- virtual environment creation
- zipping
- uploading
- testing
Let's start by just installing the template potatobot. It responds to self posts in r/bottest which contain the word potato.
- This repo is a cookiecutter template. Install the `cookiecutter` library. (Typically with `pip install cookiecutter` or `pip install cookiecutter --user`)
- Create a new reddit account to run your bot as. (It's neater than using your own)
- Log in to reddit in your browser with this account. Click 'preferences', then select 'apps'
- Fill out the information. Once you have 'created' the app, take note of the information you see. You'll need it for the next step.
- In a terminal on a Linux computer, run `cookiecutter https://github.com/mlda065/paragraphiser_bot_aws.git`. (I haven't tested this on a Windows computer. It might work, I dunno. Windows is pretty useless.) You'll be asked to fill out some information:
  - `bot_name`: The name of your bot. It must contain only letters, numbers and dashes, and must start with a letter
  - `aws_region`: Amazon's cloud has many regions. The cheapest is generally `us-east-1`. Here is the full list of possibilities
  - `directory_name`: The name of the folder you want to download the template into. The folder should not yet exist
  - `email_for_alerts`: When there's an error with your bot, or a comment by your bot is downvoted into negative territory, an email will be sent here.
  - `reddit_client_id`: Get this from the previous step
  - `reddit_client_secret`: Get this from the previous step
  - `reddit_password`: The password for your bot's reddit account (which you just created). This will be saved in plain text, so make sure you don't use the same password for anything else
  - `reddit_username`: The username of the reddit account you just created
  - `user_agent`: An arbitrary string used to identify your bot and its version, e.g. "My Potatobot v0.1"
The template bot scans self posts in r/bottest, and comments on a post if it contains the word 'potato'. If that post is later edited, the bot updates its comment.
- Set up your local machine with the credentials of your aws account if you haven't already, using `aws configure`
Let's install the python packages we need for the tooling.
- Run `./makeForDeploy.sh` in a terminal. (Alternatively just install the `boto3` library into your OS)
- This will create a virtual environment in `./env`
- Activate this virtual environment with `. ./env/bin/activate`
This template comes with fully fledged AWS deployment tooling. It's more general than just reddit bots or just lambda functions. Read 'How it works' to understand the details.
You can deploy to different stages (e.g. `dev` vs `prod`). If you don't know what that means, just use `dev`.
To deploy the bot:
- run `python deploy.py -s dev`
  - `-s dev` tells the system to deploy to the dev stage. You can replace `dev` with `prod` or any other string.
- This could take 5 minutes if it's your first run
You should see all green when the command finishes.
Submit a post in r/bottest and wait for the bot to respond. It's currently polling for new posts every 10 minutes, so you may have to wait 10 minutes.
If you don't want to wait for 10 minutes:
- log into the AWS console in a browser
- go to Lambda
- search for `checkForNew`
- Select the corresponding lambda
- trigger it (you'll have to set up a 'test' configuration. This is fairly straightforward. Just use the default payload, or an empty payload. Alternatively, see the boto3 sketch after this list.)
- If you see red in the console, read the error message there, then redeploy. For more advanced debugging methods, read 'Debugging and logs'
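If you'd rather not click through the console's 'test' configuration, you can trigger the same lambda from Python with boto3 (installed earlier by `makeForDeploy.sh`). This is a minimal sketch; the exact function name is generated by the stack, so the name below is a placeholder and you should copy the real one from the Lambda console.

```python
import boto3

# Placeholder name: copy the real one from the Lambda console.
# It will look something like <botname>-stack-checkForNew-XXXXXXXXXXXX
FUNCTION_NAME = 'potatobot-stack-checkForNew-XXXXXXXXXXXX'

client = boto3.client('lambda')
response = client.invoke(
    FunctionName=FUNCTION_NAME,
    InvocationType='RequestResponse',  # wait for the result
    Payload=b'{}',                     # empty payload, same as the console test
)
print(response['Payload'].read().decode())
```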
To change the criteria, go into `data/util/common.py`.

- `generate_reply` takes in a praw submission object. (praw is the library used to talk to the reddit API.)
- `submission.id` is the unique identifier of each post. This is a short string which appears in the URL in your browser when you look at the post.
- `submission.is_self` is `True` or `False` depending on whether the post is a self post (text) or a non-self post (link)
- `submission.selftext` only exists for self posts. This is the body of the post.
- Read the comments at the top of this function, and look at the code that's there to understand it.
- Apply your bot logic here. If you want your bot to ignore this submission, return `None`. If you want your bot to comment on this post, return a dictionary like `{'original_reply': msg}`. `msg` is the markdown-formatted comment your bot will make. (You don't need to make it reply; the function calling `generate_reply` will do that.) You may include other key/value pairs within this dict. That exact dict will be passed later to `update_reply` as the `data` argument. (There's a sketch of this after the list.)
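To make that contract concrete, here is a minimal sketch of what a `generate_reply` could look like. The return-value convention comes from the description above; the potato check itself is simplified, so treat this as illustrative rather than a copy of the shipped code.

```python
def generate_reply(submission):
    """Decide whether the bot should comment on this praw submission.

    Return None to ignore the post, or a dict containing at least
    'original_reply' (the markdown comment to post). Any extra keys
    are handed back later to update_reply as the `data` argument.
    """
    if not submission.is_self:
        return None  # link post, so there is no selftext to inspect

    if 'potato' not in submission.selftext.lower():
        return None  # nothing potato-related here

    msg = "I see you mentioned a potato. Excellent choice of vegetable."
    return {
        'original_reply': msg,
        'post_id': submission.id,  # extra bookkeeping for later checks
    }
```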
- After commenting on a post, the bot comes back to see how it's gone.
  - The checks are frequent for recent comments, and infrequent for old comments. To see the specific times at which the bot goes back and checks on its past comments, look at the function `schedule_checks` in `data/lambda/checkForNew/main.py`
  - It looks at the score of the comment. This is upvotes minus downvotes, plus comment replies: if someone replies to your bot's comment with 'good bot' or 'bad bot', that counts as an up/down vote. If the comment is voted into a negative score, you'll get an email to alert you.
  - It looks at the submission again. Perhaps the OP modified their post after reading your bot's comment, and your bot's comment is no longer applicable. If you want to update your comment, return a dictionary which includes `{'updated_reply': msg}` (among other keys). Your comment will be updated to say the contents of `msg`. This dict will be passed back the next time `update_reply` is called.
- Add any tests you want to use to validate your code into the `unit_tests` function in `data/util/common.py`
- The example uses a mako template to put data into the reply. Look at the code for `update_reply` to see how this works (there's also a sketch after this list). The template files are `data/util/replyTemplateNew.mako` and `replyTemplateUpdate.mako`
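Here is a rough sketch of how an `update_reply` might use one of those mako templates. The real function lives in `data/util/common.py`; the parameter names, template path and template variables used here are assumptions, so check the shipped code before copying anything.

```python
from mako.template import Template

def update_reply(submission, data):
    """Sketch: re-check a post the bot has already commented on.

    `data` is the dict returned by generate_reply (or by the previous
    update_reply call). Return None to leave the comment alone, or a
    dict containing 'updated_reply' with the new markdown body.
    """
    if 'potato' in submission.selftext.lower():
        return None  # the original comment still applies

    # Render the update template shipped with the repo.
    # The path and the variables passed to render() are assumptions.
    template = Template(filename='include/replyTemplateUpdate.mako')
    msg = template.render(post_id=submission.id)

    return {
        'updated_reply': msg,
        'post_id': submission.id,
    }
```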
The full deployment can take about 5 minutes, which is a frustratingly long deployment cycle. You can skip steps. There are 4 steps:
- The virtual environment of each lambda function is built by executing the `makescript.sh` file within each lambda function's folder in `data/lambda/`. You only need to run this if you have modified the `makescript.sh` file (to add new python libraries), or modified anything in `data/util` or `data/credentials` (which get copied into the `data/lambda/x/include` folders). If you have not done either of those things since your last deployment, use the `-b` flag, e.g. `python deploy.py -s prod -b`
- Each lambda gets zipped up into a `.zip` file. If you have modified the python code in any `data/lambda/x/main.py` file, then you need to do this step. If you needed to do the previous step, then you need to do this step. Otherwise you can skip it with the `-z` flag, e.g. `python deploy.py -s prod -bz`.
- The zip of each lambda gets uploaded to S3. If you needed to do either of the previous steps, you need to do this step. If you only changed the cloudformation template (`data/cloudformation/stack.yaml`), then you can skip this step with the `-u` flag, e.g. `python deploy.py -s prod -bzu`
- The cloudformation script is applied to the stack. The deployment script pushes whatever is the latest version of each zip from S3. (This is something plain cloudformation does not do, and a benefit of using this tooling.) This step can't be skipped; it is compulsory. Fortunately, it is very quick.
If you're finding that the upload step is taking too long, consider doing your development on an EC2 virtual machine. Uploads to S3 are faster that way.
After thorough testing, you can change the subreddit the bot browses by going into `data/cloudformation/stack.yaml`. Search for `subreddits` and replace `"bottest"` with your new subreddit name(s). If you want the bot to be active in multiple subreddits, separate them with a comma. Leave off the "r/".
There are unit tests which are conducted whenever a lambda is deployed. If the tests fail, you will be told, and will see the output of one of the tests.
In the `main.py` file for each lambda (`data/lambda/x/main.py`) there is a function called `lambda_handler()`. The general format is:
```python
def lambda_handler(event, context):
    if ('unitTest' in event) and event['unitTest']:
        print('Running unit tests')
        # add your unit tests here
    else:
        print('Running main (non-test) handler')
        main()  # this handles the actual workload
```
The deployment tooling invokes the lambda function in the lambda environment, with all the same permissions as normal.
When the tests fail, you will see the output of one of the failed tests printed by the deployment tooling.
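As an example, the tests run in that `unitTest` branch (or in the `unit_tests` function in `data/util/common.py`, as mentioned above) can be as simple as calling `generate_reply` against hand-built fake submissions and raising via `assert` if the result is wrong, assuming an uncaught exception is what the tooling reports as a failure. The `FakeSubmission` class below is purely illustrative, and `generate_reply` is assumed to be defined in the same file.

```python
class FakeSubmission:
    """Stand-in for a praw submission, with just enough attributes for generate_reply."""
    def __init__(self, selftext, is_self=True, post_id='abc123'):
        self.selftext = selftext
        self.is_self = is_self
        self.id = post_id

def unit_tests():
    # A potato post should produce a reply...
    reply = generate_reply(FakeSubmission('I love potato salad'))
    assert reply is not None and 'original_reply' in reply

    # ...and a post without the word should be ignored.
    assert generate_reply(FakeSubmission('I love pasta salad')) is None

    print('All unit tests passed')
```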
If you want to see more detailed logs of the tests, or of the production invocations, open up AWS in your browser and go to Cloudwatch. Select Logs, then filter by `/aws/lambda/<botname>-stack-`.
There are 5 lambda functions.
- `checkForNew`
  - This polls reddit every 10 minutes for new posts. This frequency is set in `data/cloudformation/stack.yaml`
  - Each new post is passed to `generate_reply()` in `util/common.py` to determine whether the bot should reply to it or not. If so:
    - the bot replies
    - data about the post and the reply is saved in the `botname-stack-postHistory` dynamodb table
    - many entries are saved in the `botname-stack-schedule` table, one for each time that the bot should come back and check the comment and the post. One feature of this system is that these checks are very frequent for new comments, and less frequent for old comments. The large integer in the index of this table is a unix timestamp.
- `poll`
  - Every 60 seconds this function looks at what's in `botname-stack-schedule`. If there are any timestamps in the past or within the next 60 seconds, this function checks the posts corresponding to those timestamps (duplicates merged). It doesn't check them directly: for each post, `poll` invokes `checkOldOne` with a payload that contains one post id. (There's a sketch of this pattern after this list.)
- `checkOldOne`
  - This is invoked by `poll`, or can be manually invoked with a payload of `{"post_id":"xxxxxxx"}` if you want to check on a particular post. (Where `xxxxxxx` is the post id in the url of that submission)
  - This checks whether your bot's comment on that post was downvoted below 0. If so, you get sent an email.
  - This also checks whether the original post has been changed.
- `errorHandler`
  - This monitors whether any of the other functions fail (i.e. raise a python exception). If they do, you get sent an email. But if they fail again, you won't get sent a second email. This is deliberate, because you don't want to get an email every minute if `poll` keeps encountering the same error. Once you redeploy, the emails will start again.
- `failer`
  - This lambda is not normally invoked. If you go into the console and invoke it manually, it will fail. This is so that you can test that the errorHandler is working.
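To make the `poll` → `checkOldOne` hand-off concrete, here is a rough boto3 sketch of the pattern described above. The table name, function name and attribute names are placeholders for illustration; the real ones are defined by the cloudformation template.

```python
import json
import time
import boto3

SCHEDULE_TABLE = 'potatobot-stack-schedule'         # placeholder name
CHECK_OLD_ONE = 'potatobot-stack-checkOldOne-XXXX'  # copy from the Lambda console

def poll_once():
    dynamodb = boto3.resource('dynamodb')
    lam = boto3.client('lambda')
    cutoff = int(time.time()) + 60  # anything due within the next minute

    # Scan the schedule table for entries whose timestamp has passed (or soon will).
    # The attribute names ('post_id', 'timestamp') are assumptions.
    items = dynamodb.Table(SCHEDULE_TABLE).scan()['Items']
    due_posts = {item['post_id'] for item in items if int(item['timestamp']) <= cutoff}

    # One asynchronous invocation of checkOldOne per unique post id
    # (duplicates merged by the set above).
    for post_id in due_posts:
        lam.invoke(
            FunctionName=CHECK_OLD_ONE,
            InvocationType='Event',  # fire and forget
            Payload=json.dumps({'post_id': post_id}).encode(),
        )
```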
As mentioned earlier, the deployment tooling is a fully customisable, generalisable AWS deployment kit. I've looked into alternatives (e.g. Zappa, serverless.com) and I couldn't find anything that did quite what I want.
Here's how it works:
- The data for each lambda function is saved in a folder in `data/lambda`
  - `data/lambda/$LAMBDA/makescript.sh` is a bash script to create a virtual environment. This is useful if you want to create a different virtual environment for each lambda in your project (e.g. if Lambda A requires large library X, and Lambda B requires large library Y, and X+Y is too large for lambda). This is invoked by the deployment tooling if you don't use the `-b` flag. The makescripts are also how files from `data/util` or `data/credentials` get copied into each lambda; those folders are for any code which is used by multiple lambdas. The deployment tooling executes the makescripts for all lambdas in parallel.
  - `data/lambda/$LAMBDA/main.py` has the main logic, and any unit tests.
- Each lambda gets zipped up into a `lambda.zip` file. This contains the virtual environment (created by the `makescript.sh`), `main.py`, and anything in `data/lambda/$LAMBDA/include`. The deployment tooling zips all lambdas in parallel.
- The zip of each lambda gets uploaded to S3, all in parallel. The bucket should have versioning turned on.
- The cloudformation template (`data/cloudformation/stack.yaml`) is applied (either stack creation or a change set). Normally, if you upload a new version of a lambda zip to S3 and apply a cloudformation template, it will not update the lambda to use what's in S3. This tooling keeps track of the version of the zip you just uploaded, and passes that as a parameter to the cloudformation yaml template (sketched below). Everything for each project is kept contained within one cloudformation template (except the S3 bucket).
- After the latest zip is pushed into lambda, all previous versions of that zip are deleted from S3. This is important because otherwise you'd build up many GB of version history in S3, which would cost you money. (If you delete a file in S3, previous versions are still there, but the browser interface makes you think they're gone.)
I tried to handle the secrets for reddit properly. I really did. But it's bloody hard to pass secrets into Amazon's lambda function with enterprise-grade security. Amazon's key handler is really confusing to use.
So I gave up and just saved the reddit credentials in `credentials/praw.ini`, which is copied into `data/lambda/checkForNew/include/praw.ini` and `data/lambda/checkOldOne/include/praw.ini` by the deployment tooling, and then added to the zip file for each lambda function.
This is kind of bad practice, but hey, this is just a reddit bot. That's why you should make a new reddit user just for the bot, and not use the same password anywhere else.
The AWS IAM permissions of the resources are very loose. I was too lazy to tighten them. (If you feel up for it, go change `ExecutionRole` in `data/cloudformation/stack.yaml`.)
But why bother? If you have nothing in this AWS account except for your reddit bot, what's the worst case scenario? Someone takes control of your reddit bot.
If that happens, log in to reddit and change your bot's password. Tada!
If you do have other stuff in the same AWS account, well ... don't! It's good practice to have a separate AWS account for each project. It provides excellent security through iron-clad partitioning. It also means that if you hit a limit in one project (e.g. max writes to dynamodb), your other projects won't grind to a halt.
- Log into AWS in a browser.
- Go to "Cloudformation"
- Delete the stack with the relevant name
- Go delete the S3 bucket with the code
That's it! My tooling keeps everything together.
If my documentation isn't clear enough, or you have a particular request, just create an 'issue' in this repository.
- add S3 bucket name to cookiecutter
- deployment tool should check S3 bucket does versioning
- cloudformation should roll back if the unit tests fail (note: this requires keeping old versions of each zip)