INSTALL.backend

** STEP 1: INSTALL MOST OF THE STUFF **

To run the backend, you will need:

- a Redis 2.8+ server
- a CouchDB server
- a Ruby 1.9 installation (use of rvm is suggested)
- ZeroMQ 4.0.5 (earlier API-compatible versions may work, but they have not been tested)
- Bundler
- ExecJS supported runtime (for the dashboard)
  (see https://github.com/sstephenson/execjs)
- Python 3.6+ and websockets 7.0 (for the dashboard WebSocket)

(Little known fact: ArchiveBot is made to be as hard as possible to set
up. If you have trouble with these instructions, drop by in IRC for
help, file an issue, or submit improvements through a pull request. You
can also take a look at the .travis.yml integration test config file.)

Quick install, for Debian and Debian-esque systems like Ubuntu:

    sudo apt-get update
    sudo apt-get install bundler couchdb git tmux python3
      (if you might build ZeroMQ from source, add the next line:)
    sudo apt-get install libtool pkg-config build-essential autoconf automake libzmq-dev
    git clone https://github.com/ArchiveTeam/ArchiveBot.git
    cd ArchiveBot
    git submodule update --init
    bundle install
    pip install websockets==7.0  # Or apt install python3-websockets, or whichever method you prefer, but it must be version 7.0.


** STEP 2: INSTALL REDIS **

Next, install Redis. You can build it from source or you can attempt to use a package. 
If you want to try a package, do:

   sudo apt-get install redis-server

If you want to build from source, here's how you can do that, using version 2.8.17 
on Debian/Ubuntu as an example:

    sudo apt-get install build-essential tcl8.5
    wget http://download.redis.io/releases/redis-2.8.17.tar.gz
    tar xzf redis-2.8.17.tar.gz
    cd redis-2.8.17
    make
    make test
    sudo make install

If you also want to set up Redis as a daemonized (always-running) service on your 
Debian/Ubuntu machine on port 6379, follow up with this:

    cd utils
    sudo ./install_server.sh
    (and then hit enter a bunch of times to accept the default values)


** STEP 3: CONFIGURE COUCHDB **

Next we need to configure CouchDB.  But first, check to make sure it installed 
correctly (which should have been done back in step 1) and that it is currently 
running on your machine, by typing this:

    curl http://127.0.0.1:5984/

If CouchDB is indeed running, you should get back a message that looks something 
like this:

    {"couchdb":"Welcome","uuid":"610e43c2778c3be750ad5fff8cadd108","version":"1.5.0",
    "vendor":{"version":"14.04","name":"Ubuntu"}}

Now we need to load up CouchDB with the "archivebot" and "archivebot_logs" databases.  
You can do this from the command line:

    curl -X PUT http://127.0.0.1:5984/archivebot
    curl -X PUT http://127.0.0.1:5984/archivebot_logs

If that works, you should get this back as a response each time:

    {"ok":true}

Now, go to the db/design_docs folder in ArchiveBot:

   cd db/design_docs

(You might have installed it somewhere like /home/archivebot/ArchiveBot/db/design_docs .)

The four design documents in there need to be uploaded to the new archivebot database 
you just created. You can use CURL or you can use the Futon web interface at 
http://localhost:5984/_utils/index.html where you can copy and paste the content 
of the JSON files into new documents manually.  If you want to use CURL instead, 
do this:

    grep -v _rev archive_urls.json > /tmp/archive_urls.json
    grep -v _rev ignore_patterns.json > /tmp/ignore_patterns.json
    grep -v _rev jobs.json > /tmp/jobs.json
    grep -v _rev user_agents.json > /tmp/user_agents.json
    curl -X PUT http://127.0.0.1:5984/archivebot/_design/archive_urls -d @/tmp/archive_urls.json
    curl -X PUT http://127.0.0.1:5984/archivebot/_design/ignore_patterns -d @/tmp/ignore_patterns.json
    curl -X PUT http://127.0.0.1:5984/archivebot/_design/jobs -d @/tmp/jobs.json
    curl -X PUT http://127.0.0.1:5984/archivebot/_design/user_agents -d @/tmp/user_agents.json


** STEP 4: SET UP THE IRC SERVER **

Finally, you're going to need to install an IRC server (until such time as the 
ArchiveBot code is changed to allow for alternate ways of sending it instructions, 
such as Twitter).  On Debian/Ubuntu, do this:

    sudo apt-get install ircd-hybrid
    sudo /etc/init.d/ircd-hybrid restart

If you need to add the config file, it is here:

   sudo pico /etc/ircd-hybrid/ircd.conf

If you don't have a command line IRC client, and you want one for ease of use, 
you can optionally install IRSSI:

   sudo apt-get install irssi

Once that's all in place, run the following:

    redis-server
      (unless it's already running -- and make sure that it does not have a password)
    cd /home/archivebot/ArchiveBot/bot
    bundle exec ruby bot.rb \
      -s 'irc://127.0.0.1:6667' \
      -r 'redis://127.0.0.1:6379/0' \
      -c '#archivebot' -n 'MyArchiveBot'

This means that the 'MyArchiveBot' bot should join the #archivebot IRC channel, which is 
running on the IRC server that you just set up.

Congrats, you now have a bouncing baby bot!


** STEP 5: SET UP THE WEB DASHBOARD **

You can run the dashboard webapp on the same machine, or a different machine, or skip it 
altogether.  It's up to you.  If you want to run it, then from the root of ArchiveBot's 
repository (which is usually /home/archivebot/ArchiveBot/), run:

    cd /home/archivebot/ArchiveBot/
    export REDIS_URL=redis://127.0.0.1:6379/0
    export UPDATES_CHANNEL=updates
    export FIREHOSE_SOCKET_URL=tcp://127.0.0.1:12345
    plumbing/updates-listener | plumbing/log-firehose

In another terminal, run

    bundle exec ruby dashboard/app.rb -u http://127.0.0.1:8080
       (replace 127.0.0.1 with your web dashboard host's IP address, if needed)

For the WebSocket, in another terminal, run:

    export FIREHOSE_SOCKET_URL=tcp://127.0.0.1:12345
    plumbing/firehose-client | python3 dashboard/websocket.py

websocket.py will print debugging info if there is an environment variable WSDEBUG=1.


** STEP 6: LOGS AND MAINTENANCE STUFF **

The last part of ArchiveBot is a set of maintenance tasks.  They are currently 
split between the cogs and plumbing directories; eventually, they will all 
move to plumbing.

In cogs:

1. Configure twitter_conf.json if you want to post Twitter Tweets.
2. Run the cogs with bundle exec ruby cogs/start.rb.

In plumbing:

1. bundle install (yes, again -- the plumbing currently has its own Gemfile)
2. export REDIS_URL=redis://127.0.0.1:6379/0
3. export UPDATES_CHANNEL=updates
4. In separate terminals, tmux panes, or screen sessions, run
   a. ./analyzer
   b. ./trimmer > /dev/null
   c. COUCHDB_URL=http://127.0.0.1:5984/db-name ./recorder

The trimmer prints all the data it trims to standard output in the form

  IDENT JSON
  IDENT JSON
  ...

For the EFNet ArchiveBot, we redirect it to /dev/null because we currently 
don't do anything with that data.

To upgrade, run `git pull` and restart all programs.

bot.rb, dashboard/app.rb, and cogs/start.rb accept a --help option. Run
them with --help to see accepted options.