Adding Postgres Auto-Reconnect to the Logotron's IRC Bot

January 6th, 2022

I've been using asciilifeform's logotron to power my IRC logger for a while now. The setup consists of an IRC bot and a Flask-based web app, both connected to a Postgres database. I run it on a simple Rockchip machine alongside my Bitcoin network crawler, which also uses Postgres. For some reason every couple of weeks1 the bot would lose its DB connection and have to be manually restarted2 in order to continue eating log lines. Same with the crawler. Potentially there's a performance issue somewhere, causing the connection to timeout and close in some cases. I know it's not the logotron itself, since asciilifeform ran the same code on the same box for almost a year and never had the DB connection issue, but perhaps both apps sharing the same small RK is just too much for it to handle at times.

In any case, it was a problem for me, as well as others—and I figured it couldn't hurt for the bot to have auto-db-reconnect functionality—so I came up with a small patch to remedy the issue. Below is the tidied up version of what I currently have running.

1 @@ -72,6 +72,8 @@
2 DB_Name = cfg.get("db", "db_name")
3 DB_User = cfg.get("db", "db_user")
4 DB_DEBUG = cfg.get("db", "db_debug")
5 + DB_Reconn_Tries = int(cfg.get("db", "db_reconnect_max_tries"))
6 + DB_Reconn_Delay = int(cfg.get("db", "db_reconnect_delay"))
7 # Logism:
8 Base_URL = cfg.get("logotron", "base_url")
9 App_Root = cfg.get("logotron", "app_root")
10 @@ -85,21 +87,54 @@
11
12 ##############################################################################
13
14 # Connect to the given DB
15 -try:
16 - db = psycopg2.connect("dbname=%s user=%s" % (DB_Name, DB_User))
17 -except Exception:
18 - print "Could not connect to DB!"
19 - logging.error("Could not connect to DB!")
20 - exit(1)
21 -else:
22 - logging.info("Connected to DB!")
23 +db = None
24 +
25 +def conn_db():
26 + global db
27 +
28 + tries = DB_Reconn_Tries
29 +
30 + while True:
31 + # Connect to the given DB
32 + try:
33 + db = psycopg2.connect("dbname=%s user=%s" % (DB_Name, DB_User))
34 + except Exception:
35 + print "Could not connect to DB!"
36 + logging.error("Could not connect to DB!")
37 + if tries > 0 or DB_Reconn_Tries == -1:
38 + tries = tries - 1
39 + time.sleep(DB_Reconn_Delay)
40 + continue
41 + else:
42 + exit(1)
43 + else:
44 + logging.info("Connected to DB!")
45 + break
46 +
47 +conn_db()
48
49 ##############################################################################
50
51 def close_db():
52 db.close()
53
54 +def ensure_db_is_alive():
55 + # Ping the db to ensure it's alive and connected
56 + logging.debug("Checking DB connection status...")
57 + try:
58 + cur = db.cursor()
59 + cur.execute('SELECT 1')
60 + except (psycopg2.OperationalError, psycopg2.InterfaceError) as e:
61 + pass
62 +
63 + # If connection is alive db.closed will equal 0
64 + if db.closed == 0:
65 + return True
66 +
67 + # Otherwise, attempt to reconnect
68 + logging.debug("No DB Connection!")
69 + conn_db()
70 +
71 def exec_db(query, args=()):
72 cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
73 if (DB_DEBUG): logging.debug("query: '{0}'".format(query))
74 @@ -491,6 +526,8 @@
75 def save_line(time, chan, speaker, action, payload):
76 ## Put in DB:
77 try:
78 + ensure_db_is_alive()
79 +
80 # Get index of THIS new line to be saved
81 last_idx = query_db(
82 '''select idx from loglines where chan=%s

With this running locally I was able to bring down the DB, send a few IRC messages, restart the DB, and have the messages that were sent while the DB was offline recorded once the DB came back up. I'm going to give it a few weeks in production to see how it fares. If it does indeed solve the issue (and not cause any others) I'll package it as a vpatch and publish it here along with the rest of the logotron tree.

  1. Sometimes longer, but also sometimes much shorter. I suspect it has to do with the usage/load of the sites. []
  2. This happened often enough that I even wrote a small script to pull from asciilifeform's channel logs when mine fell behind:

    #!/bin/bash
    
    batch_size=500
    
    start_line=$1
    end_line=$2
    chan_url=$3
    results_file=$4
    
    if [ $1 -ge $2 ]; then
            echo "end line must be greater than the start line"
            exit 1
    fi
    
    num_lines=$(( $end_line - $start_line ))
    num_batches=$(( $num_lines / ($batch_size + 1) + 1 ))
    last_batch_size=$(( $num_lines % ($batch_size + 1) ))
    
    echo "starting at ${start_line}, ending at ${end_line}, for a total of ${num_lines} lines"
    echo "will download from ${chan_url}?istart=${start_line}&iend=${end_line} in ${num_batches} batch(es). last batch will be ${last_batch_size}"
    
    batch=1
    while [ $start_line -lt $end_line ]; do
            if [ $batch -eq $num_batches ]; then
                    cur_end=$(( $start_line + $last_batch_size ))
            else
                    cur_end=$(( $start_line + $batch_size ))
            fi
    
            echo "pulling ${start_line} - ${cur_end}"
            curl -s "${chan_url}?istart=${start_line}&iend=${cur_end}" >> $results_file
    
            start_line=$(( $cur_end + 1 ))
            batch=$(( batch + 1 ))
    done
    

    []

« Patch Fixes for the Logotron and Bitdash Crawler
New Logotoron Patches: Bug Fixes and Enhancements »

One Comment

  1. [...] patch just formalizes the fix I added to my bot to deal with the recurring postgres connection timeouts3. Despite only 50% of [...]

Leave a Reply

*
*

You can use the following HTML tags in your comment: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>