I've been using asciilifeform's logotron to power my IRC logger for a while now. The setup consists of an IRC bot and a Flask-based web app, both connected to a Postgres database. I run it on a simple Rockchip machine alongside my Bitcoin network crawler, which also uses Postgres. For some reason, every couple of weeks [1] the bot would lose its DB connection and have to be manually restarted [2] before it would resume eating log lines. Same with the crawler. There may be a performance issue somewhere that causes the connection to time out and close under certain conditions. I know it's not the logotron itself, since asciilifeform ran the same code on the same box for almost a year and never hit the DB connection issue, but perhaps both apps sharing the same small RK is just too much for it to handle at times.
In any case, it was a problem for me as well as for others, and I figured it couldn't hurt for the bot to have auto-DB-reconnect functionality, so I came up with a small patch to remedy the issue. Below is the tidied-up version of what I currently have running.

DB_Name = cfg.get("db", "db_name")
DB_User = cfg.get("db", "db_user")
DB_DEBUG = cfg.get("db", "db_debug")
+DB_Reconn_Tries = int(cfg.get("db", "db_reconnect_max_tries"))
+DB_Reconn_Delay = int(cfg.get("db", "db_reconnect_delay"))
# Logism:
Base_URL = cfg.get("logotron", "base_url")
App_Root = cfg.get("logotron", "app_root")


##############################################################################

# Connect to the given DB
-try:
-    db = psycopg2.connect("dbname=%s user=%s" % (DB_Name, DB_User))
-except Exception:
-    print "Could not connect to DB!"
-    logging.error("Could not connect to DB!")
-    exit(1)
-else:
-    logging.info("Connected to DB!")
+db = None
+
+def conn_db():
+    global db
+
+    tries = DB_Reconn_Tries
+
+    while True:
+        # Connect to the given DB
+        try:
+            db = psycopg2.connect("dbname=%s user=%s" % (DB_Name, DB_User))
+        except Exception:
+            print "Could not connect to DB!"
+            logging.error("Could not connect to DB!")
+            # Retry while tries remain; DB_Reconn_Tries of -1 means retry forever
+            if tries > 0 or DB_Reconn_Tries == -1:
+                tries = tries - 1
+                time.sleep(DB_Reconn_Delay)
+                continue
+            else:
+                exit(1)
+        else:
+            logging.info("Connected to DB!")
+            break
+
+conn_db()

##############################################################################

def close_db():
    db.close()

+def ensure_db_is_alive():
+    # Ping the db to ensure it's alive and connected
+    logging.debug("Checking DB connection status...")
+    try:
+        cur = db.cursor()
+        cur.execute('SELECT 1')
+    except (psycopg2.OperationalError, psycopg2.InterfaceError):
+        # A failed ping flips the connection's closed flag to nonzero
+        pass
+
+    # If connection is alive db.closed will equal 0
+    if db.closed == 0:
+        return True
+
+    # Otherwise, attempt to reconnect
+    logging.debug("No DB Connection!")
+    conn_db()
+
def exec_db(query, args=()):
    cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
    if (DB_DEBUG): logging.debug("query: '{0}'".format(query))

def save_line(time, chan, speaker, action, payload):
    ## Put in DB:
    try:
+        ensure_db_is_alive()
+
        # Get index of THIS new line to be saved
        last_idx = query_db(
            '''select idx from loglines where chan=%s
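The two new knobs belong in the existing [db] section of the bot's config file. A minimal sketch of what that stanza might look like (the values here are illustrative, not from my actual config; per the code above, -1 tries means retry forever, and the delay is in seconds):

[db]
db_name = logdb
db_user = logbot
db_debug = 0
db_reconnect_max_tries = -1
db_reconnect_delay = 10

An unlimited retry count suits an unattended logger: the bot simply waits out the outage instead of dying and waiting for a human.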
With this running locally I was able to bring down the DB, send a few IRC messages, restart the DB, and watch the messages sent while the DB was offline get recorded once it came back up. I'm going to give it a few weeks in production to see how it fares. If it does indeed solve the issue (and doesn't cause any others), I'll package it as a vpatch and publish it here along with the rest of the logotron tree.
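If you'd like to reproduce that test outside the bot, here's a minimal standalone sketch of the same ping-then-reconnect pattern. It assumes a throwaway local DB named testdb and the same Python 2 / psycopg2 environment the logotron runs in; stop and start Postgres while it loops and it should recover on its own:

import time
import logging
import psycopg2

logging.basicConfig(level=logging.INFO)

db = None

def conn_db():
    # Same shape as the patched conn_db() above, retrying forever
    global db
    while True:
        try:
            db = psycopg2.connect("dbname=testdb")
        except Exception:
            logging.error("Could not connect to DB!")
            time.sleep(5)
        else:
            logging.info("Connected to DB!")
            break

def ensure_db_is_alive():
    # Ping; a dead connection raises and flips db.closed to nonzero
    try:
        db.cursor().execute('SELECT 1')
    except (psycopg2.OperationalError, psycopg2.InterfaceError):
        pass
    if db.closed != 0:
        conn_db()

conn_db()
while True:
    ensure_db_is_alive()
    cur = db.cursor()
    cur.execute('SELECT now()')
    print cur.fetchone()[0]  # resumes printing once the DB comes back
    db.commit()
    time.sleep(2)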
1. Sometimes longer, but also sometimes much shorter. I suspect it has to do with the usage/load of the sites.
2. This happened often enough that I even wrote a small script to pull from asciilifeform's channel logs when mine fell behind:
#!/bin/bash

batch_size=500
start_line=$1
end_line=$2
chan_url=$3
results_file=$4

if [ $1 -ge $2 ]; then
    echo "end line must be greater than the start line"
    exit 1
fi

num_lines=$(( $end_line - $start_line ))
num_batches=$(( $num_lines / ($batch_size + 1) + 1 ))
last_batch_size=$(( $num_lines % ($batch_size + 1) ))

echo "starting at ${start_line}, ending at ${end_line}, for a total of ${num_lines} lines"
echo "will download from ${chan_url}?istart=${start_line}&iend=${end_line} in ${num_batches} batch(es). last batch will be ${last_batch_size}"

batch=1
while [ $start_line -lt $end_line ]; do
    if [ $batch -eq $num_batches ]; then
        cur_end=$(( $start_line + $last_batch_size ))
    else
        cur_end=$(( $start_line + $batch_size ))
    fi
    echo "pulling ${start_line} - ${cur_end}"
    curl -s "${chan_url}?istart=${start_line}&iend=${cur_end}" >> $results_file
    start_line=$(( $cur_end + 1 ))
    batch=$(( $batch + 1 ))
done
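A hypothetical invocation, with a placeholder script name and URL (point it at whichever raw-log export endpoint your logotron instance serves with istart/iend parameters):

./pull_lines.sh 1000 9000 "http://logs.example.com/log-raw/mychan" recovered.txt

This would fetch lines 1000 through 9000 in roughly 500-line batches and append them to recovered.txt.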