Adding Postgres Auto-Reconnect to the Logotron's IRC Bot

January 6th, 2022

I've been using asciilifeform's logotron to power my IRC logger for a while now. The setup consists of an IRC bot and a Flask-based web app, both connected to a Postgres database. I run it on a simple Rockchip machine alongside my Bitcoin network crawler, which also uses Postgres. For some reason, every couple of weeks[1] the bot would lose its DB connection and have to be manually restarted[2] in order to continue eating log lines. Same with the crawler. There may be a performance issue somewhere causing the connection to time out and close in some cases. I know it's not the logotron itself, since asciilifeform ran the same code on the same box for almost a year and never had the DB connection issue, but perhaps both apps sharing the same small RK is just too much for it to handle at times.

In any case, it was a problem for me, as well as for others, and I figured it couldn't hurt for the bot to have auto-db-reconnect functionality, so I came up with a small patch to remedy the issue. Below is the tidied-up version of what I currently have running.

@@ -72,6 +72,8 @@
     DB_Name = cfg.get("db", "db_name")
     DB_User = cfg.get("db", "db_user")
     DB_DEBUG = cfg.get("db", "db_debug")
+    DB_Reconn_Tries = int(cfg.get("db", "db_reconnect_max_tries"))
+    DB_Reconn_Delay = int(cfg.get("db", "db_reconnect_delay"))
     # Logism:
     Base_URL = cfg.get("logotron", "base_url")
     App_Root = cfg.get("logotron", "app_root")
@@ -85,21 +87,54 @@

 ##############################################################################

-# Connect to the given DB
-try:
-    db = psycopg2.connect("dbname=%s user=%s" % (DB_Name, DB_User))
-except Exception:
-    print "Could not connect to DB!"
-    logging.error("Could not connect to DB!")
-    exit(1)
-else:
-    logging.info("Connected to DB!")
+db = None
+
+def conn_db():
+    global db
+
+    tries = DB_Reconn_Tries
+
+    while True:
+        # Connect to the given DB
+        try:
+            db = psycopg2.connect("dbname=%s user=%s" % (DB_Name, DB_User))
+        except Exception:
+            print "Could not connect to DB!"
+            logging.error("Could not connect to DB!")
+            if tries > 0 or DB_Reconn_Tries == -1:
+                tries = tries - 1
+                time.sleep(DB_Reconn_Delay)
+                continue
+            else:
+                exit(1)
+        else:
+            logging.info("Connected to DB!")
+            break
+
+conn_db()

 ##############################################################################

 def close_db():
     db.close()

+def ensure_db_is_alive():
+    # Ping the db to ensure it's alive and connected
+    logging.debug("Checking DB connection status...")
+    try:
+        cur = db.cursor()
+        cur.execute('SELECT 1')
+    except (psycopg2.OperationalError, psycopg2.InterfaceError) as e:
+        pass
+
+    # If connection is alive db.closed will equal 0
+    if db.closed == 0:
+        return True
+
+    # Otherwise, attempt to reconnect
+    logging.debug("No DB Connection!")
+    conn_db()
+
 def exec_db(query, args=()):
     cur = db.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
     if (DB_DEBUG): logging.debug("query: '{0}'".format(query))
@@ -491,6 +526,8 @@
 def save_line(time, chan, speaker, action, payload):
     ## Put in DB:
     try:
+        ensure_db_is_alive()
+
         # Get index of THIS new line to be saved
         last_idx = query_db(
             '''select idx from loglines where chan=%s
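
For reference, the two new settings live in the bot's existing config file, in the same [db] section the other DB options are read from. Here is a minimal sketch of that section with the new keys added; the option names match the patch, but the other keys and all of the values shown are only illustrative:

    [db]
    db_name = logotron_db
    db_user = logotron_user
    db_debug = 0
    db_reconnect_max_tries = -1
    db_reconnect_delay = 5

With db_reconnect_max_tries set to -1 the bot retries forever (per the tries > 0 or DB_Reconn_Tries == -1 check above), sleeping db_reconnect_delay seconds between attempts; a positive value bounds the number of retries before the bot gives up and exits.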

With this running locally I was able to bring down the DB, send a few IRC messages, restart the DB, and see the messages sent while the DB was offline get recorded once it came back up. I'm going to give it a few weeks in production to see how it fares. If it does indeed solve the issue (and doesn't cause any others) I'll package it as a vpatch and publish it here along with the rest of the logotron tree.

  1. Sometimes longer, but also sometimes much shorter. I suspect it has to do with the usage/load of the sites.
  2. This happened often enough that I even wrote a small script to pull from asciilifeform's channel logs when mine fell behind:

    #!/bin/bash
    
    batch_size=500
    
    start_line=$1
    end_line=$2
    chan_url=$3
    results_file=$4
    
    if [ $1 -ge $2 ]; then
            echo "end line must be greater than the start line"
            exit 1
    fi
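
    # Note: istart/iend appear to be inclusive, so each request actually
    # returns batch_size + 1 lines; the arithmetic below accounts for the
    # extra line.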
    
    num_lines=$(( $end_line - $start_line ))
    num_batches=$(( $num_lines / ($batch_size + 1) + 1 ))
    last_batch_size=$(( $num_lines % ($batch_size + 1) ))
    
    echo "starting at ${start_line}, ending at ${end_line}, for a total of ${num_lines} lines"
    echo "will download from ${chan_url}?istart=${start_line}&iend=${end_line} in ${num_batches} batch(es). last batch will be ${last_batch_size}"
    
    batch=1
    while [ $start_line -lt $end_line ]; do
            if [ $batch -eq $num_batches ]; then
                    cur_end=$(( $start_line + $last_batch_size ))
            else
                    cur_end=$(( $start_line + $batch_size ))
            fi
    
            echo "pulling ${start_line} - ${cur_end}"
            curl -s "${chan_url}?istart=${start_line}&iend=${cur_end}" >> $results_file
    
            start_line=$(( $cur_end + 1 ))
            batch=$(( batch + 1 ))
    done
    


Patch Fixes for the Logotron and Bitdash Crawler

September 16th, 2021

Two quick patches to fix two small bugs, one for the logotron and one for the bitdash crawler. The logotron patch fixes a CSS bug in the classic theme where multiple selected loglines would have their highlight rendered incorrectly. The crawler patch fixes the name of the unique index on the host field and drops the explicit creation of this index from the schema, since it is created automatically for the unique-constrained field.

Patches and Signatures

Logotron

fix_multiln_hlite.kv.vpatch
fix_multiln_hlite.kv.vpatch.billymg.sig

Crawler

bitdash_crawler_fix_idx_name.vpatch
bitdash_crawler_fix_idx_name.vpatch.billymg.sig

The complete V-Trees for the logotron and crawler have also been updated to include these patches.

Bitdash Crawler: A Watchglass-Based Bitcoin Network Crawler

August 10th, 2021

Introducing the Bitdash Crawler, a simple Bitcoin network crawler that leverages Watchglass to interface with the Bitcoin network protocol. I decided to write this tool because other Bitcoin network monitoring tools no longer publish results for TRB, which used to be the only reason I bothered checking those sites in the first place. What I am publishing here is the source for the crawler, which continuously scans the Bitcoin network by recursively sending version and getaddr messages. The results are then stored in a Postgres database. You can browse the results at bitdash.io/nodes or you can download and run this program yourself to create your own copy of the network map.

Crawler Design

A main thread starts up three worker threads and a "heartbeat" thread which logs program state at a set interval (useful for monitoring/debugging).

fill_probe_queue
  • Responsible for refilling the probe_queue whenever it is completely emptied.
  • Fills the queue from the list of known nodes in the DB. If the DB is empty (e.g. on the first run or if manually cleared) it will read from the list of seed nodes in nodes.txt.
probe_nodes
  • Takes jobs (nodes to be probed) from the probe_queue and spins up a probe_node thread for each one (up to the limit set in the config via max_sockets).
  • The probe_node child thread attempts to open a socket connection with the host and send a version message. If successful it then sends a getaddr message to ask for a list of connected peers.
  • When the requests succeed, fail, or time out, the results are added to the result_queue. Spam nodes are not added to the result_queue.
insert_results
  • Takes jobs (results to be inserted into the DB) from the result_queue and calls the method for inserting them into the database.
  • Also processes any peers included in the result and adds those that do not exist in the DB to the probe_queue.
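
To make the shape of this concrete, here is a rough sketch of that thread/queue layout in Python 2.7 (the crawler's language). Only the names probe_queue, result_queue, fill_probe_queue, probe_nodes, probe_node, insert_results, heartbeat, max_sockets, and nodes.txt come from the description above; the stub helpers and every other detail are illustrative stand-ins, not the crawler's actual code.

    # Illustrative sketch only -- stubs stand in for the DB and the
    # Bitcoin-protocol calls.
    import logging
    import threading
    import time
    import Queue  # stdlib "queue" module on Python 3

    probe_queue = Queue.Queue()
    result_queue = Queue.Queue()

    def load_known_or_seed_nodes():
        # Stub: would read known nodes from the DB, or nodes.txt if the DB is empty.
        return ["127.0.0.1"]

    def probe_node(host):
        # Stub: would send version/getaddr to the host and queue the outcome
        # (spam nodes would be dropped instead of queued).
        result_queue.put({"host": host, "peers": []})

    def fill_probe_queue():
        while True:
            if probe_queue.empty():
                for host in load_known_or_seed_nodes():
                    probe_queue.put(host)
            time.sleep(1)

    def probe_nodes(max_sockets):
        while True:
            host = probe_queue.get()
            while threading.active_count() > max_sockets:
                time.sleep(0.1)  # crude cap on concurrent probe threads
            threading.Thread(target=probe_node, args=(host,)).start()

    def insert_results():
        while True:
            result = result_queue.get()
            # Stub: insert the result into Postgres here, then feed any
            # previously unseen peers back into the probe queue.
            for peer in result["peers"]:
                probe_queue.put(peer)

    def heartbeat(interval):
        while True:
            logging.info("probe=%d result=%d threads=%d", probe_queue.qsize(),
                         result_queue.qsize(), threading.active_count())
            time.sleep(interval)

    if __name__ == "__main__":
        logging.basicConfig(level=logging.INFO)
        workers = ((fill_probe_queue, ()), (probe_nodes, (50,)),
                   (insert_results, ()), (heartbeat, (60,)))
        for target, args in workers:
            t = threading.Thread(target=target, args=args)
            t.daemon = True
            t.start()
        while True:
            time.sleep(1)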

That's it. No Kubernetes cloud VPS botnet swarms or whatever the cool kids are doing these days, and it completely ignores IPv6 and "Tor" address spaces. It's just a single Python 2.7 script that'll run fine on a low-power ARM processor with 2GB of RAM. On my modest Rockchip with max_sockets set to 800 it completes a full scan (including ~half a million spam peers; I have not implemented blacklisting yet) in around 90 minutes (between 80 and 130 minutes, depending on the number of spam peers unearthed in a pass). And this is the same server that is also hosting the www as well as the Bitdash logs mirror.

A Note About the 'max_sockets' Throttle

I currently have mine set to 800; this is with ~19k already-discovered nodes (any node which has at least once responded with a version message) in the database. It should probably be set to a lower number on the first run (maybe start with 50), and you should monitor to make sure you aren't DDoSing the nodes in the nodes.txt seed list. Recall that a "pass" won't finish until all of the initial nodes and recursively added peers have been scanned once, so if the seed nodes provide good peer lists you may end up with a good chunk of the network after the first pass. If you're getting lots of peers and it feels like it'll take forever to complete the first pass, you can kill the crawler, up the max_sockets, and restart the script.

Watchglass as a Library

As mentioned, I did not write the pieces that handle the actual Bitcoin protocol communication; for that I used asciilifeform's Watchglass. However, Watchglass in its current form includes an IRC bot and is set up to be run as its own script. For my purposes I wanted only the Bitcoin protocol portion without the IRC bot. Asciilifeform pointed me to wedger.py, which is pretty close, but it also contains a bombard_node method and is set up to be configured via constants in the file.

What I ended up doing is including a "watchglass.py" in the crawler genesis that is made up strictly of the functions and helper methods required for interfacing with the Bitcoin network protocol and is designed to be imported as a library in other Python scripts (i.e. it does not instantiate its own logger and it does not read from a config or set of configurable constants). Where necessary I updated function signatures to allow for passing in parameters which previously would have been set in a config. I also added one method to watchglass.py that I feel is generic enough to be included in the library, unpack_ver_msg, which takes a raw payload and returns the discrete values as a dictionary.
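
As a small, hedged illustration of that intended usage (only the watchglass module and unpack_ver_msg come from the above; the wrapper function, the payload handling, and the dictionary keys are assumptions for the sake of example):

    # Hypothetical usage sketch; unpack_ver_msg is described above as taking a
    # raw version-message payload and returning its fields as a dictionary.
    import watchglass

    def summarize_version(raw_payload):
        fields = watchglass.unpack_ver_msg(raw_payload)
        # Key names here are assumed, not taken from watchglass.py itself.
        return "%s (protocol %s)" % (fields.get("user_agent"),
                                     fields.get("version"))

The point is simply that the importing script owns its own logger and configuration and hands raw bytes to the library, rather than the library carrying that baggage itself.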

Including this crawler, there are now three applications that leverage the Watchglass protocol methods. My thinking is that perhaps it can be moved to its own V-Tree, and the Watchglass IRC bot, the wedger, and this crawler can be updated to rely on the Watchglass library.

Roadmap: Crawler

  • Implement an exponential backoff for querying nodes that 1) fail to respond and 2) have never responded. This should significantly speed up batch time and reduce the number of max_sockets needed to hit a given target processing interval.
  • Implement a geoIP-lookup feature even though this will require the use of a 3rd party service.
  • Implement storage of long-term historical snapshots somewhere in the database. Currently it only stores the last 25 results for each known host (configurable), but if one wanted to display, for example, a graph of TRB nodes over time, they would not be able to get the data they need from the current schema.

Roadmap: Website

  • Responsive CSS so that it works well on different screen sizes.
  • An improved homepage with a collection of key overview metrics.
  • Drill down filters for status, version, user agent, and species.

Patch and Signature

bitdash_crawler_genesis.vpatch
bitdash_crawler_genesis.vpatch.billymg.sig