Monitoring PostgreSQL Replication Lag with Monit

Dylan Wood

For some time, we have been utilizing PostgreSQL’s hot standby replication feature in both our staging and production environments. Currently, the hot standby serves three functions:

  1. Standby server for maximum uptime if the master fails.
  2. Disaster recovery if the master fails completely.
  3. Read-only batch operations like taking nightly backups.

All three of these functions are critical to the safety of our data, so we need to be sure that the master and slave are communicating properly at all times. We use Monit and M/Monit for most of our application and server monitoring. Monit is a daemon that runs on each of our servers and performs checks at regular intervals. M/Monit is a centralized dashboard and alert service to which all of the Monit instances report. To help ensure that we get alerts even if our network is completely offline, our M/Monit instance is hosted on AWS.
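
For context, each Monit instance is pointed at M/Monit with a couple of set statements in its monitrc file. The snippet below is only a sketch: the hostname, port, and credentials are placeholders rather than our real values.

# /etc/monit/monitrc (excerpt) -- hostname, port, and credentials are placeholders
set eventqueue basedir /var/monit slots 100    # queue events locally if M/Monit is unreachable
set mmonit https://monit:changeme@mmonit.example.com:8443/collector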

Because replication is so important, I have taken a belt-and-suspenders approach to monitoring replication lag: Monit checks the replication status on both the master and the slave servers. The approach uses Monit’s check program functionality to run a simple Python script. If the script exits with a non-zero status, Monit sends an alert to our M/Monit server, which in turn sends email and Slack notifications to us.

On to the code:

On the master server:

/etc/monit/conf.d/pg-master-replication-check

check program replication_check
with path "/coins/pg-monitoring/master-replication-check.py"
as uid "{{postgresql_service_user}}"
and gid "webadmins"
if status != 0 for 3 cycles then alert

/coins/pg-monitoring/master-replication-check.py

This script queries the database to ascertain that it is in the right state (WAL streaming), and that the replication position reported by the slave is in line with that expected by the master.

#!/usr/bin/python
import subprocess

# Maximum acceptable replication lag, in bytes
repLagBytesLimit = 128

try:
    # Confirm that the master is streaming WAL to the standby
    # (assumes a single standby row in pg_stat_replication)
    repModeRes = subprocess.check_output('psql -t -p {{postgresql_port}} -c "SELECT state FROM pg_stat_replication"', shell=True)
    isInRepMode = repModeRes.strip() == 'streaming'
    # Compare the WAL position sent to the standby with the position it has replayed
    repLagRes = subprocess.check_output('psql -t -p {{postgresql_port}} -c "SELECT pg_xlog_location_diff(sent_location, replay_location) FROM pg_stat_replication"', shell=True)
    repLagBytes = float(repLagRes)
except subprocess.CalledProcessError as e:
    print('Error retrieving stats: {0}'.format(e))
    exit(1)

if not isInRepMode:
    print('Master server is not streaming to standby')
    exit(1)

if repLagBytes > repLagBytesLimit:
    print('Slave replay is lagging behind by %f bytes' % repLagBytes)
    exit(1)

print('All clear!')
exit(0)
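
Before relying on the alerts, it is worth running the script by hand to confirm the exit status, then reloading Monit. A rough sketch (the postgres user name here is an assumption; ours comes from an Ansible variable):

# Run the check manually and inspect its exit code
sudo -u postgres /coins/pg-monitoring/master-replication-check.py; echo "exit status: $?"
# Reload Monit so it picks up the new check, then confirm it is registered
sudo monit reload
sudo monit status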

On the slave server:

/etc/monit/conf.d/pg-slave-replication-check

check program replication_check
with path "/coins/pg-monitoring/slave-replication-check.py"
as uid "{{postgresql_service_user}}"
and gid "webadmins"
if status != 0 for 3 cycles then alert

/coins/pg-monitoring/slave-replication-check.py

This script queries the database to ascertain that it is in the right state (recovery). It also queries the current xlog position from the master and compares it to the last replay location of the slave.

#!/usr/bin/python
import subprocess

# Maximum acceptable distance between the master's current WAL position
# and the last position replayed on this standby, in bytes
slaveXlogDiffLimitBytes = 128

try:
    # Confirm that this server is running in recovery (standby) mode
    repModeRes = subprocess.check_output('psql -t -p {{postgresql_port}} -c "SELECT pg_is_in_recovery()"', shell=True)
    isInRepMode = repModeRes.strip() == 't'
    # Ask the master for its current WAL write position
    masterXlogLocationRes = subprocess.check_output('psql -t -p {{postgresql_port}} -h {{postgres_basebackup_host}} -U {{postgres_basebackup_user}} {{postgres_db_name}} -c "select pg_current_xlog_location();"', shell=True)
    masterXlogLocationStr = masterXlogLocationRes.strip()
    # How far behind the master's position is our last replayed position?
    # (master position first, replay position second, so a lagging standby yields a positive number)
    slaveXlogDiffRes = subprocess.check_output('psql -t -p {{postgresql_port}} {{postgres_db_name}} -c "select pg_xlog_location_diff(\'' + masterXlogLocationStr + '\'::pg_lsn, pg_last_xlog_replay_location());"', shell=True)
    slaveXlogDiffBytes = float(slaveXlogDiffRes.strip())
except subprocess.CalledProcessError as e:
    print('Error retrieving stats: {0}'.format(e))
    exit(1)

if not isInRepMode:
    print('Slave server is not in recovery mode')
    exit(1)

if slaveXlogDiffBytes > slaveXlogDiffLimitBytes:
    print('Slave server replication is behind master by %f bytes' % slaveXlogDiffBytes)
    exit(1)

print('All clear!')
exit(0)

You may wonder why I chose Python instead of Bash or my usual favorite, Node.js. Python is installed in our base server image, while Node is not, and I want to keep our database servers as stock as possible. I chose Python over Bash because I find that Bash scripts are brittle and difficult to debug.

Launching a Console in VMware Web Client on Windows 10, Chrome 42+

Dylan Wood

Maybe it is just me, but I had a difficult time launching the console of my
guest VMs using the VMware Web Client. Here is how I eventually got it working
on Windows 10 and Chrome 43.x.


Background

I am a huge fan of VMware’s plan to replace their Windows-only vSphere desktop
client with a web client. Using the web client, I am able to perform
most tasks directly from my Mac, thus saving the time of booting a Windows VM.
The only task that cannot be performed in the web client from OS X is launching
the guest OS console.

In order to open the console in the web client, it is necessary to install the
VMware Client Integration Plugin, which VMware claims is only available for
32- and 64-bit Windows and 32-bit Linux. I was unable to get the Client Integration
Plugin to install on Ubuntu 14.04, so it looks like I am still stuck using a
Windows VM to manage our VMware cluster.

Even on Windows, it took some time to get things configured such that I could
access a guest VM’s console via the web client. Here is how I eventually did it.

Environment

  • OS: Windows 10 Evaluation Copy
  • Browser: Google Chrome 43.0.2357.130 (Tried IE Edge and Firefox with no luck)

Install Client Integration Plugin

  1. Navigate to your VMware Web Client login page in your browser. Do not log in.
  2. Click the link at the bottom left of the page entitled ‘Download Client Integration Plugin’.
  3. Run the downloaded installer, accepting all defaults.

Enable NPAPI plugins:

  1. Paste chrome://flags/#enable-npapi into your address bar and press return.
  2. Click Enable below Enable NPAPI.
  3. Click Relaunch Now at the bottom left of the page.

Allow VMware plugins to run

  1. Paste chrome://plugins/ into the address bar and press return.
  2. Check the box next to ‘Always allow to run’ below both VMware plugins.

Verify plugins

  1. Restart the Windows machine for good measure.
  2. Open Chrome and navigate back to your VMware Web Client login page.
    You should see two notifications from Chrome at the top of the page (see image below).
    These notifications can be disregarded (for now; see the discussion further below).
    [Image: plugin_warnings.png]

Still does not work?

If you do not see the warnings from Chrome, try this:

  1. Navigate to chrome://settings/content
  2. Scroll to ‘Unsandboxed Plugins’
  3. Select ‘Allow all sites …’
  4. Click ‘Done’
  5. Repeat the Verify plugins steps above.

Open the console for a VM

  1. Log in to the VMware Web Client
  2. Locate a VM whose console you want to open
  3. Click the Settings tab
  4. Near the top of the center pane, you should see a black square with the text Launch Console beneath it. If you see a link to Download plugin instead, something
    is wrong. Try repeating the steps above.

Discussion about NPAPI plugin support

Google has promised to completely remove NPAPI plugin support from Chrome with
version 45. Given the approximate 5-week release schedule that Google has been on,
this means that you will only be able to use the most recent version of Chrome
with the VMware Client Integration Plugin for another couple of months.

With this in mind, I am going to keep my vSphere desktop application installed.
Hopefully, VMware has already begun work on a truly cross platform Web Client
that supports launching a console.

How to [partially] retreat from a database upgrade

Dylan Wood

This post has turned into a bit of a long story.
If you are just looking for how to perform a pg_restore from a newer version of PostgreSQL to an older version of PostgreSQL, look down toward the bottom.

We recently upgraded our worn-out PostgreSQL 8.4 database running on a CentOS 5.5 VM to a shiny new PostgreSQL 9.4 database on top of Ubuntu 14.04.
During a three-week testing period, we encountered and fixed a couple of upgrade-induced bugs in our staging environment.
At the end of three weeks of testing, we felt confident that the upgrade would go smoothly in production… and it did (mostly).

The day after the upgrade, users started to submit tickets complaining that our data export tool was running very slowly in some cases, and just hanging in other cases.
Two other engineers and I spent the next day and a half benchmarking the new database servers over and over, and looking at EXPLAIN ANALYZE plans.
Eventually, we convinced ourselves that the issue was not with the underlying virtual machine or the OS, but with our configuration of Postgres.

To better debug, we restarted our old database server and ran the offending queries there as well as on the new server in our staging environment.
We were able to gain some insights into the issue by comparing the EXPLAIN ANALYZE output from both servers:
the new database was not using the same indices that the old database was, which resulted in more nested loops and more rows examined than necessary.
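
(The slow statements themselves are application-specific, so the query below is only a placeholder to illustrate the comparison we were doing: run the same EXPLAIN ANALYZE on both servers and diff the plans by hand.)

-- Run identically on the old 8.4 server and the new 9.4 server, then compare plans.
-- The table and filter are placeholders, not the actual export-tool query.
EXPLAIN ANALYZE
SELECT *
FROM mrsdba.some_export_view
WHERE study_id = 1234;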

By increasing the random_page_cost from 4 to 15, we were able to get the query explain plans to look more similar, but performance did not improve.
The new database was still choosing different indices to scan.
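
For reference, this is the kind of change we were making while experimenting; a sketch using 9.4 syntax:

-- Try the setting in a single session while testing a query
SET random_page_cost = 15;
-- Or persist it cluster-wide (PostgreSQL 9.4+) and reload the configuration
ALTER SYSTEM SET random_page_cost = 15;
SELECT pg_reload_conf();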

At this point, our users had been without a useful query-building-export tool for two business days, so it was time to switch tactics and implement a work-around solution.
I decided that it would be easiest to direct the queries used by our export tool to a copy of our old production database.
We would be able to keep the copy relatively up to date by loading nightly backups from our production infrastructure.

Modifying the application layer to send requests to the old database server was trivial, since there was a dedicated endpoint just for the low-performing export tool.
Getting the old database to refresh from a backup of the new database was a little trickier.

First, I set up a cron job to run a pg_dump on our hot standby database server every night, and store the dump on our network storage.
I have always used the custom format (-Fc) for pg_dumps, as they allow a lot of flexibility when performing the restore.
This was not an option in this case because I received the following error when trying to restore on the PG 8.4 server: pg_restore: [archiver] unsupported version (1.12) in file header.

My initial attempts to circumvent this included running the pg_dump of the new database remotely from the old database server, and attempting to upgrade only postgres-contrib on the old database server.
Neither of these solutions worked out, so I decided to use the plain pg_dump format (-Fp). This outputs plain SQL statements to rebuild the schema and data.
There are still a few errors during the restore, because the CREATE EXTENSION functionality does not exist in PG 8.4, but I can simply rebuild the necessary extensions manually after the restore.

To reduce the time taken by the dump and restore process, I only dump the schema used by the export tool.
In addition, I omit all history tables (a construct we use to track changes made to data in the database) and some of the larger tables not used by the query tool.
This also reduces the size of the restored database considerably, and lets me restore into a temporary database while the primary database is still running, allowing for near-zero downtime.

A simplified diagram of the current system is shown below:
[Diagram: QBPG84.png]

Here is the cron task that dumps the data. This is placed in its own file in /etc/cron.d

30 23 * * * postgres pg_dump -vFp -p 6117 -T 'mrsdba.mrs_series_data' -T '*_hist' -T mrsdba.mrs_series -T mrsdba.mrs_analysis_files -T mrsdba.mrs_assessment_events -T mrsdba.mrs_asmt_resp_hist_new -n 'mrsdba' postgres > '/coins/mnt/ni/prodrepdbcoin.sql' 2> /tmp/rep_dump.log

Here is the script that creates a new Postgres 8.4 DB from the dump of the Postgres 9.4 database.

#!/bin/bash
# Author: Dylan Wood
# Date: June.20.2015
# Script is called from /etc/cron.d/coins_query_restore
echo 'Starting COINS DB refresh as user:'
echo `id`
echo `date`
# Define variables
ARCHIVE_DIR="/export/ni/prodrepdbcoin.sql"
ARCHIVE_FILE=`ls -1t $ARCHIVE_DIR | head -1`
DBNAME="postgres"
DBPORT=6117
# Create temp database
# Create empty DB
echo 'Creating empty DB'
createdb -p $DBPORT ${DBNAME}_temp
# Create lang
echo 'create plpgsql lang'
psql -p $DBPORT -d ${DBNAME}_temp -c 'CREATE LANGUAGE plpgsql'
# Restore DB
echo 'restoring db from latest dump'
psql -p $DBPORT -d ${DBNAME}_temp -f $ARCHIVE_FILE
# Edit default search path
echo 'Setting default search path'
psql -p $DBPORT -d ${DBNAME}_temp -c "ALTER DATABASE ${DBNAME}_temp SET search_path=mrsdba, casdba, public;"
# Truncate qb temp tables
echo 'Truncating QB temp tables'
psql -p $DBPORT -d ${DBNAME}_temp -c "TRUNCATE TABLE mrsdba.mrs_qb_asmt_data_sort_temp; TRUNCATE TABLE mrsdba.mrs_qb_asmt_pivot_categories_temp;"
# Add empty schemas
echo 'Adding casdba, dxdba and dtdba'
psql -p $DBPORT -d ${DBNAME}_temp -c "CREATE schema casdba; CREATE schema dxdba; CREATE schema dtdba;"
# Create tablefunc extension
echo 'Create tablefunc extension'
psql -p $DBPORT -d ${DBNAME}_temp -f /usr/share/pgsql/contrib/tablefunc.sql
# VACUUM ANALYZE THE DB
echo 'VACUUM ANALYZE'
psql -p $DBPORT -d ${DBNAME}_temp -c "VACUUM ANALYZE"
# Drop database
# First, disconnect all connections
echo 'Terminating connections to DB'
psql -d $DBNAME -p $DBPORT -c "SELECT pg_terminate_backend(procpid) FROM pg_stat_activity WHERE procpid <> pg_backend_pid() AND datname = '$DBNAME';"
# Drop DB
echo 'Dropping DB'
dropdb -p $DBPORT $DBNAME
# Rename temp DB
echo 'Renaming temp database'
psql -d habaridb -p $DBPORT -c "ALTER DATABASE ${DBNAME}_temp RENAME TO ${DBNAME};"
echo 'Finished with COINS DB refresh'
echo `date`
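
For completeness, the cron entry that calls this script lives in /etc/cron.d/coins_query_restore and looks roughly like the line below; the schedule, script path, and log path are placeholders, since only the cron file name is fixed.

# /etc/cron.d/coins_query_restore -- schedule, script path, and log path are placeholders
0 4 * * * postgres /coins/scripts/coins-query-restore.sh > /tmp/coins_query_restore.log 2>&1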

Standardizing PostgreSQL Instances

Dylan Wood

COINS uses a centralized PostgreSQL database. We have been so busy developing new features that we have not upgraded the database used by the COINS production application since 2010!
New feature requirements and a need for increased disk space on our production server are finally motivating us to upgrade to PostgreSQL 9.4.
While we are at it, we will upgrade the underlying virtual host to Ubuntu Server 14.04, with enough RAM to fit our rapidly growing database in memory.
Finally, it makes sense to lay some groundwork to clean up our inconsistent use of database names and ports.

Summary of changes:

                        Current Value                            New Value
  OS                    CentOS 5.5                               Ubuntu Server 14.04
  DBMS                  PostgreSQL 8.4.5                         PostgreSQL 9.4.2
  RAM                   16GB                                     48GB
  CPU Cores             4                                        4
  Recovery              Nightly pg_dump                          WAL archiving for PITR (managed by PG Barman)
  Replication           Daily pg_restore from nightly pg_dump    Hot Standby w/ WAL shipping
  COINS DB name         postgres                                 coins
  Port                  6117                                     5432
  Hostname              tesla.mind.unm.edu                       proddbcoin.mind.unm.edu
  Connection Pooling    none                                     pgbouncer

Justification

Operating System

CentOS has served us well for many years, but we have found that Ubuntu’s more up-to-date repositories allow us to stay with the herd as new features are released in packages we depend on (e.g. PostgreSQL, PHP5, Node.js, etc…)

Postgres

Simple: 8.4 is no longer supported. Also, new JSON functionality is really nice (e.g. row_to_json).

RAM

When our current production database server was provisioned, 16GB was enough RAM to hold two copies of the COINS database. The database is currently 24GB on disk, and growing fast. 48GB should buy us a little time.

CPU Cores

Postgres does not do anything special for multi-core environments. Instead, it relies on the operating system to distribute its child processes evenly across the cores. Our database has never been CPU bound, so we see no need to increase the number of cores at this point.

Disaster Recovery and Backups

We currently have a cron which performs a pg_dump of the production database every night, and stores the dump on our internal network storage at MRN. In the event of a total loss of our database server, we would be able to recover all changes made before midnight on the day of the failure. Utilizing WAL archiving will allow for point-in-time recovery, and could allow us to salvage data and changes made only minutes or seconds before the outage. In addition, it lays the groundwork for a geographically distributed recovery system.
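
For reference, the archiving side of this comes down to a few settings in postgresql.conf on the master; a minimal sketch, with the backup host and incoming-WAL path as placeholders:

# postgresql.conf (9.4) -- backup host and path are placeholders
wal_level = hot_standby        # this server also feeds a hot standby, so 'archive' is not enough
archive_mode = on
archive_command = 'rsync -a %p barman@backup.example.org:/var/lib/barman/coins/incoming/%f'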

Replication

In order to get COINS running as soon as possible after an outage, we have another production-ready database server running at all times. This database is refreshed every morning from the previous night’s pg_dump of production. Unfortunately, if the production database were to fail, users would lose access to data entered after the last pg_dump. Further, if we were able to salvage the data entered between the last pg_dump and the outage, we would need to somehow merge all of that data with data entered into the replication database after the outage.

The new system uses WAL streaming to replicate all changes made in production (even schema changes!). In the event that the production database were to fail, the replication database would likely be only a few records behind the production database. Aside from losing much less data in the event of a failover, there are other benefits to having a nearly up-to-date copy of production lying around at all times:

  • Backups can be made of the replication database, thus reducing the load on the production server during backup times
  • The replication database can be configured to handle read-only queries, further reducing the load on the master production database, and decreasing query time.
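
On the configuration side, the standby setup is small; a minimal sketch for 9.4, where the replication user and password are placeholders and the hostname is the new production name from the table above:

# On the standby: postgresql.conf
hot_standby = on               # allow read-only queries while the server is in recovery

# On the standby: recovery.conf -- replication credentials are placeholders
standby_mode = 'on'
primary_conninfo = 'host=proddbcoin.mind.unm.edu port=5432 user=replicator password=changeme'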

Database name

Back in 2010, the COINS team migrated from an Oracle Database to PostgreSQL. Our understanding of Postgres was still very limited, and we made some poor design decisions. One of these decisions was to use the default maintenance database as our primary DB. This does not directly cause any problems, but is generally a bad practice.

There are at least one dozen devices spread across the United States that connect to our production database, and rely on the current database name. Changing all of these touch points in a single go would be stressful and very risky (even after practice, and lots of planning, we could still miss or break a few devices). Updating the devices one at a time to utilize the new database name is therefore much more favorable. Doing so will allow us to cooperate with the device owners to come up with a time that will not negatively impact their work.

The problem is: a Postgres database can only have one name and one port. We can overcome this by using a connection pooling tool called PGBouncer. In addition to reducing the overhead involved with creating connections inside of PostgreSQL, PGBouncer also allows aliasing the database name. This means that some devices can connect to our database using the database name postgres while others can connect using the database name coins.
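
A sketch of what that aliasing looks like in pgbouncer.ini (the backend port, listen address, and auth settings are placeholders or omitted):

; pgbouncer.ini (excerpt) -- backend port and listen address are placeholders; auth settings omitted
[databases]
; legacy clients that still ask for the old maintenance-database name
postgres = host=127.0.0.1 port=5433 dbname=coins
; clients that have been migrated to the new name
coins = host=127.0.0.1 port=5433 dbname=coins

[pgbouncer]
listen_addr = *
listen_port = 5432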

Database port

Another one of the poor design decisions from back in 2010 was to use a non-standard port for PG. I believe that this was a security-through-obscurity decision. Not only does the obscurity cause issues with our own configuration, it provides no additional security against anyone who is able to port-scan our network. Any security benefit that it might have given us is void as soon as I publish this article.

Changing the port is subject to the same pitfalls mentioned above, so we need a way to support both the legacy port and the new port simultaneously while we carefully update all devices. This can be accomplished using port forwarding on the database server.
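
There are several ways to do that; one sketch is an iptables NAT redirect on the database host, so connections arriving on the legacy port land on the new one:

# Redirect incoming connections on the legacy port 6117 to the new port 5432
iptables -t nat -A PREROUTING -p tcp --dport 6117 -j REDIRECT --to-ports 5432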

Hostname

Just to be consistent with our latest server naming convention, the production database should be called proddbcoin. Since we use static IP assignments in our DNS, this should be easy: we can point all legacy and current hostnames at the same IP, allowing us to slowly migrate devices to the new hostname. In fact, most devices use a public IP address to connect to our database, since they are not within our private network.

Connection Pooling

I went over this a little in the database name section. The connection pooling approach avoids the overhead involved in creating a new connection each time a device or our PHP server needs one. It also allows us to alias the database name for a smooth transition away from using the postgres database. Finally, it offers the benefits typically associated with a reverse proxy: the possibility of load balancing across multiple servers, or switching out the servers behind the connection pool without interrupting service at all.

Summary

The new COINS production database setup may seem a bit more complex than the one it is replacing, and it is. However, all of these complex pieces are being provisioned and configured using Ansible, so the steps can easily be repeated and tweaked.

Here is a diagram showing how both old and new connection strings can reach the same database: [Diagram: network.png]

The database is slated to be replaced on Wednesday, June 17th. I will be practicing the deployment using Ansible in our staging environment until then.

Why a blog

Dylan Wood

“Why am I doing this?” I try to ask myself this question at least once per hour while I am working, to be sure that I am doing something that will contribute to our team’s goals.
Consequently, it makes sense to ask (and answer) this question now. There are a few justifications that come to mind.

Internal communication and continuity

When working on a new project or feature, one of our team members will usually create a Google Doc, presentation or simple email that explains the design decisions made.
However, because there are a variety of venues for sharing this information (Gmail, Google Drive, Gitter, to name a few), it can be difficult to find it again later.
If this blog idea catches on with the rest of the team, this could become the centralized place to document design decisions.

Sharing with the community

Many of the problems that we solve every day are not neuroimaging-specific. Instead, they are problems that web application engineers from all sorts of industries are faced with daily.
By placing our best practices and lessons learned in a public place, we may be able to help others avoid pitfalls that we have already succumbed to.
Further, as COINS becomes more open source, this blog may be the place that community contributors will look to find COINS-specific technical information.

The team that I work with does some pretty cool stuff, and I am excited that we will all be able to share it with anyone who is interested.

Reflection

Writing things down is a great way to process them. I have uncovered countless bugs by just documenting and justifying my changes.

Ok, now on to the first real post (Standardizing our PostgreSQL instances)…