How to handle PostgreSQL triggers in a distributed environment

We use PostgreSQL triggers to capture inserts / updates / deletes on multiple tables so that another application listening for these events can keep our full-text search database in sync with our relational database.

This is what the trigger function looks like:

CREATE FUNCTION notification() RETURNS trigger AS $$
BEGIN
  -- publish '<table>,id,<id>' on the 'search' channel
  PERFORM pg_notify('search', TG_TABLE_NAME || ',id,' || NEW.id);
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

And this is how we add a trigger to each table:

CREATE TRIGGER foo_trigger AFTER INSERT OR UPDATE OR DELETE ON foo
FOR EACH ROW EXECUTE PROCEDURE notification();

And here is a very simple example of a node application (worker) listening for these trigger events:

var pg = require('pg');

var connString = "postgres://user@localhost/foo_local";

pg.connect(connString, function(err, client, done) {
  if (err) { return console.error(err); }

  client.on('notification', function(msg) {
    // msg.payload contains '<table>,id,<id>' as built by the trigger
    // get the added / updated / deleted record
    // sync it with the search database
  });

  client.query('LISTEN search');
});

Here are my three questions:

Part 1: Our application is load balanced across multiple instances. What happens when an event is raised while several node / worker instances are listening? Will every listening instance of the production application be notified?

If so, that's a problem: we don't want every instance of the production application to handle every event, because they would all be doing the same work, which defeats the purpose of running multiple listeners behind a load balancer. How do we mitigate this?

Part 2: What happens if a worker is busy with a long-running task when a trigger event is raised? Does PostgreSQL queue up notifications until the listeners are able to receive them?

Part 3: We have about 5 tables that should fire triggers on INSERT / UPDATE / DELETE. We handle a lot of requests, so these will fire a large number of events in a short amount of time. We need a worker to listen for these events and process the changed records so it can submit them to the full-text search database. Is there a better way to handle this volume?

Another solution our team is considering is to ditch the SQL triggers entirely and instead have the application push messages onto a queue (SQS or Redis) and have workers pick messages off that queue, roughly along the lines of the sketch below. We want to avoid this route if we can, as it adds more moving parts to our platform; however, we are prepared to do so if it is our only option.
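
For reference, the queue-based alternative we have in mind would look roughly like this. It is only a sketch: the list name is illustrative and it assumes the callback-style node_redis client.

var redis = require('redis');

var producer = redis.createClient();
var consumer = redis.createClient();

// producer side: the application pushes a message for every change it makes
producer.lpush('search_sync', JSON.stringify({ table: 'foo', id: 42, type: 'UPDATE' }));

// worker side: block until a message is available, process it, repeat
function work() {
  consumer.brpop('search_sync', 0, function(err, reply) {
    if (err) { return console.error(err); }
    var msg = JSON.parse(reply[1]); // reply is [listName, value]
    // ...sync the record with the full-text search database...
    work();
  });
}
work();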

Your thoughts will be much appreciated.

1 answer


First of all, in your trigger function you can make life easier for your listeners by providing more specific information about what has changed (for example, which columns were modified in an UPDATE).

You can do something like this:

CREATE OR REPLACE FUNCTION notification() RETURNS trigger AS $$
DECLARE
  id bigint;
BEGIN
  IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
    id = NEW.id;
  ELSE
    id = OLD.id;
  END IF;

  IF TG_OP = 'UPDATE' THEN
    -- only send the columns that actually changed (requires the hstore extension)
    PERFORM pg_notify('table_update',
      json_build_object('schema', TG_TABLE_SCHEMA,
                        'table', TG_TABLE_NAME,
                        'id', id,
                        'type', TG_OP,
                        'changes', hstore_to_json(hstore(NEW) - hstore(OLD)))::text);
    RETURN NEW;
  END IF;

  IF TG_OP = 'INSERT' THEN
    PERFORM pg_notify('table_update',
      json_build_object('schema', TG_TABLE_SCHEMA,
                        'table', TG_TABLE_NAME,
                        'id', id,
                        'type', TG_OP,
                        'row', row_to_json(NEW))::text);
    RETURN NEW;
  END IF;

  IF TG_OP = 'DELETE' THEN
    PERFORM pg_notify('table_update',
      json_build_object('schema', TG_TABLE_SCHEMA,
                        'table', TG_TABLE_NAME,
                        'id', id,
                        'type', TG_OP,
                        'row', row_to_json(OLD))::text);
    RETURN OLD;
  END IF;

END;
$$ LANGUAGE plpgsql;
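
On the listening side, a worker can then parse the JSON payload directly. Here is a minimal sketch using the same pg module as in the question; the channel name 'table_update' and the payload shape follow the trigger above, and error handling is kept to a minimum:

var pg = require('pg');

var connString = "postgres://user@localhost/foo_local";

pg.connect(connString, function(err, client, done) {
  if (err) { return console.error(err); }

  client.on('notification', function(msg) {
    // msg.payload is the JSON text built by pg_notify in the trigger
    var change = JSON.parse(msg.payload);

    // change.schema, change.table, change.id, change.type ('INSERT' | 'UPDATE' | 'DELETE')
    // change.row holds the full row for INSERT / DELETE,
    // change.changes holds only the modified columns for UPDATE
    console.log(change.type, change.table, change.id);
    // ...sync the record with the full-text search database...
  });

  client.query('LISTEN table_update');
});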

Now for your questions, or at least Part 1: I believe all listening instances of the production application will receive the notification. That can be useful for real-time pub/sub with multiple listeners, but for your use case it sounds like you need to add some kind of queueing layer on top of plain PostgreSQL LISTEN / NOTIFY, such as queue_classic (for Ruby) or perhaps pg-jobs for node.js.
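
If you prefer not to pull in a full queueing package, one common pattern is to also write every change into a queue table and let the workers claim rows from it, so that each change is processed by exactly one worker while the NOTIFY only acts as a wake-up signal. A rough sketch, assuming a hypothetical search_sync_queue table with (id, payload, processed_at) columns and PostgreSQL 9.5+ for SKIP LOCKED:

var pg = require('pg');

var connString = "postgres://user@localhost/foo_local";

// claim one pending job; SKIP LOCKED means concurrent workers never grab
// the same row, so each change is handled by exactly one worker
var claimSql =
  "UPDATE search_sync_queue SET processed_at = now() " +
  "WHERE id = (SELECT id FROM search_sync_queue " +
  "            WHERE processed_at IS NULL " +
  "            ORDER BY id LIMIT 1 " +
  "            FOR UPDATE SKIP LOCKED) " +
  "RETURNING id, payload";

pg.connect(connString, function(err, client, done) {
  if (err) { return console.error(err); }

  function drainQueue() {
    client.query(claimSql, function(err, result) {
      if (err) { return console.error(err); }
      if (result.rows.length === 0) { return; } // queue is empty
      var job = result.rows[0];
      // ...sync job.payload with the full-text search database...
      drainQueue(); // keep claiming until the queue is empty
    });
  }

  client.on('notification', function(msg) {
    // the notification is only a wake-up call; the work items live in the table
    drainQueue();
  });

  client.query('LISTEN table_update');
});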

Anyway, since it has been a few months since you asked this, I'm wondering which path you took in the end and how it worked out. Can you share your experience and what you learned?
