
Overview

We plan to upgrade the UH Message Broker from a single host to a clustered environment to improve availability. Here is what you can expect:

Item | Currently | Changing to… | Comments

Server configuration
Currently: Single server
Changing to: 3-node cluster

SSL protocols
Currently: Supports TLS 1.0, 1.1 and 1.2
Changing to: Only TLS 1.1 and 1.2 are supported

Software versions
Currently: RabbitMQ 3.1.5, Erlang 19
Changing to: RabbitMQ 3.7.12 or higher, Erlang 21.2.6 or higher


RabbitMQ client
Currently: Whatever version was current when you downloaded your client
Changing to: We expect older clients to work, but we recommend that you upgrade to the latest client
Comments: The oldest Java client you should use is 3.6.6. Earlier versions may force the use of TLS 1.0, which will no longer be supported, regardless of Java version and settings.

We still need to determine how SSL certificate verification should be implemented, given this warning from later clients:

WARN [localhost-startStop-1] com.rabbitmq.client.TrustEverythingTrustManager.<init> SECURITY ALERT: this trust manager trusts every certificate, effectively disabling peer verification. This is convenient for local development but offers no protection against man-in-the-middle attacks. Please see https://www.rabbitmq.com/ssl.html to learn more about peer certificate verification.
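
One possible way to avoid the trust-everything behaviour, offered only as a sketch and not as a decision on the open question above: build an `SSLContext` that validates the broker certificate against the JVM's default trust store. The class name `VerifyingTlsContext` is hypothetical, and the sketch assumes the broker certificate chains to a CA that the JVM already trusts. With the RabbitMQ Java client, such a context can be passed to `ConnectionFactory.useSslProtocol(SSLContext)`.

```java
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Hypothetical helper: builds a TLS context that performs peer verification.
final class VerifyingTlsContext {

    static SSLContext create() {
        try {
            // Passing a null KeyStore makes the factory use the JVM default trust store
            // (assumption: the broker's CA is present there).
            TrustManagerFactory trustManagers =
                    TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
            trustManagers.init((KeyStore) null);

            // Request a context that supports TLS 1.2, which the cluster accepts.
            SSLContext context = SSLContext.getInstance("TLSv1.2");
            // No client certificate (null key managers), default source of randomness.
            context.init(null, trustManagers.getTrustManagers(), null);
            return context;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Could not build TLS context", e);
        }
    }
}
```

If the broker certificate is issued by a private CA, the same approach should work with a `KeyStore` loaded from a trust store file in place of the `null` argument.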

AMQP heartbeat
Currently: Default is 65535 seconds (18.2 hours)
Changing to: Default is 60 seconds
Comments: Set the AMQP heartbeat in your client settings to 60 seconds so that it matches the server's expectations. Otherwise, the server may conclude that you are no longer connected.

Here's a Java example: https://www.rabbitmq.com/heartbeats.html#using-heartbeats-in-java

The smaller heartbeat value generates network traffic every 60 seconds, which prevents network devices from dropping your connection when it is idle.

Publish confirms
Currently: Recommended
Changing to: Strongly recommended
Comments: If you publish messages, you have always been expected to use publish confirms; without them you risk not being notified of failed messages. This is even more important in a clustered environment.

Handling dropped broker connections
Currently: You should have code that handles dropped connections to the broker. The code should repeatedly attempt to reconnect until it succeeds and, if applicable, retry the interrupted operation.
Changing to: No change. Continue doing the same.
Comments: If you are not currently doing this, you should.
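
The reconnect behaviour described for dropped broker connections can be sketched roughly as follows. This is a minimal illustration, not a prescribed implementation: `ReconnectLoop`, `retryUntilSuccess` and `DEFAULT_RETRY_MILLIS` are hypothetical names, and the 16-second default is taken from the retry interval suggested under High availability.

```java
import java.util.concurrent.Callable;

// Hypothetical helper: keeps retrying an operation until it succeeds.
final class ReconnectLoop {

    // Retry interval suggested in the High availability item (16 seconds).
    static final long DEFAULT_RETRY_MILLIS = 16_000L;

    // Calls attempt until it returns without throwing, pausing retryMillis between tries.
    static <T> T retryUntilSuccess(Callable<T> attempt, long retryMillis) {
        while (true) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (e instanceof InterruptedException) {
                    // Preserve the interrupt and stop retrying.
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted during attempt", e);
                }
                System.err.println("Attempt failed, retrying in " + retryMillis + " ms: " + e);
            }
            try {
                Thread.sleep(retryMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting to retry", e);
            }
        }
    }
}
```

With the RabbitMQ Java client, the attempt would typically be something like `() -> factory.newConnection()`, followed by re-running whatever operation was interrupted. A production version would probably also want logging through your own framework and some way to stop the loop on shutdown.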


High availability
Currently: This is a single server, so any major failure could make the broker unavailable for several minutes or hours.
Changing to: The broker should come back within 16 seconds.
Comments: If you set your dropped-connection retry interval to 16 seconds, a reconnection attempt after any dropped connection should succeed.

The load balancer runs a health check every 5 seconds, and 3 failures trigger the switch to another cluster node, i.e. after about 15 seconds.

Mirrored queues
Currently: N/A
Changing to: Queues are mirrored and synchronized across all 3 nodes unless the queue name begins with an underscore.
Comments: Mirroring allows your queue to be serviced by any node when you reconnect to the cluster after a failure.

Mirroring uses more resources, especially for a large test queue that is rarely consumed. You can skip mirroring for such test queues by using an underscore as the first character of the queue name.
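
The naming rule above can be expressed as a small predicate, which may be handy as a sanity check when choosing queue names. This is only an illustration of the rule as stated on this page; `QueueNaming` and `isMirrored` are hypothetical names and say nothing about how the broker policy itself is configured.

```java
// Hypothetical helper that mirrors the naming rule described above.
final class QueueNaming {

    // Returns true when a queue with this name would be mirrored,
    // i.e. when the name does not begin with an underscore.
    static boolean isMirrored(String queueName) {
        return !queueName.startsWith("_");
    }
}
```

For example, a queue named `orders` would be mirrored, while a large test queue named `_loadtest` would not.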

Timeline

Date | Event
Jul 24 2019 | Test cluster environment is available for developers to test
Oct 20 2019 | Production migration to cluster environment