Overview

We plan to upgrade the UH Message Broker from a single host to a clustered environment to improve availability. Here is what you can expect:

Item	Currently	Changing to…	Comments
Server configuration	Single server	3-node cluster
SSL protocols	Supports TLS 1.0, 1.1 and 1.2	Only TLS 1.1 and 1.2 are supported
Software versions	RabbitMQ 3.1.5, Erlang 19	RabbitMQ 3.7.12 or higher, Erlang 21.2.6 or higher
RabbitMQ client	The version that was current when you downloaded your client	Although we expect older clients to work, we recommend that you upgrade to the latest client. The oldest Java client you should use is 3.6.6; earlier versions may force the use of TLS 1.0, which is no longer supported, regardless of your Java version and settings.	We still need to determine how SSL certificate verification should be implemented, given this warning logged by later clients: WARN [localhost-startStop-1] com.rabbitmq.client.TrustEverythingTrustManager.<init> SECURITY ALERT: this trust manager trusts every certificate, effectively disabling peer verification. This is convenient for local development but offers no protection against man-in-the-middle attacks. Please see https://www.rabbitmq.com/ssl.html to learn more about peer certificate verification.
AMQP heartbeat	Default is 65535 seconds (18.2 hours)	Default is 60 seconds	Set the AMQP heartbeat in your client settings to 60 seconds so that it matches the server's expectations; otherwise, the server may decide that you are no longer connected. Java example: https://www.rabbitmq.com/heartbeats.html#using-heartbeats-in-java The shorter heartbeat generates network traffic at least once every 60 seconds, which keeps network devices from dropping your connection while it is idle.
Publish confirms	Recommended	Strongly recommended	If you publish messages, you've always been expected to use publish confirms or risk not being notified of failed messages. This is even more important in a clustered environment.
Handling dropped broker connections	You should have code that handles dropped connections to the broker. The code should repeatedly attempt to reconnect until it's successful, and if applicable, retry the interrupted operation.	No change. Continue doing the same. If you are not currently doing this, you should.
High availability	This is a single server, so any major failure could make the broker unavailable for several minutes or hours.	The broker should be available again within 16 seconds. A dropped-connection retry interval of 16 seconds should therefore result in a successful reconnection after any dropped connection.	The load balancer runs a health check every 5 seconds; 3 failed checks (about 15 seconds) trigger the switch to another cluster node.
Mirrored queues	N/A	Queues are mirrored and synchronized across all 3 nodes unless the queue name begins with an underscore.	Mirroring allows your queue to be serviced by any node when you reconnect to the cluster after a failure. Mirroring uses more resources, especially for a large test queue that is rarely consumed. You can skip mirroring for such test queues by using an underscore as the first character of the queue name.
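
The RabbitMQ client row leaves open how to verify the broker's certificate. One common approach, described in the RabbitMQ TLS guide linked in that warning, is to give the client an SSLContext whose trust manager validates the broker's certificate chain, instead of the TrustEverythingTrustManager named in the warning. The sketch below is plain JDK code and assumes you have a trust store containing the CA that signed the broker's certificate; BrokerTls, loadTrustStore, and verifyingContext are hypothetical names, and TLS 1.2 is chosen only because it is the newest protocol the cluster supports.

```java
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

/**
 * Hypothetical helper: builds an SSLContext that verifies the broker's
 * certificate chain against a trust store instead of trusting everything.
 */
public final class BrokerTls {
    private BrokerTls() {}

    /** Loads a trust store, e.g. a PKCS12 file holding the broker's CA certificate. */
    public static KeyStore loadTrustStore(InputStream in, char[] password, String type)
            throws IOException, GeneralSecurityException {
        KeyStore ks = KeyStore.getInstance(type);
        ks.load(in, password); // a null stream creates an empty store
        return ks;
    }

    /** TLS 1.2 context whose trust managers perform normal PKIX peer verification. */
    public static SSLContext verifyingContext(KeyStore trustStore) throws GeneralSecurityException {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustStore); // a null trust store means "use the JVM's default cacerts"
        SSLContext ctx = SSLContext.getInstance("TLSv1.2");
        ctx.init(null, tmf.getTrustManagers(), null); // no client certificate is presented
        return ctx;
    }
}
```

You would then pass the context to ConnectionFactory.useSslProtocol(SSLContext) and, on client versions that support it, call ConnectionFactory.enableHostnameVerification() so that the certificate must also match the broker's host name.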
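
For the AMQP heartbeat and high availability rows, the RabbitMQ Java client exposes the relevant settings on ConnectionFactory: setRequestedHeartbeat (in seconds), setAutomaticRecoveryEnabled, and setNetworkRecoveryInterval (in milliseconds). The fragment below is a sketch that assumes the com.rabbitmq:amqp-client library (a recent 5.x release, since enableHostnameVerification is not available in older clients) is on the classpath. The host name broker.example.edu and the class name BrokerConnectionFactory are hypothetical placeholders, and port 5671 is the conventional AMQPS port rather than a confirmed setting for this broker. It verifies the broker certificate against the JVM's default trust store; if the broker's certificate is issued by a CA that store does not contain, initialize the TrustManagerFactory with a trust store holding that CA instead.

```java
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.security.KeyStore;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

public class BrokerConnectionFactory {
    // Hypothetical values: replace with the cluster's actual host name and port.
    static final String BROKER_HOST = "broker.example.edu";
    static final int AMQPS_PORT = 5671;

    public static ConnectionFactory create() throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost(BROKER_HOST);
        factory.setPort(AMQPS_PORT);

        // Request the cluster's 60-second heartbeat (value is in seconds).
        factory.setRequestedHeartbeat(60);

        // Re-establish dropped connections, waiting 16 s between attempts (value is in ms).
        factory.setAutomaticRecoveryEnabled(true);
        factory.setNetworkRecoveryInterval(16_000L);

        // TLS 1.2 with certificate verification against the JVM's default trust store.
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null);
        SSLContext ssl = SSLContext.getInstance("TLSv1.2");
        ssl.init(null, tmf.getTrustManagers(), null);
        factory.useSslProtocol(ssl);
        factory.enableHostnameVerification(); // requires a recent 5.x client

        return factory;
    }

    public static void main(String[] args) throws Exception {
        try (Connection connection = create().newConnection()) {
            // Shows the heartbeat value actually negotiated with the broker.
            System.out.println("Negotiated heartbeat: " + connection.getHeartbeat() + " s");
        }
    }
}
```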
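
For the publish confirms row, the Java client puts a channel in confirm mode with Channel.confirmSelect(), reports the next sequence number through Channel.getNextPublishSeqNo(), and delivers broker acks and nacks to callbacks registered with Channel.addConfirmListener. The sketch below is plain Java, so it runs without the client library, and shows the bookkeeping such callbacks typically need: remember each outstanding message, forget it on ack, and hand it back for republishing on nack, honoring the multiple flag, which confirms every sequence number up to and including the given tag. ConfirmTracker and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

/**
 * Hypothetical tracker for messages published on a channel in confirm mode.
 * A concurrent map is used because confirm callbacks run on a different
 * thread from the code that publishes.
 */
public class ConfirmTracker {
    // Unconfirmed messages keyed by publish sequence number.
    private final ConcurrentNavigableMap<Long, String> outstanding = new ConcurrentSkipListMap<>();

    /** Call just before publishing, with the sequence number the message will receive. */
    public void register(long seqNo, String message) {
        outstanding.put(seqNo, message);
    }

    /** Broker ack: the message(s) were handled and can be forgotten. */
    public void ack(long deliveryTag, boolean multiple) {
        remove(deliveryTag, multiple);
    }

    /** Broker nack: returns the message(s) the caller should republish or report. */
    public List<String> nack(long deliveryTag, boolean multiple) {
        return remove(deliveryTag, multiple);
    }

    public int pendingCount() {
        return outstanding.size();
    }

    // With multiple == true, the confirm covers every tag up to and including deliveryTag.
    private List<String> remove(long deliveryTag, boolean multiple) {
        List<String> removed = new ArrayList<>();
        if (multiple) {
            ConcurrentNavigableMap<Long, String> head = outstanding.headMap(deliveryTag, true);
            removed.addAll(head.values());
            head.clear();
        } else {
            String message = outstanding.remove(deliveryTag);
            if (message != null) {
                removed.add(message);
            }
        }
        return removed;
    }
}
```

With the client library, the tracker could be wired up as channel.addConfirmListener(tracker::ack, (tag, multiple) -> ...), where the nack callback republishes or logs whatever tracker.nack(tag, multiple) returns.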
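
The dropped-connection and high availability rows call for code that keeps retrying at a fixed interval until it reconnects. A minimal sketch in plain Java, assuming your own connect callable opens the connection and, if applicable, repeats the interrupted operation; ReconnectLoop, Sleeper, and retryUntilConnected are hypothetical names. The Java client's automatic recovery handles connections that drop after being established, but it does not retry a connection that fails on the initial attempt, so a loop like this is still useful at startup.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

public final class ReconnectLoop {
    /** Abstraction over Thread.sleep so the loop can be tested without waiting. */
    public interface Sleeper {
        void sleep(Duration d) throws InterruptedException;
    }

    // 16 s matches the failover window given for the cluster.
    public static final Duration RETRY_INTERVAL = Duration.ofSeconds(16);

    /** Production sleeper backed by Thread.sleep. */
    public static final Sleeper REAL_SLEEPER = d -> Thread.sleep(d.toMillis());

    private ReconnectLoop() {}

    /**
     * Runs connect until it succeeds, sleeping for interval after each failure,
     * and returns the result of the first successful attempt.
     */
    public static <T> T retryUntilConnected(Callable<T> connect, Duration interval, Sleeper sleeper)
            throws InterruptedException {
        int attempt = 0;
        while (true) {
            attempt++;
            try {
                return connect.call();
            } catch (InterruptedException e) {
                throw e; // let the caller shut down cleanly
            } catch (Exception e) {
                System.err.printf("Connection attempt %d failed (%s); retrying in %d s%n",
                        attempt, e.getMessage(), interval.getSeconds());
                sleeper.sleep(interval);
            }
        }
    }
}
```

In production you would call retryUntilConnected(() -> factory.newConnection(), ReconnectLoop.RETRY_INTERVAL, ReconnectLoop.REAL_SLEEPER); a fixed interval is used here because the expected outage length is known, whereas exponential backoff suits outages of unknown length.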
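
The mirrored queues row reduces to a naming rule. A small helper can make that rule explicit in client code; QueueNames, isMirrored, and unmirroredTestQueue are hypothetical names, and the check only restates the rule as described here. Whether a queue is actually mirrored is decided by the broker's configuration, not by the client.

```java
public final class QueueNames {
    private QueueNames() {}

    /** Per the migration notes, queues whose names start with "_" are not mirrored. */
    public static boolean isMirrored(String queueName) {
        if (queueName == null || queueName.isEmpty()) {
            throw new IllegalArgumentException("queue name must be non-empty");
        }
        return !queueName.startsWith("_");
    }

    /** Hypothetical convention: prefix test queues with "_" so the cluster skips mirroring them. */
    public static String unmirroredTestQueue(String baseName) {
        return baseName.startsWith("_") ? baseName : "_" + baseName;
    }
}
```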

Timeline

Date	Event
Jul 24 2019	Test cluster environment is available for developers to test
Oct 20 2019	Production migration to cluster environment

UH Message Broker upgrade to clustered environment