Setting up a second Seq instance ensures all data is retained, and logs can continue to be ingested, if the first Seq instance is lost or needs to be taken offline for maintenance.
We're adding multi-node support to Seq in three phases, with DR and zero-downtime upgrades in the first phase, and HA and scale-out making up phases two and three in future releases.
Seq's DR configuration relies on two Seq instances ("cluster nodes") behind an HTTP load balancer.
Each node in the cluster uses a local storage volume to maintain a complete copy of the log data held in the cluster.
Nodes share a single highly-available database - either Microsoft SQL Server, Azure SQL Database, or PostgreSQL - to store metadata including users, signals, dashboards, and so on.
A single node is designated "leader", and handles ingestion and API requests. The other node acts in the "follower" role, and continually synchronizes its local log store with changes made on the leader. Replication to the follower node is asynchronous.
The load balancer may route traffic to either or both nodes: when requests arrive at the follower node, they are internally proxied to the leader. This avoids the need to reconfigure the load balancer when the leader role is passed between nodes, and when nodes are temporarily taken offline.
A two-node Seq cluster supports the following additional scenarios that cannot be reliably implemented using a single Seq instance:
- Disaster Recovery (DR) — when one of the two nodes fails, it can be replaced without losing already-ingested data, or system configuration. If the failing node is the leader, manual intervention is required for disaster recovery, so ingestion and queries will be unavailable until fail-over is performed. Failure of the follower node will not prevent ingestion.
- Zero-downtime Maintenance and Upgrades — when maintenance needs to be performed, the leader node can be gracefully failed-over to bring it offline, while the follower takes over as the new leader node. This process can be used to perform Seq, system, and hardware maintenance and upgrades without significant interruption of ingestion or queries.
DR is the first of three phases in the development of Seq's multi-node features. Each phase will be largely compatible with those before it, so over time, additional capabilities can be added to a Seq cluster.
The additional capabilities are:
- High Availability (HA) — in this configuration, a cluster of three or more nodes will perform automatic leader election, so that manual fail-over is not required in order to continue ingestion and API availability when one or more of the cluster nodes fail.
- Scale-out — in this configuration, queries issued in an HA cluster will make use of multiple machines to increase performance/reduce execution time.
This won't be the end of Seq's clustering story, and there are many more opportunities to pursue; however, we're focusing our current development efforts primarily on these two capabilities in the near term.
Multi-node features require a Seq license that supports the installation of at least two nodes, e.g. an Enterprise or better license. Each node contributes one server install towards the license's install limit. Please contact us if you need any help regarding licensing.
Having two separate machines to deploy Seq onto is the most useful setup for DR and zero-downtime upgrades.
The machines don't need to be identical, as long as they can both handle being the main "leader" node serving traffic and queries. Keeping both machines at a useful level of capability will provide the best experience. The standard System Requirements apply to each node.
Because queries are served by the leader node, and it's selected manually, it's possible to have one larger machine for normal operations, and a smaller follower node for DR and to handle ingestion during upgrades, if this helps reduce costs.
The machines both need the same kind of local storage as would be used in a single-node configuration.
It's best to have a fast network between the machines, though cross-datacenter replication should be possible with sufficient bandwidth.
The nodes will communicate with each other via two open ports:
- An internal API port used for proxied HTTP requests, which can be the same port (80 or 443) that's serving regular API traffic, and
- A cluster port; our examples use 5344; this is used for WebSocket connections between the nodes.
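Before configuring the cluster, it can help to confirm that each machine can open TCP connections to the other on both ports. The following is a minimal sketch using only the Python standard library; the `can_connect` helper is hypothetical, and the hostname and ports in the usage comment are just the example values used in this document.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example usage, run from seq001 against its peer (hypothetical hostname):
#   can_connect("seq002.example.local", 80)    # internal API port
#   can_connect("seq002.example.local", 5344)  # cluster port
```

This only checks that the ports are reachable at the TCP level; it says nothing about whether Seq itself is listening or correctly configured.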
A recent Microsoft SQL Server, Azure SQL Database, or PostgreSQL server is required for metadata storage.
An initial, empty database should be created for Seq, and a user account provisioned with appropriate permissions to manage both schema and data within the database.
Seq will create schemas, tables, and other database objects on first run.
A load balancer in front of the two instances will ensure that API access and ingestion can continue while one of the instances is offline.
The load balancer should check /health on the regular HTTP API endpoints of the two nodes (note that /health is at the root, it's not /api/health), and take a node out of rotation if GET /health returns a status code other than 200.
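Most load balancers have this kind of probe built in, but the rule is simple enough to sketch for use in monitoring scripts. The example below assumes only what is stated above: a GET to the root /health path, with anything other than a 200 treated as unhealthy. The `node_is_healthy` function name and its timeout default are illustrative choices, not part of Seq.

```python
import urllib.error
import urllib.request


def node_is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """GET <base_url>/health and report whether the status code is exactly 200."""
    url = base_url.rstrip("/") + "/health"  # root-level /health, not /api/health
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # 4xx/5xx responses raise HTTPError (a URLError subclass); connection
        # failures and timeouts also land here. All mean "take out of rotation".
        return False
```

A node would be probed directly, for example `node_is_healthy("http://seq001.example.local")`, rather than through the load balancer.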
It doesn't matter how the load balancer routes traffic - the follower node acts as a web proxy to the leader node, so requests arriving at one node will be internally routed to the other via its API port.
Using round-robin or similar routing will keep each node hot, and may improve the chances of detecting problems prior to attempted fail-over, so this is the recommended default.
Communication between nodes is via HTTP and WebSockets. If the two machines are isolated, it's possible to run them without TLS, but if you want to use HTTPS and WSS, then you'll need SSL certificates in password-protected PFX format for the DNS names through which the machines will see each other.
It's not necessary for the internal hostnames to be the same as the public Seq hostname; e.g. Seq might be https://seq.example.com publicly, while the HTTP API endpoint used for intra-cluster traffic uses a different internal hostname.
On Windows, the internal HTTP API endpoint will need SSL applied using the normal bind-ssl command used for Seq's other HTTP endpoints.
For cluster traffic on Windows, and for both cluster and internal API traffic on Linux, TLS is applied by specifying wss:// for the endpoint addresses, and including <port>.pfx files under the Certificates/ folder in the Seq storage root (this may need to be created if it does not already exist).
The password for the PFX file(s) needs to be configured using:
seq secret set -k certificates.defaultPassword -v <password>
Don't configure the certificate password until the storage.secretKey setting has been applied to each node (see "Getting started", below).
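To make the certificate file naming concrete, here is a small sketch that derives the expected PFX path from an endpoint address. It assumes that `<port>` in the file name is the numeric port of the listening endpoint, which is how the description above reads; the `expected_pfx_path` helper and the storage root used in the usage comment are hypothetical.

```python
from pathlib import Path
from urllib.parse import urlsplit


def expected_pfx_path(storage_root: str, endpoint: str) -> Path:
    """Return <storage_root>/Certificates/<port>.pfx for a listen endpoint."""
    port = urlsplit(endpoint).port
    if port is None:
        # The file name is derived from the port, so it must be explicit.
        raise ValueError(f"endpoint {endpoint!r} has no explicit port")
    return Path(storage_root) / "Certificates" / f"{port}.pfx"


# Example usage with an assumed storage root:
#   expected_pfx_path("/data", "wss://localhost:5344")
#   -> /data/Certificates/5344.pfx
```

Checking `expected_pfx_path(...).exists()` before starting the node is one way to catch a misnamed or misplaced certificate early.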
The leader node authenticates cluster traffic from follower nodes using a single fixed key, which is set up as part of node configuration. The key is an arbitrary string of characters that must be given the same value on each node.
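Since the key is an arbitrary string, any sufficiently random value will do. As one possible approach, the sketch below generates a key with Python's `secrets` module; the 32-byte length and the `generate_cluster_key` name are assumptions made for illustration, not Seq requirements.

```python
import secrets


def generate_cluster_key(num_bytes: int = 32) -> str:
    """Return a random URL-safe string suitable for use as a shared key."""
    return secrets.token_urlsafe(num_bytes)


# Generate the key once, then supply the same value to every node.
```

The URL-safe alphabet avoids characters that need quoting on most command lines, which is convenient when passing the key as an argument.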
Because clustered Seq configurations need to locally store sensitive information such as connection strings, on Linux we recommend keeping the secret key in the SEQ_STORAGE_SECRETKEY environment variable, instead of in Seq.json.
On Windows, the value in Seq.json will be encrypted with machine-scope DPAPI. Access to the Seq.json file and root storage folder should be restricted to the user account that the Seq service runs under (or administrators, if running as Local System).
The first step in setting up Seq in a two-node DR configuration is to configure the initial leader node. The follower node is configured next, and will synchronize itself with the leader's state.
It's possible to work through these instructions yourself to set things up, but we're also glad to help if you'd rather deploy it in collaboration with one of the engineers from the Seq team. Just drop us a line by email, and we'll make sure the process is smooth and hassle-free!
The first step when adding DR to an existing Seq instance is to migrate its metadata store to the database that will be shared between cluster nodes.
This is done using seq metastore to-mssql or seq metastore to-postgresql. Both commands have very similar syntax; the to-mssql version is shown below:
seq metastore to-mssql --connection-string="<connection string here>"
If you're using Docker, then substitute docker run -it -v (volume mapping) datalust/seq for the seq command.
Inside the Docker container, for example when running init scripts, substitute
The connection string will be encrypted with the Seq instance's secret key, and stored in the Seq filesystem root under
Later, to set up the follower node, you'll need the secret key from the existing node. If you don't have it in external storage already, you should retrieve it now using:
To set up a brand new node as leader, install Seq (if on Windows), but don't install or start the Seq service.
Instead, generate a new secret key using:
seq show-key --generate
Then create a new configuration and initialize it with the secret key:
seq config create
seq config set -k storage.secretKey -v "<secret key>"
The first thing to do is enable clustering, which is behind a feature flag:
seq config set -k features.enabled -v clustering
Next, you'll need to set up the connection to the shared database:
seq secret set -k metastore.msSql.connectionString -v "<connection string>"
So that inbound links generated for redirects and in alert notifications are correct, you need to set api.canonicalUri to the address of the load balancer.
seq config set -k api.canonicalUri -v "<address>"
Next, configure the cluster network and authentication key (PowerShell syntax shown):
seq node setup --pre --cluster-listen ws://localhost:5344 `
  --internal-api http://localhost `
  --peer-cluster ws://seq002.example.local:5344 `
  --peer-internal-api http://seq002.example.local `
  -k "<cluster authentication key>" `
  --node-name seq001.example.local
If you're using a different port for internal API traffic, you'll need to add its scheme://hostname:port combination to the api.listenUris setting, too.
Preview commands like seq node setup are only available when the --pre argument is specified. To get additional help for a preview command, use seq help <command name> --pre.
The URIs and authentication key values will need to be specified. The --node-name value is informational, and can be any string useful for identifying the individual node.
Finally, configure the node as leader, and start it:
seq node lead --pre
seq service start
You should now be able to browse Seq at its regular address, and via the load balancer.
Under Settings, you should see a new Cluster item. The Cluster screen will show only one node - the one you're browsing - and you'll also see a message in the notification area warning you that there's no up-to-date follower.
Configuring the follower node is almost identical to configuring the leader node; it's important to note:
- The storage.secretKey and cluster authentication key values must be the same as those used by the leader,
- The peer URIs configured with seq node setup must point to the leader node, and
- The seq node lead command is not executed on the follower.
The full configuration process (PowerShell syntax) looks like:
seq config create
seq config set -k storage.secretKey -v "<secret key>"
seq config set -k features.enabled -v clustering
seq secret set -k metastore.msSql.connectionString -v "<connection string>"
seq config set -k api.canonicalUri -v "<address>"
seq node setup --pre --cluster-listen ws://localhost:5344 `
  --internal-api http://localhost `
  --peer-cluster ws://seq001.example.local:5344 `
  --peer-internal-api http://seq001.example.local `
  -k "<cluster authentication key>" `
  --node-name seq002.example.local
seq service start
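The leader and follower setup commands differ only in the node name and the peer addresses, which makes them easy to mix up. The sketch below builds the argument list for either node from the same inputs, using only the flags that appear in the commands in this document. The `node_setup_args` helper is hypothetical, and it assumes the plain ws:// and http:// layout from the examples (no TLS, internal API on the default HTTP port).

```python
def node_setup_args(node_name: str, peer_name: str, cluster_key: str,
                    cluster_port: int = 5344) -> list[str]:
    """Build the 'seq node setup' argument list for one node of a two-node cluster."""
    return [
        "seq", "node", "setup", "--pre",
        "--cluster-listen", f"ws://localhost:{cluster_port}",
        "--internal-api", "http://localhost",
        # Peer URIs always point at the *other* node.
        "--peer-cluster", f"ws://{peer_name}:{cluster_port}",
        "--peer-internal-api", f"http://{peer_name}",
        "-k", cluster_key,
        "--node-name", node_name,
    ]


# Example usage: the same key, with node and peer names swapped per machine.
#   node_setup_args("seq001.example.local", "seq002.example.local", key)
#   node_setup_args("seq002.example.local", "seq001.example.local", key)
```

The resulting list could be passed to `subprocess.run` on each machine, or simply printed and compared against what was typed by hand.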
When the follower has started, it will appear in the Cluster screen.
It's time to start ingesting some events!
Make sure client applications are configured to log via the load balancer, not the direct ingestion endpoints on the Seq nodes.
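As a quick smoke test, a single event can be posted through the load balancer before pointing real applications at it. The sketch below builds such a request with the Python standard library. It assumes Seq's HTTP ingestion endpoint at /api/events/raw?clef and the compact JSON (CLEF) event format, so check the HTTP ingestion documentation for your Seq version; the `build_test_event_request` helper and the message text are hypothetical, and an API key header may also be needed if your instance requires one.

```python
import json
import urllib.request
from datetime import datetime, timezone


def build_test_event_request(load_balancer_url: str) -> urllib.request.Request:
    """Build a POST request carrying one CLEF-formatted test event."""
    event = {
        "@t": datetime.now(timezone.utc).isoformat(),  # event timestamp
        "@mt": "Cluster smoke test sent via {Target}",  # message template
        "Target": "load balancer",
    }
    body = (json.dumps(event) + "\n").encode("utf-8")  # one JSON event per line
    return urllib.request.Request(
        load_balancer_url.rstrip("/") + "/api/events/raw?clef",
        data=body,
        headers={"Content-Type": "application/vnd.serilog.clef"},
        method="POST",
    )


# Example usage against the public address of the cluster:
#   urllib.request.urlopen(build_test_event_request("https://seq.example.com"))
```

If the event shows up in the Seq user interface regardless of which node the load balancer chose, routing and proxying are working.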
To check the state of a node, browse to it directly: if it's successfully serving or proxying requests, you'll see the Seq user interface.
Because all HTTP requests that reach the follower are proxied to the leader, an unhealthy follower that can still reach the leader can appear healthy. To work around this, a special /health endpoint must be used to check the health of the follower node itself. Making a GET request to the root /health endpoint will result in 200 if the follower itself is healthy, and 5xx otherwise (/health is not proxied between nodes in the cluster).
As events reach the cluster through the load balancer, they'll be ingested into the event store. Seq uses generational storage, and as each batch of events is written, the store generation will be incremented.
The Cluster screen shows the generation of the event store held by each node in the storage generation column.
The storage generation is a two-part number; the right-hand part is the "write" number, corresponding roughly to the number of written batches of events, and the left-hand part (5, above) is the "model" number, corresponding roughly to the number of buffer creation, retention policy application, and indexing operations that have occurred.
For the most part, the storage generation can be treated as an opaque identifier, but it, in combination with the follower node's last sync time, can be used to determine what replication delay currently exists between the nodes.
When new Seq versions need to be installed, or other maintenance needs to be carried out, shut down the current follower node, and perform maintenance on it first.
Once the follower node is back online, wait for it to catch up to the leader (the initial sync may take a while, if sufficient data has been ingested).
Then, demote the leader node from the Cluster screen by pressing the Demote button in the leader node's row. A confirmation will be shown.
Once the demotion process has completed, the follower will be promoted, and a warning icon will be shown to reflect that the cluster is in a degraded state.
The storage generation that's shown after failing over may be a recomputed value that is different from the values shown earlier.
The old leader node can now be shut down, updated, and restarted to bring it back as a follower.
If the leader node catastrophically fails or is otherwise lost, the follower node can be promoted to become the new leader by running:
seq node lead --pre
seq service restart
Once a follower has been promoted this way, the old (failed) leader, if recoverable, should not be restarted until the seq node follow --pre command has been run on it to set it back into the follower state.
When running multiple Seq nodes, in general, one-off administration tasks like restoring from a backup, or resetting authentication from the command-line, should be performed on the leader.
An exception is manipulation of Seq's local configuration using the seq config and seq node commands: the effect of these is local to the node on which they're run.