These docs are for version 4.2, which is no longer officially supported. The latest version is 2022.1.

Data Store

Seq is driven by its own log-specific data store.

"Hot" log data is retained and queried through an in-memory layer called the cache, and a time-indexed on-disk layer called the archive provides access to historical information that does not fit into RAM.

The Cache

Log data has a very specific access pattern: recent events are almost always more interesting than historical ones. Seq makes use of this to optimize the way machine resources are used.

The cache is a time-ordered list of segments. Each segment is a time slice – the duration of each slice is currently set at three hours.

As events arrive and RAM fills up, segments are dropped from the oldest end of the list, preserving more recent events.
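The eviction strategy described above can be sketched roughly as follows. This is an illustrative model only, not Seq's implementation: the `SegmentCache` class, the `segment_start` helper, and the use of an event count as a stand-in for a RAM budget are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

SEGMENT_DURATION = timedelta(hours=3)  # slice length stated in the text
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def segment_start(timestamp):
    """Round a timestamp down to the start of its three-hour slice."""
    slices = (timestamp - EPOCH) // SEGMENT_DURATION
    return EPOCH + slices * SEGMENT_DURATION


class SegmentCache:
    """Hypothetical time-sliced cache that evicts the oldest segment first."""

    def __init__(self, max_events):
        self.max_events = max_events  # assumed stand-in for a RAM budget
        self.segments = {}            # segment start -> list of events
        self.event_count = 0

    def add(self, timestamp, event):
        self.segments.setdefault(segment_start(timestamp), []).append(event)
        self.event_count += 1
        # Drop whole segments, oldest first, until the budget is met again.
        while self.event_count > self.max_events and len(self.segments) > 1:
            oldest = min(self.segments)
            self.event_count -= len(self.segments.pop(oldest))
```

Evicting whole segments rather than individual events keeps the bookkeeping simple: the cache only needs to know which slice is oldest, not the age of every event.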


It’s a very simple strategy, but an effective one – response times on queries improve dramatically when all data is in cache, and most queries of interest are filled by recent events.

Query performance

Queries are normally serviced from the cache. When log data exceeds the size of available RAM, older events will be brought into memory from the archive for querying. While querying the archive is a useful way to view historical data, disk-based queries are significantly slower than in-memory queries. Deployments of Seq therefore normally need to be sized so that RAM capacity matches the volume of data required for day-to-day monitoring and diagnostics, typically 7, 14 or 30 days' worth.

Effective use of retention policies to "thin out" older log data can keep longer periods of data in RAM.
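As a rough back-of-the-envelope aid for the sizing advice above, the raw volume of data that needs to fit in RAM can be estimated as shown below. The function name and all of its parameters are hypothetical, and the figure ignores any in-memory overhead Seq adds, so treat the result as a lower bound rather than a recommendation.

```python
def estimate_cache_bytes(events_per_day, avg_event_bytes, days,
                         retained_fraction=1.0):
    """Estimate raw bytes of log data to keep in RAM.

    retained_fraction is an assumed average share of events left
    after retention policies have thinned out the data.
    """
    return int(events_per_day * avg_event_bytes * days * retained_fraction)


# One million 1 KB events per day, kept for 7 days.
week = estimate_cache_bytes(1_000_000, 1_000, 7)

# The same stream kept for 14 days, with retention keeping half the events.
fortnight = estimate_cache_bytes(1_000_000, 1_000, 14, retained_fraction=0.5)
```

In this example both configurations come to the same raw volume, which illustrates how thinning out older data lets a longer period fit in the same amount of RAM.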

The diagnostics page in Seq provides some information on the relative size of the cache vs. the on-disk archive.

The Archive

Data on disk is stored in 7-day blocks called extents. These are stored in the storage root path in a folder called "Extents".


The extents are completely self-contained, meaning the individual folders can be deleted, moved or backed up independently.

Each extent is an individual time-indexed B-tree of compressed event data. The data files are managed using ESENT, the storage format used in various Microsoft server products including Exchange Server and Active Directory.

Retention processing and compaction

As retention policies clean up historical log data, extent files become "sparse". On Windows Server 2012 R2 and later operating systems, free space within the extent files will be automatically reclaimed, and the extents will shrink accordingly.

On earlier operating systems, a background compaction process runs within Seq approximately once every 24 hours, taking an extent offline to compact it. Because this process makes the extent's 7-day range temporarily unavailable for querying, recent extents (within 14 days) will not be automatically compacted. Instead, Seq will wait until the extent is 14 days old, and only then begin regularly compacting the file. This can have an impact on disk space usage (recent extent files will be larger than necessary), hence Windows Server 2012 R2 and later are the preferred operating systems for running Seq.

When, finally, an extent contains no data, the complete extent file will be removed.
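The rules in this section can be summarised as a small decision function. This is a sketch of the behaviour as described, not Seq's code: `extent_action` and its parameters are hypothetical, and measuring an extent's age from the start of its 7-day range is an assumption, since the text does not say which point the 14 days are counted from.

```python
from datetime import datetime, timedelta, timezone

MIN_COMPACTION_AGE = timedelta(days=14)  # threshold stated in the text


def extent_action(extent_start, event_count, now, os_reclaims_space):
    """Return 'remove', 'none', 'wait' or 'compact' for one extent."""
    if event_count == 0:
        return "remove"   # empty extents are deleted outright
    if os_reclaims_space:
        return "none"     # Windows Server 2012 R2+: space is reclaimed automatically
    # Assumption: age is measured from the start of the extent's 7-day range.
    if now - extent_start < MIN_COMPACTION_AGE:
        return "wait"     # recent extents stay online for querying
    return "compact"      # eligible for the roughly daily background compaction
```

For instance, on an older operating system an extent that started a week ago would be left alone, while one that started three weeks ago would be eligible for compaction.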


Permalinked events

Events that have been "retained", e.g. to create a permalink, won't be removed by retention policies. This means Seq will report that the archive extends all the way to the oldest pinned event. To check the actual contents of a time range, a count(*) query bounded by the date range in question is required.

