Indexing
The Seq RAM cache provides fast access to recent events. When the volume of data exceeds available RAM, searches and queries switch to searching the disk archive.
Archive queries are significantly slower than queries served from RAM. To improve their performance, Seq indexes signals to reduce the volume of data that archive queries must search. This happens automatically.
To take advantage of indexing, one or more signals should be selected before issuing a search or query.
Index acceleration is proportional to the amount of data touched by the signals selected in a query ("signal density"). The benefit provided by an index will also be influenced by the size of the events being searched, and the overlap between signals.
Indexing considerations
- Indexing won't be applied to any data written in the last hour: this prevents churn caused by events arriving slightly late, or by clock drift on the machine hosting the originating application
- Indexing also doesn't trigger until at least 160 MB of contiguous data are available to be indexed
- When signals change, recent data is reindexed first: this means that if, say, an automated process (or a user) modifies signals constantly, some of the older data may be left with stale indexes (this won't impact correctness, only performance)
- Indexing competes with retention policy processing for time: if retention policies are running efficiently, the time remaining will be used for indexing, but if retention policies are running at capacity (indicated by single-digit "headway" numbers in the logs), indexing won't trigger
- Historical writes, for example importing a day's worth of logs with timestamps from one week ago, won't be indexed unless retention processing compacts them into the indexed log stream (they will be searchable, but indexes will not boost performance over the imported range)
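The considerations above can be read as a set of conditions that a span of stored data must meet before it is indexed. The following is a minimal sketch of that reading, not Seq's actual implementation: the `Span` type, its fields, and the `is_indexable` function are hypothetical names introduced for illustration, and interpreting 160 MB as 160 × 2^20 bytes is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Thresholds taken from the considerations listed above.
INDEXING_DELAY = timedelta(hours=1)
MIN_CONTIGUOUS_BYTES = 160 * 1024 * 1024  # assumption: MB means 2**20 bytes


@dataclass
class Span:
    """Hypothetical description of a contiguous run of stored events."""
    newest_write: datetime    # when the most recent event in the span was written
    size_bytes: int           # total size of the contiguous data
    in_indexed_stream: bool   # False for historical imports not yet compacted


def is_indexable(span: Span, now: datetime, retention_at_capacity: bool) -> bool:
    """Return True if the span meets every condition described above."""
    if retention_at_capacity:
        # Retention processing has used up the available time.
        return False
    if not span.in_indexed_stream:
        # Historical writes are searchable but not indexed until compacted.
        return False
    if now - span.newest_write < INDEXING_DELAY:
        # Data written in the last hour is left alone to avoid churn.
        return False
    return span.size_bytes >= MIN_CONTIGUOUS_BYTES


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
span = Span(newest_write=now - timedelta(hours=2),
            size_bytes=200 * 1024 * 1024,
            in_indexed_stream=True)
print(is_indexable(span, now, retention_at_capacity=False))
```

A query over a range that fails any of these checks still returns correct results; it simply runs without the benefit of an index.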