This is an updated version of the original 8.0.1 test, located here. The reason for the update: Splunk reached out to me and pointed me to a method of ingesting metrics that was newly introduced in version 8.0.

As a result, I implemented the new method and re-ran the tests, including the original methods alongside the new one.

TL;DR Spoiler

By leveraging the Metrics MK format, I was able to reduce my license requirement by over 90% compared to the PerfmonMK format stored as events. Compared to the default, out-of-the-box Perfmon data, that is a reduction of over 98% in licensing!

At the same time, it used less overall disk storage than any of the other current methods, while searching faster than either event-based format!

If you aren’t evaluating converting your perfmon data to metrics, you need to start!

How testing will be performed

For testing purposes, I will have four inputs, each pointing at its own separate index. Each input is configured with the same data collection and interval.

  1. Regular Perfmon as Events (Default for TA_Windows)
  2. Regular Perfmon as Metrics
  3. Perfmon MK as Events
  4. Perfmon MK as Metrics MK (New Method)

For testing, I will be looking at the LogicalDisk perfmon object, collecting data at a 15-second interval with a generous handful of counters selected, so that a lot of data is collected quickly.

inputs.conf
# Regular Perfmon Data, stored in Events index.
[perfmon://LogicalDisk_Event]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 1
interval = 15
useEnglishOnly = true
index=perfmon_disk_events
showZeroValue=1

# Regular Perfmon Data, stored in Metrics index.
[perfmon://LogicalDisk_Metric]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 1
interval = 15
useEnglishOnly = true
index=perfmon_disk_metrics
showZeroValue=1
sourcetype=Perfmon_To_Metric

# Perfmon MK Data, Stored in Events index.
[perfmon://LogicalDisk_MK_Event]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 1
interval = 15
useEnglishOnly = true
index=perfmon_mk_disk_events
mode=multikv
showZeroValue=1

# Perfmon MK Data, Stored in Metrics Index.
[perfmon://LogicalDisk_MK_MVMetric]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 1
interval = 15
mode=multikv
useEnglishOnly = true
index=perfmon_mk_disk_metrics_mk
showZeroValue=1
sourcetype=PerfmonMK_To_MetricMK_AUTO

props.conf
#Convert Regular Perfmon Event, into a Metric
[Perfmon_To_Metric]
TRANSFORMS-_value = value
TRANSFORMS-metric_name = perfmon_metric_name
TRANSFORMS-instance = instance
SEDCMD-remove-whitespace = s/ /_/g s/\s/ /g

#Convert Perfmon MK Event, into a multi-key Metric
[PerfmonMK_To_MetricMK_AUTO]
INDEXED_EXTRACTIONS = tsv
LINE_BREAKER = ([\r\n]+)
NO_BINARY_CHECK = 1
category = Log To Metrics
pulldown_type  = 1
METRIC-SCHEMA-TRANSFORMS = metric-schema:PerfmonMK_To_MetricMK_AUTO
TRANSFORMS-perfmonmk = perfmonmk:PerfmonMK_To_MetricMK_AUTO

transforms.conf
[value]
REGEX = .*Value=(\S+).*
FORMAT = _value::$1
WRITE_META = true

[perfmon_metric_name]
REGEX = .*object=(\S+).*counter=(\S+).*
FORMAT = metric_name::$1.$2 metric_type::$1
WRITE_META = true

[instance]
REGEX = .*instance=(\S+).*
FORMAT = instance::$1
WRITE_META = true

[metric-schema:PerfmonMK_To_MetricMK_AUTO]
METRIC-SCHEMA-MEASURES = _ALLNUMS_

[perfmonmk:PerfmonMK_To_MetricMK_AUTO]
REGEX = collection=\"?(?<collection>[^\"\n]+)\"?\ncategory=\"?(?<category>[^\"\n]+)\"?\nobject=\"?(?<object>[^\"\n]+)\"?\n([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t\n([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t([^\t]+)\t\n
FORMAT = collection::"$1" category::"$2" object::"$3" "$4"::"$28" "$5"::"$29" "$6"::"$30" "$7"::"$31" "$8"::"$32" "$9"::"$33" "$10"::"$34" "$11"::"$35" "$12"::"$36" "$13"::"$37" "$14"::"$38" "$15"::"$39" "$16"::"$40" "$17"::"$41" "$18"::"$42" "$19"::"$43" "$20"::"$44" "$21"::"$45" "$22"::"$46" "$23"::"$47" "$24"::"$48" "$25"::"$49" "$26"::"$50" "$27"::"$51"
WRITE_META = true
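
To make the Perfmon_To_Metric conversion easier to follow, here is a minimal Python sketch of what the SEDCMD and the three regexes ([value], [perfmon_metric_name], [instance]) appear to do to a single event. The sample event is hypothetical and simplified (real Perfmon events may contain extra fields or quoting), and the sketch assumes the sed rules run before the regexes, which is what the underscore-style metric names used in the searches suggest. Python's re module stands in for Splunk's regex engine here.

```python
import re

# Hypothetical, simplified Perfmon event; real events may differ in layout.
raw_event = (
    "collection=LogicalDisk_Metric\n"
    "object=LogicalDisk\n"
    "counter=% Disk Read Time\n"
    "instance=C:\n"
    "Value=0.25"
)

def apply_sedcmd(text):
    # Mimics SEDCMD-remove-whitespace: first rule turns literal spaces into
    # underscores, second rule turns remaining whitespace (newlines) into spaces.
    text = text.replace(" ", "_")
    return re.sub(r"\s", " ", text)

def extract_fields(event):
    # Same patterns as the [value], [perfmon_metric_name] and [instance] stanzas.
    value = re.search(r".*Value=(\S+).*", event).group(1)
    name = re.search(r".*object=(\S+).*counter=(\S+).*", event)
    instance = re.search(r".*instance=(\S+).*", event).group(1)
    return {
        "_value": value,
        "metric_name": f"{name.group(1)}.{name.group(2)}",  # FORMAT: $1.$2
        "metric_type": name.group(1),
        "instance": instance,
    }

fields = extract_fields(apply_sedcmd(raw_event))
print(fields)
```

Under these assumptions the metric name comes out as LogicalDisk.%_Disk_Read_Time, which is the same name the mstats search against perfmon_disk_metrics filters on.
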

indexes.conf
# Regular Perfmon Data, Events Index.
[perfmon_disk_events]
coldPath = $SPLUNK_DB\$_index_name\colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\$_index_name\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\$_index_name\thaweddb

# Regular Perfmon Data, Metrics Index.
[perfmon_disk_metrics]
coldPath = $SPLUNK_DB\$_index_name\colddb
datatype = metric
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\$_index_name\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\$_index_name\thaweddb

# Perfmon MK Data, Events Index.
[perfmon_mk_disk_events]
coldPath = $SPLUNK_DB\$_index_name\colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\$_index_name\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\$_index_name\thaweddb

# Perfmon MK Data, Metrics Index.
[perfmon_mk_disk_metrics_mk]
coldPath = $SPLUNK_DB\$_index_name\colddb
enableDataIntegrityControl = 0
datatype = metric
enableTsidxReduction = 0
homePath = $SPLUNK_DB\$_index_name\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\$_index_name\thaweddb

Testing will be performed on a new install of Splunk Enterprise 8.0.1 on my workstation, which has 32 GB of RAM and a Xeon processor. (Don’t worry, I am still trying to get ahold of a Ryzen…)

NO additional or 3rd party apps are installed. Testing was performed on a fresh install of Splunk, with only the above configuration files added.

The tests were started at 8:57am, and ended at 9:27am.

Data Collection Methods

Event Count

Count of events was obtained by recording the number displayed at http://localhost:8000/en-US/manager/search/data/indexes

Storage Usage

Storage utilization was obtained in Windows Explorer by manually going to C:\Program Files\Splunk\var\lib\splunk, right-clicking the folder for each index, and recording “Size on disk”.

License Utilization

index=_internal source="C:\\Program Files\\Splunk\\var\\log\\splunk\\license_usage.log"
| stats sum(b) as Size by idx
| eval Size= Size/1024
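
For readers less familiar with SPL, the search above sums the b field (bytes) per idx (index) and divides by 1024 to get kilobytes. A rough Python equivalent, using made-up sample records rather than my real license log, might look like this:

```python
from collections import defaultdict

# Hypothetical license_usage.log records: "b" is bytes, "idx" is the index name.
records = [
    {"idx": "perfmon_disk_events", "b": 2048},
    {"idx": "perfmon_disk_events", "b": 1024},
    {"idx": "perfmon_mk_disk_events", "b": 512},
]

def usage_kb_by_index(rows):
    totals = defaultdict(int)
    for row in rows:
        totals[row["idx"]] += row["b"]  # stats sum(b) as Size by idx
    # eval Size = Size / 1024 (bytes to KB)
    return {idx: size / 1024 for idx, size in totals.items()}

usage = usage_kb_by_index(records)
print(usage)
```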

Performance Testing

Performance tests will be done with a specific query for each index. Due to the limited amount of data (30 minutes at a 15-second interval), there may not be enough data for a “production” test. Tests will be run over the same timespan, from 9am to 9:30am, and the average of 5 query times will be recorded.

Here are the individual searches:

perfmon_disk_events
index=perfmon_disk_events instance="C:" counter="% Disk Read Time"
| timechart span=15s avg(Value)

perfmon_mk_disk_events
index=perfmon_mk_disk_events
| timechart span=15s avg(%_Disk_Read_Time)

perfmon_disk_metrics
| mstats avg(_value) WHERE metric_name="LogicalDisk.%_Disk_Read_Time" AND index="perfmon_disk_metrics" span=15s

perfmon_mk_disk_metrics_mk
| mstats avg(_value) WHERE metric_name="%_Disk_Read_Time" AND index="perfmon_mk_disk_metrics_mk" span=15s

Test Results – 30 Minutes

Index                    | Event Count | Disk Size | License Usage
perfmon -> events        | 10,856      | 572 KB    | 1,431 KB
perfmon -> metrics       | 10,856      | 516 KB    | 1,508 KB
perfmon_mk -> events     | 118         | 292 KB    | 173 KB
perfmon_mk -> metrics_mk | 118         | 256 KB    | 16 KB
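
As a sanity check on the event counts, here is a small sketch of the arithmetic. It assumes the MK format writes one event per collection interval while the regular format writes one event per counter per instance. The number of disk instances is not something I recorded, so the value it derives is only an inference from the counts.

```python
# Counter list copied from the inputs.conf stanzas.
COUNTERS = (
    "% Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; "
    "Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; "
    "% Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; "
    "Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; "
    "Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; "
    "Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; "
    "% Idle Time; Split IO/Sec"
)

counter_count = len([c for c in COUNTERS.split(";") if c.strip()])

regular_events = 10_856  # perfmon -> events / metrics
mk_events = 118          # perfmon_mk -> events / metrics_mk

# Assumption: one MK event per interval, so this is events per interval.
events_per_interval = regular_events / mk_events
# Inference only: instances implied by the counts above.
implied_instances = events_per_interval / counter_count

print(counter_count, events_per_interval, implied_instances)
```

If those assumptions hold, the counts line up exactly: 23 counters across the implied number of instances, once per interval, gives the 10,856 regular events.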

PerfmonMK -> MetricsMK Statistics

% License Decrease compared to Perfmon Events: 98%
% License Decrease compared to Perfmon MK: 90.7%
% Disk Usage Decrease compared to Perfmon Events: 55%
% Disk Usage Decrease compared to Perfmon MK: 12%
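
For anyone who wants to check the math, the percentages are plain relative decreases computed from the results table. A quick sketch, using the KB values from that table (the dictionary keys are just labels I made up for this example):

```python
def percent_decrease(old, new):
    """Relative decrease from old to new, as a percentage."""
    return (old - new) / old * 100

# Values in KB, taken from the 30-minute results table.
license_kb = {"perfmon_events": 1431, "perfmon_mk_events": 173, "metrics_mk": 16}
disk_kb = {"perfmon_events": 572, "perfmon_mk_events": 292, "metrics_mk": 256}

results = {
    "license_vs_perfmon_events": percent_decrease(license_kb["perfmon_events"], license_kb["metrics_mk"]),
    "license_vs_perfmon_mk": percent_decrease(license_kb["perfmon_mk_events"], license_kb["metrics_mk"]),
    "disk_vs_perfmon_events": percent_decrease(disk_kb["perfmon_events"], disk_kb["metrics_mk"]),
    "disk_vs_perfmon_mk": percent_decrease(disk_kb["perfmon_mk_events"], disk_kb["metrics_mk"]),
}

for label, pct in results.items():
    print(f"{label}: {pct:.1f}%")
```

The figures quoted above are these values rounded down, so the script's one-decimal output differs slightly in the last digit.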

Performance Results

Performance Testing - Raw Data
index=perfmon_disk_events instance="C:" counter="% Disk Read Time" | timechart span=15s avg(Value)
This search has completed and has returned 121 results by scanning 109 events in 0.132 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.223 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.136 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.139 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.122 seconds

index=perfmon_mk_disk_events | timechart span=15s avg(%_Disk_Read_Time)
This search has completed and has returned 121 results by scanning 109 events in 0.161 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.159 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.149 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.142 seconds
This search has completed and has returned 121 results by scanning 109 events in 0.195 seconds

| mstats avg(_value) WHERE metric_name="LogicalDisk.%_Disk_Read_Time" AND index="perfmon_disk_metrics" span=15s
This search has completed and has returned 109 results by scanning 436 events in 0.079 seconds
This search has completed and has returned 109 results by scanning 436 events in 0.088 seconds
This search has completed and has returned 109 results by scanning 436 events in 0.081 seconds
This search has completed and has returned 109 results by scanning 436 events in 0.15 seconds
This search has completed and has returned 109 results by scanning 436 events in 0.087 seconds

| mstats avg(_value) WHERE metric_name=%_Disk_Read_Time AND index=perfmon_mk_disk_metrics_mk span=15s
This search has completed and has returned 109 results by scanning 109 events in 0.19 seconds
This search has completed and has returned 109 results by scanning 109 events in 0.076 seconds
This search has completed and has returned 109 results by scanning 109 events in 0.09 seconds
This search has completed and has returned 109 results by scanning 109 events in 0.158 seconds
This search has completed and has returned 109 results by scanning 109 events in 0.081 seconds

Index Name                 | Average Speed (Seconds)
perfmon_disk_events        | 0.1504
perfmon_mk_disk_events     | 0.1612
perfmon_disk_metrics       | 0.097
perfmon_mk_disk_metrics_mk | 0.119
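
The averages are simply the mean of the five run times listed in the raw data for each search. A short sketch that reproduces them:

```python
from statistics import mean

# Run times in seconds, copied from the raw data for each search.
timings = {
    "perfmon_disk_events": [0.132, 0.223, 0.136, 0.139, 0.122],
    "perfmon_mk_disk_events": [0.161, 0.159, 0.149, 0.142, 0.195],
    "perfmon_disk_metrics": [0.079, 0.088, 0.081, 0.15, 0.087],
    "perfmon_mk_disk_metrics_mk": [0.19, 0.076, 0.09, 0.158, 0.081],
}

averages = {name: round(mean(runs), 4) for name, runs in timings.items()}

for name, avg in averages.items():
    print(f"{name}: {avg}")
```

With only five runs per search, a single slow run (such as the 0.19 second outlier for the Metrics MK index) moves the average a lot, which is part of why I treat these numbers as indicative only.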

Disclaimer: 30 minutes of data is not enough data to do a real-world comparison test.

If you want an accurate test, I would recommend searching at least one month of data in a production system. These tests were performed on my local machine and are subject to variance caused by other processes running in the background.

My conclusion:

Metrics are faster than events. I will not give a percentage here, because I do not feel there is enough data to measure performance accurately.

Conclusions

In the original post, the method used to convert Perfmon MK events to metrics was a pretty old method introduced in the Splunk infrastructure app a few years back. After making the post, Splunk’s engineering team reached out to me providing a lot of technical insight and documentation into the Metrics MK format.

After converting my tests to use the Metrics MK format, I am completely blown away by the reduction in licensing and disk usage. Compared to the PerfmonMK format I am currently using in production, I can save over 90% on licensing and over 10% on storage by switching to a MUCH faster format, which is also easier for users to ingest.

If you are interested in converting your perfmon data to metrics, I am in the process of finishing up a Python script which will automatically build out the props.conf and transforms.conf to do so, with no manual configuration adjustments required.

If you are interested in contributing to this project, please visit the GitHub page here.

My two cents: if you are not in the process of converting your data to MetricsMK, you should be! I cannot express how much better the performance, license usage, and disk usage are compared to the out-of-the-box perfmon format.

Documentation

Metrics Overview: https://docs.splunk.com/Documentation/Splunk/8.0.1/Metrics/Overview

Using Multi-Value Metrics: https://docs.splunk.com/Documentation/Splunk/8.0.1/Metrics/GetMetricsInOther

Log to Metrics Overview: https://docs.splunk.com/Documentation/Splunk/8.0.1/Metrics/L2MOverview

Special Thanks

I have received a lot of assistance from the Splunk team in putting this article together, and I would like to call out their help.

  1. David Maislin @ Splunk has greatly assisted with issues related to Metrics, and has provided a lot of recommendations on putting together this content.
  2. (More coming after I get their permission to post their names.)