Splunk 7.2 Metrics Review - License/Disk Comparison

Eric

Owner
Staff member
Stormy Days(HC)
#1
So, three months ago, I had a hunch that metrics would result in significantly higher license utilization.

My theory was posted to Reddit HERE:

Since Splunk 7.2 added numerous improvements to how metrics are handled and searched, today I am going to go through the process of testing this theory and provide my results.

For testing purposes, I will have three inputs, each pointing at its own separate index. The inputs are configured identically, apart from the format the data is collected in:

  1. Regular Perfmon data.
  2. Perfmon in MK (multikv) format.
  3. Perfmon as metrics.

For testing, I will collect the LogicalDisk Perfmon object at a 15-second interval, with a very generous set of counters selected so that a lot of data accumulates quickly.

inputs.conf
INI:
[perfmon://LogicalDisk_Reg]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index=Disk_PerfMon_Regular
showZeroValue=1

[perfmon://LogicalDisk_MK]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index=Disk_PerfMon_MK
mode=multikv
showZeroValue=1

[perfmon://LogicalDisk_Metric]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index=Disk_PerfMon_Metrics
showZeroValue=1

transforms.conf, unmodified from Splunk_TA_Infrastructure
INI:
########### Metrics ######################
[metrics-hostoverride]
DEST_KEY = MetaData:Host
REGEX = host=(\S+)
FORMAT = host::$1


########### Transforms for Windows ######################
[value]
REGEX = .*Value=(\S+).*
FORMAT = _value::$1
WRITE_META = true

# Example: object=PhysicalDisk counter="%_Disk_Write_Time"
# Transform - metric_name::PhysicalDisk.%_Disk_Write_Time
[perfmon_metric_name]
REGEX = .*object=(\S+).*counter=(\S+).*
FORMAT = metric_name::$1.$2 metric_type::$1
WRITE_META = true

[instance]
REGEX = .*instance=(\S+).*
FORMAT = instance::$1
WRITE_META = true

props.conf
INI:
[source::Perfmon:*Metric]
TRANSFORMS-_value = value
TRANSFORMS-metric_name = perfmon_metric_name
TRANSFORMS-instance = instance
SEDCMD-remove-whitespace = s/ /_/g s/\s/ /g
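
To make the props/transforms chain above easier to follow, here is a rough Python sketch of what it appears to do to a single Perfmon event. The sample event, its timestamp, and the helper names `apply_sedcmd` and `extract_dimensions` are my own assumptions for illustration, not something taken from the TA. It also assumes the SEDCMD rewrite is applied before the transforms, as the example in the TA's own comment implies, and that the quotes around the counter name are dropped.

```python
import re

# Hypothetical raw Perfmon event; the exact layout is an assumption.
RAW_EVENT = (
    "09/12/2018 10:00:00.000 -0500\n"
    "collection=LogicalDisk_Metric\n"
    "object=LogicalDisk\n"
    'counter="% Free Space"\n'
    "instance=C:\n"
    "Value=45.2\n"
)


def apply_sedcmd(raw):
    """Mimic SEDCMD-remove-whitespace: s/ /_/g followed by s/\\s/ /g."""
    underscored = raw.replace(" ", "_")     # spaces inside values become underscores
    return re.sub(r"\s", " ", underscored)  # remaining whitespace (newlines) becomes spaces


def extract_dimensions(event):
    """Apply the three REGEX settings from transforms.conf to one event."""
    name = re.search(r".*object=(\S+).*counter=(\S+).*", event)
    value = re.search(r".*Value=(\S+).*", event)
    instance = re.search(r".*instance=(\S+).*", event)
    counter = name.group(2).strip('"')  # assumption: surrounding quotes are dropped
    return {
        "metric_name": f"{name.group(1)}.{counter}",
        "metric_type": name.group(1),
        "_value": value.group(1),
        "instance": instance.group(1),
    }


if __name__ == "__main__":
    print(extract_dimensions(apply_sedcmd(RAW_EVENT)))
```

The point of the whitespace rewrite is that each `(\S+)` capture can then pick up a whole counter name such as `%_Free_Space` in one piece.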

Testing was performed on a fresh install of Splunk Enterprise 7.2, on my Windows 10 workstation.

All inputs and indexes were enabled/created while Splunk was stopped, so that none of the methods got a head start and skewed the results.


While the overall data collection interval was rather short, the data is conclusive enough to back up my original theory.


A few notes:

  • For every single Perfmon MK event collected, there are 92 separate events collected in each of the metric and regular Perfmon formats.
  • I did not collect enough data to properly assess the claimed performance benefits of using metrics over events. This test was purely from the standpoint of license and disk utilization.
  • Splunk did recently release an app for browsing and displaying metrics. This app performs extremely well and, in my opinion, makes it extremely easy for end users of the platform to consume metric data and create dashboards.
    • As a note: the app also works very effectively with normal, accelerated datasets. If you are like me and have previously created datasets for end users to consume performance data with, it works very well for that.
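
The 92-to-1 ratio in the first note can be sanity-checked with a little arithmetic. This is only a sketch: the counter list is copied from the inputs.conf stanzas, but the instance count of 4 is my assumption, inferred from 92 / 23, and the helper name `count_counters` is made up for the example.

```python
# Counter list copied from the inputs.conf stanzas.
COUNTERS = (
    "% Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; "
    "Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; "
    "% Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; "
    "Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; "
    "Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; "
    "Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; "
    "% Idle Time; Split IO/Sec"
)
INTERVAL_SECONDS = 15  # interval = 15 in inputs.conf
INSTANCES = 4          # assumption: inferred from 92 / 23, not stated in the post


def count_counters(counter_setting):
    """Count the semicolon-separated counters in a 'counters =' value."""
    return len([c for c in counter_setting.split(";") if c.strip()])


counters_per_sample = count_counters(COUNTERS)        # 23
events_per_sample = counters_per_sample * INSTANCES   # one event per counter per instance
samples_per_hour = 3600 // INTERVAL_SECONDS           # one MK event per sample

if __name__ == "__main__":
    print("counters:", counters_per_sample)
    print("regular/metric events per MK event:", events_per_sample)
    print("MK events per hour:", samples_per_hour)
    print("regular/metric events per hour:", samples_per_hour * events_per_sample)
```

Under those assumptions, one hour of collection is 240 MK events against 22,080 events in each of the other two formats.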

My list of CONs for metrics so far:
  • Ignoring the license utilization theory, which will be evaluated below...
  • One HUGE downside so far is the inability to apply automatic lookups to the data, or to enrich the data from other sources. From my understanding, the dimensions need to be added at index time.
    • Good example: I have a lookup called "HostInformation" which contains a lot of CMDB-related information: what applications the server belongs to, whether it is production or non-production, the OS version, etc. I have not found a meaningful method of utilizing this data WITH metrics, at least in the context of the metrics explorer.

Results will be in the next post, after I have collected around 30 minutes' worth of data.
 

Eric

#2
After 10 minutes (not 30) of testing, here are the results.

Total size on disk:
disk_perfmon_metrics -- 0.14 MB
disk_perfmon_mk -- 0.10 MB
disk_perfmon_regular -- 0.20 MB

Perfmon MK is the clear winner at half the size of regular Perfmon, with metrics in second place at about 70% of the regular size.
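
For anyone who wants the relative numbers, here is a quick sketch that works them out from the three totals above. The sizes are the ones I reported; the variable names are just for illustration. Keep in mind these come from a short run, so treat the percentages as rough.

```python
# Total size on disk in MB, as reported above.
SIZES_MB = {
    "disk_perfmon_regular": 0.20,
    "disk_perfmon_mk": 0.10,
    "disk_perfmon_metrics": 0.14,
}

baseline = SIZES_MB["disk_perfmon_regular"]

# Each index as a percentage of the regular Perfmon index.
percent_of_regular = {
    name: round(size / baseline * 100) for name, size in SIZES_MB.items()
}

# How much larger the metrics index is than the MK index, in percent.
metrics_over_mk = round(
    (SIZES_MB["disk_perfmon_metrics"] / SIZES_MB["disk_perfmon_mk"] - 1) * 100
)

if __name__ == "__main__":
    for name, pct in percent_of_regular.items():
        print(f"{name}: {pct}% of regular")
    print(f"metrics is {metrics_over_mk}% larger than MK")
```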

License Utilization:

LicenseSizeComparison.png


As expected, Perfmon:MK is a clear winner by a MASSIVE margin.

Here are the results from the Indexes viewer in Splunk:

(Sorry the picture is greyed out; I shut Splunk down before collecting the test results.)
Index Viewer.png




My summary: I recommend continuing to use the Perfmon MK format and wrapping the results into an accelerated dataset for consumption by the end users. The data model acceleration will greatly improve search performance, while the multikv format will significantly reduce the amount of license required.

Once a few other improvements are made around metrics, I will re-evaluate whether it is worth making the switch at the expense of increased license utilization.


For me, license usage is important because I collect hundreds of counters, from thousands of devices, at very regular intervals.
 