The below article has been superseded.
Please click here to view the newly updated article for Splunk 8.0.2
So, three months ago, I had a hunch that metrics would result in significantly higher license utilization.
My theory was posted to Reddit HERE:
Since Splunk 7.2 added numerous improvements to how metrics are handled and searched, today I am going to test this theory and share my results.
For testing purposes, I will have three inputs, each pointing at its own separate index. The inputs are configured identically, apart from three variations:
- Regular Perfmon data.
- Perfmon MK format.
- Perfmon as metrics.
For testing, I will be looking at the LogicalDisk perfmon object, collecting data at a 15-second interval, with a very generous handful of counters selected so that a lot of data is collected rather quickly.
inputs.conf

[perfmon://LogicalDisk_Reg]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index=Disk_PerfMon_Regular
showZeroValue=1

[perfmon://LogicalDisk_MK]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index=Disk_PerfMon_MK
mode=multikv
showZeroValue=1

[perfmon://LogicalDisk_Metric]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index=Disk_PerfMon_Metrics
showZeroValue=1
Transforms.conf, unmodified from Splunk_TA_Infrastructure
########### Metrics ######################
[metrics-hostoverride]
DEST_KEY = MetaData:Host
REGEX = host=(\S+)
FORMAT = host::$1

########### Transforms for Windows ######################
[value]
REGEX = .*Value=(\S+).*
FORMAT = _value::$1
WRITE_META = true

# Example: object=PhysicalDisk counter="%_Disk_Write_Time"
# Transform - metric_name::PhysicalDisk.%_Disk_Write_Time
[perfmon_metric_name]
REGEX = .*object=(\S+).*counter=(\S+).*
FORMAT = metric_name::$1.$2 metric_type::$1
WRITE_META = true

[instance]
REGEX = .*instance=(\S+).*
FORMAT = instance::$1
WRITE_META = true

Props.conf

[source::Perfmon:*Metric]
TRANSFORMS-_value = value
TRANSFORMS-metric_name = perfmon_metric_name
TRANSFORMS-instance = instance
SEDCMD-remove-whitespace = s/ /_/g s/\s/ /g
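To make the transforms easier to follow, here is a rough Python sketch of what the SEDCMD and the three regexes appear to do to a single perfmon event. This is only an approximation of Splunk's index-time behavior, not how Splunk itself is implemented. The sample event (timestamp, field layout, value) is hypothetical, the function names are mine, and stripping the quotes from the counter name is an assumption made so the result matches the example comment in transforms.conf.

```python
import re

# Hypothetical raw perfmon event; timestamp, layout, and value are made up.
raw_event = (
    "10/25/2018 10:00:00.000 -0500\n"
    "collection=LogicalDisk_Metric\n"
    "object=LogicalDisk\n"
    'counter="% Free Space"\n'
    "instance=C:\n"
    "Value=45.12"
)


def apply_sedcmd(text):
    """Mimic SEDCMD-remove-whitespace: s/ /_/g followed by s/\\s/ /g."""
    # Every literal space becomes an underscore.
    text = text.replace(" ", "_")
    # Any remaining whitespace (here, the newlines) becomes a single space.
    return re.sub(r"\s", " ", text)


def extract_metric_fields(text):
    """Apply the [value], [perfmon_metric_name], and [instance] regexes."""
    name = re.search(r".*object=(\S+).*counter=(\S+).*", text)
    value = re.search(r".*Value=(\S+).*", text)
    instance = re.search(r".*instance=(\S+).*", text)
    # Assumption: quotes are dropped so the name matches the
    # "metric_name::PhysicalDisk.%_Disk_Write_Time" example comment.
    counter = name.group(2).strip('"')
    return {
        "metric_name": f"{name.group(1)}.{counter}",
        "metric_type": name.group(1),
        "instance": instance.group(1),
        "_value": value.group(1),
    }


flattened = apply_sedcmd(raw_event)
fields = extract_metric_fields(flattened)
print(flattened)
print(fields)
```

The point of the underscore replacement is that `(\S+)` stops at whitespace, so a counter name such as `% Free Space` would otherwise be cut off at the first space.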
Testing was performed on a fresh install of Splunk Enterprise 7.2, on my Windows 10 workstation.
All inputs and indexes were enabled/created while Splunk was stopped, so that none of the methods would have skewed results.
While the overall data collection interval was rather short, the data is conclusive enough to back up my original theory.
A few notes:
- For every single perfmonMK event collected, there are 92 separate metric/normal perfmon events collected.
- I did not collect enough data to properly assess the claimed performance benefits of using metrics over events. This test was purely from the standpoint of license and disk utilization.
- Splunk did recently release an app for browsing and displaying metrics. This app performs extremely well and, in my opinion, makes it extremely easy for end users of the platform to consume metric data and create dashboards.
- As a note, the app also works very effectively with normal, accelerated datasets. If, like me, you have previously created datasets for end users to consume performance data, it works very well for that.
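As a sanity check on the 92-to-1 figure in the first note, the counter list from the inputs can be counted directly. The sketch below assumes the ratio comes from 23 counters multiplied by 4 LogicalDisk instances. The instance count is my inference and is not stated anywhere in the test setup, so treat it as a guess.

```python
# Counter list copied from the perfmon stanzas in inputs.conf.
counters_setting = (
    "% Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; "
    "Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; "
    "% Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; "
    "Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; "
    "Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; "
    "Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; "
    "Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec"
)

counters = [c.strip() for c in counters_setting.split(";")]

# Assumption: four LogicalDisk instances on the test workstation.
assumed_instances = 4

# Regular and metric inputs emit one event per counter per instance,
# while the multikv input emits a single event per collection.
events_per_collection = len(counters) * assumed_instances

interval_seconds = 15  # from "interval = 15" in inputs.conf
collections_per_minute = 60 // interval_seconds

print("counters:", len(counters))
print("events per collection (regular/metric):", events_per_collection)
print("events per minute (regular/metric):",
      events_per_collection * collections_per_minute)
print("events per minute (multikv):", collections_per_minute)
```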
My list of cons for metrics so far:
- Ignoring the license utilization theory, which will be evaluated below…
- One HUGE downside so far is the inability to apply automatic lookups to the data, or to enrich the data with more sources. From my understanding, the dimensions need to be added at index time.
- Good example: I have a lookup called “HostInformation” which contains a lot of CMDB-related information: which applications the server belongs to, whether it is production or non-production, the OS version, etc. I have not found a meaningful method of utilizing this data WITH metrics, at least in the context of the metrics explorer.
After 10 minutes of testing, here are the results.
Total size on disk:
disk_perfmon_metrics — 0.14 MB
disk_perfmon_mk — 0.10 MB
disk_perfmon_regular — 0.20 MB
Perfmon MK is the clear winner, with metrics in 2nd place by a healthy margin.
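To put the margins in numbers, here is a small sketch that expresses each index's size on disk relative to the Perfmon MK index. It only uses the three figures reported above; the variable names are mine.

```python
# Size on disk after the 10-minute test, in MB, as reported above.
size_mb = {
    "disk_perfmon_metrics": 0.14,
    "disk_perfmon_mk": 0.10,
    "disk_perfmon_regular": 0.20,
}

baseline = size_mb["disk_perfmon_mk"]

# Ratio of each index's size to the multikv index, rounded to two decimals.
relative = {name: round(size / baseline, 2) for name, size in size_mb.items()}

for name, ratio in sorted(relative.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x the size of disk_perfmon_mk")
```

By this measure the metrics index takes about 1.4 times the disk space of the MK index, and the regular perfmon index takes about twice as much.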
License Utilization:
As expected, Perfmon:MK is a clear winner by a MASSIVE margin.
Here are the results from the Indexes viewer in Splunk:
(Sorry the picture is greyed out. I shut Splunk down before collecting the test results.)
My summary: I recommend continuing to use the perfmon:MK format, and wrapping the results into an accelerated dataset for consumption by the end users. The data model acceleration will greatly improve search performance, while the multikv format will significantly reduce the amount of license required.
Once a few more improvements are made around metrics, I will evaluate whether it is worth making the switch at the expense of increased license utilization.
For me, license usage is important because I collect hundreds of counters, from thousands of devices, at very regular intervals.