So, over a year ago, when metrics were a new concept in Splunk, I ran a licensing and storage comparison here.
Since then, Splunk has made many changes and improvements to how metrics are stored and licensed, so I felt it was time to run another comparison.
How Testing Will Be Performed
Most, if not all, of the test cases will be copied from the old tests.
For testing purposes, I will have three inputs, each pointing at its own separate index. The inputs are configured identically except for the output format, with three variations:
- Regular Perfmon data (the default for the Windows TA)
- Perfmon in MK (multikv) format
- Perfmon as metrics
For testing, I will be collecting the LogicalDisk perfmon object at a 15-second interval, with a very generous handful of counters selected, to collect a lot of data rather quickly.
Inputs.conf

[perfmon://LogicalDisk_Reg]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index = Disk_PerfMon_Regular
showZeroValue = 1

[perfmon://LogicalDisk_MK]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index = Disk_PerfMon_MK
mode = multikv
showZeroValue = 1

[perfmon://LogicalDisk_Metric]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index = Disk_PerfMon_Metrics
showZeroValue = 1
Transforms.conf
[metrics-hostoverride]
DEST_KEY = MetaData:Host
REGEX = host=(\S+)
FORMAT = host::$1

[value]
REGEX = .*Value=(\S+).*
FORMAT = _value::$1
WRITE_META = true

[perfmon_metric_name]
REGEX = .*object=(\S+).*counter=(\S+).*
FORMAT = metric_name::$1.$2 metric_type::$1
WRITE_META = true

[instance]
REGEX = .*instance=(\S+).*
FORMAT = instance::$1
WRITE_META = true
Props.conf
[source::Perfmon:*Metric]
TRANSFORMS-_value = value
TRANSFORMS-metric_name = perfmon_metric_name
TRANSFORMS-instance = instance
SEDCMD-remove-whitespace = s/ /_/g s/\s/ /g
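To illustrate the intent of these transforms: a regular Perfmon event (a hypothetical sample; the exact layout varies by Windows TA version) looks roughly like

collection=LogicalDisk
object=LogicalDisk
counter=%_Free_Space
instance=C:
Value=31.26

and the transforms above turn it into an indexed metric data point with metric_name::LogicalDisk.%_Free_Space, instance::C:, and _value::31.26. (The SEDCMD replaces spaces with underscores so the \S+ captures can grab whole counter names.)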
Indexes.conf
[disk_perfmon_regular]
coldPath = $SPLUNK_DB\disk_perfmon_regular\colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\disk_perfmon_regular\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\disk_perfmon_regular\thaweddb

[disk_perfmon_mk]
coldPath = $SPLUNK_DB\disk_perfmon_mk\colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\disk_perfmon_mk\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\disk_perfmon_mk\thaweddb

[disk_perfmon_metrics]
coldPath = $SPLUNK_DB\disk_perfmon_metrics\colddb
datatype = metric
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\disk_perfmon_metrics\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\disk_perfmon_metrics\thaweddb
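Once data is flowing, the metrics index is searched with mstats rather than tstats. A quick sanity-check sketch (assuming the transforms above yield metric names of the form object.counter, e.g. LogicalDisk.Disk_Reads/sec):

| mstats avg(_value) WHERE index=disk_perfmon_metrics metric_name="LogicalDisk.Disk_Reads/sec" span=1m BY instance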
Testing will be performed on a fresh install of Splunk Enterprise 8.0.1, on my workstation: 32 GB of RAM and a Xeon processor. (Don't worry, I am already trying to get ahold of a Ryzen….)
No additional or third-party apps are installed.
If you would like to reproduce my results, you can do a fresh install of Splunk Enterprise and add the four configuration files listed above.
I added the configuration files, restarted Splunk, and took a lunch break.
When I returned, I disabled the inputs and restarted Splunk, for a total of 25 minutes of testing.
Here are the results. The methods to obtain the data are below.
Event Count Query
Just a quick count of events to ensure we are fairly grading the results. Event indexes and metric indexes have to be counted separately (tstats for events, mstats for metrics), hence the union:
| tstats count WHERE index=disk* groupby index | union [| mstats count where index=disk* metric_name=* groupby index ]
Storage Usage
Storage utilization was obtained in Windows Explorer by going to C:\Program Files\Splunk\var\lib\splunk, right-clicking the folder for each of the three indexes, and recording "Size on disk".
License Utilization
index=_internal source="C:\\Program Files\\Splunk\\var\\log\\splunk\\license_usage.log" | stats sum(b) as Size by idx | eval Size = Size/1024
(The b field is measured in bytes, so dividing by 1024 yields KB.)
Test Results – 25 Minutes
| Index | Event Count | Disk Size | License Usage |
|---|---|---|---|
| disk_perfmon_mk | 100 | 248 KB | 140 KB |
| disk_perfmon_regular | 9,200 | 492 KB | 1,160 KB |
| disk_perfmon_metrics | 9,200 | 468 KB | 1,239 KB |
Statistics
| Statistic | Value |
|---|---|
| % Licensing Difference, MK vs. Metrics | 785% |
| % Disk Difference, MK vs. Metrics | 88.7% |
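For clarity, these differences are computed as (larger - smaller) / smaller. For example, licensing: (1,239 - 140) / 140 ≈ 785%, and disk: (468 - 248) / 248 ≈ 88.7%.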
Further Testing
At this point, I re-enabled the inputs, restarted Splunk, started the stopwatch, and let it run for another 45 minutes.
I am curious to see the trend with more data. While I am certain Perfmon MK will be the hands-down winner in all of these tests, I want to see the longer-term results.
Test Results – 1 Hour
| Index | Event Count | Disk Size | License Usage |
|---|---|---|---|
| disk_perfmon_mk | 279 | 672 KB | 403 KB |
| disk_perfmon_regular | 25,668 | 724 KB | 3,340 KB |
| disk_perfmon_metrics | 25,668 | 1,120 KB | 3,569 KB |

| Statistic | Value |
|---|---|
| % Licensing Difference, MK vs. Metrics | 785% |
| % Disk Difference, MK vs. Metrics | 66% |
| % Licensing Difference, Perfmon vs. Metrics | 7% |
| % Disk Difference, Perfmon vs. Metrics | 54% |
Conclusions
I was under the impression that metrics licensing had improved in Splunk 8; however, compared to the PerfmonMK format, there is still plenty of room for improvement.
While I will continue to use metrics for many use cases, mostly due to their ease of use, I would be cautious about converting your existing PerfmonMK data to metrics.
If I apply the 785% increase in licensing to what I am collecting in my production environment, I would go from 3 GB daily to roughly 26.5 GB daily (3 GB × 8.85) for my PerfmonMK traffic. While this would only account for a ~1% increase in my daily licensing, it is still something to be aware of.
index=index_utilization_summary st=PerfmonMK*
| stats sum(bytes) as TotalMKLicense
| eval Total_MK_GB = TotalMKLicense / 1024 / 1024 / 1024
| eval Total_Metrics_GB = (TotalMKLicense * 8.85) / 1024 / 1024 / 1024
(Note that a 785% increase corresponds to multiplying the baseline by 8.85.)
In my opinion, the additional speed, performance, and usability of metrics would likely outweigh the 1% impact to MY licensing. However, for customers licensed for 100-500 GB per day, this impact would be far more considerable.
If you are currently using the regular Perfmon format instead of PerfmonMK, I would recommend considering changing your collections to metrics, as it is only a 7% difference in licensing. I also anticipate that the disk-usage difference for metrics will shrink as the indexes grow.
Absolutely, EXACTLY what I was looking for. Thanks so much for this! Great tests all around.
It's a hair old though, so I cannot promise the data is still accurate! I am willing to bet they have improved the performance of metrics a good amount since this article was written.
Thank you for the detailed and informative answer.
I’m not sure what you mean with: “This will only work with a static number of fields/instances.” Is this only a problem with the perfmon modular input agent or in general?
I was using perfmon data with a metrics index for some time (yes, it was more expensive than PerfmonMK, but the performance was so much better 😉).
But getting it working with the new format was a painful experience 🙁
So, I gave up and switched to Telegraf. It supports Splunk multi-metric events and works great.
Here is all the information you need:
https://www.splunk.com/en_us/blog/it/the-daily-telegraf-getting-started-with-telegraf-and-splunk.html
Splunk Metrics serializer
https://github.com/influxdata/telegraf/tree/master/plugins/serializers/splunkmetric
Windows Performance Counters Input Plugin
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/win_perf_counters
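For a taste, a minimal telegraf.conf sketch combining the two plugins above (untested; the HEC URL, token, and the abbreviated counter list are placeholders):

# Collect Windows performance counters for LogicalDisk
[[inputs.win_perf_counters]]
  [[inputs.win_perf_counters.object]]
    ObjectName = "LogicalDisk"
    Instances = ["*"]
    Counters = ["% Idle Time", "% Free Space", "Free Megabytes", "Disk Reads/sec", "Disk Writes/sec"]
    Measurement = "os.win.logicaldisk"

# Ship the metrics to a Splunk HTTP Event Collector in splunkmetric format
[[outputs.http]]
  url = "https://splunk.example.com:8088/services/collector"
  data_format = "splunkmetric"
  splunkmetric_hec_routing = true
  [outputs.http.headers]
    Authorization = "Splunk <your-hec-token>"
    Content-Type = "application/json"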
| msearch index=windows_metric
is giving me:
{
instance: C:
metric_name:os.win.logicaldisk.Avg._Disk_Bytes/Read: 33280
metric_name:os.win.logicaldisk.Avg._Disk_Bytes/Write: 11029.853515625
metric_name:os.win.logicaldisk.Avg._Disk_Read_Queue_Length: 0.00010245080920867622
metric_name:os.win.logicaldisk.Avg._Disk_Write_Queue_Length: 0.0019629651214927435
metric_name:os.win.logicaldisk.Avg._Disk_sec/Read: 0.00031807893537916243
metric_name:os.win.logicaldisk.Avg._Disk_sec/Write: 0.0001412120764143765
metric_name:os.win.logicaldisk.Current_Disk_Queue_Length: 0
metric_name:os.win.logicaldisk.Disk_Read_Bytes_persec: 10719.2177734375
metric_name:os.win.logicaldisk.Disk_Reads_persec: 0.32209187746047974
metric_name:os.win.logicaldisk.Disk_Write_Bytes_persec: 153323.875
metric_name:os.win.logicaldisk.Disk_Writes_persec: 13.90080738067627
metric_name:os.win.logicaldisk.Free_Megabytes: 23661
metric_name:os.win.logicaldisk.Percent_Free_Space: 31.267839431762695
metric_name:os.win.logicaldisk.Percent_Idle_Time: 99.80695343017578
metric_name:os.win.logicaldisk.Split_IO_persec: 0.33904409408569336
objectname: LogicalDisk
}
Two weeks ago I started collecting VMware performance data with Telegraf, and I can finally use the metrics store for this. The Splunk Add-on for VMware still does not support this.
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/vsphere
I can provide examples of the config files if needed 🙂
with regards
I saw your link to this article in the Splunk Slack metrics channel. I thought you were an expert on this topic, but I should have saved myself the trouble of reading 🙁
Starting with Splunk 8, you can send multiple metric values in one payload, very similar to the Perfmon MK format.
You should use that new feature when you're comparing the metrics index with the Perfmon MK format.
In the end, the new format is even cheaper: you can put a lot of metric values and dimensions in one event and "pay" a maximum of 150 bytes per event.
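For illustration, a multi-metric HEC payload looks roughly like this (a hypothetical sample; everything in fields other than the metric_name:* keys is an arbitrary dimension):

{
  "time": 1600000000,
  "event": "metric",
  "source": "perfmon",
  "fields": {
    "metric_name:logicaldisk.pct_free_space": 31.27,
    "metric_name:logicaldisk.free_megabytes": 23661,
    "instance": "C:",
    "objectname": "LogicalDisk"
  }
}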
https://docs.splunk.com/Documentation/Splunk/8.0.5/Admin/HowSplunklicensingworks
“When ingesting metrics data, each metric event is measured by volume like event data. However, the per-event size measurement is capped at 150 bytes. Metric events that exceed 150 bytes are recorded as only 150 bytes”
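To put that cap in perspective, using the counter list from the inputs above: the LogicalDisk input collects 23 counters per instance. As individual single-measurement metric events, one collection interval for one instance is metered at up to 23 × 150 = 3,450 bytes; packed into a single multi-metric event, the same data is metered at no more than 150 bytes.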
I was actually working with Splunk this year on a conference presentation about saving license and storage by converting PerfmonMK to metrics. We ran into a lot of issues which made the topic non-presentable. I will summarize:
TL;DR:
You can send multiple metrics in a single payload, assuming the number of fields AND rows/instances remains consistent. This is because you have to write a regular expression which takes in the data and transforms it properly; an example is below. The big issue is maintainability, since the expressions have to be created specifically for each type of data. In the example of collecting processor information, you would need to create an expression for each variation in your environment.
We spent months working with the back-end engineering team at Splunk to identify workarounds or solutions for this problem, and we came up empty-handed.
[metric-schema:my_processor_st_1596217721285]
METRIC-SCHEMA-MEASURES = ALLNUMS
METRIC-SCHEMA-WHITELIST-DIMS = collection,category,object

[my_processor_st_extraction]
WRITE_META = 1
REGEX = collection=\"?(?<collection>[^\"\n]+)\"?\ncategory=\"?(?<category>[^\"\n]+)\"?\nobject=\"?(?<object>[^\"\n]+)\"?
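For context, the REGEX above expects each event to begin with a block shaped like this (a hypothetical sample; the numeric counter fields that follow are what ALLNUMS turns into measures):

collection="my_processor"
category="Processor"
object="_Total"
% Processor Time=1.23
% User Time=0.45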
If you would like to discuss further, feel free to reach out to me in the Discord linked at the top of the page.
Furthermore, here is the post which includes the new method for collecting metrics using metrics MKV:
https://xtremeownage.com/2020/02/11/splunk-8-0-1-metrics-vs-events-licensing-comparison-updated-with-metrics-mk/
password is: “inaccurate”
I will have a look at it.
To try to break down the issues we had:
To make everything work directly in vanilla Splunk, you have to have a single expression which can cover the entire event; that part is easy. The challenging part comes when you create your format string, because each field has to have a unique name.
The issue with perfmon data is that the number of columns is typically consistent, but the number of rows is not. Say, for example, you are pulling in data for the perfmon Process object: there is no way to determine how many instances will exist at any given time, and your format string has to work for any number of instances, which is not currently possible within vanilla props/transforms.
To work around this issue, I converted PerfmonMK data to CSV format using transforms, and then ingested the CSV via log-to-metrics (a sketch of the wiring is below), which worked without issues… except that Splunk broke the CSV into multiple rows and re-appended the header to each row, which overall caused license usage to grow beyond normal PerfmonMK event data.
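For reference, the CSV-to-metrics wiring looked roughly like this (a minimal sketch with hypothetical stanza names, assuming a CSV sourcetype whose header row names the counters):

props.conf:
[perfmon_csv]
INDEXED_EXTRACTIONS = CSV
METRIC-SCHEMA-TRANSFORMS = metric-schema:perfmon_csv_schema

transforms.conf:
[metric-schema:perfmon_csv_schema]
METRIC-SCHEMA-MEASURES = ALLNUMS
METRIC-SCHEMA-WHITELIST-DIMS = instance,object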
I guess my issue is that, using metrics, we cannot get to equal or lower license usage than normal PerfmonMK… although, overall, I would say it is still worth switching to metrics in that case. I do have a GUI project which automatically builds out all of the relevant props/transforms/etc.; perhaps I will finish it using the CSV method.
I guess as a positive note from all the work we did this year, it is now a recognized, visible issue to the core Splunk teams… and they are working on it.
On a sad note, we had to cancel our presentation on why everybody SHOULD be using metrics, due to the aforementioned issues with metrics MKV (multiple KEY and VALUE).