So, over a year ago, when metrics were a new concept in Splunk, I ran a licensing and storage comparison here.
Since then, Splunk has made many changes and improvements to how metrics are stored and licensed, so I felt it was time to run another comparison.
How Testing Will Be Performed
Most, if not all, of the test cases will be copied from the old tests.
For testing purposes, I will have three inputs, each pointing at its own separate index. The inputs are configured identically except for the output format, with three variations:
- Regular Perfmon data (the default for the Windows TA)
- Perfmon in MK (multikv) format
- Perfmon as metrics
For testing, I will be collecting the LogicalDisk perfmon object at a 15-second interval, with a very generous handful of counters selected, to collect a lot of data rather quickly.
Inputs.conf

[perfmon://LogicalDisk_Reg]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index = Disk_PerfMon_Regular
showZeroValue = 1

[perfmon://LogicalDisk_MK]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index = Disk_PerfMon_MK
mode = multikv
showZeroValue = 1

[perfmon://LogicalDisk_Metric]
counters = % Free Space; Free Megabytes; Current Disk Queue Length; % Disk Time; Avg. Disk Queue Length; % Disk Read Time; Avg. Disk Read Queue Length; % Disk Write Time; Avg. Disk Write Queue Length; Avg. Disk sec/Transfer; Avg. Disk sec/Read; Avg. Disk sec/Write; Disk Transfers/sec; Disk Reads/sec; Disk Writes/sec; Disk Bytes/sec; Disk Read Bytes/sec; Disk Write Bytes/sec; Avg. Disk Bytes/Transfer; Avg. Disk Bytes/Read; Avg. Disk Bytes/Write; % Idle Time; Split IO/Sec
object = LogicalDisk
instances = *
disabled = 0
interval = 15
useEnglishOnly = true
index = Disk_PerfMon_Metrics
showZeroValue = 1
Transforms.conf
[metrics-hostoverride]
DEST_KEY = MetaData:Host
REGEX = host=(\S+)
FORMAT = host::$1

[value]
REGEX = .*Value=(\S+).*
FORMAT = _value::$1
WRITE_META = true

[perfmon_metric_name]
REGEX = .*object=(\S+).*counter=(\S+).*
FORMAT = metric_name::$1.$2 metric_type::$1
WRITE_META = true

[instance]
REGEX = .*instance=(\S+).*
FORMAT = instance::$1
WRITE_META = true
Props.conf
[source::Perfmon:*Metric]
TRANSFORMS-_value = value
TRANSFORMS-metric_name = perfmon_metric_name
TRANSFORMS-instance = instance
SEDCMD-remove-whitespace = s/ /_/g s/\s/ /g
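To illustrate the intent of these transforms: a regular Perfmon event (a hypothetical sample; the exact layout varies by Windows TA version) looks roughly like

collection=LogicalDisk
object=LogicalDisk
counter=%_Free_Space
instance=C:
Value=31.26

and the transforms above turn it into an indexed metric data point with metric_name::LogicalDisk.%_Free_Space, instance::C:, and _value::31.26. (The SEDCMD replaces spaces with underscores so the \S+ captures can grab whole counter names.)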
Indexes.conf
[disk_perfmon_regular]
coldPath = $SPLUNK_DB\disk_perfmon_regular\colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\disk_perfmon_regular\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\disk_perfmon_regular\thaweddb

[disk_perfmon_mk]
coldPath = $SPLUNK_DB\disk_perfmon_mk\colddb
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\disk_perfmon_mk\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\disk_perfmon_mk\thaweddb

[disk_perfmon_metrics]
coldPath = $SPLUNK_DB\disk_perfmon_metrics\colddb
datatype = metric
enableDataIntegrityControl = 0
enableTsidxReduction = 0
homePath = $SPLUNK_DB\disk_perfmon_metrics\db
maxTotalDataSizeMB = 512000
thawedPath = $SPLUNK_DB\disk_perfmon_metrics\thaweddb
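Once data is flowing, the metrics index is searched with mstats rather than tstats. A quick sanity-check sketch (assuming the transforms above yield metric names of the form object.counter, e.g. LogicalDisk.Disk_Reads/sec):

| mstats avg(_value) WHERE index=disk_perfmon_metrics metric_name="LogicalDisk.Disk_Reads/sec" span=1m BY instance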
Testing will be performed on a fresh install of Splunk Enterprise 8.0.1, on my workstation: 32 GB of RAM and a Xeon processor. (Don't worry, I am already trying to get ahold of a Ryzen….)
No additional or third-party apps are installed.
If you would like to reproduce my results, you can do a fresh install of Splunk Enterprise and add the four configuration files listed above.
I added the configuration files, restarted Splunk, and took a lunch break.
When I returned, I disabled the inputs and restarted Splunk, for a total of 25 minutes of testing.
Here are the results. The methods to obtain the data are below.
Event Count Query
Just a quick count of events to ensure we are fairly grading the results. Event indexes and metric indexes have to be counted separately (tstats for events, mstats for metrics), hence the union:
| tstats count WHERE index=disk* groupby index | union [| mstats count where index=disk* metric_name=* groupby index ]
Storage Usage
Storage utilization was obtained in Windows Explorer by going to C:\Program Files\Splunk\var\lib\splunk, right-clicking the folder for each of the three indexes, and recording "Size on disk".
License Utilization
index=_internal source="C:\\Program Files\\Splunk\\var\\log\\splunk\\license_usage.log" | stats sum(b) as Size by idx | eval Size = Size/1024
(The b field is measured in bytes, so dividing by 1024 yields KB.)
Test Results – 25 Minutes
| Index | Event Count | Disk Size | License Usage |
|---|---|---|---|
| disk_perfmon_mk | 100 | 248 KB | 140 KB |
| disk_perfmon_regular | 9,200 | 492 KB | 1,160 KB |
| disk_perfmon_metrics | 9,200 | 468 KB | 1,239 KB |
Statistics
| Statistic | Value |
|---|---|
| % Licensing Difference, MK vs. Metrics | 785% |
| % Disk Difference, MK vs. Metrics | 88.7% |
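For clarity, these differences are computed as (larger - smaller) / smaller. For example, licensing: (1,239 - 140) / 140 ≈ 785%, and disk: (468 - 248) / 248 ≈ 88.7%.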
Further Testing
At this point, I re-enabled the inputs, restarted Splunk, started the stopwatch, and let it run for another 45 minutes.
I am curious to see the trend with more data. While I am certain Perfmon MK will be the hands-down winner in all of these tests, I want to see the longer-term results.
Test Results – 1 Hour
| Index | Event Count | Disk Size | License Usage |
|---|---|---|---|
| disk_perfmon_mk | 279 | 672 KB | 403 KB |
| disk_perfmon_regular | 25,668 | 724 KB | 3,340 KB |
| disk_perfmon_metrics | 25,668 | 1,120 KB | 3,569 KB |

| Statistic | Value |
|---|---|
| % Licensing Difference, MK vs. Metrics | 785% |
| % Disk Difference, MK vs. Metrics | 66% |
| % Licensing Difference, Perfmon vs. Metrics | 7% |
| % Disk Difference, Perfmon vs. Metrics | 54% |
Conclusions
I was under the impression that metrics licensing had improved in Splunk 8; however, compared to the PerfmonMK format, there is still plenty of room for improvement.
While I will continue to use metrics for many use cases, mostly due to their ease of use, I would be cautious about converting your existing PerfmonMK data to metrics.
If I apply the 785% increase in licensing to what I am collecting in my production environment, I would go from 3 GB daily to roughly 26.5 GB daily (3 GB × 8.85) for my PerfmonMK traffic. While this would only account for a ~1% increase in my daily licensing, it is still something to be aware of.
index=index_utilization_summary st=PerfmonMK*
| stats sum(bytes) as TotalMKLicense
| eval Total_MK_GB = TotalMKLicense / 1024 / 1024 / 1024
| eval Total_Metrics_GB = (TotalMKLicense * 8.85) / 1024 / 1024 / 1024
(Note that a 785% increase corresponds to multiplying the baseline by 8.85.)
In my opinion, the additional speed, performance, and usability of metrics would likely outweigh the 1% impact to MY licensing. However, for customers licensed for 100-500 GB per day, this impact would be far more considerable.
If you are currently using the regular Perfmon format instead of PerfmonMK, I would recommend considering changing your collections to metrics, as it is only a 7% difference in licensing. I also anticipate that the disk-usage difference for metrics will shrink as the indexes grow.
Absolutely, EXACTLY what I was looking for. Thanks so much for this! Great tests all around.
It's a hair old though, so I cannot promise the data is still accurate! I am willing to bet they have improved the performance of metrics a good amount since this article was written.
Thank you for the detailed and informative answer.
I’m not sure what you mean with: “This will only work with a static number of fields/instances.” Is this only a problem with the perfmon modular input agent or in general?
I was using perfmon data with a metrics index for some time (yes, it was more expensive than PerfmonMK, but the performance was so much better 😉).
But getting it working with the new format was a painful experience 🙁
So, I gave up and switched to Telegraf. It supports Splunk multi-metric events and works great.
Here is all the information you need:
https://www.splunk.com/en_us/blog/it/the-daily-telegraf-getting-started-with-telegraf-and-splunk.html
Splunk Metrics serializer
https://github.com/influxdata/telegraf/tree/master/plugins/serializers/splunkmetric
Windows Performance Counters Input Plugin
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/win_perf_counters
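For a taste, a minimal telegraf.conf sketch combining the two plugins above (untested; the HEC URL, token, and the abbreviated counter list are placeholders):

# Collect Windows performance counters for LogicalDisk
[[inputs.win_perf_counters]]
  [[inputs.win_perf_counters.object]]
    ObjectName = "LogicalDisk"
    Instances = ["*"]
    Counters = ["% Idle Time", "% Free Space", "Free Megabytes", "Disk Reads/sec", "Disk Writes/sec"]
    Measurement = "os.win.logicaldisk"

# Ship the metrics to a Splunk HTTP Event Collector in splunkmetric format
[[outputs.http]]
  url = "https://splunk.example.com:8088/services/collector"
  data_format = "splunkmetric"
  splunkmetric_hec_routing = true
  [outputs.http.headers]
    Authorization = "Splunk <your-hec-token>"
    Content-Type = "application/json"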
| msearch index=windows_metric
is giving me:
{
instance: C:
metric_name:os.win.logicaldisk.Avg._Disk_Bytes/Read: 33280
metric_name:os.win.logicaldisk.Avg._Disk_Bytes/Write: 11029.853515625
metric_name:os.win.logicaldisk.Avg._Disk_Read_Queue_Length: 0.00010245080920867622
metric_name:os.win.logicaldisk.Avg._Disk_Write_Queue_Length: 0.0019629651214927435
metric_name:os.win.logicaldisk.Avg._Disk_sec/Read: 0.00031807893537916243
metric_name:os.win.logicaldisk.Avg._Disk_sec/Write: 0.0001412120764143765
metric_name:os.win.logicaldisk.Current_Disk_Queue_Length: 0
metric_name:os.win.logicaldisk.Disk_Read_Bytes_persec: 10719.2177734375
metric_name:os.win.logicaldisk.Disk_Reads_persec: 0.32209187746047974
metric_name:os.win.logicaldisk.Disk_Write_Bytes_persec: 153323.875
metric_name:os.win.logicaldisk.Disk_Writes_persec: 13.90080738067627
metric_name:os.win.logicaldisk.Free_Megabytes: 23661
metric_name:os.win.logicaldisk.Percent_Free_Space: 31.267839431762695
metric_name:os.win.logicaldisk.Percent_Idle_Time: 99.80695343017578
metric_name:os.win.logicaldisk.Split_IO_persec: 0.33904409408569336
objectname: LogicalDisk
}
Two weeks ago I started collecting VMware performance data with Telegraf, and I can finally use the metrics store for this. The Splunk Add-on for VMware still does not support this.
https://github.com/influxdata/telegraf/tree/master/plugins/inputs/vsphere
I can provide examples of the config files if needed 🙂
with regards
I saw your link to this article in the Splunk Slack metrics channel. I thought you were an expert on this topic, but I should have saved myself the trouble of reading 🙁
Starting with Splunk 8, you can send multiple metric values in one payload, very similar to the Perfmon MK format.
You should use that new feature when you're comparing the metrics index with the Perfmon MK format.
In the end, the new format is even cheaper: you can put a lot of metric values and dimensions in one event and "pay" a maximum of 150 bytes per event.
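For illustration, a multi-metric HEC payload looks roughly like this (a hypothetical sample; everything in fields other than the metric_name:* keys is an arbitrary dimension):

{
  "time": 1600000000,
  "event": "metric",
  "source": "perfmon",
  "fields": {
    "metric_name:logicaldisk.pct_free_space": 31.27,
    "metric_name:logicaldisk.free_megabytes": 23661,
    "instance": "C:",
    "objectname": "LogicalDisk"
  }
}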
https://docs.splunk.com/Documentation/Splunk/8.0.5/Admin/HowSplunklicensingworks
“When ingesting metrics data, each metric event is measured by volume like event data. However, the per-event size measurement is capped at 150 bytes. Metric events that exceed 150 bytes are recorded as only 150 bytes”
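To put that cap in perspective, using the counter list from the inputs above: the LogicalDisk input collects 23 counters per instance. As individual single-measurement metric events, one collection interval for one instance is metered at up to 23 × 150 = 3,450 bytes; packed into a single multi-metric event, the same data is metered at no more than 150 bytes.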
I was actually working with Splunk this year on a conference presentation about saving license and storage by converting PerfmonMK to metrics. We ran into a lot of issues which made the topic non-presentable. I will summarize:
TL;DR:
You can send multiple metrics in a single payload, assuming the number of fields AND rows/instances remains consistent. This is because you have to write a regular expression which takes in the data and transforms it properly; an example is below. The big issue is maintainability, since the expressions have to be created specifically for each type of data. In the example of collecting processor information, you would need to create an expression for each variation in your environment.
We spent months working with the back-end engineering team at Splunk to identify workarounds or solutions for this problem, and we came up empty-handed.
[metric-schema:my_processor_st_1596217721285]
METRIC-SCHEMA-MEASURES = ALLNUMS
METRIC-SCHEMA-WHITELIST-DIMS = collection,category,object

[my_processor_st_extraction]
WRITE_META = 1
REGEX = collection=\"?(?<collection>[^\"\n]+)\"?\ncategory=\"?(?<category>[^\"\n]+)\"?\nobject=\"?(?<object>[^\"\n]+)\"?
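For context, the REGEX above expects each event to begin with a block shaped like this (a hypothetical sample; the numeric counter fields that follow are what ALLNUMS turns into measures):

collection="my_processor"
category="Processor"
object="_Total"
% Processor Time=1.23
% User Time=0.45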
If you would like to discuss further, feel free to reach out to me in the Discord linked at the top of the page.
Furthermore, here is the post which includes the new method for collecting metrics using metrics MKV:
https://xtremeownage.com/2020/02/11/splunk-8-0-1-metrics-vs-events-licensing-comparison-updated-with-metrics-mk/
password is: “inaccurate”
I will have a look at it.
To try to break down the issues we had:
To make everything work directly in vanilla Splunk, you have to have a single expression which can cover the entire event; that part is easy. The challenging part comes when you create your format string, because each field has to have a unique name.
The issue with perfmon data is that the number of columns is typically consistent, but the number of rows is not. Say, for example, you are pulling in data for the perfmon Process object: there is no way to determine how many instances will exist at any given time, and your format string has to work for any number of instances, which is not currently possible within vanilla props/transforms.
To work around this issue, I converted PerfmonMK data to CSV format using transforms, and then ingested the CSV via log-to-metrics (a sketch of the wiring is below), which worked without issues… except that Splunk broke the CSV into multiple rows and re-appended the header to each row, which overall caused license usage to grow beyond normal PerfmonMK event data.
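For reference, the CSV-to-metrics wiring looked roughly like this (a minimal sketch with hypothetical stanza names, assuming a CSV sourcetype whose header row names the counters):

props.conf:
[perfmon_csv]
INDEXED_EXTRACTIONS = CSV
METRIC-SCHEMA-TRANSFORMS = metric-schema:perfmon_csv_schema

transforms.conf:
[metric-schema:perfmon_csv_schema]
METRIC-SCHEMA-MEASURES = ALLNUMS
METRIC-SCHEMA-WHITELIST-DIMS = instance,object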
I guess my issue is that, using metrics, we cannot get to equal or lower license usage than normal PerfmonMK… although, overall, I would say it is still worth switching to metrics in that case. I do have a GUI project which automatically builds out all of the relevant props/transforms/etc.; perhaps I will finish it using the CSV method.
I guess as a positive note from all the work we did this year, it is now a recognized, visible issue to the core Splunk teams… and they are working on it.
On a sad note, we had to cancel our presentation on why everybody SHOULD be using metrics, due to the aforementioned issues with metrics MKV (multiple KEY and VALUE).