We have a Splunk deployment with two indexers and a search head running Splunk 5.0.4 and SplunkforPaloAltoNetworks 3.3.1 (upgrades to both are planned "soon"). A separate machine is configured as a heavy forwarder: it accepts PAN logs via syslog and feeds the resulting log files into Splunk.
On the search head, we have noticed that a number of the default saved searches (many scheduled every five minutes) take longer than five minutes to complete. After some snooping, we found that the system is significantly IO bound, despite having two mirrored 10K disks. Based on iostat, the big writers appear to be splunk-optimize processes running against various pan_* indexes, sometimes driving more than 100 MB/s of combined reads and writes in a single one-second sample.
Example iostat output:
# iostat -dmx 1
Linux 2.6.32-358.11.1.el6.x86_64 ($HOSTNAME) 01/09/2014 x86_64 (48 CPU)
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 2.70 6861.97 311.25 196.04 18.92 27.57 187.69 0.07 0.14 1.23 62.48
sda 0.00 323.00 727.00 74.00 43.59 9.50 135.75 23.06 66.64 1.25 100.00
sda 1.00 0.00 722.00 0.00 46.54 0.00 132.02 9.69 13.45 1.39 100.10
sda 0.00 17340.00 397.00 167.00 23.70 46.19 253.76 82.39 75.52 1.77 99.90
sda 0.00 13451.00 15.00 536.00 1.21 69.92 264.38 137.79 262.49 1.81 100.00
sda 0.00 103.00 636.00 69.00 43.25 7.59 147.68 19.74 74.71 1.42 100.10
sda 1.00 0.00 606.00 0.00 53.51 0.00 180.83 6.41 10.74 1.65 99.90
sda 12.00 22863.00 351.00 93.00 36.40 36.87 337.96 45.89 31.84 2.25 100.10
sda 9.00 12449.00 12.00 223.00 2.46 85.75 768.75 104.26 385.26 4.25 99.90
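In case it's useful for anyone reproducing this: iostat only reports per device, so per-process attribution can be done with pidstat (from the same sysstat package), plus a quick ps to see which paths the splunk-optimize processes were launched against. Treat the exact invocations below as approximate:
# pidstat -d 1 5                                # per-process kB_rd/s and kB_wr/s, one-second samples
# ps -eo pid,args | grep '[s]plunk-optimize'    # the arguments should point at the index/bucket directory being optimized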
Obviously, we can throw more disks at the problem to resolve the I/O congestion, but we are running the recommended configuration for a search head. Is the PAN App fairly unique in its extensive use of tsidx files on the search head, such that it requires faster I/O than 'normal', or is there something obvious I'm missing configuration-wise that would help? For that matter, am I looking in completely the wrong spot?
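For completeness, the overrunning searches themselves are easy to spot from the scheduler events in _internal (run_time is reported in seconds, so anything north of 300 is blowing its five-minute window). Something along these lines, with the install path and time range just as examples:
# /opt/splunk/bin/splunk search 'index=_internal source=*scheduler.log earliest=-24h@h savedsearch_name=* | stats max(run_time) AS max_run_time count BY savedsearch_name | sort - max_run_time'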