2 Working with Configuration Files and Indexes

1 Understanding Splunk Admin Basics and License Management

1-1 A Day in Splunk Enterprise Administrator’s Life

Splunk Enterprise Systems Administrators and Data Administrators have different responsibilities

Splunk Enterprise Systems Administrator Tasks

Install Splunk Software
Create and Manage Indexes
Manage Splunk Licenses
Configure Security
Monitor Splunk and respond to Monitoring Console Alerts

Splunk Enterprise Data Administrator Tasks

Create and manage inputs
Manage parsing of data including line-breaking and timestamp extraction
Design and establish new ingestion pipelines
Manage Splunk configuration files
Collaborate with users on data on-boarding

1-2 Splunk Enterprise Components

Alt Image Text

On the left side, you have machine data sources. These are servers, containers or even network devices. They generate mission data.
The mission data from those systems are collected and indexed in Splunk platform.
The users interact with Splunk platform to search, report, and analyze.

Three Core Components

Indexer
- Receive, parse and store machine data in files. Serve search requests
- it is important to know that the same splunkd daemon does both writing and reading the files from the disk
Search Head
- Web interface for the users. Dispatch searches to Indexers
Universal Forwarder
- Collects data from the clients and forwards for Indexing

how Splunk works

Alt Image Text

Client System:

Left side, you have the client system. The applications write logs to log files.

Splunk universal forwarder, which is a Splunk process, reads those log files and sends them to the indexer via TCP.

Indexers:

The indexers receive the data, parses them, organizes them in index files.

Search head process

The search head process, which is again another splunkd process, is used as a web interface for the user. The search head is responsible for dispatching searches to the indexers, collect the data, and make it available for the user.

Other Splunk Components

License Master
Deployment Server
Cluster Master
Search Head Deployer
Monitoring Console
Heavy Forwarder

License master

The License Master manages Splunk licenses
Other Splunk Components(such as indexer and searches) are License Slaves
Can be a co-located with other components such as Monitoring Console
Licenses can be managed through Splunk Web

deployment server

Manages the Configuration files on the Deployment Clients
Maintains configuration in serverclass.conf
- Deployment clients typically mean the servers on which universal forwarders are installed.
Alternative solutions such as ansible/puppet can be used
Configuration files are packaged as apps
- An app is a Splunk terminology. App simply means a set of configuration files organized in order.
Deployment clients periodically poll Deployment Server
- Deployment clients periodically poll deployment server and then download files that are designated for them, so it is rather a pull mechanism and not a push mechanism.

Cluster Master

Cluster Master manages the Indexer Cluster
There is only one Cluster Master
- There is no high availability for the cluster master.
- You can have a passive standby, but it cannot be active.
Maintains data bucket status and handles replication
Distributes configuration files and apps to Cluster members

Search Head

Search Head Deployer distributes apps and configuration files to Search Head Cluster Members
Keeps the files in $SPLUNK_HOME/etc/shd-apps
Cannot run on the same instance as a cluster member

Monitoring Console

Monitoring Console is a Web App that helps to monitor the system health
Rich set of charts and statistics
One-stop-shop to monitor everything
Only Administrators have access

Heavy forwarders

Heavy forwarders can parse data before forwarding to indexer
Full Splunk Enterprise binary with distributed search disabled
Can also index data locally
Smaller footprint compared to indexer

1-3 Splunk Licensing Options

Splunk is not free. Splunk Licensing is based on the amount of data indexed

A critical thing to note because it does not depend on the number of clients, as some of the licensing model works.

It does not depend on how many processes connect to Splunk indexers.

It is the amount of data that is received and indexed into Splunk platform.

Types of Splunk Licenses

Enterprise License
Industrial IoT License
Free License
Forwarder License
Trial License
Dev/Test License

Enterprise License

The Enterprise License can be bought for any indexing volume
Enables all Splunk features including clustering and distributed search
No-enforcement. Users can still search after a License violation
Licenses can be stacked

You can purchase a license for any indexing volume from Splunk.

Free License

The Free license includes 500MB/day indexing, for life
Disabled features
- Clustering
- Authentication
- Distributed Search
- Alerting
- Deployment management

Trial License

The Trial License provides full Splunk features for 60 days
After 60 days, it automatically becomes Free License
Maximum 500 MB daily volume
Sales Trial license can provided for customized limits

Industrial IoT License

Splunk for Industrial IoT License
Not stackable
Access to Splunk Enterprise and a select premium Splunk Apps

Forwarder License

Forwarder License allows forwarding of unlimited data
Cannot be used for indexing
No need to purchase them separately
Universal Forwarders automatically apply Forwarder License
Heavy Forwarders must be converted to Forwarder License group

Dev/Test License

Dev/Test license for running Splunk in Non Prod environments
Cannot be used in distributed environment
Not stackable
Can be used for Splunk App development

What counts towards daily License quota?

Full size of data flowing through the parsing pipeline
- Whatever gets into the parsing queue is what gets charged.
Not the disk storage
- It is not the ultimate final disk storage because the data gets compressed and also added with the metadata and other indexing data in the disk.
Does not include
- What is being charged is the data that flows through the parsing pipeline before it gets written to the disk
Replicated Data
- The data does not include replicated data.
Summary indexes
Internal Logs
- Splunk Splunks itself. There's tons of logs that Splunk generates, and they all are indexed in _internal index.
Metadata

Licensing in Distributed Environment

Alt Image Text

The license master is the only server that has the enterprise license stack installed.

All other components in your environment act as license slaves, and they simply connect to the license master.

It is configured in server.conf on the licensed slaves.

1-4 Managing License Violations

License Warnings and Violations

Exceeding daily volume quota results in a Warning
5 or more warnings in a 30 day rolling period is a Violation
Searching is NOT disabled in violation period
Alert logged in Messages on any Splunk Web pages

Monitoring License Warnings

Monitoring Console: Enable the license monitoring alerts
Licensing Page in Splunk Web: Current and permanent violations
Usage Report in Splunk Web: Current and previous 30 days usage

Handling License Violations

Review heavy hitters (Usage Report) and adjust intake
The daily limit resets at midnight
Buy more license

Demo

Licensing page in Splunk Web
Review Warnings and Violations
Usage Report
- Current
- Past 30 days
Enable Monitoring Console License Alerts

Alt Image Text

2 Working with Splunk Configuration Files

Understanding Splunk configuration files
Structure of Splunk configuration files
Layering the configuration files
- Index-time
- Search-time
Precedence of configuration files
Using btool to troubleshoot

2-1 Understanding Splunk Configuration Files

Splunk platform is configured using set of text-based configuration files

Splunk Configuration Files

Text files located in SPLUNK_HOME/etc/…
.conf extension (Examples: server.conf, deploymentclient.conf)

Typically, it's /opt/splunk in UNIX‑based systems.

Changes made via Splunk Web update configuration files
Contain stanzas and key value pairs
Govern a Splunk functionality

For example, props.conf governs how data is parsed.

Server.conf governs the system‑level parameter of the splunkd process.

There are only two Splunk software packages: Splunk Enterprise and Splunk Universal Forwarder

Splunk Software Packages

Splunk Enterprise

Indexer
Search Head
License Master
Deployment Server
Cluster Master
Search Head Deployer
Monitoring Console
Heavy Forwarder

Splunk Universal Forwarder

Deployment Client

Splunk Platform Directory Structure

Alt Image Text

On top, we have SPLUNK_HOME. In general, it is /opt/splunk in Linux‑based systems, but it is any directory that you chose to install Splunk.
bin directory under SPLUNK_HOMEthat contains Splunk commands, for example, the ./splunk command that we typically use to restart a Splunk process is located under SPLUNK_HOME bin.
etc. It has three major directories.
- The first is apps. Under apps, we have configuration files packaged as apps. We will see that apps is one of the core ways Splunk manages configuration files.
- users directory, under which we have user‑specific configuration files.
  - For example, if a user logs onto Splunk Web and then creates an alert, the alert configuration goes inside the users directory.
- System. Here is where system‑wide configuration files are stored.

If the user shares the alert, the configuration is moved to the apps directory

var directory under SPLUNK_HOME that contains two major directories.
- log in which we have Splunk platform's log files.
  - The index in which the logs are stored is _internal.
- Then we have lib directory, under which the indexes live. Index is where the mission data is stored.

Structure of Splunk Configuration File

Configuration Files Are Made of Stanzas and Key-value Pairs

indexes.conf

[default]
maxTotalDataSizeMB = 650000
maxGlobalRawDataSizeMB = 0

# idx1 index settings
[idx1]
homePath = volume:hot1/idx1
coldPath = volume:cold2/idx1

inputs.conf

[monitor:///var/log/httpd]
sourcetype = access_common
ignoreOlderThan = 7d
index = web

Two Ways to Learn Splunk Configuration Files

README files

$SPLUNK_HOME/etc/system/README directory contains example and spec files

Spec files at docs.splunk.com https://docs.splunk.com/Documentation/Splunk

Alt Image Text

PLUNK HOME/etc/system/README

2-2 Layering Splunk Configuration Files

Splunk App

Splunk's way of organizing configuration files
A directory under SPLUNK_HOME/etc/apps
Contains Splunk configuration files
Can also contain scripts and other necessary artifacts
An add-on is an app that usually does not contain GUI

Search & Reporting app

Splunk ships with Search & Reporting app which is a prebuilt general-purpose app.
The configuration is stored under SPLUNK_HOME/etc/apps/search

Splunk App Directory Structure

Alt Image Text

We have another special folder called metadata that contains default.meta and/or local.meta that defines the permissions for this app

Default vs. Local Directories

Default
- Shipped with Splunk
- Will be overwritten upon Splunk upgrade
- Files should not be updated
- Contains default settings
- Does not override local
Local
- User created
- Will be preserved upon Splunk upgrade
- Recommended place to modify files
- User-specific configuration changes
- Always overrides default

All the Configuration File Locations

etc/system/default
etc/system/local
etc/apps/search/default
etc/apps/search/local
etc/apps/<app>/default
etc/apps/<app>/local
etc/users/<user>/<app>/local
Do not update files in etc/system/default

Never update files in etc/system/default. It is the default system where the configuration, and it is bound to be updated or overwritten whenever you upgrade Splunk platform.

2-3 Mastering Splunk Configuration Files Precedence

Index-time and Search-time

Index-time: Global context, such as input/parsing configuration

configuration with a global context such as inputs.conf

Search-time: App/User scoped, such as a user’s knowledge objects

would be saved to searches.conf in which a user can have scheduled reports and even alerts

Alt Image Text

Index-time Precedence

Alt Image Text

etc/system/local. Any configuration here will override configuration in any other location. This is the gold standard, the mother of all configuration files, etc/system/local.
etc/apps/search/local. Search is a special pre‑built app that comes with Splunk.
etc/apps, any custom app, in this case, app1/local.
etc/apps/search/default.
etc/apps, any app, in this case, app1, default.
etc/system/default. That gets the lowest precedence in any case, etc/system/default.

if there are two or more apps that have conflicting settings, app directory name with the highest ASCII order wins.

Search-time Precedence

Alt Image Text

etc/users, username, and then the particular app he is working on, /local. You can see that etc/system/local does not get the top precedence.
etc/apps, the app in which the user is working on right now, /local.
etc/apps, app name, default.
etc/apps/search/local.
etc/apps/search/default.
etc/system/local. You can see that during index time, etc/system/local is the most important directory, whereas during search time, it is almost the last.
etc/system/default.

How Splunk merges configuration files.

Upon startup, Splunk merges configuration files for each type, very important point.

For example, if there are multiple limits.conf spread across various applications, or apps rather, upon startup, Splunk will merge all the configurations into one single limits.conf.

The resulting file combines settings from various directory locations.
- It is a union of all settings for the same configuration file that is available across different directories.
Only one file per file type will be used at runtime.

It does not matter how many inputs.conf are out there or how many props.conf are out there.

For a given type, there is only one configuration file at runtime. If there is a conflict, precedence is applied.

If there is a conflict, precedence is applied
Local takes precedence over default

How layering and precedence works.

consider props.conf. And in this section of props.conf, it happens to be processed during index time.

Configuration File Merging Example 1

etc/apps/karun_app_props/local/props.conf

[web:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-

etc/apps/joe_app_props/local/props.conf

[app:log]
TIME_FORMAT = %Y-%m-%d %H:%M:%S%Z
TIME_PREFIX = ^

etc/apps/jack_app_props/local/props.conf

[db:log]
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE=true

[web:log]
category = web

Effective props.conf at runtime

[web:log]
SHOULD_LINEMERGE = false
LINE_BREAKER = ([\r\n]+)\d{4}-
category = web

[app:log]
TIME_FORMAT = %Y-%m-%d %H:%M:%S%Z
TIME_PREFIX = ^

[db:log]
BREAK_ONLY_BEFORE_DATE = true
SHOULD_LINEMERGE=true

Configuration File Merging Example 2

etc/system/default/limits.conf
[inputproc]
file_tracking_db_threshold_mb = 500
learned_sourcetypes_limit = 1000


etc/apps/karun_limits_app/local/limits.conf
[inputproc]
max_fd = 500

etc/system/local/limits.conf
[inputproc]
max_fd = 300

Effective limits.conf at runtime

[inputproc]
file_tracking_db_threshold_mb = 500
learned_sourcetypes_limit = 1000
max_fd = 300

file_tracking_db_threshold_mb： The first parameter defines the maximum size of fishbucket database，after which a new database file will be created. Fishbucket database is used to keep track of monitoring the input files and its progress.
Learned_sourcetypes_limit limits the maximum number of automatically created source types.

When overriding configuration with a local file, do not copy the entire file from default. Just add the overriding configuration.

Overriding the Default Configurations

Correct Way to Override Defaults

etc/system/default/props.conf

[default]
CHARSET = UTF-8
LINE_BREAKER_LOOKBEHIND = 100
TRUNCATE = 10000
DATETIME_CONFIG = /etc/datetime.xml
ADD_EXTRA_TIME_FIELDS = True
ANNOTATE_PUNCT = True

TRUNCATE: defines the maximum number of lines after which an event will be truncated. So by default, it is 10000.

Now in order to override this, all you need to do is to simply redefine TRUNCATE with the value that you want in etc/system/local/props.conf.

Again, do not copy the entire props.conf from default. Simply override it with etc/system/local/props.conf.

etc/system/local/props.conf

[default]
TRUNCATE = 50000

Note that etc/system/local/props.conf is the mother of all configuration files.

As an added tip, the two locations where default configurations are stored out of the box is etc/system/default and etc/apps/search/default.

2-4 Using btool to Work with Splunk Configuration Files

A Splunk command
Located in SPLUNK_HOME/bin
Retrieves the on-disk configuration of a Splunk configuration file
Syntax: splunk btool <conf file name> list [options]
--debug option shows the exact .conf file location

Btool Example

/opt/splunk/bin# ./splunk btool inputs list monitor:///var/log

[monitor:///var/log]
_rcvbuf = 1572864
disabled = false
host = Karun-Mac-Container
index = main


/opt/splunk/bin# ./splunk btool inputs list monitor:///var/log --debug
/opt/splunk/etc/apps/search/local/inputs.conf [monitor:///var/log]
/opt/splunk/etc/system/default/inputs.conf _rcvbuf = 1572864
/opt/splunk/etc/apps/search/local/inputs.conf disabled = false
/opt/splunk/etc/apps/search/local/inputs.conf host = Karun-Mac-Container
/opt/splunk/etc/apps/search/local/inputs.conf index = main

retrieving the contents of inputs

As you can see, there are two inputs.conf in play here. One comes from etc/apps/search/local, and the other one comes from etc/system/default.inputs.conf.

/opt/splunk/bin$ ./splunk status

/opt/splunk/bin$ ps -ef | grep splunk

splunk -p 8089 start

cd system/default
/opt/splunk/etc/system/default$ ls
alert_actions.conf ....
app.conf

$ vi props. conf
# This file specifies how to parse a given log file

Add new data inputs

Alt Image Text

Start searching

Alt Image Text

source="/var/log/*" host="Karun-Mac"

Alt Image Text

New input file created

/opt/splunk/etc/apps/search/local: 1s -lrt

inputs.conf

[splunktep://9997]
connection_host = ip

[monitor:///var/log]
disabled = false
host = Karun-Mac
index = main

Alt Image Text

/opt/splunk/bin$ ./splunk btool inputs list


/opt/splunk/bin$ ./splunk tool inputs lis monitor

[monitor:///opt/splunk/etc/splunk.version]
...
[monitor:///opt/splunk/var/loq/introspection]

./splunk btool inputs list monitor --debug
...
/opt/splunk/etc/apps/search/local/inputs.conf   [monitor:///var/log]  
/opt/splunk/etc/system/default/inputs.conf      _revbuf = 1572864
/opt/splunk/etc/apps/search/local/inputs.conf   disabled = false
/opt/splunk/etc/apps/search/local/inputs.conf   host = Karun-Mac
/opt/splunk/etc/apps/search/local/inputs.conf   index = main

/opt/splunk/bin$./splunk btool inputs list monitor:///var/log

[monitor:///var/log]
_revbuf = 1572864
disabled = false
host = Karun-Mac
index = main

2-5 More Examples of Using Btool

Review a Splunk configuration file
Locate the btool command
Use btool to retrieve configuration
Use btool to analyze conflicts
Using REST API to retrieve configuration

dpkg source type

Now I know that I did not manually create this sourcetype. Splunk uses sourcetype to parse the data, so I need to find out where the source type dpkg is defined.

Alt Image Text

./splunk btool props list dpkg

/opt/splunk/etc/apps/learned/local/props.conf [dpkg]

install Splunk *NIX TA, the Splunk UNIX TA

The *NIX TA can be obtained from splunkbase.solunk.com

Alt Image Text

/opt/splunk/etc/apps$ ls-lrt
...
Splunk_TA_nix


cd opt/splunk/etc/apps/Splunk_TA_nix/local$ ls -lrt

vi inputs.conf

Default is disable, enable all the input

Alt Image Text

./splunk btool inputs list script://./bin/top.sh --debug

How we can access Splunk configuration via rest API

| rest services/properties

Alt Image Text

| rest services/properties/props

| rest services/properties/props/dpkg/MAX_EVENTS

Alt Image Text

3 Understanding Splunk Index

Overview

How Splunk organizes data
Index buckets
- Hot
- Warm
- Cold
- Frozen
- Thawed
Creating an index
- Using Splunk Web
- Using configuration files
- One index vs multiple indexes
Index data integrity

3-1 How Splunk Organizes Data

Splunk stores data in indexes, which are organized in directories and files in disk

Splunk Indexes

Indexes are stored in SPLUNK_HOME/var/lib/splunk
The directory location is customizable for each index
Indexes contain raw data and index files
Indexes can be created by an Administrator
Many prebuilt indexes
- _internal
- _audit
- main

Alt Image Text

Another useful internal index is _introspectionwhich stores Spunk platform's performance metrics

Inside a Splunk Index

Raw data: Raw data is stored in compressed format
Tsidx files: Time series index files that point to the raw data
metadata: Metadata files such as Sources.data, SourceTypes.data and Hosts.data

3-2 Splunk Data Buckets

Buckets

Indexes store data in buckets
Buckets are set of directories organized by age
Contain raw data, tsidx files and metadata
As the index grows, number of buckets grows as well

Types of Buckets

Hot
Warm
Cold
Frozen
Thawed

Hot

Hot buckets contain the newest data
They are open for both read and write
There can be more then one hot bucket in an index
Searchable
Roll to warm bucket when buckets reach certain size, or upon Splunk restart

maxHotBuckets in indexes.conf can beconfigured (default 3)

Warm

Warm buckets are created when hot buckets roll
Not open for writing, but searchable
They reside in the same directory as hot buckets but renamed
Roll to cold buckets when exceeding maximum warm buckets setting

maxWarmDBCount in indexes.conf can be configured (default 300)

Cold

Starting from oldest bucket (based on time), warm buckets roll to cold
Reside in different directory from hot and warm
Searchable
The directory location can be configured
Possible to save cost by using cheaper storage

Frozen

After cold buckets age out based on retention policy, they roll to frozen
The default action is to delete; can be configured to archive
coldToFrozenDir or coldToFrozenScript in indexes.conf configures archiving
Archived frozen buckets are not searchable

All events in a cold bucket must be eligible to be frozen

Thawed

Frozen buckets can be thawed
Thawed buckets are rebuilt into the index and searchable
Location of thawed buckets can be configured in indexes.conf
No age restriction for thawed buckets
Use splunk rebuild command to rebuild

Bucket Locations

SPLUNK_HOME/var/lib/splunk/myfirstindex/db/
SPLUNK_HOME/var/lib/splunk/myfirstindex/db/
SPLUNK_HOME/var/lib/splunk/myfirstindex/colddb/
Frozen buckets are deleted by default. Can be optionally archived.
SPLUNK_HOME/var/lib/splunk/myfirstindex/thaweddb/

Bucket Naming

Hot

hot_v1_<local id>
hot_v1_5

Warm

db_<newest time>_<oldest_time>_<local id>
db_1559676230_1559676181_0
Local id is the ID of the bucket
Newest and oldest time are in UTC epoch time in seconds
During a search, Splunk uses the time range in the bucket name before opening it

Index Directory Structure

Alt Image Text

3-3 Creating Splunk Indexes

Why Create Multiple Indexes?

Security： Restrict access to index by Splunk role
Retention： Retention policies are applied at index level

authorize.conf

Security is defined in authorize.conf

[role_myrole]      # Definition for myrole
importRoles = user   # Inherit user role’s capabilities
srchIndexesAllowed = os,idx1   #Allowed indexes are os and idx1
srchIndexesDefault = idx1  # Default index is idx1

[role_mysuperuserrole]
importRoles = user
srchIndexesAllowed = os,idx1,idx2  # Allowed indexes are os,idx1,idx2
srchIndexesDefault = idx2   # Default index is idx2

indexes.conf

Indexes are configured in indexes.conf

[myfirstindex]   # Definition for myfirstindex
coldPath =  $SPLUNK_DB/myfirstindex/colddb   # Location for cold buckets
enableDataIntegrityControl = 0  # Disable Data Integrity Control
enableTsidxReduction = 0        # Disable TSIDX reduction

homePath = $SPLUNK_DB/myfirstindex/db   # Location for hot and warm buckets

maxTotalDataSizeMB = 512000   # Maximum size of the index
frozenTimePeriodInSecs = 1209600  # Number of seconds after which bucket roles to frozen (every event in a bucket must be older than this limit)

thawedPath = $SPLUNK_DB/myfirstindex/thaweddb # Location for thawed buckets

Creating Splunk Index

Using Splunk Web
- Settings -> Indexes -> New Index
Using configuration files
- Copy existing stanza in indexes.conf and update.
- Restart of Splunk required

Creating Index Using Splunk Web

Alt Image Text

Index Data Integrity

enableDataIntegrityControl=true in indexes.conf.
- Check integrity:

./splunk check-integrity -index [ index name ]

Allows to ensure indexed data has not been tampered with.
- Regenerate hash:

./splunk generate-hash-files -index [ index name ]

Creates hash files (using SHA 256) as the data is indexed.
- Hash files are stored in rawdata directory within the index.

One Index vs. Multiple Indexes

Security: Create separate indexes if you want to allow access selectively
Retention: Create separate indexes if you want varying data retention periods
Management: Easier management for chargeback processes

3-4 Demo

Create a Splunk index using Splunk Web

Alt Image Text

Upload data

Alt Image Text

Review the indexes.conf

cd var/lib/splunk/
cd oslogs

/opt/splunk/var/lib/splunk/oslogs$ ls -lrt
thaweddb
datamodel_summary

/opt/splunk/var/lib/splunk/oslogs/db$ ls -lrt
CreationTime
hot_v1_1
hot_v1_0

cd hot_v1_0
1577042620-1577042619-577230494255484
$ ls -lrt
bucket_info.csv
Strings.data
Sources.data
SourceTypes.data
Hosts. data
1559676212-1559676181-5772278361002267635.tsidx
splunk-autogen-params.dat
1559676230-1559676212-5772318639205556589.tsidx
rawdata

/opt/splunk/var/lib/splunk/oslos/db/db_1559676230_1559676181_1/rawdata$ ls
journal.az
12Hash_1_2013DDA2-630D-4FFE-BD9E-7EC7E2D8C74B.dat
slicesv2.dat

Review the index directory structure
- buckets
- tsidx files
- raw data
Check data integrity

./slunk btool indexes list oslogs --debug

/opt/splunk/bin$./splunkcheck-integrity

WatchdogActionsManager reload started
Starting WatchdogThread for process pid=11068. Threads monitoring is enabled with response timeout set to 8000 ms
Operating on: id=oslogs bucket='/opt/splunk/var/lib/splunk/oslogs/db/db_1577042620_1577042619_0
Operating on: id=oslogs bucket='/opt/splunk/var/lib/splunk/oslogs/db/db_1559676230_1559676181_1'
Total buckets checked=2, succeeded=2, failed=0

4 Configuring Indexes

Learning to tune indexes.conf
Understanding fishbucket
Applying a data retention policy
Managing Splunk index
- Backup
- Backup
- Cleaning out
Using monitoring-console to understand index configuration

4-1 Splunk Index Configuration File

Indexes.conf file is used to configure Splunk indexes and their properties

Location of indexes.conf

There is a default indexes.conf in SPLUNK_HOME/etc/system/default directory.
DO NOT edit this file.
Create a new indexes.conf in SPLUNK_HOME/etc/system/local directory and add specific customization there.

You can also place indexes.conf as part of an app under SPLUNK_HOME/etc/apps/<app name>/local.

Restart of splunkd required for any configuration changes

Structure of indexes.conf

The file can have a 'default' stanza for defining global properties

If a property is defined outside of any stanza, at the top of the file, it is considered a global property.

If a property is defined at both global level and in a specific stanza, value in the stanza takes precedence.

If there are multiple definitions of the same settings in a stanza, last setting wins.

Indexes.conf

# A basic index definition

[weblogs]

homePath = $SPLUNK_DB/weblogs/db
coldPath = $SPLUNK_DB/weblogs/colddb
thawedPath = $SPLUNK_DB/weblogs/thaweddb

tstatsHomePath = $SPLUNK_DB/weblogs/datamodel_summary

frozenTimePeriodInSecs = 5184000

homePath = $SPLUNK_DB/weblogs/db  # Directory for hot and warm buckets

coldPath = $SPLUNK_DB/weblogs/colddb # Directory for cold buckets
thawedPath = $SPLUNK_DB/weblogs/thaweddb  # Directory for thawed buckets
tstatsHomePath = $SPLUNK_DB/weblogs/datamodel_summary  # Directory for accelerated data model summary tsidx files

frozenTimePeriodInSecs = 5184000 # Freeze data older than this many seconds

maxHotBuckets = 3  # Maximum number of hot buckets

maxDataSize = auto  # Maximum size of hot buckets (default 750MB)

maxTotalDataSizeMB = 400000  # Maximum size of an index

maxWarmDBCount = 300   # Maximum number of warm buckets

4-2 Understanding Fishbucket

Splunk Fishbucket

fishbucket is a special Splunk internal index that is automatically created.

Checkpoint

Fishbucket keeps track of the ingestion progress of monitored files and directories

Location

It is located in SPLUNK_HOME/var/lib/splunk/fishbucket.

Use

Upon restart, using fishbucket, Splunk can start ingesting from where it left off

Using Fishbucket

To re-index a particular file

# Use btprobe command
./splunk cmd btprobe -d SPLUNK_HOME/var/lib/splunk/fishbucket/splunk_private_db --file <file name> --reset

To re-index all monitored file

# Remove the entire fishbucket directory

rm -rf /opt/splunk/var/lib/splunk/fishbucket

You must restart Splunk Forwarder

4-3 Applying a Data Retention Policy

Data Retention in Splunk

you must define a retention policy for the data indexed
Retention policy is applied at index level
Set maxTotalDataSizeMB and/or frozenTimePeriodInSecs
maxTotalDataSizeMB overrides frozenTimePeriodInSecs
indexes.conf is used to configure retention ploicy

Configuring Data Retention

maxTotalDataSizeMB
- Maximum size of the index in Mega Bytes. Oldest data is frozen after this limit. Default is 500GB
frozenTimePeriodInSecs
- Time period in seconds after which the data rolls to frozen. Default is 6 years

What Happens to the Expired Data

When the bucket rolls from cold to frozen, by default the data is deleted
If coldToFrozenScript is configured, the script is executed
If coldToFrozenDir is configured, Splunk moves the expired buckets to this directory
To restore expired data, copy the archived buckets to thaweddb location and rebuild
- ./splunk rebuild SPLUNK_HOME/var/lib/splunk/<index name>/thaweddb/<bucket name>

4-4 Managing Splunk Indexes

Backing up Splunk

You must regularly backup Splunk. Daily incremental backups recommended.
Objects to backup: Warm and cold buckets; Entire etc directory.
Hot buckets can't be backed up without stopping Splunk.

In a clustered environment, you may not need to backup data buckets as data is replicated.

In distributed environments, ensure you backup search heads and heavy forwarders.

Deleting Events in a Splunk Index

Best way is to let the data expire instead of deleting

You can only do a virtual delete (i.e data is not removed from disk)

Even admins can't delete data by default

Create a user with can_delete capability.
Run a search to list the desired events to be deleted.
Pipe the delete command.

Cleaning out a Splunk Index

Extremely dangerous command in production
Destroys index data from disk
syntax:
- splunk clean all -index <index name>

4-5 Demo

Review indexes.conf
- Data retention policy

cd ../etc/system/default/
vi indexes.conf

cd /opt/splunk/etc/apps/search/locals
vi indexes.conf

[myfirstindex]
coldPath= $SPLUNK_DB/myfirstindex/colddb
enableDataIntegrityControl=1
enableTsidxReduction=0
homePath=$SPLUNK_DB/myfirstindex/db
maxTotalDataSizeMB=512000
thawedPath=$SPLUNK_DB/myfirstindex/thaweddb
archiver.enableDataArchive=0
bucketRebuildMemoryHint=0
compressRawdata = 1
enable0nlineBucketRepair = 1
minHotIdleSecsBeforeForceRoll=0
rtRouterQueueSize =
rtRouterThreads =
selfStorageThreads =
suspendHotRollByDeleteQuery=0
syncMeta = 1
tsidxWritingLevel =


[oslogs]
coldPath=$SPLUNK_DB/oslogs/colddb
enableDataIntegrityControl=1
enableTsidxReduction=0
homePath=$SPLUNK_DB/oslogs/db
maxTotalDataSizeMB=512000
thawedPath=$SPLUNK_DB/oslogs/thaweddb

cd /opt/splunk/bin
./splunk btool indexes list oslogs --debug | grep frozen

/opt/splunk/etc/system/default/indexes.conf  frozenTimePeriodInSecs = 188697600

Review location and contents of fishbucket

cd fishbucket/
/opt/splunk/var/lib/splunk/fishbucket/splunk_private_db/

$ ls -lrt
snapshot
btree_index.dat
btree_records.dat

Alt Image Text

Reset fishbucket to re-index data

index=oslogs source="/var/log/apt/history.log" "Commandline: apt-get install -y --no-install-recommends python-pip"

Alt Image Text

index=slogs source="/var/log/apt/history.log" "Commandline: apt-get install -y --no-install-recommends python-pip" | convert ctime
(_indextime) AS Indextime | table Indextime,_time,_raw

index=oslogs source="/var/log/apt/history.log" "Commandline: apt-get install -y --no-install-recommends python-pip" | convert ctime
(_indextime) AS Indextime | search Indextime = "12/31/2019 12:21:57"

Alt Image Text

$rm -rf fishbucket/
cd /opt/splunk/bin

opt/splunk/bins./splunk restart

Deleting events from an Index

The user must have can_delete capability to execute the delete command

index=oslogs source="/var/log/apt/history.log" "Commandline: apt-get install -y --no-install-recommends python-pip" | convert ctime
(_indextime) AS Indextime | search Indextime = "12/31/2019 12:21:57"  | delete

Alt Image Text

Use monitoring console to monitor indexes

Alt Image Text

Course Summary

Licensing options and handling license violations
Configuration files layering and precedence
Using btool to troubleshoot
Creating and tuning Splunk indexes
Various Splunk data buckets
Configuring a data retention policy
Managing Splunk indexes