Use the Extract Fields functionality to parse the data in your source types and create field extractions.
Parse data
To extract fields from your data, you must parse the data for each of the source types in your add-on. The Field Extractor supports parsing for the following data formats:
- Unstructured Data. Typically used for log files.
- Table. Data in tabular formats, such as comma-separated values (CSV) and tab-separated values (TSV).
- Key Value. Data that contains key-value pairs.
- JSON. Data in the JavaScript Object Notation (JSON) format.
- XML. Data in the Extensible Markup Language (XML) format.
To parse data for a source type and extract fields:
- On your add-on homepage, click Extract Fields on the Add-on Builder navigation bar.
- On the Extract Fields page, from Sourcetype, select a source type to parse.
- From Format, select the data format of the data. Any detected format type is automatically selected and you can change the format type as needed. If you aren't sure what format type you need and a format type has not been automatically selected, use "Unstructured Data" as the format type.
- Click Parse.
Extract fields
After parsing the data, the Add-on Builder displays the results on a summary page.
- If you are satisfied with the results, click Save.
- If you want to try parsing again using a different format, click Cancel to return to the previous page.
After data for a source type has been parsed, the source type is added to the table on the Extract Fields page.
- To retrieve parsed field extractions, click Load Results for the source type.
Unstructured Data
The Add-on Builder's field extractor displays a selection of events in groups, along with the extracted fields. Use this display to:
- Select one or more groups to represent the data.
- Display the regular expression that the field extractor used, and modify it to improve the field extraction.
- Click on individual field names to include or exclude the field for extraction.
- Click the Edit icon next to a field name to edit the field name.
- Click the Trash icon next to a field name to remove its capture group from the regular expression.
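As a rough illustration of how a regular expression with capture groups turns an unstructured event into fields, consider the following minimal Python sketch. The sample event, the pattern, and the field names are all invented for this example and are not produced by the Add-on Builder. The `drop_field` helper is also hypothetical: it shows one possible way to stop extracting a field by turning its capture group into a non-capturing group, which may differ from what the Trash icon does internally.

```python
import re

# Hypothetical log line, invented for illustration only.
SAMPLE_EVENT = (
    '127.0.0.1 - admin [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.1" 200 2326'
)

# Each named capture group corresponds to one extracted field.
# Python writes named groups as (?P<name>...).
PATTERN = re.compile(
    r'(?P<clientip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<uri>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+)'
)


def extract_fields(event: str) -> dict:
    """Return the named capture groups as a field dictionary, or {} if no match."""
    match = PATTERN.search(event)
    return match.groupdict() if match else {}


def drop_field(pattern: str, field: str) -> str:
    """Turn one named capture group into a non-capturing group.

    The text still has to match, but the field is no longer extracted.
    """
    return pattern.replace(f"(?P<{field}>", "(?:")


fields = extract_fields(SAMPLE_EVENT)
without_user = re.compile(drop_field(PATTERN.pattern, "user"))
fields_without_user = without_user.search(SAMPLE_EVENT).groupdict()
```

Here `fields` contains one entry per named group, and `fields_without_user` contains the same entries except `user`.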
Table
The Table format is used with tabular data and lets you:
- Change how data is parsed by selecting the delimiter character that is used to separate fields. To specify a different character, click Other and enter the character.
- Change the field names after you have selected the correct delimiter. Note that each time you change delimiters, the number of columns might change and cause you to lose changes to field names.
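To see why changing the delimiter can change the number of columns, consider this minimal Python sketch that uses the standard `csv` module. The sample data and the `parse_table` helper are assumptions made for this example and do not reflect how the Add-on Builder parses tables internally.

```python
import csv
import io


def parse_table(text: str, delimiter: str = ",") -> list[dict]:
    """Split tabular text on the delimiter; the first row supplies the field names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)


# Hypothetical pipe-separated sample, invented for illustration only.
SAMPLE = "time|host|status\n2023-10-10T13:55:36Z|web01|200\n"

wrong = parse_table(SAMPLE, delimiter=",")   # one column containing the whole line
right = parse_table(SAMPLE, delimiter="|")   # three columns: time, host, status
```

With the wrong delimiter, every line collapses into a single column, so any field names you assigned to the three-column layout no longer apply.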
Key Value
The Key Value format is used with data containing key-value pairs and lets you do the following:
- Change how data is parsed. For Extraction Methods, you can select:
- Auto to let the Add-on Builder parse data automatically.
- Delimiters to use delimiters.
- Regex to use regular expressions.
- For Delimiters, select the delimiters for the key-value pairs:
- Specify the pair delimiter character, which is used to separate key-value pairs. Using the example key_a=value_a, key_b=value_b, the correct character is a comma.
- Specify the key-value delimiter character, which is used to separate keys and values. Using the example key_a=value_a, key_b=value_b, the correct character is an equals sign.
- For Regex, select the regular expression to use, or create your own.
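The two delimiter roles can be sketched in Python as follows, using the example key_a=value_a, key_b=value_b from the list above. The function names and the regular expression are assumptions made for this example, not the Add-on Builder's own implementation.

```python
import re


def parse_with_delimiters(event: str, pair_delim: str = ",", kv_delim: str = "=") -> dict:
    """Split on the pair delimiter first, then on the key-value delimiter."""
    fields = {}
    for pair in event.split(pair_delim):
        key, sep, value = pair.partition(kv_delim)
        if sep:  # ignore fragments that contain no key-value delimiter
            fields[key.strip()] = value.strip()
    return fields


def parse_with_regex(event: str, pattern: str = r"(\w+)=(\w+)") -> dict:
    """Use a regular expression whose two groups capture the key and the value."""
    return dict(re.findall(pattern, event))


EVENT = "key_a=value_a, key_b=value_b"
by_delimiters = parse_with_delimiters(EVENT)
by_regex = parse_with_regex(EVENT)
```

Both approaches produce the same two fields for this event. The delimiter approach is simpler when the separators are consistent, and the regular expression approach is more flexible when they are not.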
JSON
The JSON format is used with JSON data. There are no additional parsing options.
XML
The XML format is used with XML data. There are no additional parsing options.
Troubleshooting
What if I need to upload different sample data?
If you need to upload a different sample data file for a source type (for example, because you want to clean the data first), go to Manage source types, delete the sample data, and then upload the new data files.
A regular expression had too many capture groups. What do I do?
This error is displayed when you attempt to parse a file and the regular expression created by the Field Extractor contains more than 100 capture groups (fields).
This error might indicate a problem with the Event Break setting for the source type:
- Go to Manage source types.
- Edit the source type and select a different option for Event Break.
- Upload the sample events again. Because the Event Break option is applied when indexing the data, changing this value does not affect events that have already been indexed.
- Parse the data again.
The sample data might contain an event that is too long:
- Edit the sample data file by splitting the long lines to clean up the data.
- Go back to Manage source types.
- Upload the sample events again.
- Parse the data again.
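If you want to investigate outside the Add-on Builder, the following Python sketch shows one way to count the capture groups in a regular expression and to find unusually long lines in a sample file. It assumes the pattern uses syntax that Python's `re` module accepts, and the helper names are hypothetical.

```python
import re

MAX_CAPTURE_GROUPS = 100  # limit quoted in the error description above


def capture_group_count(pattern: str) -> int:
    """Return the number of capture groups (fields) in a regular expression."""
    return re.compile(pattern).groups


def longest_lines(lines: list[str], top: int = 3) -> list[tuple[int, int]]:
    """Return (line_number, length) pairs for the longest lines, longest first."""
    lengths = [(number, len(line)) for number, line in enumerate(lines, start=1)]
    return sorted(lengths, key=lambda item: item[1], reverse=True)[:top]


# Hypothetical pattern with 101 capture groups, built only to trigger the check.
pattern = r"(\S+) " * 101
too_many = capture_group_count(pattern) > MAX_CAPTURE_GROUPS

# Hypothetical sample lines; the second one is suspiciously long.
suspects = longest_lines(["short", "x" * 5000, "medium line"], top=1)
```

A long line near the top of `suspects` is a candidate for splitting before you upload the sample events again.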
Why are the field names not detected in my tabular data?
The Add-on Builder uses the first 1000 events for field extraction. If your data contains more than 1000 events, the parser cannot automatically detect the field names.
The parser assumes that all entries except the table header contain a timestamp. If entries in your tabular data do not contain a timestamp, the parser will not correctly detect which entry is the table header.
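Before uploading tabular sample data, you can check it against both conditions with a short script such as the following Python sketch. The timestamp pattern is an assumption (an ISO-like date and time) and the helper names are hypothetical; the Add-on Builder's own timestamp detection might recognize other formats.

```python
import re

MAX_EVENTS = 1000  # number of events the text above says is used for field extraction

# Assumption: timestamps look like 2023-10-10 13:55:36 or 2023-10-10T13:55:36.
TIMESTAMP_RE = re.compile(r"\d{4}-\d{2}-\d{2}[T ]\d{2}:\d{2}:\d{2}")


def rows_without_timestamp(lines: list[str]) -> list[int]:
    """Return 1-based line numbers of data rows (all rows after the header) with no timestamp."""
    return [
        number
        for number, line in enumerate(lines[1:], start=2)
        if not TIMESTAMP_RE.search(line)
    ]


def exceeds_event_limit(lines: list[str]) -> bool:
    """Return True if the sample has more data rows than MAX_EVENTS."""
    return len(lines) - 1 > MAX_EVENTS


# Hypothetical sample, invented for illustration only.
sample = [
    "time,host,status",
    "2023-10-10 13:55:36,web01,200",
    "web02,404",
]
missing = rows_without_timestamp(sample)
too_large = exceeds_event_limit(sample)
```

In this sample, `missing` reports the third line, which has no timestamp and could therefore be mistaken for a table header.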
Learn more
For more information, see the following Splunk Enterprise documentation:
- About fields in the Knowledge Manager Manual
- Build field extractions with the field extractor in the Knowledge Manager Manual
- Field Extractor: Select Fields step in the Knowledge Manager Manual
Versions 2.2.0 and later of the Splunk Add-on Builder let you map the fields from your data events to the fields in any data model, including CIM data models.
- If you want to map your data to a CIM data model, you need the Splunk Common Information Model add-on. Download it from Splunkbase, and see Install the Splunk Common Information Model Add-on for installation details.
- If you want to map to your own data model, the model needs to support the standard defined under the Create a data model section.
Before you apply the data model mapping to your add-on, you must configure one or more source types for your add-on by creating a data input, by adding data from a sample file, or by adding indexed data from Splunk.
In Map to data model, map the fields from your data to the fields in one of the predefined data models to normalize data at search time. To configure a mapping:
- On your add-on homepage, click Map to data model on the Add-on Builder navigation bar.
- On the Data Model Mapping page, click New Data Model Mapping.
- On the Data Model Mapping >> Define Event Type page, define an event type to generate events from which to extract fields:
- Enter a name for the event type.
- Select a source type from which to generate events.
- Enter a search to select events. By default, the search selects all events for the source type you selected, but you can apply additional search criteria as needed.
- Click Save.
- From the center panel, select one or more data models to use. You can also select individual datasets within a data model. Fields from your event type are displayed for reference, along with fields from the selected data models.
- When you have finished selecting data models, click Done.
- Select FIELDALIAS to map a field from the data model to a field from your event type.
- Select EVAL to map a field from the data model to an expression based on a field from your event type.
- To define a field alias, click one field name from the Data Model Fields list and one from the Event Type Fields list. Define the field alias, then click OK at the end of the new row in the Data Model Mapping List.
- To define an expression, click one field name from the Data Model Fields list and one or more fields from the Event Type Fields list. Edit the expression in the Event Type Field or Expression column, then click OK at the end of the new row in the Data Model Mapping List.
The Data Model Mapping page displays an entry for the mapping you just completed.
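Conceptually, the two mapping types differ as follows: a field alias exposes an existing event field under a data model field name, and an expression computes the data model field from one or more event fields. The following Python sketch models that difference on a single event. The field names (`src_ip`, `src`, `status`, `action`) and the helper functions are chosen only for illustration and are not what the Add-on Builder generates.

```python
from typing import Callable


def apply_alias(event: dict, event_field: str, model_field: str) -> dict:
    """Return a copy of the event with the value also available under model_field.

    The original event field is kept, as with a field alias.
    """
    result = dict(event)
    if event_field in result:
        result[model_field] = result[event_field]
    return result


def apply_eval(event: dict, model_field: str, expression: Callable[[dict], str]) -> dict:
    """Return a copy of the event with model_field computed from other fields."""
    result = dict(event)
    result[model_field] = expression(result)
    return result


# Hypothetical event, invented for illustration only.
event = {"src_ip": "10.0.0.5", "status": "200"}

# FIELDALIAS-style mapping: src_ip is also exposed as src.
mapped = apply_alias(event, "src_ip", "src")

# EVAL-style mapping: action is derived from status.
mapped = apply_eval(
    mapped, "action", lambda e: "success" if e["status"] == "200" else "failure"
)
```

After both mappings, `mapped` still contains the original `src_ip` and `status` fields, plus the normalized `src` and `action` fields.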
Learn more
For more information, see the following Splunk Enterprise documentation:
- About event types in the Knowledge Manager Manual
- About tags and aliases in the Knowledge Manager Manual
- eval in the Search Reference manual
- Use the CIM to normalize data at search time in the Common Information Model Add-on Manual