# Content Types

When reading in your data, additional options can be configured for the following content types:

* [`CSV`](#csv)
* [`TSV`](#tsv)
* [`JSON`](#json)
* [`AVRO_SCHEMA_REGISTRY`](#avro_schema_registry)
* [`FIXED_WIDTH`](#fixed_width)
* [`REGEX`](#regex)
* [`SPLIT_LINES`](#split_lines)
* [`XML`](#xml)

## CSV

```sql
CONTENT_TYPE = (
    TYPE = CSV
    INFER_TYPES = { TRUE | FALSE }
    [ HEADER = ('<col1>', '<col2>', '<col3>',...) ]
    [ HEADER_LINE = '<header>, <header>,...' ]
    [ DELIMITER = '<delimiter>' ]
    [ QUOTE_ESCAPE_CHAR = '<char>' ]
    [ NULL_VALUE = '<null_value>' ]
    [ MAX_COLUMNS = <integer> ]
    [ ALLOW_DUPLICATE_HEADERS = { TRUE | FALSE } ]
)
```

#### `INFER_TYPES`

Type: `Boolean`

(Optional) When `true`, each column's data type is inferred as one of the following types: `string`, `integer`, `double`, `Boolean`.

When `false`, all data is treated as a string.

#### `HEADER`

Type: `array`

Default: Empty string

(Optional) A comma-separated list of column names.

When the CSV data includes a header as its first row, the `HEADER` property can be omitted. Omitting it tells Upsolver that the data contains a header row, and Upsolver takes the following actions:

1. Use the first row for column names
2. Skip the first row when processing the data

If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the `HEADER` property when creating a `JOB`. This tells Upsolver to take the following actions:

1. Use the provided `HEADER` property for column names
2. Do not skip the first row since it contains data

If your data does not include a header row and you do not set the `HEADER` property when creating the job, Upsolver assumes the first row is a header and does not process it as data.
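
As a sketch of the headerless case described above, the fragment below supplies column names explicitly so that the first row is treated as data. The column names `order_id`, `customer_id`, and `amount` are hypothetical placeholders.

```sql
-- Assumption: the source file has no header row.
-- The column names below are hypothetical placeholders.
CONTENT_TYPE = (
    TYPE = CSV
    INFER_TYPES = TRUE
    HEADER = ('order_id', 'customer_id', 'amount')
)
```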

#### `HEADER_LINE`

Type: `string`

Default: Empty string

(Optional) A string containing a comma-separated list of header names. This is an alternative to `HEADER`.

#### `DELIMITER`

Type: `text`

Default: `,`

(Optional) The delimiter used to separate columns in the CSV file.

#### `QUOTE_ESCAPE_CHAR`

Type: `text`

Default: `"`

(Optional) Defines the character used for escaping quotes inside an already quoted value.

#### `NULL_VALUE`

Type: `text`

(Optional) Values in the CSV that match the provided value are interpreted as null.

#### `MAX_COLUMNS`

Type: `integer`

(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.

#### `ALLOW_DUPLICATE_HEADERS`

Type: `Boolean`

Default: `false`

(Optional) When `true`, repeated header names are allowed, and numeric suffixes are added to disambiguate them.
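
To illustrate how several of the CSV options combine, here is a minimal sketch for a pipe-delimited file that includes a header row and uses the literal text `N/A` for missing values. The delimiter and null marker are assumed values chosen for the example.

```sql
-- Illustrative values only: a pipe-delimited file with a header row,
-- where the literal text N/A marks a missing value.
CONTENT_TYPE = (
    TYPE = CSV
    INFER_TYPES = FALSE
    DELIMITER = '|'
    NULL_VALUE = 'N/A'
    ALLOW_DUPLICATE_HEADERS = TRUE
)
```

Because `HEADER` is omitted, the first row supplies the column names. The exact format of the numeric suffixes applied to repeated names is not specified on this page.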

## TSV

```sql
CONTENT_TYPE = (
    TYPE = TSV
    INFER_TYPES = { TRUE | FALSE } 
    [ HEADER = ('<col1>', '<col2>', '<col3>',...) ]
    [ HEADER_LINE = '<header>, <header>,...' ]
    [ NULL_VALUE = '<null_value>' ] 
    [ MAX_COLUMNS = <integer> ]
    [ ALLOW_DUPLICATE_HEADERS = { TRUE | FALSE } ]
)
```

#### `INFER_TYPES`

Type: `Boolean`

(Optional) When `true`, each column's data type is inferred as one of the following types: `string`, `integer`, `double`, `Boolean`.

When `false`, all data is treated as a string.

#### `HEADER`

Type: `string`

Default: Empty string

(Optional) A string containing a comma-separated list of column names.

When the TSV data includes a header as its first row, the `HEADER` property can be omitted. Omitting it tells Upsolver that the data contains a header row, and Upsolver takes the following actions:

1. Use the first row for column names
2. Skip the first row when processing the data

If the source data does not include a header as the first row, meaning the first row contains actual data, you must include the `HEADER` property when creating a `JOB`. This tells Upsolver to take the following actions:

1. Use the provided `HEADER` property for column names
2. Do not skip the first row since it contains data

If your data does not include a header row and you do not set the `HEADER` property when creating the job, Upsolver assumes the first row is a header and does not process it as data.

#### `HEADER_LINE`

Type: `string`

Default: Empty string

(Optional) A string containing a comma-separated list of header names. This is an alternative to `HEADER`.

#### `NULL_VALUE`

Type: `text`

(Optional) Values in the TSV that match the provided value are interpreted as null.

#### `MAX_COLUMNS`

Type: `integer`

(Optional) The number of columns to allocate when reading a row. Note that larger values may perform poorly.

#### `ALLOW_DUPLICATE_HEADERS`

Type: `Boolean`

Default: `false`

(Optional) When `true`, repeated header names are allowed, and numeric suffixes are added to disambiguate them.
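
As a sketch of the `HEADER_LINE` alternative, the fragment below passes the column names as a single comma-separated string for a TSV file that is assumed to have no header row. The names `event_time`, `user_id`, and `action`, and the `NULL` marker, are hypothetical.

```sql
-- Assumption: the TSV file has no header row.
-- Column names and the null marker are hypothetical placeholders.
CONTENT_TYPE = (
    TYPE = TSV
    INFER_TYPES = TRUE
    HEADER_LINE = 'event_time, user_id, action'
    NULL_VALUE = 'NULL'
)
```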

## JSON

```sql
CONTENT_TYPE = (
    TYPE = JSON
    [ SPLIT_ROOT_ARRAY = { TRUE | FALSE } ] 
    [ STORE_JSON_AS_STRING = { TRUE | FALSE } ]
)
```

#### `SPLIT_ROOT_ARRAY`

Type: `Boolean`

Default: `true`

(Optional) When `true`, a root object that is an array is parsed as separate events. When `false`, it is parsed as a single event that contains only an array.

#### `STORE_JSON_AS_STRING`

Type: `Boolean`

Default: `false`

(Optional) When `true`, a copy of the original JSON is stored as a string value in an additional column.
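
The effect of `SPLIT_ROOT_ARRAY` is easiest to see with a small payload. Assuming an input of `[{"id": 1}, {"id": 2}]`, the default setting would produce two events, while the sketch below keeps the array as a single event and also stores the original JSON as a string. The payload is an invented example.

```sql
-- Assumed input payload: [{"id": 1}, {"id": 2}]
-- SPLIT_ROOT_ARRAY = FALSE keeps the array as one event instead of two.
CONTENT_TYPE = (
    TYPE = JSON
    SPLIT_ROOT_ARRAY = FALSE
    STORE_JSON_AS_STRING = TRUE
)
```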

## AVRO\_SCHEMA\_REGISTRY

{% hint style="info" %}
Note that only Avro schemas are currently supported.
{% endhint %}

```sql
CONTENT_TYPE = (
    TYPE = AVRO_SCHEMA_REGISTRY
    SCHEMA_REGISTRY_URL = '<url>'
)
```

#### `SCHEMA_REGISTRY_URL`

Type: `text`

Avro schema registry URL. To support schema evolution, add `{id}` to the URL; Upsolver replaces it with the schema ID read from the Avro header.

For example, `https://schema-registry.service.yourdomain.com/schemas/ids/{id}`
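
Putting the example URL above into the option gives the following fragment. The host name is the placeholder domain from the example, not a real registry.

```sql
-- The host below is the placeholder from the example URL; replace it with your registry.
-- {id} is left as a literal placeholder in the URL.
CONTENT_TYPE = (
    TYPE = AVRO_SCHEMA_REGISTRY
    SCHEMA_REGISTRY_URL = 'https://schema-registry.service.yourdomain.com/schemas/ids/{id}'
)
```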

## FIXED\_WIDTH

```sql
CONTENT_TYPE = (
    TYPE = FIXED_WIDTH
    [ COLUMNS =  ( (COLUMN_NAME = '<column_name>' 
                    START_INDEX = <integer> 
                    END_INDEX = <integer>) [,...] ) ]
    [ INFER_TYPES = { TRUE | FALSE } ]
)    
```

#### `COLUMNS`

Type: `list`

(Optional) An array of the name, start index, and end index for each column in the file.

#### `INFER_TYPES`

Type: `Boolean`

Default: `false`

(Optional) When `true`, each column's data type is inferred. When `false`, all data is treated as a string.
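
A minimal sketch of a `COLUMNS` definition with two fields follows. The column names and index values are hypothetical, and this page does not state whether indexes are zero-based or whether `END_INDEX` is inclusive, so verify the boundaries against a sample of your data.

```sql
-- Hypothetical layout: a 10-character account ID followed by an 8-character date.
-- Index base and END_INDEX inclusivity are assumptions; check them against real rows.
CONTENT_TYPE = (
    TYPE = FIXED_WIDTH
    COLUMNS = ( (COLUMN_NAME = 'account_id'
                 START_INDEX = 0
                 END_INDEX = 10),
                (COLUMN_NAME = 'open_date'
                 START_INDEX = 10
                 END_INDEX = 18) )
    INFER_TYPES = TRUE
)
```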

## REGEX

See [Java Pattern](https://docs.oracle.com/javase/7/docs/api/java/util/regex/Pattern.html) for more information on regular expression syntax.

```sql
CONTENT_TYPE = (
    TYPE = REGEX
    [ PATTERN = '<pattern>' ]
    [ MULTILINE = { TRUE | FALSE } ]
    [ INFER_TYPES = { TRUE | FALSE } ]
)
```

#### `PATTERN`

Type: `text`

(Optional) The pattern to match against the input. Named groups are extracted from the data.

#### `MULTILINE`

Type: `Boolean`

Default: `false`

(Optional) When `true`, the pattern is matched against the whole input. When `false`, it is matched against each line of the input.

#### `INFER_TYPES`

Type: `Boolean`

Default: `false`

(Optional) When `true`, each column's data type is inferred. When `false`, all data is treated as a string.
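
As a sketch of named-group extraction, the pattern below assumes log lines shaped like `2024-01-15 ERROR Disk full` and uses Java named groups (`(?<name>...)`) to pull out three fields. The group names `log_date`, `level`, and `message` and the line format are hypothetical. The pattern avoids backslashes because string escaping rules are not covered on this page.

```sql
-- Assumed line format: <date> <LEVEL> <free text>, for example:
--   2024-01-15 ERROR Disk full
-- Group names are hypothetical; each named group is extracted from the data.
CONTENT_TYPE = (
    TYPE = REGEX
    PATTERN = '(?<log_date>[0-9-]+) (?<level>[A-Z]+) (?<message>.*)'
    MULTILINE = FALSE
    INFER_TYPES = FALSE
)
```

With `MULTILINE = FALSE`, the pattern is applied to each line separately.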

## SPLIT\_LINES

```sql
CONTENT_TYPE = (
    TYPE = SPLIT_LINES
    [ PATTERN = '<pattern>' ]
)
```

#### `PATTERN`

Type: `text`

(Optional) A regular expression pattern to split the data by. If left empty, the data is split by lines.
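
For data where records are separated by something other than a line break, a pattern can be supplied. The sketch below assumes records separated by a semicolon; the separator is an invented example.

```sql
-- Assumption: records in the input are separated by ';' rather than newlines.
CONTENT_TYPE = (
    TYPE = SPLIT_LINES
    PATTERN = ';'
)
```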

## XML

```sql
CONTENT_TYPE = (
    TYPE = XML
    [ STORE_ROOT_AS_STRING = { TRUE | FALSE } ]                 
)
```

#### `STORE_ROOT_AS_STRING`

Type: `Boolean`

Default: `false`

(Optional) When `true`, a copy of the XML is stored as a string in an additional column.
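
A minimal sketch that keeps a copy of the original XML alongside the parsed fields, which may help when auditing or reprocessing the raw document:

```sql
-- Stores the original XML as a string in an additional column.
CONTENT_TYPE = (
    TYPE = XML
    STORE_ROOT_AS_STRING = TRUE
)
```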
