> For the complete documentation index, see [llms.txt](https://upsolver.gitbook.io/content/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://upsolver.gitbook.io/content/reference-1/sql-commands/jobs/create-job/transformation/job-options/clickhouse.md).

# ClickHouse

## Job options

```sql
[ COMMENT = '<comment>' ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]
[ START_FROM = { NOW | BEGINNING | timestamp } ]
[ END_AT = { NOW | timestamp } ]
[ RUN_INTERVAL = <integer> { MINUTE[S] | HOUR[S] | DAY[S] } ]
[ RUN_PARALLELISM = <integer> ]
```

**Jump to**

General job options:

* [COMMENT](/content/reference-1/sql-commands/jobs/create-job/transformation/job-options.md#comment-editable)
* [COMPUTE CLUSTER](/content/reference-1/sql-commands/jobs/create-job/transformation/job-options.md#compute_cluster-editable)
* [END\_AT](/content/reference-1/sql-commands/jobs/create-job/transformation/job-options.md#end_at-editable)
* [RUN\_INTERVAL](/content/reference-1/sql-commands/jobs/create-job/transformation/job-options.md#run_interval)
* [RUN\_PARALLELISM](#run_parallelism-editable)
* [START\_FROM](/content/reference-1/sql-commands/jobs/create-job/transformation/job-options.md#start_from)

## Prerequisites <a href="#prerequisites" id="prerequisites"></a>

* Ensure your [ClickHouse connection](/content/reference-1/sql-commands/connections/create-connection/clickhouse.md) is set up and accessible
* Pre-define your target ClickHouse table, keeping in mind that schema evolution is not currently supported. Fields that might not always have values should be marked as nullable. Learn more about nullable fields in [ClickHouse](https://clickhouse.com/docs/en/sql-reference/data-types/nullable).
* Currently tested with ClickHouse Server Version 24.1

### Creating a target table in ClickHouse

Before executing your job, the target table in ClickHouse needs to be predefined.&#x20;

Example:

```sql
CREATE TABLE default.orders
(
    `netTotal` Float64,
    `orderDate` DateTime64(3),
    `orderId` String,
    `orderType` Nullable(String),
    `taxRate` Nullable(Float64),
    `customer` Tuple(email String, firstName String, lastName String, address Tuple(address1 String, address2 Nullable(String), city String, postCode String, state String)),
    `data` Tuple(items Array(Tuple(category String, itemId Int64, name String, quantity Int64, unitPrice Float64))),
    `shippingInfo` Tuple(method Nullable(String), recipient Tuple(firstName Nullable(String), lastName Nullable(String)))
)
ENGINE = MergeTree
ORDER BY orderId
SETTINGS non_replicated_deduplication_window = 300;
```

### **Supported ClickHouse data types**

Upsolver supports writing to the following defined types:

* **Basic Types**:&#x20;
  * Integers (Int8, Int16, Int32, Int64, UInt8, UInt16, UInt32, UInt64)
  * Floating-Point Numbers (Float32, Float64)
  * Decimal Numbers (Decimal32, Decimal64, Decimal128)
  * Date and Time (Date, DateTime, DateTime64)
  * Strings (String, FixedString)
  * Bool
  * UUID
  * Enums (Enum8, Enum16)
  * LowCardinality
* **Complex Types**:&#x20;
  * Arrays
  * Tuples
  * Maps

### **Parallelism support**

To prevent data duplication during retries, we utilize a deduplication token that relies on the "deduplication window" setting. This setting must be defined for target tables and specifies the number of the most recently inserted blocks for which hash sums are stored. This mechanism restricts the number of concurrent insert operations by checking for duplicates using these hash sums.

The higher the deduplication window number, the greater the concurrency that can be safely supported, ensuring both data integrity and operational efficiency.

**Required Setting:**

* **Non-Replicated Tables:** `non_replicated_deduplication_window`
* **Replicated Tables:** `replicated_deduplication_window`<br>

### **Upsert support**

Selecting an engine such as `ReplacingMergeTree`, `CollapsingMergeTree`, or `VersionedCollapsingMergeTree` enables ClickHouse to perform upserts, where duplicates are merged based on the table's sorting key. While this ensures data integrity by preventing duplicate entries, it may limit parallelism as data must be inserted in sequence to maintain order.

***

## Examples

### Ingestion job

```sql
CREATE SYNC JOB copy_s3_to_clickhouse
    CONTENT_TYPE = PARQUET
    DATE_PATTERN = 'yyyy/MM/dd/HH/'
    START_FROM = BEGINNING
AS COPY FROM s3_connection 
    LOCATION = 's3://bucket_name/data/'
INTO clickhouse_connection.db_name.target_table;
```

### Transformation job

```sql
CREATE SYNC JOB insert_into_clickhouse
  START_FROM = BEGINNING
  RUN_INTERVAL = 1 MINUTE
  RUN_PARALLELISM = 20
AS INSERT INTO CLICKHOUSE clickhouse_conn.db_name.target_table 
  MAP_COLUMNS_BY_NAME
    SELECT 
      orders,
      MD5(customer_email) AS customer_id,
      ARRAY_SUM(data.items[].quantity * data.items[].price) AS net_total
    FROM default_glue_catalog.schema_name.clickhouse_staging
    WHERE TIME_FILTER();
```


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://upsolver.gitbook.io/content/reference-1/sql-commands/jobs/create-job/transformation/job-options/clickhouse.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.