> For the complete documentation index, see [llms.txt](https://upsolver.gitbook.io/content/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://upsolver.gitbook.io/content/articles-1/data/schema-evolution/redshift-schema-evolution.md).

# Redshift Schema Evolution

When writing data from Upsolver to Amazon Redshift, schema evolution is enabled for the following job types:

1. **Ingestion Job**: Directly writing data to a Redshift table using the `COPY FROM` syntax.
2. **Transformation Job**: Using a `SELECT *` expression mapped to the root, for example:&#x20;

```sql
CREATE JOB redshift_job
...
INSERT INTO redshift_connection.schema_name.table_name MAP_COLUMNS_BY_NAME
  SELECT *
  FROM source_table
```

3. **Replication Job**: Continuously replicating CDC data to Redshift.

### Schema Evolution Overview

Schema evolution in Upsolver means that the system will dynamically create and modify Redshift tables to accommodate changes in the incoming schema. Specifically, Upsolver handles the addition of new columns when new data types are encountered.

### Type Mapping

Upsolver maps its primitive types to Redshift column types as follows:

<table><thead><tr><th width="202.33333333333331">Upsolver Type</th><th width="261">Redshift Column Type</th><th data-hidden>Expression</th></tr></thead><tbody><tr><td>DOUBLE</td><td>FLOAT8</td><td>TRY_CAST($v / 1000 as TIMESTAMP)</td></tr><tr><td>BOOLEAN</td><td>BOOLEAN</td><td>EXTRACT(EPOCH_MILLISECONDS FROM $v)</td></tr><tr><td>STRING</td><td>VARCHAR (max size 65535)</td><td>TRY_CAST(TRY_CAST($v * 86400 AS VARCHAR) AS DATE</td></tr><tr><td>TIMESTAMP</td><td>TIMESTAMP</td><td>TRY_CAST(EXTRACT(EPOCH_SECOND FROM $v) / 86400 AS BIGINT)</td></tr><tr><td>DATE</td><td>DATE</td><td></td></tr><tr><td>BIGINT</td><td>BIGINT</td><td></td></tr></tbody></table>

Nested fields and arrays are mapped to the Redshift type `SUPER`.

### Handling Conflicts

A key challenge in schema evolution is resolving data type conflicts. For instance, if a field initially ingested as `BIGINT` changes to `VARCHAR`, Upsolver must handle this discrepancy.

Upsolver resolves data type conflicts in the following way:

* If the new type is narrower than the original type e.g., varchar to bigint, the value will be inserted into the original column if casting is possible.
* If the new type is wider than the original type e.g. bigint to varchar, a new column of the new type will be created with a suffix indicating the new type e.g. `_varchar` for string data. The new value will be written to both the new column and the previous column if a cast is possible. For example, in the case of bigint to varchar, if the value of the varchar is `'2'`, it will also be written as `2` to the original bigint column.

### Example

1. **Initial Data**: `{"my_field": 1.2}`
   * Redshift Column: `my_field` of type `FLOAT8`.
2. **New Data**: `{"my_field": "str"}`
   * Redshift Columns:
     * `my_field` remains `FLOAT8`, with `NULL` for the new string data.
     * `my_field_varchar` is created to store the value "str".

### Special Cast Handling

<table><thead><tr><th width="136">From Type</th><th width="117">To Type</th><th>Cast Expression</th></tr></thead><tbody><tr><td>BigInt</td><td>Timestamp</td><td><code>TIMESTAMP 'epoch' + source_column * INTERVAL '0.001 second'</code></td></tr><tr><td>Timestamp</td><td>BigInt</td><td><code>EXTRACT(EPOCH FROM source_column)::BIGINT * 1000 + EXTRACT(MILLISECOND FROM source_column)</code></td></tr><tr><td>BigInt</td><td>Date</td><td><code>'1970-01-01'::DATE + source_column * INTERVAL '1 day'</code></td></tr><tr><td>Date</td><td>BigInt</td><td><code>(source_column - '1970-01-01'::DATE)::BIGINT</code></td></tr><tr><td>Float8</td><td>BigInt</td><td><code>CASE WHEN source_column >= ${Long.MinValue} AND source_column &#x3C;= ${Long.MaxValue} THEN source_column::BIGINT END</code></td></tr><tr><td>Super</td><td>Varchar</td><td><code>JSON_SERIALIZE(source_column)</code></td></tr><tr><td>Boolean</td><td>Varchar</td><td><code>CASE source_column WHEN true THEN 'true' ELSE 'false' END</code></td></tr><tr><td>Varchar</td><td>BigInt</td><td><code>CASE WHEN source_column ~ '^[0-9]+$$' THEN source_column::BIGINT ELSE NULL END</code></td></tr><tr><td>Varchar</td><td>Float8</td><td><code>CASE WHEN source_column ~ '^[+-]?[0-9]*\.?[0-9]+$$' THEN source_column::FLOAT8 ELSE NULL END</code></td></tr><tr><td>Varchar</td><td>Timestamp</td><td><code>CASE WHEN source_column ~ '^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}$$' THEN source_column::TIMESTAMP ELSE NULL END</code></td></tr><tr><td>Varchar</td><td>Date</td><td><code>CASE WHEN source_column ~ '^\d{4}-\d{2}-\d{2}$$' THEN source_column::DATE ELSE NULL END</code></td></tr><tr><td>Varchar</td><td>Boolean</td><td><code>CASE LOWER(source_column) WHEN 'true' THEN true WHEN 'false' THEN false END</code></td></tr></tbody></table>

Other types will be cast using Redshift's `CAST` expression.

The following type tuples will not be cast:

* Float8 - Timestamp
* Float8 - Date
* Float8 - Boolean
* Boolean - Date
* Boolean - Timestamp


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://upsolver.gitbook.io/content/articles-1/data/schema-evolution/redshift-schema-evolution.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
