> For the complete documentation index, see [llms.txt](https://upsolver.gitbook.io/content/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://upsolver.gitbook.io/content/articles-1/data/column-transformations.md).

# Column Transformations

If transformations must be applied prior to data landing in your target, you can use this option to perform data transformations during ingestion. When ingesting into the data lake, it is recommended that you only apply essential transformations, such as protecting PII, as it is easier to make amendments or corrections at a later date if the data remains in its raw state and instead use a transformation job to apply modifications. Therefore, as a general rule, you should only transform data that must be modified before it reaches the target. &#x20;

#### Flexible data shaping

However, transformations provide the flexibility to shape your data before it lands in the target. You can use all the functions and operators supported by Upsolver to create calculated fields within your ingestion job. New columns can be added to your target, and existing column data can be transformed. You can perform actions such as converting data types, formatting string values, and concatenating columns to create a new column.   &#x20;

For example, in the following job, the **ordertype** type column has a transformation applied to convert all values to upper case to maintain consistency in the target table in Snowflake. Furthermore, the **review** column has a right trim function applied to remove trailing spaces that commonly appear in this column:&#x20;

```sql
CREATE SYNC JOB ingest_s3_to_snowflake
   COMMENT = 'Load orders into Snowflake'
   CONTENT_TYPE = JSON
   COLUMN_TRANSFORMATIONS = (ordertype = UPPER(ordertype), review = RTRIM(review)) 
   COMMIT_INTERVAL = 10 MINUTES
AS COPY FROM S3 upsolver_s3_samples 
   LOCATION = 's3://upsolver-samples/orders/' 
INTO SNOWFLAKE my_snowflake_connection.demo.orders_transformed;
```

#### Protect sensitive data

If you need to mask sensitive or personally identifiable information (PII) prior to loading into your staging tables or when performing direct ingestion into your target destination, you can use hashing functions to prevent data from being exposed downstream.&#x20;

In the following example, the `MD5()` masking function is applied to the **customer.email** column to hide the raw data, and transformed into a new column named **hashed\_email** in the target.&#x20;

```sql
CREATE SYNC JOB ingest_kinesis_to_snowflake
    COMMENT = 'Ingest orders to Snowflake'
    START_FROM = BEGINNING
    CONTENT_TYPE = JSON
    EXCLUDE_COLUMNS = ('customer.email') 
    COLUMN_TRANSFORMATIONS = (hashed_email = MD5(customer.email))       
    COMMIT_INTERVAL = 5 MINUTES
AS COPY FROM KINESIS my_kinesis_connection 
    STREAM = 'orders'  
INTO SNOWFLAKE my_snowflake_connection.demo.orders_transformed;
```

In the example above, the customer's email address has been transformed into the new column, **hashed\_email**. However, the existing column - **customer.email** - would also be included in the ingestion and loaded into Snowflake in its raw format.&#x20;

So in this case, the `EXCLUDE_COLUMNS` option has been included and the **customer.email** column does not get ingested to the target, thereby protecting this information.&#x20;

***

{% hint style="success" %}
**Learn More**

See the [Functions](/content/reference-1/functions-and-operators/functions.md) and [Operators](/content/reference-1/functions-and-operators/operators.md) references for a full list of transformations that you can apply to your in-flight data.
{% endhint %}


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://upsolver.gitbook.io/content/articles-1/data/column-transformations.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
