> For the complete documentation index, see [llms.txt](https://upsolver.gitbook.io/content/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://upsolver.gitbook.io/content/reference-1/sql-commands/jobs/create-job/ingestion/mysql.md).

# MySQL

## Syntax

```sql
CREATE JOB <job_name>
    [{ job_options }]
AS COPY FROM MYSQL <connection_identifier>
    [{ source_options }]
INTO <table_identifier>
[ WITH EXPECTATION <exp_name> EXPECT <sql_predicate> ON VIOLATION { DROP | WARN } ];
```

#### **Jump to**

* [Job options](#job-options)
* [Source options](#source-options)

## Job options

The following job properties configure the behavior of the ingestion job.

{% code overflow="wrap" %}

```sql
[ COLUMN_TRANSFORMATIONS = (<column> = <expression>, ...) ]
[ COMMENT = '<comment>' ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]  
[ DDL_FILTER = ('<ddl_expression>', ...) ]
[ END_AT = { NOW | <timestamp> } ]
[ EXCLUDE_COLUMNS = ( <col>, ...) ]   
[ PARSE_JSON_COLUMNS = { TRUE | FALSE } ]    
[ SKIP_SNAPSHOTS = { TRUE | FALSE } ]
[ SNAPSHOT_PARALLELISM = <integer> ]
```

{% endcode %}

#### Jump to

MySQL job options:

* [`DDL FILTERS`](#ddl_filters)
* [`PARSE_JSON_COLUMNS`](#parse_json_columns)
* [`SKIP_SNAPSHOTS`](#skip_snapshots-editable)
* [`SNAPSHOT_PARALLELISM`](#snapshot_parallelism)

General job options:

* [`COLUMN_TRANSFORMATIONS`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#column_transformations)&#x20;
* [`COMMENT`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#comment-editable)&#x20;
* [`COMPUTE_CLUSTER`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#compute_cluster-editable)&#x20;
* [`END_AT`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#end_at-editable)&#x20;
* [`EXCLUDE_COLUMNS`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#exclude_columns)

See also:

* [`WITH EXPECTATION`](broken://spaces/j7NDvdvbkHS8wtX8aREE/pages/Y2QSVaw0n1NMmbDpVePG)&#x20;

#### `DDL_FILTERS`

Type: `array[string]`

Default: `''`&#x20;

(Optional) Comma-separated list of DDL expressions that the job will ignore when there are errors from the Debezium engine.

#### `PARSE_JSON_COLUMNS`

Type: `Boolean`

Default: `false`

If enabled, Upsolver will parse JSON columns into a struct matching the JSON value.

#### `SKIP_SNAPSHOTS`— editable

Type: `Boolean`

Default: `false`

(Optional) By default, snapshots are enabled for new tables. This means that Upsolver will take a full snapshot of the table(s) and ingest it into the staging table before it continues to listen for change events. When set to `True`, Upsolver will not take an initial snapshot and only process change events starting from the time the ingestion job is created.

In the majority of cases, when you connect to your source tables, you want to take a full snapshot and ingest it as the baseline of your table. This creates a full copy of the source table in your data lake before you begin to stream the most recent change events. If you skip taking a snapshot, you will not have the historical data in the target table, only the newly added or changed rows.

Skipping a snapshot is useful in scenarios where your primary database instance crashed or became unreachable, failing over to the secondary. In this case, you will need to re-establish the CDC connection but would not want to take a full snapshot because you already have all of the history in your table. In this case, you would want to restart processing from the moment you left off when the connection to the primary database went down.

#### `SNAPSHOT_PARALLELISM`

Type: `int`&#x20;

Default: `1`&#x20;

(Optional) Configures how many snapshots are performed concurrently. The more snapshots performed concurrently, the quicker the tables are streaming. However, doing more snapshots in parallel increases the load on the source database.

## Source options

The following data source properties configure how to replicate data from MySQL.&#x20;

```sql
[ TABLE_INCLUDE_LIST = ('regexFilter1', 'regexFilter2') ]
[ COLUMN_EXCLUDE_LIST = ('regexFilter1', 'regexFilter2') ]
```

#### Jump to

* [`TABLE_INCLUDE_LIST`](#table_include_list-editable)
* [`COLUMN_EXCLUDE_LIST`](#column_exclude_list-editable)

#### `TABLE_INCLUDE_LIST` — `editable`

Type: `text`

Default: `''`&#x20;

(Optional) Comma-separated list of regular expressions that match fully-qualified table identifiers of tables whose changes you want to capture. This maps to the Debezium [table.include.list](https://debezium.io/documentation/reference/2.0/connectors/mysql.html#mysql-property-table-include-list) property.&#x20;

By default, the connector captures changes in every non-system table in all databases. To match the name of a table, Upsolver applies the regular expression that you specify as an *anchored* regular expression. That is, the specified expression is matched against the entire name string of the table.  It does not match substrings that might be present in a table name.

Each RegEx pattern matches against the full string `databaseName.tableName`, for example:

<table><thead><tr><th width="297">RegEx Pattern</th><th>Results</th></tr></thead><tbody><tr><td>db_name.*</td><td>Select all tables under <code>db_name</code> database</td></tr><tr><td>db_name.users, db_name.items</td><td>Selects tables <code>users</code> and <code>items</code> under <code>db_name</code> database</td></tr><tr><td>db1.items_.*</td><td>Selects all tables from <code>db1</code>, that start with <code>items_</code></td></tr></tbody></table>

#### `COLUMN_EXCLUDE_LIST` — editable

Type: `text`

Default: `''`

(Optional) Comma-separated list of regular expressions that match the fully-qualified names of columns to exclude from change event record values. This maps to Debezium [column.exclude.list](https://debezium.io/documentation/reference/2.0/connectors/mysql.html#mysql-property-column-exclude-list) property.

By default, the connector matches all columns of the tables listed in `TABLE_INCLUDE_LIST`. To match the name of a column, Upsolver applies the regular expression that you specify as an *anchored* regular expression. That is, the specified expression is matched against the entire name string of the column; it does not match substrings that might be present in a column name.

Each RegEx pattern matches against the full string `databaseName.tableName.columnName`, for example:

<table><thead><tr><th width="326">RegEx Pattern</th><th>Results</th></tr></thead><tbody><tr><td>db.users.address_.*</td><td>Selects all of the columns that start with <code>address_</code> in the <code>users</code> table of database <code>db</code>.</td></tr><tr><td>db.*.(.*_pii)</td><td>Selects all of the columns ending with <code>_pii</code> across all tables within <code>db</code> database.</td></tr></tbody></table>

## Examples

### Ingest multiple tables

The following job creates a synchronized job that replicates the data from the **samples.orders** and **samples.customers** tables into the target table in the data lake. Use the `TABLE_INCLUDE_LIST` source option to specify which data to ingest.

```sql
CREATE SYNC JOB replicate_mysql_tables
AS COPY FROM MYSQL upsolver_mysql_samples 
      TABLE_INCLUDE_LIST = ('samples.orders', 'samples.customers')
INTO default_glue_catalog.upsolver_samples.orders_raw_data;
```

### Disable the initial snapshot

This example uses the `SKIP_SNAPSHOTS` option to instruct the job not to take a snapshot. Setting this value to `TRUE` ensures that only events that arrive from the time when the job starts, will be ingested. All historical data is ignored.

```sql
CREATE SYNC JOB copy_all_table_no_history_job
   SKIP_SNAPSHOTS = TRUE
AS COPY FROM MYSQL upsolver_mysql_samples
INTO default_glue_catalog.upsolver_samples.raw_cdc_tables;
```

### Exclude columns from ingestion

You can use the `TABLE_INCLUDE_LIST` to specify the tables you want to ingest. In the example below, the job will copy data from the **customers** and **orders** tables. However, personally identifiable information is stored in these tables, and this data should not be held in the staging or target tables as this would violate privacy laws. The `COLUMN_EXCLUDE_LIST` option enables the columns containing sensitive information to be excluded from the ingestion and, in this case, **credit\_card** and **customer address** data are ignored from the process. This option could equally be used to remove extraneous data.

```sql
CREATE SYNC JOB copy_table_exclude_cols
AS COPY FROM MYSQL upsolver_mysql_samples
     TABLE_INCLUDE_LIST = ('db.customers', 'db.orders')
     COLUMN_EXCLUDE_LIST = ('db.*.credit_card', 'db.customers.address_.*')
INTO default_glue_catalog.upsolver_samples.raw_cdc_tables;
```


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://upsolver.gitbook.io/content/reference-1/sql-commands/jobs/create-job/ingestion/mysql.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
