> For the complete documentation index, see [llms.txt](https://upsolver.gitbook.io/content/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://upsolver.gitbook.io/content/reference-1/sql-commands/jobs/create-job/ingestion/mongodb.md).

# MongoDB

## Syntax

```sql
CREATE JOB <job_name>
    [{ job_options }]
AS COPY FROM MONGODB <connection_identifier>
    [{ source_options }]
    INTO <table_identifier>
 WITH EXPECTATION <exp_name> EXPECT <sql_predicate> ON VIOLATION { DROP | WARN } ];
```

**Jump to**

* [Job options](#job-options)
* [Source options](#source-options)

## Job options

The following job properties configure the behavior of the ingestion job.

{% code overflow="wrap" %}

```sql
[ COLUMN_TRANSFORMATIONS = (<column> = <expression>, ...) ]
[ COMMENT = '<comment>' ]
[ COMPUTE_CLUSTER = <cluster_identifier> ]  
[ END_AT = { NOW | <timestamp> } ]
[ EXCLUDE_COLUMNS = ( <col>, ...) ] 
[ SKIP_SNAPSHOTS = { TRUE | FALSE } ]
[ SNAPSHOT_PARALLELISM = <integer> ]
```

{% endcode %}

MongoDB job options:

* [`SKIP_SNAPSHOTS`](#skip_snapshots-editable)
* [`SNAPSHOT_PARALLELIS`](#snapshot_parallelism)

General job options:

* [`COLUMN_TRANSFORMATIONS`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#column_transformations)&#x20;
* [`COMMENT`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#comment-editable)&#x20;
* [`COMPUTE_CLUSTER`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#compute_cluster-editable)&#x20;
* [`END_AT`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#end_at-editable)&#x20;
* [`EXCLUDE_COLUMNS`](/content/reference-1/sql-commands/jobs/create-job/ingestion.md#exclude_columns)

See also:

* [`WITH EXPECTATION`](broken://spaces/j7NDvdvbkHS8wtX8aREE/pages/Y2QSVaw0n1NMmbDpVePG)

#### `SKIP_SNAPSHOTS`— editable

Type: `Boolean`

Default: `false`

(Optional) By default, snapshots are enabled for new collections. This means that Upsolver will take a full snapshot of the collection and ingest it into the staging table before it continues to listen for change events. When `True`, Upsolver will not take an initial snapshot and only process change events starting from the time the ingestion job is created.

In the majority of cases, when you connect to your source collections, you want to take a full snapshot and ingest a baseline of your data. This creates a full copy of the source data in your data lake before you begin to stream the most recent change events. If you skip taking a snapshot, you will not have the historical data in the target table, only the newly added or changed rows.

Skipping a snapshot is useful in scenarios where your primary database instance crashed or became unreachable, failing over to the secondary. In this case, you will need to re-establish the CDC connection but would not want to take a full snapshot because you already have all of the history in your table. In this case, you would want to restart processing from the moment you left off when the connection to the primary database went down.

#### `SNAPSHOT_PARALLELISM`&#x20;

Type: `integer`&#x20;

Default Value: `1`&#x20;

(Optional) Configures how many snapshots are performed concurrently. The more snapshots performed concurrently, the quicker it is to have all collections streaming. However, doing more snapshots in parallel increases the load on the source database.

## Source options

```sql
[ COLLECTION_INCLUDE_LIST = ( '..' [ , '...' ] ) ]
```

#### `COLLECTION_INCLUDE_LIST` — editable

Type: `text`

Default: `''`&#x20;

(Optional) Comma-separated list of regular expressions that match fully-qualified namespaces of collections whose changes you want to capture. Collections not included in this list will not be loaded. If the list is left empty all collections will be loaded. This maps to the Debezium [collections.include.list](https://debezium.io/documentation/reference/stable/connectors/mongodb.html#mongodb-property-collection-include-list) property.&#x20;

By default, the connector captures changes in every non-system collection in all databases. To match the name of a collection, Upsolver applies the regular expression that you specify as an *anchored* regular expression, so that the specified expression is matched against the name string of the namespace; it does not match substrings in the name.

Each RegEx pattern matches against the full string `databaseName.CollectionName`, for example:

<table><thead><tr><th width="287.3333333333333">RegEx Pattern</th><th>Results</th></tr></thead><tbody><tr><td>db_name.*</td><td>Select all collections from the <code>db_name</code> database.</td></tr><tr><td>db_name.users, db_name.items</td><td>Select the <code>users</code> and <code>items</code> collections from the <code>db_name</code> database.</td></tr><tr><td>db1.items_.*</td><td>Select all collections from <code>db1</code> that start with <code>items_</code>.</td></tr></tbody></table>

## Examples

### Ingest data into the data lake

The following example creates a job to ingest data from MongoDB into a table in the data lake.&#x20;

```sql
CREATE SYNC JOB load_raw_data_from_mongodb
AS COPY FROM MONGODB my_mongodb_connection
INTO default_glue_catalog.upsolver_samples.orders_raw_data;
```

### Ingest data with additional options

The above example shows you how to create a job using minimal code, however, you can use job and source options to enhance your job. In the following example, additional options are included:

```sql
CREATE SYNC JOB load_raw_data_from_mongodb
  COLUMN_TRANSFORMATIONS = (email = MD5(email))
  COMMENT = 'Ingest CDC data from MongoDB'
AS COPY FROM MONGODB my_mongodb_connection
  COLLECTION_INCLUDE_LIST = ('sales.customers', 'sales.orders')
INTO default_glue_catalog.upsolver_samples.orders_raw_data;
```

The `COLUMN_TRANSFORMATION` job option is included to mask the value in the **email** column to protect PII. Furthermore, the `COLLECTION_INCLUDE_LIST` source option limits the ingested data to the **customers** and **orders** data in the **sales** database.&#x20;


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://upsolver.gitbook.io/content/reference-1/sql-commands/jobs/create-job/ingestion/mongodb.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
RegEx Pattern	Results
db_name.*	Select all collections from the `db_name` database.
db_name.users, db_name.items	Select the `users` and `items` collections from the `db_name` database.
db1.items_.*	Select all collections from `db1` that start with `items_`.