# Amazon S3

{% hint style="success" %}
**Prerequisites**

Ensure that you have an [Amazon S3](/content/reference-1/sql-commands/connections/create-connection/amazon-s3.md) connection with the correct permissions to read from your intended bucket.

Additionally, if you are ingesting to the data lake, you need a metastore connection that can be used to create a staging table as well as a corresponding storage connection that can be used to store your table's underlying files.
{% endhint %}

## Create a job that reads from Amazon S3

You can create a job to ingest your data from S3 into a staging table in the data lake or ingest directly into your target.

**Jump to**

* [Ingest to the data lake](#ingest-to-the-data-lake)
* [Ingest directly to the target](#ingest-directly-to-the-target)
* [Job options](#job-options)

### Ingest to the data lake

After completing the prerequisites, you can create your staging table. The example below creates a table without defining columns or data types because Upsolver infers them automatically. You can define columns explicitly if required:

```sql
CREATE TABLE default_glue_catalog.upsolver_samples.orders_raw_data()
    PARTITIONED BY $event_date;
```

Upsolver recommends partitioning by the system column `$event_date` or another date column within the data in order to optimize your query performance.
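If you prefer to declare columns up front rather than rely on inference, a minimal sketch might look like the following. The table name `orders_raw_data_typed` and the columns `orderid`, `orderdate`, and `nettotal` are hypothetical and are not taken from the sample data, so adjust the names and types to match your own schema.

```sql
-- Sketch only: the table name, column names, and types are illustrative assumptions.
CREATE TABLE default_glue_catalog.upsolver_samples.orders_raw_data_typed (
    orderid string,
    orderdate timestamp,
    nettotal double
)
    PARTITIONED BY $event_date;
```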

Next, you can create an ingestion job as follows:

```sql
CREATE SYNC JOB load_orders_raw_data_from_s3
   CONTENT_TYPE = JSON
AS COPY FROM S3 upsolver_s3_samples 
   LOCATION = 's3://upsolver-samples/orders/' 
INTO default_glue_catalog.upsolver_samples.orders_raw_data;
```

{% hint style="warning" %}
Note that multiple ingestion jobs can write to the same table, resulting in a final table that contains a `UNION ALL` of all data copied into that table. This means that duplicate rows are not removed, and the column list may expand if new columns are detected.

This may not be your intended behavior, so ensure you are writing to the correct table before running your job.
{% endhint %}

The example above only uses a small subset of all job options available when reading from Amazon S3. Depending on your use case, you may want to configure a different set of options. For instance, if you're reading from a folder partitioned by date, you may want to use the `DATE_PATTERN` option.
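As a rough illustration of the `DATE_PATTERN` case, the sketch below reads from a folder whose subfolders are assumed to follow a `year/month/day/hour/minute` layout. The job name and the pattern value are assumptions, and `DATE_PATTERN` is shown here as a source option next to `LOCATION`; check the Ingestion jobs reference for the exact placement and supported pattern syntax.

```sql
-- Sketch only: the job name and the DATE_PATTERN value are assumptions.
CREATE SYNC JOB load_orders_by_date_from_s3
   CONTENT_TYPE = JSON
AS COPY FROM S3 upsolver_s3_samples
   LOCATION = 's3://upsolver-samples/orders/'
   -- Assumes subfolders are laid out as year/month/day/hour/minute
   DATE_PATTERN = 'yyyy/MM/dd/HH/mm'
INTO default_glue_catalog.upsolver_samples.orders_raw_data;
```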

### Ingest directly to the target

Directly ingesting your data enables you to copy your data straight into the target system, bypassing the need for a staging table. The syntax and job options are identical to ingesting into a staging table; only the target connector differs:

```sql
CREATE SYNC JOB ingest_s3_to_snowflake
   CONTENT_TYPE = JSON
AS COPY FROM S3 upsolver_s3_samples 
   LOCATION = 's3://upsolver-samples/orders/' 
INTO SNOWFLAKE my_snowflake_connection.demo.orders_transformed;
```

### Job options

Transformations can be applied to your ingestion job to correct issues, exclude columns, or mask data before it lands in the target. Furthermore, you can use expectations to define data quality rules on your data stream and take appropriate action.
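The sketch below shows how these pieces might fit together in one job: it excludes a raw column, writes a hashed copy of it to a new column, and drops rows that fail a data quality rule. The job name, the field names `customer.email` and `orderid`, the new column `hashed_email`, and the expectation name are all assumptions. Confirm the exact syntax for `EXCLUDE_COLUMNS`, `COLUMN_TRANSFORMATIONS`, and `WITH EXPECTATION` on the Ingestion jobs reference page before relying on it.

```sql
-- Sketch only: the job name, field names, and expectation name are assumptions.
CREATE SYNC JOB load_orders_masked_from_s3
   CONTENT_TYPE = JSON
   -- Exclude the raw email field so it never reaches the target
   EXCLUDE_COLUMNS = ('customer.email')
   -- Store a hashed version of the email in a new column
   COLUMN_TRANSFORMATIONS = (hashed_email = MD5(customer.email))
AS COPY FROM S3 upsolver_s3_samples
   LOCATION = 's3://upsolver-samples/orders/'
INTO default_glue_catalog.upsolver_samples.orders_raw_data
   -- Discard rows that arrive without an order ID
   WITH EXPECTATION exp_orderid_not_null
      EXPECT orderid IS NOT NULL ON VIOLATION DROP;
```

Using `ON VIOLATION WARN` instead of `DROP` would keep the offending rows while still recording the violation, which can be a gentler starting point when you are unsure how clean the source data is.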

## Alter a job that reads from Amazon S3

Some job options are considered mutable, enabling you to run a SQL command to alter an existing ingestion job rather than create a new job. The job options apply equally to jobs that ingest into the data lake or directly to the target and the syntax to alter a job is identical.

For example, take the job we created earlier:

```sql
CREATE SYNC JOB load_orders_raw_data_from_s3
   CONTENT_TYPE = JSON
AS COPY FROM S3 upsolver_s3_samples 
   LOCATION = 's3://upsolver-samples/orders/' 
INTO default_glue_catalog.upsolver_samples.orders_raw_data;
```

To keep the job as is and change only the cluster that runs it, execute the following command:

```sql
ALTER JOB load_orders_raw_data_from_s3 
    SET COMPUTE_CLUSTER = my_new_cluster;
```

Note that some options, such as `COMPRESSION`, cannot be altered once the job has been created.
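Other mutable options are altered with the same `SET` form. The sketch below assumes that `COMMENT` is mutable for this job type, which you should verify against the reference before use; the comment text itself is illustrative.

```sql
-- Sketch only: assumes COMMENT is a mutable option for this job.
ALTER JOB load_orders_raw_data_from_s3
    SET COMMENT = 'Loads raw orders from the S3 samples bucket';
```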

## Drop a job that reads from Amazon S3

If you no longer need a job, you can easily drop it using the following SQL command. This applies to jobs that ingest into the data lake and directly into the target:

```sql
DROP JOB load_orders_raw_data_from_s3;
```

***

{% hint style="success" %}
**Learn More**

To learn about the available job options, see the [Ingestion](/content/reference-1/sql-commands/jobs/create-job/ingestion.md) jobs page, which describes each option in detail and includes examples.

To check which job options are mutable, see [Amazon S3](/content/reference-1/sql-commands/connections/create-connection/amazon-s3.md).
{% endhint %}

