# Amazon S3

## Step 1 - Connect to Amazon S3

### Create a new connection

Click **Create a new connection** if it is not already selected. In the **Name your connection** field, enter a name for the connection.

For the **Authentication Method**, select either the **Role-based** or the **AccessKey/SecretKey** option:

#### Role-based

Upsolver recommends that you use **Role-based** access.

* To define the correct permissions for the role, follow the [Amazon S3 access configuration](/content/how-to-guides-1/connectors/configure-access/amazon-s3.md) guide to create an IAM policy.
* If your S3 bucket is in a different AWS account than the one running Upsolver, you need to establish trust between the role and the account running Upsolver. Follow the [Role-Based AWS Credentials](/content/how-to-guides-1/setup/deploy-upsolver-on-aws/role-based-aws-credentials.md) guide to create a trusted **AWS Role** and find your **External Id**.
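
For orientation, a cross-account trust relationship in AWS is normally expressed as an IAM trust policy that allows `sts:AssumeRole` and checks the `sts:ExternalId` condition key. The sketch below builds such a policy document in Python. The function name, the account ID `111122223333`, and the external ID value are placeholders for illustration only; take the real values from the Role-Based AWS Credentials guide.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets another AWS account assume a role.

    Both arguments are placeholders here; use the values from your own setup.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # The caller must present the matching external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Hypothetical values, for illustration only.
policy = build_trust_policy("111122223333", "example-external-id")
print(json.dumps(policy, indent=2))
```

The printed JSON has the shape you would paste into the role's trust relationship in the IAM console.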

#### AccessKey/SecretKey

To create your **Access key id** and **Secret access key**, follow the [AWS Account and Access Keys guide](https://docs.aws.amazon.com/powershell/latest/userguide/pstools-appendix-sign-up.html).

**Encryption Key**

By default, Upsolver reads the files using the default encryption defined on the AWS bucket. Alternatively, you can provide either the Base64 text representation of the encryption key or the ARN of an existing AWS KMS key.
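
If you need to produce the Base64 text representation of a key, the following minimal sketch shows the encoding step. It assumes a 256-bit (32-byte) key and generates a random one purely for demonstration; in practice you would encode the key your bucket already uses. The function name `to_base64_text` is hypothetical.

```python
import base64
import os


def to_base64_text(key_bytes: bytes) -> str:
    """Return the Base64 text representation of raw key bytes."""
    return base64.b64encode(key_bytes).decode("ascii")


# Assumption: a 256-bit key. A random key is generated here only as a demo;
# substitute the bytes of your existing encryption key.
raw_key = os.urandom(32)
encoded_key = to_base64_text(raw_key)
print(encoded_key)
```

Decoding the printed text with `base64.b64decode` returns the original key bytes, which is a quick way to confirm the value was copied correctly.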

When you have entered your authentication information, click **Test Connection**.

<figure><img src="/files/6EIHwacZzN8uwAMK7ZTW" alt=""><figcaption><p>Create a new Amazon S3 connection to use as your ingestion source.</p></figcaption></figure>

### Use an existing connection

By default, if you have already created a connection, Upsolver selects **Use an existing connection**, and your Amazon S3 connection is populated in the list.

For organizations with multiple connections, select the source connection you want to use.

<figure><img src="/files/5949jYOV5BPpjYWZwUpC" alt=""><figcaption><p>Select an existing Amazon S3 connection for your ingestion job.</p></figcaption></figure>

## Step 2 - Select a source location to ingest from

When the connection is established, Upsolver attempts to list your buckets, provided the connection grants the **s3:ListAllMyBuckets** permission. Alternatively, you can specify the name of your bucket directly, e.g. **s3://upsolver-samples**.

Next, you can optionally **Select the location to read the files from**. Leave this empty to ingest the entire bucket.
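
The bucket name and the optional location correspond to the two parts of an S3 URI. As a rough illustration, the sketch below splits a URI into those parts using Python's standard library. The helper name `split_s3_uri` and the `orders/2024/` location are hypothetical; an empty location stands for the entire bucket.

```python
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split an S3 URI into (bucket, location). Location may be empty."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    # Drop the leading slash so the location reads like an S3 key prefix.
    return parsed.netloc, parsed.path.lstrip("/")


print(split_s3_uri("s3://upsolver-samples"))               # whole bucket
print(split_s3_uri("s3://upsolver-samples/orders/2024/"))  # hypothetical folder
```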

To specify the file types to ingest, choose an option from the **Select the file's content type / Parse files using** list, e.g. JSON, CSV, Parquet. This list defaults to **Automatic**.

### Advanced options

**Select the file name pattern for the files you would like to ingest**

Upsolver ingests all files in the selected location by default. To restrict ingestion to specific files, select **Ingest files matching a regular expression** from the list.
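
Before entering a regular expression, it can help to test it against a few sample file names. The sketch below does this locally in Python. It assumes the pattern may match anywhere in the file path, which may differ from how Upsolver applies it, so treat it as a way to check the expression itself rather than a guarantee of wizard behavior. The file names and the helper `filter_keys` are made up for illustration.

```python
import re


def filter_keys(keys, pattern):
    """Return the keys that the regular expression matches (search semantics)."""
    regex = re.compile(pattern)
    return [key for key in keys if regex.search(key)]


# Hypothetical file names, for illustration only.
sample_keys = ["orders/a.json", "orders/b.csv", "orders/c.json.gz"]

# Keep JSON files, whether or not they are gzip-compressed.
matched = filter_keys(sample_keys, r"\.json(\.gz)?$")
print(matched)
```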

#### Load files by partition using a date pattern

If your source files are partitioned by a date pattern, Upsolver can load existing and new files using the pattern. This affects the order of files loaded and avoids delays when many changes occur across the bucket.

By default, Upsolver will list and ingest files in the ingest job’s bucket and folder as soon as they are discovered. When you set a date pattern, Upsolver uses the date in the folder path to understand when new files are added. The date in the path is used to process data in order of arrival. If files are added to a folder named with a future date, these files will not be ingested until that date becomes the present.
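
The future-date rule described above can be pictured with a small sketch: extract the date from the folder path, then compare it with the current time. This is a simplified model and not Upsolver's implementation. It assumes UTC, a hypothetical `events/` folder, and Python `strptime` syntax (`%Y/%m/%d`) rather than the pattern syntax used in the wizard.

```python
from datetime import datetime, timezone


def folder_time(key: str, prefix: str, date_format: str) -> datetime:
    """Parse the date folders that follow `prefix` in an object key."""
    relative = key[len(prefix):]
    # The date spans as many path segments as the format has components.
    depth = date_format.count("/") + 1
    date_part = "/".join(relative.split("/")[:depth])
    return datetime.strptime(date_part, date_format).replace(tzinfo=timezone.utc)


def is_eligible(key: str, prefix: str, date_format: str, now: datetime) -> bool:
    """A file is eligible once the date in its path is no longer in the future."""
    return folder_time(key, prefix, date_format) <= now


# Hypothetical keys and a fixed "now" so the example is reproducible.
now = datetime(2024, 5, 10, 12, 0, tzinfo=timezone.utc)
keys = ["events/2024/05/11/b.json", "events/2024/05/10/a.json"]

ready = [k for k in keys if is_eligible(k, "events/", "%Y/%m/%d", now)]
ordered = sorted(keys, key=lambda k: folder_time(k, "events/", "%Y/%m/%d"))
print(ready)    # files whose folder date has arrived
print(ordered)  # files in folder-date order
```

In this model, the file under `2024/05/11` is held back until that date arrives, while sorting by folder date gives the order in which the files would be processed.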

**Delete the source files following ingestion**

To discover new files, when a date pattern is not set, Upsolver lists the top-level prefix and performs a diff to detect newly created files. It then lists the paths adjacent to these newly added files and assumes that if a file was added here, others will be as well. This process is performed at regular intervals to ensure files are not missed.

For buckets with few files and predictable changes, this works well. However, for buckets with many changes across millions of files and hundreds of prefixes, the scanning and diffing process may result in ingestion and processing delays.
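
The list-and-diff approach can be modeled in a few lines, which also shows why its cost grows with the size of the listing. The sketch below is a simplified illustration under assumed inputs, not Upsolver's actual code: it compares two listings held in memory and derives the folders that would be listed next. The function names and keys are hypothetical.

```python
def discover_new_files(previous: set, current: set) -> set:
    """Return keys present in the current listing but not in the previous one."""
    return current - previous


def adjacent_prefixes(new_keys: set) -> set:
    """Return the parent folders of new keys, which would be listed next."""
    return {key.rsplit("/", 1)[0] + "/" for key in new_keys if "/" in key}


# Hypothetical listings taken at two consecutive intervals.
previous_listing = {"logs/a/1.json"}
current_listing = {"logs/a/1.json", "logs/a/2.json", "logs/b/1.json"}

new_files = discover_new_files(previous_listing, current_listing)
print(sorted(new_files))
print(sorted(adjacent_prefixes(new_files)))
```

Both listings must be held and compared in full on every interval, so the work scales with the total number of files rather than with the number of new files.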

To optimize this process, consider setting the **Delete the source files following ingestion** option to **TRUE**. This moves ingested files to another staging location, leaving the source folder empty and making it easier and faster for Upsolver to discover new files. Be aware that configuring Upsolver to move ingested files could impact other systems if they depend on the same raw files.

<figure><img src="/files/S9GbPJHR4Rb2Oh88x7zU" alt=""><figcaption><p>Amend the Advanced Options to configure your ingestion job.</p></figcaption></figure>

## Step 3 - Check that files are read successfully

When you select a bucket and folder, Upsolver will attempt to load a sample of the files.

If Upsolver did not load any sample files, try the following:

1. Verify that the location on your bucket contains files.
2. Select a [content type](#step-2-select-a-source-location-to-ingest-from) that matches the content type of your files.

