# Iceberg Cloud Storage Breakdown

## Background

Iceberg tables managed by Upsolver are made up of two types of files:

1. Iceberg Table Files - These files make up the Iceberg table itself and are used by query engines when querying the table. They include metadata files and data files.
2. Upsolver Files - While ingesting data into an Iceberg table and running table optimization tasks, Upsolver creates intermediate, state, and statistics files. These are used to track internal processes and to display information in the UI.

The Upsolver files may be stored inside the Iceberg table root folder or in a dedicated location, depending on the job and table settings.

Below is a breakdown of the folders.

## Folder Structure Overview

### Table and Table Maintenance Related Storage

#### `<iceberg_table_root>/data/`

* **Purpose**: Stores the Iceberg table's data files.
* **Contents**: Includes data files from both the latest snapshot and older snapshots, which are required for Time Travel queries.
* **Retention Control**: The retention of historical snapshots can be managed via the following table properties:
  * `history.expire.max-snapshot-age-ms`: Maximum age of snapshots before they expire.
  * `history.expire.min-snapshots-to-keep`: Minimum number of snapshots to retain.
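
The way these two properties interact can be sketched in Python. This is a simplified model, assuming a snapshot is expired only when it is older than `history.expire.max-snapshot-age-ms` and is not among the `history.expire.min-snapshots-to-keep` most recent snapshots. Iceberg's actual expiration also accounts for branches, tags, and the current snapshot, which this sketch ignores. The `Snapshot` class and `snapshots_to_expire` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    timestamp_ms: int  # commit time of the snapshot


def snapshots_to_expire(snapshots, now_ms, max_snapshot_age_ms, min_snapshots_to_keep):
    """Return the IDs of snapshots eligible for expiration (simplified model)."""
    # Sort newest first so the first N entries are the ones that must be kept.
    ordered = sorted(snapshots, key=lambda s: s.timestamp_ms, reverse=True)
    protected = {s.snapshot_id for s in ordered[:min_snapshots_to_keep]}
    # Anything committed before this cutoff exceeds the maximum snapshot age.
    cutoff_ms = now_ms - max_snapshot_age_ms
    return sorted(
        s.snapshot_id
        for s in ordered
        if s.snapshot_id not in protected and s.timestamp_ms < cutoff_ms
    )


# Hypothetical history: five snapshots committed on days 1, 2, 8, 9 and 10.
history = [Snapshot(i, day * DAY_MS) for i, day in enumerate([1, 2, 8, 9, 10], start=1)]
now = 10 * DAY_MS
print(snapshots_to_expire(history, now, 5 * DAY_MS, 3))  # [1, 2]
print(snapshots_to_expire(history, now, 5 * DAY_MS, 5))  # []
```

Raising `min-snapshots-to-keep` protects snapshots even when they exceed the maximum age, which is why the second call expires nothing.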

#### `<iceberg_table_root>/metadata/`

* **Purpose**: Stores the Iceberg table's metadata files.
* **Contents**: Similar to the `data/` folder, it contains metadata from both the latest and older snapshots used in Time Travel queries.
* **Retention Control**: Managed by the same properties used for data retention: `history.expire.max-snapshot-age-ms` and `history.expire.min-snapshots-to-keep`.

#### `<upsolver_storage_location>/tables/<table_id>/dangling_files_backup/`

* **Purpose**: Holds orphaned (dangling) files found in the data or metadata folders that are not associated with any Iceberg snapshot.
* **Process**: Upsolver checks for such files periodically (about once per day). A file is considered orphaned if it is not referenced by any snapshot and is at least 3 days old. Any orphaned files found are moved from the table's root location into this backup folder.
* **Typical Size:** Dangling files are uncommon under normal circumstances, so this folder usually contains little data.
* **Retention**: By default, files are retained in this folder for 7 days before being permanently deleted.
* **Recovery**: If a file is incorrectly identified as orphaned, it can be restored by moving it back to the table's root folder.
* **Customization**: Retention settings can be changed, but reducing durations decreases recovery capability. Contact [Upsolver Support](https://support.upsolver.com) for assistance.
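
The orphan rule and the backup retention described above can be sketched in Python. This is an illustrative model, not Upsolver's implementation: `referenced_files` stands in for the set of files used by any snapshot (conceptually what the `used_files_index` tracks), times are plain millisecond timestamps, and the function names are hypothetical.

```python
DAY_MS = 24 * 60 * 60 * 1000


def find_dangling_files(files_in_storage, referenced_files, now_ms, min_age_days=3):
    """Return paths that no snapshot references and that are old enough to move.

    files_in_storage: dict mapping file path -> last-modified time (ms)
    referenced_files: set of paths referenced by at least one snapshot
    """
    cutoff_ms = now_ms - min_age_days * DAY_MS
    return sorted(
        path
        for path, modified_ms in files_in_storage.items()
        if path not in referenced_files and modified_ms <= cutoff_ms
    )


def backup_files_to_purge(backup_files, now_ms, retention_days=7):
    """Return backup paths whose retention period has elapsed.

    backup_files: dict mapping file path -> time it was moved to the backup (ms)
    """
    cutoff_ms = now_ms - retention_days * DAY_MS
    return sorted(path for path, moved_ms in backup_files.items() if moved_ms <= cutoff_ms)


now = 10 * DAY_MS
storage = {
    "data/a.parquet": 1 * DAY_MS,   # referenced, so never orphaned
    "data/b.parquet": 2 * DAY_MS,   # unreferenced and 8 days old
    "data/c.parquet": 9 * DAY_MS,   # unreferenced but only 1 day old
    "metadata/x.avro": 7 * DAY_MS,  # unreferenced and exactly 3 days old
}
print(find_dangling_files(storage, {"data/a.parquet"}, now))
```

The minimum age guards against treating files from an in-progress commit as orphans, and the backup retention leaves a window in which a wrongly moved file can still be restored.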

#### `<upsolver_storage_location>/tables/<table_id>/expired_files_backup/`

* **Purpose**: Stores files that are no longer referenced by any Iceberg table snapshots due to snapshot expiration.
* **Process**: When expiring snapshots, Upsolver first removes them from the Iceberg table metadata and then moves the files that are no longer referenced by any remaining snapshot into this folder.
* **Typical Size:** If new data is constantly streaming into the table, new snapshots are constantly created and old ones expired. The size of this folder therefore depends on the volume of data released by snapshots expired within the retention period.
* **Retention**: Files are retained here for 7 days before permanent deletion.
* **Customization**: These settings can be changed, but shorter retention periods may reduce recovery flexibility. Contact [Upsolver Support](mailto:support@upsolver.com) for changes.
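
Which files end up in this folder can be illustrated with a set difference: a file is released only when every snapshot that references it is being expired. This sketch assumes each snapshot is a flat set of file paths; real Iceberg snapshots reference data files through manifest lists and manifests, which would also need to be accounted for. The function name is hypothetical.

```python
def files_released_by_expiration(snapshot_files, expired_ids):
    """Return files referenced only by the snapshots being expired.

    snapshot_files: dict mapping snapshot id -> set of file paths it references
    expired_ids: iterable of snapshot ids being expired
    """
    expired = set(expired_ids)
    # Files still needed by any snapshot that survives the expiration.
    still_referenced = set()
    for snapshot_id, files in snapshot_files.items():
        if snapshot_id not in expired:
            still_referenced |= files
    # Files referenced by at least one expiring snapshot.
    candidates = set()
    for snapshot_id in expired:
        candidates |= snapshot_files.get(snapshot_id, set())
    return sorted(candidates - still_referenced)


snapshots = {
    1: {"a.parquet", "b.parquet"},
    2: {"b.parquet", "c.parquet"},
    3: {"c.parquet", "d.parquet"},
}
print(files_released_by_expiration(snapshots, [1, 2]))  # ['a.parquet', 'b.parquet']
```

`c.parquet` is not released because snapshot 3 still references it, even though snapshot 2 is being expired.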

#### `<upsolver_storage_location>/tables/<table_id>/used_files_index/`

* **Purpose**: An index of files used by the Iceberg table. This index is maintained by Upsolver to help identify which files are orphaned and which may be safely deleted when snapshots expire.
* **Retention:** Indefinite; a retention policy will be added in the future.

#### `<upsolver_storage_location>/tables/<table_id>/compaction_results/`

* **Purpose**: Stores details about completed compaction tasks. Each file in this folder belongs to a single compaction shard.
* **Retention:** Indefinite; a retention policy will be added in the future.

#### Static Files

#### `<upsolver_storage_location>/tables/<table_id>/used_files_groupings.json.gz`

* **Purpose**: A state file used while building the `used_files_index`.

#### `<upsolver_storage_location>/tables/<table_id>/iceberg_coordinator.json`

* **Purpose**: A state file used for planning which compactions to run.

#### `<upsolver_storage_location>/tables/<table_id>/recent_compactions.json`

* **Purpose**: Contains a list of recent compactions, which are used to display compaction statistics in the frontend and system tables.

#### `<upsolver_storage_location>/tables/<table_id>/statistics.json`

* **Purpose**: Contains Iceberg table statistics that are periodically collected.
* **Usage**: These statistics are displayed in the frontend for monitoring table performance.

### Job-Related Storage

#### `<upsolver_storage_location>/inputs/<job_id>/`

* **Purpose**: Contains files related to the job loading data into a table.
* **Retention**: Most of these files are ephemeral and are deleted once the data is loaded and committed to the table. The exception is the `metadata/` subfolder, which is not ephemeral; see below for details.
* **Folder Size**: The size of this folder stabilizes at a level that depends on the volume of data being streamed into the table.

{% hint style="info" %}
To find out which job a specific `job_id` refers to, you can query the `system.information_schema.jobs` table.
{% endhint %}

#### `<upsolver_storage_location>/inputs/<job_id>/metadata/`

* **Purpose**: Stores statistics about data written to the table. This metadata is used by the system to discover schema information and by the frontend and system tables to display data statistics.
* **Retention Control**: The retention of these files can be controlled via the [`METADATA_RETENTION`](/content/reference-1/sql-commands/jobs.md) property of the job. Reducing retention will affect the display of data statistics in the frontend and system tables.

{% hint style="info" %}
**\<upsolver\_storage\_location>** may be **\<iceberg\_table\_root>** or a dedicated location. This can be controlled via job and table settings such as **INTERMEDIATE\_STORAGE\_LOCATION** and **INTERMEDIATE\_STORAGE\_CONNECTION**. \
\
By default, new tables do not place **\<upsolver\_storage\_location>** inside the table root. However, older tables and jobs were created this way by default.
{% endhint %}
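
To locate these folders and files for a specific table and job, the paths can be assembled from the patterns listed on this page. This is a hypothetical helper: pass whichever location applies to your table as `upsolver_storage_location` (the table root or a dedicated location); the bucket name and IDs in the usage line are placeholders.

```python
def upsolver_paths(upsolver_storage_location: str, table_id: str, job_id: str) -> dict:
    """Build the storage paths described on this page for one table and one job."""
    base = upsolver_storage_location.rstrip("/")
    table_dir = f"{base}/tables/{table_id}"
    job_dir = f"{base}/inputs/{job_id}"
    return {
        # Table maintenance folders
        "dangling_files_backup": f"{table_dir}/dangling_files_backup/",
        "expired_files_backup": f"{table_dir}/expired_files_backup/",
        "used_files_index": f"{table_dir}/used_files_index/",
        "compaction_results": f"{table_dir}/compaction_results/",
        # Static state and statistics files
        "used_files_groupings": f"{table_dir}/used_files_groupings.json.gz",
        "iceberg_coordinator": f"{table_dir}/iceberg_coordinator.json",
        "recent_compactions": f"{table_dir}/recent_compactions.json",
        "statistics": f"{table_dir}/statistics.json",
        # Job folders
        "job_inputs": f"{job_dir}/",
        "job_metadata": f"{job_dir}/metadata/",
    }


paths = upsolver_paths("s3://example-bucket/upsolver/", "my_table_id", "my_job_id")
for name, path in paths.items():
    print(f"{name}: {path}")
```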
