# Optimization Processes for Iceberg Tables in Upsolver

Upsolver employs several optimization processes to enhance the performance and manageability of Iceberg tables. These processes are designed to maintain efficient storage, ensure high query performance, and reduce operational overhead. Below are the key optimization processes performed by Upsolver:

### 1. Continuous Compaction

Compaction in Upsolver runs continuously and is specifically optimized for streaming data. The compaction process involves:

* **Monitoring and Selection**: Regularly checking for potential compaction opportunities.
* **Optimization Criteria**: Selecting compactions that offer the highest predicted query performance gains and cost reduction relative to the cost of performing the compaction.

This approach ensures that the Iceberg tables remain optimized for query performance without incurring unnecessary computational costs.
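The selection criterion above can be illustrated with a minimal sketch. It does not show Upsolver's actual implementation: the `CompactionCandidate` type, the benefit and cost estimates, and the `min_ratio` threshold are all hypothetical names and values chosen for illustration. The idea is simply to rank candidate compactions by predicted benefit relative to cost and skip those that do not pay for themselves.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompactionCandidate:
    """Hypothetical description of one possible compaction."""
    partition: str
    predicted_benefit: float  # estimated query and storage savings (arbitrary units)
    compaction_cost: float    # estimated compute cost of the rewrite (same units)


def select_compactions(candidates, min_ratio=1.0, limit=None):
    """Return candidates whose benefit-to-cost ratio exceeds min_ratio, best first."""
    worthwhile = [
        c for c in candidates
        if c.compaction_cost > 0
        and c.predicted_benefit / c.compaction_cost > min_ratio
    ]
    worthwhile.sort(
        key=lambda c: c.predicted_benefit / c.compaction_cost, reverse=True
    )
    return worthwhile if limit is None else worthwhile[:limit]


# Illustrative data only: three partitions with made-up estimates.
candidates = [
    CompactionCandidate("date=2024-05-01", predicted_benefit=40.0, compaction_cost=10.0),
    CompactionCandidate("date=2024-05-02", predicted_benefit=5.0, compaction_cost=10.0),
    CompactionCandidate("date=2024-05-03", predicted_benefit=30.0, compaction_cost=5.0),
]
selected = select_compactions(candidates)
print([c.partition for c in selected])
```

In this sketch the second partition is skipped because its predicted benefit is lower than the cost of compacting it.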

### 2. Snapshot Expiration

Each write to an Iceberg table generates a new snapshot. Snapshots remain available to user queries, which enables features such as time travel. However, retaining every snapshot increases storage requirements over time. To manage this, Upsolver automatically cleans up old snapshots.

Users can configure the retention of snapshots using Iceberg table properties as detailed in the [Iceberg documentation](https://iceberg.apache.org/docs/1.5.0/configuration/#table-behavior-properties).

This clean-up process occurs every few hours, ensuring that only necessary snapshots are retained, thereby optimizing storage usage.
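As a rough illustration of how snapshot retention properties interact, the sketch below uses two table properties from the Iceberg documentation, `history.expire.max-snapshot-age-ms` and `history.expire.min-snapshots-to-keep`. The property values, the `snapshots_to_expire` helper, and the simplified linear snapshot history are assumptions made for this example; the sketch does not describe how Upsolver or Iceberg implements expiration internally.

```python
from datetime import datetime, timedelta, timezone

# Property names come from the Iceberg documentation; the values are illustrative.
table_properties = {
    "history.expire.max-snapshot-age-ms": str(2 * 24 * 60 * 60 * 1000),  # 2 days
    "history.expire.min-snapshots-to-keep": "2",
}


def snapshots_to_expire(snapshot_times, properties, now):
    """Return timestamps of snapshots that are too old and not among the newest N.

    Simplification: treats the table history as a single linear list of snapshots.
    """
    max_age = timedelta(
        milliseconds=int(properties["history.expire.max-snapshot-age-ms"])
    )
    min_keep = int(properties["history.expire.min-snapshots-to-keep"])
    newest_first = sorted(snapshot_times, reverse=True)
    protected = set(newest_first[:min_keep])  # always keep the newest N snapshots
    return [t for t in newest_first if t not in protected and now - t > max_age]


now = datetime(2024, 5, 10, tzinfo=timezone.utc)
snapshots = [now - timedelta(days=d) for d in (0, 1, 3, 5)]
expired = snapshots_to_expire(snapshots, table_properties, now)
print([t.date().isoformat() for t in expired])
```

With these illustrative values, the two newest snapshots are always kept, and the snapshots that are three and five days old are selected for expiration.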

### 3. Dangling File Clean-up

During Iceberg operations, files may sometimes become unreferenced or "dangling". These files can accumulate, leading to increased storage costs. Upsolver addresses this by performing a daily clean-up of detected dangling files.

* **Daily Clean-up**: Automatically seeking out and removing dangling files from the table's storage location.

This daily clean-up helps maintain a tidy and cost-effective storage environment.
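Conceptually, a dangling file is one that exists in the table's storage location but is not referenced by the table's metadata. The sketch below shows that comparison as a set difference. The `find_dangling_files` helper, the example paths, and the `min_age` safeguard (which skips very recent files that might belong to a write still in progress) are assumptions for illustration and are not taken from Upsolver's implementation.

```python
from datetime import datetime, timedelta, timezone


def find_dangling_files(stored_files, referenced_files, now, min_age=timedelta(days=1)):
    """Return stored paths that no metadata references and that are at least min_age old.

    stored_files maps path -> last-modified time; referenced_files is a set of paths.
    """
    return sorted(
        path
        for path, modified in stored_files.items()
        if path not in referenced_files and now - modified >= min_age
    )


now = datetime(2024, 5, 10, tzinfo=timezone.utc)

# Hypothetical listing of the table's storage location.
stored_files = {
    "s3://example-bucket/table/data/a.parquet": now - timedelta(days=5),
    "s3://example-bucket/table/data/b.parquet": now - timedelta(days=3),
    "s3://example-bucket/table/data/c.parquet": now - timedelta(hours=1),
}
# Hypothetical set of files referenced by the table's current metadata.
referenced_files = {"s3://example-bucket/table/data/a.parquet"}

dangling = find_dangling_files(stored_files, referenced_files, now)
print(dangling)
```

Here only `b.parquet` is reported: `a.parquet` is still referenced, and `c.parquet` is too recent to be considered safely removable under the assumed safeguard.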

### 4. Data Retention

Upsolver provides configurable data retention policies, allowing users to define how long data should be retained based on date partitions. This process involves:

* **Retention Configuration**: Setting a retention period based on a date partition in the table.
* **Automatic Deletion**: Automatically deleting data in partitions that fall outside the retention period.

This ensures that outdated data is removed in a timely manner, helping manage storage and maintain compliance with data governance policies.
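The retention rule described above amounts to comparing each date partition against a cutoff. The following sketch assumes a hypothetical `partitions_to_delete` helper and a retention period expressed in days; it illustrates the logic only and is not Upsolver's configuration syntax.

```python
from datetime import date, timedelta


def partitions_to_delete(partition_dates, retention_days, today):
    """Return date partitions that fall outside the retention period, oldest first."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in partition_dates if d < cutoff)


today = date(2024, 5, 10)
partitions = [date(2024, 5, 1), date(2024, 5, 3), date(2024, 5, 9)]

# With a 7-day retention period, the cutoff is 2024-05-03.
to_delete = partitions_to_delete(partitions, retention_days=7, today=today)
print([d.isoformat() for d in to_delete])
```

In this sketch a partition dated exactly on the cutoff is retained; whether the boundary is inclusive in practice is an assumption that should be checked against the product documentation.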

By incorporating these optimization processes, Upsolver ensures that Iceberg tables are efficient, performant, and cost-effective, while also providing flexibility and control to customers.

{% hint style="info" %}
Read the guide on how to [Optimize Your Iceberg Tables](/content/how-to-guides-1/apache-iceberg/optimize-your-iceberg-tables.md) to learn how to leverage this functionality in Upsolver.
{% endhint %}

