# Datasets

Datasets are your gateway to data observability across your pipelines. They also let you monitor and optimize the performance of your Apache Iceberg tables.

## Real-Time Data Observability

The Datasets tab in Upsolver provides essential insights into your data and tables. It lets you easily uncover performance problems, and troubleshoot and diagnose data quality issues. These insights are available to everyone in your organization, so anyone can drill into the data statistics and observe the health of your data.

Whether you're a data engineer responding to end-user queries about data lineage in your pipelines, or a data consumer investigating the freshness of your data, the Datasets tab is your go-to location for data observability.

Using Datasets, you can drill into the source data stored in the staging tables in your data lake or lakehouse, and view the data in your analytics targets. If you created a direct ingestion job, you will see only the target schema. Use the Lineage visuals to get the bigger picture: see where your dataset fits into the wider ecosystem and understand how pipelines and data flow and connect.

Datasets make it easy to compare the results in your target with the data from your source, so you can quickly trace back to uncover where problems first appeared. The written rows chart delivers instant insight into the volume of data flowing to your staging tables and targets, making spikes and dips in your data easy to identify.

<figure><img src="/files/oouWD4mlSIZoLlMX4s79" alt=""><figcaption><p>The <strong>Schema</strong> tab provides instant visibility into your dataset.</p></figcaption></figure>

***

## Optimize Apache Iceberg Tables

Upsolver supports ingesting data into Apache Iceberg tables. It can also optimize external tables in your lakehouse that hold data not ingested by Upsolver. Whether you use Upsolver to build your pipelines or only to optimize your existing lakehouse, you can view statistics for your Iceberg tables in Datasets. Here, you can see the storage space savings and performance benefits you gain when Upsolver optimizes your tables.

When you ingest data into Iceberg using Upsolver, your tables are automatically tuned and compacted for you. If you have an existing lakehouse, you can select the tables you want Upsolver to optimize. Upsolver then runs compactions automatically to keep those tables continuously performant.

<figure><img src="/files/Lb107Gb8PGMtHHiHedLu" alt=""><figcaption><p>Use the <strong>Table Statistics</strong> tab to view storage details on your Apache Iceberg tables.</p></figcaption></figure>

***

## Viewing Datasets

To open your datasets, click the **Datasets** icon in the Upsolver sidebar menu. If the menu is collapsed, expand it by clicking the **arrow icon** at the bottom of the menu. The entities tree then displays your datasets.

Expand a catalog in the tree to view its schemas and tables. The tree displays only schemas and tables that are ingestion targets for jobs created in Upsolver. Alternatively, use the **Search** box to find an object by schema or table name. Click the **cross icon** in the search box to clear your results and return to the default view.

From the entities tree, click a schema name to view the details for the full dataset, or click a column name to drill through to the column-level data. System columns are also included to give you full observability.

Each dataset provides the following tabs:

* [Ingested Data](/content/reference-1/monitoring/datasets/ingested-data.md)
  * [Column](/content/reference-1/monitoring/datasets/ingested-data/column.md)
* [Lineage](/content/reference-1/monitoring/datasets/lineage.md)
* [Data Violations](/content/reference-1/monitoring/datasets/data-violations.md)
* [Statistics](/content/reference-1/monitoring/datasets/statistics.md) (applicable to Apache Iceberg tables only)
* [Maintenance](/content/reference-1/monitoring/datasets/maintenance.md) (applicable to Apache Iceberg tables only)
  * [Compactions](/content/reference-1/monitoring/datasets/maintenance/compactions.md)
  * [Expire Snapshots](/content/reference-1/monitoring/datasets/maintenance/expire-snapshots.md)
  * [Orphan Files](/content/reference-1/monitoring/datasets/maintenance/orphan-files.md)
* [Columns](/content/reference-1/monitoring/datasets/columns.md)
* [Partitions](/content/reference-1/monitoring/datasets/partitions.md)
* [Properties](/content/reference-1/monitoring/datasets/properties.md) (applicable to staging tables in your data lake)


