# Column

To drill into the column-level statistics, click on the column name in the **Datasets** entity tree, or click on the linked column name in the **Written Data Statistics** table in the **Ingested Data** tab. This tab is visible for datasets ingested and transformed by Upsolver.

## Data Type

This is the column type that Upsolver automatically inferred during ingestion. It is useful for comparing source and target schemas to ensure data is stored in the correct and expected type.

***

## Overview

The following information is provided for your selected column:

<figure><img src="/files/PwXdjfr4uFHoderiE1Kq" alt=""><figcaption><p>The <strong>Overview</strong> card provides column statistics to help you measure the health of your data.</p></figcaption></figure>

<table><thead><tr><th width="208">Measurement</th><th>Description</th></tr></thead><tbody><tr><td>Total Values</td><td>Total number of rows with a non-NULL value.</td></tr><tr><td>Distinct Values</td><td>The count of distinct values within the column.</td></tr><tr><td>Density</td><td>The percentage of rows that have a value.</td></tr><tr><td>Density in Parent</td><td>The percentage of rows that have a value.</td></tr><tr><td>First Seen</td><td>The first date and time that data was written in this column.</td></tr><tr><td>Last Seen</td><td>The last date and time that data was written, or updated, in this column.</td></tr><tr><td>Min Value</td><td>The lowest value in the column in the dataset, available for string, date, and numerical data.</td></tr><tr><td>Max Value</td><td>The highest value in the column in the dataset, available for string, date, and numerical data.</td></tr></tbody></table>
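To make the measurements above concrete, the following minimal Python sketch computes the same kind of statistics for an in-memory list of values. It is an illustration only, not how Upsolver calculates them: the `column_overview` function and its key names are hypothetical, `None` stands in for NULL, and density is assumed to be the share of all rows that hold a non-NULL value.

```python
from typing import Any, Optional


def column_overview(values: list[Optional[Any]]) -> dict[str, Any]:
    """Compute overview-style statistics for one column (hypothetical helper)."""
    # Rows where the column holds a value; None represents NULL here.
    present = [v for v in values if v is not None]
    total_rows = len(values)
    return {
        "total_values": len(present),
        "distinct_values": len(set(present)),
        # Assumption: density = non-NULL rows / all rows, as a percentage.
        "density": (100.0 * len(present) / total_rows) if total_rows else 0.0,
        "min_value": min(present) if present else None,
        "max_value": max(present) if present else None,
    }


sample = [12.5, None, 0.0, 12.5, 40.0]
stats = column_overview(sample)
```

Comparing figures like these against what you expect from the source system is a quick way to spot a column that is sparser, or less varied, than it should be.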

## Written Rows Over Time

The **Written Rows Over Time** chart will help you become familiar with the volumes of data ingested, enabling you to determine an expected baseline. This visual displays the number of rows ingested over the selected timespan, enabling you to quickly discover spikes or drops in your data to troubleshoot unexpected volumes.

Click on the **Lifetime** button to change the timespan displayed in the dataset report:

<figure><img src="/files/DNMYEe9JLW7hqdxfNUwH" alt=""><figcaption><p><strong>Written Rows Over Time</strong> displays the flow of data into your dataset so you can check for spikes and dips in volume. </p></figcaption></figure>

***

## Values by Frequency

The **Values by Frequency** card displays the top 10K distinct values in the selected column and the percentage of rows for each value that appears within the dataset. This is particularly helpful for columns written to dimensions, as the values here should match the values in your analytics target:

<figure><img src="/files/gx6POa2gujpLcvJkr2KO" alt=""><figcaption><p>The <strong>Values by Frequency</strong> card displays the distinct values in the column and percentage of each value within the dataset.</p></figcaption></figure>

For datasets written to the data lake, you can download the results in the **Values by Frequency** card to a column inspections CSV file to further investigate your data. Click on the **Download** icon in the top right-hand corner of the card to download your file. The file includes the list of distinct column values, the number of times each value appears in the dataset, and the percentage breakdown representing the density of the value within the column.
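As a rough illustration of what the card and the downloaded file contain, the sketch below derives a value, count, and percentage triple for each distinct value in a list. The `values_by_frequency` function is hypothetical, and the choice to calculate percentages over non-NULL values only is an assumption; the layout of the actual CSV file may differ.

```python
from collections import Counter
from typing import Any, Optional


def values_by_frequency(
    values: list[Optional[Any]], top_n: int = 10_000
) -> list[tuple[Any, int, float]]:
    """Return (value, count, percentage) rows, most frequent first."""
    # Assumption: NULLs (None) are excluded from the percentage base.
    present = [v for v in values if v is not None]
    total = len(present)
    counts = Counter(present)
    return [
        (value, count, 100.0 * count / total)
        for value, count in counts.most_common(top_n)
    ]


frequency_rows = values_by_frequency(["a", "b", "a", None, "a"])
```

The `top_n` default mirrors the top 10K limit described above.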

***

## Distribution

The column data type determines the label of the distribution card, either **String Length Distribution** for a string type column, or **Values Distribution** for date and numeric types.

If you discover data that frequently falls outside the range you expect, you can use [expectations](/content/articles-1/data/expectations.md) in your job to warn of errant rows, or eliminate them from your target dataset, thereby preventing bad data from polluting downstream analytics data.

### String Length Distribution

The **Distribution** card enables you to easily discover anomalies or incorrect string data, for example, if you are expecting a fixed-width character length, or values to be within a range.

The **Min** and **Max** stats provide real-time visibility into the strings in your columns. If you have a column where the minimum string length should be 13, such as a barcode field, you can use this card to check ingested values.

For example, the following chart displays a visual representation of the string length distribution for the values within an **address1** column:

<figure><img src="/files/R6By4fg8tSxM78SUPFGx" alt=""><figcaption><p>The <strong>String Length Distribution</strong> chart provides a visual representation of the string character length within the column.</p></figcaption></figure>

The chart instantly alerts us to any strings that may be too long or too short, indicating an issue in our data.
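The same kind of check can be sketched offline. The following minimal Python example counts string lengths and flags values that do not match an expected fixed width, using the 13-character barcode case mentioned above. The function name and the sample barcodes are made up for illustration.

```python
from collections import Counter
from typing import Optional


def string_length_distribution(values: list[Optional[str]]) -> dict:
    """Summarise string lengths for one column; None (NULL) is skipped."""
    lengths = [len(v) for v in values if v is not None]
    if not lengths:
        return {"min": None, "max": None, "histogram": {}}
    return {
        "min": min(lengths),
        "max": max(lengths),
        # Map each string length to the number of values of that length.
        "histogram": dict(sorted(Counter(lengths).items())),
    }


# Hypothetical sample: the second barcode is one character short.
barcodes = ["5012345678900", "501234567890", None, "5012345678901"]
distribution = string_length_distribution(barcodes)

# Values that break the fixed-width assumption of 13 characters.
off_length = [b for b in barcodes if b is not None and len(b) != 13]
```

A minimum below 13 in this sample points straight at the truncated value, which is the same signal the **Min** stat gives you on the card.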

### Values Distribution

The **Values Distribution** chart makes it easy to visualize your data and find anomalies and outliers. It is easy to see the spread of values across the range within your dataset and also discover the **Min** and **Max** values within the column.

In the following example, we can see that the minimum value for the **nettotal** column is **1,331.02**, and the maximum value is **4,457.55**. However, **48.8K** rows have a value of **0**. As this is the nettotal value, we may need to investigate further to check if this is correct. It might be that we have missing data, or simply that we have a high number of customers in an open shopping session that have not added anything to their basket yet. If we don't want rows with a **0** value for nettotal to reach our data warehouse, we can add an expectation to the job to filter out these rows.

<figure><img src="/files/lrL0oeZ9jGsWSVef5zh7" alt=""><figcaption><p>The <strong>Values Distribution</strong> chart shows the spread of values across the <strong>nettotal</strong> column.</p></figcaption></figure>
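Before adding an expectation, it can help to estimate how many rows it would affect. The sketch below is plain Python, not Upsolver expectation syntax: it splits rows into those that satisfy a condition and those that violate it. The `split_on_expectation` helper, the `orderid` field, and the sample rows are hypothetical.

```python
from typing import Any, Callable


def split_on_expectation(
    rows: list[dict[str, Any]],
    column: str,
    predicate: Callable[[Any], bool],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split rows into (kept, violations) based on a condition on one column."""
    kept, violations = [], []
    for row in rows:
        # A missing column is passed to the predicate as None.
        if predicate(row.get(column)):
            kept.append(row)
        else:
            violations.append(row)
    return kept, violations


rows = [
    {"orderid": 1, "nettotal": 0.0},
    {"orderid": 2, "nettotal": 1331.02},
    {"orderid": 3, "nettotal": 4457.55},
    {"orderid": 4, "nettotal": 0.0},
]

# Condition: nettotal must be present and non-zero.
kept, violations = split_on_expectation(
    rows, "nettotal", lambda v: v is not None and v != 0
)
```

If the violation count is far higher than expected, that is a cue to investigate the source data before deciding whether to drop the rows or only warn on them.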
