# Data Scanned

### Rows read (completed executions)

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Informational |
| About this metric | The total number of rows scanned by completed executions today. This is a measure of rows that were processed successfully. |
| Timeframe | Today (midnight UTC to now) |

#### More information

This informational metric shows cumulative progress. If the value is 0, the job has not yet processed anything or has not started. If [Job executions completed - today](/content/reference-1/monitoring/job-status/stream-and-file-sources/monitoring-v1/job-execution-status.md#job-executions-completed-today) and [Job executions completed - lifetime](/content/reference-1/monitoring/job-status/stream-and-file-sources/monitoring-v1/job-execution-status.md#job-executions-completed-lifetime) are greater than 0 but the rows scanned in completed executions are 0, your source doesn't contain any data. This value should increase in line with the number of completed executions.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', rows_scanned_by_completed_tasks_today) AS rows_scanned_by_completed_tasks_today,
       'OK' AS rows_scanned_by_completed_tasks_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}
{% endtabs %}

### Rows filtered by WHERE clause

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Warning |
| About this metric | The number of rows that were filtered out because they didn't pass the `WHERE` clause predicate defined in the job. |
| Limits | Error when > 0 AND greater than or equal to Rows read (completed executions) |
| Timeframe | Today (midnight UTC to now) |
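
To judge how close a job is to filtering out everything, it can help to view the filtered count next to the scanned count. The following is a minimal sketch, not an official query: it reuses only the columns and functions that appear in the queries on this page, the `pct_rows_filtered_by_where_clause` alias is illustrative, and it assumes that both counters cover the same set of rows. Replace `<job_id>` with the Id for your job.

```sql
-- Sketch only: share of today's scanned rows that the WHERE clause removed.
-- The CASE guard avoids dividing by zero before any execution has completed.
SELECT rows_filtered_by_where_clause_today,
       rows_scanned_by_completed_tasks_today,
       CASE
         WHEN rows_scanned_by_completed_tasks_today > 0
           THEN ROUND(rows_filtered_by_where_clause_today * 100.0 / rows_scanned_by_completed_tasks_today, 2)
         ELSE 0
       END AS pct_rows_filtered_by_where_clause
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```

A value near 100 suggests the predicate is excluding almost every row and is worth reviewing.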

#### More information

This metric counts the rows that were excluded because they didn't pass the `WHERE` clause predicate defined in the job. Some filtering is usually intended; the error condition applies only when every scanned row was filtered out.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', rows_filtered_by_where_clause_today) AS rows_filtered_by_where_clause_today,
       IF_ELSE(rows_filtered_by_where_clause_today > 0
               AND rows_filtered_by_where_clause_today >= rows_scanned_by_completed_tasks_today,
               'ERROR', 'OK') AS rows_filtered_by_where_clause_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}

{% tab title="Troubleshooting" %}
All rows were filtered out by the `WHERE` clause. Evaluate the `WHERE` clause of your job's `SELECT` statement to confirm it isn't filtering out too many events.

Click on **Job Details** to view the query used to create your job. You can see the `WHERE` clause and rewrite the predicates to adjust which rows are explicitly filtered out.
{% endtab %}
{% endtabs %}

### Average rows scanned per execution

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Informational |
| About this metric | The average number of rows scanned per job execution. |
| Timeframe | Today (midnight UTC to now) |

#### More information

This informational metric indicates how much work takes place within each execution. A low value suggests that overhead is high relative to the work done in a single execution; a high value may indicate high latency. There is no target value for this metric; compare it with your expectations of how much work each execution should do.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###.##}', avg_rows_scanned_per_execution_today) AS avg_rows_scanned_per_execution_today,
       'OK' AS avg_rows_scanned_per_execution_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}
{% endtabs %}

### Maximum rows scanned in an execution

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Warning |
| About this metric | The maximum number of rows scanned in a single job execution today. |
| Limits | Warn when > 1,000,000 AND > 10 × Average rows scanned per execution today |
| Timeframe | Today (midnight UTC to now) |
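
The warning is a yes/no signal, so it can be useful to see how far the largest execution is from the average. The sketch below is illustrative rather than an official query: it uses only columns and functions that appear in the queries on this page, and the `max_to_avg_ratio` alias is an assumption introduced here. Replace `<job_id>` with the Id for your job.

```sql
-- Sketch only: how many times larger the biggest execution was than the average one.
-- The CASE guard avoids dividing by zero when no rows have been scanned yet.
SELECT max_rows_scanned_in_execution_today,
       avg_rows_scanned_per_execution_today,
       CASE
         WHEN avg_rows_scanned_per_execution_today > 0
           THEN ROUND(max_rows_scanned_in_execution_today * 1.0 / avg_rows_scanned_per_execution_today, 2)
         ELSE 0
       END AS max_to_avg_ratio
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```

A ratio close to 1 indicates evenly sized executions; the limit on this metric corresponds to a ratio above 10 combined with more than 1,000,000 rows in a single execution.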

#### More information

Streaming data should arrive at a steady cadence, so you should not see a cycle of data spikes followed by periods with no work. This value should be similar to [Average rows scanned per execution](#average-rows-scanned-per-execution), which indicates that spikes and dips are not happening and that some executions are not working harder than others. A big difference between the two may indicate performance and latency issues.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', max_rows_scanned_in_execution_today) AS max_rows_scanned_in_execution_today,
       IF_ELSE(max_rows_scanned_in_execution_today > 1000000
               AND max_rows_scanned_in_execution_today > avg_rows_scanned_per_execution_today * 10,
               'WARNING', 'OK') AS max_rows_scanned_in_execution_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}

{% tab title="Troubleshooting" %}
The amount of data scanned by a single execution exceeds the historical norm. Check that the extra data processed is intentional. If the new data volume is expected, ensure the cluster is sized appropriately.
{% endtab %}
{% endtabs %}

### Rows Pending Processing

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Informational |
| About this metric | The number of rows in the source table that have not been processed yet. Only rows that have been committed to the source table are included. |

#### More information

The number of rows in the source table that have not been processed yet. Only rows that have been committed to the source table are included.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', rows_pending_processing) AS rows_pending_processing,
       'OK' AS rows_pending_processing_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}
{% endtabs %}

### Discovered Files

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Warning |
| About this metric | The number of files to load discovered by the job. |
| Limits | Error when = 0 |
| Timeframe | Today (midnight UTC to now) |
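
When triaging an ingestion job, it may help to read this metric together with the Discovered Bytes and Parse Errors (for ingestion jobs) metrics described later on this page. The following sketch simply combines the columns and severity expressions from those individual queries into one statement; it is an illustration, not an official query. Replace `<job_id>` with the Id for your job.

```sql
-- Sketch only: file discovery and parse metrics for one ingestion job, side by side.
SELECT discovered_files_today,
       discovered_bytes_today,
       parse_errors_today,
       IF_ELSE(discovered_files_today = 0, 'ERROR', 'OK') AS discovered_files_today_severity,
       IF_ELSE(parse_errors_today > 0, 'ERROR', 'OK') AS parse_errors_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```

Files discovered with no parse errors suggests the file pattern and format are both correct; zero discovered files points to the location or pattern instead.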

#### More information

This metric applies to ingestion jobs copying data from Amazon S3 and counts the discovered files that match the job but have not yet been parsed. The value can be 0 at the very start of the job. If it remains 0, the job did not find any files and the pattern used to discover them needs correcting, which requires recreating the job with the correct file pattern.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', discovered_files_today) AS discovered_files_today,
       IF_ELSE(discovered_files_today = 0, 'ERROR', 'OK') AS discovered_files_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}

{% tab title="Troubleshooting" %}
No files to load were detected. Ensure the job is reading from the correct location and that files exist in that location. If you are using a date pattern, make sure the pattern matches the file paths.

Click on **Job Details** to view the query used to create your job and check whether the file pattern is correct. If it is not, you will need to create a new job and drop the old one.
{% endtab %}
{% endtabs %}

### Discovered Bytes

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Informational |
| About this metric | The number of bytes to load discovered in the source stream. |

#### More information

This provides a general indication of the amount of work to be done, enabling you to understand the size of your data stream.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT CASE
         WHEN discovered_bytes_today::BIGINT < POWER(1024, 1) THEN CAST(discovered_bytes_today::BIGINT AS STRING) || ' Bytes'
         WHEN discovered_bytes_today::BIGINT < POWER(1024, 2) THEN CAST(ROUND(discovered_bytes_today::BIGINT / POWER(1024, 1), 2) AS STRING) || ' KB'
         WHEN discovered_bytes_today::BIGINT < POWER(1024, 3) THEN CAST(ROUND(discovered_bytes_today::BIGINT / POWER(1024, 2), 2) AS STRING) || ' MB'
         WHEN discovered_bytes_today::BIGINT < POWER(1024, 4) THEN CAST(ROUND(discovered_bytes_today::BIGINT / POWER(1024, 3), 2) AS STRING) || ' GB'
         ELSE CAST(ROUND(discovered_bytes_today::BIGINT / POWER(1024, 4), 2) AS STRING) || ' TB'
       END AS discovered_bytes_today,
       IF_ELSE(discovered_bytes_today::BIGINT = 0, 'ERROR', 'OK') AS discovered_bytes_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}

{% tab title="Troubleshooting" %}
No items were detected in the source stream. Ensure the source stream exists and contains items to be ingested.
{% endtab %}
{% endtabs %}

### Parse Errors (for ingestion jobs)

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Informational |
| About this metric | The number of items that failed to parse. This value represents a lower bound, as malformed items may also corrupt subsequent items in the same file. |
| Limits | Error when > 0 |
| Timeframe | Today (midnight UTC to now) |

#### More information

This metric only applies to ingestion jobs and counts the number of errors where a file or row could not be parsed. Generally, the value should be 0. If the value is above 0, find out why the parse errors occur, for example because the file is in the wrong format, malformed, or corrupted.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', parse_errors_today) AS parse_errors_today,
       IF_ELSE(parse_errors_today > 0, 'ERROR', 'OK') AS parse_errors_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}

{% tab title="Troubleshooting" %}
Some of the events in the source location failed to parse. See the job monitoring page for [details and error messages](/content/reference-1/monitoring/job-status/stream-and-file-sources/monitoring-v1/job-execution-status.md#execution-failure-reason).
{% endtab %}
{% endtabs %}

### Rows written (completed executions)

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Informational |
| About this metric | The number of rows written to the target by the job. |
| Timeframe | Today (midnight UTC to now) |

#### More information

Written rows relate to the [Average rows scanned per execution](#average-rows-scanned-per-execution). A scanned row results in a written row unless it was filtered out or an aggregation reduced the number of rows. For example, a job may scan 1,000,000 rows, perform an aggregation, and write the result as a single row. Conversely, a flattening operation that unnests data can result in more rows written than scanned.

If you expect scanned and written rows to match and they don’t, investigate the cause. Similarly, if a flattening operation that you expect to increase the number of written rows does not do so, investigate.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT CASE
         WHEN bytes_written_today::BIGINT < POWER(1024, 1) THEN CAST(bytes_written_today::BIGINT AS STRING) || ' Bytes'
         WHEN bytes_written_today::BIGINT < POWER(1024, 2) THEN CAST(ROUND(bytes_written_today::BIGINT / POWER(1024, 1), 2) AS STRING) || ' KB'
         WHEN bytes_written_today::BIGINT < POWER(1024, 3) THEN CAST(ROUND(bytes_written_today::BIGINT / POWER(1024, 2), 2) AS STRING) || ' MB'
         WHEN bytes_written_today::BIGINT < POWER(1024, 4) THEN CAST(ROUND(bytes_written_today::BIGINT / POWER(1024, 3), 2) AS STRING) || ' GB'
         ELSE CAST(ROUND(bytes_written_today::BIGINT / POWER(1024, 4), 2) AS STRING) || ' TB'
       END AS bytes_written_today,
       'OK' AS bytes_written_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}
{% endtabs %}

### Rows filtered by HAVING clause

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Informational |
| About this metric | The number of rows that were filtered out because they didn't pass the `HAVING` clause predicate defined in the job. |
| Timeframe | Today (midnight UTC to now) |

#### More information

The number of rows that were filtered out because they didn't pass the `HAVING` clause predicate defined in the job.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', rows_filtered_by_having_clause_today) AS rows_filtered_by_having_clause_today,
       'OK' AS rows_filtered_by_having_clause_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}
{% endtabs %}

### Rows filtered due to missing partition

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Warning |
| About this metric | The number of rows that were filtered out because some or all of the partition columns were NULL or an empty string. |
| Limits | Error when > 0 |
| Timeframe | Today (midnight UTC to now) |

#### More information

If you are writing to a partitioned table and one of the partition columns has a NULL value or an empty string, the row is filtered out. This is not usually intended behavior and indicates a user error that requires investigation. If this behavior is intended, the rows can be filtered out in the `WHERE` clause.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', rows_filtered_by_missing_partition_today) AS rows_filtered_by_missing_partition_today,
       IF_ELSE(rows_filtered_by_missing_partition_today > 0, 'ERROR', 'OK') AS rows_filtered_by_missing_partition_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}
{% endtabs %}

### Rows filtered due to missing Primary Key

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Warning |
| About this metric | The number of rows that were filtered out because some or all of the primary key columns were NULL. |
| Limits | Error when > 0 |
| Timeframe | Today (midnight UTC to now) |

#### More information

Rows are filtered out when a primary key column is NULL. If this behavior is intended, the rows can be filtered out in the `WHERE` clause.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', rows_filtered_by_missing_primary_key_today) AS rows_filtered_by_missing_primary_key_today,
       IF_ELSE(rows_filtered_by_missing_primary_key_today > 0, 'ERROR', 'OK') AS rows_filtered_by_missing_primary_key_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}
{% endtabs %}

### Bytes written (completed executions)

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Informational |
| About this metric | The size of the data written by the job. |
| Timeframe | Today (midnight UTC to now) |

#### More information

This informational metric provides a sense of the scale of the data and how much work is being done. If the value is noticeably higher or lower than you expect, there is most likely a mistake in the configuration of the job.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT CASE
         WHEN bytes_written_today::BIGINT < POWER(1024, 1) THEN CAST(bytes_written_today::BIGINT AS STRING) || ' Bytes'
         WHEN bytes_written_today::BIGINT < POWER(1024, 2) THEN CAST(ROUND(bytes_written_today::BIGINT / POWER(1024, 1), 2) AS STRING) || ' KB'
         WHEN bytes_written_today::BIGINT < POWER(1024, 3) THEN CAST(ROUND(bytes_written_today::BIGINT / POWER(1024, 2), 2) AS STRING) || ' MB'
         WHEN bytes_written_today::BIGINT < POWER(1024, 4) THEN CAST(ROUND(bytes_written_today::BIGINT / POWER(1024, 3), 2) AS STRING) || ' GB'
         ELSE CAST(ROUND(bytes_written_today::BIGINT / POWER(1024, 4), 2) AS STRING) || ' TB'
       END AS bytes_written_today,
       'OK' AS bytes_written_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}
{% endtabs %}

### Columns written

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Warning |
| About this metric | The number of columns written to by the job. This value can change over time if the query uses `*` in the `SELECT` clause. |
| Limits | Warn when > 500 |
| Timeframe | Today (midnight UTC to now) |

#### More information

This number is fixed if you’re not using a `SELECT *` statement. Upsolver does not limit the number of columns, but a large number of columns can cause problems downstream in query engines such as Athena or Glue. It may also not be what you intended, as tables with many columns can be difficult to work with. As a best practice, keep your tables to a maximum of a few hundred columns for downstream support and performance.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', columns_written_to_today) AS columns_written_to_today,
       IF_ELSE(columns_written_to_today > 500, 'WARNING', 'OK') AS columns_written_to_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}

{% tab title="Troubleshooting" %}
The job is writing a large number of columns. Consider transforming this table into a new table with a specific list of required columns, or select the required columns explicitly.
{% endtab %}
{% endtabs %}

### Columns written - sparse

{% tabs %}
{% tab title="Metric" %}


|     |     |
| --- | --- |
| Metric type | Warning |
| About this metric | The number of sparse columns written to today. A sparse column is a column that appears in less than 0.01% of all rows. |
| Limits | Warn when > 50% of the number of columns |
| Timeframe | Today (midnight UTC to now) |
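
To see how close a job is to the 50% limit, the sparse column count can be expressed as a share of all columns written. This is a minimal sketch rather than an official query: it reuses only columns and functions that appear in the queries on this page, and the `pct_sparse_columns` alias is an assumption introduced here. Replace `<job_id>` with the Id for your job.

```sql
-- Sketch only: percentage of today's written columns that are sparse.
-- The CASE guard avoids dividing by zero when no columns have been written yet.
SELECT columns_written_to_today,
       sparse_columns_written_to_today,
       CASE
         WHEN columns_written_to_today > 0
           THEN ROUND(sparse_columns_written_to_today * 100.0 / columns_written_to_today, 2)
         ELSE 0
       END AS pct_sparse_columns
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```

A result above 50 corresponds to the warning condition for this metric.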

#### More information

The number of sparse columns written today. A sparse column is a column that appears in less than 0.01% of all rows. This often happens when the job is writing to a high number of columns, but those columns only show up in one or two events. A large number of sparse columns is often the result of malformed data or unexpected results. This makes the data hard to work with downstream, so it is best to transform the data so that there are fewer columns.
{% endtab %}

{% tab title="See All Events (SQL Syntax)" %}
Run the following SQL command in a query window in Upsolver, replacing **\<job_id>** with the Id for your job. The Id for your job can be found in the [Details](/content/reference-1/monitoring/job-status/stream-and-file-sources/settings.md#details) section under the Settings tab. For additional columns, alter this statement and use `SELECT *`.

{% code overflow="wrap" %}
```sql
SELECT STRING_FORMAT('{0,number,#,###}', sparse_columns_written_to_today) AS sparse_columns_written_to_today,
       IF_ELSE(sparse_columns_written_to_today > columns_written_to_today * 0.5, 'WARNING', 'OK') AS sparse_columns_written_to_today_severity
FROM system.monitoring.jobs
WHERE job_id = '<job_id>';
```
{% endcode %}
{% endtab %}

{% tab title="Troubleshooting" %}
A large number of sparse columns was detected. Consider changing the data structure to use static column names and/or using arrays and structs where appropriate.
{% endtab %}
{% endtabs %}