
# MySQL Binlog Retention

### **What is a binary log file?** <a href="#what-is-a-binary-log-file" id="what-is-a-binary-log-file"></a>

The binary log records all changes made to a database's data and structure. It consists of an index and a set of binary log files. **Binlog retention** is the period for which MySQL keeps a binary log file; after that period, the file is deleted and can no longer be read.

### **Binlog delays** <a href="#binlog-delays" id="binlog-delays"></a>

The engine's [binlog delay metric](https://docs.upsolver.com/upsolver-1/administration/monitoring/monitoring-reports/monitoring-metrics) is the difference between the current time and the time of the last committed event. The delay increases when no new events arrive from the data source, or when events are ingested more slowly than they are generated. Possible causes include:

* There are no new events in the tracked tables
* The CDC engine fails to start
* The connection to the database is slow or throttled
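
As a minimal sketch of the definition above, the delay can be pictured as the current time minus the timestamp of the last committed event. The function name and timestamps below are hypothetical and only illustrate the arithmetic; they are not part of Upsolver's API.

```python
from datetime import datetime, timezone


def binlog_delay_seconds(last_committed_event: datetime, now: datetime) -> float:
    """Delay = current time minus the time of the last committed event."""
    return (now - last_committed_event).total_seconds()


# Hypothetical timestamp of the last committed event.
last_event = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

# With no new events, the delay keeps growing as the clock advances.
print(binlog_delay_seconds(last_event, datetime(2024, 1, 1, 12, 5, 0, tzinfo=timezone.utc)))  # 300.0
print(binlog_delay_seconds(last_event, datetime(2024, 1, 1, 13, 0, 0, tzinfo=timezone.utc)))  # 3600.0
```

This is why an idle table and a stalled engine look the same in the metric: in both cases the last committed event stays fixed while the current time moves on.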

### **Effects of a large binlog delay** <a href="#effects-of-a-large-binlog-delay" id="effects-of-a-large-binlog-delay"></a>

To capture changes (CDC) from a MySQL database, Upsolver reads the binary log and parses its events.

When ingestion is delayed, Upsolver keeps trying to read the older binlog files it has not yet processed. If MySQL deletes those files first (because of the binlog retention setting), the CDC input becomes stuck and cannot advance to read further events. The missing files can't be skipped, since skipping them would leave the data in an inconsistent state with missing, duplicate, or invalid events. The only way to recover from this situation is to re-create the Data Source.

Because of this, keeping the binlog delay below the binlog retention is essential for the stability and health of the Data Source and any outputs that depend on it.

#### **How to configure binlog retention in your MySQL (from RDS)** <a href="#how-to-configure-binlog-retention-in-your-mysql-from-rds" id="how-to-configure-binlog-retention-in-your-mysql-from-rds"></a>

Log in to the source database and run the following SQL statement to set the binlog retention period:

`call mysql.rds_set_configuration('binlog retention hours', n);`

The value **n** is the retention period in hours, given as an integer from 1 to 168 (7 days).
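
If the statement is generated from a script, a small helper can reject values outside the 1 to 168 range before anything is sent to the database. The sketch below is only an illustration: the helper name is hypothetical, and it builds the statement text without connecting to MySQL.

```python
def binlog_retention_statement(hours: int) -> str:
    """Build the RDS call shown above, rejecting values outside 1..168."""
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(hours, bool) or not isinstance(hours, int) or not 1 <= hours <= 168:
        raise ValueError("binlog retention hours must be an integer from 1 to 168")
    return f"call mysql.rds_set_configuration('binlog retention hours', {hours});"


# Example: retain binlog files for 3 days (72 hours).
print(binlog_retention_statement(3 * 24))
```

Run the resulting statement with whichever MySQL client you already use.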

### Monitoring binlog delays <a href="#monitoring-binlog-delays" id="monitoring-binlog-delays"></a>

Connect your monitoring system to Upsolver via the [monitoring reports](https://docs.upsolver.com/upsolver-1/administration/monitoring/monitoring-reports) page.

Use the `upsolver.binlog-delay` [metric](https://docs.upsolver.com/upsolver-1/administration/monitoring/monitoring-reports/monitoring-metrics) to monitor the binlog delay of your CDC Data Sources.

{% hint style="info" %}
We recommend configuring an alert that notifies you when the binlog delay approaches the binlog retention configured in MySQL.

Once the delay exceeds the retention, the data source is **no longer recoverable**.
{% endhint %}
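
One way to reason about such an alert is to compare the reported delay with the configured retention and warn well before the two meet. The sketch below assumes both values are expressed in hours; the function name, the level names, and the 50% warning ratio are illustrative assumptions, not Upsolver settings.

```python
def binlog_alert_level(delay_hours: float, retention_hours: float,
                       warn_ratio: float = 0.5) -> str:
    """Classify a binlog delay relative to the configured retention."""
    if delay_hours > retention_hours:
        # The needed binlog files may already be deleted by MySQL.
        return "critical"
    if delay_hours >= warn_ratio * retention_hours:
        # Delay is approaching the retention; act before files expire.
        return "warning"
    return "ok"


# Example with a 24-hour retention.
print(binlog_alert_level(2, 24))   # ok
print(binlog_alert_level(13, 24))  # warning
print(binlog_alert_level(25, 24))  # critical
```

A lower warning ratio leaves more time to investigate a stalled engine or a throttled connection before the retention window closes.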

Additionally, you should have a metastore connection that can be used to create a staging table as well as a corresponding storage connection that can be used to store your table's underlying files.
