# Core Components

<figure><img src="/files/u94aYyjAJKKRvHtjzIJb" alt=""><figcaption><p>Upsolver private VPC deployment</p></figcaption></figure>

The Upsolver Compute Cluster is a group of EC2 instances responsible for data processing. These servers provide the compute power for transforming, aggregating, and enriching data in Upsolver. They don’t interact with outside processes: the instances poll for work, process the data, and write the results to Amazon S3.

## Resources

### Kinesis Stream

Upsolver servers coordinate through a Kinesis stream rather than communicating with each other directly. Each server reports its information to Kinesis and polls Kinesis to discover information about the other servers.

All servers exchange information with the Kinesis Stream to communicate state and to synchronize the work between the servers running in the account. This applies to:

* Upsolver's fully-managed architecture deployed in Upsolver’s environment
* Private VPC deployment architecture deployed in your AWS account
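
The coordination pattern described above can be illustrated with a small in-memory simulation. This is only a sketch: `SharedStream` and `Server` are hypothetical names, and a plain Python list stands in for the Kinesis stream. It is not Upsolver's implementation and does not use the Kinesis API.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStream:
    """In-memory stand-in for the coordination stream (hypothetical, not a Kinesis client)."""
    records: list = field(default_factory=list)

    def put(self, record: dict) -> None:
        self.records.append(record)

    def read_from(self, position: int) -> list:
        return self.records[position:]


@dataclass
class Server:
    name: str
    stream: SharedStream
    position: int = 0  # next stream index this server will read
    known_state: dict = field(default_factory=dict)

    def report(self, status: str) -> None:
        # Servers write only to the stream, never to each other.
        self.stream.put({"server": self.name, "status": status})

    def poll(self) -> None:
        # Discover what other servers have reported since the last poll.
        new_records = self.stream.read_from(self.position)
        self.position += len(new_records)
        for record in new_records:
            if record["server"] != self.name:
                self.known_state[record["server"]] = record["status"]


stream = SharedStream()
a, b = Server("a", stream), Server("b", stream)
a.report("processing task 1")
b.report("idle")
a.poll()
b.poll()
print(a.known_state)  # {'b': 'idle'}
print(b.known_state)  # {'a': 'processing task 1'}
```

Neither server holds a reference to the other. Each learns about its peers only by reading the shared stream.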

### Metadata Store

The Metadata Store is a global component shared by Upsolver's fully-managed deployment model and the private VPC deployment model. It is a centralized key-value store for configurations: when you create an object in Upsolver, its definition is always stored in the global Metadata Store. Clients don’t interact with the Metadata Store directly. Outgoing traffic goes through the API server, which stores and retrieves information in the key-value store on the client's behalf.

The same entities are mirrored to Amazon S3 in the user’s account to provide durability. The production servers only poll data from Amazon S3, so in the unlikely event that the global environment is unavailable, there is no impact on any data being processed.
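
As a rough illustration of why a global outage does not affect processing, the sketch below models the API server as a facade over a key-value store that mirrors every write to a second location. The class `MetadataApi`, the function `load_definition`, and the key names are all hypothetical, and dictionaries stand in for both the global store and the Amazon S3 bucket.

```python
class MetadataApi:
    """Hypothetical API-server facade over a key-value metadata store."""

    def __init__(self, s3_mirror: dict):
        self._store = {}  # stand-in for the global key-value store
        self._s3_mirror = s3_mirror
        self.available = True

    def put(self, key: str, definition: dict) -> None:
        if not self.available:
            raise ConnectionError("global environment unavailable")
        self._store[key] = definition
        self._s3_mirror[key] = dict(definition)  # mirrored copy for durability

    def get(self, key: str) -> dict:
        if not self.available:
            raise ConnectionError("global environment unavailable")
        return self._store[key]


def load_definition(s3_mirror: dict, key: str) -> dict:
    # Processing servers read only the mirrored copy, never the global store.
    return s3_mirror[key]


s3_mirror = {}  # stand-in for the bucket in the user's account
api = MetadataApi(s3_mirror)
api.put("outputs/clicks", {"type": "output", "target": "s3"})

api.available = False  # simulate a global outage
definition = load_definition(s3_mirror, "outputs/clicks")
print(definition["type"])  # output
```

Because the processing path depends only on the mirror, it keeps working while the facade is unavailable. Only new configuration changes are blocked.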

The servers report task status and operational metrics, such as CPU and memory usage, to several Kinesis streams. This information tracks server health and is used to replace servers if necessary. Servers also report some operational metrics directly to the user's CloudWatch environment, where they drive scaling and auto-healing of the cluster, including spinning servers up and down. Additional metrics, such as billing information and telemetry data, are reported to Upsolver-managed Kinesis streams.

### Logs and environment

By default, Upsolver sends application logs to a centralized Upsolver bucket for easy debugging. Optionally, you can send the logs to your own dedicated bucket if you want direct access to them.

The Upsolver environment polls data from various locations. Environment configurations, including geo IP map and user agent map files, are polled from global configurations as well as from static initialization files, which are stored in various buckets on Amazon S3. During the initialization phase, the servers also install components such as Java and Docker containers pulled from Upsolver’s Docker Hub repository. Servers also report to the monitoring infrastructure (InfluxDB).

Upsolver’s web interface is hosted on a CDN out of an Amazon S3 bucket. The web interface accesses the private API directly to populate the entities.

## Operations

### File-based data

Upsolver lists the files in the source and builds a list of all files that need to be loaded. By default, this poll operation runs every minute. Upsolver then takes the list of files, parses the data, and writes the parsed output to a parsed folder. The parsed folders are the same for both file-based and event-based data.

For file-based data sources, Upsolver enforces exactly-once semantics by:

1. Sending metadata on which files exist.
2. Storing which files exist in the Kinesis stream.
3. Reading existing file information from the Kinesis stream to ensure exactly-once processing.
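
The steps above can be sketched as a single poll cycle that skips any file already recorded. This is a simplified illustration under stated assumptions: `poll_once` and its parameters are hypothetical names, and a plain list stands in for the Kinesis stream. It also ignores failures between processing a file and recording it, which a real system has to handle.

```python
def poll_once(list_source_files, stream, process_file):
    """One poll cycle: process each source file at most once.

    `stream` is a plain list standing in for the Kinesis stream that
    records which files are already known (hypothetical simplification).
    """
    already_known = {record["file"] for record in stream}
    newly_loaded = []
    for path in sorted(list_source_files()):
        if path in already_known:
            continue  # seen in an earlier cycle, so skip it
        process_file(path)
        stream.append({"file": path})  # record the file so later cycles skip it
        already_known.add(path)
        newly_loaded.append(path)
    return newly_loaded


stream = []
processed = []
source = ["2024/01/a.json", "2024/01/b.json"]

first = poll_once(lambda: source, stream, processed.append)
source.append("2024/01/c.json")
second = poll_once(lambda: source, stream, processed.append)

print(first)   # ['2024/01/a.json', '2024/01/b.json']
print(second)  # ['2024/01/c.json']
```

The second cycle lists all three files but processes only the new one, because the first two are already present in the stream.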

### Event-based data

For event-based data sources, Upsolver enforces exactly-once semantics by:

1. Finding the timestamp/offset up to which events have already been polled.
2. Writing the information to the Kinesis stream.
3. Reading offset information from the Kinesis stream to ensure exactly-once processing.
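
A minimal sketch of this offset-tracking loop follows. The function `poll_events`, the `batch_size` parameter, and the record shape are assumptions made for illustration. Lists stand in for both the event source and the Kinesis stream, and the sketch does not cover failures between handling a batch and writing its offset.

```python
def poll_events(events, stream, handle, batch_size=2):
    """Consume the next batch of events after the last recorded offset.

    `events` is a list standing in for an event source, and `stream` is a
    list standing in for the Kinesis stream that stores committed offsets
    (both hypothetical simplifications).
    """
    # Read the offset up to which events were already polled.
    committed = stream[-1]["offset"] if stream else 0
    batch = events[committed:committed + batch_size]
    for event in batch:
        handle(event)
    # Write the new offset so the next poll resumes after this batch.
    if batch:
        stream.append({"offset": committed + len(batch)})
    return batch


events = ["e0", "e1", "e2"]
stream, seen = [], []
poll_events(events, stream, seen.append)  # handles e0 and e1
poll_events(events, stream, seen.append)  # handles e2
poll_events(events, stream, seen.append)  # nothing new to handle

print(seen)    # ['e0', 'e1', 'e2']
print(stream)  # [{'offset': 2}, {'offset': 3}]
```

Each poll starts from the last offset written to the stream, so no event is handled twice even though the same source is polled repeatedly.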

