> For the complete documentation index, see [llms.txt](https://upsolver.gitbook.io/content/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://upsolver.gitbook.io/content/articles-1/jobs/transformation/working-with-arrays.md).

# Working with Arrays

Upsolver supports arrays natively and the data transformations are defined on fields directly.&#x20;

Understanding how arrays are handled for data transformations will give you better control of your output and assist you in achieving your intended data structure.

## Transformations on array fields&#x20;

In Upsolver, when performing a calculation on inputs that are arrays, the transformation is defined on the field names and the values of those fields are then passed into the calculation at runtime.

### Example 1: Calculation without an array

To start with the simplest calculation possible, let's first look at calculations that do not involve arrays.&#x20;

Suppose you have the following data:

```json
{
    "data": {
        "value1": 1, 
        "value2": 2
    }
} 
```

Given the calculation:

$$
\text{data.value1} + \text{data.value2}
$$

Upsolver calculates:&#x20;

$$
1 + 2 = 3
$$

### Example 2: Calculation with an array

To see what happens when one input is an array, suppose you have the following data:

```json
{
    "data": {
        "value1": [1,2], 
        "value2": 2
    }
}
```

Given the calculation:&#x20;

$$
\text{data.value1\[]} + \text{data.value2}
$$

Upsolver performs the addition operation for each value of the first input with the value of the second input.

As a result, the calculation of $$\[1,2] + 2$$ is evaluated as:

$$
\[1+2, 2+2] = \[3,4]
$$

### Example 3: Calculation with two arrays

Now suppose both of your inputs are arrays:

```json
{
    "data": {
        "value1": [1,2], 
        "value2": [20,30]
    }
}
```

Given the calculation:&#x20;

$$
\text{data.value1\[]} + \text{data.value2\[]}
$$

In this case, the following calculation is performed:

$$
\[1,2] + \[20,30]
$$

Since two arrays are being added together, Upsolver will perform a Cartesian product on the arrays before performing the addition operation on the resulting pairs.

By taking the Cartesian product, we combine the two arrays into an array of ordered pairs:

$$
\[1,2] \times \[20,30] = \[(1,20), (1,30), (2,20), (2,30)]
$$

Here we ended up with an array of $$2 \times 2 = 4$$ pairs. In general, Cartesian product calculations result in $$\text{length(value1\[])} \cdot \text{length(value2\[])}$$ elements.&#x20;

Finally, we add the values within each pair together to get our end result:&#x20;

$$
\[1+20, 1+30, 2+20, 2+30] = \[21,31,22,32]
$$

As Cartesian product calculations are typically undesirable, the next sections describe how to avoid them.

## Array context: Define transformations

To demonstrate how Upsolver handles transformations and understand what happens internally, the examples below use hierarchical data and discuss how to flatten this data to a tabular format later on.

### Example 1&#x20;

Assume the following input event:

```json
{    
    "data": {
        "salaries": [{
            "employee": "Jhon",
            "baseRate": 100000,
            "bonus": 23000
        },
        {
            "employee": "Jacob",
            "baseRate": 78000,
            "bonus": 12000
        }]
    }
}
```

This sample data includes an array of salaries that include the base rate and a bonus value, and we are interested in calculating the final amount to be paid for every employee.

In other words, we want to get an output that looks like this:

```json
{
    "toPay": [{
        "employee": "Jhon",
        "salary": 123000
    },
    {
        "employee": "Jacob",
        "salary": 100000
    }]
}
```

{% hint style="info" %}
We use SQL syntax to quickly demonstrate the transformation; the same applies for transformations defined using the UI.
{% endhint %}

We define the following `SELECT` statement:

```sql
SELECT data.salaries[].employee as toPay.employee, 
       salary[] as toPay.salary 
FROM my_data_source
    LET salary = data.salaries[].baseRate + data.salaries[].bonus
```

If you preview the result of the above query, you will find that it is not the desired result:

```json
{
    "toPay": {
        "employee": [
            "Jhon",
            "Jacob"
        ],
        "salary": [
            123000,
            112000,
            101000,
            90000
        ]
    }
}
```

As you can see, the results is two independent arrays: one for employees and one for salaries. Moreover, the array of salaries has 4 items instead of the 2 items we wanted.

Let's examine the query more carefully to pinpoint the source of the issue.

First of all, since the calculated `salary` field had 2 fields from within an array as inputs, we selected it as an array `salary[]`. However, we chose to **write it outside of the context of the input array `data.salaries[]`.**&#x20;

{% hint style="info" %}
In layman's terms, the context of a field is the field's location within the nested structure.&#x20;
{% endhint %}

As a result, we actually wrote the array to a field named `salary` in the root of our object:

```json
{
    "data": {
        "salaries": [{
            "employee": "Jhon",
            "baseRate": 100000,
            "bonus": 23000
        },
        {
            "employee": "Jacob",
            "baseRate": 78000,
            "bonus": 12000
        }
        ],
        "salary": [
            123000,
            112000,
            101000,
            90000
        ]
    }
}
```

With the calculated field placed outside the array, it has no local context when performing the calculation. Therefore, Upsolver takes all the values available and passes them into the `SUM` function.&#x20;

This resulted in the following calculation:&#x20;

$$
\[10000, 78000] + \[23000, 12000]
$$

As mentioned previously, this leads to a Cartesian product of the two arrays before performing the calculation on the resulting pair values:&#x20;

$$
\[100000, 78000] \times \[23000, 12000]
$$

Thus, our calculated field becomes:&#x20;

$$
\[100000 + 23000, 100000 + 12000, 78000 + 23000, 78000 +12000]
$$

$$
\= \[123000, 112000, 101000, 90000]
$$

We can see this is why `salary` is an array field; and because we lost the context of our calculation, we also don't know which salary is associated with which employee.

#### Solution

To fix this issue, all we need to do is change the context of the calculation.&#x20;

#### **Step 1: Change where we save the calculated field**

```sql
SELECT data.salaries[].employee as toPay.employee, 
       data.salaries[].salary as toPay.salary 
FROM salaries
    LET data.salaries[].salary = data.salaries[].baseRate + data.salaries[].bonus
```

By changing the target field name from `salary` to `data.salaries[].salary`, we are able to avoid the Cartesian product and compute an array with the 2 salaries we are interested in:

```json
{
    "data": {
        "toPay": {
            "employee": [
                "Jhon",
                "Jacob"
            ],
            "salary": [
                123000,
                90000
            ]
        }
    }
}
```

However, we still need to match the salary values to the correct employee.

#### **Step 2: Add context to calculated values**

As our query sent the results to the fields `toPay.employee` and `toPay.salary`, these two fields, while both nested under `toPay`, are not within the same array record.

To associate an employee with their corresponding salary, we place the fields within a shared array:

```sql
SELECT data.salaries[].employee as toPay[].employee,
       data.salaries[].salary as toPay[].salary 
FROM salaries
    LET data.salaries[].salary = data.salaries[].baseRate + data.salaries[].bonus;
```

Now that the result fields are within a shared array, the result is as expected:

```json
{
    "toPay": [
        {
            "employee": "Jhon",
            "salary": 123000
        },
        {
            "employee": "Jacob",
            "salary": 90000
        }
    ]
}
```

### Example 2: Using the `ZIP` function

In some cases, the data may arrive in a slightly different format:

```json
{
    "data": {
        "employee": ["Jhon", "Jacob"],
        "baseRate": [100000, 78000],
        "bonus": [23000, 12000]
    }
}
```

In this case, we want the first element of each array to be related to the first element of every other array, and likewise for each following element.&#x20;

Unlike the first example, we can't achieve the desired calculated salary by simply placing the calculation in the correct context as there is no shared context array.&#x20;

We can, however, create the required context by zipping together the arrays using the `ZIP` function:

```sql
LET salaries = ZIP('employee,baseRate,bonus', 
               data.employee[], 
               data.baseRate[], 
               data.bonus[]);
```

{% hint style="info" %}
The `ZIP` function takes in an optional comma-separated list of field names. If the field names aren't provided, it defaults to a list of the array inputs (`field1, field2,...`).&#x20;

The `ZIP` function stitches the arrays together on an element-by-element basis: the first element with the first element,  the second element with the second element, etc.
{% endhint %}

Using the `ZIP` function here allows access to a calculated field that is in the exact same structure as the first example: &#x20;

```json
{
    "data": {
        "salaries": [
            {
                "employee": "Jhon",
                "baseRate": 100000,
                "bonus": 23000
            },
            {
            	"employee": "Jacob",
                "baseRate": 78000,
                "bonus": 12000
            }
        ]
    }
}
```

Then, to define the output, you can use the output query that was used initially:

```sql
SELECT 
       salaries[].employee AS toPay[].employee,
       salaries[].salary AS toPay[].salary
FROM "salaries"  
  LET salaries = ZIP('employee,baseRate,bonus', 
                     data.employee[], 
                     data.baseRate[], 
                     data.bonus[]),
      salaries.salary = salaries[].baseRate + salaries[].bonus;
```

## Flattening data

Now that we understand how transformations on arrays work and the role of context and position, we will discuss flattening arrays. This is useful when we want to convert nested objects with arrays into flat tables.&#x20;

For example, if we have the following event:

```sql
{
    "values": [1,2,3],
    "name": "Oleg",
    "id": 123
}
```

You can flatten the array by using the [UNNEST](broken://spaces/WKMq8oT1OPM3KjP8vlg2/pages/TZgiYjfvBAEze5xnVk4I) operator:

```sql
UNNEST(SELECT data.values[] as value,
              data.name as name,
              data.id as id
           FROM my_data_source)
```

The resulting table looks as follows:

| value | name | id  |
| ----- | ---- | --- |
| 1     | Oleg | 123 |
| 2     | Oleg | 123 |
| 3     | Oleg | 123 |

As the data was flattened based on the `values[]` array that contains three values, the table shows three rows from our one source event.

### Flatten multiple arrays

Suppose we have the following data:

```json
{
    "data": {
        "values": ["apple", "NY"],
        "type": ["fruit", "city"]
    }
}
```

If we were to simply `SELECT` the arrays as is:

```sql
UNNEST(SELECT data.values[] as value,
              data.type[] as type
           FROM my_data_source)
```

With two independent arrays selected without a shared context, this query results in a Cartesian product:

| value | type  |
| ----- | ----- |
| apple | fruit |
| NY    | fruit |
| apple | city  |
| NY    | city  |

To avoid this, you should use the `ZIP`function to combine the arrays into a single context:

```sql
UNNEST(SELECT zipped[].value as value
              zipped[].type as type
           FROM my_data_source
           LET zipped = ZIP('type,value', data.type[], data.values[]))
```

Then the result is as expected:&#x20;

| value | type  |
| ----- | ----- |
| apple | fruit |
| NY    | city  |


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://upsolver.gitbook.io/content/articles-1/jobs/transformation/working-with-arrays.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
