Cube Logo0.33.4

Joining Data from Multiple Data Sources

Let's imagine we store information about products and their suppliers in separate databases. We want to aggregate data from these data sources while having decent performance. In the recipe below, we'll learn how to create a rollup join between two databases to achieve our goal.

First of all, we should define our database connections with the dataSource option in the cube.js configuration file:

module.exports = {
  driverFactory: ({ dataSource }) => {
    if (dataSource === 'suppliers') {
      return {
        type: 'postgres',
        database: 'recipes',
        host: 'demo-db-recipes.cube.dev',
        user: 'cube',
        password: '12345',
        port: '5432',
      };
    }

    if (dataSource === 'products') {
      return {
        type: 'postgres',
        database: 'ecom',
        host: 'demo-db-recipes.cube.dev',
        user: 'cube',
        password: '12345',
        port: '5432',
      };
    }

    throw new Error('dataSource is undefined');
  },
};

First, we'll define rollup pre-aggregations for products and suppliers. Note that these pre-aggregations should contain the dimension on which they're joined. In this case, it's the supplier_id dimension in the products cube, and the id dimension in the suppliers cube:

YAML
JS
cubes:
  - name: products
    # ...

    pre_aggregations:
      - name: products_rollup
        type: rollup
        dimensions:
          - name
          - supplier_id
        indexes:
          category_index:
            columns:
              - supplier_id

    joins:
      suppliers:
        sql: "{supplier_id} = ${suppliers.id}"
        relationship: many_to_one
YAML
JS
cubes:
  - name: suppliers
    # ...

    pre_aggregations:
      - name: suppliers_rollup
        type: rollup
        dimensions:
          - id
          - company
          - email
        indexes:
          category_index:
            columns:
              - id

Then, we'll also define a rollup_join pre-aggregation in the products cube, which will enable aggregating data from multiple data sources:

YAML
JS
cubes:
  - name: products
    # ...

    pre_aggregations:
      - name: combined_rollup
        type: rollup_join
        dimensions:
          - suppliers.email
          - suppliers.company
          - name
        rollups:
          - suppliers.suppliers_rollup
          - products_rollup

Let's get the product names and their suppliers' info, such as company name and email, with the following query:

{
  "order": {
    "products.name": "asc"
  },
  "dimensions": [
    "products.name",
    "suppliers.company",
    "suppliers.email"
  ],
  "limit": 3
}

We'll get the data from two pre-aggregations joined into one rollup_join:

[
  {
    "products.name": "Awesome Cotton Sausages",
    "suppliers.company": "Justo Eu Arcu Inc.",
    "suppliers.email": "id.risus@luctuslobortisClass.net"
  },
  {
    "products.name": "Awesome Fresh Keyboard",
    "suppliers.company": "Quisque Purus Sapien Limited",
    "suppliers.email": "Cras@consectetuercursuset.co.uk"
  },
  {
    "products.name": "Awesome Rubber Soap",
    "suppliers.company": "Tortor Inc.",
    "suppliers.email": "Mauris@ac.com"
  }
]

Please feel free to check out the full source code or run it with the docker-compose up command. You'll see the result, including queried data, in the console.

Did you find this page useful?