postgresql

Allows SELECT and INSERT queries to be performed on data that is stored on a remote PostgreSQL server.

Syntax

postgresql({host:port, database, table, user, password[, schema, [, on_conflict]] | named_collection[, option=value [,..]]})

Arguments

Argument	Description
`host:port`	PostgreSQL server address.
`database`	Remote database name.
`table`	Remote table name, or a query passed to PostgreSQL as is (see Passing a query instead of a table name).
`user`	PostgreSQL user.
`password`	User password.
`schema`	Non-default table schema. Optional.
`on_conflict`	Conflict resolution strategy. Example: `ON CONFLICT DO NOTHING`. Optional.

Arguments also can be passed using named collections. In this case host and port should be specified separately. This approach is recommended for production environment.

Returned value

A table object with the same columns as the original PostgreSQL table.

Note

In the INSERT query to distinguish table function postgresql(...) from table name with column names list you must use keywords FUNCTION or TABLE FUNCTION. See examples below.

Implementation Details

SELECT queries on PostgreSQL side run as COPY (SELECT ...) TO STDOUT inside read-only PostgreSQL transaction with commit after each SELECT query.

Simple WHERE clauses such as =, !=, >, >=, <, <=, and IN are executed on the PostgreSQL server.

All joins, aggregations, sorting, IN [ array ] conditions and the LIMIT sampling constraint are executed in ClickHouse only after the query to PostgreSQL finishes.

Passing a query instead of a table name

Instead of a table name, the third argument can be a SELECT query that is passed to PostgreSQL as is. The structure of the resulting table is inferred from the query result. The query can be written either as a subquery, or wrapped into the query function:

SELECT * FROM postgresql('localhost:5432', 'test', (SELECT a, b FROM t1 JOIN t2 USING (id) WHERE a > 0), 'user', 'password');
SELECT * FROM postgresql('localhost:5432', 'test', query('SELECT a, b FROM t1 JOIN t2 USING (id) WHERE a > 0'), 'user', 'password');

This is useful to push down joins, aggregations or any other processing to PostgreSQL. Such a table is read-only: INSERT into it is not allowed. The same syntax is supported by the PostgreSQL table engine.

Note

The subquery form (SELECT ...) is parsed by ClickHouse and re-serialized in the PostgreSQL dialect (PostgreSQL identifier quoting and string-literal escaping) before being sent to the server. It must therefore be valid ClickHouse SQL. To pass PostgreSQL-specific syntax that ClickHouse does not parse, use the query('...') form, whose text is sent to PostgreSQL verbatim.

Any outer WHERE, LIMIT, aggregation, etc. of the surrounding ClickHouse query is not pushed down into the passed query — it is applied in ClickHouse after the full query result is fetched. To restrict the data read from PostgreSQL, put the filter inside the passed query. With external_table_strict_query = 1 an outer filter that cannot be pushed down is rejected with an exception instead of being applied locally.

INSERT queries on PostgreSQL side run as COPY "table_name" (field1, field2, ... fieldN) FROM STDIN inside PostgreSQL transaction with auto-commit after each INSERT statement.

PostgreSQL Array types converts into ClickHouse arrays.

Note

Be careful, in PostgreSQL an array data type column like Integer[] may contain arrays of different dimensions in different rows, but in ClickHouse it is only allowed to have multidimensional arrays of the same dimension in all rows.

Supports multiple replicas that must be listed by |. For example:

SELECT name FROM postgresql(`postgres{1|2|3}:5432`, 'postgres_database', 'postgres_table', 'user', 'password');

SELECT name FROM postgresql(`postgres1:5431|postgres2:5432`, 'postgres_database', 'postgres_table', 'user', 'password');

Supports replicas priority for PostgreSQL dictionary source. The bigger the number in map, the less the priority. The highest priority is 0.

Examples

Table in PostgreSQL:

postgres=# CREATE TABLE "public"."test" (
"int_id" SERIAL,
"int_nullable" INT NULL DEFAULT NULL,
"float" FLOAT NOT NULL,
"str" VARCHAR(100) NOT NULL DEFAULT '',
"float_nullable" FLOAT NULL DEFAULT NULL,
PRIMARY KEY (int_id));

CREATE TABLE

postgres=# INSERT INTO test (int_id, str, "float") VALUES (1,'test',2);
INSERT 0 1

postgresql> SELECT * FROM test;
  int_id | int_nullable | float | str  | float_nullable
 --------+--------------+-------+------+----------------
       1 |              |     2 | test |
(1 row)

Selecting data from ClickHouse using plain arguments:

SELECT * FROM postgresql('localhost:5432', 'test', 'test', 'postgresql_user', 'password') WHERE str IN ('test');

Or using named collections:

CREATE NAMED COLLECTION mypg AS
        host = 'localhost',
        port = 5432,
        database = 'test',
        user = 'postgresql_user',
        password = 'password';
SELECT * FROM postgresql(mypg, table='test') WHERE str IN ('test');

┌─int_id─┬─int_nullable─┬─float─┬─str──┬─float_nullable─┐
│      1 │         ᴺᵁᴸᴸ │     2 │ test │           ᴺᵁᴸᴸ │
└────────┴──────────────┴───────┴──────┴────────────────┘

Inserting:

INSERT INTO TABLE FUNCTION postgresql('localhost:5432', 'test', 'test', 'postgrsql_user', 'password') (int_id, float) VALUES (2, 3);
SELECT * FROM postgresql('localhost:5432', 'test', 'test', 'postgresql_user', 'password');

┌─int_id─┬─int_nullable─┬─float─┬─str──┬─float_nullable─┐
│      1 │         ᴺᵁᴸᴸ │     2 │ test │           ᴺᵁᴸᴸ │
│      2 │         ᴺᵁᴸᴸ │     3 │      │           ᴺᵁᴸᴸ │
└────────┴──────────────┴───────┴──────┴────────────────┘

Using Non-default Schema:

postgres=# CREATE SCHEMA "nice.schema";

postgres=# CREATE TABLE "nice.schema"."nice.table" (a integer);

postgres=# INSERT INTO "nice.schema"."nice.table" SELECT i FROM generate_series(0, 99) as t(i)

CREATE TABLE pg_table_schema_with_dots (a UInt32)
        ENGINE PostgreSQL('localhost:5432', 'clickhouse', 'nice.table', 'postgrsql_user', 'password', 'nice.schema');

Replicating or migrating Postgres data with PeerDB

In addition to table functions, you can always use PeerDB by ClickHouse to set up a continuous data pipeline from Postgres to ClickHouse. PeerDB is a tool designed specifically to replicate data from Postgres to ClickHouse using change data capture (CDC).

Syntax​

Arguments​

Returned value​

Implementation Details​

Passing a query instead of a table name​

Examples​

Related​

Replicating or migrating Postgres data with PeerDB​

Syntax

Arguments

Returned value

Implementation Details

Passing a query instead of a table name

Examples

Related

Replicating or migrating Postgres data with PeerDB