Skip to main content
Skip to main content

UDFs User Defined Functions

ClickHouse supports several types of user defined functions (UDFs):

  • Executable UDFs start an external program or script (Python, Bash, etc.) and stream blocks of data to it over STDIN / STDOUT. Use them to integrate existing code or tooling without recompiling ClickHouse. They have higher per‑call overhead compared to in‑process options and are best for heavier logic or where a different runtime is required.
  • SQL UDFs are defined with CREATE FUNCTION purely in SQL. They are inlined/expanded into the query plan (no process boundary), making them lightweight and ideal for reusing expression logic or simplifying complex calculated columns.
  • Experimental WebAssembly UDFs run code compiled to WebAssembly inside a sandbox within the server process. They offer lower per‑call overhead than external executables with better isolation than native extensions, making them suitable for custom algorithms written in languages that can target WASM (e.g. C/C++/Rust).
  • Experimental driver-based executable UDFs let an operator-supplied "driver" turn a code snippet supplied in CREATE FUNCTION ... ENGINE = DriverName(...) AS '...' into an executable UDF at function-creation time (for example, by compiling it). They build on executable UDFs and require server-side driver configuration.

Executable User Defined Functions

Beta feature. Learn more.
Note

In ClickHouse Cloud, executable UDFs are in public beta and are created through the Cloud console UI. See User-defined functions in Cloud for the Cloud-specific workflow.

ClickHouse can call any external executable program or script to process data.

The configuration of executable user defined functions can be located in one or more xml-files. The path to the configuration is specified in the user_defined_executable_functions_config parameter.

A function configuration contains the following settings:

ParameterDescriptionRequiredDefault Value
nameA function nameYes-
commandScript name to execute or command if execute_direct is falseYes-
argumentArgument description with the type, and optional name of an argument. Each argument is described in a separate setting. Specifying name is necessary if argument names are part of serialization for user defined function format like Native or JSONEachRowYesc + argument_number
formatA format in which arguments are passed to the command. The command output is expected to use the same format tooYes-
return_typeThe type of a returned valueYes-
return_nameName of returned value. Specifying return name is necessary if return name is part of serialization for user defined function format like Native or JSONEachRowOptionalresult
typeAn executable type. If type is set to executable then single command is started. If it is set to executable_pool then a pool of commands is createdYes-
max_command_execution_timeMaximum execution time in seconds for processing block of data. This setting is valid for executable_pool commands onlyOptional10
command_termination_timeoutTime in seconds during which a command should finish after its pipe is closed. After that time SIGTERM is sent to the process executing the commandOptional10
command_read_timeoutTimeout for reading data from command stdout in millisecondsOptional10000
command_write_timeoutTimeout for writing data to command stdin in millisecondsOptional10000
pool_sizeThe size of a command poolOptional16
send_chunk_headerControls whether to send row count before sending a chunk of data to processOptionalfalse
execute_directIf execute_direct = 1, then command will be searched inside user_scripts folder specified by user_scripts_path. Additional script arguments can be specified using whitespace separator. Example: script_name arg1 arg2. If execute_direct = 0, command is passed as argument for bin/sh -cOptional1
lifetimeThe reload interval of a function in seconds. If it is set to 0 then the function is not reloadedOptional0
deterministicIf the function is deterministic (returns the same result for the same input)Optionalfalse
stderr_reactionHow to handle the command's stderr output. Values: none (ignore), log (log all stderr immediately), log_first (log first 4 KiB after exit), log_last (log last 4 KiB after exit), throw (throw exception immediately on any stderr output). When using log_first or log_last with a non-zero exit code, the stderr content is included in the exception messageOptionallog_last
check_exit_codeIf true, ClickHouse will check the exit code of the command. A non-zero exit code causes an exceptionOptionaltrue

The command must read arguments from STDIN and must output the result to STDOUT. The command must process arguments iteratively. That is after processing a chunk of arguments it must wait for the next chunk.

Executable User Defined Functions

Examples

UDF from inline script

Create test_function_sum manually specifying execute_direct to 0 using either XML or YAML configuration.

File test_function.xml (/etc/clickhouse-server/test_function.xml with default path settings).

<functions>
    <function>
        <type>executable</type>
        <name>test_function_sum</name>
        <return_type>UInt64</return_type>
        <argument>
            <type>UInt64</type>
            <name>lhs</name>
        </argument>
        <argument>
            <type>UInt64</type>
            <name>rhs</name>
        </argument>
        <format>TabSeparated</format>
        <command>cd /; clickhouse-local --input-format TabSeparated --output-format TabSeparated --structure 'x UInt64, y UInt64' --query "SELECT x + y FROM table"</command>
        <execute_direct>0</execute_direct>
        <deterministic>true</deterministic>
    </function>
</functions>

SELECT test_function_sum(2, 2);
┌─test_function_sum(2, 2)─┐
│                       4 │
└─────────────────────────┘

UDF from Python script

In this example we create a UDF which reads a value from STDIN and returns it as a string.

Create test_function using either XML OR YAML configuration.

File test_function.xml (/etc/clickhouse-server/test_function.xml with default path settings).

<functions>
    <function>
        <type>executable</type>
        <name>test_function_python</name>
        <return_type>String</return_type>
        <argument>
            <type>UInt64</type>
            <name>value</name>
        </argument>
        <format>TabSeparated</format>
        <command>test_function.py</command>
    </function>
</functions>

Create a script file test_function.py inside user_scripts folder (/var/lib/clickhouse/user_scripts/test_function.py with default path settings).

#!/usr/bin/python3

import sys

if __name__ == '__main__':
    for line in sys.stdin:
        print("Value " + line, end='')
        sys.stdout.flush()
SELECT test_function_python(toUInt64(2));
┌─test_function_python(2)─┐
│ Value 2                 │
└─────────────────────────┘

Read two values from STDIN and return their sum as a JSON object

Create test_function_sum_json with named arguments and format JSONEachRow using either XML or YAML configuration.

File test_function.xml (/etc/clickhouse-server/test_function.xml with default path settings).

<functions>
    <function>
        <type>executable</type>
        <name>test_function_sum_json</name>
        <return_type>UInt64</return_type>
        <return_name>result_name</return_name>
        <argument>
            <type>UInt64</type>
            <name>argument_1</name>
        </argument>
        <argument>
            <type>UInt64</type>
            <name>argument_2</name>
        </argument>
        <format>JSONEachRow</format>
        <command>test_function_sum_json.py</command>
    </function>
</functions>

Create script file test_function_sum_json.py inside the user_scripts folder (/var/lib/clickhouse/user_scripts/test_function_sum_json.py with default path settings).

#!/usr/bin/python3

import sys
import json

if __name__ == '__main__':
    for line in sys.stdin:
        value = json.loads(line)
        first_arg = int(value['argument_1'])
        second_arg = int(value['argument_2'])
        result = {'result_name': first_arg + second_arg}
        print(json.dumps(result), end='\n')
        sys.stdout.flush()
SELECT test_function_sum_json(2, 2);
┌─test_function_sum_json(2, 2)─┐
│                            4 │
└──────────────────────────────┘

Use parameters in command setting

Executable user defined functions can take constant parameters configured in command setting (this works only for user defined functions with executable type). It also requires the execute_direct option to ensure no shell argument expansion vulnerability.

File test_function_parameter_python.xml (/etc/clickhouse-server/test_function_parameter_python.xml with default path settings).

<functions>
    <function>
        <type>executable</type>
        <execute_direct>true</execute_direct>
        <name>test_function_parameter_python</name>
        <return_type>String</return_type>
        <argument>
            <type>UInt64</type>
        </argument>
        <format>TabSeparated</format>
        <command>test_function_parameter_python.py {test_parameter:UInt64}</command>
    </function>
</functions>

Create script file test_function_parameter_python.py inside the user_scripts folder (/var/lib/clickhouse/user_scripts/test_function_parameter_python.py with default path settings).

#!/usr/bin/python3

import sys

if __name__ == "__main__":
    for line in sys.stdin:
        print("Parameter " + str(sys.argv[1]) + " value " + str(line), end="")
        sys.stdout.flush()
SELECT test_function_parameter_python(1)(2);
┌─test_function_parameter_python(1)(2)─┐
│ Parameter 1 value 2                  │
└──────────────────────────────────────┘

UDF from shell script

In this example, we create a shell script that multiplies each value by 2.

File test_function_shell.xml (/etc/clickhouse-server/test_function_shell.xml with default path settings).

<functions>
    <function>
        <type>executable</type>
        <name>test_shell</name>
        <return_type>String</return_type>
        <argument>
            <type>UInt8</type>
            <name>value</name>
        </argument>
        <format>TabSeparated</format>
        <command>test_shell.sh</command>
    </function>
</functions>

Create a script file test_shell.sh inside the user_scripts folder (/var/lib/clickhouse/user_scripts/test_shell.sh with default path settings).

#!/bin/bash

while read read_data;
    do printf "$(expr $read_data \* 2)\n";
done
SELECT test_shell(number) FROM numbers(10);
    ┌─test_shell(number)─┐
 1. │ 0                  │
 2. │ 2                  │
 3. │ 4                  │
 4. │ 6                  │
 5. │ 8                  │
 6. │ 10                 │
 7. │ 12                 │
 8. │ 14                 │
 9. │ 16                 │
10. │ 18                 │
    └────────────────────┘

Error Handling

Some functions might throw an exception if the data is invalid. In this case, the query is canceled and an error text is returned to the client. For distributed processing, when an exception occurs on one of the servers, the other servers also attempt to abort the query.

Evaluation of Argument Expressions

In almost all programming languages, one of the arguments might not be evaluated for certain operators. This is usually the operators &&, ||, and ?:. In ClickHouse, arguments of functions (operators) are always evaluated. This is because entire parts of columns are evaluated at once, instead of calculating each row separately.

Performing Functions for Distributed Query Processing

For distributed query processing, as many stages of query processing as possible are performed on remote servers, and the rest of the stages (merging intermediate results and everything after that) are performed on the requestor server.

This means that functions can be performed on different servers. For example, in the query SELECT f(sum(g(x))) FROM distributed_table GROUP BY h(y),

  • if a distributed_table has at least two shards, the functions 'g' and 'h' are performed on remote servers, and the function 'f' is performed on the requestor server.
  • if a distributed_table has only one shard, all the 'f', 'g', and 'h' functions are performed on this shard's server.

The result of a function usually does not depend on which server it is performed on. However, sometimes this is important. For example, functions that work with dictionaries use the dictionary that exists on the server they are running on. Another example is the hostName function, which returns the name of the server it is running on in order to make GROUP BY by servers in a SELECT query.

If a function in a query is performed on the requestor server, but you need to perform it on remote servers, you can wrap it in an 'any' aggregate function or add it to a key in GROUP BY.

SQL User Defined Functions

Custom functions from lambda expressions can be created using the CREATE FUNCTION statement. To delete these functions use the DROP FUNCTION statement.

WebAssembly User Defined Functions

Not supported in ClickHouse Cloud
Experimental feature. Learn more.

WebAssembly User Defined Functions (WASM UDFs) allow you to run custom code compiled to WebAssembly inside the ClickHouse server process.

Quick Start

Enable experimental WebAssembly support in your ClickHouse configuration:

<clickhouse>
    <allow_experimental_webassembly_udf>true</allow_experimental_webassembly_udf>
</clickhouse>

Insert your compiled WASM module into the system table:

INSERT INTO system.webassembly_modules (name, code)
SELECT 'my_module', base64Decode('AGFzbQEAAAA...');

Create a function using your WASM module:

CREATE FUNCTION my_function
LANGUAGE WASM
ABI ROW_DIRECT
FROM 'my_module'
ARGUMENTS (x UInt32, y UInt32)
RETURNS UInt32;

Use the function in your queries:

SELECT my_function(10, 20);

More Information

Refer to the documentation on WebAssembly User Defined Functions for more details.

Driver-based Executable User Defined Functions

Not supported in ClickHouse Cloud
Experimental feature. Learn more.
Note

This is an experimental feature that may change in backward-incompatible ways in future releases. Enable it with the allow_experimental_executable_udf_drivers server setting.

A driver is an operator-supplied adapter that turns a user code snippet into a runnable executable UDF. When a function is created with ENGINE = DriverName(...), ClickHouse runs the driver's create_command, passing it the function signature and the code body; the driver compiles or otherwise processes the body and prints an executable UDF configuration, which ClickHouse then stores and loads.

This lets administrators offer users a safe, narrow way to define functions in an arbitrary language (for example, C compiled inside a sandboxed container) without giving them access to the server's configuration files or filesystem. The set of available drivers is entirely controlled by the operator.

Enabling drivers

Driver-based executable UDFs are disabled by default. To enable them:

  1. Set the experimental gate in the server configuration:

    <clickhouse>
        <allow_experimental_executable_udf_drivers>true</allow_experimental_executable_udf_drivers>
    </clickhouse>
    
  2. Point user_defined_executable_function_drivers_config at one or more driver configuration files (a glob is supported), and optionally set dynamic_user_defined_executable_functions_path, the directory where the generated executable UDF configurations are stored:

    <clickhouse>
        <user_defined_executable_function_drivers_config>user_defined_executable_function_drivers_config.d/*_driver.xml</user_defined_executable_function_drivers_config>
        <dynamic_user_defined_executable_functions_path>/var/lib/clickhouse/dynamic_user_defined_executable_functions/</dynamic_user_defined_executable_functions_path>
    </clickhouse>
    

The driver registry is loaded on server start and refreshed on SYSTEM RELOAD CONFIG, so drivers can be added, changed, or removed without restarting the server.

Driver configuration

A driver is described by an XML (or YAML) file with a top-level <driver> element. The following fields are supported:

FieldDescriptionRequired
nameThe driver name, as used in CREATE FUNCTION ... ENGINE = <name>(...).Yes
create_commandPath to the program invoked to create a UDF from a code snippet. Relative paths are resolved against the driver configuration file.Yes
drop_commandPath to the program invoked when a function based on this driver is dropped.No
engine_argumentsDeclares the arguments allowed inside ENGINE = DriverName(...). Each child element is an argument name; a <required>true</required> child marks it as mandatory.No
envEnvironment variables exported when invoking the driver commands.No

Example driver configuration:

<clickhouse>
    <driver>
        <name>DockerC</name>
        <create_command>../user_defined_executable_function_drivers/docker_c_create.sh</create_command>
        <drop_command>../user_defined_executable_function_drivers/docker_c_drop.sh</drop_command>
        <engine_arguments>
            <opt_level><required>false</required></opt_level>
        </engine_arguments>
        <env>
            <CLICKHOUSE_C_DRIVER_MEMORY>256m</CLICKHOUSE_C_DRIVER_MEMORY>
            <CLICKHOUSE_C_DRIVER_CPUS>1.0</CLICKHOUSE_C_DRIVER_CPUS>
        </env>
    </driver>
</clickhouse>

Driver invocation contract

When CREATE FUNCTION runs, create_command is invoked with the configured env variables set and the following arguments:

  • --name <function_name>
  • --return <return_type> (if a RETURNS clause is present)
  • --args <signature> (if an ARGUMENTS clause is present), where the signature is the declared argument list, for example x UInt8, y DateTime
  • --<key> <value> for every declared engine argument supplied in ENGINE = DriverName(key = value)

The user code body (the text after AS) is sent to the command's standard input. The command must print the configuration of an executable UDF to its standard output. The format is auto-detected: output that starts with < is treated as XML, otherwise as YAML. The defined function name in the generated configuration must match the name being created. If create_command exits with a non-zero status, the statement fails with an exception that includes the exit code and the driver's standard error.

drop_command, when present, is invoked the same way (without a code body on stdin) when the function is dropped.

Creating a function

CREATE [OR REPLACE] FUNCTION [IF NOT EXISTS] name [ON CLUSTER cluster]
    ARGUMENTS (a UInt8, b String) RETURNS UInt64
    ENGINE = DriverName(key1 = 'value1', key2 = 42)
    AS '...code body...'

ClickHouse runs the driver's create_command, writes the generated configuration into dynamic_user_defined_executable_functions_path, and the existing executable UDF loader picks it up. The function can then be called like any other function.

Dropping a function

DROP FUNCTION [IF EXISTS] name [ON CLUSTER cluster]

DROP FUNCTION invokes the driver's drop_command (if present), removes the generated dynamic configuration and the per-function working directory, reloads the executable UDF loader, and removes the persisted query.

Persistence and restart

The originating query is persisted as an ATTACH FUNCTION ... statement in the user-defined SQL objects directory, so the function survives a server restart. On start, the generated configurations in dynamic_user_defined_executable_functions_path are loaded directly without re-running the driver. If a persisted ATTACH FUNCTION has no matching generated configuration (for example, the dynamic directory was lost), the driver is re-run to recreate it.

Limitations

  • The feature is experimental and gated behind allow_experimental_executable_udf_drivers.
  • Driver-based functions are not supported with replicated user-defined function storage (ON CLUSTER and <user_defined_zookeeper_path>), because only the originating query is replicated, not the generated artifacts.
  • RESTORE of a backed-up driver-based function persists the query but does not re-run the driver; the generated configuration is materialized later by restart recovery.

Example C drivers

The source tree ships proof-of-concept drivers under programs/server/user_defined_executable_function_drivers_config.d/ that compile and run a C function body. They are examples and are not installed by packages:

  • DockerC - compiles and runs the code inside sandboxed Docker containers (--network=none --read-only --cap-drop=ALL --security-opt=no-new-privileges, plus memory/CPU/PID limits), emitting an executable_pool UDF.
  • GVisorC - a variant that runs the compiled binary under the gVisor runsc runtime.
  • UnsafeC - compiles and runs the code directly on the host without a sandbox. As the name indicates, it provides no isolation and is intended only for trusted environments and testing.

These example drivers are intended as a starting point; review and harden the sandboxing for your environment before exposing them to untrusted users.