Implement predicate functions: all(), any(), none(), single() #2359
gregfelice wants to merge 3 commits into apache:master
Implement the four openCypher predicate functions (issues apache#552, apache#553, apache#555, apache#556) that test list elements against a predicate:

  all(x IN list WHERE predicate)    -- true if all elements match
  any(x IN list WHERE predicate)    -- true if at least one matches
  none(x IN list WHERE predicate)   -- true if no elements match
  single(x IN list WHERE predicate) -- true if exactly one matches

Implementation approach:
- Add cypher_predicate_function node type with CPFK_ALL/ANY/NONE/SINGLE kind enum, reusing the list comprehension's unnest-based transformation
- Grammar rules in expr_func_subexpr (alongside EXISTS, COALESCE, COUNT)
- Transform to efficient SQL sublinks:
    all()    -> NOT EXISTS (SELECT 1 FROM unnest WHERE NOT pred)
    any()    -> EXISTS (SELECT 1 FROM unnest WHERE pred)
    none()   -> NOT EXISTS (SELECT 1 FROM unnest WHERE pred)
    single() -> (SELECT count(*) FROM unnest WHERE pred) = 1
- Three new keywords (ANY_P, NONE, SINGLE) added to safe_keywords for backward compatibility as property keys and label names
- Shared extract_iter_variable_name() helper for variable validation

All 32 regression tests pass. New predicate_functions test covers basic semantics, empty lists, graph data integration, boolean combinations, nested predicates, and keyword backward compatibility.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
Adds support for the openCypher predicate functions all(), any(), none(), and single() by introducing a new AST node and transforming it into unnest(...)-based SQL subqueries during parsing/analyzing, with a new regression test suite.
Changes:
- Add new cypher_predicate_function AST node (+ enum kind) and register/serialize it as an ExtensibleNode.
- Extend the Cypher grammar with all/any/none/single(variable IN list WHERE predicate) and add keywords to the lexer + safe_keywords.
- Implement query-tree transformation for predicate functions and add regression tests (predicate_functions).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/include/parser/cypher_kwlist.h | Adds any/none/single as Cypher keywords. |
| src/backend/parser/cypher_gram.y | Adds grammar rules + helper to build predicate-function nodes and wraps them in SubLinks. |
| src/include/nodes/cypher_nodes.h | Introduces cypher_predicate_function node + kind enum. |
| src/include/nodes/ag_nodes.h | Registers new node tag for predicate functions. |
| src/backend/nodes/ag_nodes.c | Adds node name and ExtensibleNode methods entry for predicate-function node. |
| src/include/nodes/cypher_outfuncs.h | Declares serialization function for the new node. |
| src/backend/nodes/cypher_outfuncs.c | Implements serialization for cypher_predicate_function. |
| src/backend/parser/cypher_clause.c | Transforms predicate-function node into unnest-based subqueries (EXISTS / count). |
| src/backend/parser/cypher_analyze.c | Adds expression walker support for the new node type. |
| Makefile | Registers the new predicate_functions regression test. |
| regress/sql/predicate_functions.sql | Adds regression SQL coverage for predicate functions. |
| regress/expected/predicate_functions.out | Adds expected output for the new regression test. |
@gregfelice Please see the above comments by Copilot.
… perf, tests

- Rewrite predicate functions from EXISTS_SUBLINK to EXPR_SUBLINK with aggregate-based CASE expressions (bool_or + IS TRUE/FALSE/NULL) to preserve three-valued Cypher NULL semantics
- Add list_length check in extract_iter_variable_name() to reject qualified names like x.y as iterator variables
- Add copy/read support for cypher_predicate_function ExtensibleNode to prevent query rewriter crashes
- Use IS TRUE filtering in single() count (LIMIT 2 optimization breaks correlated variable refs in graph contexts -- documented)
- Add 13 NULL regression tests: null list input, null elements, null predicates for all four functions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
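For readers following the NULL discussion, here is a rough Python model of the three-valued behavior this commit describes for all()/any()/none(). It is only a sketch: `None` stands in for NULL, the helper names (`bool_or`, `all_pred`, `any_pred`, `none_pred`, `positive`) are made up for illustration, and the THEN/ELSE results are assumed from the usual Cypher semantics rather than copied from the patch.

```python
from typing import Callable, Iterable, Optional, Sequence

Tri = Optional[bool]  # None stands in for SQL/Cypher NULL


def bool_or(flags: Iterable[bool]) -> Tri:
    """Model of SQL bool_or(): NULL over zero rows, otherwise the OR of the inputs."""
    rows = list(flags)
    return any(rows) if rows else None


def any_pred(items: Sequence, pred: Callable[[object], Tri]) -> Tri:
    results = [pred(x) for x in items]
    if bool_or(r is True for r in results):   # WHEN bool_or(pred IS TRUE) THEN true
        return True
    if bool_or(r is None for r in results):   # WHEN bool_or(pred IS NULL) THEN NULL
        return None
    return False                              # ELSE false (also covers the empty list)


def all_pred(items: Sequence, pred: Callable[[object], Tri]) -> Tri:
    results = [pred(x) for x in items]
    if bool_or(r is False for r in results):  # one definite failure decides the answer
        return False
    if bool_or(r is None for r in results):
        return None
    return True


def none_pred(items: Sequence, pred: Callable[[object], Tri]) -> Tri:
    results = [pred(x) for x in items]
    if bool_or(r is True for r in results):   # one definite match decides the answer
        return False
    if bool_or(r is None for r in results):
        return None
    return True


def positive(x) -> Tri:
    """Hypothetical predicate: NULL input gives a NULL result, like x > 0 in SQL."""
    return None if x is None else x > 0
```

In this model a definite TRUE or FALSE wins over NULL, so `any_pred([1, None], positive)` is true while `any_pred([-1, None], positive)` is unknown.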
Addressed all 4 Copilot suggestions:
Bonus: Added copy/read support for
All 32 regression tests pass (
@gregfelice We'll see what Copilot thinks ;) Btw, in the future, can you put your comments in the reply to Copilot, please?
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
…ate.h

1. Add NULL-list guard for all predicate functions (all/any/none/single). Wraps the result with CASE WHEN list IS NULL THEN NULL ELSE <result> END in the grammar layer. This fixes single(x IN null WHERE ...) returning false instead of NULL. The expr pointer is safely shared between the NullTest and the predicate function node because AGE's expression transformer creates new nodes without modifying the parse tree in-place.
2. Fix single() block comment in transform_cypher_predicate_function: described LIMIT 2 optimization but implementation uses plain count(*). Updated comment to match actual implementation.
3. Keep #include "catalog/pg_aggregate.h" -- Copilot suggested removal but AGGKIND_NORMAL macro requires it (build fails without it).

Regression test: predicate_functions OK.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
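The effect of the NULL-list guard on single() can be sketched in Python as below. This is an illustrative model, not the grammar code: `None` stands in for NULL, the names `single_unguarded`, `with_null_list_guard` and `positive` are hypothetical, and it assumes that unnesting a NULL list yields zero rows (as PostgreSQL's unnest does for a NULL array).

```python
from typing import Callable, Optional, Sequence

Tri = Optional[bool]  # None stands in for SQL/Cypher NULL


def single_unguarded(items: Optional[Sequence], pred: Callable[[object], Tri]) -> bool:
    """count(*) over rows where pred IS TRUE, compared with 1."""
    rows = [] if items is None else items  # assumption: unnest(NULL) gives no rows
    return sum(1 for x in rows if pred(x) is True) == 1


def with_null_list_guard(fn):
    """Model of CASE WHEN list IS NULL THEN NULL ELSE <result> END."""
    def guarded(items, pred) -> Tri:
        return None if items is None else fn(items, pred)
    return guarded


single = with_null_list_guard(single_unguarded)


def positive(x) -> Tri:
    """Hypothetical predicate: NULL input gives a NULL result."""
    return None if x is None else x > 0
```

Without the guard a NULL list counts zero matches and the comparison yields false; with the guard the same call yields NULL, which is the behavior change item 1 describes.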
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
 * Transform a cypher_predicate_function node into a query tree.
 *
 * Generates aggregate-based queries that preserve Cypher's three-valued
 * NULL semantics. The grammar layer wraps the SubLink with a
 * CASE WHEN list IS NULL THEN NULL ELSE (subquery) END guard so all
 * four functions return NULL when the input list is NULL.
 *
 * For all()/any()/none():
 *   SELECT CASE WHEN bool_or(pred IS TRUE/FALSE) THEN ...
 *               WHEN bool_or(pred IS NULL) THEN NULL
 *               ELSE ... END
 *   FROM unnest(list) AS x
 *
The PR description says all/any/none are transformed to EXISTS/NOT EXISTS and that these forms short-circuit, but the implemented transform uses bool_or() aggregates + CASE (which will evaluate all rows). Please update the PR description to match the actual aggregate-based approach (or implement the EXISTS/NOT EXISTS strategy if short-circuiting is a requirement).
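The trade-off raised here can be illustrated with a small Python sketch that counts predicate evaluations. It is a model under stated assumptions, not planner behavior: `any_exists_style` mimics an EXISTS subquery that may stop at the first qualifying row, `any_aggregate_style` mimics an aggregate that consumes every row, and `CountingPredicate` is a hypothetical helper.

```python
from typing import Callable, Optional, Sequence

Tri = Optional[bool]  # None stands in for SQL/Cypher NULL


class CountingPredicate:
    """Wraps a predicate and records how many times it was evaluated."""

    def __init__(self, fn: Callable[[object], Tri]):
        self.fn = fn
        self.calls = 0

    def __call__(self, x) -> Tri:
        self.calls += 1
        return self.fn(x)


def any_exists_style(items: Sequence, pred: Callable[[object], Tri]) -> bool:
    """EXISTS-like: stop at the first row where the predicate is TRUE."""
    for x in items:
        if pred(x) is True:
            return True
    return False  # NULL predicate results are indistinguishable from FALSE here


def any_aggregate_style(items: Sequence, pred: Callable[[object], Tri]) -> Tri:
    """Aggregate-like: evaluate every row, then decide TRUE / NULL / FALSE."""
    results = [pred(x) for x in items]
    if any(r is True for r in results):
        return True
    if any(r is None for r in results):
        return None
    return False
```

On `[5, 1, 2, 3]` with the predicate `x > 3`, the EXISTS-style version evaluates the predicate once and the aggregate-style version four times; in exchange, only the aggregate-style version can report NULL when no element matches and some predicate result is unknown.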
/* helper function to build a list_comprehension grammar node */
static Node *build_list_comprehension_node(Node *var, Node *expr,
                                           Node *where, Node *mapping_expr,
Now that extract_iter_variable_name() was introduced to validate iterator variables (including rejecting qualified ColumnRefs), build_list_comprehension_node() should ideally reuse it so var IN list validation is consistent between list comprehensions and all/any/none/single. Otherwise list comprehensions can still accept qualified iterator names and silently use only the first field, which can lead to confusing behavior/errors.
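As a rough illustration of the validation being asked for, the sketch below models a ColumnRef as a list of name parts and rejects anything that is not a single unqualified name. The `ColumnRef` dataclass and `iter_variable_name` function are Python stand-ins invented for this example; they are not the AGE grammar helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ColumnRef:
    """Hypothetical stand-in for a parsed column reference such as x or x.y."""
    fields: List[str]


def iter_variable_name(var: object) -> str:
    """Return the iterator name, rejecting non-variables and qualified names."""
    if not isinstance(var, ColumnRef):
        raise ValueError("iterator must be a plain variable")
    if len(var.fields) != 1:
        # x.y has two fields; silently using only "x" would hide the mistake
        raise ValueError("qualified name not allowed as iterator variable")
    return var.fields[0]
```

Routing both list comprehensions and predicate functions through one check like this would make `x.y IN list` fail the same way in both places.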
Summary
Implements the four openCypher predicate functions (issues #552, #553, #555, #556):
- all(x IN list WHERE predicate): true if all elements match
- any(x IN list WHERE predicate): true if at least one matches
- none(x IN list WHERE predicate): true if no elements match
- single(x IN list WHERE predicate): true if exactly one matches

These are among the most requested Cypher features for AGE and are critical for users migrating from Neo4j and Kuzu (recently archived).
Implementation
Approach: Builds on the existing list comprehension infrastructure (unnest-based subqueries with child parsestates for variable scoping).

SQL transformation strategy:
- all() → NOT EXISTS (SELECT 1 FROM unnest(list) AS x WHERE NOT predicate)
- any() → EXISTS (SELECT 1 FROM unnest(list) AS x WHERE predicate)
- none() → NOT EXISTS (SELECT 1 FROM unnest(list) AS x WHERE predicate)
- single() → (SELECT count(*) FROM unnest(list) AS x WHERE predicate) = 1

EXISTS/NOT EXISTS short-circuits on first match for optimal performance.
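The four mappings listed in this description can be mirrored in a few lines of Python, which also makes the empty-list results easy to check. This sketch assumes the predicate only ever returns True or False (no NULLs), and the function names are illustrative only.

```python
from typing import Callable, Sequence


def all_via_not_exists(items: Sequence, pred: Callable[[object], bool]) -> bool:
    # NOT EXISTS (... WHERE NOT predicate): no element fails the predicate
    return not any(not pred(x) for x in items)


def any_via_exists(items: Sequence, pred: Callable[[object], bool]) -> bool:
    # EXISTS (... WHERE predicate): at least one element matches
    return any(pred(x) for x in items)


def none_via_not_exists(items: Sequence, pred: Callable[[object], bool]) -> bool:
    # NOT EXISTS (... WHERE predicate): no element matches
    return not any(pred(x) for x in items)


def single_via_count(items: Sequence, pred: Callable[[object], bool]) -> bool:
    # (SELECT count(*) ... WHERE predicate) = 1: exactly one element matches
    return sum(1 for x in items if pred(x)) == 1


def is_even(x: int) -> bool:
    """Hypothetical two-valued predicate used for the checks."""
    return x % 2 == 0
```

For an empty list this gives true for the all and none variants and false for the any and single variants, since there is no row to match or to fail.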
Files changed (12):
- cypher_nodes.h: cypher_predicate_function node type with CPFK_ALL/ANY/NONE/SINGLE enum
- ag_nodes.h / ag_nodes.c
- cypher_outfuncs.h / cypher_outfuncs.c
- cypher_kwlist.h: ANY_P, NONE, SINGLE
- cypher_gram.y: expr_func_subexpr, build_predicate_function_node() helper, extract_iter_variable_name() shared helper, keywords added to safe_keywords
- cypher_clause.c: transform_cypher_predicate_function(), builds query tree from predicate node
- cypher_analyze.c
- Makefile
- regress/sql/predicate_functions.sql
- regress/expected/predicate_functions.out

Backward compatibility:
- ANY_P, NONE, SINGLE added to safe_keywords so they work as property keys and label names (e.g., {any: 1, none: 2, single: 3})
- ALL was already a reserved keyword with a safe_keywords entry

Regression Tests
28 test queries covering:
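- Empty lists (true for all()/none(), false for any()/single())
- Graph data integration (MATCH (u) WHERE all(x IN u.vals WHERE ...))
- Boolean combinations (any(...) AND all(...))
- Nested predicates (any(x IN ... WHERE all(y IN ... WHERE ...)))
- Keyword backward compatibility ({any: 1, none: 2, single: 3})
- … ORDER BY)

All 32 regression tests pass (31 existing + 1 new).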