
DRILL-8545: Disable HashAgg for collect_to_list_varchar due to ordering requirements #3042

Open
rymarm wants to merge 1 commit into apache:master from rymarm:DRILL-8545


rymarm (Member) commented Mar 23, 2026

DRILL-8545: COLLECT_TO_LIST_VARCHAR function returns incorrect result when Hash Aggregator operator used

Description

Root cause

The collect_to_list_varchar function is incompatible with the Hash Aggregator. The Hash Aggregator updates each group's output as input rows arrive, so values belonging to different groups are written in an arbitrary, interleaved order. The underlying ValueVector framework, however, requires variable-length data (such as the VARCHAR elements of the output list) to be written sequentially, because the position of each entry is derived from the end of the previous one. Furthermore, the Drill UDF framework lacks a straightforward mechanism for buffering these values internally before flushing them to the output vector, so they cannot be reordered on the fly during the aggregation phase.
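To make the root cause concrete, here is a toy Java model. It is not Drill's actual ValueVector or writer code: a list-of-VARCHAR column is stored as one flat element buffer plus a start offset per group. The class names, the aggregate helper, and the explicit IllegalStateException check are all hypothetical. A real vector does not necessarily validate the write order and may instead produce corrupted lists or out-of-bounds errors. The same rows fed in sorted by key, as a Streaming Aggregator would see them, are written correctly. Interleaved rows, as a Hash Aggregator may produce them, are rejected.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model (hypothetical, not Drill's API) of a list-of-VARCHAR output column. */
public class ListOfVarCharDemo {
  static final class ListOfVarCharWriter {
    private final List<String> elements = new ArrayList<>(); // flat child "vector"
    private final List<Integer> starts = new ArrayList<>();  // starts.get(g) = first element of group g

    /** Appends a value to a group's list; only the most recently opened list may grow. */
    void append(int group, String value) {
      if (group == starts.size()) {
        starts.add(elements.size()); // open the next list, in index order
      } else if (group != starts.size() - 1) {
        // Appending here would interleave elements of different lists.
        throw new IllegalStateException("out-of-order write to group " + group
            + " while group " + (starts.size() - 1) + " is open");
      }
      elements.add(value);
    }

    /** Returns the elements of one group's list. */
    List<String> get(int group) {
      int end = group + 1 < starts.size() ? starts.get(group + 1) : elements.size();
      return elements.subList(starts.get(group), end);
    }
  }

  /** Aggregates (key, value) rows in arrival order, assigning group ids on first sight. */
  static ListOfVarCharWriter aggregate(String[][] rows) {
    Map<String, Integer> groupIds = new HashMap<>();
    ListOfVarCharWriter writer = new ListOfVarCharWriter();
    for (String[] row : rows) {
      Integer group = groupIds.get(row[0]);
      if (group == null) {
        group = groupIds.size();
        groupIds.put(row[0], group);
      }
      writer.append(group, row[1]);
    }
    return writer;
  }

  public static void main(String[] args) {
    // Sorted on the key (streaming-style input): each group's rows are contiguous.
    String[][] sorted = {{"a", "x"}, {"a", "y"}, {"b", "z"}};
    ListOfVarCharWriter ok = aggregate(sorted);
    System.out.println(ok.get(0) + " " + ok.get(1));

    // Arbitrary order (hash-style input): writes to different groups interleave.
    String[][] interleaved = {{"a", "x"}, {"b", "z"}, {"a", "y"}};
    try {
      aggregate(interleaved);
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
  }
}
```

Running main prints `[x, y] [z]` for the sorted input, followed by the rejection message for the interleaved input, where group 0 receives a value after group 1 has already been opened.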
Solution

To ensure data integrity and prevent index out-of-bounds exceptions, I have modified the Hash Aggregator physical planning rule. The planner now explicitly disallows the Hash Aggregator when a collect_to_list_varchar call is detected in the aggregate expression. This forces the optimizer to fall back to the Streaming Aggregator, whose input is ordered on the grouping keys, so all values for a group arrive contiguously and can be written sequentially.
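The planner-side guard can be sketched as a simple predicate over the aggregate calls. This is a hypothetical, self-contained Java sketch: HashAggGuard, isHashAggAllowed, and ORDER_SENSITIVE are illustrative names, not Drill's API. Plain function-name strings also stand in for the aggregate call objects that the real physical planning rule inspects.

```java
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of the check added to the Hash Aggregator planning rule. */
public class HashAggGuard {
  // Aggregate functions that need their input grouped contiguously (assumption: only this one).
  private static final Set<String> ORDER_SENSITIVE = Set.of("collect_to_list_varchar");

  /** Returns false if any aggregate call requires ordered input, ruling out HashAgg. */
  static boolean isHashAggAllowed(List<String> aggFunctionNames) {
    for (String name : aggFunctionNames) {
      // SQL function names are case-insensitive, so normalize before comparing.
      if (ORDER_SENSITIVE.contains(name.toLowerCase(Locale.ROOT))) {
        return false; // leave only the Streaming Aggregator alternative
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(isHashAggAllowed(List.of("count", "sum")));            // true
    System.out.println(isHashAggAllowed(List.of("COLLECT_TO_LIST_VARCHAR"))); // false
  }
}
```

Returning false only removes HashAgg from the candidate plans. The cost-based optimizer still chooses among the remaining alternatives, which here means a Streaming Aggregator plan.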

Documentation

No changes.

Testing

Updated the existing unit tests to cover this issue.

rymarm requested review from cgivre and jnturton on March 23, 2026 19:22