Conversation
- Database adapter
- ClickHouse adapter
Note: Reviews paused. This branch appears to be under active development. To avoid overwhelming you with review comments from an influx of new commits, CodeRabbit has automatically paused this review. This behavior can be configured in the settings.
Walkthrough
Adds a Utopia Usage subsystem: core types (Adapter, SQL base adapter, Usage façade, Metric, Query), concrete adapters for ClickHouse and Database, and extensive PHPUnit tests with a Usage test trait. Also adds CI workflows (Tests, Linter, CodeQL), Dockerfile, docker-compose (mariadb, clickhouse, usage), phpunit.xml, composer.json updates, pint.json, CONTRIBUTING.md, CODE_OF_CONDUCT.md, LICENSE.md, and README.md.

Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (inconclusive)
```php
        'tags' => json_encode($tags),
    ];

    if ($this->sharedTables) {
```
Why are we aware of the concept of shared vs. non-shared tables in this scope? We can completely abstract this.
Abstract it where? For the database library, Database already abstracts it. But for ClickHouse we don't have a separate library, so either we create a ClickHouse client that handles this (which could be shared here and in the audits library as well), or we duplicate this logic both here and in the audit library's ClickHouse adapter.
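One possible shape for the shared-client idea, purely illustrative (neither this interface nor its method names exist in the codebase yet):

```php
<?php

// Hypothetical client that owns the shared-vs-dedicated table concern,
// so adapters (usage, audits) never branch on it themselves.
interface ClickHouseClient
{
    /**
     * Resolve the physical table name, applying any shared-table
     * naming scheme internally.
     */
    public function table(string $logicalName): string;

    /**
     * Insert rows, transparently adding tenant columns/filters
     * when shared tables are enabled.
     *
     * @param array<int, array<string, mixed>> $rows
     */
    public function insert(string $logicalName, array $rows): void;
}
```

With something along these lines, the adapter above would just call `$client->insert(...)` and the `if ($this->sharedTables)` branch would live in one place.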
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1264-1278: The branch in ClickHouse.php uses $singleValue after
only assigning it inside the inner is_array($values) check, which can leave it
undefined; update the TYPE_LESS_EQUAL handling (and mirror the same change for
TYPE_GREATER_EQUAL) to initialize $singleValue = null before the is_array check
so both the time branch (used by formatDateTime) and the non-time branch (used
by formatParamValue) always have a defined variable; locate the blocks
referencing $singleValue, $paramName, $escapedAttr, and $values and ensure
$singleValue is set to null (or a sensible default) prior to the conditional
assignment.
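As a sketch, the fixed branch might look like this (control flow and helper names are taken from the prompt above; `$isTimeAttribute` is a stand-in for however the adapter actually distinguishes time attributes):

```php
// TYPE_LESS_EQUAL: define $singleValue up front so it exists on every
// path, not only when $values is an array.
$singleValue = null;
if (is_array($values)) {
    $singleValue = $values[0] ?? null;
}

if ($isTimeAttribute) {
    // time branch: formatDateTime() always receives a defined variable
    $params[$paramName] = $this->formatDateTime($singleValue);
} else {
    // non-time branch: same guarantee for formatParamValue()
    $params[$paramName] = $this->formatParamValue($singleValue);
}
// Mirror the same initialization in the TYPE_GREATER_EQUAL branch.
```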
🧹 Nitpick comments (1)
src/Usage/Adapter/ClickHouse.php (1)
388-389: Optional: Consider URL-encoding the query string for defensive consistency.

While backticks are valid in URL query strings and ClickHouse correctly handles them, the query string could be URL-encoded for defensive consistency with standard URL practices. However, the current implementation works correctly since table names are internally controlled and alphanumeric. If you want to add URL encoding for robustness:

```diff
- $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow";
+ $query = "INSERT INTO {$escapedTable} FORMAT JSONEachRow";
+ $url = "{$scheme}://{$this->host}:{$this->port}/?query=" . urlencode($query);
```

This is not required for correctness but improves alignment with standard URL encoding practices.
…od for cleaner code
…atus and server info
…TP request timeout
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 970-1017: In validateMetricsBatch(), the tenant validation reads
the wrong array key ('$tenant') causing tenant checks to be skipped or wrong;
change the lookup to use the correct key 'tenant' (e.g., $tenantValue =
$metricData['tenant']), keep the surrounding existence check
(array_key_exists('tenant', $metricData)), and retain the existing validation
logic (allow non-negative integers or numeric strings, reject negatives and
other types) so that metric-level tenant overrides are validated correctly.
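A sketch of the corrected lookup, using the names from the prompt (the surrounding loop and exact exception messages are assumed):

```php
// validateMetricsBatch(): read the 'tenant' key that was existence-checked,
// not the stray '$tenant' key.
if (array_key_exists('tenant', $metricData)) {
    $tenantValue = $metricData['tenant']; // was $metricData['$tenant'], always null

    $isValid = $tenantValue === null
        || (is_int($tenantValue) && $tenantValue >= 0)
        || (is_string($tenantValue) && ctype_digit($tenantValue));

    if (! $isValid) {
        throw new \Exception("Invalid tenant for metric at index {$index}: must be a non-negative integer");
    }
}
```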
🧹 Nitpick comments (1)
src/Usage/Adapter/Database.php (1)
133-180: `$batchSize` parameter is unused.

The `logBatch` method accepts `$batchSize` but processes all metrics in a single `upsertDocumentsWithIncrease` call. Either implement batching or document that the Database adapter handles all documents at once regardless of batch size.

Option 1: Document the behavior

```diff
 /**
  * Log multiple metrics in batch
  *
  * @param array<array{metric: string, value: int, period?: string, tags?: array<string,mixed>}> $metrics
- * @param int $batchSize Maximum number of metrics per INSERT statement
+ * @param int $batchSize Unused - Database adapter processes all metrics at once via upsert
  */
+#[\SuppressWarnings(PHPMD.UnusedFormalParameter)]
 public function logBatch(array $metrics, int $batchSize = 1000): bool
```
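Option 2, actually honoring the parameter, could be sketched like this (`buildDocuments()` is a hypothetical stand-in for the existing metric-to-Document conversion logic):

```php
// Hypothetical batched variant: chunk documents and upsert each chunk,
// so large writes honor $batchSize.
public function logBatch(array $metrics, int $batchSize = 1000): bool
{
    $documents = $this->buildDocuments($metrics); // assumed conversion helper

    foreach (array_chunk($documents, $batchSize) as $chunk) {
        $this->db->upsertDocumentsWithIncrease($this->collection, 'value', $chunk);
    }

    return true;
}
```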
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1026-1041: The validation is reading the wrong array key: when
checking array_key_exists('tenant', $metricData) the code then uses
$metricData['$tenant'] (with a stray dollar sign) which yields null and skips
validation; update the access to use $metricData['tenant'] and keep the existing
null/int/string checks (including the ctype_digit branch) so that the tenant
validation in this block (referencing $metricData and $index) correctly enforces
non-negative integer values and throws the same Exception messages on invalid
input.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 782-883: The INSERT should enable ClickHouse deduplication instead
of disabling retries: modify the SQL in the ClickHouse Adapter where $sql and
$url are built (variables $sql = "INSERT INTO {$escapedTable} FORMAT
JSONEachRow" and $url =
".../?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow") to include a
SETTINGS clause (e.g. " SETTINGS non_replicated_deduplication_window=1") and
ensure the URL encodes that same SQL (or append the equivalent settings query
parameter) so retries remain enabled but server-side deduplication prevents
duplicate aggregated rows; update any logging (buildErrorMessage/logQuery) that
references $sql if needed to reflect the added SETTINGS.
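Following the prompt's suggestion, the change might look like this sketch (note: whether `non_replicated_deduplication_window` is the right knob to pass at query level should be verified against the ClickHouse docs, since it is normally a MergeTree table setting):

```php
// Append the SETTINGS clause suggested above and URL-encode the same SQL,
// so the logged query and the executed query stay in sync.
$sql = "INSERT INTO {$escapedTable} SETTINGS non_replicated_deduplication_window=1 FORMAT JSONEachRow";
$url = "{$scheme}://{$this->host}:{$this->port}/?query=" . urlencode($sql);
```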
…ickHouse operations
- Introduced increment and set methods for metrics, replacing log and logCounter.
- Added in-memory buffers for increment and counter metrics with automatic flushing.
- Implemented collect and collectSet methods for deferred metric accumulation.
- Enhanced flush logic to handle both increment and set metrics, clearing buffers post-flush.
- Added configuration options for flush thresholds and intervals.
- Updated tests to reflect changes in metric logging and buffer management.
Actionable comments posted: 3
♻️ Duplicate comments (5)
src/Usage/Adapter/ClickHouse.php (2)
1318-1320: ⚠️ Potential issue | 🔴 Critical: Use one per-metric tenant key consistently.

These two paths disagree on the override key: validation checks 'tenant', while resolution reads '$tenant'. As written, the key that actually overrides the adapter tenant bypasses validation, and the key that gets validated is ignored during insert. In shared-table mode that can route metrics to the wrong tenant or accept invalid tenant values.

Also applies to: 1443-1445
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 1318 - 1320, The code inconsistently uses two different per-metric tenant keys ('tenant' vs '$tenant') so the validation path checks array_key_exists('tenant', $metricData) while resolution reads $metricData['$tenant']; update both places to use the same single key name (pick either 'tenant' or '$tenant') so validation and resolution reference the identical key, and apply the same change to the other occurrence around the 1443-1445 region; update any related variable names (e.g., $tenantValue) and tests to match the chosen key.
844-857: ⚠️ Potential issue | 🟠 Major: INSERT retries are not safe unless deduplication is configured out of band.
This method retries transport failures, but the request has no stable deduplication token or explicit non-replicated dedup setting. If ClickHouse commits the batch and the response is lost, the retry sends the same rows again and inflates totals.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/ClickHouse.php` around lines 844 - 857, The INSERT retry is unsafe because retries can duplicate committed rows; modify the executeWithRetry block (the anonymous function passed to executeWithRetry where $queryParams is built) to include a stable deduplication token (e.g. add a unique query_id or deduplication key to $queryParams generated per logical batch) or disable retries for non-idempotent INSERTs when no out-of-band deduplication is configured; specifically, change building of $queryParams (where "INSERT INTO {$escapedTable} FORMAT JSONEachRow" is set and asyncInsert flags are added) to attach a deterministic or generated dedupe identifier (like query_id) or short-circuit/throw when no dedupe is available so retries are not attempted for that batch.

tests/Usage/Adapter/ClickHouseTest.php (1)
17-21: ⚠️ Potential issue | 🟡 Minor: Parse `CLICKHOUSE_SECURE` with `filter_var()`, not a bool cast.

`(bool) 'false'` evaluates to `true`, so these tests can silently switch to HTTPS depending on how the env var is populated. This is repeated throughout the file.

Minimal fix:

```diff
- $secure = (bool) (getenv('CLICKHOUSE_SECURE') ?: false);
+ $secure = filter_var(getenv('CLICKHOUSE_SECURE') ?: 'false', FILTER_VALIDATE_BOOLEAN);
```

Also applies to: 39-43, 458-462, 549-553, 604-608, 659-663, 695-699, 733-737, 765-769, 799-803
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/Usage/Adapter/ClickHouseTest.php` around lines 17 - 21, The env boolean parsing for CLICKHOUSE_SECURE is incorrect because it uses a bool cast which treats the string "false" as true; update every occurrence where $secure is assigned from getenv('CLICKHOUSE_SECURE') (e.g., the initial $secure assignment near the top and the repeated blocks at the other locations) to parse with filter_var(getenv('CLICKHOUSE_SECURE'), FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE) (or filter_var(..., FILTER_VALIDATE_BOOLEAN) with a fallback) so the string "false" becomes false and missing values default to false; ensure you update all occurrences of the $secure assignment in this file (the repeated blocks referenced in the comment).

tests/Usage/UsageBase.php (1)
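The pitfall is easy to reproduce in isolation (standalone snippet, not from the test file):

```php
<?php

// A bool cast treats any non-empty string as true, including "false".
var_dump((bool) 'false');                               // bool(true)  - the bug
var_dump(filter_var('false', FILTER_VALIDATE_BOOLEAN)); // bool(false)
var_dump(filter_var('1', FILTER_VALIDATE_BOOLEAN));     // bool(true)

// getenv() returns false when the variable is unset; filter_var handles
// that case too, defaulting to false.
var_dump(filter_var(false, FILTER_VALIDATE_BOOLEAN));   // bool(false)
```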
95-101: ⚠️ Potential issue | 🟡 Minor: These assertions are effectively no-ops.

`count($results)` is always `>= 0`, so these tests keep passing even when the date filters return nothing. That leaves `getBetweenDates()`, `lessThanEqual()`, and `greaterThanEqual()` unverified.

Suggested fix:

```diff
- $this->assertGreaterThanOrEqual(0, count($results));
+ $this->assertNotEmpty($results);
```

Use the same stronger assertion in the other two tests as well.
Also applies to: 180-201
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/Usage/UsageBase.php` around lines 95 - 101, The current tests like testGetBetweenDates use assertGreaterThanOrEqual(0, count($results)), which is a no-op; update testGetBetweenDates and the analogous tests covering lessThanEqual() and greaterThanEqual() to use a stronger assertion such as assertGreaterThan(0, count($results)) or assertNotEmpty($results) so the tests actually fail when the date filters return no rows; locate and change the assertions in the methods named testGetBetweenDates, testLessThanEqual (or the test covering lessThanEqual), and testGreaterThanEqual (or its equivalent) in UsageBase.php (also apply the same change for the block around lines 180-201) to ensure the results are validated properly.

src/Usage/Adapter/Database.php (1)
299-299: ⚠️ Potential issue | 🔴 Critical: Pass the sort attribute to `DatabaseQuery::orderDesc()`.

Line 299 and line 318 call `DatabaseQuery::orderDesc()` without a column name. If the database query API still requires `string $attribute`, both paths fatal before the query runs.

Minimal fix:

```diff
- $dbQueries[] = DatabaseQuery::orderDesc();
+ $dbQueries[] = DatabaseQuery::orderDesc('time');
```

Apply the same change in both methods.
Also applies to: 318-318
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/Usage/Adapter/Database.php` at line 299, Both calls to DatabaseQuery::orderDesc() are missing the required attribute argument and will fatal; update each call to pass the column to sort by (for example the same attribute variable used to build the query, e.g. $attribute or the literal column name like 'created_at') so change DatabaseQuery::orderDesc() to DatabaseQuery::orderDesc($attribute) (or the correct column string) in both places where it is invoked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@README.md`:
- Around line 302-304: The "System Requirements" section incorrectly refers to
"Utopia Framework requires PHP 8.0 or later"; update that text to refer to this
package instead (e.g., "This package requires PHP 8.0 or later" or
"utopia-php/usage requires PHP 8.0 or later") so consumers of the library aren't
misled; edit the "System Requirements" heading block in README.md and replace
the phrase "Utopia Framework" with the appropriate package name or neutral
wording.
In `@src/Usage/Adapter/Database.php`:
- Around line 101-145: The incrementBatch and setBatch implementations ignore
the public $batchSize parameter and always upsert the entire payload at once;
change both methods (incrementBatch and setBatch) to chunk the work into batches
of $batchSize inside the $this->db->getAuthorization()->skip(...) closure:
accumulate/convert metrics into Document objects as currently done, then iterate
over array_chunk($documents, $batchSize) and call
$this->db->upsertDocumentsWithIncrease($this->collection, 'value', $chunk) (and
the analogous upsert method used by setBatch) for each chunk so large writes
honor the requested batchSize.
- Around line 47-61: The healthCheck() currently treats a missing/empty
collection as healthy; update it so if getCollection($collectionName)->isEmpty()
is true (meaning the collection is missing/empty) the method returns healthy =>
false and includes the database and collection name, so callers know setup()
must create it; adjust the return logic in healthCheck() around $collectionName
and the getCollection(...)->isEmpty() check to return an unhealthy status when
empty/missing instead of always returning healthy.
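A sketch of the shape the prompt describes (the return structure, accessor names, and property names here are assumptions, not the adapter's actual API):

```php
// Hypothetical: report unhealthy when the collection is missing/empty,
// pointing callers at setup().
public function healthCheck(): array
{
    $collection = $this->db->getCollection($this->collection);

    if ($collection->isEmpty()) {
        return [
            'healthy' => false,
            'database' => $this->db->getDatabase(), // assumed accessor
            'collection' => $this->collection,
            'message' => 'Collection not found; run setup() to create it',
        ];
    }

    return ['healthy' => true];
}
```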
---
Duplicate comments:
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1318-1320: The code inconsistently uses two different per-metric
tenant keys ('tenant' vs '$tenant') so the validation path checks
array_key_exists('tenant', $metricData) while resolution reads
$metricData['$tenant']; update both places to use the same single key name (pick
either 'tenant' or '$tenant') so validation and resolution reference the
identical key, and apply the same change to the other occurrence around the
1443-1445 region; update any related variable names (e.g., $tenantValue) and
tests to match the chosen key.
- Around line 844-857: The INSERT retry is unsafe because retries can duplicate
committed rows; modify the executeWithRetry block (the anonymous function passed
to executeWithRetry where $queryParams is built) to include a stable
deduplication token (e.g. add a unique query_id or deduplication key to
$queryParams generated per logical batch) or disable retries for non-idempotent
INSERTs when no out-of-band deduplication is configured; specifically, change
building of $queryParams (where "INSERT INTO {$escapedTable} FORMAT JSONEachRow"
is set and asyncInsert flags are added) to attach a deterministic or generated
dedupe identifier (like query_id) or short-circuit/throw when no dedupe is
available so retries are not attempted for that batch.
In `@src/Usage/Adapter/Database.php`:
- Line 299: Both calls to DatabaseQuery::orderDesc() are missing the required
attribute argument and will fatal; update each call to pass the column to sort
by (for example the same attribute variable used to build the query, e.g.
$attribute or the literal column name like 'created_at') so change
DatabaseQuery::orderDesc() to DatabaseQuery::orderDesc($attribute) (or the
correct column string) in both places where it is invoked.
In `@tests/Usage/Adapter/ClickHouseTest.php`:
- Around line 17-21: The env boolean parsing for CLICKHOUSE_SECURE is incorrect
because it uses a bool cast which treats the string "false" as true; update
every occurrence where $secure is assigned from getenv('CLICKHOUSE_SECURE')
(e.g., the initial $secure assignment near the top and the repeated blocks at
the other locations) to parse with filter_var(getenv('CLICKHOUSE_SECURE'),
FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE) (or filter_var(...,
FILTER_VALIDATE_BOOLEAN) with a fallback) so the string "false" becomes false
and missing values default to false; ensure you update all occurrences of the
$secure assignment in this file (the repeated blocks referenced in the comment).
In `@tests/Usage/UsageBase.php`:
- Around line 95-101: The current tests like testGetBetweenDates use
assertGreaterThanOrEqual(0, count($results)), which is a no-op; update
testGetBetweenDates and the analogous tests covering lessThanEqual() and
greaterThanEqual() to use a stronger assertion such as assertGreaterThan(0,
count($results)) or assertNotEmpty($results) so the tests actually fail when the
date filters return no rows; locate and change the assertions in the methods
named testGetBetweenDates, testLessThanEqual (or the test covering
lessThanEqual), and testGreaterThanEqual (or its equivalent) in UsageBase.php
(also apply the same change for the block around lines 180-201) to ensure the
results are validated properly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1832fded-8aca-4e75-aeb3-7347e9f683ba
📒 Files selected for processing (7)
- README.md
- src/Usage/Adapter.php
- src/Usage/Adapter/ClickHouse.php
- src/Usage/Adapter/Database.php
- src/Usage/Usage.php
- tests/Usage/Adapter/ClickHouseTest.php
- tests/Usage/UsageBase.php
🚧 Files skipped from review as they are similar to previous changes (1)
- src/Usage/Usage.php
```markdown
## System Requirements

Utopia Framework requires PHP 8.0 or later. We recommend using the latest PHP version whenever possible.
```
System requirements section refers to the wrong product.
Line 304 says "Utopia Framework requires PHP 8.0 or later", but this README is for utopia-php/usage as a standalone package. That wording is misleading for consumers of this library.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@README.md` around lines 302 - 304, The "System Requirements" section
incorrectly refers to "Utopia Framework requires PHP 8.0 or later"; update that
text to refer to this package instead (e.g., "This package requires PHP 8.0 or
later" or "utopia-php/usage requires PHP 8.0 or later") so consumers of the
library aren't misled; edit the "System Requirements" heading block in README.md
and replace the phrase "Utopia Framework" with the appropriate package name or
neutral wording.
feat: integrate Utopia\Query for usage metrics queries and remove leg…
Usage
- Automatically construct ID using md5 hash
- Automatically aggregate metric in DB library using upsertDocumentWithIncrease and in ClickHouse using the SummingMergeTree engine and a final SELECT to make sure the aggregation happens
- Database adapter (utopia-php/database)
- ClickHouse adapter
Summary by CodeRabbit
New Features
Documentation
Tests
Chores