Feat: Usage library #1

Open
lohanidamodar wants to merge 72 commits into main from dev

Conversation

@lohanidamodar
Contributor

@lohanidamodar lohanidamodar commented Dec 9, 2025

Usage

  • Automatically construct ID using md5 hash

  • Automatically aggregate metrics: in the Database library via upsertDocumentWithIncrease, and in ClickHouse via the SummingMergeTree engine plus a FINAL select to ensure the aggregation is applied

  • Database adapter (utopia-php/database)

  • ClickHouse adapter
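Both aggregation mechanisms above hinge on the metric row having a deterministic ID, so repeated writes for the same metric/period collapse into one aggregated row. A rough illustrative sketch (hypothetical helper, not the library's actual code) of an md5-based ID:

```php
<?php
// Hypothetical sketch: a deterministic ID lets repeated writes for the same
// metric/period resolve to the same row, so upsertDocumentWithIncrease (or a
// SummingMergeTree merge in ClickHouse) aggregates them instead of duplicating.
function buildMetricId(string $metric, string $period, string $time, array $tags = []): string
{
    ksort($tags); // stable tag order so identical tag sets hash identically
    return md5($metric . ':' . $period . ':' . $time . ':' . json_encode($tags));
}

// Same logical metric always yields the same ID, regardless of tag order:
$a = buildMetricId('requests', '1h', '2025-12-09 03:00:00', ['region' => 'eu', 'plan' => 'pro']);
$b = buildMetricId('requests', '1h', '2025-12-09 03:00:00', ['plan' => 'pro', 'region' => 'eu']);
var_dump($a === $b); // bool(true)
```

The ksort() call is the important detail: without a canonical tag order, the same tag set could hash to two different IDs and split the aggregate.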

Summary by CodeRabbit

  • New Features

    • Pluggable usage tracking library with adapters (ClickHouse, SQL/Database), core Usage API, metric model, and query builder supporting batching, periods, date-range queries, aggregations, multi-tenant/namespace/shared-tables, buffering and flush behavior.
  • Documentation

    • Added README, CONTRIBUTING guide, Code of Conduct, and LICENSE.
  • Tests

    • Extensive unit and integration tests covering adapters, metrics, query parsing, buffering, and end-to-end usage flows.
  • Chores

    • CI workflows, Docker/Docker Compose support, PHPUnit config, composer scripts/requirements (PHP ≥8.0), and coding-style preset.

- Database adapter
- ClickHouse adapter
@coderabbitai

coderabbitai bot commented Dec 9, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.


Walkthrough

Adds a Utopia Usage subsystem: core types (Adapter, SQL base adapter, Usage façade, Metric, Query), concrete adapters for ClickHouse and Database, and extensive PHPUnit tests with a Usage test trait. Also adds CI workflows (Tests, Linter, CodeQL), Dockerfile, docker-compose (mariadb, clickhouse, usage), phpunit.xml, composer.json updates, pint.json, CONTRIBUTING.md, CODE_OF_CONDUCT.md, LICENSE.md, and README.md.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Title check (❓ Inconclusive): The title 'Feat: Usage library' is vague and overly broad. While it references a real component being added, it lacks specificity about the primary change and uses generic terminology that could apply to many different feature additions. Resolution: consider a more specific title, such as 'Feat: Add Usage library with Database and ClickHouse adapters' or 'Feat: Implement Usage metrics library with multi-adapter support', to better convey the scope and key components.

✅ Passed checks (2 passed)

  • Description Check (✅ Passed): Check skipped because CodeRabbit's high-level summary is enabled.
  • Docstring Coverage (✅ Passed): Docstring coverage is 85.71%, which meets the required threshold of 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@lohanidamodar lohanidamodar requested review from eldadfux and shimonewman and removed request for eldadfux and shimonewman December 9, 2025 03:11
@lohanidamodar lohanidamodar self-assigned this Dec 9, 2025

@shimonewman shimonewman left a comment


Nice!

'tags' => json_encode($tags),
];

if ($this->sharedTables) {
Member


Why are we aware of the concept of shared vs. non-shared tables in this scope? We can completely abstract this.

Contributor Author


Abstract this where? For the database library, Database already abstracts it, but for ClickHouse we don't have a separate library. Either we create a ClickHouse client that handles this (which could be shared here and in the audits library as well), or we duplicate this logic both here and in the audit library's ClickHouse adapter.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1264-1278: The branch in ClickHouse.php uses $singleValue after
only assigning it inside the inner is_array($values) check, which can leave it
undefined; update the TYPE_LESS_EQUAL handling (and mirror the same change for
TYPE_GREATER_EQUAL) to initialize $singleValue = null before the is_array check
so both the time branch (used by formatDateTime) and the non-time branch (used
by formatParamValue) always have a defined variable; locate the blocks
referencing $singleValue, $paramName, $escapedAttr, and $values and ensure
$singleValue is set to null (or a sensible default) prior to the conditional
assignment.
🧹 Nitpick comments (1)
src/Usage/Adapter/ClickHouse.php (1)

388-389: Optional: Consider URL-encoding the query string for defensive consistency.

While backticks are valid in URL query strings and ClickHouse correctly handles them, the query string could be URL-encoded for defensive consistency with standard URL practices. However, the current implementation works correctly since table names are internally controlled and alphanumeric. If you want to add URL encoding for robustness:

-        $url = "{$scheme}://{$this->host}:{$this->port}/?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow";
+        $query = "INSERT INTO {$escapedTable} FORMAT JSONEachRow";
+        $url = "{$scheme}://{$this->host}:{$this->port}/?query=" . urlencode($query);

This is not required for correctness but improves alignment with standard URL encoding practices.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 970-1017: In validateMetricsBatch(), the tenant validation reads
the wrong array key ('$tenant') causing tenant checks to be skipped or wrong;
change the lookup to use the correct key 'tenant' (e.g., $tenantValue =
$metricData['tenant']), keep the surrounding existence check
(array_key_exists('tenant', $metricData)), and retain the existing validation
logic (allow non-negative integers or numeric strings, reject negatives and
other types) so that metric-level tenant overrides are validated correctly.
🧹 Nitpick comments (1)
src/Usage/Adapter/Database.php (1)

133-180: $batchSize parameter is unused.

The logBatch method accepts $batchSize but processes all metrics in a single upsertDocumentsWithIncrease call. Either implement batching or document that the Database adapter handles all documents at once regardless of batch size.

Option 1: Document the behavior
     /**
      * Log multiple metrics in batch
      *
      * @param  array<array{metric: string, value: int, period?: string, tags?: array<string,mixed>}>  $metrics
-     * @param  int  $batchSize  Maximum number of metrics per INSERT statement
+     * @param  int  $batchSize  Unused - Database adapter processes all metrics at once via upsert
      */
+    #[\SuppressWarnings(PHPMD.UnusedFormalParameter)]
     public function logBatch(array $metrics, int $batchSize = 1000): bool


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1026-1041: The validation is reading the wrong array key: when
checking array_key_exists('tenant', $metricData) the code then uses
$metricData['$tenant'] (with a stray dollar sign) which yields null and skips
validation; update the access to use $metricData['tenant'] and keep the existing
null/int/string checks (including the ctype_digit branch) so that the tenant
validation in this block (referencing $metricData and $index) correctly enforces
non-negative integer values and throws the same Exception messages on invalid
input.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 782-883: The INSERT should enable ClickHouse deduplication instead
of disabling retries: modify the SQL in the ClickHouse Adapter where $sql and
$url are built (variables $sql = "INSERT INTO {$escapedTable} FORMAT
JSONEachRow" and $url =
".../?query=INSERT+INTO+{$escapedTable}+FORMAT+JSONEachRow") to include a
SETTINGS clause (e.g. " SETTINGS non_replicated_deduplication_window=1") and
ensure the URL encodes that same SQL (or append the equivalent settings query
parameter) so retries remain enabled but server-side deduplication prevents
duplicate aggregated rows; update any logging (buildErrorMessage/logQuery) that
references $sql if needed to reflect the added SETTINGS.

- Introduced increment and set methods for metrics, replacing log and logCounter.
- Added in-memory buffers for increment and counter metrics with automatic flushing.
- Implemented collect and collectSet methods for deferred metric accumulation.
- Enhanced flush logic to handle both increment and set metrics, clearing buffers post-flush.
- Added configuration options for flush thresholds and intervals.
- Updated tests to reflect changes in metric logging and buffer management.

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (5)
src/Usage/Adapter/ClickHouse.php (2)

1318-1320: ⚠️ Potential issue | 🔴 Critical

Use one per-metric tenant key consistently.

These two paths disagree on the override key: validation checks 'tenant', while resolution reads '$tenant'. As written, the key that actually overrides the adapter tenant bypasses validation, and the key that gets validated is ignored during insert. In shared-table mode that can route metrics to the wrong tenant or accept invalid tenant values.

Also applies to: 1443-1445

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Usage/Adapter/ClickHouse.php` around lines 1318 - 1320, The code
inconsistently uses two different per-metric tenant keys ('tenant' vs '$tenant')
so the validation path checks array_key_exists('tenant', $metricData) while
resolution reads $metricData['$tenant']; update both places to use the same
single key name (pick either 'tenant' or '$tenant') so validation and resolution
reference the identical key, and apply the same change to the other occurrence
around the 1443-1445 region; update any related variable names (e.g.,
$tenantValue) and tests to match the chosen key.

844-857: ⚠️ Potential issue | 🟠 Major

INSERT retries are not safe unless deduplication is configured out of band.

This method retries transport failures, but the request has no stable deduplication token or explicit non-replicated dedup setting. If ClickHouse commits the batch and the response is lost, the retry sends the same rows again and inflates totals.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Usage/Adapter/ClickHouse.php` around lines 844 - 857, The INSERT retry is
unsafe because retries can duplicate committed rows; modify the executeWithRetry
block (the anonymous function passed to executeWithRetry where $queryParams is
built) to include a stable deduplication token (e.g. add a unique query_id or
deduplication key to $queryParams generated per logical batch) or disable
retries for non-idempotent INSERTs when no out-of-band deduplication is
configured; specifically, change building of $queryParams (where "INSERT INTO
{$escapedTable} FORMAT JSONEachRow" is set and asyncInsert flags are added) to
attach a deterministic or generated dedupe identifier (like query_id) or
short-circuit/throw when no dedupe is available so retries are not attempted for
that batch.
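One way to make the retried INSERT safe, sketched below with hypothetical helper names (not the adapter's real code), is to derive a stable deduplication token from the batch payload and pass ClickHouse's `insert_deduplication_token` setting, so a retry of the same batch carries the same token and the server can drop the duplicate block. This assumes a deduplication window is configured on the table (e.g. `non_replicated_deduplication_window > 0` for non-replicated MergeTree tables).

```php
<?php
// Hypothetical sketch: a payload-derived token makes the INSERT idempotent
// across transport retries, provided ClickHouse keeps a dedup window.
function buildInsertUrl(string $host, int $port, string $table, string $payload): string
{
    $token = md5($table . ':' . $payload); // stable across retries of the same batch
    $query = "INSERT INTO {$table} FORMAT JSONEachRow";
    return "http://{$host}:{$port}/?" . http_build_query([
        'query' => $query,
        'insert_deduplication_token' => $token,
    ]);
}

// The same batch always produces the same URL/token, so a retry is deduplicable:
$payload = '{"metric":"requests","value":1}' . "\n";
$u1 = buildInsertUrl('clickhouse', 8123, 'usage', $payload);
$u2 = buildInsertUrl('clickhouse', 8123, 'usage', $payload);
var_dump($u1 === $u2); // bool(true)
```

The token must be derived from the logical batch, not generated fresh per HTTP attempt; a per-attempt random token would defeat deduplication entirely.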
tests/Usage/Adapter/ClickHouseTest.php (1)

17-21: ⚠️ Potential issue | 🟡 Minor

Parse CLICKHOUSE_SECURE with filter_var(), not a bool cast.

(bool) 'false' evaluates to true, so these tests can silently switch to HTTPS depending on how the env var is populated. This is repeated throughout the file.

Minimal fix
-        $secure = (bool) (getenv('CLICKHOUSE_SECURE') ?: false);
+        $secure = filter_var(getenv('CLICKHOUSE_SECURE') ?: 'false', FILTER_VALIDATE_BOOLEAN);

Also applies to: 39-43, 458-462, 549-553, 604-608, 659-663, 695-699, 733-737, 765-769, 799-803

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/Usage/Adapter/ClickHouseTest.php` around lines 17 - 21, The env boolean
parsing for CLICKHOUSE_SECURE is incorrect because it uses a bool cast which
treats the string "false" as true; update every occurrence where $secure is
assigned from getenv('CLICKHOUSE_SECURE') (e.g., the initial $secure assignment
near the top and the repeated blocks at the other locations) to parse with
filter_var(getenv('CLICKHOUSE_SECURE'), FILTER_VALIDATE_BOOLEAN,
FILTER_NULL_ON_FAILURE) (or filter_var(..., FILTER_VALIDATE_BOOLEAN) with a
fallback) so the string "false" becomes false and missing values default to
false; ensure you update all occurrences of the $secure assignment in this file
(the repeated blocks referenced in the comment).
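The pitfall the comment describes is easy to demonstrate in isolation: PHP's bool cast treats every non-empty string as true, including the literal string "false" that env files typically contain, while filter_var() parses the string's meaning.

```php
<?php
// Any non-empty string casts to true, including the string "false":
var_dump((bool) 'false');                               // bool(true)
// filter_var parses the string's boolean meaning instead:
var_dump(filter_var('false', FILTER_VALIDATE_BOOLEAN)); // bool(false)
var_dump(filter_var('1', FILTER_VALIDATE_BOOLEAN));     // bool(true)
// With FILTER_NULL_ON_FAILURE, unrecognized values yield null rather than false:
var_dump(filter_var('maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE)); // NULL
```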
tests/Usage/UsageBase.php (1)

95-101: ⚠️ Potential issue | 🟡 Minor

These assertions are effectively no-ops.

count($results) is always >= 0, so these tests keep passing even when the date filters return nothing. That leaves getBetweenDates(), lessThanEqual(), and greaterThanEqual() unverified.

Suggested fix
-        $this->assertGreaterThanOrEqual(0, count($results));
+        $this->assertNotEmpty($results);

Use the same stronger assertion in the other two tests as well.

Also applies to: 180-201

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/Usage/UsageBase.php` around lines 95 - 101, The current tests like
testGetBetweenDates use assertGreaterThanOrEqual(0, count($results)), which is a
no-op; update testGetBetweenDates and the analogous tests covering
lessThanEqual() and greaterThanEqual() to use a stronger assertion such as
assertGreaterThan(0, count($results)) or assertNotEmpty($results) so the tests
actually fail when the date filters return no rows; locate and change the
assertions in the methods named testGetBetweenDates, testLessThanEqual (or the
test covering lessThanEqual), and testGreaterThanEqual (or its equivalent) in
UsageBase.php (also apply the same change for the block around lines 180-201) to
ensure the results are validated properly.
src/Usage/Adapter/Database.php (1)

299-299: ⚠️ Potential issue | 🔴 Critical

Pass the sort attribute to DatabaseQuery::orderDesc().

Line 299 and Line 318 call DatabaseQuery::orderDesc() without a column name. If the database query API still requires string $attribute, both paths fatal before the query runs.

Minimal fix
-            $dbQueries[] = DatabaseQuery::orderDesc();
+            $dbQueries[] = DatabaseQuery::orderDesc('time');

Apply the same change in both methods.

Also applies to: 318-318

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/Usage/Adapter/Database.php` at line 299, Both calls to
DatabaseQuery::orderDesc() are missing the required attribute argument and will
fatal; update each call to pass the column to sort by (for example the same
attribute variable used to build the query, e.g. $attribute or the literal
column name like 'created_at') so change DatabaseQuery::orderDesc() to
DatabaseQuery::orderDesc($attribute) (or the correct column string) in both
places where it is invoked.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@README.md`:
- Around line 302-304: The "System Requirements" section incorrectly refers to
"Utopia Framework requires PHP 8.0 or later"; update that text to refer to this
package instead (e.g., "This package requires PHP 8.0 or later" or
"utopia-php/usage requires PHP 8.0 or later") so consumers of the library aren't
misled; edit the "System Requirements" heading block in README.md and replace
the phrase "Utopia Framework" with the appropriate package name or neutral
wording.

In `@src/Usage/Adapter/Database.php`:
- Around line 101-145: The incrementBatch and setBatch implementations ignore
the public $batchSize parameter and always upsert the entire payload at once;
change both methods (incrementBatch and setBatch) to chunk the work into batches
of $batchSize inside the $this->db->getAuthorization()->skip(...) closure:
accumulate/convert metrics into Document objects as currently done, then iterate
over array_chunk($documents, $batchSize) and call
$this->db->upsertDocumentsWithIncrease($this->collection, 'value', $chunk) (and
the analogous upsert method used by setBatch) for each chunk so large writes
honor the requested batchSize.
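The chunking suggested above reduces to array_chunk() over the built documents; this is an illustrative sketch with placeholder data, not the adapter's real API:

```php
<?php
// Hypothetical sketch: honor $batchSize by splitting the documents and
// upserting one chunk at a time instead of the whole payload at once.
function chunkForUpsert(array $documents, int $batchSize): array
{
    return array_chunk($documents, max(1, $batchSize)); // guard against batchSize <= 0
}

$docs = range(1, 2500);               // stand-in for converted Document objects
$chunks = chunkForUpsert($docs, 1000);
var_dump(count($chunks));             // int(3)
var_dump(count($chunks[2]));          // int(500)
// Each chunk would then be passed to upsertDocumentsWithIncrease(...) in turn.
```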
- Around line 47-61: The healthCheck() currently treats a missing/empty
collection as healthy; update it so if getCollection($collectionName)->isEmpty()
is true (meaning the collection is missing/empty) the method returns healthy =>
false and includes the database and collection name, so callers know setup()
must create it; adjust the return logic in healthCheck() around $collectionName
and the getCollection(...)->isEmpty() check to return an unhealthy status when
empty/missing instead of always returning healthy.

---

Duplicate comments:
In `@src/Usage/Adapter/ClickHouse.php`:
- Around line 1318-1320: The code inconsistently uses two different per-metric
tenant keys ('tenant' vs '$tenant') so the validation path checks
array_key_exists('tenant', $metricData) while resolution reads
$metricData['$tenant']; update both places to use the same single key name (pick
either 'tenant' or '$tenant') so validation and resolution reference the
identical key, and apply the same change to the other occurrence around the
1443-1445 region; update any related variable names (e.g., $tenantValue) and
tests to match the chosen key.
- Around line 844-857: The INSERT retry is unsafe because retries can duplicate
committed rows; modify the executeWithRetry block (the anonymous function passed
to executeWithRetry where $queryParams is built) to include a stable
deduplication token (e.g. add a unique query_id or deduplication key to
$queryParams generated per logical batch) or disable retries for non-idempotent
INSERTs when no out-of-band deduplication is configured; specifically, change
building of $queryParams (where "INSERT INTO {$escapedTable} FORMAT JSONEachRow"
is set and asyncInsert flags are added) to attach a deterministic or generated
dedupe identifier (like query_id) or short-circuit/throw when no dedupe is
available so retries are not attempted for that batch.

In `@src/Usage/Adapter/Database.php`:
- Line 299: Both calls to DatabaseQuery::orderDesc() are missing the required
attribute argument and will fatal; update each call to pass the column to sort
by (for example the same attribute variable used to build the query, e.g.
$attribute or the literal column name like 'created_at') so change
DatabaseQuery::orderDesc() to DatabaseQuery::orderDesc($attribute) (or the
correct column string) in both places where it is invoked.

In `@tests/Usage/Adapter/ClickHouseTest.php`:
- Around line 17-21: The env boolean parsing for CLICKHOUSE_SECURE is incorrect
because it uses a bool cast which treats the string "false" as true; update
every occurrence where $secure is assigned from getenv('CLICKHOUSE_SECURE')
(e.g., the initial $secure assignment near the top and the repeated blocks at
the other locations) to parse with filter_var(getenv('CLICKHOUSE_SECURE'),
FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE) (or filter_var(...,
FILTER_VALIDATE_BOOLEAN) with a fallback) so the string "false" becomes false
and missing values default to false; ensure you update all occurrences of the
$secure assignment in this file (the repeated blocks referenced in the comment).

In `@tests/Usage/UsageBase.php`:
- Around line 95-101: The current tests like testGetBetweenDates use
assertGreaterThanOrEqual(0, count($results)), which is a no-op; update
testGetBetweenDates and the analogous tests covering lessThanEqual() and
greaterThanEqual() to use a stronger assertion such as assertGreaterThan(0,
count($results)) or assertNotEmpty($results) so the tests actually fail when the
date filters return no rows; locate and change the assertions in the methods
named testGetBetweenDates, testLessThanEqual (or the test covering
lessThanEqual), and testGreaterThanEqual (or its equivalent) in UsageBase.php
(also apply the same change for the block around lines 180-201) to ensure the
results are validated properly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1832fded-8aca-4e75-aeb3-7347e9f683ba

📥 Commits

Reviewing files that changed from the base of the PR and between b5605bd and ed9d6bc.

📒 Files selected for processing (7)
  • README.md
  • src/Usage/Adapter.php
  • src/Usage/Adapter/ClickHouse.php
  • src/Usage/Adapter/Database.php
  • src/Usage/Usage.php
  • tests/Usage/Adapter/ClickHouseTest.php
  • tests/Usage/UsageBase.php
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/Usage/Usage.php

Comment on lines +302 to +304
## System Requirements

Utopia Framework requires PHP 8.0 or later. We recommend using the latest PHP version whenever possible.


⚠️ Potential issue | 🟡 Minor

System requirements section refers to the wrong product.

Line 304 says "Utopia Framework requires PHP 8.0 or later", but this README is for utopia-php/usage as a standalone package. That wording is misleading for consumers of this library.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@README.md` around lines 302 - 304, The "System Requirements" section
incorrectly refers to "Utopia Framework requires PHP 8.0 or later"; update that
text to refer to this package instead (e.g., "This package requires PHP 8.0 or
later" or "utopia-php/usage requires PHP 8.0 or later") so consumers of the
library aren't misled; edit the "System Requirements" heading block in README.md
and replace the phrase "Utopia Framework" with the appropriate package name or
neutral wording.



4 participants