Skip to content

KEH-2025 Copilot Lambda New Endpoints Refactor#74

Merged
hadiqur merged 32 commits intomainfrom
new-endpoint-refactor
Mar 24, 2026
Merged

KEH-2025 Copilot Lambda New Endpoints Refactor#74
hadiqur merged 32 commits intomainfrom
new-endpoint-refactor

Conversation

@hadiqur
Copy link
Contributor

@hadiqur hadiqur commented Mar 19, 2026

What type of PR is this? (check all applicable)

  • Refactor
  • Feature
  • Bug Fix
  • Optimization
  • Documentation Update

What

As of April 2026, the endpoints that are used to fetch and aggregate team-specific copilot metrics are being sunsetted, rendering get_copilot_team_date(), get_and_update_copilot_teams() and get_team_history() obsolete.

The endpoint used to fetch organisation-wide metrics has been changed, so get_and_update_historic_usage() and the lambda handler has been changed to reflect this.

Testing

Have any new tests been added as part of this issue? If not, try to explain why test coverage is not needed here.

  • Yes
  • No
    Please write a brief description of why test coverage is not necessary here.
  • Not as part of this ticket. (Could be done at a later point)

Documentation

Has any new documentation been written as part of this issue? We should try to keep documentation up to date
as new code is added, rather than leaving it for the future.

  • Yes
  • No
    Please write a brief description of why documentation is not necessary here.
  • Not as part of this ticket. (Could be done at a later point)

Related issues

N/A

How to review

Clone the repository, set up the development environment and run the lambda on your local machine. The lambda will use and write to a new file in S3 named org_history.json.

I've also deployed it to dev, with the schedule unchanged.

@hadiqur hadiqur changed the title New endpoint refactor KEH-2025 [Copilot Dash] Copilot Lambda New Endpoints Refactor Mar 19, 2026
@hadiqur hadiqur changed the title KEH-2025 [Copilot Dash] Copilot Lambda New Endpoints Refactor KEH-2025 Copilot Lambda New Endpoints Refactor Mar 19, 2026
@github-actions
Copy link
Contributor

github-actions bot commented Mar 19, 2026

🦙 MegaLinter status: ⚠️ WARNING

Descriptor Linter Files Fixed Errors Warnings Elapsed time
⚠️ BASH bash-exec 6 4 0 0.04s
✅ BASH shellcheck 6 0 0 0.21s
✅ BASH shfmt 6 0 0 0.01s
✅ CSHARP csharpier 1 0 0 0.6s
✅ DOCKERFILE hadolint 1 0 0 0.14s
✅ JSON jsonlint 2 0 0 0.25s
✅ JSON prettier 2 0 0 0.51s
✅ JSON v8r 2 0 0 2.63s
✅ MARKDOWN markdownlint 6 0 0 1.13s
✅ REPOSITORY checkov yes no no 30.63s
✅ REPOSITORY dustilock yes no no 0.01s
✅ REPOSITORY gitleaks yes no no 7.82s
✅ REPOSITORY grype yes no no 58.27s
✅ REPOSITORY kics yes no no 49.3s
✅ REPOSITORY secretlint yes no no 1.18s
✅ REPOSITORY syft yes no no 4.39s
✅ REPOSITORY trivy yes no no 15.82s
✅ REPOSITORY trivy-sbom yes no no 0.5s
✅ TERRAFORM terraform-fmt 7 0 0 0.57s
✅ YAML prettier 8 0 0 1.27s
✅ YAML v8r 8 0 0 9.04s
✅ YAML yamllint 8 0 0 0.66s

See detailed report in MegaLinter reports

You could have same capabilities but better runtime performances if you request a new MegaLinter flavor.

MegaLinter is graciously provided by OX Security

@hadiqur hadiqur marked this pull request as ready for review March 19, 2026 17:13
@hadiqur hadiqur requested a review from a team March 19, 2026 17:57
Copy link
Contributor

@TotalDwarf03 TotalDwarf03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of comments

@@ -1,5 +1,7 @@
# Copilot Team Usage

Note: This functionality has been removed as of 19th March 2026 as the endpoint used to fetch metrics for team usage is being sunsetted. However, it may be restored via alternative methods in the future.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto

src/main.py Outdated
# AWS Bucket Path
BUCKET_NAME = f"{account}-copilot-usage-dashboard"
OBJECT_NAME = "historic_usage_data.json"
OBJECT_NAME = "org_history.json"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid the abbreviation please?

Few characters doesn't make a difference

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

src/main.py Outdated
# Get the usage data
usage_data = gh.get(f"/orgs/{org}/copilot/metrics")
usage_data = usage_data.json()
api_response = gh.get(f"/orgs/{org}/copilot/metrics/reports/organization-28-day/latest").json()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, I think gh.get() returns a string if the request fails.

How does this react if gh.get() does not return a dictionary but is passed through .json()?
Do we need extra handling?

Copy link
Contributor Author

@hadiqur hadiqur Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - script will catch AttributeError since gh.get() returns a HTTPError object if the requests fails, which can't be serialised into JSON

src/main.py Outdated
OBJECT_NAME,
extra={"no_days_added": len(dates_added), "dates_added": dates_added},
)
sorted_historic_usage = sorted(historic_usage, key=lambda x: x["day"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason we sort this?
As the file gets larger (few 100 days) this is going to be a potential performance issue.

Is the file already sorted?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hadiqur please do check, but I think that the way your adding new day's to the historic_usage and the format of the GitHub API output (assuming it comes in order) you should be fine to remove this line.

Do check as I cannot understand the whole view :)

Copy link
Contributor Author

@hadiqur hadiqur Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The usage data from the API response is unsorted

Since the current file in S3 is already sorted, I've added a small optimisation so that only the newly added days get sorted rather than sorting the whole file again.

This means that no matter the file size, we're always extending it by 7 items (since the lambda runs every week) in-place so it will have no performance repercussions.

Copy link
Contributor Author

@hadiqur hadiqur Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressing comment from @co-opsw - I've removed the use of any() by using a set to define the list of historic dates, then it's an instant look-up. :)

usage_data always remains fixed at 28 days (i.e. 28 items) so it's trivial to slice it from the latest day in historic_usage and iterate over it - plus it ensures that missing days get added if for whatever reason the file from S3 gets altered

src/main.py Outdated
)
else:
# Ensure INFO logs show in the terminal when not logging to a file
logging.basicConfig(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it'd make sense to always log to terminal instead of in the file.
I think this is probably a cleaner approach.

if show_log_localy:
   logging.basicConfig(
        ....
    )

"Body": MagicMock(
read=MagicMock(return_value=json.dumps(existing_usage).encode("utf-8"))
)
"Body": BytesIO(json.dumps(existing_usage).encode("utf-8"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reason for BytesIO here?

Copy link
Contributor Author

@hadiqur hadiqur Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hadn't ever used MagicMock before as a testing framework before, so I found it a little bit difficult to read when looking at it for the first time.

After I did a bit of further digging, I learned that it's best practice to only use MagicMock when you want to isolate external dependencies (e.g. API calls, DB connections etc). In this case, mocking S3 or GitHub API calls is a perfect example, but it shouldn't be used to mock objects or data structures that already exist.

BytesIO has a .read() method so it naturally mimics the real response body (S3 returns a file-like object) without needing to mock behaviour.

Copy link

@co-opsw co-opsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple comments, left a couple questions that may be better to handle offline :)

src/main.py Outdated

import boto3
import github_api_toolkit
import requests
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth being more specific on imports, reduce dependency size? request get is the only thing used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved

src/main.py Outdated
OBJECT_NAME,
extra={"no_days_added": len(dates_added), "dates_added": dates_added},
)
sorted_historic_usage = sorted(historic_usage, key=lambda x: x["day"])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hadiqur please do check, but I think that the way your adding new day's to the historic_usage and the format of the GitHub API output (assuming it comes in order) you should be fine to remove this line.

Do check as I cannot understand the whole view :)

if not any(d["day"] == day["day"] for d in historic_usage):
historic_usage.append(day)
dates_added.append(day["day"])
logger.info("Added data for day %s", day["day"])
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small performance idea, as the list of historic usage gets bigger then the time to perform this any() will take longer.

Perhaps we can instead take the latest day, and remove all days in usage_data the are the same or come before it.

Copy link
Contributor Author

@hadiqur hadiqur Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Resolved - see above thread made by Kieran

Comment on lines -121 to -135
def test_write_data_locally_creates_file(self, tmp_path):
s3 = MagicMock()
gh = MagicMock()
response = MagicMock()
response.links = {}
gh.get.return_value = response

with patch("src.main.get_copilot_team_date", return_value=[{"name": "teamA"}]):
with patch("src.main.os.makedirs") as mock_makedirs, \
patch("src.main.open", create=True) as mock_open:
result = get_and_update_copilot_teams(s3, gh, True)
assert result == [{"name": "teamA"}]
mock_makedirs.assert_called_once_with("output", exist_ok=True)
mock_open.assert_called_once()
s3.put_object.assert_not_called()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test seems to just be testing the programs writing to local files, whilst using the copilot_team function. Is it truly worth removing?

Copy link
Contributor Author

@hadiqur hadiqur Mar 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have the same test for TestGetAndUpdateHistoricUsage - so it's no longer needed (and every other test that was under the same test class). :)

Comment on lines +172 to +179
api_response = {
"download_links": [
"https://example.com/org_history_api_response.json"
]
}
fetched_usage_data = {"day_totals": [
{"day": "2024-01-01", "usage": 10},
]}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we make this a helper function?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've moved these to the top of the file instead where they're defined once, as a separate helper function would be too small IMO

@hadiqur hadiqur requested review from TotalDwarf03 and co-opsw March 23, 2026 14:46
Copy link
Contributor

@TotalDwarf03 TotalDwarf03 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One minor but I am happy with this. Good work 👍

Copy link

@co-opsw co-opsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :)

@hadiqur hadiqur merged commit 54f7837 into main Mar 24, 2026
5 checks passed
@hadiqur hadiqur deleted the new-endpoint-refactor branch March 24, 2026 10:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants