KEH-2025 Copilot Lambda New Endpoints Refactor#74
Conversation
🦙 MegaLinter status:
| Descriptor | Linter | Files | Fixed | Errors | Elapsed time |
|---|---|---|---|---|---|
| ✅ BASH | bash-exec | 6 | 4 | 0 | 0.04s |
| ✅ BASH | shellcheck | 6 | 0 | 0 | 0.21s |
| ✅ BASH | shfmt | 6 | 0 | 0 | 0.01s |
| ✅ CSHARP | csharpier | 1 | 0 | 0 | 0.6s |
| ✅ DOCKERFILE | hadolint | 1 | 0 | 0 | 0.14s |
| ✅ JSON | jsonlint | 2 | 0 | 0 | 0.25s |
| ✅ JSON | prettier | 2 | 0 | 0 | 0.51s |
| ✅ JSON | v8r | 2 | 0 | 0 | 2.63s |
| ✅ MARKDOWN | markdownlint | 6 | 0 | 0 | 1.13s |
| ✅ REPOSITORY | checkov | yes | no | no | 30.63s |
| ✅ REPOSITORY | dustilock | yes | no | no | 0.01s |
| ✅ REPOSITORY | gitleaks | yes | no | no | 7.82s |
| ✅ REPOSITORY | grype | yes | no | no | 58.27s |
| ✅ REPOSITORY | kics | yes | no | no | 49.3s |
| ✅ REPOSITORY | secretlint | yes | no | no | 1.18s |
| ✅ REPOSITORY | syft | yes | no | no | 4.39s |
| ✅ REPOSITORY | trivy | yes | no | no | 15.82s |
| ✅ REPOSITORY | trivy-sbom | yes | no | no | 0.5s |
| ✅ TERRAFORM | terraform-fmt | 7 | 0 | 0 | 0.57s |
| ✅ YAML | prettier | 8 | 0 | 0 | 1.27s |
| ✅ YAML | v8r | 8 | 0 | 0 | 9.04s |
| ✅ YAML | yamllint | 8 | 0 | 0 | 0.66s |
See detailed report in MegaLinter reports
You could have the same capabilities with better runtime performance if you request a new MegaLinter flavor.
MegaLinter is graciously provided by OX Security
@@ -1,5 +1,7 @@
  # Copilot Team Usage

+ Note: This functionality has been removed as of 19th March 2026 as the endpoint used to fetch metrics for team usage is being sunsetted. However, it may be restored via alternative methods in the future.
src/main.py
Outdated
  # AWS Bucket Path
  BUCKET_NAME = f"{account}-copilot-usage-dashboard"
- OBJECT_NAME = "historic_usage_data.json"
+ OBJECT_NAME = "org_history.json"
Can we avoid the abbreviation, please?
A few characters don't make a difference.
src/main.py
Outdated
  # Get the usage data
- usage_data = gh.get(f"/orgs/{org}/copilot/metrics")
- usage_data = usage_data.json()
+ api_response = gh.get(f"/orgs/{org}/copilot/metrics/reports/organization-28-day/latest").json()
Just a thought, I think gh.get() returns a string if the request fails.
How does this react if gh.get() does not return a dictionary but is passed through .json()?
Do we need extra handling?
Resolved - the script will catch AttributeError, since gh.get() returns an HTTPError object if the request fails, which can't be serialised into JSON.
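To illustrate the failure mode discussed here, a minimal sketch of a defensive wrapper. It assumes, as described in the thread, that `gh.get()` returns either a `requests.Response` or a `requests.HTTPError` instance on failure; the helper name `fetch_json` is illustrative, not from the repo.

```python
# Sketch only: assumes gh.get() returns either a requests.Response or a
# requests.HTTPError instance on failure, as described in this thread.
import requests


def fetch_json(gh, path: str):
    """Return parsed JSON, or None if the API call failed."""
    response = gh.get(path)
    try:
        # An HTTPError object has no .json() method, so this raises
        # AttributeError when the request failed.
        return response.json()
    except AttributeError:
        return None
```

Catching `AttributeError` at the call site (as the resolution describes) works the same way; the wrapper just centralises it.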
src/main.py
Outdated
    OBJECT_NAME,
    extra={"no_days_added": len(dates_added), "dates_added": dates_added},
)
sorted_historic_usage = sorted(historic_usage, key=lambda x: x["day"])
Is there a reason we sort this?
As the file gets larger (a few hundred days), this is going to be a potential performance issue.
Is the file already sorted?
@hadiqur please do check, but I think that given the way you're adding new days to historic_usage, and the format of the GitHub API output (assuming it comes in order), you should be fine to remove this line.
Do check, as I can't see the whole picture :)
The usage data from the API response is unsorted.
Since the current file in S3 is already sorted, I've added a small optimisation so that only the newly added days get sorted, rather than re-sorting the whole file.
This means that regardless of file size, we only ever extend it in place by 7 items (since the lambda runs weekly), so there are no performance repercussions.
Addressing the comment from @co-opsw - I've removed the use of any() by holding the historic dates in a set, so membership checks are instant look-ups. :)
usage_data always remains fixed at 28 days (i.e. 28 items), so it's trivial to slice it from the latest day in historic_usage and iterate over it - plus this ensures that missing days get added if the file in S3 is ever altered.
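A minimal sketch of the merge described above. It assumes `historic_usage` and `usage_data` are lists of dicts keyed by `"day"`; the real field names and the function name `merge_new_days` are illustrative, not taken from main.py.

```python
# Illustrative sketch of the set-based dedupe described in this thread.
# Assumes both inputs are lists of dicts with a "day" key (ISO dates).


def merge_new_days(historic_usage: list, usage_data: list) -> list:
    """Append days from usage_data that are missing from historic_usage.

    Returns the list of dates added. Uses a set for O(1) membership
    checks instead of any(), and sorts only the newly appended slice,
    since the file already stored in S3 is sorted.
    """
    known_days = {entry["day"] for entry in historic_usage}
    new_days = sorted(
        (d for d in usage_data if d["day"] not in known_days),
        key=lambda d: d["day"],
    )
    historic_usage.extend(new_days)
    return [d["day"] for d in new_days]
```

Because `usage_data` is a fixed 28-day window, the cost of this merge stays constant regardless of how large the historic file grows.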
src/main.py
Outdated
)
else:
    # Ensure INFO logs show in the terminal when not logging to a file
    logging.basicConfig(
I think it'd make sense to always log to the terminal rather than only to the file.
I think this is probably a cleaner approach:
if show_log_locally:
    logging.basicConfig(
        ...
    )

- "Body": MagicMock(
-     read=MagicMock(return_value=json.dumps(existing_usage).encode("utf-8"))
- )
+ "Body": BytesIO(json.dumps(existing_usage).encode("utf-8"))
Reason for BytesIO here?
I hadn't used MagicMock before, so I found it a little difficult to read when looking at it for the first time.
After a bit of further digging, I learned that it's best practice to use MagicMock only to isolate external dependencies (e.g. API calls, DB connections). Mocking S3 or GitHub API calls is a perfect example, but it shouldn't be used to mock objects or data structures that already exist.
BytesIO has a .read() method, so it naturally mimics the real response body (S3 returns a file-like object) without needing to mock behaviour.
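To make the distinction concrete, a small self-contained sketch: the S3 client is still a MagicMock (an external dependency), but the response body is a real file-like object. The bucket and key names here are illustrative.

```python
# Sketch: mock the external dependency (the S3 client), but use a real
# file-like BytesIO for the response body instead of mocking .read().
import json
from io import BytesIO
from unittest.mock import MagicMock

existing_usage = [{"day": "2024-01-01", "usage": 10}]

s3 = MagicMock()
s3.get_object.return_value = {
    "Body": BytesIO(json.dumps(existing_usage).encode("utf-8"))
}

# Code under test can call .read() exactly as it would on a real body.
body = s3.get_object(Bucket="example-bucket", Key="org_history.json")["Body"]
assert json.loads(body.read()) == existing_usage
```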
…tion/github-copilot-usage-lambda into new-endpoint-refactor
co-opsw
left a comment
Left a couple of comments and a couple of questions that may be better to handle offline :)
src/main.py
Outdated
import boto3
import github_api_toolkit
import requests
Worth being more specific with the imports to reduce dependency size? requests.get is the only thing used.
if not any(d["day"] == day["day"] for d in historic_usage):
    historic_usage.append(day)
    dates_added.append(day["day"])
    logger.info("Added data for day %s", day["day"])
Small performance idea: as the list of historic usage gets bigger, this any() will take longer to run.
Perhaps we can instead take the latest day, and remove all days in usage_data that are the same or come before it.
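A sketch of this alternative, assuming both lists are sorted by a `"day"` field holding ISO dates (the field name and the helper name are illustrative):

```python
# Sketch of the suggestion above: drop everything in the 28-day window
# up to and including the latest day already stored, keep the rest.
# Assumes both lists are sorted by "day" (ISO date strings).


def days_after_latest(historic_usage: list, usage_data: list) -> list:
    if not historic_usage:
        return list(usage_data)
    latest = historic_usage[-1]["day"]
    # ISO dates compare correctly as strings, so no parsing is needed
    return [d for d in usage_data if d["day"] > latest]
```

This is O(len(usage_data)) per run, independent of the size of the historic file.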
Resolved - see above thread made by Kieran
def test_write_data_locally_creates_file(self, tmp_path):
    s3 = MagicMock()
    gh = MagicMock()
    response = MagicMock()
    response.links = {}
    gh.get.return_value = response

    with patch("src.main.get_copilot_team_date", return_value=[{"name": "teamA"}]):
        with patch("src.main.os.makedirs") as mock_makedirs, \
                patch("src.main.open", create=True) as mock_open:
            result = get_and_update_copilot_teams(s3, gh, True)
            assert result == [{"name": "teamA"}]
            mock_makedirs.assert_called_once_with("output", exist_ok=True)
            mock_open.assert_called_once()
            s3.put_object.assert_not_called()
This test seems to just be testing the program's writing to local files, while using the copilot_team function. Is it truly worth removing?
We already have the same test under TestGetAndUpdateHistoricUsage, so it's no longer needed (along with every other test that was under the same test class). :)
tests/test_main.py
Outdated
api_response = {
    "download_links": [
        "https://example.com/org_history_api_response.json"
    ]
}
fetched_usage_data = {"day_totals": [
    {"day": "2024-01-01", "usage": 10},
]}
I've moved these to the top of the file instead where they're defined once, as a separate helper function would be too small IMO
TotalDwarf03
left a comment
One minor comment, but I am happy with this. Good work 👍
What type of PR is this? (check all applicable)
What
As of April 2026, the endpoints used to fetch and aggregate team-specific Copilot metrics are being sunsetted, rendering `get_copilot_team_date()`, `get_and_update_copilot_teams()` and `get_team_history()` obsolete. The endpoint used to fetch organisation-wide metrics has been changed, so `get_and_update_historic_usage()` and the lambda handler have been updated to reflect this.
Testing
Have any new tests been added as part of this issue? If not, try to explain why test coverage is not needed here.
Please write a brief description of why test coverage is not necessary here.
Documentation
Has any new documentation been written as part of this issue? We should try to keep documentation up to date
as new code is added, rather than leaving it for the future.
Please write a brief description of why documentation is not necessary here.
Related issues
N/A
How to review
Clone the repository, set up the development environment and run the lambda on your local machine. The lambda will use and write to a new file in S3 named `org_history.json`. I've also deployed it to dev, with the schedule unchanged.