Write standalone --debug output to CSV file #15182

Open
OliverRietmann wants to merge 5 commits into AliceO2Group:dev from cern-nextgen:write_debug_to_csv

@OliverRietmann commented Mar 17, 2026

New Table Style Output

|   | count | name | gpu (us) | cpu (us) | cpu/tot | tot (us) | GB/s | bytes | bytes/call |
|---|---:|---|---:|---:|---:|---:|---:|---:|---:|
| K | 405 | GPUMemClean16 | 68428 | | | | | | |
| K | 144 | GPUTPCCFDecodeZSDenseLink | 99382 | | | | | | |
| K | 36 | GPUTPCCFCheckPadBaseline | 30028 | | | | 139.366 | 4184893440 | 116247040 |
| K | 144 | GPUTPCCFPeakFinder | 45857 | | | | | | |
| K | 288 | GPUTPCCFStreamCompaction_scanStart | 6283 | | | | | | |
| K | 288 | GPUTPCCFStreamCompaction_scanUp | 2963 | | | | | | |
| K | 288 | GPUTPCCFStreamCompaction_scanTop | 3018 | | | | | | |
| K | 288 | GPUTPCCFStreamCompaction_scanDown | 2792 | | | | | | |
| . | ... | ... | ... | | | | | | |
| K | 1 | GPUTPCCompressionGatherKernels_multiBlock | 4427 | | | | 216.547 | 958603036 | 958603036 |
| | | TPC Transformation (Tasks) | 0 | 2 | 1.12 | 2 | | | |
| | | TPC Sector Tracking (Tasks) | 994961 | 381470 | 0.29 | 1296042 | | | |
| D | 37 | TPC Sector Tracking (DMA to GPU) | 2074 | | | | 0.218 | 451584 | 12204 |
| D | 73 | TPC Sector Tracking (DMA to Host) | 3262 | | | | 0.006 | 18868 | 258 |
| | | TPC Track Merging and Fit (Tasks) | 1721403 | 894083 | 0.36 | 2510388 | | | |
| D | 5 | TPC Track Merging and Fit (DMA to GPU) | 2935 | | | | 0.002 | 7248 | 1449 |
| D | 11 | TPC Track Merging and Fit (DMA to Host) | 16352 | | | | 27.020 | 441816748 | 40165158 |
| | | TPC Compression (Tasks) | 703360 | 6055 | 0.01 | 742058 | | | |
| D | 3 | TPC Compression (DMA to GPU) | 763 | | | | 0.001 | 864 | 288 |
| D | 2 | TPC Compression (DMA to Host) | 34170 | | | | 28.054 | 958603060 | 479301530 |
| | | TPC Cluster Finding (Tasks) | 384973 | 2811339 | 1.04 | 2691054 | | | |
| D | 3316 | TPC Cluster Finding (DMA to GPU) | 194305 | | | | 7.257 | 1410058680 | 425228 |
| D | 588 | TPC Cluster Finding (DMA to Host) | 51990 | | | | 16.526 | 859175352 | 1461182 |
| | | Prepare | 493913 | | | | | | |
| | | Wall | 3804698 | 4094856 | 0.57 | 7241402 | | | |
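
The derived columns in this output appear to follow from the raw ones: GB/s looks like bytes divided by the time, bytes/call like bytes divided by count, and cpu/tot like the ratio of the two time columns. The sketch below spells out that arithmetic. The helper names are hypothetical and not taken from the PR, and the relations are inferred from the numbers shown, so rounded times in the table may not reproduce every row exactly.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers, inferred from the table columns; not code from the PR.

// Throughput in GB/s (assuming 1 GB = 1e9 bytes) for `bytes` moved in `timeUs` microseconds.
inline double throughputGBs(std::uint64_t bytes, double timeUs)
{
  assert(timeUs > 0.);
  return static_cast<double>(bytes) / (timeUs * 1e-6) / 1e9;
}

// Average number of bytes per call (integer division).
inline std::uint64_t bytesPerCall(std::uint64_t bytes, std::uint64_t count)
{
  assert(count > 0);
  return bytes / count;
}

// Ratio of CPU time to total time, as in the cpu/tot column.
inline double cpuFraction(double cpuUs, double totUs)
{
  assert(totUs > 0.);
  return cpuUs / totUs;
}
```

For the GPUTPCCFCheckPadBaseline row, `bytesPerCall(4184893440, 36)` gives 116247040 and `throughputGBs(4184893440, 30028.)` gives roughly 139.37, in line with the values shown.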

@github-actions (Contributor)

REQUEST FOR PRODUCTION RELEASES:
To request that your PR be included in production software, please add the corresponding labels called "async-" to your PR. Add the labels directly (if you have the permissions) or add a comment of the following form (labels are separated by a ","):

+async-label <label1>, <label2>, !<label3> ...

This will add <label1> and <label2> and remove <label3>.

The following labels are available:
async-2023-pbpb-apass4
async-2023-pp-apass4
async-2024-pp-apass1
async-2022-pp-apass7
async-2024-pp-cpass0
async-2024-PbPb-apass1
async-2024-ppRef-apass1
async-2024-PbPb-apass2
async-2023-PbPb-apass5

@davidrohr (Collaborator) left a comment

I like this and it makes a lot of sense. Still, I have some comments on how it could be improved; please have a look.

return 0;
}

namespace

I would propose we try to move additional functions to GPUReconstructionDebug.cxx, in order not to blow up GPUReconstructionGPU.cxx with timing and debug code.

double kernelTotal = 0;
std::vector<double> kernelStepTimes(gpudatatypes::N_RECO_STEPS, 0.);

std::ofstream benchmarkCSV;

Here too, let's try to encapsulate the timing code in GPUReconstructionDebug.cxx.

AddOption(deterministicGPUReconstruction, int32_t, -1, "", 0, "Make CPU and GPU debug output comparable (sort / skip concurrent parts), -1 = automatic if debugLevel >= 6 or deterministic compile flag set", def(1))
AddOption(showOutputStat, bool, false, "", 0, "Print some track output statistics")
AddOption(runCompressionStatistics, bool, false, "compressionStat", 0, "Run statistics and verification for cluster compression")
AddOption(resetTimers, int8_t, 1, "", 0, "Reset timers every event")

If you change the default to 0 here, the timers will no longer be reset, so when debug mode is enabled in O2 they will not be reset between time frames.
Instead, I would leave the default at 1 here and change the override in standalone.cxx to if (...runsInit > 0 && ...timingCSV == 0) rec->SetResetTimers(iRun < configStandalone.runsInit)
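
A minimal sketch of that override logic, written as a standalone function so the three outcomes are explicit. `StandaloneTimingConfig`, `TimerReset` and `resetTimersOverride` are hypothetical names used only for illustration, and the meaning of `timingCSV` (non-zero when CSV timing output is requested) is an assumption; only the condition itself comes from the comment above.

```cpp
#include <cassert>

// Hypothetical stand-in for the standalone settings named in the review comment.
struct StandaloneTimingConfig {
  int runsInit;  // number of initial runs
  int timingCSV; // assumed: non-zero when CSV timing output is requested
};

// What the standalone benchmark would do with the reset flag for a given run.
enum class TimerReset { KeepDefault, Reset, Accumulate };

TimerReset resetTimersOverride(const StandaloneTimingConfig& cfg, int iRun)
{
  assert(iRun >= 0);
  if (cfg.runsInit > 0 && cfg.timingCSV == 0) {
    // Corresponds to rec->SetResetTimers(iRun < configStandalone.runsInit).
    return iRun < cfg.runsInit ? TimerReset::Reset : TimerReset::Accumulate;
  }
  // No override: the option default (resetTimers = 1) stays in effect.
  return TimerReset::KeepDefault;
}
```

With `runsInit = 2` and `timingCSV = 0`, runs 0 and 1 reset the timers and later runs accumulate; whenever `timingCSV` is non-zero or `runsInit` is 0, the default is left untouched.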

kernelStepTimes[stepNum] += time;
}
char bandwidth[256] = "";
Row task_row;

I would prefer a more compact form. Couldn't you initialize all the task_row members in one line with an initializer list?
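
As an illustration of the compact form, here is a sketch using aggregate initialization. The `Row` layout below is hypothetical (it loosely mirrors the table columns) and the real struct in the PR may have different members; `makeTaskRow` is likewise an illustrative name.

```cpp
#include <cassert>
#include <string>

// Hypothetical Row layout; the actual struct in the PR may differ.
struct Row {
  char type;        // row marker from the first table column, e.g. 'K' or 'D'
  unsigned count;   // number of calls, 0 for aggregate rows
  std::string name; // kernel or step name
  double gpuUs;     // GPU time in microseconds
  double cpuUs;     // CPU time in microseconds
  double totUs;     // total time in microseconds
};

// Builds a "(Tasks)" row in a single statement instead of member-by-member assignment.
Row makeTaskRow(const std::string& name, double gpuUs, double cpuUs, double totUs)
{
  assert(!name.empty());
  return Row{' ', 0, name, gpuUs, cpuUs, totUs};
}
```

The members are filled in declaration order, so the initializer list has to be kept in sync with the struct definition.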

if (kernelStepTimes[i] != 0. || mTimersRecoSteps[i].timerTotal.GetElapsedTime() != 0.) {
printf("Execution Time: Step : %11s %38s Time: %'10.0f us %64s ( Total Time : %'14.0f us, CPU Time : %'14.0f us, %'7.2fx )\n", "Tasks",
gpudatatypes::RECO_STEP_NAMES[i], kernelStepTimes[i] * 1000000 / mStatNEvents, "", mTimersRecoSteps[i].timerTotal.GetElapsedTime() * 1000000 / mStatNEvents, mTimersRecoSteps[i].timerCPU * 1000000 / mStatNEvents, mTimersRecoSteps[i].timerCPU / mTimersRecoSteps[i].timerTotal.GetElapsedTime());
Row reco_step_row;

Actually, this is quite redundant, and so is the benchmarkCSV.is_open().
How about we instead have a class debugTimingWriter, so that you just have one-liners here:
timingWriter.write(kernelStepTimes[i], mTimersRecoSteps[i].timerCPU, mTimersRecoSteps[i].timerTotal.GetElapsedTime(), ...)
And inside you could do

if (writeCSV) {
  ...
}
if (writeMarkdown) {
  ...
} else {
  ... classic style
}

Then whatever old parsing scripts still exist remain compatible, and it will reduce the lines of code in GPUReconstructionCPU.cxx.
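
One possible shape for such a writer, as a rough sketch only. The constructor arguments, the exact signature of `write()`, the CSV column order and the simplified classic line are all assumptions made for illustration; only the class name and the CSV / markdown / classic branching come from the comment above.

```cpp
#include <cassert>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical sketch; members, signature and output formats are assumptions.
class debugTimingWriter
{
 public:
  // csv may be nullptr when no CSV output was requested.
  debugTimingWriter(std::ostream* csv, std::ostream& console, bool markdown)
    : mCSV(csv), mConsole(console), mMarkdown(markdown) {}

  // Times are in seconds, summed over nEvents events.
  void write(double gpuTime, double cpuTime, double totalTime, const std::string& name, unsigned nEvents)
  {
    assert(nEvents > 0);
    const std::string gpu = toUs(gpuTime, nEvents);
    const std::string cpu = toUs(cpuTime, nEvents);
    const std::string tot = toUs(totalTime, nEvents);
    if (mCSV != nullptr) {
      *mCSV << name << ',' << gpu << ',' << cpu << ',' << tot << '\n';
    }
    if (mMarkdown) {
      mConsole << "| " << name << " | " << gpu << " | " << cpu << " | " << tot << " |\n";
    } else {
      // Simplified stand-in for the classic "Execution Time: ..." printf line.
      mConsole << "Execution Time: Step : " << name << " Time: " << gpu
               << " us ( Total Time : " << tot << " us, CPU Time : " << cpu << " us )\n";
    }
  }

 private:
  // Converts a summed time in seconds to whole microseconds per event.
  static std::string toUs(double seconds, unsigned nEvents)
  {
    std::ostringstream s;
    s << std::fixed << std::setprecision(0) << seconds * 1e6 / nEvents;
    return s.str();
  }

  std::ostream* mCSV;
  std::ostream& mConsole;
  bool mMarkdown;
};
```

The call site then shrinks to a single `timingWriter.write(...)` per row, and the format decision lives in one place rather than being repeated around every printf.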
