Skip to content

[pull] master from git:master#179

Merged
pull[bot] merged 29 commits intoturkdevops:masterfrom
git:master
Mar 17, 2026
Merged

[pull] master from git:master#179
pull[bot] merged 29 commits intoturkdevops:masterfrom
git:master

Conversation

@pull
Copy link

@pull pull bot commented Mar 17, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

Can you help keep this open source service alive? 💖 Please sponsor : )

shreyp135 and others added 29 commits February 19, 2026 08:59
Some functions in wt-status.c (count_stash_entries(),
read_line_from_git_path(), abbrev_oid_in_line(), and
read_rebase_todolist()) rely on the_repository as they do not have access
to a local repository instance.

Add a struct repository *r parameter to these functions and pass the local
repository instance through the callers, which already have access to it
either directly by struct repository *r or indirectly by struct wt_state
*s (s->repo).

Replace uses of the_repository in these functions with the passed parameter.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
…nces

wt-status.c uses the global the_repository in several places even when a
repository instance is already available via struct wt_status *s or struct
repository *r.

Replace these uses of the_repository with the repository available in the
local context (i.e. s->repo or r).

The replacements of all the_repository with s->repo are mostly to cases
where a repository instance is already available via struct wt_status *s
and struct repository *r, all functions operating on struct wt_status *s
are only used after s is initialized by wt_status_prepare(), which sets
s->repo from the repository provided by the caller. As a result, s->repo is
guaranteed to be available and consistent whenever these functions are
invoked.

This reduces reliance on global state and keeps wt-status consistent,
though many functions operating on struct wt_status *s are called via
commit.c and it still relies on the_repository, but within wt-status.c the
local repository pointer refers to the same underlying repository object.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
…hash_algo

wt-status.c still uses the global the_hash_algo even though a repository
instance is already available via struct wt_status.

Replace uses of the_hash_algo with the hash algorithm stored in the
associated repository (s->repo->hash_algo or r->hash_algo).

This removes another dependency on global state and keeps wt-status
consistent with local repository usage.

Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
When walking reachable objects in the repository, `count_objects()`
processes a set of objects and updates the `struct object_stats`. In
preparation for more granular statistics being collected, update the
`struct object_stats` for each individual object instead.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The machine-parsable formats for the git-repo(1) "structure" subcommand
print output in keyvalue pairs. Introduce the helper function
`print_keyvalue()` to remove some code duplication and improve
readability.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "structure" output for git-repo(1) shows the total inflated and disk
sizes of reachable objects in the repository, but doesn't show the size
of the largest individual objects. Since an individual object may be a
large contributor to the overall repository size, it is useful for users
to know the maximum size of individual objects.

While interating across objects, record the size and OID of the largest
objects encountered for each object type to provide as output. Note that
the default "table" output format only displays size information and not
the corresponding OID. In a subsequent commit, the table format is
updated to add table annotations that mention the OID.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "structure" output for git-repo(1) does not show the corresponding
OIDs for the largest objects in its "table" output. Update the output to
include a list of OID annotations with an index to the corresponding row
in the table.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Complex merge events may produce an octopus merge where the resulting
merge commit has more than two parents. While iterating through objects
in the repository for git-repo-structure, identify the commit with the
most parents and display it in the output.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of a tree object usually corresponds with the number of entries
it has. While iterating through objects in the repository for
git-repo-structure, identify the tree with the most entries and display
it in the output.

Signed-off-by: Justin Tobler <jltobler@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
'read_gitfile_gently()' treats any non-regular file as
'READ_GITFILE_ERR_NOT_A_FILE' and fails to discern between 'ENOENT'
and other stat failures. This flawed error reporting is noted by two
'NEEDSWORK' comments.

Address these comments by introducing two new error codes:
'READ_GITFILE_ERR_MISSING'(which groups the "file missing" scenarios
together) and 'READ_GITFILE_ERR_IS_A_DIR':

1. Update 'read_gitfile_error_die()' to treat 'IS_A_DIR', 'MISSING',
'NOT_A_FILE' and 'STAT_FAILED' as non-fatal no-ops. This accommodates
intentional non-repo scenarios (e.g., GIT_DIR=/dev/null).

2. Explicitly catch 'NOT_A_FILE' and 'STAT_FAILED' during
discovery and call 'die()' if 'die_on_error' is set.

3. Unconditionally pass '&error_code' to 'read_gitfile_gently()'.

4. Only invoke 'is_git_directory()' when we explicitly receive
   'READ_GITFILE_ERR_IS_A_DIR', avoiding redundant checks.

Additionally, audit external callers of 'read_gitfile_gently()' in
'submodule.c' and 'worktree.c' to accommodate the refined error codes.

Signed-off-by: Tian Yuchen <a3205153416@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
In check_connected(), if the transport tells us we got a single packfile
that has already been verified as self-contained and connected, then we
can skip checking connectivity for any tips that are mentioned in that
pack. This goes back to c6807a4 (clone: open a shortcut for
connectivity check, 2013-05-26).

We don't need to open that pack until we are about to start sending oids
to our child rev-list process, since that's when we check whether they
are in the self-contained pack. Let's push the opening of that pack
further down in the function. That saves us from having to clean it up
when we leave the function early (and by the time have opened the
rev-list process, we never leave the function early, since we have to
clean up the child process).

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since c6807a4 (clone: open a shortcut for connectivity check,
2013-05-26), we may open a one-off packed_git struct to check what's in
the pack we just received. At the end of the function we throw away the
struct (rather than linking it into the repository struct as usual).

We used to leak the struct until dd4143e (connected.c: free the
"struct packed_git", 2022-11-08), which calls free(). But that's not
sufficient; inside the struct we'll have mmap'd the pack idx data from
disk, which needs an munmap() call.

Building with SANITIZE=leak doesn't detect this, because we are leaking
our own mmap(), and it only finds heap allocations from malloc(). But if
we use our compat mmap implementation like this:

  make NO_MMAP=MapsBecomeMallocs SANITIZE=leak

then LSan will notice the leak, because now it's a regular heap buffer
allocated by malloc().

We can fix it by calling close_pack(), which will free any associated
memory. Note that we need to check for NULL ourselves; unlike free(), it
is not safe to pass a NULL pointer to close_pack().

Signed-off-by: Jeff King <peff@peff.net>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The usual entry point for loading the pack revindex is the
load_pack_revindex() function. It returns immediately if the packed_git
has a non-NULL revindex or revindex data field (representing an
in-memory or mmap'd .rev file, respectively), since the data is already
loaded.

But in 5a6072f (fsck: validate .rev file header, 2023-04-17) the fsck
code path switched to calling load_pack_revindex_from_disk() directly,
since it wants to check the on-disk data (if there is any). But that
function does _not_ check to see if the data has already been loaded; it
just maps the file, overwriting the revindex_map pointer (and pointing
revindex_data inside that map). And in that case we've leaked the mmap()
pointed to by revindex_map (if it was non-NULL).

This usually doesn't happen, since fsck wouldn't need to load the
revindex for any reason before we get to these checks. But there are
some cases where it does. For example, is_promisor_object() runs
odb_for_each_object() with the PACK_ORDER flag, which uses the revindex.

This happens a few times in our test suite, but SANITIZE=leak doesn't
detect it because we are leaking an mmap(), not a heap-allocated buffer
from malloc(). However, if you build with NO_MMAP, then our compat mmap
will read into a heap buffer instead, and LSan will complain. This
causes failures in t5601, t0410, t5702, and t5616.

We can fix it by checking for existing revindex_data when loading from
disk. This is redundant when we're called from load_pack_revindex(), but
it's a cheap check. The alternative is to teach check_pack_rev_indexes()
in fsck to skip the load, but that seems messier; it doesn't otherwise
know about internals like revindex_map and revindex_data.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
We mmap() a loose object file, storing the result in the local variable
"mapped", which is eventually assigned into our stream struct as
"st.mapped". If we hit an error, we jump to an error label which does:

  munmap(st.mapped, st.mapsize);

to clean up. But this is wrong; we don't assign st.mapped until the end
of the function, after all of the "goto error" jumps. So this munmap()
is never cleaning up anything (st.mapped is always NULL, because we
initialize the struct with calloc).

Instead, we should feed the local variable to munmap().

This leak is due to 595296e (streaming: allocate stream inside the
backend-specific logic, 2025-11-23), which introduced the local
variable. Before that, we assigned the mmap result directly into
st.mapped. It was probably switched there so that we do not have to
allocate/free the struct when the map operation fails (e.g., because we
don't have the loose object). Before that commit, the struct was passed
in from the caller, so there was no allocation at all.

You can see the leak in the test suite by building with:

  make SANITIZE=leak NO_MMAP=1 CC=clang

and running t1060. We need NO_MMAP so that the mmap() is backed by an
actual malloc(), which allows LSan to detect it. And the leak seems not
to be detected when compiling with gcc, probably due to some internal
compiler decisions about how the stack memory is written.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The past few commits fixed some cases where we leak memory allocated by
mmap(). Building with SANITIZE=leak doesn't detect these because it
covers only heap buffers allocated by malloc().

But if we build with NO_MMAP, our compat mmap() implementation will
allocate a heap buffer and pread() into it. And thus Lsan will detect
these leaks for free.

Using NO_MMAP is less performant, of course, since we have to use extra
memory and read in the whole file, rather than faulting in pages from
disk. But LSan builds are already slow, and this doesn't make them
measurably worse. Getting extra coverage for our leak-checking is worth
it.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit taught the Makefile to turn on NO_MMAP in this
instance. We should do the same with meson for consistency. We already
do this for ASan builds, so we can just tweak one conditional.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git supports creating additional commands through aliases, and through
placement of executables with a "git-" prefix in the PATH.

This information was not easy enough to find - users will look for this
information around the command description, but the documentation
exists in other locations.

Update the "GIT COMMANDS" section to reference the relevant sections,
making it easier for to find this information.

Signed-off-by: Omri Sarig <omri.sarig13@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `git` commands inside command substitutions like

    test "$(git rev-parse A)" = "$(git rev-parse B)"

can hide failures from the `git` invocations and provide little
diagnostic information when `test` fails.

Use `test_cmp` when comparing against a stored expected value so
mismatches show both expected and actual output. Use `test_cmp_rev`
when comparing two revisions. These helpers produce clearer failure
output, making it easier to understand what went wrong.

Suggested-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Francesco Paparatto <francescopaparatto@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git history reword expects a single valid revision argument and errors
out if it doesn't get it.  In that case the struct rev_info passed to
release_revisions() for cleanup is still uninitialized, which can result
in attempts to free(3) random pointers.  Avoid that by initializing the
structure.

Signed-off-by: René Scharfe <l.s.r@web.de>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hashmap API requires the comparison function to take const pointers.
However, patch_id_neq() uses lazy evaluation to compute patch IDs on
demand. As established in b3dfeeb (rebase: avoid computing unnecessary
patch IDs, 2016-07-29), this avoids unnecessary work since not all
objects in the hashmap will eventually be compared.

Remove the ten-year-old "NEEDSWORK" comment and formally document
this intentional design trade-off.

Signed-off-by: Tian Yuchen <cat@malon.dev>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce dependence on the global the_hash_algo and the_repository
variables of wt-status code path.

* sp/wt-status-wo-the-repository:
  wt-status: use hash_algo from local repository instead of global the_hash_algo
  wt-status: replace uses of the_repository with local repository instances
  wt-status: pass struct repository through function parameters
"git repo structure" command learns to report maximum values on
various aspects of objects it inspects.

* jt/repo-structure-extrema:
  builtin/repo: find tree with most entries
  builtin/repo: find commit with most parents
  builtin/repo: add OID annotations to table output
  builtin/repo: collect largest inflated objects
  builtin/repo: add helper for printing keyvalue output
  builtin/repo: update stats for each object
The construct 'test "$(command)" = expectation' loses the exit
status from the command, which has been fixed by breaking up the
statement into pieces.

* fp/t3310-unhide-git-failures:
  t3310: avoid hiding failures from rev-parse in command substitutions
Doc update.

* os/doc-git-custom-commands:
  doc: make it easier to find custom command information
While discovering a ".git" directory, the code treats any stat()
failure as a sign that a filesystem entity .git does not exist
there, and ignores ".git" that is not a "gitdir" file or a
directory.  The code has been tightened to notice and report
filesystem corruption better.

* ty/setup-error-tightening:
  setup: improve error diagnosis for invalid .git files
Plug a few leaks where mmap'ed memory regions are not unmapped.

* jk/unleak-mmap:
  meson: turn on NO_MMAP when building with LSan
  Makefile: turn on NO_MMAP when building with LSan
  object-file: fix mmap() leak in odb_source_loose_read_object_stream()
  pack-revindex: avoid double-loading .rev files
  check_connected(): fix leak of pack-index mmap
  check_connected(): delay opening new_pack
Fix use of uninitialized variable.

* rs/history-ergonomics-updates-fix:
  history: initialize rev_info in cmd_history_reword()
In-code comment update to record a design decision to allow lazy
computation of patch IDs.

* ty/patch-ids-document-lazy-eval:
  patch-ids: document intentional const-casting in patch_id_neq()
Signed-off-by: Junio C Hamano <gitster@pobox.com>
@pull pull bot locked and limited conversation to collaborators Mar 17, 2026
@pull pull bot added the ⤵️ pull label Mar 17, 2026
@pull pull bot merged commit ca1db8a into turkdevops:master Mar 17, 2026
2 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants