[pull] master from git:master#179
Merged
pull[bot] merged 29 commits intoturkdevops:masterfrom Mar 17, 2026
Merged
Conversation
Some functions in wt-status.c (count_stash_entries(), read_line_from_git_path(), abbrev_oid_in_line(), and read_rebase_todolist()) rely on the_repository as they do not have access to a local repository instance. Add a struct repository *r parameter to these functions and pass the local repository instance through the callers, which already have access to it either directly by struct repository *r or indirectly by struct wt_state *s (s->repo). Replace uses of the_repository in these functions with the passed parameter. Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
…nces wt-status.c uses the global the_repository in several places even when a repository instance is already available via struct wt_status *s or struct repository *r. Replace these uses of the_repository with the repository available in the local context (i.e. s->repo or r). The replacements of all the_repository with s->repo are mostly to cases where a repository instance is already available via struct wt_status *s and struct repository *r, all functions operating on struct wt_status *s are only used after s is initialized by wt_status_prepare(), which sets s->repo from the repository provided by the caller. As a result, s->repo is guaranteed to be available and consistent whenever these functions are invoked. This reduces reliance on global state and keeps wt-status consistent, though many functions operating on struct wt_status *s are called via commit.c and it still relies on the_repository, but within wt-status.c the local repository pointer refers to the same underlying repository object. Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
…hash_algo wt-status.c still uses the global the_hash_algo even though a repository instance is already available via struct wt_status. Replace uses of the_hash_algo with the hash algorithm stored in the associated repository (s->repo->hash_algo or r->hash_algo). This removes another dependency on global state and keeps wt-status consistent with local repository usage. Signed-off-by: Shreyansh Paliwal <shreyanshpaliwalcmsmn@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
When walking reachable objects in the repository, `count_objects()` processes a set of objects and updates the `struct object_stats`. In preparation for more granular statistics being collected, update the `struct object_stats` for each individual object instead. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The machine-parsable formats for the git-repo(1) "structure" subcommand print output in keyvalue pairs. Introduce the helper function `print_keyvalue()` to remove some code duplication and improve readability. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "structure" output for git-repo(1) shows the total inflated and disk sizes of reachable objects in the repository, but doesn't show the size of the largest individual objects. Since an individual object may be a large contributor to the overall repository size, it is useful for users to know the maximum size of individual objects. While interating across objects, record the size and OID of the largest objects encountered for each object type to provide as output. Note that the default "table" output format only displays size information and not the corresponding OID. In a subsequent commit, the table format is updated to add table annotations that mention the OID. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The "structure" output for git-repo(1) does not show the corresponding OIDs for the largest objects in its "table" output. Update the output to include a list of OID annotations with an index to the corresponding row in the table. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Complex merge events may produce an octopus merge where the resulting merge commit has more than two parents. While iterating through objects in the repository for git-repo-structure, identify the commit with the most parents and display it in the output. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The size of a tree object usually corresponds with the number of entries it has. While iterating through objects in the repository for git-repo-structure, identify the tree with the most entries and display it in the output. Signed-off-by: Justin Tobler <jltobler@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
'read_gitfile_gently()' treats any non-regular file as 'READ_GITFILE_ERR_NOT_A_FILE' and fails to discern between 'ENOENT' and other stat failures. This flawed error reporting is noted by two 'NEEDSWORK' comments. Address these comments by introducing two new error codes: 'READ_GITFILE_ERR_MISSING'(which groups the "file missing" scenarios together) and 'READ_GITFILE_ERR_IS_A_DIR': 1. Update 'read_gitfile_error_die()' to treat 'IS_A_DIR', 'MISSING', 'NOT_A_FILE' and 'STAT_FAILED' as non-fatal no-ops. This accommodates intentional non-repo scenarios (e.g., GIT_DIR=/dev/null). 2. Explicitly catch 'NOT_A_FILE' and 'STAT_FAILED' during discovery and call 'die()' if 'die_on_error' is set. 3. Unconditionally pass '&error_code' to 'read_gitfile_gently()'. 4. Only invoke 'is_git_directory()' when we explicitly receive 'READ_GITFILE_ERR_IS_A_DIR', avoiding redundant checks. Additionally, audit external callers of 'read_gitfile_gently()' in 'submodule.c' and 'worktree.c' to accommodate the refined error codes. Signed-off-by: Tian Yuchen <a3205153416@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
In check_connected(), if the transport tells us we got a single packfile that has already been verified as self-contained and connected, then we can skip checking connectivity for any tips that are mentioned in that pack. This goes back to c6807a4 (clone: open a shortcut for connectivity check, 2013-05-26). We don't need to open that pack until we are about to start sending oids to our child rev-list process, since that's when we check whether they are in the self-contained pack. Let's push the opening of that pack further down in the function. That saves us from having to clean it up when we leave the function early (and by the time have opened the rev-list process, we never leave the function early, since we have to clean up the child process). Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Since c6807a4 (clone: open a shortcut for connectivity check, 2013-05-26), we may open a one-off packed_git struct to check what's in the pack we just received. At the end of the function we throw away the struct (rather than linking it into the repository struct as usual). We used to leak the struct until dd4143e (connected.c: free the "struct packed_git", 2022-11-08), which calls free(). But that's not sufficient; inside the struct we'll have mmap'd the pack idx data from disk, which needs an munmap() call. Building with SANITIZE=leak doesn't detect this, because we are leaking our own mmap(), and it only finds heap allocations from malloc(). But if we use our compat mmap implementation like this: make NO_MMAP=MapsBecomeMallocs SANITIZE=leak then LSan will notice the leak, because now it's a regular heap buffer allocated by malloc(). We can fix it by calling close_pack(), which will free any associated memory. Note that we need to check for NULL ourselves; unlike free(), it is not safe to pass a NULL pointer to close_pack(). Signed-off-by: Jeff King <peff@peff.net> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The usual entry point for loading the pack revindex is the load_pack_revindex() function. It returns immediately if the packed_git has a non-NULL revindex or revindex data field (representing an in-memory or mmap'd .rev file, respectively), since the data is already loaded. But in 5a6072f (fsck: validate .rev file header, 2023-04-17) the fsck code path switched to calling load_pack_revindex_from_disk() directly, since it wants to check the on-disk data (if there is any). But that function does _not_ check to see if the data has already been loaded; it just maps the file, overwriting the revindex_map pointer (and pointing revindex_data inside that map). And in that case we've leaked the mmap() pointed to by revindex_map (if it was non-NULL). This usually doesn't happen, since fsck wouldn't need to load the revindex for any reason before we get to these checks. But there are some cases where it does. For example, is_promisor_object() runs odb_for_each_object() with the PACK_ORDER flag, which uses the revindex. This happens a few times in our test suite, but SANITIZE=leak doesn't detect it because we are leaking an mmap(), not a heap-allocated buffer from malloc(). However, if you build with NO_MMAP, then our compat mmap will read into a heap buffer instead, and LSan will complain. This causes failures in t5601, t0410, t5702, and t5616. We can fix it by checking for existing revindex_data when loading from disk. This is redundant when we're called from load_pack_revindex(), but it's a cheap check. The alternative is to teach check_pack_rev_indexes() in fsck to skip the load, but that seems messier; it doesn't otherwise know about internals like revindex_map and revindex_data. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
We mmap() a loose object file, storing the result in the local variable "mapped", which is eventually assigned into our stream struct as "st.mapped". If we hit an error, we jump to an error label which does: munmap(st.mapped, st.mapsize); to clean up. But this is wrong; we don't assign st.mapped until the end of the function, after all of the "goto error" jumps. So this munmap() is never cleaning up anything (st.mapped is always NULL, because we initialize the struct with calloc). Instead, we should feed the local variable to munmap(). This leak is due to 595296e (streaming: allocate stream inside the backend-specific logic, 2025-11-23), which introduced the local variable. Before that, we assigned the mmap result directly into st.mapped. It was probably switched there so that we do not have to allocate/free the struct when the map operation fails (e.g., because we don't have the loose object). Before that commit, the struct was passed in from the caller, so there was no allocation at all. You can see the leak in the test suite by building with: make SANITIZE=leak NO_MMAP=1 CC=clang and running t1060. We need NO_MMAP so that the mmap() is backed by an actual malloc(), which allows LSan to detect it. And the leak seems not to be detected when compiling with gcc, probably due to some internal compiler decisions about how the stack memory is written. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The past few commits fixed some cases where we leak memory allocated by mmap(). Building with SANITIZE=leak doesn't detect these because it covers only heap buffers allocated by malloc(). But if we build with NO_MMAP, our compat mmap() implementation will allocate a heap buffer and pread() into it. And thus Lsan will detect these leaks for free. Using NO_MMAP is less performant, of course, since we have to use extra memory and read in the whole file, rather than faulting in pages from disk. But LSan builds are already slow, and this doesn't make them measurably worse. Getting extra coverage for our leak-checking is worth it. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The previous commit taught the Makefile to turn on NO_MMAP in this instance. We should do the same with meson for consistency. We already do this for ASan builds, so we can just tweak one conditional. Signed-off-by: Jeff King <peff@peff.net> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Git supports creating additional commands through aliases, and through placement of executables with a "git-" prefix in the PATH. This information was not easy enough to find - users will look for this information around the command description, but the documentation exists in other locations. Update the "GIT COMMANDS" section to reference the relevant sections, making it easier for to find this information. Signed-off-by: Omri Sarig <omri.sarig13@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Running `git` commands inside command substitutions like
test "$(git rev-parse A)" = "$(git rev-parse B)"
can hide failures from the `git` invocations and provide little
diagnostic information when `test` fails.
Use `test_cmp` when comparing against a stored expected value so
mismatches show both expected and actual output. Use `test_cmp_rev`
when comparing two revisions. These helpers produce clearer failure
output, making it easier to understand what went wrong.
Suggested-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Francesco Paparatto <francescopaparatto@gmail.com>
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
git history reword expects a single valid revision argument and errors out if it doesn't get it. In that case the struct rev_info passed to release_revisions() for cleanup is still uninitialized, which can result in attempts to free(3) random pointers. Avoid that by initializing the structure. Signed-off-by: René Scharfe <l.s.r@web.de> Signed-off-by: Junio C Hamano <gitster@pobox.com>
The hashmap API requires the comparison function to take const pointers. However, patch_id_neq() uses lazy evaluation to compute patch IDs on demand. As established in b3dfeeb (rebase: avoid computing unnecessary patch IDs, 2016-07-29), this avoids unnecessary work since not all objects in the hashmap will eventually be compared. Remove the ten-year-old "NEEDSWORK" comment and formally document this intentional design trade-off. Signed-off-by: Tian Yuchen <cat@malon.dev> Signed-off-by: Junio C Hamano <gitster@pobox.com>
Reduce dependence on the global the_hash_algo and the_repository variables of wt-status code path. * sp/wt-status-wo-the-repository: wt-status: use hash_algo from local repository instead of global the_hash_algo wt-status: replace uses of the_repository with local repository instances wt-status: pass struct repository through function parameters
"git repo structure" command learns to report maximum values on various aspects of objects it inspects. * jt/repo-structure-extrema: builtin/repo: find tree with most entries builtin/repo: find commit with most parents builtin/repo: add OID annotations to table output builtin/repo: collect largest inflated objects builtin/repo: add helper for printing keyvalue output builtin/repo: update stats for each object
The construct 'test "$(command)" = expectation' loses the exit status from the command, which has been fixed by breaking up the statement into pieces. * fp/t3310-unhide-git-failures: t3310: avoid hiding failures from rev-parse in command substitutions
Doc update. * os/doc-git-custom-commands: doc: make it easier to find custom command information
While discovering a ".git" directory, the code treats any stat() failure as a sign that a filesystem entity .git does not exist there, and ignores ".git" that is not a "gitdir" file or a directory. The code has been tightened to notice and report filesystem corruption better. * ty/setup-error-tightening: setup: improve error diagnosis for invalid .git files
Plug a few leaks where mmap'ed memory regions are not unmapped. * jk/unleak-mmap: meson: turn on NO_MMAP when building with LSan Makefile: turn on NO_MMAP when building with LSan object-file: fix mmap() leak in odb_source_loose_read_object_stream() pack-revindex: avoid double-loading .rev files check_connected(): fix leak of pack-index mmap check_connected(): delay opening new_pack
Fix use of uninitialized variable. * rs/history-ergonomics-updates-fix: history: initialize rev_info in cmd_history_reword()
In-code comment update to record a design decision to allow lazy computation of patch IDs. * ty/patch-ids-document-lazy-eval: patch-ids: document intentional const-casting in patch_id_neq()
Signed-off-by: Junio C Hamano <gitster@pobox.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
See Commits and Changes for more details.
Created by
pull[bot] (v2.0.0-alpha.4)
Can you help keep this open source service alive? 💖 Please sponsor : )