Hacker News | How Git cherry-pick and revert use 3-way merge

How Git cherry-pick and revert use 3-way merge (jvns.ca)

211 points by hasheddan 2 years ago | 123 comments

juped 2 years ago
You can set the config option `merge.conflictStyle` to `diff3` to see all three inputs to a merge conflict (left, base, and right, rather than just left and right); this lets you actually resolve them, because you can see the changes left made to base, and the changes right made to base, and make both changes, rather than having to guess at what's going on.
I wish this were the default. (It isn't, broadly because it displays certain arcane recursive merges in a really bizarre way; see the mailing list for discussion.)
- Amorymeltzer 2 years ago
  I made the change a few years ago, and never looked back. It can take a little adjusting to at first, because it's not super clear exactly what you're looking at, but it's definitely worth it. For the "simple" cases you're no worse off, but for the gnarly conflicts it's such a huge help to have the full context there.
  ----
  Truth be told, I use zdiff3, the "zealous" diff3, where git >= 2.35[1] is available. Some good examples/thoughts at https://stackoverflow.com/a/71254097/2521092 and https://stackoverflow.com/a/70387424/2521092
  1: https://github.blog/2022-01-24-highlights-from-git-2-35/
- secondcoming 2 years ago
  I recommend the p4merge tool for visualising 3-way diffs.
crdrost 2 years ago
I must be misunderstanding...
If you are in the situation
```
    B -- X1 -- X2 -- X3 -- Head
     \
      Y1 -- Y2 -- Cherry
```
then Y2 can't be a merge-base for the difference between Head and Cherry, the merge-base is B, that's a graph property and not an input. (You can even calculate it with `git merge-base`.) Right?
And saying that "Y2 and Cherry are the patch" doesn't help my brain much.
Now just thinking out loud I think there's something there... I know from Darcs that basically there is a mathematically precise thing to do here with two merges, so basically since "merge" is just "inverse, apply_patch, commute, drop_commit" and "cherry_pick" is just "commute, drop_commit, merge", you have a first merge being of the tree,
```
    B -- Y1 -- Y2 -- inv(Y2) -- inv(Y1) -- M
                 \                        /
                  Cherry ----------------/
```
and then we squash and discard Y1 and Y2's metadata,
```
    B -- Cherry'

    patch := diff(M, B)
    Cherry' := commit(patch, [B], Cherry.metadata) 
```
and then we have something suitable to merge directly. It doesn't sound like Git does that though?
- andrewla 2 years ago
  My understanding of the article is that for the purposes of the cherry-pick, it "pretends" that the parent of Y2 is the base of the merge. Contrary to what the post says, git only cares about contents, not patches or diffs.
  The effect of this operation is that it compares what it would take to go from Y1 -> X3 with what it would take to go from Y1 -> Y2 -- it doesn't care about the path. If Y1 added a new function, for example, then when computing the three-way diff it would say "well, clearly in the X3 side of the merge that function was removed".
  This is astonishingly clever.
  If conflict-free, it will have all the content from X3 and all the changes from Y1->Y2, but none of the changes from B->Y1. Once again git has blown my mind.
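A minimal sketch of the behaviour described above, assuming a toy model in which a snapshot is a dict of path to content and merging happens per whole file (real Git also merges within files, line by line). The file names and contents are hypothetical. The cherry-pick of Y2 is a 3-way merge whose base is Y2's parent, Y1, rather than the graph merge-base B:

```python
def three_way_merge(base, ours, theirs):
    """File-level 3-way merge of snapshots (dicts: path -> content).

    A missing path is treated as None, so additions and deletions
    fall out of the same comparison rules.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:          # both sides agree
            result = o
        elif o == b:        # only "theirs" changed relative to base
            result = t
        elif t == b:        # only "ours" changed relative to base
            result = o
        else:               # both changed, differently
            conflicts.append(path)
            continue
        if result is not None:   # None means the file is deleted
            merged[path] = result
    return merged, conflicts


# Hypothetical snapshots. Y1 adds helper.py; Y2 then edits app.py.
B = {"app.py": "v1"}
X3 = {"app.py": "v1", "docs.md": "notes"}            # current HEAD
Y1 = {"app.py": "v1", "helper.py": "def f(): ..."}
Y2 = {"app.py": "v2", "helper.py": "def f(): ..."}

# Cherry-pick Y2 onto X3: the base is Y2's parent (Y1), not B.
picked, conflicts = three_way_merge(base=Y1, ours=X3, theirs=Y2)
print(picked)      # {'app.py': 'v2', 'docs.md': 'notes'}
print(conflicts)   # []
```

Relative to base Y1, the X3 side looks like it "removed" helper.py, so the result keeps everything from X3, gains the Y1->Y2 edit to app.py, and carries none of the B->Y1 change.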
  - crdrost 2 years ago
    Ah OK, I think I am getting it now.
    So 3-way-merge just takes three inputs, "base, left, right", it says diff(base, left), diff(base, right), merge those two diffs together into one bigger diff.
    If you were to actually ask, "how do I get from Y2 to Head?" the answer commit-wise would require some conceptual rollbacks, the resulting conceptual commit-tree looking like
    B -- Y1 -- Y2 -- inv(Y2) -- inv(Y1) -- X1 -- X2 -- X3 -- Head
                 \
                  Cherry
    but, because 3-way-merge is a crude heuristic, it doesn't actually commute Cherry through inv(Y2), inv(Y1), X1, X2, X3, and Head the way that, say, a `git rebase` would. It just doesn't have to.
    And in terms of what I was writing above, what I was missing was that the two merges can be coalesced into one merge.
    The most mind-blowing thing about this is that if you did a real rebase it'd actually protect against the problems with associativity that 3-way-merge has, by forcing you to commute through each individual intermediate patch. Like, you don't need Darcs or Pijul if you'll just write some wrapper scripts around Git.
    pmeunier 2 years ago
    > the problems with associativity that 3-way-merge has, by forcing you to commute through each individual intermediate patch. Like, you don't need Darcs or Pijul if you'll just write some wrapper scripts around Git.
    Careful here, associativity is a completely different property from commutativity. Also, Pijul doesn't "commute through each individual intermediate patch" at all, its algorithmic complexity is actually lower than Git's.
- oasisaimlessly 2 years ago
  The output of 'git merge-base' is the base (of the 3-way diff) used when doing a normal merge. That's all.
  Other (non-merge) operations like cherry-pick and revert use a different base commit. If you squint, you can view these operations as a merge with an abnormally-chosen base commit, which is the whole point here.
  The issue is just one of nomenclature; just because the `3way_merge(v1, base, v2)` algorithm is being used doesn't mean that a merge (with `base = merge_base(v1, v2)`) is being performed.
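To make the nomenclature point concrete, here is a hedged sketch in which merge, cherry-pick and revert all call the same 3-way function and differ only in which snapshot they pass as the base. `Commit`, `merge_base` and the per-path dict snapshots are illustrative assumptions, not Git's internals, and the history is single-parent only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    tree: dict                        # snapshot: path -> content
    parent: Optional["Commit"] = None


def three_way_merge(base, ours, theirs):
    """Per path, keep whichever side changed relative to base."""
    out = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o != t and o != b and t != b:
            raise ValueError(f"conflict in {path}")
        pick = t if o == b else o
        if pick is not None:          # None means the path is absent
            out[path] = pick
    return out


def merge_base(a, b):
    """Nearest common ancestor; enough for this single-parent toy history."""
    seen = []
    while a is not None:
        seen.append(a)
        a = a.parent
    while not any(b is s for s in seen):
        b = b.parent
    return b


def merge(head, other):       # base = merge-base of the two tips
    return three_way_merge(merge_base(head, other).tree, head.tree, other.tree)


def cherry_pick(head, c):     # base = parent of the picked commit
    return three_way_merge(c.parent.tree, head.tree, c.tree)


def revert(head, c):          # base = the commit itself, "theirs" = its parent
    return three_way_merge(c.tree, head.tree, c.parent.tree)


B = Commit({"a": "1", "b": "1"})
bad = Commit({"a": "2", "b": "1"}, parent=B)        # changes a
head = Commit({"a": "2", "b": "2"}, parent=bad)     # later changes b
topic = Commit({"a": "1", "b": "1", "c": "new"}, parent=B)

print(revert(head, bad))          # {'a': '1', 'b': '2'}
print(cherry_pick(head, topic))   # {'a': '2', 'b': '2', 'c': 'new'}
print(merge(head, topic) == cherry_pick(head, topic))   # True
```

The last line holds only because `topic` is a one-commit branch, so its parent happens to be the merge-base; with a longer branch the two bases, and possibly the results, differ.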
- dan-robertson 2 years ago
  Yeah, normally for 3-way merges you want the kind of graph relationship you suggest (that the base is the most recent common ancestor for the two versions you want to merge)
  I think two questions you should ask are:
  1) what does it mean to ‘apply a patch’ where the patch is the diff from X to Y but you are applying to a different revision from X.
  2) what even is a patch in git?
  Darcs thinks in terms of patches, but git, despite showing you lots of diffs, is really only thinking in terms of snapshots. The answer to question 2 is that patches in git are diffs between revisions, in particular between the parent of a commit and the revision of the commit itself.
  The answer to (1) is that ‘apply a patch’ in git is actually just a 3-way merge.
  If you write patches as arrows between revisions then the 3-way merge of two patches P, Q from a shared base is applying one patch to the tip of the other:
         P
  B ~~~~~~~~> X
   \           \
    \ Q         \ Q’
     \           \
      ~~~> Y      ~~~> M
  Here M is the 3-way merge of X and Y at base B, and Q’, the diff from X to M, is the patch Q applied at X. For a normal merge, you make X and Y parents of M. Git merely generalises this to cases where B is not the parent of X: you imagine a patch going from the base implied by the commit you want to cherry-pick (i.e. the parent of the commit) all the way to the place you want to base it on. This time, you are only doing the 3-way merge to compute the tip of your patch; you already know the base (it’s the place where you are applying the patch), and instead of producing a merge commit, you produce a commit with all the data from the old one before the cherry-pick, except that you set the base to the place you cherry-pick onto.
  In darcs theory of patches terminology, you commute the cherry-pick patch backwards (ie going from (?->B)(B->Y1)(Y1->Y2)(Y2->Cherry) to (?->B)(B->Cherry’)(Cherry’->Y1’)(Y1’->Y2’)), somehow inject it before the B->X1 patch, and then commute it forwards to sit after X3->Head.
  One question to ask is whether this magic 3-way merge to apply patches works in these cases. And the answer is yes, it works about as well as regular 3-way merges, so you can get merge conflicts as well as some varieties of incorrect results (cf the Darcs docs about this)
- Quekid5 2 years ago
  Git cherry-picking (rebasing, etc) is about applying patches to things. Those patches are based on snapshots and thus you can say "Apply the Y2..Cherry" diff to Head without a second thought.
  There's no "history" to where Y2 or Cherry came from. If the patch applies, it applies.
  Any problems due to that are to be resolved later.
  It might be instructive to read up on rebase --onto and what that can do. Cherry pick is just a special case of that... mostly.
forkerenok 2 years ago
Very insightful read, thanks for sharing!
I remember many cases when I was cherry-picking things on a slightly changed (with overlaps) HEAD and it just worked. So I had this intuition that it's smarter than a flat patch, but had no idea that it is a legit 3-way merge!
Slightly OT: I noticed that the git source code links in the article are referencing master branch. I think it would be better for longevity to fix a commit hash.
juped 2 years ago
NB: git apply knows --3way, but the patch being applied has to record the hashes of the blobs it applied to when generated (which it will if it was generated by git).
git am also knows --3way transitively.
b212 2 years ago
Call me dumb but am I the only one that uses, 99% of the time:
git pull
git add .
git commit
git merge
git push
git status
git log
git checkout -- filename
And nothing else? Using the default merge flow is super simple; rebasing with force pushes is often hell. Reverting with conflicts is hell as well, and I sometimes prefer to undo the commit rather than revert (I know).
Anyway, git is rather simple if you stick to the basics, and I feel like that’s enough. I avoid rebase like the plague, and I don't care about merge commits at all.
xg15 2 years ago
It's also really useful to understand rebases. At least for me the insight that a rebase is a series of 3-way-merges made a lot of the errors and the infamous "repeating merge conflicts" problem finally understandable.
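A rough sketch of that insight, assuming a toy model where snapshots are dicts and conflicts are detected per whole file (which exaggerates how often they occur compared with Git's line-level merging). The `resolve` callback and all file contents are hypothetical. A rebase replays each commit as its own 3-way merge, with the commit's original parent as the base, so a region touched by several commits can conflict at each step:

```python
def three_way_merge(base, ours, theirs, resolve):
    """File-level 3-way merge; `resolve` is called for each conflicting path."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t or t == b:      # sides agree, or only "ours" changed
            pick = o
        elif o == b:              # only "theirs" changed
            pick = t
        else:                     # both changed differently
            pick = resolve(path, b, o, t)
        if pick is not None:
            merged[path] = pick
    return merged


def rebase(onto, commits, resolve):
    """Replay `commits`, a list of (parent_tree, tree) pairs, onto `onto`."""
    tip = onto
    for parent, tree in commits:
        # One 3-way merge per replayed commit, base = its original parent.
        tip = three_way_merge(base=parent, ours=tip, theirs=tree, resolve=resolve)
    return tip


conflicts = []


def resolve(path, base, ours, theirs):
    conflicts.append(path)
    return f"{ours}+{theirs}"     # stand-in for a manual resolution


B = {"f": "base"}
main = {"f": "main-edit"}
c1 = {"f": "topic-edit-1"}        # first topic commit, parent B
c2 = {"f": "topic-edit-2"}        # second topic commit, parent c1

rebased = rebase(main, [(B, c1), (c1, c2)], resolve)
rebase_count = len(conflicts)
print(rebase_count)               # 2

conflicts.clear()
merged = three_way_merge(B, main, c2, resolve)
merge_count = len(conflicts)
print(merge_count)                # 1
```

The same file conflicts once per replayed commit during the rebase, but only once in the single merge of the branch tips.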
agumonkey 2 years ago
Always nice to dig below the surface for such tools. Makes me wanna read a lot more (clean c codebase and/or other tools abstractions)
hiAndrewQuinn 2 years ago
I could use a Git By Example course where we walk through increasingly gnarly gituations.
- chx 2 years ago
  The only git tutorial that worked for me is the exact opposite. It even warns you
  > you can only really use Git if you understand how Git works. Merely memorizing which commands you should run at what times will work in the short run, but it’s only a matter of time before you get stuck or, worse, break something.
  https://www.cduan.com/technical/git/
  this tutorial was an absolute revelation for me. I can't recommend it strongly enough.
  Ps: I also found git destroying my work via git reset to be really annoying so I built a safety net. https://gist.github.com/chx/85db0ebed1e02ab14b1a65b6024dea29
  - seba_dos1 2 years ago
    Exactly. Learning by examples is the worst way to learn git - which sucks for people who learn best by examples, but that's how it is. You often hear people complaining about git's UX - and I agree that it could be much better, though I'd say its main sin is that it almost actively tries to confuse you into developing wrong mental models on how it works (somewhat ironically, it often does it to provide better UX for those who already understand it).
    A very obvious example of that would be right there in `git show`, or the equivalent parts of GitHub's and GitLab's UIs. A command that shows you the content of a commit, right? Well, right - it does show you the author, date, commit message... but then it also shows a diff.
    So, being a beginner who learns by examples, you say - I get it! A commit is a patch! It gets me from one state of the repo to another!
    Except you couldn't be more wrong. `git show` automatically calculated a difference between the requested commit and its parent to present it to you, but didn't bother to tell you that. Of course, it makes perfect sense - it wouldn't be very useful to show a complete state of the files under the commit in question, it's much more useful to show a diff against its parent(s). However, it doesn't even tell you which commit the diff was calculated against, it's completely hidden by default. How are you supposed to develop a proper mental model of total basics such as "what is a commit" if it hides such essential clues from you?
    Answer: you need to read up on the theory. There's no way around it, at least not now.
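As a small illustration of the "a commit is a snapshot, the diff is computed on demand" point, here is a toy model. The `Commit` class and `show` function are hypothetical stand-ins, not Git's storage format or the real `git show` implementation:

```python
import difflib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    message: str
    files: dict                       # full snapshot: path -> text
    parent: Optional["Commit"] = None


def show(commit):
    """Render a commit the way a `git show`-like viewer might.

    Nothing diff-shaped is stored on the commit; the diff is derived
    here by comparing the snapshot with the parent's snapshot.
    """
    before = commit.parent.files if commit.parent else {}
    lines = [commit.message]
    for path in sorted(set(before) | set(commit.files)):
        old = before.get(path, "").splitlines()
        new = commit.files.get(path, "").splitlines()
        lines += difflib.unified_diff(
            old, new, f"a/{path}", f"b/{path}", lineterm=""
        )
    return "\n".join(lines)


first = Commit("add greeting", {"hello.txt": "hello\n"})
second = Commit("be polite", {"hello.txt": "hello\nplease\n"}, parent=first)

print(second.files)    # the stored data: a complete snapshot
print(show(second))    # the presentation: a diff against the parent
```

Diffing the same snapshot against a different commit would yield a different "patch", which is why treating the displayed diff as the commit itself leads to the wrong mental model.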
  - yencabulator 2 years ago
    You might enjoy https://eagain.net/articles/git-for-computer-scientists
    chx 2 years ago
    Not sure whether I count as a computer scientist, I only have a math teacher masters and half an information engineering degree, I dropped out.
    Even knowing graph theory and understanding git I still don't have the slightest idea what https://eagain.net/articles/git-for-computer-scientists/git-... even wants to be.
    yencabulator 2 years ago
    A tree can point to other trees, making them subdirectories. A tree can point to blobs, making them files in that directory. File name, mode and such metadata is a property of the edge, and blobs themselves are nameless containers of bytes.
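A short sketch of that object model, assuming Git's SHA-1 object format: a blob's id covers only its bytes, while names and modes are written into the tree that points at it. The file names are hypothetical, and the sort key only approximates Git's entry ordering for plain ASCII names:

```python
import hashlib


def blob_id(data: bytes) -> str:
    """Object id of a blob: SHA-1 over a 'blob <size>\\0' header plus content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def tree_id(entries: dict) -> str:
    """Object id of a tree.

    entries: {name: (mode, hex_id)}. The name and mode are stored here,
    on the edge from the tree to the object, not in the blob.
    """
    def key(item):
        name, (mode, _) = item
        # Subtrees sort as if their name ended with "/".
        return name + "/" if mode == "40000" else name

    body = b"".join(
        f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(oid)
        for name, (mode, oid) in sorted(entries.items(), key=key)
    )
    return hashlib.sha1(b"tree %d\x00" % len(body) + body).hexdigest()


text = blob_id(b"hello\n")
docs = tree_id({"copy.txt": ("100644", text)})                     # subdirectory
root = tree_id({"README": ("100644", text), "docs": ("40000", docs)})

print(blob_id(b""))   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
print(tree_id({}))    # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
print(root)
```

Here `README` and `docs/copy.txt` are two edges pointing at the same nameless blob, which is why identical content is stored once no matter how many paths refer to it.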
  - Quekid5 2 years ago
    This is so true. Understand what git is truly about, namely a particular kind of Graph and how to apply changes across those.
- bloopernova 2 years ago
  https://www.leshenko.net/p/ugit/#
  There's another that was useful, I'll dig it out and update this comment.
  This one helped me a lot: https://alexwlchan.net/a-plumbers-guide-to-git/
- conor- 2 years ago
  I think this possibly fills what you're looking for.
  https://learngitbranching.js.org/
- Tyr42 2 years ago
  Also check out the New Old Thing for more gnarly git examples.
  Raymond Chen is excellent.
  - Quekid5 2 years ago
    Raymond Chen's blog is a gift to this world.
chx 2 years ago
What would be the difference between
git show -U1000000000 $hash > x.patch; git apply x.patch
and
git cherry-pick $hash
for files less than 1000000000 lines?
- coryrc 2 years ago
  3-way merge has more information available to avoid some conflicts that would have to be manually resolved with the patch. Otherwise, you got the idea.
  - chx 2 years ago
    > 3-way merge has more information available
    that was my question :) what information?
    coryrc 2 years ago
    Patch lacks the diff from original commit to the commit-having-patch-applied-to. So if the name of a function referenced in the patch changed between those two, it will fail to apply, yet 3-way merge could continue. Additionally patch is very simplistic.
wnoise 2 years ago
How else would it?
- dboreham 2 years ago
  Was going to say the same: all the hash linking and what not is fine but 3-way diff is how SCC is actually done (otherwise it'd just be "a DAG comprising whatever unrelated file contents the user provided").
not2b 2 years ago
Note that three-way merge is much older than git. It's implemented by the diff3 program, which was part of Version 7 Unix from 1979 on, and the same algorithm is used by CVS, Perforce, and pretty much any other revision control system. The GNU version is part of diffutils, and of course there's also a BSD version.
- crdrost 2 years ago
  Note also that three-way merge is not an associative operation, which can sometimes lead to surprising behavior: https://winddy.buzz/manual/why_pijul.html
  - not2b 2 years ago
    A major source of conflicts I see in development might be resolved if we had something like a four-way merge.
    Consider this case: we have a base version, and two changes have been applied to that version. Change 1 adds a function at a given position in a file. Change 2 adds a different function at the same position. But we want both functions to exist in the final result. Suppose the release branch starts with the release version, and then change 2 is merged. Now we try to merge change 1. We get a conflict because the merge is at the same point. But if we could look at a fourth input (the state of the file on the development branch, or at the earliest state that has both change 1 and change 2 as predecessors), we could see that the resolution is to add both functions, in the order matching the order in the development branch.
    The four cases being considered are: the base version; the v1 change version (base + v1), the v2 change version (base + v2), and the earliest version that has both changes already applied.
- loeg 2 years ago
  The BSD version is sort of very recent, for what it's worth -- FreeBSD imported a not fully functional version in 2017 and has seen more work on it in 2022: https://github.com/freebsd/freebsd-src/commits/main/usr.bin/... , but the default version shipped is still GNU diff3 from ~2007 (before the GPL3): https://man.freebsd.org/cgi/man.cgi?query=diff3&apropos=0&se... .
  - not2b 2 years ago
    I wasn't aware of that. Perhaps they never re-implemented the version from Bell Labs so they didn't have a free program (other than the GNU one) that they could release until more recently?
vvpan 2 years ago
Honest question - does git have to be as complicated as it is for most usages? The number of times I got stuck fixing (mine and other people's) git issues is way more than seems necessary. I have not used a merge at a workplace for so long that I forget how it works, because rebases make so much more sense. Git provides a thousand features and I find myself using only three. And yet mercurial (which I remember having a more streamlined UX) went the way of the dodo.
- AlotOfReading 2 years ago
  Git could definitely have a simpler interface, but massively simplifying the mental burden would ultimately involve reducing its power as a tool. A significant part of the appeal is that the power users and the novices can both agree on the same tool and make it work for their needs. There's a huge "aftermarket" of frontends and scripts to simplify the experience in the way you want. Some of them, like jujutsu [0] are basically entirely new VCSs built on top of git (and influenced by mercurial).
  I didn't mind mercurial, but I think it's telling that even the tech behemoths championing it like Google and Facebook forked off into their own tools (fig/sapling). The latter even supports git directly. That battle is long over.
  [0] https://github.com/martinvonz/jj
  - coryrc 2 years ago
    Fig is an abstraction over top of Perforce. Before fig was a git-UI-based tool and it is gone. Mercurial's interface is way better than git.
    The fact that Mercurial does not scale to a Google-sized monorepo is not an indictment of Mercurial.
    AlotOfReading 2 years ago
    From what I understand, the local instance in fig is/was a full-on local mercurial repo, just not the entire monorepo (which doesn't leave Piper).
    coryrc 2 years ago
    I suppose it must be because you can create commits which don't exist in Perforce.
- jkubicek 2 years ago
  > The number of times I got stuck fixing (mine and other people's) git issues is way more than seems necessary.
  I almost never have any issues with git, I wonder if you're working differently than I do?
  > I have not used a merge at a workplace for so long that I forget how it works, because rebases make so much more sense.
  Oh, that's the reason.
  I've been using git almost as long as github.com has been a thing and I'm convinced that if you find yourself rebasing, _especially_ if rebasing is a normal part of your workflow, you're using git wrong.
  Here's how I work:
  1. All development is on a feature branch
  2. Feature branches are tight and focused
  3. Frequent merges from `main` to stay up-to-date
  4. Squash your branch into a single commit before merging
  This gets you everything you want from source control (linear history of well-structured atomic changes) without any of the gymnastic skills necessary to be a competent rebaser.
  Oh, and I never get stuck fixing weird issues, because it's impossible for me to generate weird issues. The worst thing I encounter is a merge conflict, and merge conflicts are the easiest possible conflicts to resolve.
  - dasil003 2 years ago
    Disagree pretty strongly about rebasing. If you are an experienced and full-time programmer then you should know how to rebase and merge, and understand when the time and place is for each one, and I personally always rebase my local topic branches. I understand this is a bit subjective and may depend on local context, but essentially my view is this:
    When you merge, conflicts are resolved for a large number of independent commits all at once and crammed into a potentially large blackbox commit that is not directly related to either the work on main or your topic branch. When a developer makes a mistake in that merge (which happens often), it's hard to reason about their mental state and whether they fully grokked the state of both branches. Even worse is when there is a logical conflict, but the lines aren't near each other, so git just happily merges.
    By contrast, when you rebase, you are essentially going through the exercise of updating your commits for the current state of the world, and each conflict is resolved locally to where it was authored, including completely removing things that no longer make sense. The result is fully bisectable, and this property can be preserved along with the ability to see original author and merge times with existing timestamps and merging to main with the `--no-ff` flag. Obviously this requires some expertise with git, but it's not very difficult once you get the hang of it, and in fact leads to more disciplined thinking about how your changes will roll out.
    WRT squashing, I think the value depends on the scale of the team and codebase you're working on. The larger it is, the more squashed you want things for posterity. However for small teams, as long as you keep CI building, finer grained history can be useful. This also depends on the size of your feature branches and deploy cadence, so YMMV.
    aidos 2 years ago
    It’s irrational, but I always stand on the sidelines of these conversations being a little bit cross that people slag off rebasing.
    At some point someone pointed out that rebasing changed the history and if you change the history of a public branch everyone was going to have a bad time. That got turned into “rebasing is bad” and now people keep rolling out the same old trope. It’s a tool, and a good one for managing your work. If you don’t know how to use it then you’re doing yourself a disservice.
    In terms of squashing, my rule is that I squint at the PR and if the commits are going to have value in a year's time then I’ll merge commit, otherwise I’ll squash. In 10 years of running my codebase I’ve never blamed and thought to myself “I wish there were more fine grained commits here” so it’s obviously working fine for me.
    As you say, do what works for your team but make sure you understand the tools and what they provide for you.
    heavenlyblue 2 years ago
    You can rebase while keeping all of the commits in your feature branch, you will just be asked to resolve conflicts for each one of them (if any).
    I don't understand what the fuss is all about.
    IlliOnato 2 years ago
    I think squashing or not squashing really depends on the quality of individual commits in the topic branch.
    Myself, I almost always "edit the history" of my topic branch before merging. Often enough some interactive rebasing and occasional commit-splitting is employed. I do it to turn my topic branch into a series of clean logical commits, so that each one can easily be reviewed on its own. I would not want to lose such history, so I avoid squashing unless it's mandated.
    When people don't have time/energy/skills/inclination for this kind of clean-up and their topic branch commits are a mess of random things reflecting their actual development process -- including errors, fixing these errors, random experimentation, etc., there is not much value in keeping such history, and squashing often produces a cleaner and easier-to-review commit.
    However, for me the end result of a development cycle is not just a good state of the code (functional, tested, documented), but a good history of that code too: reviewable and bisect-able. Git provides tools to maintain such manageable history, but does not do it for you.
  - mbork_pl 2 years ago
    I have never understood the "squash a branch to one commit before merging" attitude. I prefer small commits, both during code review and later, during looking into repo history (which is needed once in a while) or bisecting.
    Also, technically, "squashing to one commit before merging" is rebasing, so apparently rebasing is a normal part of your workflow, too.
    That said, our discussion only confirms that Git is pretty versatile and flexible. (Of course, the UI is terrible, but Magit helps a lot in that department.)
    jkubicek 2 years ago
    > I have never understood the "squash a branch to one commit before merging" attitude. I prefer small commits, both during code review and later, during looking into repo history (which is needed once in a while) or bisecting.
    I 100% agree. You don't have to make massive feature branches for your changes. If your feature branch is huge, it's a sign that you probably should be breaking your work up into smaller chunks.
    > Also, technically, "squashing to one commit before merging" is rebasing, so apparently rebasing is a normal part of your workflow, too.
    Yeah, I know ¯\_(ツ)_/¯
    This gets brought up every time I have this discussion, and it's correct, but also a little misleading. Squash-merges are never going to be any more challenging to make than regular old merges. Rebase conflicts are where people run into real trouble.
    samus 2 years ago
    Rebase conflicts are just the dual to merge conflicts. Instead of applying the end result of my work, I apply my work step by step. I don't see the final big picture, but move towards it.
    This is of course a nightmare if the branch is long-running, because then it becomes easy to forget what I was thinking and doing when making the first few commits. But this is where organizing and crafting good commits (good commit messages are just the beginning), and frequent rebases on top of the source branch help.
    Frequent rebasing also makes it possible to start working on a feature that depends on somebody else's in-progress work. It works best if the other person also frequently commits, publishes and rebases their work to keep the trouble with rebasing to a minimum.
    Git serves many kinds of dealing with concurrent activities in other branches well. Maybe too well :-)
    yencabulator 2 years ago
    You can get merge conflicts just as well as rebase conflicts. Especially for squash merges, which really are rebases.
    jkubicek 2 years ago
    That's true, but merge conflicts are the easiest possible conflicts to resolve. Main branch and my branch both changed this line, no matter what, you must resolve that conflict.
    Rebase conflicts might be just as easy or they might be an order of magnitude more complex and confusing. They'll never be easier than a merge conflict.
    seba_dos1 2 years ago
    Merge conflicts are often much harder to resolve than rebase conflicts. I often find myself going through individual commits in order to understand how to resolve such conflict anyway, which rebase does for me automatically (especially with diff3), providing enough context right away. In many cases, it's much simpler to follow the changes step-by-step.
    Sometimes I even ended up resolving merge conflicts by rebasing and then rewinding back and committing the resulting state as a merge. It was easier that way.
    Sure, there are cases where rebasing naively gets super annoying as each commit conflicts on cherry picking, but you can end up with pretty nightmarish merges just as well if you're not careful.
    ChrisSD 2 years ago
    I'm not sure I understand. Why would rebasing ever be more complicated? They're the same thing at the end of the day, no? Main branch and my branch both changed this line no matter what, you must resolve that conflict.
    jkubicek 2 years ago
    Imagine you have a feature branch where you changed a block of code two times and are rebasing that branch onto main, where that block of code was also changed. During the rebase, you'll need to resolve that conflict twice. If you had merged main into your branch you would only have to resolve it once.
    No matter how complex your branch gets or how much it diverges from main, you'll only ever need to resolve a conflict once. For rebasing you may need to resolve conflicts multiple times for changes you may have made hours/days/weeks ago.
    If you're strict about rebasing and keeping your working branch clean and remembering what you were doing in that commit from three weeks ago the rebase process might not be that bad, but it can never be easier than a merge.
    Rebase conflicts are like short-selling stocks; there's no upper limit on how bad it can get.
    IlliOnato 2 years ago
    It seems there are 2 types of rebases involved here.
    I never rebase a feature branch on top of the changes in the main branch. The branching point of a branch always stays the same.
    But I use rebase for editing commits in my feature branch all the time, i.e. I rebase interactively one feature-branch commit on top of another feature-branch commit.
    jrockway 2 years ago
    Because the evolution of the topic branch is largely irrelevant to future analysis. If you write tests first, then you have a commit in the repo where tests can never pass; fun for bisecting. If you write the code first, then you have a commit with a bunch of dead code in it because the tests appear in the next commit.
    I would say that my average topic branch looks like: "edit the protos and rebuild them", "add feature", "add tests", "add CLI", "fix lint". There is no value in checking that into the main history in that order; the first commit doesn't compile because the API servers don't implement the API server interface anymore. The second commit is a bunch of dead code. The third commit is somewhat useful except the code doesn't follow the style guide and static checks fail. The fourth commit at least un-deadifies the code; something calls it now. The last commit is something that can be checked in. I really see no value in preserving that history; the set of commits makes more sense as one commit.
    You could redo this to "edit the protos, and rebuild them, and add stubs that return Unimplemented", "add feature and tests to replace the stubs that return Unimplemented", "add CLI binding", which could stand alone if you felt like it. It is more work for the author, though, because you had to write code that you are literally going to delete in the next commit. Why bother? Personally, when I'm reviewing code, the more context I have, the better. If there is a design doc for the feature and someone sends me just the protos that they describe in the design doc and the design doc is actually good, fine. I got the context and this just speeds things along. If there isn't, how do I know this API is even right? If I see how it's implemented across the client and the server, then it's easy to make suggestions for improvement. If it's just the API, it sounds right, but I don't know if you can actually implement that feature until I see it done, right? More importantly, how do I know this set of 3 patches isn't going to be followed by "oops, forgot this in the protos", "oops, forgot this in the server", "oops, forgot this in the tests", and "oops, forgot this in the CLI"? Now we've turned one commit into 8. Why? Because "small PRs are good"? Prove it. I had to do 8 code reviews instead of 1. You had to write code that you're literally deleting the commit after you added it. Why bother? I'm unconvinced that the history of "do it", "oops fix it" is more useful when doing archaeology than "do it right the first time".
    [-]
    yencabulator 2 years ago
    This is a problem with how you work. You're not making atomic commits, you're spreading a logical change over multiple commits. Non-atomic commits are just bad.
    Here's an example of a correct way to use small commits: add a new function+unit tests into a library in one commit, then switch existing code to use it in one or more follow-up commits.
    [-]
    jkubicek 2 years ago
    How do you correct the following commit history?
    1. New function & unit tests
    2. Switch existing code to use new function
    3. Whoops, new function broke some existing code
    4. I guess I need to fix tests too
    5. Spelin mistakes
    There's at least three ways to clean this up:
    First, you could maybe do an interactive rebase, move commits 3-5 up and squash into commit 1. Maybe easy, or maybe you hose yourself and break everything.
    Second, just squash it into a single commit. No possible way to screw this up, works every time.
    Third, do a `merge --no-commit`, commit your changes and your tests, commit the changes to the existing library to use your new method. Easy and straightforward and keeps the two changes separate if that's something you're into.
    Of the three choices, rebasing is the worst!
    [-]
    yencabulator 2 years ago
    1. git commit lib/bar.x
    2. git commit foo.x
    3. git commit --fixup sha_of_1
    4. git commit --fixup sha_of_1
    5. git commit --fixup whichever_is_appropriate
    n. git rebase -i where_ever_you_branched_off
    You call git rebase "worst" purely because you don't currently see the value in granular commit history. If and when you do, git rebase is just the tool for the job.
    `git merge --no-commit` only works for trivial things, real world is a lot messier than that. But perhaps you haven't needed to deal with the complex changes. For an idea of what complex changes look like, skim some linux kernel VFS or MM refactoring patch sets.
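    A minimal sketch of the fixup workflow listed above, run in a throwaway repository. The file names and commit messages are made up, and `--autosquash` plus `GIT_SEQUENCE_EDITOR=true` (which accepts the generated todo list without opening an editor) are additions so the script runs unattended; interactively you would just review the todo list yourself.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main   # name the first branch "main" whatever the local default is
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > README
git add README
git commit -q -m "base"
base=$(git rev-parse HEAD)

# 1. new function and tests, 2. switch existing code over to it
echo "new function" > lib.txt
git add lib.txt
git commit -q -m "Add new function and tests"
lib_commit=$(git rev-parse HEAD)
echo "use function" > app.txt
git add app.txt
git commit -q -m "Switch existing code to new function"

# 3-4. later fixes that logically belong to commit 1
echo "fixed function" > lib.txt
git add lib.txt
git commit -q --fixup="$lib_commit"
echo "fixed function, fixed tests" > lib.txt
git add lib.txt
git commit -q --fixup="$lib_commit"

# n. fold every fixup into its target commit
GIT_SEQUENCE_EDITOR=true git rebase -q -i --autosquash "$base"

commit_count=$(git rev-list --count "$base"..HEAD)
git log --format=%s --reverse "$base"..HEAD
```

    The branch ends up with the two intended commits, and the fixes live inside the first one.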
    jkubicek 2 years ago
    > You had to write code that you're literally deleting the commit after you added it. Why bother? I'm unconvinced that the history of "do it", "oops fix it" is more useful when doing archaeology than "do it right the first time".
    My suspicion is that people are using rebasing to clean up their prior commits as they go instead of just getting everything working and then breaking the work up into commits, if needed. And this is also how they work themselves into horrible knots trying to figure out what they did to hose their branch.
  - happytoexplain 2 years ago
    I have to disagree with #4. If your branch's commit history is unnecessarily fragmented because it evolved as you worked, then yes, squash those commits - but "one commit per feature" as a rule doesn't work for many kinds of features which by their nature can be made only so "tight and focused". Sometimes it's an undue burden to have to view a feature's patch as a single diff, even if the feature is functionally as atomic as it can get.
    [-]
    jkubicek 2 years ago
    First, my feature branch commit history is always fragmented and unusable. I commit extremely liberally, every few minutes if I'm actively making changes. Why not? If I always squash before I merge, nobody sees my junk, and being able to back up a few steps after coding down the wrong path is super useful.
    Second, if I do create a feature that should be multiple commits for clarity, I find it a lot easier to do all my work in a scratch feature branch, get everything working correctly, tests passing, etc., then do a `git merge --no-commit` onto my final feature branch for the PR. At that point I can break my work up into a few different commits, as necessary.
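    A rough sketch of that last step, with one assumption that is not in the comment: it uses `git merge --squash` instead of plain `--no-commit`. When the PR branch has no commits of its own, `git merge --no-commit` just fast-forwards (the git-merge docs note that `--no-commit` cannot stop a fast-forward), so there would be nothing left to split up. Branch and file names are hypothetical.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > README
git add README
git commit -q -m "base"

# messy scratch branch: commit early, commit often
git checkout -q -b scratch
echo "v1" > lib.txt
git add lib.txt
git commit -q -m "wip"
echo "call lib" > app.txt
git add app.txt
git commit -q -m "wip 2"
echo "v2" > lib.txt
git commit -q -am "oops"

# clean branch for the PR: bring over the combined result without committing
git checkout -q -b feature main
git merge -q --squash scratch
git reset -q                      # unstage everything, keep the files on disk

# now carve the work into the commits you actually want
git add lib.txt
git commit -q -m "Add new function and tests"
git add app.txt
git commit -q -m "Switch existing code to new function"

feature_commits=$(git rev-list --count main..feature)
diff_vs_scratch=$(git diff scratch feature)
```

    `feature` ends with the same tree as `scratch`, but with two deliberate commits instead of three accidental ones.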
    [-]
    yencabulator 2 years ago
    Maybe consider you're just a messy person, and don't extrapolate from that to "if rebasing is a normal part of your workflow, you're using git wrong".
    For me, rebasing is the means by which I keep my development branch neat and tidy. Each commit changes one thing. It's easy to see what's "prep work" and what's the ultimate change of functionality. Git bisect can find the commit that caused a bug. And so on.
    I view squash merges largely used as a crutch to hide bad habits. I find the resulting history from lots of squash merges is not useful in figuring out why a change was made; by that time the specific change is part of a big ball of mud with a very vague high-level goal.
    [-]
    jkubicek 2 years ago
    > Maybe consider you're just a messy person, and don't extrapolate from that to "if rebasing is a normal part of your workflow, you're using git wrong".
    Possibly? But I also find it hard to believe that there's this many developers out there typing out perfect code on the first try with no mistakes or false starts.
    > I find the resulting history from lots of squash merges is not useful in figuring out why a change was made; by that time the specific change is part of a big ball of mud with a very vague high-level goal.
    That's the developer's fault though. If someone is lazy enough to squash-merge a big ball of mud onto the main branch, they would also be lazy enough to generate a commit history filled with barely working and unclear commits, or incompetent enough to completely hose themselves with a rebase.
    At the end of the day, I'm generating the same set of small, atomic commits as you. The difference is that I don't spend any time worrying about what the commit history looks like while I'm coding. It's only at the last second before sharing a PR that I consider it. 95% of the time I've got a tight one-commit PR containing a little code, a few docs, a few tests. Sometimes I break my changes into separate PRs, very rarely I'll add multiple commits to a single PR, but those aren't my default workflows.
    [-]
    jimbobimbo 2 years ago
    "Messy" and "tidy" is not something I'd expect to see in the conversation about private branches.
    Version control in the private branch is a tool which helps me maintain state and go back and forward in time. It looks exactly like I thought about the code or work I needed to do at some point in time. It is not messy or tidy, it's just what it is. I don't have to please anyone's sensibilities when I'm in the private branch.
    The merge into main is squashed and atomic, of course.
    yencabulator 2 years ago
    > But I also find it hard to believe that there's this many developers out there typing out perfect code on the first try with no mistakes or false starts.
    Cleaning up one's own mess is what git rebase -i is for.
  - vvpan 2 years ago
    I pretty much only work in the context of teams. My own git skills matter quite little - there are many other people of various skill levels who are working on the same codebase. Junior developers get very easily confused by git.
- seba_dos1 2 years ago
  Git is a data structure manipulation tool. LibreOffice also doesn't have to be so complicated as it is for most usages, it's just that everyone uses a different 5% of its functionality and considers it essential.
- waynecochran 2 years ago
  There is a very simple workflow that uses a small subset of git commands -- I try to never stray from that. When I do, there is stackoverflow, ohshitgit.com, and I also know an expert who can help me untangle things if I am really unsure.
  [-]
  - dboreham 2 years ago
    Also I've always found that cp working-tree broken-tree, then git clone <new-tree> then <manually fix up files I was working on> almost always beats trying to work through the 10 suggested SO solutions in detail.
    [-]
    Galanwe 2 years ago
    That's basically what I did the first year working with git.
    Then I met a coworker that told me to stop doing this and come with him on a whiteboard to understand and solve the problem.
    That was painful the first month, but after some months of that treatment I had a strong mental model of exactly what happened, and how to fix it.
    I think overall the mental model is to
    1) get knowledgeable enough to be able to actually draw the current graph of commits, branches and inheritance
    2) clearly lay down the desired structure of the target graph
    3) perform the steps necessary to transform the graph from 1) to 2).
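    For step 1, git can draw the graph itself. A small sketch in a throwaway repository (branch and file names are made up) with one commit on each side of a fork:

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > README
git add README
git commit -q -m "base"

git checkout -q -b topic
echo t > topic.txt
git add topic.txt
git commit -q -m "topic work"

git checkout -q main
echo m > main.txt
git add main.txt
git commit -q -m "main work"

# every branch, one line per commit, with branch names attached
graph=$(git --no-pager log --graph --oneline --decorate --all)
echo "$graph"

# commit lines are the ones carrying a '*' in the graph column
commit_lines=$(echo "$graph" | grep -c '\*')
```

    The output shows `main` and `topic` as two lines diverging from `base`, which is usually enough to sketch the target graph for step 2 next to it.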
    samus 2 years ago
    I've always found that a good Git user interface goes a long way to understand what is really going on. The commit graph and the conflict resolution UIs are most important. The right choice is very personal - I've found understanding the contents of a repo purely on the terminal to be torture, even though I issue most commands from there.
    mbork_pl 2 years ago
    Also, Git has this weird feature called "workdirs" which may help with that, too.
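    Presumably this refers to something like `git worktree` (that mapping is an assumption, not something the comment states). A minimal sketch, with made-up directory and branch names: a second checkout of the same repository lets you commit a fix without touching the half-finished state in the first one.

```shell
set -eu
top=$(mktemp -d)
git init -q "$top/repo"
cd "$top/repo"
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > README
git add README
git commit -q -m "base"
echo "half-finished work" >> README      # uncommitted change, left alone below

# second working directory for the same repository, on a new branch
git worktree add -b hotfix "$top/hotfix" main >/dev/null 2>&1
cd "$top/hotfix"
echo fix > fix.txt
git add fix.txt
git commit -q -m "Hotfix"

cd "$top/repo"
worktree_count=$(git worktree list | wc -l | tr -d ' ')
hotfix_subject=$(git log -1 --format=%s hotfix)
dirty=$(git status --porcelain)
```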
    coryrc 2 years ago
    http://learngitbranching.js.org
    A couple hours and you'll never need to do that again. I promise!
- doctorpangloss 2 years ago
  When you experience no Git issues:
  - you know what you're doing, or
  - you are not working with other people, or
  - you are working with other people, but on different things
  Git is a dual of Conway's Law.
  Your Git repos tell the best story about your experience as a developer.
  [-]
  - yencabulator 2 years ago
    That's just version control, nothing special about Git there. I guess a lot of kids running around these days never had this stuff happen, in worse forms, with CVS et al.
- naasking 2 years ago
  > I have not used a merge at a workplace for so long that I forget how it works, because rebases make so much more sense.
  It boggles my mind that people think 1) rebase makes more sense than branching and merging, and 2) that someone would use rebase so much that they forget how to branch and merge.
  [-]
  - singingfish 2 years ago
    I had a detailed discussion with a former colleague about this the other day. They're in the merge camp, and I'm in the "any code that actually makes it into the wild should have a linear history" (rebase) camp.
    Merge commits have multiple parents which makes visual inspection confusing, and it makes it much harder to use git bisect which is a tool that can save many hours of painstaking manual work.
    I clearly think that the rebase version is the correct approach, but I can see there are upsides and downsides for both. Sometimes for long lived branches I get merge commits in my feature / bugfix branches, but those are going to end up squash merged onto our release branch, so in the end the main thing I care about is the linearity of history in master/develop.
    Combination of linear history and appropriate use of annotated tags for releases makes it way easier to see what actually happened (i.e. the intentionality). A merge based approach shows you what people were actually doing though.
    [-]
    naasking 2 years ago
    I'll leave this here for you to peruse:
    https://fossil-scm.org/home/doc/trunk/www/rebaseharm.md
    > Combination of linear history and appropriate use of annotated tags for releases makes it way easier to see what actually happened (i.e. the intentionality).
    I still haven't heard a good reason why linear history using git as an archaeological tool is the appropriate method to "infer intent". It seems like you would only have to infer intent this way if you have poor development practices, like not adding appropriate comments and not tying important changes to tickets that provide more background/context.
    [-]
    singingfish 2 years ago
    If you have a nasty nasty difficult to find bug that emerged some time ago, and your history is littered with merge commits, git bisect is going to go badly. If your history is linear, and your data model is compatible with itself across history for the bits you're interested in, and you can write a sane test case - auto or manual, then your troubleshooting is going to be O(n^x) simpler than the case where you've got merge commits, and save you many hours. Not something I have to do often (once a year or less really) but valuable enough that it's important to me nonetheless.
    Rewriting history when things land in develop/master I think is a good thing. To use an analogy from the humanities, history is written by the winner.
    At work we're collaborating with the merge camp at the moment and it's very difficult to work out what they're doing from reading their history - some of it is their bad practices elsewhere, but the merge commits are confusing.
    I'd also be fine with no rebasing but any merges must be fast forward, so if you can't do that fix your problems prior to integration.
    Also I've used a few different approaches for a few different projects, and the rebase/squash merge approach is the one that in my experience makes things clearer and easier to understand. I'd be reluctant to return to other approaches (aside from enforcing fast forward merges in public branches).
    I believe that squash merges are a good compromise as so long as you don't mess with the reflog and keep the automated parts of the commit messages, then there is no lying about history.
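    For anyone who hasn't seen the bisect case mentioned above, a toy sketch: ten linear commits in a throwaway repository, with a "bug" (a hypothetical `flag.txt`) introduced at commit 7. The one-line test script stands in for a real test case; for `git bisect run`, exit status 0 marks a commit good and 1 marks it bad.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

# ten linear commits; the bug sneaks in at commit 7
i=1
while [ "$i" -le 10 ]; do
  echo "change $i" >> history.txt
  if [ "$i" -eq 7 ]; then echo "BUG" > flag.txt; fi
  git add -A
  git commit -q -m "commit $i"
  if [ "$i" -eq 1 ]; then good=$(git rev-parse HEAD); fi
  i=$((i + 1))
done

# HEAD is known bad, commit 1 is known good
git bisect start HEAD "$good" >/dev/null
git bisect run sh -c '! test -f flag.txt' >/dev/null

first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad"
```

    With a linear history every step either contains the bug or doesn't, so the search is a plain binary search over the ten commits.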
    [-]
    naasking 2 years ago
    Gotta say, in 20+ years of branching and merging using CVS, SVN and then Mercurial, I've never had an issue tracking down nasty bugs. Maybe git bisect on a linear history would hypothetically be a few minutes faster in some cases, but doesn't make it worth all of the extra work and the dangers of overwriting other people's work.
    I think the Fossil devs also make a good case that the problem is really a limitation in the tools like git bisect.
    baq 2 years ago
    Intentionality is the perfect argument against merging feature branches: I intentionally don’t want my half-broken commits to be in the production branch. I want the complete product to be there.
    [-]
    IlliOnato 2 years ago
    Apparently what you do makes sense for you, but I hope you realize that using interactive rebase you can turn your half-broken commits into nice and clean commits!
    I try to do this as I go, but there is always a "history review and fix if necessary" stage after I've got the code I wanted and before merging.
    Apart from having granular, reviewable, bisectable history, there were many a time when by having this extra look at the code I found bugs or design flaws.
    Whether such effort is worth it for you, and whether you have time\energy\discipline\skills for it is a different question.
    naasking 2 years ago
    > I intentionally don’t want my half-broken commits to be in the production branch.
    That makes no sense. What does it matter that interim versions from a merge might not work, what matters are the milestones like merges and tags. Basically you're imposing a silly aesthetic restriction on what's supposed to be a functional tool.
    [-]
    baq 2 years ago
    My silly aesthetic restriction is ‘tests pass’ and my branch commits don’t have this quality in all cases.
    singingfish 2 years ago
    The whole argument feels a bit People's Liberation Front of Judea to me. On the other hand I do have strong views on the matter, and I am pretty certain that my views are pretty close to the correct ones.
  - baq 2 years ago
    What’s the point of merging messy branches?
    If your feature branches have independently working commits, go ahead and branch. Most people’s feature branches are developer diaries instead of potentially releasable products. Merging them would make everyone’s lives worse, so of course they get squashed.
    Rebases are a bit weird since they mostly help during bisecting but still require production-ready commits in feature branches.
    [-]
    naasking 2 years ago
    Who is "everyone", in "it would make everyone's lives worse"? Worse in what way, exactly? What quantifiable metric are you using to judge "worse"?
    I agree that some feature branches can be developer diaries, but I'm not sure why throwing away the diary is a good thing, and why the repo should only contain "releasable products". Sounds like an aesthetic preference rather than something concrete.
    [-]
    yencabulator 2 years ago
    > why the repo should only contain "releasable products"
    This is your chance to learn about git bisect.
    baq 2 years ago
    It’s called continuous delivery: I want the main branch to be in a releasable state always. If it isn’t, tickets are raised and people are paged.
    yencabulator 2 years ago
    > still require production-ready commits in feature branches.
    I make a point of requiring that anyway. You gotta learn to build incrementally, repeatedly landing squashed mega-commits into main is just a recipe for future pain. To level up as a software developer, learn to create incremental work. git rebase -i is your friend.
    [-]
    baq 2 years ago
    I’ve had my share of rebase -i and I’ve gotten fed up with it, though GitHub plays no small part in that. Years ago when I worked with gerrit it made more sense. I just stopped caring because of the tool.
- twic 2 years ago
  I still use Mercurial for personal projects! I do think it's superior to Git in many ways, and simpler than Git in many ways, but now i've got used to Git, i increasingly find myself getting stuck in Mercurial. For example, recently i included changes i didn't want in a commit, and found i had no good way out of the situation, whereas in Git i would just reset the branch to HEAD^ and try again.
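  The Git side of that, as a small sketch in a throwaway repository (file names are made up): a commit accidentally includes `scratch.txt`, the branch is reset one commit back, and the commit is redone with only the wanted file.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > README
git add README
git commit -q -m "base"

echo wanted > wanted.txt
echo unwanted > scratch.txt
git add -A
git commit -q -m "Add feature"      # oops: scratch.txt went in too

git reset -q HEAD^                  # move the branch back one commit; files stay on disk
git add wanted.txt
git commit -q -m "Add feature"

tracked=$(git ls-files | tr '\n' ' ')
left_over=$(git status --porcelain)
```

  The default (mixed) reset only moves the branch and clears the index, so nothing in the working directory is lost.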
  [-]
  - coryrc 2 years ago
    hg histedit
    edit, commit
    hg add <smaller set of files>
    hg commit
    Possibly hg histedit, then reverse commit order (down commit).
    [-]
    naasking 2 years ago
    Could also strip while keeping changes and then redo the commits.
- speed_spread 2 years ago
  Git won because of Github and not much else.
  Git was conceived to manage the Linux source tree, which is a problem in a class of its own. Fully decentralized source management requires an extremely flexible system, which conversely precludes any kind of implicit workflow. While this goes against previous SCM (svn, etc.) it also leaves a serious void for regular organizations that just need to keep doing centralized, trunk-based development. Github provides that missing layer in a friendly, accessible manner.
  Ironically, by normalizing its usage, Github takes away a lot of the justification for git itself. You could achieve the same model with mercurial, bazaar, etc. which also have the advantage of a sane CLI.
  [-]
  - baq 2 years ago
    Git was winning already and GitHub just made doubly sure it did.
    Git was winning before GitHub because it was fast. Like, everything else was dog slow in comparison. Yes, hg was slower, sometimes much slower.
    Alternatives were also hard to set up compared to git init and git push over ssh. (Ye olde svn, I’m looking at you.)
- Tyr42 2 years ago
  I mean, this article is talking about how git works behind the scenes to make cherry pick work better and be less likely to fail to apply the patch. Yes there is complexity, but this is actually one of the (rare) cases where git hides complexity successfully.
- andirk 2 years ago
  "With great power comes great responsibility"
  That's version control. You have every choice from straightforward to mind-blowingly time-manipulatingly complex. If you fear the unknown, then just do the out of the box stuff via a GUI like Sourcetree. If you're willing to sweat, then use Sourcetree but sometimes `git rebase -i HEAD~3` and rewrite history.
  And there's no shame in adding a `Fix: Change var to int` to be written into the annals of history forever. Don't worry about it.
- bloopernova 2 years ago
  Try this tutorial: https://alexwlchan.net/a-plumbers-guide-to-git/
  It helped me to more fully understand Git.
- TexasMick 2 years ago
  I basically refuse to fix other people's git issues because 99% of the time they don't know the basics.
- twosdai 2 years ago
  Funny enough. I only use merge because rebase makes no sense to me.
  As an aside, I think git complexity can arise for two reasons. One is legitimate, if you have hundreds or thousands of devs committing to the same repo regularly you need to have some order to the chaos otherwise it just will grind to a halt.
  The other is bullshit. Where some dev / management people believe that the git process needs to be complex for small teams due to the need to justify their place by "promoting safety". In small team environments, simple branching off of main and then committing prs directly to main is more than enough in my opinion.
  But git gives you the tools to go as crazy as you want, so some people do.
  [-]
  - dale_glass 2 years ago
    Rebase is simple. It's "take these commits, and apply them one by one on top of another branch".
    So let's say your main development line is the "main" branch. You start working on "bugfix", so you branch-off from "main". You do this on November 1.
    So you make commits: "Fix bug". Then "Remove debug code", then "Rename X to Y". Now it's 10 days later.
    Say a coworker is doing some other work at the same time, and just merged this into "main". You want to make sure that after that, everything still works properly. Maybe somebody changed something that makes the code you wrote no longer work, because somebody happened to fix a typo in a function's name or something, and you wrote your code against the old name which just ceased to exist.
    So you do "git rebase main". What you're telling Git is "take the commits I made during the last 10 days, and apply them on top of the current main". It's like if you created a new branch then just applied the commits one by one by hand.
    You can do this as often as you want, so if you've got some experimental refactor that takes you a month you can keep on rebasing daily to make sure that if there's any conflicts you have to deal with, you deal with it a little at a time, instead of having to figure out how to square a month of your work with a month of the rest of the project's work once you're done.
    With this, once you merge the branch into "main", you get a "fast-forward" merge, which looks as if you had been working on top of "main" all along. Instead of a single merge commit you see every commit you made on the branch.
    You can also do an "interactive rebase", where instead of having the process automated you can make decisions about each commit individually. This allows you to rename, merge, or discard commits. Like you've got 10 embarrassing looking commits where you just poke stuff at random in hopes that it'll finally work? You can make them disappear entirely, or merge them all into one commit.
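    The scenario above, as a sketch you can run in a throwaway repository. The branch name and the three commit messages come from the description; the file names are made up.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo "line" > code.txt
git add code.txt
git commit -q -m "base"

# branch off main and make the three commits
git checkout -q -b bugfix
for msg in "Fix bug" "Remove debug code" "Rename X to Y"; do
  echo "$msg" >> bugfix.txt
  git add bugfix.txt
  git commit -q -m "$msg"
done

# meanwhile a coworker's change lands on main
git checkout -q main
echo "coworker" > other.txt
git add other.txt
git commit -q -m "Coworker change"

# replay the three bugfix commits on top of the current main
git checkout -q bugfix
git rebase -q main

# main is now an ancestor of bugfix, so merging is a fast-forward
git checkout -q main
git merge -q --ff-only bugfix

subjects=$(git log --format=%s | tr '\n' '|')
merge_commits=$(git rev-list --count --merges HEAD)
```

    The resulting history on main is linear: the coworker's commit followed by the three bugfix commits, with no merge commit.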
    [-]
    recursive 2 years ago
    Not who you're responding to, but I also understand merge more readily than rebase. Your explanation doesn't really help.
    Applying a commit "on another branch" isn't really well defined in my mind. In my understanding, a commit consists of one or more parent commits, and the complete state of a tree, including the entire contents of all the files in it.
    Maybe there are other ways to be confused, but all of the uncertainty has been shoved into that simple phrase "apply a commit on another branch". What does that actually mean?
    [-]
    ecnahc515 2 years ago
    Have you ever used git cherry-pick? Git rebase is as if you were to loop over each commit (bottom to top) and run `git cherry-pick` on each commit separately.
    If you haven't used git cherry-pick, then it is basically a shortcut for `git show $REF | git apply -`, where the shortcut is the simplest form of using git: you have a patch (via git show) and you want to apply it to your local checkout.
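    A sketch of that equivalence in a throwaway repository: one branch is "rebased" by hand with a cherry-pick loop, the other with `git rebase`, and the resulting trees are compared. Branch and file names are made up. (As the linked article's title says, cherry-pick itself is done as a 3-way merge rather than a literal patch apply, so treat the `git show | git apply` picture as an approximation.)

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > README
git add README
git commit -q -m "base"

git checkout -q -b topic
for n in 1 2 3; do
  echo "step $n" >> topic.txt
  git add topic.txt
  git commit -q -m "topic step $n"
done

git checkout -q main
echo other > other.txt
git add other.txt
git commit -q -m "main moves on"

# rebase by hand: start at main, cherry-pick each topic commit, oldest first
git checkout -q -b by-hand main
for c in $(git rev-list --reverse main..topic); do
  git cherry-pick "$c" >/dev/null
done

# the real thing, for comparison
git checkout -q topic
git rebase -q main

tree_by_hand=$(git rev-parse 'by-hand^{tree}')
tree_rebased=$(git rev-parse 'topic^{tree}')
picked=$(git rev-list --count main..by-hand)
```

    Both branches end with identical file contents and three commits on top of main.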
    dale_glass 2 years ago
    > "apply a commit on another branch". What does that actually mean?
    Calculate the difference between the commit and its parent, and repeat that change against another branch.
    So, like git log of your work over 10 days, git show each commit individually, then apply each patch that generates against the latest code.
    pertymcpert 2 years ago
    Of course it's defined, even git knows what it means because that's what cherry-pick does.
    [-]
    recursive 2 years ago
    So cherry-pick is actually merge, and rebase is actually cherry-pick?
    Everything is merge.
    [-]
    pertymcpert 2 years ago
    No, cherry-pick is cherry-pick. Rebase is a composite action which includes one or more cherry-picks.
    jkubicek 2 years ago
    Rebasing isn't that hard, but you also spent 8 paragraphs explaining it. Explaining merging just takes one paragraph: Copy all the changes from the main branch into my branch, if someone changed the same line that I changed, then create a merge conflict and manually figure out which change is right.
    [-]
    AlotOfReading 2 years ago
    Their explanation is one sentence. The rest is context, examples, and useful extensions of the basic idea (e.g. interactive rebases).
    dale_glass 2 years ago
    It's a matter of taste really, there are pros and cons to both.
    Rebase also can produce merge conflicts when doing the rebase. Sometimes that makes resolving them easier and sometimes harder.
  - Swizec 2 years ago
    > In small team environments, simple branching off of main and then committing prs directly to main is more than enough in my opinion.
    Interestingly enough, trunk based development is what all the big teams have arrived at also. It’s the only thing that works.
    If trunk based development doesn’t work for you, it’s because you’re missing feature flags or can’t use them for a specific project. And it always leads to pain.
    [-]
    twosdai 2 years ago
    Yeah I'm a huge fan of trunk based development. It's my preferred operating method for git.
    I know a lot of people like git flow for its ability to create separate envs that can be controlled and tested, however it always felt extremely unnecessary to me when building saas products. Changes should not be long lived in this space.
    I can understand how git flow would be great when the product is an artifact like a docker image, binary, vm image etc... but for managed applications / saas stuff, it's not the best.
  - MereInterest 2 years ago
    I think yours is the sensible path. Rebase can make sense for some specific cases, such as a local branch being organized to present a clearer motivation for an implementation. Rebase of a shared or externally-visible branch should never be done, and rebase as a default behavior is just a recipe for merge conflicts.
    My preferred default would be to only ever merge, with an explicit merge commit at all points. (i.e. Never rebase, never use a fast-forward merge.) The main branch is then a sequence of merge commits, visible with "git log --first-parent". This is the same cleaned-up history that rebase proponents espouse, but doesn't lie about the true history the way a rebase-by-default workflow does.
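    A small sketch of that workflow in a throwaway repository (branch, file, and message names are made up): two feature branches with two commits each are merged with `--no-ff`, and `--first-parent` then shows only the mainline.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main
git config user.name "Example"
git config user.email "example@example.invalid"

echo base > README
git add README
git commit -q -m "base"

for n in 1 2; do
  git checkout -q -b "feature-$n" main
  echo a > "f$n-a.txt"
  git add .
  git commit -q -m "feature $n: step a"
  echo b > "f$n-b.txt"
  git add .
  git commit -q -m "feature $n: step b"
  git checkout -q main
  # always record a merge commit, even when a fast-forward would be possible
  git merge -q --no-ff -m "Merge feature-$n" "feature-$n"
done

all_commits=$(git rev-list --count HEAD)
mainline=$(git log --first-parent --format=%s | tr '\n' '|')
echo "$mainline"
```

    All seven commits stay in the repository, but the first-parent view lists only the base commit and the two merges.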
    [-]
    lanstin 2 years ago
    When I am working mostly alone I never rebase or merge. I have code in various directories on my laptop or ephemeral branches, or more likely stashes if I have more than one thing in flight. When working with others, I will pull a branch while I work on something especially once I am ready for reviews or comments. If main progresses substantially while I work on it, I will rebase and then close and reopen any pending PRs. Especially the first time I start working on a repo, when I want a low key low pressure approach. Once they take some of my work and seem ok with me in general, I will switch to more frequent branchless PRs.
- darklycan51 2 years ago
  Git seems extremely over complicated for no reason, half of the features seem to exist because the system is so easy to mess up.
  [-]
  - yencabulator 2 years ago
    "I've only ever used a hammer so other tools should not exist."
    Supporting remote-use CVS for a distributed multi-team development group was pure hell, and regularly had me going to lunch while waiting for a simple command to finish. I'm glad git's features exist.
- juped 2 years ago
  Instead of hijacking this post to start a flame war, why not promote mercurial without hijacking a post by drawing a parallel to mercurial's implementation of the same thing described in the article, highlighting commonalities and distinctions in the approaches?
  [-]
  - vvpan 2 years ago
    I probably should not have mentioned mercurial. I do not really know it anymore, but it is somewhat more UX focused than git. I think my original argument stands, especially considering the comments - too many ways to use Git, while most people need only basics.
test1235 2 years ago
```
    /*
     * There's probably some smart way to do this, but I'll leave
     * that to the smart and beautiful people. I'm simple and stupid.
     */
```
I love that this sort of thing still exists in enterprise software
[-]
- yencabulator 2 years ago
  Git is pretty much the opposite of "enterprise software" by origin.