• smj-edison 2 minutes ago
    Regardless of whether this means AGI has been achieved or not, I think this is really exciting since we could theoretically have agents look through papers and work on finding simpler solutions. The complexity of math is dizzying, so I think anything that can be done to simplify it would be amazing (I think of this essay[1]), especially if it frees up mathematicians' time to focus even more on the state of the art.

    [1] https://distill.pub/2017/research-debt/

  • outlace 3 hours ago
    The headline may make it seem like AI just discovered some new result in physics all on its own, but reading the post, humans started off trying to solve some problem, it got complex, and GPT simplified it and found a solution with the simpler representation. It took 12 hours for GPT Pro to do this. In my experience LLMs can make new things when they are some linear combination of existing things, but I haven’t been able to get them to do something totally out of distribution yet from first principles.
    • CGMthrowaway 3 hours ago
      This is the critical bit (paraphrasing):

      Humans have worked out the amplitudes for integer n up to n = 6 by hand, obtaining very complicated expressions, which correspond to a “Feynman diagram expansion” whose complexity grows superexponentially in n. But no one has been able to greatly reduce the complexity of these expressions, providing much simpler forms. And from these base cases, no one was then able to spot a pattern and posit a formula valid for all n. GPT did that.

      Basically, they used GPT to refactor a formula and then generalize it for all n. Then verified it themselves.

      I think this was all already figured out in 1986 though: https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.56... see also https://en.wikipedia.org/wiki/MHV_amplitudes

      • godelski 2 hours ago

          > I think this was all already figured out in 1986 though
        
        They cite that paper in the third paragraph...

          Naively, the n-gluon scattering amplitude involves order n! terms. Famously, for the special case of MHV (maximally helicity violating) tree amplitudes, Parke and Taylor [11] gave a simple and beautiful, closed-form, single-term expression for all n.
        
        It also seems to be a main talking point.

        I think this is a prime example of how easy it is to look at things from a high level, think something is already solved, and draw an erroneous conclusion for lack of domain expertise. Classic "Reviewer 2" move. That said, I'm not a domain expert, so if there is no novelty over Parke and Taylor, I'm pretty sure this will get thrashed in review.

        • CGMthrowaway 2 hours ago
          You're right. Parke & Taylor showed the simplest nonzero amplitudes have two minus helicities while one-minus amplitudes vanish (generically). This paper claims that vanishing theorem has a loophole: a new hidden sector exists, and one-minus amplitudes are secretly there, but distributional.
        • nyc_data_geek1 55 minutes ago
          So it's a garbage headline, from an AI vendor, trying to increase hype and froth around what they are selling, when in fact the "new result" has been a solved problem for almost 40 years? Am I getting that right?
          • singpolyma3 7 minutes ago
            No
          • throwuxiytayq 46 minutes ago
            you’re not, and you might have a slight reading comprehension problem
            • jrflowers 22 minutes ago
              Surely you can explain it yourself, then? A person such as yourself with normal reading comprehension would be able to analyze and synthesize this on your own, without asking an LLM for help, no?
      • btown 2 hours ago
        It bears repeating that modern LLMs are incredibly capable, and relentless, at solving problems that have a verification test suite. It seems like this problem did (at least for some finite subset of n)!

        This result, by itself, does not generalize to open-ended problems, though, whether in business or in research in general. Discovering the specification to build is often the majority of the battle. LLMs aren't bad at this, per se, but they're nowhere near as reliably groundbreaking as they are on verifiable problems.

      • lupsasca 1 hour ago
        That paper from the 80s (which is cited in the new one) is about "MHV amplitudes" with two negative-helicity gluons, so "double-minus amplitudes". The main significance of this new paper is to point out that "single-minus amplitudes" which had previously been thought to vanish are actually nontrivial. Moreover, GPT-5.2 Pro computed a simple formula for the single-minus amplitudes that is the analogue of the Parke-Taylor formula for the double-minus "MHV" amplitudes.
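        For reference, the standard Parke-Taylor formula for the color-ordered tree-level MHV ("double-minus") gluon amplitude is shown below in spinor-helicity notation, with gluons i and j carrying negative helicity and all others positive. Here ⟨a b⟩ denotes the angle-bracket spinor product of particles a and b; coupling constants, overall phase conventions, and the momentum-conserving delta function are omitted. The new single-minus analogue from the paper is not reproduced here.

```latex
A_n^{\text{tree}}\left(1^+,\dots,i^-,\dots,j^-,\dots,n^+\right)
  = \frac{\langle i\,j\rangle^{4}}
         {\langle 1\,2\rangle\,\langle 2\,3\rangle\cdots\langle n-1\,n\rangle\,\langle n\,1\rangle}
```

        It is a single term for every n, compared with the order n! terms of the naive Feynman-diagram expansion mentioned in the paper.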
      • woeirua 3 hours ago
        You should probably email the authors if you think that's true. I highly doubt they didn't do a literature search first though...
        • emp17344 2 hours ago
          You should be more skeptical of marketing releases like this. This is an advertisement.
        • godelski 2 hours ago
          They also reference Parke and Taylor. Several times...
        • suuuuuuuu 2 hours ago
          Don't underestimate the willingness of physicists to skimp on literature review.
          • baq 2 hours ago
            After last month’s handling of the Erdos problems by LLMs, everyone writing papers, even physicists, should be aware by now that literature checks are approximately free.
      • ericmay 3 hours ago
        Still pretty awesome though, if you ask me.
        • fsloth 3 hours ago
          I think even a “non-intelligent” solver like Mathematica is cool, so hell yes, this is cool.
        • _aavaa_ 2 hours ago
          Big difference between “derives new result” and “reproduces something likely in its training dataset”.
      • torginus 2 hours ago
        I'm not sure whether GPT's ability goes beyond a formal math package's in this regard, or whether it's just way more convenient to ask ChatGPT than to use that software.
    • randomtoast 3 hours ago
      > but I haven’t been able to get them to do something totally out of distribution yet from first principles

      Can humans actually do that? Sometimes it appears as if we have made a completely new discovery. However, if you look more closely, you will find that many events and developments led up to this breakthrough, and that it is actually an improvement on something that already existed. We are always building on the shoulders of giants.

      • davorak 2 hours ago
        > Can humans actually do that?

        From my reading yes, but I think I am likely reading the statement differently than you are.

        > from first principles

        Doing things from first principles is a known strategy, so is guess and check, brute force search, and so on.

        For an LLM to follow a first-principles strategy, I would expect it to take in a body of research, come up with some first principles or guess at them, then iteratively construct a tower of reasoning/findings/experiments.

        In my mind, constructing a solid tower is where existing models are currently improving, but when I try the OpenAI or Anthropic chat interfaces, neither does a good job for long, at least not independently.

        Humans also often have a hard time with this. In general it is not a skill that everyone has, and I think you can be a successful scientist without ever heavily developing first-principles problem solving.

      • dotancohen 3 hours ago
        Relativity comes to mind.

        You could nitpick a rebuttal, but no matter how many people you give credit to, general relativity was a completely novel idea when it was proposed. I'd argue for special relativity as well.

        • Paracompact 2 hours ago
          I am not a scientific historian, or even a physicist, but IMO relativity has a weak case for being a completely novel discovery. Critique of absolute time and space of Newtonian physics was already well underway, and much of the methodology for exploring this relativity (by way of gyroscopes, inertial reference frames, and synchronized mechanical clocks) was already in parlance. Many of the phenomena that relativity would later explain under a consistent framework already had independent quasi-explanations hinting at the more universal theory. Poincaré probably came the closest to unifying everything before Einstein:

          > In 1902, Henri Poincaré published a collection of essays titled Science and Hypothesis, which included: detailed philosophical discussions on the relativity of space and time; the conventionality of distant simultaneity; the conjecture that a violation of the relativity principle can never be detected; the possible non-existence of the aether, together with some arguments supporting the aether; and many remarks on non-Euclidean vs. Euclidean geometry.

          https://en.wikipedia.org/wiki/History_of_special_relativity

          Now, if I had to pick a major idea that seemed to drop fully-formed from the mind of a genius with little precedent to have guided him, I might personally point to Galois theory (https://en.wikipedia.org/wiki/Galois_theory). (Ironically, though, I'm not as familiar with the mathematical history of that time and I may be totally wrong!)

          • _alternator_ 2 hours ago
            Right on with special relativity: Lorentz was also developing the theory and was a bit sour that Einstein got so much credit. Einstein basically said “what if special relativity were true for all of physics”, not just electromagnetism, and out dropped E=mc^2. It was a bold step but not unexplainable.

            As for general relativity, he spent several years working to learn differential geometry (which was well-developed mathematics at the time, but looked like abstract nonsense to most physicists). I’m not sure how he was turned on to this theory being applicable to gravity, but my guess is that it was motivated by some symmetry ideas. (It always comes down to symmetry.)

          • godelski 1 hour ago

              > Critique of absolute time and space of Newtonian physics was already well underway
            
            This only means Einstein was not alone, it does not mean the results were in distribution.

              > Many of the phenomena that relativity would later explain under a consistent framework already had independent quasi-explanations hinting at the more universal theory.
            
            And this comes about because people are looking at edge cases and trying to solve things. Sometimes people come up with wild and crazy solutions. Sometimes those solutions look obvious after they're known (though not prior to being known, otherwise it would have already been known...) and others don't.

            Your argument essentially claims that because others were pursuing similar directions, the result must be in distribution. I'll use a classic statistics-style framing. Suppose we have a bag with n red balls and p blue balls. Someone walks over and says "look, I have a green ball", someone else walks over and says "I have a purple one", and someone else comes over and says "I have a pink one!". None of those balls came from our bag. There are still n+p balls in it, and they are still all red or blue, even though there are now n+p+3 balls that we know of.

              > I am not a [...] physicist
            
            I think this is probably why you don't have the resolution to see the distinctions. Without a formal study of physics it is really hard to differentiate these kinds of propositions. It can be very hard even with that education. So be careful to not overly abstract and simplify concepts. It'll only deprive you of a lot of beauty and innovation.
          • bananaflag 2 hours ago
            From that article:

            > The quintic was almost proven to have no general solutions by radicals by Paolo Ruffini in 1799, whose key insight was to use permutation groups, not just a single permutation.

            Thing is, I am usually the kind of person who defends the idea of a lone genius. But I also believe there is a continuous spectrum, no gaps, from the village idiot to Einstein and beyond.

            Let me introduce, just for fun, not for the sake of any argument, another idea from math which I think really came out of the blue, to the degree that it's still considered an open problem to write an exposition about it, since you cannot smoothly link it to anything else: forcing.

        • johnfn 3 hours ago
          Even if I grant you that, surely we’ve moved the goalposts a bit if we’re saying the only thing we can think of that AI can’t do is the life’s work of a man whose last name is literally synonymous with genius.
        • poplarsol 2 hours ago
          That's not exactly true. Lorentz contraction is a clear antecedent to special relativity.
        • lamontcg 2 hours ago
          Not really. Pretty sure I read recently that Newton appreciated that his theory was non-local and didn't like what Einstein later called "spooky action at a distance". The Lorentz transform was also known from 1887. Time dilation was understood from 1900. Poincaré figured out in 1905 that it was a mathematical group. Einstein put a bow on it all by figuring out that you could derive it from the principle of relativity and keeping the speed of light constant in all inertial reference frames.

          I'm not sure about GR, but I know that it is built on the foundations of differential geometry, which Einstein definitely didn't invent (I think that's the source of his "I assure you whatever your difficulties in mathematics are, that mine are much greater" quote because he was struggling to understand Hilbert's math).

          And really Cauchy, Hilbert, and those kinds of mathematicians I'd put above Einstein in building entirely new worlds of mathematics...

          • Paracompact 2 hours ago
            Agree with you everywhere. Although I prefer the quote:

            "Since the mathematicians have invaded the theory of relativity, I do not understand it myself anymore."

            :)

      • tjr 2 hours ago
        Go enough shoulders down, and someone had to have been the first giant.
        • nextaccountic 2 hours ago
          Probably not Homo sapiens; other hominids older than us developed a lot of technology.
        • pram 2 hours ago
          Pythagoras is the turtle.
          • TheCycoONE 1 hour ago
            Pythagoras learned from Egyptians that have been largely erased by euro/western narratives of superiority.
      • CooCooCaCha 2 hours ago
        Depends on what you think is valid.

        The process you’re describing is humans extending our collective distribution through a series of smaller steps. That’s what the “shoulders of giants” means. The result is we are able to do things further and further outside the initial distribution.

        So it depends on if you’re comparing individual steps or just the starting/ending distributions.

      • godelski 1 hour ago

          > Can humans actually do that? 
        
        Yes

        Seriously, think about it for a second...

        If that were true, science should have accelerated a lot faster. Science would have happened differently, and researchers would have optimized for ingesting as many papers as they can.

        Dig deep into things and you'll find that there are often leaps of faith that need to be made. Guesses, hunches, and outright conjectures. Remember, there are paradigm shifts that happen. There are plenty of things in physics (including classical) that cannot be determined from observation alone. Or more accurately, cannot be differentiated from alternative hypotheses through observation alone.

        I think the problem is that we generally teach science very linearly, as if things easily follow. In reality there are generally constant iterative improvements that look more like a plateau, and then there are these leaps. They happen for a variety of reasons, but no paradigm shift would be contentious if it were obvious and clearly in distribution. It would always be met with the same response that typical iterative improvements get: "well that's obvious, is this even novel enough to be published? Everybody already knew this" (hell, look at the response to the top comment and my reply... that's classic "Reviewer #2" behavior). If it were always in distribution, progress would be nearly frictionless. Again, in how we teach the history of science we make an error with stories like Galileo's, as if The Church was the only opposition. There were many scientists who objected, and on reasonable grounds. It is also a mistake we continually make in how we view the world. If you stick with "it works", you'll end up with a geocentric model rather than a heliocentric one. It is true that the geocentric model had limits, but so did the original heliocentric model, and that's the reason it took time to be adopted.

        By viewing things at too high of a level we often fool ourselves. While I'm criticizing how we teach I'll also admit it is a tough thing to balance. It is difficult to get nuanced and in teaching we must be time effective and cover a lot of material. But I think it is important to teach the history of science so that people better understand how it actually evolves and how discoveries were actually made. Without that it is hard to learn how to actually do those things yourself, and this is a frequent problem faced by many who enter PhD programs (and beyond).

          > We are always building on the shoulders of giants.
        
        And it still is. You can still lean on others while presenting things that are highly novel. These are not in disagreement.

        It's probably worth reading The Unreasonable Effectiveness of Mathematics in the Natural Sciences. It might seem obvious now but read carefully. If you truly think it is obvious that you can sit in a room armed with only pen and paper and make accurate predictions about the world, you have fooled yourself. You have not questioned why this is true. You have not questioned when this actually became true. You have not questioned how this could be true.

        https://www.hep.upenn.edu/~johnda/Papers/wignerUnreasonableE...

          You are greater than the sum of your parts
    • emil-lp 3 hours ago
      "GPT did this". Authored by Guevara (Institute for Advanced Study), Lupsasca (Vanderbilt University), Skinner (University of Cambridge), and Strominger (Harvard University).

      Probably not something that the average GI Joe would be able to prompt their way to...

      I am skeptical until they show the chat log leading up to the conjecture and proof.

      • Sharlin 3 hours ago
        I'm a big LLM sceptic but that's… moving the goalposts a little too far. How could an average Joe even understand the conjecture enough to write the initial prompt? Or do you mean that experts would give him the prompt to copy-paste, and hope that the proverbial monkey can come up with a Henry V? At the very least posit someone like a grad student in particle physics as the human user.
        [-]
        • buttered_toast 3 hours ago
          I would interpret it as implying that the result involved a lot more hand-holding than what is let on.

          Was the initial conjecture based on leading info from the other authors or was it simply the authors presenting all information and asking for a conjecture?

          Did the authors know that there was a simpler means of expressing the conjecture and lead GPT to its conclusion, or did it spontaneously do so on its own after seeing the hand-written expressions?

          These aren't my personal views, but there is some handwaving about the process in a way that reads as if this was all spontaneous involvement on GPT's end.

          But regardless, a result is a result so I'm content with it.

          • lupsasca 1 hour ago
            Hi I am an author of the paper. We believed that a simple formula should exist but had not been able to find it despite significant effort. It was a collaborative effort but GPT definitely solved the problem for us.
            • etraql 19 minutes ago
              Do you also work at OpenAI? A comment pointing that out was flagged by the LLM marketers.
            • buttered_toast 1 hour ago
              Oh that's really cool, I am not versed in physics by any means, can you explain how you believed there to be a simple formula but were unable to find it? What would lead you to believe that instead of just accepting it at face value?
              • lupsasca 1 hour ago
                There are closely related "MHV amplitudes" which naively obey a really complicated formula, but for which there famously also exists a much simpler "Parke-Taylor formula". Alfredo had derived a complicated expression for these new "single-minus amplitudes" and we were hoping we could find an analogue of the simpler "Parke-Taylor formula" for them.
                • buttered_toast 60 minutes ago
                  Thank you for taking the time to reply, I see you might have already answered this elsewhere so it's much appreciated.
        • lamontcg 2 hours ago
          That's kinda the whole point.

          SpaceX can use an optimization algorithm to hoverslam a rocket booster, but the optimization algorithm didn't really figure it out on its own.

          The optimization algorithm was used by human experts to solve the problem.

          • Sharlin 2 hours ago
            In this case there certainly were experts doing hand-holding. But simply being able to ask the right question isn't too much to ask, is it? If it had been merely a grad student or even a PhD student who had asked ChatGPT to figure out the result, and ChatGPT had done that, even interactively with the student, this would be huge news. But an average person? Expecting LLMs to transcend the GIGO principle is a bit too much.
        • slopusila 3 hours ago
          hey, GPT, solve this tough conjecture I've read about on Quanta. make no mistakes
          • co_king_3 2 hours ago
            [dead]
            • terminalbraid 2 hours ago
              "Hey GPT thanks for the result. But is it actually true?"
      • jmalicki 1 hour ago
        "Grad Student did this". Co-authored by <Famous advisor 1>, <Famous advisor 2>, <Famous advisor 3>.

        Is this so different?

      • hgfda 2 hours ago
        [dead]
      • famouswaffles 3 hours ago
        The paper has all those prominent institutions who acknowledge the contribution, so realistically, why would you be skeptical?
        • kristopolous 3 hours ago
          they probably also acknowledge pytorch, numpy, R ... but we don't attribute those tools as the agent who did the work.

          I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.

          • famouswaffles 3 hours ago
            I don't see the authors of those libraries getting a credit on the paper, do you?

            >I know we've been primed by sci-fi movies and comic books, but like pytorch, gpt-5.2 is just a piece of software running on a computer instrumented by humans.

            Sure

          • name_taken_duh 3 hours ago
            And we are just a system running on carbon-based biology in our physics computer run by whomever. What makes us special, to say that we are different than GPT-5.2?
            • palmotea 3 hours ago
              > And we are just a system running on carbon-based biology in our physics computer run by whomever. What makes us special, to say that we are different than GPT-5.2?

              Do you really want to be treated like an old PC (dismembered, stripped for parts, and discarded) when your boss is done with you (i.e. not treated specially compared to a computer system)?

              But I think if you want a fuller answer, you've got a lot of reading to do. It's not like you're the first person in the world to ask that question.

            • kristopolous 2 hours ago
              It's always a value decision. You can say shiny rocks are more important than people and worth murdering over.

              Not an uncommon belief.

              Here you are saying you personally value a computer program more than people

              It exposes a value that you personally hold and that's it

              That is separate from the material reality that all this AI stuff is ultimately just computer software... It's an epistemological tautology in the same way that say, a plane, car and refrigerator are all just machines - they can break, need maintenance, take expertise, can be dangerous...

              LLMs haven't broken the categorical constraints - you've just been primed to think such a thing is supposed to be different through movies and entertainment.

              I hate to tell you but most movie AIs are just allegories for institutional power. They're narrative devices about how callous and indifferent power structures are to our underlying shared humanity

        • Refreeze5224 3 hours ago
          Their point is, would you be able to prompt your way to this result? No. Already trained physicists working at world-leading institutions could. So what progress have we really made here?
          • famouswaffles 3 hours ago
            It's a stupid point then. Are you able to work with a world leading physicist to any significant degree? No
            • emil-lp 2 hours ago
              It's like saying: calculator drives new result in theoretical physics

              (In the hands of leading experts.)

              • famouswaffles 2 hours ago
                No, it's not like saying that at all, which is why OpenAI have a credit on the paper.
                • camdenreslink 2 hours ago
                  OpenAI have a credit on the paper because it is marketing.
                • not_kurt_godel 58 minutes ago
                  And even if it were, calculators (computers) were world-changing technology when they were new.
    • stouset 2 hours ago
      When chess engines were first developed, they were strictly worse than the best humans. After many years of development, they became helpful to even the best humans even though they were still beatable (1985–1997). Eventually they caught up and surpassed humans but the combination of human and computer was better than either alone (~1997–2007). Since then, humans have been more or less obsoleted in the game of chess.

      Five years ago we were at Stage 1 with LLMs with regard to knowledge work. A few years later we hit Stage 2. We are currently somewhere between Stage 2 and Stage 3 for an extremely high percentage of knowledge work. Stage 4 will come, and I would wager it's sooner rather than later.

      • TGower 2 hours ago
        With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth. It's worth keeping in mind just how little we understand about LLM capability scaling. Ask 10 different AI researchers when we will get to Stage 4 for something like programming and you'll get wild guesses or an honest "we don't know".
        • blt 10 minutes ago
          And their predictions about Go were wrong, because they thought the algorithm would forever be α-β pruning with a weak value heuristic
        • stouset 1 hour ago
          That is not what happened with chess engines. We didn’t just throw better hardware at it, we found new algorithms, improved the accuracy and performance of our position evaluation functions, discovered more efficient data structures, etc.

          People have been downplaying LLMs since the first buzzword garbage scientific paper made its way past peer review and into publication. And yet they keep getting better and better to the point where people are quite literally building projects with shockingly little human supervision.

          By all means, keep betting against them.

        • baq 2 hours ago
          Chess grandmasters are living proof that it’s possible to reach grandmaster level in chess on 20W of compute. We’ve got orders of magnitude of optimizations to discover in LLMs and/or future architectures, both software and hardware and with the amount of progress we’ve got basically every month those ten people will answer ‘we don’t know, but it won’t be too long’. Of course they may be wrong, but the trend line is clear; Moore’s law faced similar issues and they were successively overcome for half a century.

          IOW respect the trend line.

        • NitpickLawyer 1 hour ago
          > With a chess engine, you could ask any practitioner in the 90's what it would take to achieve "Stage 4" and they could estimate it quite accurately as a function of FLOPs and memory bandwidth.

          And the same practitioners said right after Deep Blue that Go is NEVER gonna happen. Too large. The search space is just not computable. We'll never do it. And yeeeet...

      • empath75 23 minutes ago
        We are already at Stage 3 for software development, and arguably Stage 4.
      • bluecalm 2 hours ago
        The evolution was also interesting: first the engines were amazing tactically but pretty bad strategically so humans could guide them. With new NN based engines they were amazing strategically but they sucked tactically (first versions of Leela Chess Zero). Today they closed the gap and are amazing at both strategy and tactics and there is nothing humans can contribute anymore - all that is left is to just watch and learn.
    • slibhb 59 minutes ago
      > In my experience LLMs can make new things when they are some linear combination of existing things, but I haven’t been able to get them to do something totally out of distribution yet from first principles.

      What's the distinction between "first principles" and "existing things"?

      I'm sympathetic to the idea that LLMs can't produce path-breaking results, but I think that's true only for a strict definition of path-breaking (one that is quite rare for humans too).

    • hellisad 2 hours ago
      Hmm, this feels a bit trivializing; we don't know exactly how difficult it was to come up with the general set of equations mentioned, starting from where the humans left off.

      I can claim some knowledge of physics from my degree: typically the easy part is coming up with complex, dirty equations that work under special conditions; the hard part is simplifying them into something elegant, 'natural' and general.

      Also, "LLMs can make new things when they are some linear combination of existing things"

      doesn't really mean much. What is a linear combination of things? You first have to define precisely what a thing is.

    • getnormality 58 minutes ago
      Insert perfunctory HN reply of "but do humans ever do anything totally out of distribution from first principles?"

      (This is deep)

    • tedd4u 45 minutes ago
      What does a 12-hour solution cost an OpenAI customer?
    • epolanski 3 hours ago
      Serious question: I often hear about "letting the LLM cook for hours", but how do you do that in practice, and how does it manage its own context? How does it not get lost after so many tokens?
      • javier123454321 3 hours ago
        From what I've seen, it's a process of compacting the session once it reaches some limit, which basically means summarizing all the previous work and feeding it as the initial prompt for the next session.
      • lovecg 3 hours ago
        I’m guessing, and would love for someone with first-hand knowledge to comment. But my guess is it’s some combination of trying many different approaches in parallel (each in a fresh context) and then picking the one that works, and splitting up the task into sequential steps, where the output of one step is condensed and used as an input to the next step (possibly with human steering between steps).
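Both guesses can be sketched in a few lines. This is only an illustration under assumptions: `attempt`, `check` and `condense` are hypothetical placeholders for a fresh-context model run, a verifier (tests, a proof checker, a human) and a summarizer; nothing here reflects OpenAI's actual scaffold.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def best_of_n(
    attempt: Callable[[int], str],  # hypothetical: one fresh-context run, seeded differently
    check: Callable[[str], bool],   # hypothetical verifier, e.g. tests or a proof checker
    n: int,
) -> Optional[str]:
    """Run n independent attempts in parallel and return the first one that passes check."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        results = list(pool.map(attempt, range(n)))
    return next((r for r in results if check(r)), None)


def staged(
    task: str,
    stages: List[Callable[[str], str]],
    condense: Callable[[str], str],  # hypothetical summarizer between steps
) -> str:
    """Run steps sequentially, condensing each step's output before the next step sees it."""
    carry = task
    for stage in stages:
        carry = condense(stage(carry))
    return carry
```

Human steering would slot in naturally as another `stage` or as the `check` function.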
    • ctoth 3 hours ago
      In my experience humans can make new things when they are some linear combination of existing things but I haven’t been able to get them to do something totally out of distribution yet from first principles[0].

      [0]: https://slatestarcodex.com/2019/02/19/gpt-2-as-step-toward-g...

    • bpodgursky 3 hours ago
      I don't want to be rude but like, maybe you should pre-register some statement like "LLMs will not be able to do X" in some concrete domain, because I suspect your goalposts are shifting without you noticing.

      We're talking about significant contributions to theoretical physics. You can nitpick but honestly go back to your expectations 4 years ago and think — would I be pretty surprised and impressed if an AI could do this? The answer is obviously yes, I don't really care whether you have a selective memory of that time.

      • RandomLensman 3 hours ago
        I don't know enough about theoretical physics: what makes it a significant contribution there?
        • terminalbraid 3 hours ago
          It's a nontrivial calculation valid for a class of forces (e.g. QCD) and apparently a serious simplification to a specific calculation that hadn't been completed before. But for what it's worth, I spent a good part of my physics career working in nucleon structure and have not run across the term "single minus amplitudes" in my memory. That doesn't necessarily mean much as there's a very broad space work like this takes place in and some of it gets extremely arcane and technical.

          One way I gauge the significance of a theory paper is by the measured quantities and physical processes it would contribute to. I see none discussed here, which should tell you how deep into math it is. I personally would not have stopped to read it on my arXiv catch-up:

          https://arxiv.org/list/hep-th/new

          Maybe to characterize it better, physicists were not holding their breath waiting for this to get done.

        • epolanski 3 hours ago
          Not every contribution has immediate impact.
          • terminalbraid 3 hours ago
            That doesn't answer the question. That statement just admits "maybe" which isn't helpful or insightful to answering it.
      • outlace 3 hours ago
        I never said LLMs will not be able to do X. I gave my summary of the article and my anecdotal experiences with LLMs. I have no LLM ideology. We will see what tomorrow brings.
      • nozzlegear 3 hours ago
        > We're talking about significant contributions to theoretical physics.

        Whoever wrote the prompts and guided ChatGPT made significant contributions to theoretical physics. ChatGPT is just a tool they used to get there. I'm sure AI-bloviators and pelican bike-enjoyers are all quite impressed, but the humans should be getting the research credit for using their tools correctly. Let's not pretend the calculator doing its job as a calculator at the behest of the researcher is actually a researcher as well.

        • famouswaffles 3 hours ago
          If this worked for 12 hours to derive the simplified formula along with its proof, then it guided itself and made significant contributions by any useful definition of the word, hence OpenAI having an author credit.
          • nozzlegear 3 hours ago
            > hence OpenAI having an author credit.

            How much precedent is there for machines or tools getting an author credit in research? Genuine question, I don't actually know. Would we give an author credit to, e.g., a chimpanzee if it happened to circle the right page of a textbook while working with researchers, leading them to a eureka moment?

            • floxy 3 hours ago
              >How much precedent is there for machines or tools getting an author credit in research?

              For a datum of one, the mathematician Doron Zeilberger gives credit to his computer Shalosh B. Ekhad on select papers.

              https://medium.com/@miodragpetkovic_24196/the-computer-a-mys...

              https://sites.math.rutgers.edu/~zeilberg/akherim/EkhadCredit...

              https://sites.math.rutgers.edu/~zeilberg/pj.html

              • nozzlegear 2 hours ago
                Interesting (and an interesting name for the computer too), thanks!
            • steveklabnik 2 hours ago
              Not exactly the same thing, but I know of at least two professors that would try to list their cats as co-authors:

              https://en.wikipedia.org/wiki/F._D._C._Willard

              https://en.wikipedia.org/wiki/Yuri_Knorozov

            • kuboble 3 hours ago
              I have seen stuff like "you can use my program if you will make me a co-author".

              That usually comes with some support, too.

            • famouswaffles 3 hours ago
              >How much precedent is there for machines or tools getting an author credit in research?

              Well, what do you think? Do the authors (or a single symbolic one) of pytorch or numpy or insert <very useful software> typically get credit on papers that utilize them heavily? Well, clearly these prominent institutions thought GPT's contribution significant enough to warrant an OpenAI credit.

              >Would we give an author credit to e.g. a chimpanzee if it happened to circle the right page of a text book while working with researchers, leading them to a eureka moment?

              Cool story. Good thing that's not what happened, so maybe we can do away with all these pointless non sequiturs, yeah? If you want to have a good-faith argument, you're welcome to it, but if you're going to go on these nonsensical tangents, it's best we end this here.

              • nozzlegear 3 hours ago
                > Well, what do you think? Do the authors (or a single symbolic one) of pytorch or numpy or insert <very useful software> typically get credit on papers that utilize them heavily?

                I don't know! That's why I asked.

                > Well, clearly these prominent institutions thought GPT's contribution significant enough to warrant an OpenAI credit.

                Contribution is a fitting word, I think, and well chosen. I'm sure OpenAI's contribution was quite large, quite green and quite full of Benjamins.

                > Cool story. Good thing that's not what happened, so maybe we can do away with all these pointless non sequiturs, yeah? If you want to have a good-faith argument, you're welcome to it, but if you're going to go on these nonsensical tangents, it's best we end this here.

                It was a genuine question. What's the difference between a chimpanzee and a computer? Neither are humans and neither should be credited as authors on a research paper, unless the institution receives a fat stack of cash I guess. But alas Jane Goodall wasn't exactly flush with money and sycophants in the way OpenAI currently is.

                • famouswaffles 2 hours ago
                  >I don't know! That's why I asked.

                  If you don't read enough papers to immediately realize it is an extremely rare occurrence, then what are you even doing? Why are you making comments like you have the slightest clue of what you're talking about, including insinuating the credit was what... the result of bribery?

                  You clearly have no idea what you're talking about. You've decided to accuse prominent researchers of essentially academic fraud with no proof because you got butthurt about a credit. You think your opinion on what should and shouldn't get credited matters? Okay.

                  I've wasted enough time talking to you. Good Day.

                  • nozzlegear 2 hours ago
                    Do I need to be credentialed to ask questions or point out the troubling trend of AI grift maxxers like yourself helping Sam Altman and his cronies further the myth of AGI by pretending a machine is a researcher deserving of a research credit? This is marketing, pure and simple. Close the simonw substack for a second and take an objective view of the situation.
            • slopusila 3 hours ago
              it's called ethics and research integrity. not crediting GPT would be a form of misrepresentation
              • nozzlegear 3 hours ago
                Would it? I think there's a difference between "the researchers used ChatGPT" and "one of the researchers literally is ChatGPT." The former is the truth, and the latter is the misrepresentation in my eyes.

                I have no problem with the former and agree that authors/researchers must note when they use AI in their research.

                • slopusila 2 hours ago
                  now you are debating exactly how GPT should be credited. idk, I'm sure the field will make up some guidance

                  for this particular paper it seems the humans were stuck, and only AI thinking unblocked them

                  • nozzlegear 2 hours ago
                    > now you are debating exactly how GPT should be credited. idk, I'm sure the field will make up some guidance

                    In your eyes maybe there's no difference. In my eyes, big difference. Tools are not people, let's not further the myth of AGI or the silly marketing trend of anthropomorphizing LLMs.

        • bpodgursky 3 hours ago
          If a helicopter drops someone off on the top of Mount Everest, it's reasonable to say that the helicopter did the work and is not just a tool they used to hike up the mountain.
          • nozzlegear 3 hours ago
            Who piloted the helicopter in this scenario, a human or chatgpt? You'd say the pilot dropped them off in a helicopter. The helicopter didn't fly itself there.
            • bpodgursky 3 hours ago
              “They have chosen cunning instead of belief. Their prison is only in their minds, yet they are in that prison; and so afraid of being taken in that they cannot be taken out.”

              ― C.S. Lewis, The Last Battle

              • nozzlegear 2 hours ago
                "For me, it is far better to grasp the universe as it really is than to persist in delusion, however satisfying and reassuring."

                — Carl Sagan

                • bpodgursky 2 hours ago
                  I read the narnia series many times as a kid and this one stuck with me, I didn't prompt for it.

                  I have no real way to demonstrate that I'm telling the truth, but I am ¯\_(ツ)_/¯

                  • nozzlegear 1 hour ago
                    Sorry for the assumption. For what it's worth, I read one of Sagan's books last year, but pulled the quote from Goodreads :P
    • bottlepalm 3 hours ago
      Is every new thing not just combinations of existing things? What does out of distribution even mean? What advancement has ever made that there wasn’t a lead up of prior work to it? Is there some fundamental thing that prevents AI from recombining ideas and testing theories?
      • fpgaminer 3 hours ago
        > Is every new thing not just combinations of existing things?

        If all ideas are recombinations of old ideas, where did the first ideas come from? And wouldn't the complexity of ideas be thus limited to the combined complexity of the "seed" ideas?

        I think it's more fair to say that recombining ideas is an efficient way to quickly explore a very complex, hyperdimensional space. In some cases that's enough to land on new, useful ideas, but not always. A) the new, useful idea might be _near_ the area you land on, but not exactly at. B) there are whole classes of new, useful ideas that cannot be reached by any combination of existing "idea vectors".

        Therefore there is still the necessity to explore the space manually, even if you're using these idea vectors to give you starting points to explore from.

        All this to say: Every new thing is a combination of existing things + sweat and tears.

        The question everyone has is, are current LLMs capable of the latter component. Historically the answer is _no_, because they had no real capacity to iterate. Without iteration you cannot explore. But now that they can reliably iterate, and to some extent plan their iterations, we are starting to see their first meaningful, fledgling attempts at the "sweat and tears" part of building new ideas.

        • drdeca 34 minutes ago
          Well, what exactly an “idea” is might be a little unclear, but I don’t think it’s clear that the complexity of ideas that result from combining previously obtained ideas would be bounded by the complexity of the ideas they are combinations of.

          Any countable group is a quotient of a subgroup of the free group on two elements, iirc.

          There’s also the concept of “semantic primes”. Here is a not-quite correct oversimplification of the idea: Suppose you go through the dictionary and, one word at a time, pick a word whose definition includes only other words that are still in the dictionary, and remove it. You can also rephrase definitions before doing this, as long as it keeps the same meaning. Suppose you do this with the goal of leaving as few words in it as you can. In the end, you should have a small cluster of a bit over 100 words, in terms of which all the other words you removed can be indirectly defined. (The idea of semantic primes also says that there is such a minimal set which translates essentially directly* between different natural languages.)

          I don’t think that says that words for complicated ideas aren’t like, more complicated?
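A toy sketch of the dictionary-reduction procedure described above, on a made-up three-word dictionary. It removes words greedily in alphabetical order, so, like the oversimplification it illustrates, it is not guaranteed to find the truly minimal set; words with an empty definition are treated as undefined primitives and never removed (an assumption of this sketch).

```python
from typing import Dict, Set


def reduce_dictionary(defs: Dict[str, Set[str]]) -> Set[str]:
    """Greedily remove words whose (non-empty) definition uses only other
    words still present; whatever cannot be removed is the leftover core.
    Every removed word is defined via words present at its removal time,
    so it stays indirectly definable from the core."""
    remaining = set(defs)
    removed_something = True
    while removed_something:
        removed_something = False
        for word in sorted(remaining):  # deterministic (but not optimal) order
            definition = defs[word]
            if definition and definition <= remaining - {word}:
                remaining.remove(word)
                removed_something = True
                break  # restart the scan on the smaller dictionary
    return remaining


toy = {
    "big": {"not", "small"},
    "small": {"not", "big"},
    "not": {"big", "small"},
}
print(sorted(reduce_dictionary(toy)))
```

Here "big" can be removed, after which "not" and "small" only define each other via the removed word, so they form the leftover core.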

        • red75prime 2 hours ago
          "Sweat and tears" -> exploration and the training signal for reinforcement learning.
      • outlace 3 hours ago
        For example, ever since the first GPT 4 I’ve tried to get LLM’s to build me a specific type of heart simulation that to my knowledge does not exist anywhere on the public internet (otherwise I wouldn’t try to build it myself) and even up to GPT 5.3 it still cannot do it.

        But I’ve successfully made it build me a great Poker training app, a specific form that also didn’t exist, but the ingredients are well represented on the internet.

        And I’m not trying to imply AI is inherently incapable, it’s just an empirical (and anecdotal) observation for me. Maybe tomorrow it’ll figure it out. I have no dogmatic ideology on the matter.

    • amelius 3 hours ago
      Just wait until LLMs are fast and cheap enough to be run in a breadth first search kind of way, with "fuzzy" pruning.
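What that describes is essentially beam search: a breadth-first expansion where each level is pruned to the few best candidates by a heuristic score. A minimal sketch, where `expand` and `score` are hypothetical placeholders for "ask the model for next steps" and "ask a model how promising this looks":

```python
import heapq
from typing import Callable, Iterable, TypeVar

Node = TypeVar("Node")


def pruned_bfs(
    root: Node,
    expand: Callable[[Node], Iterable[Node]],  # hypothetical: propose next steps (e.g. sample the LLM)
    score: Callable[[Node], float],            # hypothetical "fuzzy" judge of how promising a node is
    beam_width: int,
    depth: int,
) -> Node:
    """Expand the search level by level, keeping only the beam_width
    highest-scoring nodes at each level (i.e. beam search)."""
    frontier = [root]
    for _ in range(depth):
        children = [child for node in frontier for child in expand(node)]
        if not children:
            break
        frontier = heapq.nlargest(beam_width, children, key=score)
    return max(frontier, key=score)
```

For example, growing strings of "a"/"b" and scoring by the number of "b"s, `pruned_bfs("", lambda s: [s + "a", s + "b"], lambda s: s.count("b"), 2, 3)` ends at `"bbb"` while only ever holding two candidates per level.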
    • verdverm 3 hours ago
      [flagged]
      • buttered_toast 3 hours ago
        Absolutely no way this is true right? Ilya left around the time 4o was released. I can't imagine they haven't had a single successful run since then.
        • verdverm 3 hours ago
          When's the last time they talked about it?

          I heard this from people who know more than me

          • buttered_toast 3 hours ago
            Can't say, just seems implausible, but I am a nobody anyways ¯\_(ツ)_/¯
            • verdverm 54 minutes ago
              I'm pretty sure it is widely known that the early 5.x series were built from 4.5 (unreleased). It seems more plausible the 5.x series is still in that continuation.

              For some extra context, pre-training is ~1/3 of the training, where it gains the basic concepts of how tokens go together. Mid & late training are where you instill the kinds of anthropic behaviors we see today. I expect pre-training to increasingly become a lower percentage of overall training, putting aside any shifts of what happens in each phase.

              So to me, it is plausible they can take the 4.x pre-training and keep pushing in the later phases. There are a lot of results out there showing scaling laws (limits) have not peaked yet. I would not be surprised to learn that Gemini 3 Deep Research had 50% late-training / RL.

              • buttered_toast 41 minutes ago
                Okay I see what you mean, and yeah that sounds reasonable too. Do you have any context on that first part? I would like to know more about how/why they might not have been able to pursue more training runs.
                • verdverm 13 minutes ago
                  I have not done it myself (don't have the dinero), but my understanding is that there are many runs, restarts, and adjustments at this phase. It's surprisingly more fragile than we know aiui

                  If you already have a good one, it's not likely much has changed since a year ago that would create meaningful differences at this phase (in data, arch is diff, I know less here). If it is indeed true, it's a datapoint to add to the others signaling internal issues (everybody has some amount of this, not good when it makes the headlines).

                  Distillation is also a powerful training method. There are many ways to stay with the pack without having new pre-training runs. It's pretty much what we see from all of them with the minor versions. So coming back to it, the speculation is that OpenAI is still on their 4.x pre-train, but that doesn't impede all progress.

  • Davidzheng 3 hours ago
    "An internal scaffolded version of GPT‑5.2 then spent roughly 12 hours reasoning through the problem, coming up with the same formula and producing a formal proof of its validity."

    When I use GPT 5.2 Thinking Extended, it gave me the impression that it's consistent enough/has a low enough rate of errors (or enough error correcting ability) to autonomously do math/physics for many hours if it were allowed to [but I guess the Extended time cuts off around 30 minute mark and Pro maybe 1-2 hours]. It's good to see some confirmation of that impression here. I hope scientists/mathematicians at large will be able to play with tools which think at this time-scale soon and see how much capabilities these machines really have.

    • mmaunder 3 hours ago
      Yes, and 5.3 and the latest codex cli client are incredibly good across compactions. Anyone know the methodology they're using to maintain state and manage context for a 12-hour run? It could be as simple as a single dense document and its own internal compaction algorithm, I guess.
    • slopusila 3 hours ago
      after those 30 min you can manually ask it again to continue working on the problem
      • Davidzheng 2 hours ago
        It's a bit unclear to me what happens if I do that after it thinks for 30 minutes and ends with no response. Does it start off where it left off? Does it start from scratch again? Like I don't know how the compaction of their prior thinking traces work
  • square_usual 3 hours ago
    It's interesting to me that whenever a new breakthrough in AI use comes up, there's always a flood of people who come in to handwave away why this isn't actually a win for LLMs. Like with the novel solutions GPT 5.2 has been able to find for erdos problems - many users here (even in this very thread!) think they know more about this than Fields medalist Terence Tao, who maintains this list showing that, yes, LLMs have driven these proofs: https://github.com/teorth/erdosproblems/wiki/AI-contribution...
    • loire280 2 hours ago
      It's easy to fall into a negative mindset when there are legions of pointy haired bosses and bandwagoning CEOs who (wrongly) point at breakthroughs like this as justification for AI mandates or layoffs.
      • dakolli 44 minutes ago
        Yes, all of these stories and frequent model releases are just intended to psyop "decision makers" into validating their longstanding belief that labour shouldn't be as big a line item in a company's expenses, and perhaps can be removed altogether. They can finally go back to the good old days of having slaves (in the form of "agentic" bots); they yearn to own slaves again.

        CEOs/decision makers would rather give all their labour budget to tokens if they could just to validate this belief. They are bitter that anyone from a lower class could hold any bargaining chips, and thus any influence over them. It has nothing to do with saving money, they would gladly pay the exact same engineering budget to Anthropic for tokens (just like the ruling class in times past would gladly pay for slaves) if it can patch that bitterness they have for the working class's influence over them.

        The inference companies (who are also from this same class of people) know this, and are exploiting this desire. They know if they create the idea that AI progress is at an unstoppable velocity decision makers will begin handing them their engineering budgets. These things don't even have to work well, they just need to be perceived as effective, or soon to be for decision makers to start laying people off.

        I suspect this is going to backfire on them in one of two ways.

        1. French Revolution V2: they all get their heads cut off in 15 years, or an early retirement on a concrete floor.

        2. Many decision makers will make fools of themselves, destroy their businesses and come begging to the working class for our labor, giving the working class more bargaining chips in the process.

        Either outcome is going to be painful for everyone; let's hope people wake up before we push this dumb experiment too far.

    • lovecg 2 hours ago
      Let’s have some compassion, a lot of people are freaking out about their careers now and defense mechanisms are kicking in. It’s hard for a lot of people to say “actually yeah this thing can do most of my work now, and barrier of entry dropped to the ground”.
      • Toutouxc 10 minutes ago
        I am constantly seeing this thing do most of my work (which is good actually, I don't enjoy typing code), but requiring my constant supervision and frequent intervention and always trying to sneak in subtle bugs or weird architectural decisions that, I feel with every bone in my body, would bite me in the ass later. I see JS developers with little experience and zero CS or SWE education rave about how LLMs are so much better than us in every way, when the hardest thing they've ever written was bubble sort. I'm not even freaking out about my career; I'm freaking out about how much today's "almost good" LLMs can empower incompetence and how much damage that could cause to systems that I either use or work on.
      • dakolli 29 minutes ago
        Yeah but you know what, this is a complete psyop.

        They just want people to think the barrier of entry has dropped to the ground and that value of labour is getting squashed, so society writes a permission slip for them to completely depress wages and remove bargaining chips from the working class.

        Don't fall for this: they want to destroy any labor that deals with computer I/O, not just SWE. This is the only value "agentic tooling" provides to society, slaves for the ruling class. They yearn for the opportunity to own slaves again.

        It can't do most of your work, and you know that if you work on anything serious. But if a C-suite that hasn't dealt with code in two decades thinks this is the case because everyone is running around saying it's true, they're going to make sure they replace humans with these bot slaves. They really do just want slaves; they have no intention of innovating with them. People need to work to eat. Unless LLMs are creating new types of machines that need new types of jobs, like previous forms of automation did, I don't see why they should be replacing the human input.

        If these things are so good for business and are pushing software development velocity, why is everything falling apart? Why does the bulk of low-stakes software suck? Why is Windows 11 so bad? Why aren't top hedge funds and medical device manufacturers (places where software quality is high stakes) replacing all their labor? Where are the new industries? They don't do anything novel; they only serve to replace inputs previously supplied by humans so the ruling class can finally get back to the good old feeling of having slaves that can't complain.

    • epolanski 3 hours ago
      It's an obvious tension created by the title.

      The reality is: "GPT 5.2 found a more general and scalable form of an equation, after crunching for 12 hours supervised by 4 experts in the field".

      Which is equivalent to taking one of the countless niche algorithms out there and having a few experts in that algorithm let LLMs crunch tirelessly till they find a better formula, after those same experts prompted it in the right direction and with the right feedback.

      Interesting? Sure. Speaks highly of AI? Yes.

      Does it suggest that AI is revolutionizing theoretical physics on its own like the title does? Nope.

      • jdthedisciple 1 hour ago
        > GPT 5.2 after crunching 12 hours mathematical formulas supervised and prompted by 4 experts in the field

        Yet, if some student or child achieved the same – under equal supervision – we would call him the next Einstein.

        • epolanski 1 hour ago
          We would not call him at all because it would be one of the many millions that went through projects like this for their thesis as physics or math graduates.

          One of my best friends solved a difficult mathematical problem in planet orbits or something in her bachelor thesis, and it was just yet another random day in academia.

          And she didn't solve it because she was a genius but because there's a bazillions such problems out there and little time to look at them and focus. Science is huge.

    • hgfda 2 hours ago
      It is not only the peanut gallery that is skeptical:

      https://www.math.columbia.edu/~woit/wordpress/?p=15362

      Let's wait a couple of days to see whether there has been a similar result in the literature.

      • gjm11 2 hours ago
        For the sake of clarity: Woit's post is not about the same alleged instance of GPT producing new work in theoretical physics, but about an earlier one from November 2025. Different author, different area of theoretical physics.
        • etraql 24 minutes ago
          This thread is about "whenever a new breakthrough in AI use comes up", and the comment you reply to correctly points out skepticism for the general case and does not claim any relation to the current case.

          You reached your goal though and got that comment downvoted.

  • cpard 2 hours ago
    AI can be an amazing productivity multiplier for people who know what they're doing.

    This result reminded me of the C compiler case that Anthropic posted recently. Sure, agents wrote the code for hours but there was a human there giving them directions, scoping the problem, finding the test suites needed for the agentic loops to actually work etc etc. In general making sure the output actually works and that it's a story worth sharing with others.

    The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding. It works great for creating impressions and building brand value but also does a disservice to the actual researchers, engineers and humans in general, who do the hard work of problem formulation, validation and at the end, solving the problem using another tool in their toolbox.

    • supern0va 1 hour ago
      >AI can be an amazing productivity multiplier for people who know what they're doing.

      >[...]

      >The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.

      You're sort of acting like it's all or nothing. What about the humans that used to be that "force multiplier" on a team with the person guiding the research?

      If a piece of software required a team of ten people, and instead it's built with one engineer overseeing an AI, that's still 90% job loss.

      For a more current example: do you think all the displaced Uber/Lyft drivers aren't going to think "AI took my job" just because there's a team of people in a building somewhere handling the occasional Waymo low confidence intervention, as opposed to being 100% autonomous?

      • bagacrap 44 minutes ago
        Well those Uber drivers are usually pretty quick to note that Uber is not their job, just a side hustle. It's too bad I won't know what they think by then since we won't be interacting any more.
    • jonahx 2 hours ago
      > The "AI replaces humans in X" narrative is primarily a tool for driving attention and funding.

      It's also a legitimate concern. We happen to be in a place where humans are needed for that "last critical 10%," or the first critical 10% of problem formulation, and so humans are still crucial to the overall system, at least for most complex tasks.

      But there's no logical reason that needs to be the case. Once it's not, humans will be replaced.

      • cpard 2 hours ago
        The reason there is a marketing opportunity is because, to your point, there is a legitimate concern. Marketing builds and amplifies the concern to create awareness.

        When the systems turn into something trivial to manage with the new tooling, humans build more complex systems or add more layers on top of the existing ones.

    • decidu0us9034 2 hours ago
      I'm not sure you can call something an optimizing C compiler if it doesn't optimize or enforce C semantics (well, it compiles C but also a lot of things that aren't syntactically valid C). It seemed to generate a lot of code (wow!) that wasn't well-integrated and didn't do what it promised to, and the human didn't have the requisite expertise to understand that. I'm not a theoretical physicist but I will hold to my skepticism here, for similar reasons.
      [-]
      • cpard 1 hour ago
        sure, I won't argue on this, although it did manage to deliver the marketing value they were looking for, at the end their goal was not to replace gcc but to make people talk about AI and Anthropic.

        What I said in my original comment is that AI delivers when it's used by experts, in this case there was someone who was definitely not a C compiler expert, what would happen if there was a real expert doing this?

    • elzbardico 2 hours ago
      Actually, the results were far worse and way less impressive than what the media said.
      • cpard 2 hours ago
        the c compiler results or the physics results this post is about?
        • elzbardico 53 minutes ago
          The C compiler.
        • NewsaHackO 1 hour ago
          His point is going to be some copium like since the c compiler is not as optimized as gcc, it was not impressive.
          • elzbardico 53 minutes ago
            You probably don’t know what you’re talking about.
            [-]
            • NewsaHackO 26 minutes ago
              Why wasn't the C compiler it made impressive to you?
    • kylehotchkiss 2 hours ago
      > for people who know what they're doing.

      I worry we're not producing as many of those as we used to

      [-]
      • blks 2 hours ago
        We will be producing them even less. I fear for future graduates, hell, even for schoolchildren, who are now uncontrollably using ChatGPT for their homework. Next-level brainrot.
    • fragmede 2 hours ago
      Right. If it hadn't been Nicholas Carlini driving Claude, with his decades of experience, there wouldn't be a Claude C compiler. It still required his expertise and knowledge to get there.
  • computator 28 minutes ago
    I have a weird long-shot idea for GPT to make a new discovery in physics: Ask it to find a mathematical relationship between some combination of the fundamental physical constants[1]. If it finds (for example) a formula that relates electron mass, Bohr radius, and speed of light to a high degree of precision, that might indicate an area of physics to explore further if those constants were thought to be independent.

    [1] https://en.wikipedia.org/wiki/List_of_physical_constants

    [-]
    • fsh 8 minutes ago
      The Bohr radius is the result of a simple classical physics calculation (a common exercise for first-year undergraduates). It depends only on the electron mass and the fine-structure constant, which is the strength of the electromagnetic interaction. In the SI system, the speed of light has a fixed value which defines the unit of length.
    • lich_king 25 minutes ago
      There are known mathematical relationships between almost all fundamental physical constants? In particular, in your example, Bohr radius is calculated from electron mass and the speed of light in vacuum... I don't think this path is as promising as it sounds.
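For reference, the dependence both replies describe can be checked numerically. This is a minimal sketch, assuming the standard textbook relation a0 = ħ / (m_e c α) and CODATA 2018 values typed in by hand (no physics library is used). It illustrates why the Bohr radius is not an independent constant, and so would probably not be a fruitful target for the kind of search proposed above.

```python
# Minimal sketch: check the textbook relation a0 = hbar / (m_e * c * alpha).
# Assumption: CODATA 2018 recommended values, entered by hand as literals.
HBAR = 1.054571817e-34         # reduced Planck constant, J*s
M_E = 9.1093837015e-31         # electron mass, kg
C = 299_792_458.0              # speed of light in vacuum, m/s (exact in SI)
ALPHA = 7.2973525693e-3        # fine-structure constant, dimensionless
A0_CODATA = 5.29177210903e-11  # tabulated Bohr radius, m


def bohr_radius(hbar: float, m_e: float, c: float, alpha: float) -> float:
    """Bohr radius computed from hbar, electron mass, c and alpha."""
    return hbar / (m_e * c * alpha)


if __name__ == "__main__":
    a0 = bohr_radius(HBAR, M_E, C, ALPHA)
    # Relative difference from the tabulated value; only input rounding remains.
    rel_err = abs(a0 - A0_CODATA) / A0_CODATA
    print(f"computed a0 = {a0:.11e} m, relative difference = {rel_err:.1e}")
```

The agreement is limited only by the rounding of the typed-in inputs, which is the point: the "relationship" is already built into the definitions.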
  • nilkn 3 hours ago
    It would be more accurate to say that humans using GPT-5.2 derived a new result in theoretical physics (or, if you're being generous, humans and GPT-5.2 together derived a new result). The title makes it sound like GPT-5.2 produced a complete or near-complete paper on its own, but what it actually did was take human-derived datapoints, conjecture a generalization, then prove that generalization. Having scanned the paper, this seems to be a significant enough contribution to warrant a legitimate author credit, but I still think the title on its own is an exaggeration.
  • Insanity 4 hours ago
    They also claimed ChatGPT solved novel Erdős problems when that wasn't the case. I'll take this with a grain of salt until there is more external validation. But very cool if true!
    [-]
    • famouswaffles 3 hours ago
      Well, they (OpenAI) never made such a claim. And yes, LLMs have made unique solutions/contributions to a few Erdős problems.
    • smokel 3 hours ago
      How was that not the case? As far as I understand it ChatGPT was instrumental to solving a problem. Even if it did not entirely solve it by itself, the combination with other tools such as Lean is still very impressive, no?
      [-]
      • emil-lp 3 hours ago
        It didn't solve it, it simply found that it had been solved in a publication and that the list of open problems wasn't updated.
        [-]
        • Davidzheng 3 hours ago
          My understanding is that there have been around 10 Erdős problems solved by GPT by now. Most of them turned out to be either already in the literature or very similar to a problem solved in the literature. But one or two solutions are quite novel.

          https://github.com/teorth/erdosproblems/wiki/AI-contribution... may be useful

          [-]
          • carefree-bob 1 hour ago
            I am not aware of any unsolved Erdos problem that was solved via an LLM. I am aware of LLMs contributing to variations on known proofs of previously solved Erdos problems. But the issue with having an LLM combine existing solutions or modify existing published solutions is that the previous solutions are in the training data of the LLM, and in general there are many options to make variations on known proofs. Most proofs go through many iterations and simplifications over time, most of which are not sufficiently novel to even warrant publication. The proof you read in a textbook is likely a highly revised and simplified proof of what was first published.

            If I'm wrong, please let me know which previously unsolved problem was solved, I would be genuinely curious to see an example of that.

            [-]
            • Davidzheng 59 minutes ago
              It's in the link above, but you can look at #1051 or #851 on the erdosproblems website.
              [-]
              • carefree-bob 25 minutes ago
                The erdosproblems website shows 851 was proved in 1934. https://www.erdosproblems.com/851

                I guess 1051 qualifies - from the paper: "Semi-autonomous mathematical discovery with gemini" https://arxiv.org/pdf/2601.22401

                "We tentatively believe Aletheia’s solution to Erdős-1051 represents an early example of an AI system autonomously resolving a slightly non-trivial open Erdős problem of somewhat broader (mild) mathematical interest, for which there exists past literature on closely-related problems [KN16], but none fully resolves Erdős-1051. Moreover, it does not appear to us that Aletheia’s solution is directly inspired by any previous human argument (unlike in many previously discussed cases), but it does appear to involve a classical idea of moving to the series tail and applying Mahler’s criterion. The solution to Erdős-1051 was generalized further, in a collaborative effort by Aletheia together with human mathematicians and Gemini Deep Think, to produce the research paper [BKK+26]."

          • emp17344 3 hours ago
            Some of these were initially hyped as novel solutions, and then were quietly downgraded after it was discovered the solutions weren’t actually novel.
            [-]
            • Insanity 2 hours ago
              Yeah that was also my take-away when I was following the developments on it. But then again I don't follow it very closely so _maybe_ some novel solutions are discovered. But given how LLMs work, I'm skeptical about that.
          • compass_copium 54 minutes ago
            ...am I wrong in thinking that 1(a) is the relevant section here, and shows a lot of red?
            [-]
            • Davidzheng 40 minutes ago
              I honestly don't see the point of the red data points. By now, all the Erdős problems have been attempted by AIs, so every unsolved one can be a red data point.
    • vonneumannstan 3 hours ago
      Wasn't that some marketing bro? This is coming through the front door with serious physicists attached.
  • mym1990 2 hours ago
    Many innovations are built on cross-pollination between domains, and I think we are not too far off from having a loop where multiple agents, each grounded very well in a specific domain, find intersections and optimizations by communicating with each other, especially if they are able to run for 12+ hours. The truth is that 99% of attempts at innovation will fail, but the 1% can yield something fantastic; the more attempts we can take, the faster progress will happen.
    [-]
    • alansaber 16 minutes ago
      I find it hard not to agree with this line of thinking (albeit the success rate will be less than 1%).
  • pear01 29 minutes ago
    If a researcher uses an LLM to get a novel result, should the LLM also reap the rewards? Could a Nobel Prize ever be given to an LLM, or is that like giving a Nobel to a calculator?
  • vbarrielle 3 hours ago
    I'm far from being an LLM enthusiast, but this is probably the right use case for this technology: conjectures that are hard to find, but whose proofs can then be checked with automated theorem provers. Isn't that what AlphaProof does, by the way?
  • major4x 1 hour ago
  • elashri 3 hours ago
    Of all particle physics concepts, I would be less interested in scattering amplitudes as a test case, because they have one of the most concise definitions and their solution is straightforward (not easy, of course). Once you have a good grasp of QM and scattering, it is a matter of applying your knowledge of math to solve the problem. Usually the real problem is defining your parameters from your model and setting up the tree-level calculations. So it is impressive for an LLM to solve these, but the researchers defined everything and came up with the workflow.

    So I would read this (with more information available) with less emphasis on the LLM discovering a new result. The title is a little misleading, but "derives" is the operative word here, so it would be technically correct for people in the field.

  • crorella 3 hours ago
  • another_twist 1 hour ago
    That's great. I think we need to start researching how to get cheaper models to do math. I have a hunch it should be possible to get leaner models to achieve these results with the right sort of reinforcement learning.
    [-]
  • jtrn 33 minutes ago
    This is my favorite field to have opinions about without having any training or skill in it. Fundamental research is just something I enjoy thinking about, even though I am a psychologist. I try to pull in my experience from the clinic and from clinical research when I read theoretical physics. Don't take this text too seriously; it's just my attempt at understanding what's going on.

    I am generally very skeptical about work at this level of abstraction. Here the result is only obtained after choosing Klein signature instead of physical spacetime, complexifying momenta, restricting to a "half-collinear" regime that doesn't exist in our universe, and picking a specific kinematic sub-region. Then they check the result against internal consistency conditions of the same mathematical system. This pattern should worry anyone familiar with the replication crisis. The conditions this field operates under are a near-perfect match for what psychology has identified as maximising systematic overconfidence: extreme researcher degrees of freedom (choose your signature, regime, helicity, ordering until something simplifies), no external feedback loop (the specific regimes studied have no experimental counterpart), survivorship bias (ugly results don't get published, so the field builds a narrative of "hidden simplicity" from the survivors), and tiny expert communities where fewer than a dozen people worldwide can fully verify any given result.

    The standard defence is that the underlying theory — Yang-Mills / QCD — is experimentally verified to extraordinary precision. True. But the leap from "this theory matches collider data" to "therefore this formula in an unphysical signature reveals deep truth about nature" has several unsupported steps that the field tends to hand-wave past.

    Compare to evolution: fossils, genetics, biogeography, embryology, molecular clocks, observed speciation — independent lines of evidence from different fields, different centuries, different methods, all converging. That's what robust external validation looks like. "Our formula satisfies the soft theorem" is not that.

    This isn't a claim that the math is wrong. It's a claim that the epistemic conditions are exactly the ones where humans fool themselves most reliably, and that the field's confidence in the physical significance of these results outstrips the available evidence.

    I wrote up a more detailed critique in a substack: https://jonnordland.substack.com/p/the-psychologists-case-ag...

  • emp17344 3 hours ago
    Cynically, I wonder if this was released at this time to ward off any criticism from the failure of LLMs to solve the 1stproof problems.
  • pruufsocial 3 hours ago
    All I saw was "gravitons", and I thought: we're finally here, the singularity has begun.
  • snarky123 3 hours ago
    So wait, GPT found a formula that humans couldn't, then the humans proved it was right? That's either terrifying or the model just got lucky. Probably the latter.
    [-]
    • JasonADrury 3 hours ago
      > found a formula that humans couldn't

      "Couldn't" is an immensely high bar in this context; "didn't" seems more appropriate and renders this whole thing slightly less exciting.

      [-]
      • vessenes 3 hours ago
        I'd say "couldn't in 20 hours" might be more defensible. Depends on how many humans though. "couldn't in 20 GPT watt-hours" would give us like 2,000 humans or so.
  • getnormality 1 hour ago
    I'll believe it when someone other than OpenAI says it.

    Not saying they're lying, but I'm sure it's exaggerated in their own report.

  • baalimago 3 hours ago
    Well, anyone can derive a new result in anything. The question is usually whether the result makes any sense.
  • sfmike 2 hours ago
    5.2 is the best model on the market.
  • PlatoIsADisease 2 hours ago
    I'll read the article in a second, but let me guess ahead of time: Induction.

    Okay, I read it: yep, Induction. It already had the answer.

    Don't get me wrong, I love Induction... but we aren't having any revolutions in understanding with Induction.

  • ares623 3 hours ago
    I guess the important question is, is this enough news to sustain OpenAI long enough for their IPO?
    [-]
    • danny_codes 2 hours ago
      Well it’ll be at least a whole month before some other company announces similar capability. The moat will hold!
      [-]
      • dyauspitr 2 hours ago
        I believe Gemini holds the moat now.
  • gaigalas 3 hours ago
    I like the use of the word "derives". However, it gets outshined by "new result" in public eyes.

    I expect lots of derivations (new discoveries whose pieces were already in place somewhere, but no one has put them together).

    In this case, the human authors did the thinking and also used the LLM, but this could happen without the original human author too (someone posts a partial result on the internet, no one realizes it is novel knowledge, and it gets reused by AI later). It would be tremendously nice if credit were preserved in such scenarios.

  • vonneumannstan 3 hours ago
    Interesting considering the Twitter froth recently about AI being incapable in principle of discovering anything.
    [-]
    • baq 3 hours ago
      Anything but recent.
  • mrguyorama 2 hours ago
    Don't lend much credence to a preprint. I'm not insinuating fraud, but plenty of preprints turn out to be "Actually you have a math error here", or are retracted entirely.

    Theoretical physics is throwing a lot of stuff at the wall and theory crafting to find anything that might stick a little. Generation might actually be good there, even generation that is "just" recombining existing ideas.

    I trust physicists and mathematicians to mostly use tools because they provide benefit, rather than because they are in vogue. I assume they were approached by OpenAI for this, but glad they found a way to benefit from it. Physicists have a lot of experience teasing useful results out of probabilistic and half broken math machines.

    If LLMs end up being solely tools for exploring some symbolic math, that's a real benefit. Wish it didn't involve destroying all progress on climate change, platforming truly evil people, destroying our economy, exploiting already disadvantaged artists, destroying OSS communities, enabling yet another order of magnitude increase in spam profitability, destroying the personal computer market, stealing all our data, sucking the oxygen out of investing into real industry, and bald-faced lies to everyone about how these systems work.

    Also, last I checked, MATLAB wasn't a trillion dollar business.

    Interestingly, the OpenAI wrangler is last in the list of authors and acknowledgements. That somewhat implies the physicists don't think it deserves much credit. They could be biased against LLMs, like me.

    When Victor Ninov (fraudulently) analyzed his team's accelerator data using an existing software suite to find a novel superheavy element, he got first billing on the authors list. He probably contributed to the theory and some practical work, but he alone was literate in the GOOSY data tool. Author lists are often a political game as well as a matter of credit, but Victor got top billing above people like his bosses, who were famous names. The guy who actually came up with the idea of how to create the element, an innovative recipe that a lot of people doubted, was credited 8th.

    https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.83...

  • brcmthrowaway 3 hours ago
    End times approach..
  • starkeeper 3 hours ago
    [flagged]
  • baggachipz 3 hours ago
    [flagged]
  • starkeeper 3 hours ago
    [flagged]
  • longfacehorrace 3 hours ago
    Car manufacturers need to step up their hype game...

    New Honda Civic discovered Pacific Ocean!

    New F150 discovers Utah Salt Flats!

    Sure it took humans engineering and operating our machines, but the car is the real contributor here!