aleqs 2 hours ago [-]
Okay, so Anthropic has amazing AI which supposedly writes most of their code and can continuously improve... meanwhile they have outages on a regular basis, and any kind of long-running work will now consistently hit 'API Error: Server is temporarily limiting requests'. Not sure if this is intentional to force a reduction of token usage, but at this point I need to build around these throttling limits and outages with my own tools to restart/resume sessions. From my experience, in the last 2 weeks, literally 100% of any non-trivial Claude session/work will now be blocked on these issues, requiring manual intervention.
One of my focuses now is my own model-agnostic harness and workflow orchestration (I know everyone is building these), baselining on Opus, and aiming to transition to Chinese models like DeepSeek in the short term and hopefully open, self-hosted models in the future (which I plan to open source).
The nonstop marketing fluff from anthropic while their service quality and availability noticeably degrades... just continues to destroy my trust in the company.
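For what it's worth, the restart/resume tooling mentioned above mostly comes down to retrying with backoff. A minimal sketch in Python, assuming a hypothetical `RateLimited` exception that a client wrapper would raise on the throttling error (neither it nor `call_with_backoff` is part of any real SDK):

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error raised by a client wrapper on a throttling response."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=60.0,
                      sleep=time.sleep):
    """Call fn(), retrying on RateLimited with exponential backoff plus jitter."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # Double the wait on each failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter keeps parallel sessions from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

The `sleep` parameter is injectable only so the loop can be tested without actually waiting; resuming a session would additionally need whatever state the harness persists between attempts.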
qsort 12 minutes ago [-]
Look, I've never been someone who mindlessly hypes AI companies, as a matter of fact I think they have serious leadership problems across the board, but you people are straw-manning them so badly it actually makes me sympathize with them.
They aren't saying they have fully automated luxury AGI, they specifically list the ways models fall short of that bar and caution against people taking the 8x figure as the actual uplift number. At the same time they recognize that 80% of new code is now AI-authored, when two years ago those models were little more than toys. And frankly that checks out: if two years ago you told me we'd have something like Opus 4.8/GPT 5.5 I would have rolled to disbelieve.
f311a 35 minutes ago [-]
Infrastructure is a much harder problem. They can't even improve Claude Code, which eats 1GB+ of RAM. Meanwhile, my editor only consumes 80MB of RAM.
airstrike 14 minutes ago [-]
This might explain it, in the opposite way it was meant to:
Also remember when XP was super bloated cause it needed 64MB?
TimMeade 13 minutes ago [-]
I loved Turbo Pascal....
aagha 2 hours ago [-]
And don't forget that they have BILLIONS of dollars and can't figure out how to get a decent support or public communications system setup.
aleqs 1 hours ago [-]
They can't even seem to get their usage metering consistent.
thinkingtoilet 46 minutes ago [-]
Don't confuse things. It's not "can't figure out", it's "don't care to figure out". They're not dumb. They just don't care about support.
contagiousflow 40 minutes ago [-]
Couldn't they just have background agents "figure it out"
collingreen 33 minutes ago [-]
If agents can just figure it out, isn't that AGI?
jakobnissen 2 hours ago [-]
Their outages are probably not due to their code though. It’s probably their infrastructure that can’t keep up. So seeing failures of infrastructure doesn’t really tell you anything about how well or badly Anthropic makes use of its models.
matthewdgreen 30 minutes ago [-]
The messed up scrolling behavior I keep getting in Claude Code is definitely due to their code.
aleqs 1 hours ago [-]
That seems like an assumption based on basically nothing. There is a lot of code at the infra layer, and based on the stack choices for Claude code and based on how buggy and unreliable ~everything from anthropic is, it seems pretty bizarre to claim these issues are not related to their code.
claudiug 5 minutes ago [-]
those are results of the humans only. not the AI. AI is perfect /s
mweidner 9 minutes ago [-]
I fail to see how pursuing recursive self-improvement at full speed is compatible with Anthropic's stated goal of AI Safety. If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.
tokioyoyo 4 minutes ago [-]
Sorry for nitpicking, but:
> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
Arguably, yes.
parineum 5 minutes ago [-]
> I am not cynical enough to believe that Anthropic's warnings are pure marketing hype.
It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.
chilipepperhott 30 minutes ago [-]
I find any and all claims like this ridiculous from a company who can't build a terminal application that uses less than a gigabyte of RAM.
cpursley 29 minutes ago [-]
I came here just to write: Pretty please let it churn for a few nights and redo Claude Code in Rust. Because the harness is very very good as are their models, but that node thing is a hog for no good reason at all.
jcarver 4 minutes ago [-]
I had similar thoughts and ended up writing my own open-source harness in Rust: https://aether-agent.io/ . It's rough around the edges, but I use it as my daily driver and I'm pretty happy with it.
ale 23 minutes ago [-]
Incoming rust rewrite branch ready to merge: +1,009,257 -4,024
canadiantim 15 minutes ago [-]
People already rebuilt Claude Code in Rust after the Claude Code leak, it's on github as claw code (and other variants)
robbrown451 41 minutes ago [-]
Do code harnesses that build themselves count as recursive self improvement, or does it need to be the AI itself to qualify for the term?
I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself:
https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x...
(cnc router that cuts plywood, and is made out of cnc-router cut plywood)
This is my own effort at an AI assisted coding environment optimized for building itself:
https://recursi.dev/
(just launching it, hope it's ok to mention it, it is free/open source.... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )
Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.
lanthissa 23 minutes ago [-]
yes? the future for any verifiable task is the model attempts to verify initial state and a goal then decomposes its tasks into ever smaller verifiable subtasks, with /memory being the persistence between runs and then /dreaming on the results of those memory files + run data to introduce new ideas.
i think that's the path to async agi these labs are imagining. The only limit is the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.
maybe once you start building out these verified workflows you can feed that back into training and the model starts to get a feel for the world to the point that it can intuit things since it has these sub paths built.
my personal agi test is can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done without knocking.
jrflo 27 minutes ago [-]
I think harnesses would count, AI != LLMs. Any piece of code that helps the computer reason for itself is AI, the harnesses are AI in a sense.
cyanydeez 35 minutes ago [-]
If you want to get out ahead of what's coming, it'll be small models that bootstrap the harness rather than anything else.
robbrown451 27 minutes ago [-]
I used to think that, but ended up going the other direction, partly because I don't have the wherewithal to build a model but then I realized, with existing models that can take more than a tiny amount of context, you can just let any model bootstrap itself with a good prompt sent by the system.
There's a ton of other tricks to it, but mostly keeping the protocol simple for the AI so it can concentrate on coding logic and not stuff like managing BS boilerplate, dependencies, etc. (for instance I make extensive use of things like abstract syntax tree library to help with surgical edits from the LLM)
That said, I would be very open to collaborating with someone who builds such small models, I don't think the system strictly needs it, but it also could have some extra power if it had it.
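To make the AST idea concrete: a rough sketch of a surgical edit using Python's standard `ast` module. This is not the implementation in the project above, just an illustration of the technique; `replace_function` is a hypothetical helper that swaps one top-level function and leaves the rest of the file untouched.

```python
import ast


def replace_function(source: str, name: str, new_code: str) -> str:
    """Replace the top-level function `name` in `source` with `new_code`."""
    tree = ast.parse(source)
    for node in tree.body:
        is_func = isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        if is_func and node.name == name:
            lines = source.splitlines()
            # Start at the first decorator if there is one, else at `def`.
            first = min([node.lineno] + [d.lineno for d in node.decorator_list])
            start = first - 1        # 0-based index of the first line to drop
            end = node.end_lineno    # 1-based inclusive == 0-based exclusive
            patched = lines[:start] + new_code.splitlines() + lines[end:]
            result = "\n".join(patched) + "\n"
            ast.parse(result)  # fail early if the edit broke the syntax
            return result
    raise KeyError(f"function {name!r} not found")
```

The point is that the LLM only has to produce the new function body; locating the exact span and checking that the result still parses is done by ordinary code.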
andai 24 minutes ago [-]
> mine also makes extensive use of things like abstract syntax tree library to help with surgical edits from the LLM
Tell me more! This takes me way back. I did one like this in the GPT-4 days! (8k context window)
robbrown451 17 minutes ago [-]
Start off with my video!!! You can also try it with zero setup (you can code right there on the static web page, it will save your edits in the browser indexed DB, and hotpatch them back into the code before it runs it.... also you can grant permission to the browser to read/write to a local directory)
recursi.dev
Seriously, I'm looking for collaborators.
There's upwards of 80,000 lines of code in the editor system, a lot to it to make sure that even newbies don't get stuck.... so that's kind of proof the system works since it doesn't break down when the codebase grows large.
torben-friis 2 hours ago [-]
>A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.
malfist 20 minutes ago [-]
One of my co-workers just asked me to review his pull request that was all AI generated. 600 files were touched, over 40k lines of code added.
I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?
I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me for 2 weeks from my todo list and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that he and I are on two totally separate islands around the value of LLMs.
fooqux 46 minutes ago [-]
Exactly. If AI is going to start being graded on how many LoC it generates- oh, I'm sorry, how much it "accelerates", then guess what newer models will start doing more of?
chuckadams 6 minutes ago [-]
AI generates code that mimics the existing code. If your code is terse and comment-free, then the agent’s code is too. The times I’ve seen Claude drift into a default “house style” it generated like 1 comment for every 10 LOC or so. It’s a far cry from the GPT-3 days that littered every line with the journals of Captain Obvious.
SimianSci 2 hours ago [-]
Anthropic is looking to IPO here soon.
A key aspect of this is to prove profitability.
Shifting their focus from Training new models to instead serving inference, they would greatly reduce their spend. In fact this is something being reported on that they are already doing, which is the reason for their first ever profitable quarter.
It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.
malfist 18 minutes ago [-]
I mean, if they've consumed all of human knowledge, what's left for them to train on? This pivot isn't only because it's cheaper and a way to juice the numbers for an IPO, it's survival because they can't improve more.
applicative 2 minutes ago [-]
It did sound to me like they feel some sort of wall coming.
rhlf_monkey 13 minutes ago [-]
So in the latest L. Ron Hubbard encyclical Anthropic informs its flock that recursive self-improvement does not work yet but that their engineers burn more tokens.
The Claude code quality and operational security of Anthropic have already been analyzed by the public.
If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.
minimaxir 2 hours ago [-]
I have been doing more experiments with what I have now been calling agentic iterative optimization: telling the LLM to optimize code such that it speeds up all real-world-representative benchmarks by X% without cheating or causing regressions in both tests and performance metrics (e.g. MSE for statistical algorithms or file size in the case of something such as image compression). This is done using Rust where there are more low-level levers to tweak for performance than something like Python.
Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.
I now have quite a few GPT-5.5-optimized projects in various domains that are feature complete and are substantially more performant than existing SOTA implementations that I plan to open source as soon as possible: the bottleneck is polish as usual.
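A minimal sketch of the acceptance gate a loop like this needs, assuming the test results, benchmark timings, and a lower-is-better quality metric (e.g. MSE) have already been collected. `BenchResult`, `accept_candidate`, and the default thresholds are hypothetical names for illustration, not the tooling used in the experiments above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class BenchResult:
    name: str
    baseline_s: float   # wall-clock seconds before the change
    candidate_s: float  # wall-clock seconds after the change (must be > 0)


def accept_candidate(tests_passed: bool,
                     benches: Sequence[BenchResult],
                     baseline_metric: float,
                     candidate_metric: float,
                     min_speedup: float = 1.10,
                     metric_tolerance: float = 0.0) -> bool:
    """Accept only if tests pass, quality holds, and every benchmark speeds up."""
    if not tests_passed:
        return False
    # Lower is better for the metric here (e.g. MSE or output file size).
    if candidate_metric > baseline_metric + metric_tolerance:
        return False
    # Require the speedup on ALL benchmarks, not just on average, and refuse
    # to accept anything when no benchmarks were run at all.
    return bool(benches) and all(
        b.baseline_s / b.candidate_s >= min_speedup for b in benches
    )
```

Requiring the threshold on every benchmark rather than on the mean is one way to make "without cheating" checkable: a change that tanks one workload to win another gets rejected.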
Upvoter33 2 hours ago [-]
I'm having a hard time putting much faith into posts like these, especially as they near IPO.
reasonableklout 2 hours ago [-]
Putting faith into the claim that recursive self-improvement is close to happening, or that they will coordinate with other companies / the government when the time comes?
nickandbro 2 hours ago [-]
So what happens when the world becomes hyper optimized with closed loop AI agents recursively trying to optimize everything deemed sub optimal?
mofeien 2 hours ago [-]
I would assume that shortly after, the solar system will be hyper optimized as well, then the milky way, then the local cluster, and so on. Everything will be close to optimal afterwards, and I sure hope we will have specified the target function for that optimization correctly in the single attempt that we will have had.
Readerium 17 minutes ago [-]
Loll
peheje 2 hours ago [-]
there will be a lot of paper clips
simianwords 48 minutes ago [-]
Often repeated meme doesn’t have any bearing to reality.
The orthogonality thesis sounds like a fun gotcha but if you give it some thought you realise how strange it sounds and the opposite thesis - collinearity thesis is actually correct.
1. Intelligence transfers and compounds
2. Goals of agents are not arbitrary
3. Our goals and agent goals are more likely to be aligned at the deeper level
morisil 3 hours ago [-]
Quite aligned with my own experience from harness engineering and winning AI4Science hackathon. During the hackathon I was working as a human optimizer, moving the feedback from test harness running on Claude Code, back to my local Claude Code for analysis-hypothesis-proposal cycle. And in this moment I realized that 2 Claudes talking to each other could actually scale much better.
anilgulecha 3 hours ago [-]
> We believe it would be good for the world to have the option to slow or temporarily pause frontier AI development to enable societal structures and alignment research to keep up with the advance of the technology. The Anthropic Institute will conduct research—in collaboration with many others—and take actions to help build the systems that a credible slowdown or pause would require.
Interesting - they're committing to kick off policy conventions to organize a world-slowdown of frontier LLM building. If they actually are able to crack it, this will give a much needed breather IMO. As exciting as the last ~6 months have been, there's some bigger questions to go answer now.
fasterik 2 hours ago [-]
We should be skeptical of any major player that advocates for regulating their own industry. In practice, this just means increasing barriers to entry and making it harder to compete with them.
In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.
mofeien 2 hours ago [-]
The regulation that is being argued for here is against pushing the frontier. Entering the market with say a new speech to text model is not subject to such regulation. What's needed is something qualitatively different from entry barriers, and of the frontier model companies at least Anthropic and DeepMind seem to have enough self-awareness to speak about it. They are finding themselves in a race with possibly catastrophic outcome for humanity and would like to stop, but it needs international cooperation on a level that no single company can provide.
8note 23 minutes ago [-]
it's a cartel looking to end competition though
the actual race is to keep having revenue, since everyone is still willing to pay more for the best model.
we as consumers of LLM models lose out by the arms race ending by the creation of a cartel
what happens if they get this regulatory capture is that all the frontier labs put effort into making inference cheaper, and become extraordinarily profitable, at the expense of us consumers, who really want better models, at a subsidized price
techblueberry 2 hours ago [-]
Wouldn’t this align with their financial interests? In theory the thing that’s keeping them from being profitable (or one of the big things) is the periodic capex expenditures of building new frontier models.
fasterik 2 hours ago [-]
I don't think there's anything inherently bad about Anthropic making a profit. Red Hat makes a profit off of Linux. I'm interested in the democratization of the underlying technology.
Upvoter33 2 hours ago [-]
I read this differently: they are actually seeing that it's hard to keep advancing frontier models, and now are moving the goal posts so that when they start getting evaluated more harshly, they can point to something like this.
smokedetector1 2 hours ago [-]
They're probably looking to get a way to slow down the capex required to keep up, so they can be more profitable
bottlepalm 29 minutes ago [-]
I'd use number of commits as a metric versus lines of code. A commit is generally a unit of work - regardless of the lines of code added/removed. It'd be interesting to see the metrics in terms of commits. I'm sure it's still an order of magnitude jump. Personally I'm flying with my own projects with AI, lots of commits, but I really try to minimize lines of code added. If I can remove and simplify existing code so the balance of lines added on commit are minimal - that's the path to a better quality app overall.
delichon 3 hours ago [-]
Is this the moment when the AI gets permission to approve its own PRs:
> today, Anthropic engineers on average ship 8x as much code per quarter as they did from 2021-2025.
So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!
techblueberry 2 hours ago [-]
> A caveat: Lines of code is an imperfect measure, as it measures quantity over quality. So 8× lines of code/engineer/day in the second quarter of 2026 is almost certainly an overstatement of the true productivity gain. Nonetheless, it indicates an acceleration. At Anthropic, we don’t reward people for how many lines of code they write; rather, team members are producing more code simply because they’re using AI systems to write more code.
I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.
An accurate Translation seems to be “we made this shit up, but it feels right”
embedding-shape 2 hours ago [-]
Until the moment we start bragging about how many lines of code LLMs are saving us, we're walking in the wrong direction. Your programs, designs and architectures are supposed to get better, not add even more boilerplate just because you can produce it faster...
HarHarVeryFunny 2 hours ago [-]
"You go to IPO with the AI you have, not the AI you might wish you have."
-- Donald Rumsfeld
So, right now it's a verbose code generator.
But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
geodel 1 hours ago [-]
> But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
We hold these truths to be self-evident.
jazzyjackson 2 hours ago [-]
I guess the claim is simply that AI written code is verbose and there’s lots of it being created but I agree, these systems seem to be able to create lots of low quality software, so until FreeCAD has feature parity with Solidworks I’m bearish on the singularity.
darepublic 2 hours ago [-]
the tooling has quite a ways to go to catch up to the llm engines that drive the real value. I have encountered various codex bugs (I know not anthropic) which tell me that.. these billion dollar companies, if they are eating their own dog food, can still release buggy crap software.
artninja1988 3 hours ago [-]
The mythos public release will be a big indicator if the Anthropic and SF story of transformational ai soon holds any water imo
sonink 2 hours ago [-]
Broadly agree with this position - I think there are some people skeptical that Anthropic is doing this for regulatory capture - but I think they are being honest about what they are seeing and how regulation should catch up.
I for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate - but we should still try nevertheless. Maybe we are not able to pause, but we are able to slow down. That might give us more room, to maybe be able to pause in the future. But going ahead is too dangerous.
And it's not just Anthropic which is saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic have the same position, then it makes sense for us to be hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI, maybe they are just being honest about what they think lies ahead.
8note 18 minutes ago [-]
no, it really doesnt.
the end of humanity has a strong case for banning all burning of fossil fuels immediately
the end of humanity as a sales tactic to increase your stock price does not
these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.
if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference
if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing
that they arent buying anti-tank mines to drop on data centers says they arent in the slightest serious about it
4ffs 20 minutes ago [-]
"Even Geoffry Hinton has said the same thing"
The same bozo who claimed radiologists would be out of a job by now.
The data does not support what you or others say. Jesus christ. Can't believe people are this dumb. Have LLMs infested the minds of people to the extent they can't critically analyse what's happening in front of their eyes?
geodel 1 hours ago [-]
It will be so powerful that it can't be trusted with any earthly person.
damowangcy 4 hours ago [-]
AI tech bro:
Month 1 - 6 months to AGI
Month 2 - We will Replace all jobs
Month 3 - Okay maybe only the SWEs, programming is solved
Month 4 - Announce model that is too dangerous to release
Month 5 - Releases dangerous model
Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)
AI is here to stay, like it or not but it is not the solution to everything. If it is, what is Anthropic's moat? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand against their investors during a conflict of interest? Because running a company for maximum profit versus being ethical are two different ends of the spectrum.
baq 2 hours ago [-]
Anthropic is providing agentic intelligence as a service. OpenAI and Google deepmind also are in this business.
The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.
MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.
Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.
Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.
parpfish 2 hours ago [-]
i'm waiting for the AI giants to realize that they are burning cash to run their consumer-facing chatbots and that they should kill those products to focus on their enterprise tools.
free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.
but take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. no more cheating students in school. no more shitty linkedin posts. no more dangerous "therapy sessions" that give bad advice.
4ffs 23 minutes ago [-]
They're making a mistake with this continued self-hyping. At some point even the dumbest of prospective investors don't buy it.
amelius 2 hours ago [-]
Does this train on LLM output, or is this more like iterative self prompt improvement?
HarHarVeryFunny 2 hours ago [-]
Their statement is that they regard lines of code shipped as indicative of self-improvement. So, while a well written coding agent might be a few thousand LOC, Anthropic's is bloated like a decomposing whale and over 500K LOC! What more proof do you need?
Legend2440 2 hours ago [-]
Have you tried reading the article? It answers your question.
Don't ask people to explain the article to you if you're too lazy to open it yourself.
_se 2 hours ago [-]
I think that's the whole point of LLMs
Aperocky 2 hours ago [-]
Anthropic is the most self hyped company I've seen, to the point that I'm wondering what would happen to its employees if they held a different opinion. Do they just.. keep it to themselves? For instance, some Anthropic employees might hold the completely rational opinion that none of this is going to lead to AGI, but I just don't ever hear that from them.
The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:
Instead of thinking about edge cases with brain and whiteboard, you can have the LLMs simply generate most possibilities, including tests for them, because that is cheaper. There's probably 50x more commits of which 40 will be revert pairs but we are only twice as fast. And in reality nothing changed because the outcome remains the same. I can't see how it is necessarily different in the LLM space.
apsurd 2 hours ago [-]
> Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts
I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.
Meanwhile my Claude Pro ($200/year) does force me to smooth out my usage and plan more (Sonnet/Opus advisor split). But other than that, I can't imagine what I'd be doing with 20x (200x?) the compute to code sling. I think I'd lose my mind.
torben-friis 1 hours ago [-]
>If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code.
I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.
Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to constant bombardment of the idea that they're going to be left out or obsolete if they don't get on the ship NOW.
Aperocky 2 hours ago [-]
Because code used to be correlated with progress, it became almost a measurement in lieu. But realistically, the code is meaningless if it doesn't accomplish something, and that should remain the true bar of progress.
For instance, if I churned out 20x more code, threw away 19x code with rewrites and reverts and discards and accomplished the same project to the same standard 70% faster, would I do it? Yes. The part that matters is not 20x code, it is 70% faster.
Code is both the final product, and a tool to achieve that. We used to have a much harder time to realize the "tool" part, but now we are here. This also means any measurement centered on code being the final product is going to cease being effective or realistic.
apsurd 2 hours ago [-]
You're right, my gripe is specifically with code slinging that hits production end users. My background is in product so to your point, it's very unnerving to see a straight line being enthusiastically optimized for developers -> customer facing product outcomes.
This is contentious because I'm not exactly advocating for arbitrary gate-keepers. The nuance is that building usable stuff is hard. And not a matter of shipping more code. I take your point to mean well it depends on what that code is doing. If 20x more code is in a meta-harness of simulation and such to arrive at the leading candidate for what hits production, well then you've got my attention there.
josefritzishere 47 minutes ago [-]
I can't get away from a similar conclusion. Even AI Pioneer has said that LLMs are at a dead end.
butler14 2 hours ago [-]
Warming up for that IPO
stri8ted 14 minutes ago [-]
Is there something in the post that you find implausible or don't believe to be true?
holoduke 2 hours ago [-]
I have a claw that is instructed to make at least 500 PRs per day. It uses Claude, Gemini and OpenAI and runs basically every few minutes. I use online forums for input for the claw. Moltbook, reddit etc. It's quite funny how it tries to improve itself. But to say it really creates a new skynet. Nah. Not at all. It's more a clutter of useless features or incomprehensible code restructuring.
moregrist 2 hours ago [-]
This more or less agrees with my assessment of recent changes in Claude Code, where a lot of new features either:
- Are half-baked or half-done.
- Or have significant overlap with existing features, and aren’t clearly an improvement.
More code is not better. More features are not better. It would be lovely to see more intentional design than just more.
I know they’re dogfooding this. I have to believe they have some people with taste. So it makes me wonder if anyone has the time to think or if they’re just shoveling prompts as fast as possible.
holoduke 29 minutes ago [-]
It's like the AI created a method add(a, b) that returns a+a+a+a-b-b-b-b.
But then with much bigger and more complex features. Totally useless do-nothing methods. But still interesting to see occasional exceptions that are better.
bitwize 13 minutes ago [-]
After several months with their top engineers and state-of-the-art AI on the job, Anthropic managed to "reduce flickering by 85%" on their TUI Claude Code client, which is built in fucking React and rendered by drawing the entire chat conversation each time (hence the flicker). I think they've since eliminated it completely by slapping some double-buffering around it (since "our client is actually a real-time game engine" after all). Meanwhile for decades Emacs and Vim have had an optimizer built into their display cores that solves for the minimum set of terminal escape commands it takes to transform the screen from a given old state to a desired new state.
You will forgive me when, between muted snickers, I express considerable doubt that Anthropic will be able to bring its AI to a point of "self-improving" any time soon.
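To make the display-optimizer point above concrete, here is a deliberately naive sketch of diff-based redrawing. It is not how Emacs or Vim actually implement it (their optimizers are far more elaborate), and the function name `diff_rows` and the equal-height assumption are hypothetical choices for illustration. It only shows the basic idea: compare the old and new screen contents and emit escape sequences for the rows that changed, instead of repainting everything.

```python
def diff_rows(old, new):
    """Return ANSI escape strings that turn screen `old` into screen `new`.

    Both screens are lists of strings, one per row. This sketch assumes
    they have the same number of rows (a simplifying assumption).
    """
    if len(old) != len(new):
        raise ValueError("this sketch assumes screens of equal height")

    commands = []
    for row, (before, after) in enumerate(zip(old, new)):
        if before == after:
            continue  # unchanged row: emit nothing at all

        # Skip the common prefix so only the changed tail is rewritten.
        col = 0
        limit = min(len(before), len(after))
        while col < limit and before[col] == after[col]:
            col += 1

        # ESC[<row>;<col>H moves the cursor (1-based coordinates);
        # ESC[K erases from the cursor to the end of the line.
        commands.append(f"\x1b[{row + 1};{col + 1}H{after[col:]}\x1b[K")
    return commands


if __name__ == "__main__":
    old_screen = ["hello world", "status: idle"]
    new_screen = ["hello there", "status: idle"]
    for command in diff_rows(old_screen, new_screen):
        print(repr(command))
```

In this toy case only the first row changes, so a single cursor move plus the five changed characters is emitted, where a full repaint would rewrite both rows.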
georgehotz 2 hours ago [-]
The world has been recursively self-improving for millennia. Similar to Scientology, this is a cult pushing sci-fi nonsense. They are just coupled to an LLM lab to give their stories an air of seriousness. Imagine Scientology starting to make laptops.
4ffs 13 minutes ago [-]
TBH the more Anthropic keeps yapping the more desperate they seem now. OAI has been pretty quiet in comparison lately.
vblanco 3 hours ago [-]
Another article about how Anthropic wants to ban everyone except themselves and destroy open source and Chinese AIs.
reasonableklout 2 hours ago [-]
Where is this discussed in the article? I don't see any mentions of China or open source models
artninja1988 2 hours ago [-]
Not really mentioned explicitly but:
> A meaningful slowdown or pause would require multiple well-resourced labs at or near the frontier, in multiple countries, agreeing to stop under the same conditions. It would also require that each can verify that the others have actually stopped. Due to the unique characteristics of AI systems, the detectability (a lower standard than verifiability) element of this arms control problem is much more challenging than with other technologies. Training runs are far easier to conceal than missile silos, their inputs are general-purpose, and the incentive to defect quietly is enormous, because whoever continues while others pause could inherit the lead. A credible pause also has to specify what triggers it, what lifts it, and who adjudicates.
And later:
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation. We’ll publish what comes out of it. The window to investigate the questions together is here, and people outside AI companies should be involved in this deliberation.
reasonableklout 2 hours ago [-]
Coordinating a pause at the frontier is not the same as destroying or even harming open source/China.
It feels like both open source can flourish while the frontier is deliberately regulated?
vblanco 52 minutes ago [-]
They explicitly mention in the article that just the frontier stopping isn't enough, because that just means others will catch up. They want to be the leaders of a global organization/cartel that bans everyone except themselves. Particularly important given Anthropic attacks China and open source every chance they get. https://www.anthropic.com/news/detecting-and-preventing-dist...
artninja1988 48 minutes ago [-]
Yeah. This is why Anthropic is way worse than openai. They don't contribute shit to open source and even lobby against it.
esafak 2 hours ago [-]
> In the coming months, we will organize conversations where policymakers, researchers, civil society, and other AI companies can help answer some of the questions this piece raises, especially around full recursive self-improvement and how to create better options for coordination and deliberation.
If they wanted to they could have convened an international forum with commercial and political stakeholders years ago. Less talk, more do.
andromaton 1 hours ago [-]
[dead]
ath3nd 1 hours ago [-]
[dead]
simianwords 3 hours ago [-]
Sorry, but if AI can build itself then it can run companies of size 3000 with a few people. Or even larger. What are the consequences?
delichon 3 hours ago [-]
When AI is a more effective capital allocator than NI it will drive capital into the accounts of whoever controls the AI, gaining them increasing decision making power over the economy and culture. Maybe those controllers will be human at first.
cdrnsf 2 hours ago [-]
They will not be.
lstodd 2 hours ago [-]
As has been mentioned in the sibling comment it already is.
Consequences are: financial crisis.
llmslave 2 hours ago [-]
I cannot wait for these models to tear down traditional social hierarchies. We haven't even begun to see the effects, fingers crossed
baq 2 hours ago [-]
Hierarchies exist for a reason, take away the reason and the house of cards eventually collapses — but the house of cards is still a house. When it’s gone, we’re back to laws of the jungle.
Be careful what you wish for IOW.
llmslave 2 hours ago [-]
I think certain types of people with power, i.e. access to capital, will lose relevance. The world will become more meritocratic with AI as leverage to the individual
hvb2 2 hours ago [-]
Your analysis of the whole rise of AI is that people with access to capital will lose relevance???
So the most capital intensive industry we've ever created will put less power in the hands of those with capital?
I'm sorry, I have no idea how you came to that conclusion...
baq 2 hours ago [-]
It’s exactly the opposite I’m afraid. Capital already has more access to AI, both quantitatively (tokens for dollars) and qualitatively (biggest players got Mythos first). Expect this trend to continue.
SimianSci 2 hours ago [-]
Never heard of a stratified economy?
Spoiler alert: none of us will be in the good part.
techblueberry 2 hours ago [-]
Tear down or reinforce?
llmslave 2 hours ago [-]
capital/ability to leverage labor is going to lose power
wstrange 2 hours ago [-]
I'm not so sure. It seems those with capital will accumulate it even faster.
Without some kind of income redistribution we are sailing into dark waters.
techblueberry 2 hours ago [-]
Let the ruling classes tremble at a Communistic revolution. The proletarians have nothing to lose but their chains. They have a world to win.
Workingmen of all countries unite!
Translation: hahahahahahahahahhahahaha but in your defense, I would give anything to be wrong.
reducesuffering 2 hours ago [-]
Anthropic has finally come around to what others have already realized far sooner. Little time left now. Notice how shallow the arguments and consistently wrong the AGI naysayers have been year after year.
> If it were possible to effectively slow the development of this technology to give ourselves more time to deal with its immense implications, we think that would likely be a good thing
Even Anthropic wants to Pause AI now. There must really be not much time left for "edging". Please write to your lawmakers, no matter whether you are in the US, Europe, China, or elsewhere. Only an international agreement between governments can enforce an AI-Pause and eliminate the necessity to dangerously push the frontier.
Whichever side I may stand on, pausing just seems unnatural? Life is movement.
honeycrispy 2 hours ago [-]
And happiness is restraint.
honeycrispy 2 hours ago [-]
That would be like trying to get every country to agree to give up nukes.
mofeien 2 hours ago [-]
Or agree on finding ways to promote peaceful use of nuclear energy. This has been done, there are thousands of people working on it around the globe and 180+ member states of the IAEA. It's not easy, there have been close calls.
And cooperating internationally to buy ourselves time to find ways to develop this "last invention" in a way that will do good for humanity seems to be on a similar level.
ChrisLTD 2 hours ago [-]
Or stop making more, and testing more, which we got the biggest countries to do, at least for a time.
https://fxtwitter.com/trq212/status/2014051501786931427
> Most people's mental model of Claude Code is that "it's just a TUI" but it should really be closer to "a small game engine".
Also remember when XP was super bloated cause it needed 64MB?
I am not cynical enough to believe that Anthropic's warnings are pure marketing hype. Let's hope that it is instead overconfidence or the result of too much time talking to their own chatbot.
> If nukes were not invented yet, would it really be a good idea to build and sell them as fast as possible (in peace time, no less)?
Arguably, yes.
It doesn't really have to be dishonest, he could really believe it. I do believe, however, that it is incredibly wrong and is functioning as marketing hype.
I always was fascinated (obsessed?) by robots that build robots, or even things like this that can contribute a lot to making the next version of itself: https://buildyourcnc.com/products/cnc-machine-blacktoe-v4-2x... (cnc router that cuts plywood, and is made out of cnc-router cut plywood)
This is my own effort at an AI-assisted coding environment optimized for building itself: https://recursi.dev/ (just launching it, hope it's OK to mention it, it is free/open source.... here is the HN link that has gotten no love yet: https://news.ycombinator.com/item?id=48401022 )
Personally I think harnesses are as important as the AI itself, and have this crazy theory that even if the models stopped improving today we could still have massive advances in the harnesses alone.
i think that's the path to async agi these labs are imagining. The only limits are the sensor data you have on the world or your system, how long you're willing to wait, and how much you're willing to spend to parallelize it.
maybe once you start building out these verified workflows you can feed that back into training and the model starts to get a feel for the world, to the point that it can intuit things since it has these sub-paths built.
my personal agi test is: can a model, trained on video of someone knocking on a door and then opening it, encounter a microwave for the first time and open it when the food's done without knocking?
There's a ton of other tricks to it, but mostly keeping the protocol simple for the AI so it can concentrate on coding logic and not stuff like managing BS boilerplate, dependencies, etc. (for instance I make extensive use of things like abstract syntax tree library to help with surgical edits from the LLM)
That said, I would be very open to collaborating with someone who builds such small models, I don't think the system strictly needs it, but it also could have some extra power if it had it.
Tell me more! This takes me way back. I did one like this in the GPT-4 days! (8k context window)
recursi.dev
Seriously, I'm looking for collaborators.
There's upwards of 80,000 lines of code in the editor system, a lot to it to make sure that even newbies don't get stuck.... so that's kind of proof the system works since it doesn't break down when the codebase grows large.
What about the hypothesis that AI is generating more verbose code? I just see the text pretending to acknowledge "LOC != Productivity" and then using it as a metric anyway.
I'm sure he thought that was a crowning achievement, proof that AI can enable 10X developers, after all, what engineer could write 40k lines of code in a week?
I declined to review it, stating that I couldn't possibly vet 40k lines of code, and wouldn't put my reputation on the line to stamp the work as good. The PR nagged me for 2 weeks from my todo list and then disappeared. I don't know if he found another dev to get an approval from, or if the PR was abandoned. But I know for sure that he and I are on two totally separate islands around the value of LLMs.
By shifting their focus from training new models to serving inference, they would greatly reduce their spend. In fact, it is being reported that they are already doing this, which is the reason for their first-ever profitable quarter.
It's awfully convenient that the company which has greatly reduced its spend on training is now asking for a slowdown in this area.
The Claude code quality and operational security of Anthropic have already been analyzed by the public.
If you compare the output of (purportedly) trillion dollar corporations to Bell Labs or even Microsoft Research it is embarrassing. But the output is a fixture on any discussion board.
Opus 4.6/4.7 was consistently successful at getting 2-3x speed improvement with just one pass. It can also do the inverse: improve the performance metrics for better quality without causing a significant regression in speed. Then GPT-5.5 turned out to be much better at this workflow, often getting a multiplicative 1.5x-2x improvement above what Opus could do.
I now have quite a few GPT-5.5-optimized projects in various domains that are feature complete and are substantially more performant than existing SOTA implementations that I plan to open source as soon as possible: the bottleneck is polish as usual.
The orthogonality thesis sounds like a fun gotcha but if you give it some thought you realise how strange it sounds and the opposite thesis - collinearity thesis is actually correct.
1. Intelligence transfers and compounds
2. Goals of agents are not arbitrary
3. Our goals and agent goals are more likely to be aligned at the deeper level
Interesting - they're committing to kick off policy conventions to organize a world slowdown of frontier LLM building. If they are actually able to crack it, this will give a much-needed breather IMO. As exciting as the last ~6 months have been, there are some bigger questions to go answer now.
In my mind we should be trying to push AI along the Linux trajectory. You have a free and open source product, developed by a decentralized team with a strong code of ethics, running on commodity hardware. There can still be trillion dollar industries built on top of it, but the core technology is democratized and available to everybody. I don't see how we get there if we allow a handful of companies to dictate where development of the technology goes.
the actual race is to keep having revenue, since everyone is still willing to pay more for the best model.
we as consumers of LLM models lose out by the arms race ending by the creation of a cartel
what happens if they get this regulatory capture is that all the frontier labs put effort into making inference cheaper, and become extraordinarily profitable, at the expense of us consumers, who really want better models, at a subsidized price
https://www.italianrenaissance.org/wp-content/uploads/2012/0...
Or is this?
https://www.egypttoursportal.com/images/2024/02/Ouroboros-Sy...
https://knowyourmeme.com/memes/obama-awards-obama-a-medal
So based on my experience with the verbosity and non-DRYness of LLM code, a solid 2.5x in value delivered. Not bad!
I simultaneously think the AI revolution is making real revolutionary gains and am mystified by the lying.
An accurate Translation seems to be “we made this shit up, but it feels right”
So, right now it's a verbose code generator.
But post-IPO it will be wonderful - sentient, self-improving (recursively, iteratively, asymptotically), full of loving grace.
We hold these truths to be self-evident.
I, for one, believe that we should pause all work on AI for the foreseeable future. This is almost impossible to orchestrate, but we should still try nevertheless. Maybe we are not able to pause, but we are able to slow down. That might give us more room to maybe be able to pause in the future. But going ahead is too dangerous.
And it's not just Anthropic which is saying this. Even Geoffrey Hinton has said the same thing. If there is a non-zero chance that AI can kill all of humanity, and both Geoffrey and Anthropic have the same position, then it makes sense for us to be a hundred percent sure before we move ahead. Dario/Anthropic have already made their money from AI, maybe they are just being honest about what they think lies ahead.
the end of humanity has a strong case for banning all burning of fossil fuels immediately
the end of humanity as a sales tactic to increase your stock price does not
these are companies working on their IPO to make sure they can get the best price, not people being honest about what they think lies ahead.
if they were being honest about what lies ahead, they'd unilaterally stop training, and put all of their money into FPV drone bombs to destroy datacenters being used for training or inference
if you actually believe the thing is gonna kill everyone, you're not gonna worry about how you stop it, and certainly not keep building and operating the thing
that they aren't buying anti-tank mines to drop on data centers says they aren't in the slightest serious about it
The same bozo who claimed radiologists would be out of a job by now.
The data does not support what you or others say. Jesus christ. Can't believe people are this dumb. Have LLMs infested the minds of people to the extent they can't critically analyse what's happening in front of their eyes?
Month 1 - 6 months to AGI
Month 2 - We will Replace all jobs
Month 3 - Okay maybe only the SWEs, programming is solved
Month 4 - Announce model that is too dangerous to release
Month 5 - Releases dangerous model
Month 6 - This is it! We will replace AIs with more AIs (*secretly files for IPO)
AI is here to stay, like it or not, but it is not the solution to everything. If it is, what is Anthropic's moat? A better model? I don't see any ecosystem being built by them, as MCP is almost obsolete except for some very niche use cases. And they're doing stuff that a non-profit version of OpenAI would do. Can we trust a for-profit company to stand against their investors during a conflict of interest? Because running a company for maximum profit and being ethical are two different ends of the spectrum.
The problem is, if you’re any sort of knowledge worker, you’re essentially providing the same thing: you’re an intelligence with agency.
MCP is irrelevant. The moat is the quality of intelligence the service providers sell, including you. Tokens aren’t fungible between providers until you measure that they are for your use case, that’s kinda sorta the goal of job interviews.
Thus the moat will be that they’re providing the best models for the things people need other intelligent people for, but we should expect there will be limits on how much share they can economically take assuming competitors are optimizing for slightly different targets (but there’s still significant overlap in capability). This will disappear, but it’s always a question of when. The path matters as much as the destination.
Note that implications for you and me are exactly what the article says they are: nobody knows, but it’ll be a dramatic shift.
free chatgpt doesn't need to exist anymore. its job was to build hype/interest and it did.
but take it away and you solve many social problems and annoyances caused by AI with no loss to the upside of AI. no more cheating students in school. no more shitty linkedin posts. no more dangerous "therapy sessions" that give bad advice.
Don't ask people to explain the article to you if you're too lazy to open it yourself.
The metric being tracked, code commits, is hilariously one sided. Philosophically, if you had one part of your work now practically free, you'd like to utilize that freedom to maximally cover for the other parts, for instance:
Instead of thinking about edge cases with brain and whiteboard, you can have the LLMs simply generate most of the possibilities, including tests for them, because that is cheaper. There are probably 50x more commits, of which 40 will be revert pairs, but we are only twice as fast. And in reality nothing changed, because the outcome remains the same. I can't see how it is necessarily different in the LLM space.
I've been struggling to capture this sentiment for myself in a way that hits. If shipping code is a commodity then why is everyone's immediate priority seemingly to ship 10x more code. It just makes no sense. I can't seem to get off this hill. Company-wide AI mandates and 100 fleet Agent orchestration Rube Goldberg machines... it's getting wild out there.
Meanwhile my Claude Pro ($200/year) does force me to smooth out my usage and plan more (Sonnet/Opus advisor split). But other than that, I can't imagine what I'd be doing with 20x (200x?) the compute to code sling. I think I'd lose my mind.
I wonder how much of current engineering practices can be traced to what's pushed to company leaders on LinkedIn.
Every company is shitting bricks pushing for faster development and speed, gotta go fast to nowhere in particular, and I'm convinced it's tied to constant bombardment of the idea that they're going to be left out or obsolete if they don't get on the ship NOW.
https://intelligence.org/agi-ruin/
https://pauseai.info/