Vibecoding: a beginner's guide

Foxhound5366 · Apr 24, 2026

tetrasect said:
New paper by Microsoft Research:

Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely. Additional experiments reveal that agentic tool use does not improve performance on DELEGATE-52, and that degradation severity is exacerbated by document size, length of interaction, or presence of distractor files. Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interaction.

LLMs Corrupt Your Documents When You Delegate

arxiv.org

Some of this is just going to be the big players playing the game and defending their own ecosystems. This comes FROM Microsoft after all, which is no doubt really bitter that nobody rates its efforts with Copilot.

Hemps · Apr 24, 2026

Since using this the quality of my projects has gone up by strides - https://www.promptcowboy.ai/

I use promptcowboy to create my prompt.
Once the prompt is created, I paste it into Claude.
Once I have used up my free Claude credits,
I drag and drop the project into Deepseek and continue there for the rest of the day.

Herr der Verboten · Apr 24, 2026

tetrasect said:
New paper by Microsoft Research:

Our large-scale experiment with 19 LLMs reveals that current models degrade documents during delegation: even frontier models (Gemini 3.1 Pro, Claude 4.6 Opus, GPT 5.4) corrupt an average of 25% of document content by the end of long workflows, with other models failing more severely. Additional experiments reveal that agentic tool use does not improve performance on DELEGATE-52, and that degradation severity is exacerbated by document size, length of interaction, or presence of distractor files. Our analysis shows that current LLMs are unreliable delegates: they introduce sparse but severe errors that silently corrupt documents, compounding over long interaction.

LLMs Corrupt Your Documents When You Delegate

arxiv.org

Surely you were never supposed to have dozens of open chats, or chats that run for dozens of days?

Also worth knowing for the uninitiated... be clear as **** about what you actually want and what you are going to do, otherwise the LLM will assume. Hell, it does not have your context, nor does it live in your mind or your code.

To add: "hey claude be jesus take the wheel" - you are going to get ****ed, and it doesn't matter whether you are on Opus or not.

DrJohnZoidberg · Apr 24, 2026

These tools are getting really good. Currently using Claude Opus 4.7, it's much better at not generating garbage.

Herr der Verboten · Apr 24, 2026

DrJohnZoidberg said:
These tools are getting really good. Currently using Claude Opus 4.7, it's much better at not generating garbage.

Depends, but mostly yes: it is an improvement, but it can just as well turn on you and go seriously south.

DrJohnZoidberg · Apr 24, 2026

Herr der Verboten said:
Depends, but mostly yes: it is an improvement, but it can just as well turn on you and go seriously south.

It turned on me the other day, but it just ended up being an April Fools' thing.

tetrasect · Apr 24, 2026

Foxhound5366 said:
Some of this is just going to be the big players playing the game and defending their own ecosystems. This comes FROM Microsoft after all, which is no doubt really bitter that nobody rates its efforts with Copilot.

Copilot is not a model, it's an agent that uses ChatGPT.

Also, this is a research paper, not some opinion piece in Huisgenoot.

tetrasect · Apr 24, 2026

Herr der Verboten said:
Surely you were never supposed to have dozens of open chats, or chats that run for dozens of days?

Also worth knowing for the uninitiated... be clear as **** about what you actually want and what you are going to do, otherwise the LLM will assume. Hell, it does not have your context, nor does it live in your mind or your code.

To add: "hey claude be jesus take the wheel" - you are going to get ****ed, and it doesn't matter whether you are on Opus or not.

It's not about time or prompt specificity, it's about interactions. From the title page of the paper:

Herr der Verboten · Apr 24, 2026

Are you an AI proofreader or a real software developer?

tetrasect said:
Copilot is not a model, it's an agent that uses ChatGPT.

Also, this is a research paper, not some opinion piece in Huisgenoot.

That is so, but remember Copilot gives you the power of 10x developers.

Foxhound5366 · Apr 24, 2026

tetrasect said:
Copilot is not a model, it's an agent that uses ChatGPT.

Also, this is a research paper, not some opinion piece in Huisgenoot.

Yes, cigarette companies never paid for research papers published in reputable journals that proved that smoking is safe for you. Oh wait...

saor · Apr 25, 2026

Hemps said:
Since using this the quality of my projects has gone up by strides - https://www.promptcowboy.ai/

I use promptcowboy to create my prompt.
Once the prompt is created, I paste it into Claude.
Once I have used up my free Claude credits,
I drag and drop the project into Deepseek and continue there for the rest of the day.

Giving Deepseek a go and so far it's pretty good. Gemini Pro was great up until I hit about 300 lines of code, after which it refused to return complete code after multiple changes, offering only snippets with little context. It tells me my code is taxing its memory; I tell it that 300 lines of code is barely anything...

It then proceeds to give me code with multiple strings completely changed.

Gemini is very well priced and great for getting something going, but you hit a very obvious wall quite soon. Hopefully it gets fixed, because I like how it works.
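
For anyone hitting the same thing, one way to catch those silent changes is to diff what the model returns against what you sent it. A minimal sketch, assuming Python and its standard difflib module; the function name changed_lines and the sample strings are made up for illustration and are not part of any tool mentioned in this thread:

```python
import difflib


def changed_lines(original: str, returned: str) -> list[str]:
    """List the lines that differ between the code you sent and the code you got back.

    Lines prefixed with "- " exist only in the original,
    lines prefixed with "+ " exist only in the returned version.
    """
    diff = difflib.ndiff(original.splitlines(), returned.splitlines())
    # ndiff also emits unchanged lines ("  ") and hint lines ("? "); drop those.
    return [line for line in diff if line.startswith(("- ", "+ "))]


if __name__ == "__main__":
    # Hypothetical example: the model quietly rewrote a string literal.
    sent = 'greeting = "hello"\nprint(greeting)\n'
    got_back = 'greeting = "hi there"\nprint(greeting)\n'
    for line in changed_lines(sent, got_back):
        print(line)
```

If the list is longer than the change you actually asked for, that is a hint the model touched things it should have left alone. It only flags textual differences, so you still have to read them yourself.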

semaphore · Apr 25, 2026

saor said:
Giving Deepseek a go and so far it's pretty good. Gemini Pro was great up until I hit about 300 lines of code, after which it refused to return complete code after multiple changes, offering only snippets with little context. It tells me my code is taxing its memory; I tell it that 300 lines of code is barely anything...

It then proceeds to give me code with multiple strings completely changed.

Gemini is very well priced and great for getting something going, but you hit a very obvious wall quite soon. Hopefully it gets fixed, because I like how it works.

Gemini is trash.

Solarion · Apr 30, 2026

Herr der Verboten said:
Are you an AI proofreader or a real software developer?

That is so, but remember Copilot gives you the power of 10x developers.

Thus your workload will increase tenfold and you'll still be paid the same.

Herr der Verboten · Apr 30, 2026

Solarion said:
Thus your workload will increase tenfold and you'll still be paid the same.

Needed: AI proofreaders

Herr der Verboten · Apr 30, 2026

In other news: it seems Opus 4.7 can now reach its limits 'too soon' - even on Max:

Had a chuckle, as I did that in less than one hour, for the one before the 1pm reset.

All I basically did was run the Sonnet 4.6 CLI to do an initial analysis of work done against the provided markdown for some unit tests I created/revised, and then had Opus read and check it against the running workbook I had for it.

I assume that happened because of Sonnet firing off its own agents, and because I asked Opus to make a condensed conversation catch-up markdown.

I'm still actually just ****ing around with a web chat driving the process and the CLI doing some actions for it, like analysis. Of course you need to be strict with the CLI, otherwise it will do **** you don't want, and like any LLM it needs clear and concise messages...

Kinda works, but it's still best if these tools augment the process rather than 'jesus take the wheel', where you end up being a glorified proofreader knowing ****. Lol.

semaphore · May 2, 2026

Herr der Verboten said:
In other news: it seems Opus 4.7 can now reach its limits 'too soon' - even on Max:

Had a chuckle, as I did that in less than one hour, for the one before the 1pm reset.

All I basically did was run the Sonnet 4.6 CLI to do an initial analysis of work done against the provided markdown for some unit tests I created/revised, and then had Opus read and check it against the running workbook I had for it.

I assume that happened because of Sonnet firing off its own agents, and because I asked Opus to make a condensed conversation catch-up markdown.

I'm still actually just ****ing around with a web chat driving the process and the CLI doing some actions for it, like analysis. Of course you need to be strict with the CLI, otherwise it will do **** you don't want, and like any LLM it needs clear and concise messages...

Kinda works, but it's still best if these tools augment the process rather than 'jesus take the wheel', where you end up being a glorified proofreader knowing ****. Lol.

Anthropic has been nerfing limits every week, even on Max accounts. I am currently using a ProX5 on OpenAI, but will upgrade to 20x. Limits are exceptionally generous on those. I've also applied to OpenAI for OSS support on the project I'm working on; I just need to see if they accept.

FiestaST · May 2, 2026

The Vibe Coding Era: Why AI Won’t Replace Software Engineers

FiestaST · May 24, 2026

https://twitter.com/x/status/2058410166605791500

https://twitter.com/x/status/2058448418863710429

Herr der Verboten · May 24, 2026

FiestaST said:
https://twitter.com/x/status/2058410166605791500

https://twitter.com/x/status/2058448418863710429

Any UX work we can look at for master Otsile?

Mike Hoxbig · May 24, 2026

FiestaST said:
https://twitter.com/x/status/2058410166605791500

https://twitter.com/x/status/2058448418863710429

Fake news, you can't vibe code something that doesn't exist.

Meanwhile detectives are on the hunt for A records that were stolen from DNS. We will work through the night and keep the public updated...
