AI Check-In: Which Agents Are You Currently Using?

Herr der Verboten · May 6, 2026

semaphore said:
Your prompting is bad. That is easily resolved.

This does play a large part; however, over the past few days Opus 4.6 was fundamentally better than Opus 4.7. Not only that, usage was also magically lower, though that is not too surprising.

Herr der Verboten · May 6, 2026

semaphore said:

You can use a variation of this

Code:

# XX  Spec Map
This file is the short agent table of contents. Keep it concise.

## First Read

1. `docs/superpowers/review/track-start-prompt.md` - track/chunk workflow contract.
2. `docs/adherence.md` - execution and evidence constraints.
3. `docs/review_criteria.md` - consensus review contract.

## Track Plan Index (2026-05-05)
All 12 implementation track plans are in `docs/superpowers/plans/`:
| Track | Plan file |
|---|---|
| track01 | `2026-05-01-track-01-xx-setup-plan.md` |

## Conversation Commands (Spec-First)
- `./agent start track TRACK [chunk CHUNK]`: resolve scope and report scope only.
- `start track TRACK chunk CHUNK`: equivalent (tool-conversation form) for the above.
- `start chunk CHUNK`: infer active track from current context when clear; otherwise ask for track.
- `confirm start`: execute only the confirmed chunk plan.
- `./agent review track TRACK CHUNK`: run chunk review protocol against monitor/task artifacts.
- `review track TRACK CHUNK`: equivalent (tool-conversation form) for the above.
- `./agent review-track-gate TRACK [CHUNK]`: verify track/chunk gate completion from the latest review cycle.
- `./agent preflight`: run `tools/agent/preflight.sh`.

### Scope Convention
Treat these commands as *documentation planning gates* until implementation work starts. This repo is currently in spec stage.
## Hard Rules
1. Keep this repo spec-first: do not claim implementation closure without evidence artifacts.
2. Use small, explicit artifacts with low abstraction overhead.
3. Keep track/chunk evidence synchronized across:
   - `docs/superpowers/plans/`
   - `docs/superpowers/tasks/`
   - `docs/superpowers/evidence/`
   - `docs/superpowers/implementation/`
4. Adherence and review criteria are mandatory inputs before closing any chunk.
5. Use relative file paths that match repo-local references only.
6. Do not revert unrelated work unless explicitly asked.
## Rust Rules
[1] Read Cargo.toml, workspace, edition, rust-version, features, lints, CI, src/lib.rs, tests, unsafe/FFI before editing.
[2] Preserve public API, feature behavior, edition, and MSRV unless explicitly asked to change them.
[3] Borrow by default at API boundaries; avoid unnecessary ownership, allocation, and clone().
[4] Use Option for absence, Result for failure, and avoid unwrap on caller-controlled input.
[5] Keep binaries thin and reusable logic in libraries.
[6] Do not widen pub visibility to make code compile.
[7] Async: do not block executor threads or hold locks across .await.
[8] Unsafe: keep narrow, document every block with SAFETY, and wrap in safe abstractions.
[9] FFI: isolate raw bindings, document ownership/lifetime/layout/unwind assumptions.
[10] Add docs and tests for new public behavior.
[11] Run or satisfy fmt, clippy, check/test/doc, dependency/security, package, and semver gates.
[12] Flag semver impact for public API changes.
## Canonical Policy
- Product scope and architecture: `docs/xxx.md`
- Current plan index: `docs/plans/`
- Execution conventions and commands: this file + `docs/superpowers/review/track-start-prompt.md`
- Schemas: `docs/spec/*.schema.json`
- Review governance: `docs/adherence.md` and `docs/review_criteria.md`
## Harness Commands
- `scripts/check-spec-harness.sh` - lightweight spec consistency and link checks.
- `./agent start TRACK [CHUNK]` -> `tools/agent/start_track.sh`.
- `./agent review TRACK CHUNK` -> `tools/agent/review_chunk.sh`.
- `./agent review-track-gate TRACK [CHUNK]` -> `tools/agent/check_track_gate.sh`.
- `./agent next-chunk TRACK` -> `tools/agent/next_chunk.sh`.
- `./agent check-track-artifacts` -> `scripts/check-track-artifacts.sh`.
- `./agent check-agent-tools` -> `scripts/check-agent-tools.sh`.
- `./agent preflight` -> `tools/agent/preflight.sh`.
## Track Closure Gate
- Track/chunk completion requires a successful latest-review-cycle pass:
  - `PASS 3/3`
  - 3/3 reviewers
  - zero blocking findings
  - zero non-blocking findings
  - PASS verdict
- Do not move to the next chunk or next track on narrative progress alone; require the applicable latest review cycle to reach `PASS 3/3` first.
- Current track01 execution constraint: stay mac-first; keep Windows/Linux/headless jobs scaffold-only and deferred from `chunk00` closure until explicitly redirected.
- Use `tools/agent/check_track_gate.sh TRACK CHUNK`.
## Verification
Run this before marking a track chunk complete:
```bash
scripts/check-spec-harness.sh
```
Run full workflow check before closure:
```bash
tools/agent/preflight.sh
```

adherence

Code:

# Strict Adherence Contract (xxx Spec Harness)
Date established: 2026-05-01
Scope: all active tracks/chunks unless explicitly overridden.
## Non-Negotiable Rules
1. No partial implementations.
2. No placeholders, TODO-backed behavior, or fake-green completion claims.
3. No narrowed review scope that omits required chunk files/tests/evidence.
4. No protocol drift from active track/spec artifacts.
5. No hidden behavior changes not reflected in task/plan/evidence artifacts.
6. Do not mark tracks/chunks done without evidence that supports every acceptance item.
7. Preserve artifact alignment between plan/task/monitor/implementation notes whenever either changes.
8. All command evidence must include command text and the relevant output summary.
## Authoritative Source Order
1. `xxx.md`
2. Active track plan
3. Active track task
4. Active monitor
5. `review_criteria.md`
## Completion Gate
1. Scope is executed exactly as committed in the confirmed start prompt.
2. Required docs/spec/test evidence exists in monitor and/or task artifacts.
3. Required commands are run and outputs are captured.
4. Review findings (if any) are captured, triaged, and resolved.
5. The latest review cycle satisfies the applicable criteria and has no unresolved blocking findings.
## Test/Evidence Standard
- Behavioral claims must be backed by executable evidence whenever possible.
- If a required command is unavailable, record the blocker and approval state explicitly.
## Review Integrity
- Do not claim work completed from assumptions or narrative-only updates.
- If evidence is found stale, add a remediation note and refresh it before chunk close.

review_criteria

Code:

# Review Criteria (xx)
Date established: 2026-05-01
Applies to: consensus reviews for active tracks/chunks
## Consensus Contract
1. Exactly 3 review inputs are required for a fresh review cycle.
2. Reviewers must run in strict isolation and use independent context.
3. Reviewers must not delegate or run orchestration scripts.
4. Include this line verbatim in every reviewer prompt:
   `Hard constraint: do not call spawn_agent, do not delegate, and do not run reviewer orchestration scripts.`
5. If any launch is interrupted or violates rules 1-4, discard and rerun until exactly three valid reviewers complete.
6. Final sign-off requires all three reviewers to return a non-blocking verdict.
7. Track/chunk completion check is explicit and requires:
   - 3 valid reviewers.
   - `PASS 3/3` review cycle status.
   - zero blocking findings.
   - zero non-blocking findings.
## Required Reviewer Output
Each review cycle entry must include:
- verdict (`PASS` or `FAIL`)
- blocking findings first (with file references where possible)
- non-blocking findings
- explicit sign-off recommendation
## Consensus Gate
- Any blocking finding => `FAIL`.
- `PASS` only when zero blocking findings and criteria above are satisfied.
- Chunk closure requires:
  - latest cycle status `PASS 3/3`
  - zero blocking findings
  - zero non-blocking findings
- Any invalid cycle must be rerun after blocker(s) are fixed.

This has been refined over a few months. There are a lot of other files you need, but you can kinda infer what the rest should be. This uses spec-driven development with superpowers, strict gating and review panels.
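
The closure rule in the files above (exactly three valid reviewers, `PASS 3/3`, zero blocking and zero non-blocking findings) can be sketched as a small check. This is only an illustration of the rule as written; the `Review` type and both function names are hypothetical and are not taken from the real `tools/agent/check_track_gate.sh`.

```python
from dataclasses import dataclass, field


@dataclass
class Review:
    """One reviewer's output for a review cycle (hypothetical shape)."""
    verdict: str  # "PASS" or "FAIL"
    blocking: list[str] = field(default_factory=list)
    non_blocking: list[str] = field(default_factory=list)


def cycle_status(reviews: list[Review]) -> str:
    """Summarise a cycle as e.g. 'PASS 3/3' or 'FAIL 2/3'."""
    passed = sum(1 for r in reviews if r.verdict == "PASS" and not r.blocking)
    label = "PASS" if reviews and passed == len(reviews) else "FAIL"
    return f"{label} {passed}/{len(reviews)}"


def chunk_can_close(reviews: list[Review]) -> bool:
    """Closure rule: 3 reviewers, all PASS, no findings of either kind."""
    if len(reviews) != 3:
        return False  # an invalid cycle has to be rerun, not closed
    return all(
        r.verdict == "PASS" and not r.blocking and not r.non_blocking
        for r in reviews
    )
```

Note that under this reading a cycle can report `PASS 3/3` and still not close the chunk: a single non-blocking finding is enough for `chunk_can_close` to return `False`.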

I don't have such a setup, but nowadays I do something similar which yields a lot better results.

~~I don't know if gpt has a cli like claude code, which~~ The Claude CLI is nice in line with a web chat, where the web chat is basically the runbook.

Obviously you can rewrite the runbooks, like you have, with supporting rails, and it does the same in the end.

Bottom line: without these, the results are too much of a Wild West.

semaphore · May 6, 2026

Herr der Verboten said:
I don't have such a setup, but nowadays I do something similar which yields a lot better results.

~~I don't know if gpt has a cli like claude code, which~~ The Claude CLI is nice in line with a web chat, where the web chat is basically the runbook.

Obviously you can rewrite the runbooks, like you have, with supporting rails, and it does the same in the end.

Bottom line: without these, the results are too much of a Wild West.

~~I don't know if gpt has a cli like claude code, which~~ ---> You haven't used Codex, lol?

semaphore · May 6, 2026

Herr der Verboten said:
This does play a large part; however, over the past few days Opus 4.6 was fundamentally better than Opus 4.7. Not only that, usage was also magically lower, though that is not too surprising.

Opus 4.7 is just 4.6. Anthropic are essentially just gaslighting scammers at this point. Dario is a sketchy character.

Herr der Verboten · May 6, 2026

semaphore said:
~~I don't know if gpt has a cli like claude code, which~~ ---> You haven't used Codex, lol?

It's been a long day; I then realized it by inference.

acidrain · Jun 3, 2026

Not sure this fits here but it seems to be the most appropriate thread.

Came across a YouTube video sponsored by Merlin AI. You can get the Pro version for 5 bucks a month on the annual plan using code AN5.

Seems like a good deal, compared to ChatGPT sub.

Nimz · Jun 3, 2026

acidrain said:
Not sure this fits here but it seems to be the most appropriate thread.

Came across a YouTube video sponsored by Merlin AI. You can get the Pro version for 5 bucks a month on the annual plan using code AN5.

Seems like a good deal, compared to ChatGPT sub.

limits will probably be kak

acidrain · Jun 3, 2026

Nimz said:
limits will probably be kak

$16 / day and $100 / month usage limits

Kosmos_ · Jun 3, 2026

Used Grok, now ChatGPT.

Ipwn 4 · Jun 15, 2026

Other: Around the time the Claude Code source code leaked, I made the jump to OpenCode and ultimately to pi.dev to have far more control over my agents. Anthropic ruined the harness through bloated system prompts in an attempt to turn Claude Code into everything for everyone. Pi allows for absolute granular control, letting you fine-tune project-specific context for repeatable outcomes.

My primary model at the moment is GPT 5.5, but I cycle through whatever is trending on OpenRouter to see what the Chinese are up to.

For actual dev, the Ralph loop has served me well when paired with a strong framework. It's token-heavy, as the agent spends quite a bit of time aligning changes with existing conventions, but I'd say it's worth it in the long haul, as the code it generates is aligned with the rest of the codebase.

Had a quick look at my usage tracker: averaging just shy of 2bn tokens per month, most of which is cached. Consumption is 20% email parsing, 40% PM / PA work and 40% code generation (which covers planning, build, review, test and QA through Playwright).

Herr der Verboten · Jun 15, 2026

Ipwn 4 said:
Other: Around the time the Claude Code source code leaked, I made the jump to OpenCode and ultimately to pi.dev to have far more control over my agents. Anthropic ruined the harness through bloated system prompts in an attempt to turn Claude Code into everything for everyone. Pi allows for absolute granular control, letting you fine-tune project-specific context for repeatable outcomes.

My primary model at the moment is GPT 5.5, but I cycle through whatever is trending on OpenRouter to see what the Chinese are up to.

For actual dev, the Ralph loop has served me well when paired with a strong framework. It's token-heavy, as the agent spends quite a bit of time aligning changes with existing conventions, but I'd say it's worth it in the long haul, as the code it generates is aligned with the rest of the codebase.

Had a quick look at my usage tracker: averaging just shy of 2bn tokens per month, most of which is cached. Consumption is 20% email parsing, 40% PM / PA work and 40% code generation (which covers planning, build, review, test and QA through Playwright).

Don't use it like a search engine?

saor · Jun 15, 2026

Installed Codex. Much easier giving it access to files rather than constantly uploading to a web UI. Still need to see how it compares to Claude. Last time I used GPT for code it wasn't great.

Herr der Verboten · Jun 15, 2026

saor said:
Installed Codex. Much easier giving it access to files rather than constantly uploading to a web UI. Still need to see how it compares to Claude. Last time I used GPT for code it wasn't great.

Claude has drastically improved since I moved away from "prompt / use these docs" to a full system of:

claude.md
skills
memories

But you still have to set them up properly, with frozen references and guidelines.

Once that is in place, prompting is minimal, as its CLI can now properly invoke skills without bloating itself into a stupor. However, initial setup, eval and refinement are a bit kak. So, to help with that, I had some skills with built-in feedback that tells me directly why it did A or B instead of C, which I kept as a progress log and a feedback log.

Still... it augments the process. It is not the process.

saor · Jun 15, 2026

Herr der Verboten said:
Claude has drastically improved since I moved away from "prompt / use these docs" to a full system of:

claude.md
skills
memories

But you still have to set them up properly, with frozen references and guidelines.

Once that is in place, prompting is minimal, as its CLI can now properly invoke skills without bloating itself into a stupor. However, initial setup, eval and refinement are a bit kak. So, to help with that, I had some skills with built-in feedback that tells me directly why it did A or B instead of C, which I kept as a progress log and a feedback log.

Still... it augments the process. It is not the process.

Ja, with Codex I realised that if I have the Arduino IDE CLI installed, Codex can invoke the CLI, compile the code and check for errors before I upload to my microcontroller. Will give this a spin and then look at your Claude suggestion.
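
The compile-before-upload check described here can be sketched as a small wrapper around `arduino-cli compile`. This assumes `arduino-cli` is on the `PATH`; the helper names are hypothetical, and the board FQBN and sketch directory are whatever applies to your project.

```python
import subprocess
from typing import Callable


def compile_command(sketch_dir: str, fqbn: str) -> list[str]:
    """Build the arduino-cli invocation that compiles a sketch for one board."""
    return ["arduino-cli", "compile", "--fqbn", fqbn, sketch_dir]


def sketch_compiles(
    sketch_dir: str,
    fqbn: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> tuple[bool, str]:
    """Compile only (no upload); return (ok, combined compiler output)."""
    result = run(compile_command(sketch_dir, fqbn), capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr
```

For example, `sketch_compiles("Blink", "arduino:avr:uno")` would compile a sketch in a `Blink` directory for an Uno and hand back the compiler output, which an agent (or a human) can read before anything is flashed. The `run` parameter is only there so the function can be exercised without the toolchain installed.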

Herr der Verboten · Jun 15, 2026

saor said:
Ja, with Codex I realised that if I have the Arduino IDE CLI installed, Codex can invoke the CLI, compile the code and check for errors before I upload to my microcontroller. Will give this a spin and then look at your Claude suggestion.

I'll link the vid for that so there is more context.

Herr der Verboten · Jun 15, 2026

So, this guy may not be great, but that is where I started from. Also check how to actually make skills effectively. The same goes for claude.md: how not to bloat it, and how to keep it KISS for Claude itself, not for you the human. This is the part most people miss when they set these up for Claude: it's for the machine, not for you. What I did add specifically was a memory and a claude.md entry that states: do not just make memories; state when you are making one and why, and then ask me if you can. If you don't do this, it gets out of hand very quickly.

Then I made a skill to create skills and a skill to revise skills. The reason for this is to enforce that skills created or revised don't have conflicting or contradictory methodologies, and that the content they contain, like frozen examples, still makes sense. What you really want to avoid here as well is skills referencing items in your work that can change, like code files.

Finally, when that is done, I use these to create or revise my skills, for example a documentation skill. That skill itself knows exactly how to set itself up, add to the index, add to the topics, and finally create the initial documentation, which I can add to or keep as is.

The reason you want this is that it is simpler and better than telling Claude each time that it must now do a doc; that when it does a doc, it must do it like this; that having done it like this, it must now add it to the index; and that once it is in the index, it must now add it to the searchable topics. It also beats pasting in prompts or docs. Now you just say "<skill name> action" or "invoke this skill for that", or even have skills within skills so it is fully automated, guard-railed, etc.
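
As a rough analogy for why bundling those steps helps, here is a minimal sketch of the "create doc, add to index, add to topics" sequence as one routine. It is not how Claude skills are implemented; the function name, the `docs/` layout and the `index.md` / `topics.md` file names are all assumptions made up for illustration.

```python
from pathlib import Path


def create_doc(root: Path, name: str, topics: list[str]) -> Path:
    """Create a doc stub, then register it in the index and the topic list."""
    docs = root / "docs"
    docs.mkdir(parents=True, exist_ok=True)

    # Step 1: create the initial documentation from a fixed template.
    doc_path = docs / f"{name}.md"
    doc_path.write_text(
        f"# {name}\n\nInitial documentation for {name}.\n", encoding="utf-8"
    )

    # Step 2: add the new doc to the index.
    with (docs / "index.md").open("a", encoding="utf-8") as index:
        index.write(f"- [{name}]({doc_path.name})\n")

    # Step 3: add it to the searchable topics, one line per topic.
    with (docs / "topics.md").open("a", encoding="utf-8") as topic_file:
        for topic in topics:
            topic_file.write(f"{topic}: {doc_path.name}\n")

    return doc_path
```

The point of the analogy is that the order and the bookkeeping live in one place, so a single call (or a single skill invocation) replaces the chain of "now do this, then do that" instructions.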

FiestaST · Jun 15, 2026

This person gave Claude 12/10, that is pushing it

https://www.linkedin.com/in/theanishajain/

Not a typo mistake on the 3rd one.

I rated every AI tool you use (brutally):

✦ 𝗧𝗛𝗘 TOP TIER ✦

Claude - 12/10

☑ The one I actually work inside all day.
☑ Best at writing, thinking, & coding.
☑ Yes, I gave it a 12. I stand by it.

✦ 𝗣𝗘𝗥𝗙𝗘𝗖𝗧 𝗦𝗖𝗢𝗥𝗘𝗦 ✦

Wispr Flow – 10/10

☑ Write text with your voice.
☑ Insanely fast and accurate.
☑ Keeps training on your edits.

Granola – 10/10

☑ The best AI meeting notetaker.
☑ Made for simplicity and efficiency.
☑ One of the few that does NOT join the call.

Gamma – 10/10

☑ Make slides, with AI.
☑ The best PowerPoint killer.
☑ With taste in their designs.

✦ 𝗧𝗛𝗘 𝗦𝗢𝗟𝗜𝗗 𝗢𝗡𝗘𝗦 ✦

Notion - 9/10

☑ Where my whole brain lives.
☑ AI is now baked into every page.
☑ Still the best workspace, period.

Canva – 9/10

☑ The non-designer design tool.
☑ Getting scary good at AI features.
☑ I make half my visuals here.

ChatGPT Image - 9/10

☑ Shockingly good at text in images.
☑ My go-to for quick visuals.
☑ The one OpenAI feature I open daily.

ChatGPT – 8/10

☑ Still the default for most people.
☑ Great all-rounder, master of none.
☑ Loses to Claude where it matters to me.

Grok – 8/10

☑ The alternative from Elon Musk.
☑ Best at searching tweets, and that's nice.
☑ Not at the top yet, but growing (very) fast.

NotebookLM – 8/10

☑ Upload your courses and actually learn.
☑ Turns sources into podcasts and mindmaps.
☑ Takes up to 50 sources, even from YouTube.

Perplexity – 7/10

☑ The one AI built to search.
☑ Most of the LLMs live inside it.
☑ The "Discover" tab beats Google News.

Grammarly - 7/10

☑ The quiet one nobody talks about.
☑ Still fixes what I miss every day.
☑ Boring, but it earns its spot.

✦ 𝗧𝗛𝗘 𝗠𝗜𝗗 𝗢𝗡𝗘𝗦 ✦

Gemini – 6/10

☑ Google's answer to ChatGPT.
☑ Not great at search or text, but good at…
☑ … multi-lingual tasks.

Nano Banana – 5/10

☑ Genuinely fun image model.
☑ Stole the show for a while.
☑ Switched to ChatGPT image for better quality.

✦ 𝗧𝗛𝗘 𝗗𝗜𝗦𝗔𝗣𝗣𝗢𝗜𝗡𝗧𝗠𝗘𝗡𝗧𝗦 ✦

Otter – 3/10

☑ The famous meeting notetaker?
☑ I hate having it connected to the call.
☑ Granola does this better in every way.

✦ 𝗧𝗛𝗘 𝗭𝗘𝗥𝗢𝗦 ✦

Copilot – the worst LLM at almost everything - 0/10
Replit – far better options for vibecoding - 0/10

FiestaST · Jun 16, 2026

L-Dog · Jun 18, 2026

Getting a bit fed up with Claude (it's been down twice over the past few weeks while I was busy with a project). Heard good things about Cursor. Anyone here use it, and how was the experience?

Hemps · Jun 24, 2026

Anyone try freebuff - https://freebuff.com/

I know it's Chinese, but damn, even on the simple Mimi 2.5 this thing crunches. Results are decent; of course, the project was started in Claude Code.

Which Agents Are You Currently Using?

ChatGPT

Claude

Gemini

Microsoft Copilot

Perplexity

Cursor

Notion AI

Self-hosted agents

I’m not currently using any AI agents

Other — comment below
