
AI-powered login automation. Uses Claude to classify login pages and Playwright to interact with them.

RichardHruby/login-machine

by Anon

Login Machine

One browser agent loop that logs into any website.
Credentials never touch the model. They flow directly into the browser DOM.

Try the live demo | Article | Quick Start | How It Works | Screen Types | Design Principles | Architecture


Why

If you're building browser agents that need to log into websites, you know the pain. Every website has a different login flow, and traditional automation means writing a dedicated script for each one. Hardcoded selectors and brittle state machines break as soon as a site ships a redesign.

But login pages are designed for humans. Every screen is self-contained. You can always figure out what to do just by looking at it. An LLM with vision can do the same.

At Anon, the Login Machine replaced hundreds of per-website scripts with a single agent loop that works for any login flow: multi-step credentials, SSO pickers, MFA prompts, and magic links are all handled by the same code.

Quick Start

cp .env.example .env.local
# Fill in your API keys
npm install
npm run dev

Open http://localhost:3000 and paste a login URL, or try the hosted demo.

Environment Variables

| Variable | Description |
| --- | --- |
| ANTHROPIC_API_KEY | Anthropic API key for Claude |
| BROWSERBASE_API_KEY | BrowserBase API key |
| BROWSERBASE_PROJECT_ID | BrowserBase project ID |

How It Works

flowchart TD
    A(["Navigate to login page"]) --> B["Take Screenshot<br/>+ Extract HTML"]
    B --> C["LLM Classifies<br/>Screen Type"]
    C --> D{"Logged in?"}
    D -->|"No"| F["Request Input<br/>from User"]
    D -->|"Yes"| E(["Done"])
    F --> G["Submit Input<br/>in Browser"]
    G --> B
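The loop in the flowchart can be sketched roughly as follows. This is an illustration only, not the repository's code: `LoopDeps`, `runLoginLoop`, and the `maxSteps` guard are hypothetical names introduced here, and the browser, LLM, and user are passed in as plain functions so the control flow can be read on its own.

```typescript
// Hypothetical sketch of the observe -> classify -> act loop.
// None of these names are taken from the repository.

interface Observation {
  screenshot: string; // e.g. a base64-encoded image
  html: string;       // stripped page HTML
}

interface Classification {
  loggedIn: boolean;
  prompt: string; // what to ask the user for on this screen
}

interface LoopDeps {
  observe: () => Promise<Observation>;
  classify: (obs: Observation) => Promise<Classification>;
  requestInput: (prompt: string) => Promise<string>;
  submit: (input: string) => Promise<void>;
}

// Returns how many observe/classify passes were needed to reach the logged-in state.
// maxSteps is an assumed safety limit, not something the README specifies.
async function runLoginLoop(deps: LoopDeps, maxSteps = 20): Promise<number> {
  for (let step = 1; step <= maxSteps; step++) {
    // Always re-observe: the next screen is never assumed.
    const obs = await deps.observe();
    const result = await deps.classify(obs);
    if (result.loggedIn) return step;

    // User input goes to the browser only; classify() never receives it.
    const input = await deps.requestInput(result.prompt);
    await deps.submit(input);
  }
  throw new Error(`Not logged in after ${maxSteps} steps`);
}
```

Note that `classify` only ever receives the observation, which mirrors the credential isolation described below.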

Stripped HTML + screenshot. Raw page HTML is full of scripts, styles, SVGs, and tracking pixels. The extractor walks the DOM recursively, strips everything except form-relevant tags and attributes, and traverses Shadow DOM boundaries so enterprise SSO widgets aren't missed. This cuts token usage by roughly 10x on complex pages and reduces hallucinated locators.
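A minimal sketch of this kind of recursive stripping is shown below. It operates on a simplified, hypothetical node model (`ElementNode`, `PageNode`) instead of the real DOM so it can run without a browser, and the tag and attribute lists are assumptions, since the README does not list the ones the extractor actually uses.

```typescript
// Hypothetical, simplified node model so the sketch runs without a browser.
interface ElementNode {
  tag: string;
  attrs: Record<string, string>;
  children: PageNode[];
  shadowRoot?: PageNode[]; // content of an open shadow root, if any
}
type PageNode = ElementNode | string; // a string stands for a text node

// Assumed lists; the real extractor's lists are not given in this README.
const DROP_TAGS = new Set(["script", "style", "svg", "noscript", "img"]);
const KEEP_TAGS = new Set(["form", "input", "button", "label", "select", "option", "a"]);
const KEEP_ATTRS = new Set(["id", "name", "type", "placeholder", "aria-label", "href"]);
const VOID_TAGS = new Set(["input"]);

function stripNode(node: PageNode): string {
  if (typeof node === "string") return node.trim();
  if (DROP_TAGS.has(node.tag)) return "";

  // Descend into shadow-root content as well as regular children.
  const inner = [...(node.shadowRoot ?? []), ...node.children]
    .map(stripNode)
    .filter((s) => s.length > 0)
    .join("");

  // Wrapper elements (div, span, custom elements) are unwrapped: only content survives.
  if (!KEEP_TAGS.has(node.tag)) return inner;

  // Attribute values are not escaped here; a real implementation would need to.
  const attrs = Object.entries(node.attrs)
    .filter(([name]) => KEEP_ATTRS.has(name))
    .map(([name, value]) => ` ${name}="${value}"`)
    .join("");

  return VOID_TAGS.has(node.tag)
    ? `<${node.tag}${attrs}>`
    : `<${node.tag}${attrs}>${inner}</${node.tag}>`;
}
```

In a browser, the same traversal would read `element.shadowRoot` for open shadow roots; closed shadow roots are not reachable that way.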

Credential isolation. The LLM analyzes the page and returns structured data describing what fields exist and their Playwright locators. It never sees what the user types. Credentials flow directly from the user into the browser DOM via Playwright.
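The separation can be illustrated with a short sketch. The `FieldDescriptor` shape and the `fillCredentials` helper are hypothetical, and the sketch assumes each locator is a selector string; `FillablePage` is a structural subset of Playwright's `page.locator(selector).fill(value)`, so a real Playwright `Page` should be usable in its place.

```typescript
// Structural subset of Playwright's Page: page.locator(selector).fill(value).
interface FillablePage {
  locator(selector: string): { fill(value: string): Promise<void> };
}

// What the model returns: field metadata only, never values. (Hypothetical shape.)
interface FieldDescriptor {
  name: string;     // key used to look up the user's input
  label: string;    // shown to the user in the form
  selector: string; // locator string produced by the LLM
}

async function fillCredentials(
  page: FillablePage,
  fields: FieldDescriptor[],
  userInput: Record<string, string>, // collected from the user, never sent to the LLM
): Promise<void> {
  for (const field of fields) {
    const value = userInput[field.name];
    if (value === undefined) {
      throw new Error(`Missing input for "${field.name}"`);
    }
    // The value goes straight from the user's input into the browser DOM.
    await page.locator(field.selector).fill(value);
  }
}
```

The only data that crosses the model boundary in this sketch is the `FieldDescriptor` list; `userInput` is joined with it after the LLM call has returned.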

Self-correcting locators. Every LLM-generated Playwright locator is validated against the live DOM before use. If a locator doesn't match, the error is fed back to the LLM in <error-history> tags for retry with context (up to 3 attempts).
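A minimal sketch of such a retry loop follows. The names `ProposeLocator`, `CountMatches`, and `resolveLocator` are hypothetical, the error message format is invented for illustration, and the sketch assumes "up to 3 attempts" means three proposals in total.

```typescript
// Hypothetical dependency signatures; the repository's own functions are not shown in this README.
type ProposeLocator = (errorHistory: string) => Promise<string>; // asks the LLM for a locator
type CountMatches = (locator: string) => Promise<number>;        // checks it against the live DOM

const MAX_ATTEMPTS = 3; // assumed to mean three proposals in total

async function resolveLocator(
  propose: ProposeLocator,
  countMatches: CountMatches,
): Promise<string> {
  const errors: string[] = [];

  for (let attempt = 1; attempt <= MAX_ATTEMPTS; attempt++) {
    // Earlier failures are passed back so the model can correct itself.
    const history =
      errors.length > 0
        ? `<error-history>\n${errors.join("\n")}\n</error-history>`
        : "";

    const locator = await propose(history);
    const matches = await countMatches(locator);
    if (matches > 0) return locator;

    errors.push(`Attempt ${attempt}: locator "${locator}" matched 0 elements`);
  }

  throw new Error(`No valid locator after ${MAX_ATTEMPTS} attempts`);
}
```

With Playwright, `countMatches` could plausibly be implemented as `(selector) => page.locator(selector).count()`.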

Screen Types

The LLM classifies every page into one of six types, each with a strict Zod schema:

| Type | What It Is | How It's Handled |
| --- | --- | --- |
| credential_login_form | Email, password, OTP fields + submit button | Shows dynamic form → user fills → agent types into DOM |
| choice_screen | Account picker, SSO options, workspace selector | Shows buttons → user picks → agent clicks |
| magic_login_link | "Check your email" screens | Shows URL input → user pastes link → agent navigates |
| loading_screen | Spinners, redirects, Cloudflare challenges | Auto-waits and re-analyzes (max 12 retries) |
| blocked_screen | Cookie banners, popups blocking the flow | Auto-dismisses and re-analyzes |
| logged_in_screen | Dashboard, homepage | Terminal success state |
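The six types map naturally onto a discriminated union. The sketch below uses plain TypeScript types as a dependency-free stand-in for the Zod schemas. Only the `type` values come from the table above; the other field names, the `NextAction` values, and the `give_up` outcome once the 12 loading retries are used up are assumptions made for illustration.

```typescript
// Plain TypeScript stand-in for the Zod schemas.
// Only the `type` strings come from the README; other fields are hypothetical.
type Screen =
  | { type: "credential_login_form"; fields: string[] }
  | { type: "choice_screen"; options: string[] }
  | { type: "magic_login_link" }
  | { type: "loading_screen" }
  | { type: "blocked_screen" }
  | { type: "logged_in_screen" };

type NextAction =
  | "ask_user"
  | "wait_and_reanalyze"
  | "dismiss_and_reanalyze"
  | "done"
  | "give_up"; // assumed outcome once loading retries run out

const MAX_LOADING_RETRIES = 12;

function nextAction(screen: Screen, loadingRetries: number): NextAction {
  switch (screen.type) {
    case "credential_login_form":
    case "choice_screen":
    case "magic_login_link":
      return "ask_user"; // all three need something from the user
    case "loading_screen":
      return loadingRetries < MAX_LOADING_RETRIES ? "wait_and_reanalyze" : "give_up";
    case "blocked_screen":
      return "dismiss_and_reanalyze";
    case "logged_in_screen":
      return "done";
    default: {
      // Compile-time exhaustiveness check: adding a seventh type breaks the build here.
      const unreachable: never = screen;
      return unreachable;
    }
  }
}
```

For example, `nextAction({ type: "loading_screen" }, 3)` yields `"wait_and_reanalyze"`, while the same screen with 12 retries already used yields `"give_up"`.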

Design Principles

  1. Observe, don't assume. Every action is followed by a fresh page analysis. The system never guesses what screen comes next.
  2. Validate before acting. LLM outputs are checked against the live DOM. Hallucinated selectors are caught and corrected before they cause errors.
  3. Fail forward with context. When something goes wrong, the error becomes part of the next attempt's context. The LLM doesn't repeat the same mistake.

Architecture

src/
├── app/api/chat/route.ts           # Single API endpoint (start/submit/close)
├── components/
│   ├── chat.tsx                     # Main UI shell (browser + chat columns)
│   ├── message-bubble.tsx           # Dynamic message rendering
│   ├── credential-form.tsx          # Login form with inline errors
│   ├── choice-buttons.tsx           # SSO / option selection
│   ├── magic-link-input.tsx         # Magic link URL input
│   └── log-panel.tsx               # Terminal-style log viewer
├── hooks/use-login-session.ts       # State management + SSE streaming
└── lib/ai-login/
    ├── agent.ts                     # LLM analysis + screen handlers
    ├── browser.ts                   # BrowserBase + Playwright (stateless)
    ├── prompts.ts                   # System prompt for classification
    └── types.ts                     # Zod schemas for all screen types

Stack

| Component | Technology |
| --- | --- |
| Frontend | Next.js 16, React 19, Tailwind 4 |
| LLM | Claude Sonnet 4.5 via Vercel AI SDK |
| Browser | Playwright over CDP via BrowserBase |
| Validation | Zod schemas + live DOM locator checks |

License

MIT. See LICENSE.


Built by @RichardHruby and @jesse-olympus at Anon
