March 2026

Sunday Coffee & Code: Website scraping and email drafting with a multi-modal AI fallback


By Steve Harris

This weekend’s experiment tackled a problem I’ve been curious about for a while. I had a request for an AI-driven, Google-based solution to scan websites and draft emails from the content. I wondered whether an AI vision-based approach could reliably extract data from modern websites where traditional scraping may fail. So far, it looks like yes, though it needs more testing. I also found, again, that getting something running in Google’s ecosystem was very smooth; brilliant, in fact.

The approach

Step 1: Instead of relying on brittle CSS selectors, I used a headless browser API (ApiFlash) to capture both the raw HTML and a full-page screenshot of the target URL.

Step 2: I passed both assets to Google Gemini 2.5 Flash with fallback logic:
• First, scan the HTML for job postings.
• If that fails, fall back to the screenshot and use multi-modal reasoning to identify the content visually, as a human would.

What came out of the build
• A fully decoupled Python agent deployed on Google Cloud Run Jobs, giving me a more scalable and timeout-resistant option than the original Google Apps Script proof of concept.
• A Google Sheets dashboard to manage target URLs and track execution status.
• Automated email drafting through the Gmail API, producing drafts only, so a human stays in the loop before anything is sent.
• A path toward stronger identity separation using Domain-Wide Delegation in the next version.
• No Claude Code this week: this was built in a single chat thread with Gemini 3.1 Pro, including the repo documentation and Mermaid diagrams.

Observations from the process
• Vision-based AI looks like a useful fallback for sites where HTML parsing misses content.
• Building directly from Google Cloud Shell using gcloud run jobs deploy --source . was refreshingly straightforward.
• Cost appears extremely low, likely fractions of a penny per run, though ApiFlash costs are still to be confirmed.
• Fast redeployment made iteration easy.
• This could be built in tools like n8n, but once things move beyond low-code workflows, I still prefer code. It gives me better visibility and control, since it keeps me closer to the execution environment.

Why I like this approach
• It solves the client’s problem.
• It satisfied my curiosity and let me explore a new pattern.
• It appears more resilient, using AI vision as a fallback rather than depending entirely on scraping.
• It keeps a human safely in the loop at the final stage.

Next step: add to the URL list and assess extraction accuracy for the AI vision fallback.
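To make the HTML-first, vision-fallback idea concrete, here is a minimal sketch of that flow. The ApiFlash endpoint and parameters, the prompts, and the function names are my assumptions, not the production code; ask_model stands in for a Gemini 2.5 Flash call so the fallback logic itself can be exercised without network access.

```python
# Sketch of the two-step extraction flow: try the raw HTML first, then fall
# back to multi-modal reasoning over a full-page screenshot.
from urllib.parse import urlencode
from urllib.request import urlopen

# ApiFlash screenshot endpoint (parameters assumed; check their docs).
APIFLASH_URL = "https://api.apiflash.com/v1/urltoimage"


def capture_screenshot(url: str, access_key: str) -> bytes:
    """Fetch a full-page screenshot of `url` via ApiFlash (PNG bytes)."""
    query = urlencode({"access_key": access_key, "url": url, "full_page": "true"})
    with urlopen(f"{APIFLASH_URL}?{query}") as resp:
        return resp.read()


def extract_jobs(html: str, screenshot: bytes, ask_model):
    """Return (extracted text, source), where source is "html" or "vision".

    `ask_model` is any callable taking a list of prompt parts and returning
    the model's text reply, so a real Gemini client can be injected later.
    """
    reply = ask_model(["List any job postings in this HTML. "
                       "Reply NONE if there are none.", html])
    if reply.strip().upper() != "NONE":
        return reply, "html"
    # HTML pass found nothing: fall back to the screenshot, like a human would.
    reply = ask_model(["List any job postings visible in this screenshot.",
                       {"mime_type": "image/png", "data": screenshot}])
    return reply, "vision"
```

Keeping the model call injectable is also what makes the "drafts only" safety property easy to test end to end.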
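The drafts-only Gmail step boils down to building an RFC 2822 message, base64url-encoding it, and calling drafts.create rather than anything that sends. A sketch, assuming an already-authorised google-api-python-client service; build_draft_body is a hypothetical helper name:

```python
# Build the request body that Gmail's users.drafts.create endpoint expects.
# Creating a draft (never sending) keeps a human in the loop.
import base64
from email.message import EmailMessage


def build_draft_body(to: str, subject: str, body: str) -> dict:
    """Return a drafts.create request body wrapping a base64url-encoded email."""
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(body)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
    return {"message": {"raw": raw}}


# With an authorised Gmail service object (setup assumed, not shown):
# draft = service.users().drafts().create(
#     userId="me", body=build_draft_body(to, subject, body)).execute()
```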
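For context on the Cloud Shell observation, deploying from source looks roughly like this; the job name, region, and schedule wiring are placeholders, not the actual project's values:

```shell
# Deploy the Python agent as a Cloud Run Job straight from the source tree.
gcloud run jobs deploy website-scanner \
  --source . \
  --region europe-west2

# Trigger a run on demand (or wire it up to Cloud Scheduler later).
gcloud run jobs execute website-scanner --region europe-west2
```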
