This weekend’s experiment tackled a problem I’ve been curious about for a while.
I had a request for an AI-driven, Google-based solution to scan websites and draft emails from the content. I wondered whether an AI vision-based approach could reliably extract data from modern websites where traditional scraping fails.
So far, it looks like 𝘆𝗲𝘀 - though it needs more testing.
I also found, again, that getting something running in Google’s ecosystem was very smooth - brilliant in fact.
𝗧𝗵𝗲 𝗮𝗽𝗽𝗿𝗼𝗮𝗰𝗵
𝗦𝘁𝗲𝗽 𝟭: Instead of relying on brittle CSS selectors, I used a headless browser API (ApiFlash) to capture both the raw HTML and a full-page screenshot of the target URL.
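Step 1 is essentially one parameterised GET request. A minimal sketch, assuming ApiFlash's `urltoimage` endpoint; the access key is a placeholder and the function name is illustrative:

```python
# Minimal sketch of Step 1: building the ApiFlash request for a full-page
# screenshot. Parameter names follow ApiFlash's documented query API;
# ACCESS_KEY is a placeholder, not a real credential.
from urllib.parse import urlencode

APIFLASH_ENDPOINT = "https://api.apiflash.com/v1/urltoimage"

def build_screenshot_url(access_key: str, target_url: str) -> str:
    """Build the ApiFlash request URL for a full-page PNG screenshot."""
    params = {
        "access_key": access_key,  # assumption: your ApiFlash key
        "url": target_url,
        "full_page": "true",       # capture the whole page, not just the viewport
        "format": "png",
    }
    return f"{APIFLASH_ENDPOINT}?{urlencode(params)}"

# In the agent, the two assets would then be fetched with e.g.
# requests.get(build_screenshot_url(key, url)).content  (screenshot bytes)
# requests.get(url).text                                (raw HTML)
```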
𝗦𝘁𝗲𝗽 𝟮: I passed both assets to Google Gemini 2.5 Flash with fallback logic:
• First, scan the HTML for job postings.
• If that fails, fall back to the screenshot and use multi-modal reasoning to identify the content visually, like a human would.
𝗪𝗵𝗮𝘁 𝗰𝗮𝗺𝗲 𝗼𝘂𝘁 𝗼𝗳 𝘁𝗵𝗲 𝗯𝘂𝗶𝗹𝗱
• A fully decoupled Python agent deployed on 𝗚𝗼𝗼𝗴𝗹𝗲 𝗖𝗹𝗼𝘂𝗱 𝗥𝘂𝗻 𝗝𝗼𝗯𝘀, giving me a more scalable and timeout-resistant option than the original Google Apps Script proof of concept.
• A 𝗚𝗼𝗼𝗴𝗹𝗲 𝗦𝗵𝗲𝗲𝘁𝘀 dashboard to manage target URLs and track execution status.
• Automated email drafting through the 𝗚𝗺𝗮𝗶𝗹 𝗔𝗣𝗜, with drafts only, so a human stays in the loop before anything is sent.
• A path toward stronger identity separation using 𝗗𝗼𝗺𝗮𝗶𝗻-𝗪𝗶𝗱𝗲 𝗗𝗲𝗹𝗲𝗴𝗮𝘁𝗶𝗼𝗻 in the next version.
• No Claude Code this week - this was built in a single chat thread with 𝗚𝗲𝗺𝗶𝗻𝗶 𝟯.𝟭 Pro, including the repo documentation and Mermaid diagrams.
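The drafts-only guarantee comes down to one API choice: base64url-encode a MIME message and call the Gmail API's `drafts().create` rather than `messages().send`. A minimal sketch; recipient, subject, and body are placeholders:

```python
# Sketch of drafts-only emailing with the Gmail API. Building the request
# body is pure and testable; the service call (commented out) is the only
# part that touches Gmail, and it creates a draft, never sends.
import base64
from email.message import EmailMessage

def build_draft_body(to: str, subject: str, text: str) -> dict:
    """Build the request body for gmail users().drafts().create()."""
    msg = EmailMessage()
    msg["To"] = to
    msg["Subject"] = subject
    msg.set_content(text)
    raw = base64.urlsafe_b64encode(msg.as_bytes()).decode()
    return {"message": {"raw": raw}}

# With an authorised googleapiclient service object, the agent would run:
# service.users().drafts().create(userId="me", body=build_draft_body(...)).execute()
# Using drafts().create, not messages().send, is what keeps a human in the loop.
```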
𝗢𝗯𝘀𝗲𝗿𝘃𝗮𝘁𝗶𝗼𝗻𝘀 𝗳𝗿𝗼𝗺 𝘁𝗵𝗲 𝗽𝗿𝗼𝗰𝗲𝘀𝘀
• Vision-based AI looks like a useful fallback for sites where HTML parsing misses content.
• Building directly from 𝗚𝗼𝗼𝗴𝗹𝗲 𝗖𝗹𝗼𝘂𝗱 𝗦𝗵𝗲𝗹𝗹 using 𝘨𝘤𝘭𝘰𝘶𝘥 𝘳𝘶𝘯 𝘫𝘰𝘣𝘴 𝘥𝘦𝘱𝘭𝘰𝘺 --𝘴𝘰𝘶𝘳𝘤𝘦 . was refreshingly straightforward.
• Cost appears extremely low, likely fractions of a penny per run, though ApiFlash costs are still to be confirmed.
• Fast redeployment made iteration easy.
• This could be built in tools like n8n, but once things move beyond low-code workflows, I still prefer code: it gives me better visibility and control by keeping me closer to the execution environment.
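The Cloud Shell workflow mentioned above boils down to two commands. A sketch only: the job name, region, and environment variable names are placeholders, not the actual project's values:

```shell
# Deploy the agent to Cloud Run Jobs straight from source in Cloud Shell.
# scraper-agent, europe-west1, and the env var names are placeholders.
gcloud run jobs deploy scraper-agent \
  --source . \
  --region europe-west1 \
  --set-env-vars GEMINI_API_KEY=...,APIFLASH_KEY=...

# Re-run after each change; redeploying from source keeps iteration fast.
gcloud run jobs execute scraper-agent --region europe-west1
```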
𝗪𝗵𝘆 𝗜 𝗹𝗶𝗸𝗲 𝘁𝗵𝗶𝘀 𝗮𝗽𝗽𝗿𝗼𝗮𝗰𝗵
• It solves the client's problem.
• It satisfied my curiosity and let me explore a new pattern.
• It appears more resilient by using AI vision as a fallback, rather than depending entirely on scraping.
• It keeps a human safely in the loop at the final stage.
𝗡𝗲𝘅𝘁 𝘀𝘁𝗲𝗽: add to the URL list and assess extraction accuracy for AI vision fallback.
March 2026
Sunday Coffee & Code: Website scraping and email drafting with multi-modal AI fallback
By Steve Harris
