I Built a WhatsApp AI Assistant in 24 Hours (and You Can Too!)

The 2 AM Epiphany

It was 2 AM on a Thursday. I was lying in bed, scrolling through WhatsApp messages I’d forgotten to reply to (again), when it hit me: What if WhatsApp could reply for me? But like, intelligently?

By 2 AM Friday, I had a fully functional WhatsApp AI assistant. Here’s how that chaotic 24-hour journey went, and how you can build your own in way less time (because you’ll learn from my mistakes).

Hour 0-2: The “How Hard Could It Be?” Phase

First thought: the official WhatsApp API must be simple, right? Wrong.

The WhatsApp Business API requires:

  • Business verification (takes days)
  • A Meta (formerly Facebook) Business Manager account (nightmare fuel)
  • Usage-based fees (my wallet cried)

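Then I discovered the hack: WhatsApp Web + Puppeteer = Freedom 🎉 Specifically, whatsapp-web.js, an unofficial Node.js library that runs WhatsApp Web in a headless Chromium (via Puppeteer) and links it to your account like any other linked device. WhatsApp doesn't allow unofficial clients, so this is not ban-proof. (Foreshadowing.)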

Hour 3-6: The Great Authentication Dance

// My first attempt (spoiler: it failed spectacularly)
const { Client } = require('whatsapp-web.js');

const client = new Client();
client.on('qr', qr => {
    console.log('QR Code:', qr);
    // Me: "Why isn't this working??"
    // Also me: Logged the raw QR string instead of rendering a scannable code 🤦
});
client.initialize();

After three coffees and a minor existential crisis:

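const { Client, LocalAuth } = require('whatsapp-web.js');
const qrcode = require('qrcode-terminal');

const client = new Client({
    // Saves the session to disk (./.wwebjs_auth by default) so you only scan once
    authStrategy: new LocalAuth(),
    puppeteer: {
        headless: true,
        // Needed when Chromium runs as root, e.g. inside Docker
        args: ['--no-sandbox']
    }
});

client.on('qr', qr => {
    // Render the QR as terminal art your phone can actually scan
    qrcode.generate(qr, { small: true });
    console.log('Scan this QR code with WhatsApp!');
});

// Nothing happens (not even the 'qr' event) until you start the client
client.initialize();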

Success! The QR code appeared. I scanned it. Magic happened. ✨

Hour 7-12: Making It Smart with Gemini

Time to add the AI brain. Google’s Gemini API to the rescue:

const { GoogleGenerativeAI } = require('@google/generative-ai');
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
// Create the model once and reuse it for every message
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });

client.on('message', async msg => {
    // Ignore group chats, whose IDs end in @g.us (learned this the hard way)
    if (msg.from.endsWith('@g.us')) return;

    // Add personality (because why not?)
    const prompt = `You're a helpful but slightly sarcastic assistant.
                   User says: "${msg.body}"
                   Reply in a friendly, conversational tone.`;

    // The magic happens here
    const result = await model.generateContent(prompt);
    const reply = result.response.text();

    await msg.reply(reply);
});
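Free-tier Gemini keys are rate-limited, and the handler above makes a single attempt per message, so a busy minute can surface as errors. A common remedy is retrying with exponential backoff. Here's a minimal sketch; `withRetry` and `backoffDelay` are hypothetical helpers (not part of the Gemini SDK), and retrying on every error is a simplifying assumption.

```javascript
// Minimal retry-with-exponential-backoff sketch (hypothetical helpers).
// Assumption: every thrown error is worth retrying; a stricter version would
// retry only rate-limit (HTTP 429) and transient server errors.
const sleep = ms => new Promise(resolve => setTimeout(resolve, ms));

// Delay before retry number `attempt` (0-based): baseMs * 2^attempt, capped at maxMs.
function backoffDelay(attempt, baseMs = 500, maxMs = 8000) {
    return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Calls fn(); on failure waits and tries again, up to `retries` extra attempts.
async function withRetry(fn, retries = 3) {
    for (let attempt = 0; ; attempt++) {
        try {
            return await fn();
        } catch (err) {
            if (attempt >= retries) throw err; // out of retries: surface the error
            await sleep(backoffDelay(attempt));
        }
    }
}
```

In the handler you would wrap the call as `await withRetry(() => model.generateContent(prompt))`. Adding a little random jitter to each delay helps when several chats hit the limit at the same moment.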

Hour 13-16: The Feature Creep Begins

“Basic replies are boring,” I thought. “Let’s add EVERYTHING!”

Voice Messages? Done.

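const { MessageMedia } = require('whatsapp-web.js');

if (msg.hasMedia && msg.type === 'ptt') {
    // Voice notes arrive as base64-encoded OGG/Opus audio
    const audio = await msg.downloadMedia();
    // transcribeAudio and generateVoiceReply are helpers (not shown) wrapping
    // Google Speech-to-Text and Google Text-to-Speech
    const text = await transcribeAudio(audio);
    // Process with Gemini
    const result = await model.generateContent(text);
    // Reply with voice: generateVoiceReply returns base64 OGG/Opus audio
    const audioBase64 = await generateVoiceReply(result.response.text());
    const voiceNote = new MessageMedia('audio/ogg; codecs=opus', audioBase64);
    await msg.reply(voiceNote, undefined, { sendAudioAsVoice: true });
}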

Image Analysis? Why Not.

if (msg.hasMedia && msg.type === 'image') {
    const media = await msg.downloadMedia();
    const prompt = `What's in this image? Be witty about it.`;
    // Gemini can see! 👀 But it wants an inlineData part, not a MessageMedia object
    const imagePart = { inlineData: { data: media.data, mimeType: media.mimetype } };
    const result = await model.generateContent([prompt, imagePart]);
    await msg.reply(result.response.text());
}
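With separate text, voice, and image branches, one message can easily trigger more than one reply: a voice note has an empty body, so the plain-text handler from earlier would answer it too. Routing each message to exactly one handler by type avoids that. A minimal sketch; `routeMessage`, `toImagePart`, and the handler names are hypothetical, while `'chat'`, `'ptt'`, and `'image'` are the type strings whatsapp-web.js uses.

```javascript
// Hypothetical router: one handler per message type, so a voice note
// can't also trigger the plain-text reply. 'chat', 'ptt' and 'image'
// are the type strings whatsapp-web.js uses for text, voice notes and images.
function routeMessage(msg, handlers) {
    switch (msg.type) {
        case 'chat':
            return handlers.onText(msg);
        case 'ptt':
            return handlers.onVoice(msg);
        case 'image':
            return handlers.onImage(msg);
        default: // stickers, documents, locations, ...
            return handlers.onOther(msg);
    }
}

// Gemini takes images as { inlineData: { data, mimeType } } parts, while a
// whatsapp-web.js MessageMedia has base64 `data` and a lowercase `mimetype`.
function toImagePart(media) {
    return { inlineData: { data: media.data, mimeType: media.mimetype } };
}
```

The bot then needs a single listener, for example `client.on('message', msg => routeMessage(msg, { onText, onVoice, onImage, onOther: () => {} }))`, where each handler is one of the async functions above.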

Hour 17-20: The “Oh No” Moments

Disaster #1: The Infinite Loop

// DON'T DO THIS
// 'message_create' fires for messages you send, too
client.on('message_create', async msg => {
    const reply = await generateReply(msg.body);
    await msg.reply(reply);
    // Problem: the bot's own reply fires the event again, forever 😱
});
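The cure is to check who sent the message before replying. whatsapp-web.js messages carry a `fromMe` flag, and the library's `'message'` event (unlike `'message_create'`) only fires for messages you receive. The sketch below bundles that check with the group and status filters into one hypothetical `shouldReply` function, tested on plain objects that mimic the `fromMe`, `from`, `hasMedia`, and `body` fields.

```javascript
// Decide whether the bot should answer a message at all (hypothetical helper).
// `msg` only needs the whatsapp-web.js Message fields used below, so plain
// objects work for testing without a live WhatsApp session.
function shouldReply(msg) {
    if (msg.fromMe) return false;                      // our own messages (the loop breaker)
    if (msg.from.endsWith('@g.us')) return false;      // group chats
    if (msg.from === 'status@broadcast') return false; // status updates
    // Answer media (voice notes, images) or non-blank text; skip everything else
    return Boolean(msg.hasMedia) || (msg.body || '').trim() !== '';
}
```

Calling `if (!shouldReply(msg)) return;` as the first line of the handler keeps every later feature behind the same guard.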

Disaster #2: The Spam Apocalypse

My friend decided to test it by sending 50 messages in 10 seconds. The bot replied to ALL of them. My WhatsApp got temporarily banned. Oops.

The Fix: Rate Limiting

const rateLimiter = new Map(); // sender ID -> timestamps of recent messages
const RATE_LIMIT = 5;          // messages per sender...
const WINDOW_MS = 60_000;      // ...per rolling minute

client.on('message', async msg => {
    const senderId = msg.from;
    const now = Date.now();

    // Keep only timestamps from the last minute, so the list can't grow forever
    const recent = (rateLimiter.get(senderId) || []).filter(t => now - t < WINDOW_MS);
    rateLimiter.set(senderId, recent);

    if (recent.length >= RATE_LIMIT) {
        return; // Over the limit: ignore the message
    }

    recent.push(now);
    // Process the message...
});
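The limiter above caps each sender separately, so ten friends testing at once can still push the account well past one sender's worth of traffic. Here's a hedged sketch of a sturdier version: a hypothetical `RateLimiter` class that also enforces an account-wide cap, prunes old timestamps, and can forget idle senders. The limits (5 per sender, 20 overall, per minute) are assumptions, not numbers WhatsApp publishes.

```javascript
// Sliding-window rate limiter with a per-sender cap and an account-wide cap.
// `now` can be passed in (instead of always calling Date.now()) so the logic
// is deterministic in tests. The default limits are assumptions.
class RateLimiter {
    constructor({ perSender = 5, global = 20, windowMs = 60_000 } = {}) {
        this.perSender = perSender;
        this.global = global;
        this.windowMs = windowMs;
        this.bySender = new Map(); // senderId -> timestamps inside the window
        this.all = [];             // timestamps of every allowed message
    }

    // Returns true (and records the hit) if the message may be answered.
    allow(senderId, now = Date.now()) {
        const inWindow = t => now - t < this.windowMs;
        this.all = this.all.filter(inWindow);
        const recent = (this.bySender.get(senderId) || []).filter(inWindow);

        if (recent.length >= this.perSender || this.all.length >= this.global) {
            this.store(senderId, recent);
            return false;
        }
        recent.push(now);
        this.all.push(now);
        this.store(senderId, recent);
        return true;
    }

    // Call occasionally (e.g. from setInterval) to forget senders gone quiet.
    sweep(now = Date.now()) {
        for (const [id, times] of this.bySender) {
            this.store(id, times.filter(t => now - t < this.windowMs));
        }
    }

    // Keep the Map small: senders with no recent timestamps are removed.
    store(senderId, recent) {
        if (recent.length === 0) this.bySender.delete(senderId);
        else this.bySender.set(senderId, recent);
    }
}
```

Wiring it in takes `if (!limiter.allow(msg.from)) return;` at the top of the handler plus an occasional `setInterval(() => limiter.sweep(), 60_000)`.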

Hour 21-24: Deploy to Google Cloud Run

Because what’s the point if it’s not running 24/7?

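FROM node:20-slim
# whatsapp-web.js drives Chromium via Puppeteer; Debian's package also
# pulls in the shared libraries Chromium needs to start
RUN apt-get update \
    && apt-get install -y --no-install-recommends chromium \
    && rm -rf /var/lib/apt/lists/*
ENV PUPPETEER_SKIP_DOWNLOAD=true \
    PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium
WORKDIR /app
COPY package*.json ./
RUN npm install --omit=dev
COPY . .
EXPOSE 8080
CMD ["node", "index.js"]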

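Deploy with one command. The flags matter: the bot keeps a live WhatsApp Web session instead of answering HTTP requests, so it needs exactly one instance (two would fight over the same session) that never scales to zero, CPU that stays allocated between requests, and enough memory for Chromium. LocalAuth also stores the session on the container's ephemeral disk, so expect to re-scan the QR code after a restart.

gcloud run deploy whatsapp-ai \
  --source . \
  --region us-central1 \
  --memory 1Gi \
  --min-instances 1 \
  --max-instances 1 \
  --no-cpu-throttling \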
  --allow-unauthenticated
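One more Cloud Run requirement: the container must listen for HTTP on the port in the `PORT` environment variable (8080 by default), and this bot never opens one, so the deploy fails its startup check. A tiny status endpoint fixes that. Here's a minimal sketch using only Node's built-in `http` module; `botState`, `healthHandler`, and `createHealthServer` are hypothetical names.

```javascript
const http = require('http');

// Hypothetical shared state; set botState.ready = true in client.on('ready').
const botState = { ready: false };

// Always answer 200 so Cloud Run's startup check passes even while the bot
// is still waiting for a QR scan; the body reports the real status.
function healthHandler(req, res) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    res.end(botState.ready ? 'ready' : 'starting');
}

function createHealthServer() {
    return http.createServer(healthHandler);
}
```

In `index.js` you would start it with `createHealthServer().listen(Number(process.env.PORT) || 8080)` alongside `client.initialize()`.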

The Final Product

After 24 hours, I had:

  • ✅ AI that responds to text messages
  • ✅ Voice message transcription and replies
  • ✅ Image analysis with witty comments
  • ✅ Rate limiting (no more bans!)
  • ✅ Deployed on Google Cloud
  • ✅ My sleep schedule completely destroyed

Lessons Learned

  1. Start simple: Get basic text replies working first
  2. Test with a spare number: Trust me on this one
  3. Rate limiting is NOT optional: WhatsApp will ban you
  4. Use Gemini Flash: It’s fast and cheap (the free tier covers light personal use)
  5. Cloud Run > VM (mostly): No server to babysit, but the bot holds a live browser session, so keep one instance always on instead of scaling to zero
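Lesson 2 pairs well with an allowlist, so that while you test, the bot answers only the chats you choose and can't surprise your contacts. A minimal sketch; `ALLOWED_CHATS`, `parseAllowlist`, and `isAllowed` are hypothetical names, while the `<number>@c.us` format is how whatsapp-web.js identifies individual chats.

```javascript
// Hypothetical testing guard: only answer chats listed in ALLOWED_CHATS,
// a comma-separated env var such as "15551234567@c.us,15557654321@c.us".
function parseAllowlist(raw) {
    return new Set(
        String(raw || '')
            .split(',')
            .map(id => id.trim())
            .filter(Boolean) // drop empty entries from stray commas
    );
}

// An empty allowlist means testing mode is off and everyone gets replies.
function isAllowed(chatId, allowlist) {
    return allowlist.size === 0 || allowlist.has(chatId);
}
```

Usage: build `const allowlist = parseAllowlist(process.env.ALLOWED_CHATS)` once, then `if (!isAllowed(msg.from, allowlist)) return;` in the handler. An empty list fails open (everyone gets replies); flip that check if you'd rather the bot stay silent when the variable is missing.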

Try It Yourself (The Sane Version)

Here’s the cleaned-up version (much safer, though no unofficial client is truly ban-proof):

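require('dotenv').config(); // load GEMINI_API_KEY from .env before anything reads it
const { Client, LocalAuth } = require('whatsapp-web.js');
const { GoogleGenerativeAI } = require('@google/generative-ai');
const qrcode = require('qrcode-terminal');

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
const model = genAI.getGenerativeModel({ model: 'gemini-1.5-flash' });

const client = new Client({
    authStrategy: new LocalAuth(),
    // --no-sandbox is needed when Chromium runs as root (e.g. in Docker)
    puppeteer: { args: ['--no-sandbox'] }
});

// Per-sender rate limit: at most RATE_LIMIT replies per rolling minute
const RATE_LIMIT = 5;
const WINDOW_MS = 60_000;
const rateLimiter = new Map(); // sender ID -> timestamps of recent replies

function isRateLimited(senderId) {
    const now = Date.now();
    const recent = (rateLimiter.get(senderId) || []).filter(t => now - t < WINDOW_MS);
    const limited = recent.length >= RATE_LIMIT;
    if (!limited) recent.push(now);
    rateLimiter.set(senderId, recent);
    return limited;
}

// Show the QR code on first run (LocalAuth remembers the session afterwards)
client.on('qr', qr => qrcode.generate(qr, { small: true }));

client.on('ready', () => {
    console.log('WhatsApp AI is ready!');
});

client.on('message', async msg => {
    // Ignore our own messages, groups, status updates, and empty text
    if (msg.fromMe || msg.from.endsWith('@g.us') || msg.from === 'status@broadcast') return;
    if (!msg.body || isRateLimited(msg.from)) return;

    try {
        const result = await model.generateContent(
            `Reply to: "${msg.body}" (keep it under 100 words)`
        );
        await msg.reply(result.response.text());
    } catch (error) {
        console.error('Error:', error);
        // If even the apology fails, log it instead of crashing the process
        await msg.reply("I'm having a moment. Try again later! 🤖").catch(console.error);
    }
});

client.initialize();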

What’s Next?

I’m now working on:

  • Context memory: Remember previous conversations
  • Custom commands: /remind, /summarize, /translate
  • Multi-language support: Because why should only English speakers have AI friends?
  • Business mode: Professional responses during work hours
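The first two items are mostly bookkeeping, so here's a rough sketch of how they could start. Both pieces are hypothetical: `ConversationMemory` keeps the last few exchanges per chat in process memory (so it resets on restart), in the `{ role, parts }` shape that Gemini's `startChat({ history })` accepts, and `parseCommand` splits a slash command from its arguments.

```javascript
// Hypothetical sketch of two "What's Next" features.

// ConversationMemory: the last few exchanges per chat, kept in process memory
// (so it resets on restart). Entries use the { role, parts: [{ text }] } shape
// that Gemini's startChat({ history }) accepts, with roles 'user' and 'model'.
class ConversationMemory {
    constructor(maxTurns = 10) {
        this.maxTurns = maxTurns; // user+model exchanges kept per chat
        this.chats = new Map();   // chatId -> array of history entries
    }

    // Assumes callers add a 'user' entry followed by a 'model' entry for each
    // exchange, so after every full exchange the trimmed history starts with 'user'.
    add(chatId, role, text) {
        const history = this.chats.get(chatId) || [];
        history.push({ role, parts: [{ text }] });
        this.chats.set(chatId, history.slice(-this.maxTurns * 2));
    }

    // Return a copy so a chat session appending to it can't alter what we store.
    history(chatId) {
        return [...(this.chats.get(chatId) || [])];
    }
}

// parseCommand: "/translate es Good morning" ->
// { name: 'translate', args: ['es', 'Good', 'morning'] }; plain text -> null.
function parseCommand(text) {
    const match = /^\/(\w+)(?:\s+(.*))?$/s.exec(String(text).trim());
    if (!match) return null;
    const args = match[2] ? match[2].split(/\s+/) : [];
    return { name: match[1].toLowerCase(), args };
}
```

A remembered turn would then create `model.startChat({ history: memory.history(chatId) })`, `await chat.sendMessage(text)`, and only after a successful reply call `memory.add(chatId, 'user', text)` and `memory.add(chatId, 'model', replyText)`, which keeps the history alternating user/model.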

The Plot Twist

Remember those messages I forgot to reply to? The AI assistant now handles them… but I still forget to check what it said. Some problems, even AI can’t solve. 😅


Want the full code? Check out the GitHub repo and build your own assistant. Just promise you won’t use it to spam people!

Have you built something cool with WhatsApp? Share your chaos stories in the comments!

This post is licensed under CC BY 4.0 by the author.