# Gemini API Integration

UniPulse uses Google Gemini 2.5 Flash as its primary AI model via the `@google/genai` SDK for all text generation, and `text-embedding-004` for vector embeddings.
## SDK Setup

```typescript
import { GoogleGenAI } from '@google/genai';

const genai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });
```

| Model | Purpose | Configuration |
|---|---|---|
| `gemini-2.5-flash` | Text generation (captions, replies, analysis) | Temperature varies by task |
| `text-embedding-004` | Vector embeddings (similarity, classification) | Fixed dimensions |
## Core AI Functions

All AI functions are in `apps/api/src/services/ai.service.ts`:

| Function | Purpose | Input | Output |
|---|---|---|---|
| `callGemini()` | Base function for all Gemini API calls | Prompt, system prompt, temperature | Generated text |
| `generateCaption()` | Generate post captions | Topic, platform, brand voice, language, count | Array of caption options |
| `rewriteCaption()` | Rewrite an existing caption | Caption, instruction, platform | Rewritten caption |
| `generateHashtags()` | Generate relevant hashtags | Caption, platform, count | Array of hashtags |
| `generateCTA()` | Generate call-to-action text | Context, goal, platform | CTA options |
| `translateCaption()` | Translate a caption into another language | Caption, target language | Translated text |
| `generateImage()` | Generate an image from a prompt | Description, style, dimensions | Image URL |
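To illustrate how these functions compose, here is a hedged sketch of `callGemini()` and `generateCaption()`. The `GeminiClient` interface, prompt wording, and newline-separated output format are assumptions made so the sketch stays self-contained; in the real service the client role is played by the `GoogleGenAI` instance from the SDK setup.

```typescript
// Minimal client interface standing in for the GoogleGenAI SDK instance
// (assumption: the real service calls the SDK directly).
interface GeminiClient {
  generateText(req: {
    model: string;
    contents: string;
    config: { systemInstruction?: string; temperature: number };
  }): Promise<{ text?: string }>;
}

// Base wrapper: every other AI function funnels through here.
async function callGemini(
  client: GeminiClient,
  prompt: string,
  systemPrompt: string | undefined,
  temperature: number,
): Promise<string> {
  const response = await client.generateText({
    model: 'gemini-2.5-flash',
    contents: prompt,
    config: { systemInstruction: systemPrompt, temperature },
  });
  return response.text ?? '';
}

// generateCaption() built on the base wrapper. The prompt text and the
// one-caption-per-line parsing are illustrative assumptions.
async function generateCaption(
  client: GeminiClient,
  topic: string,
  platform: string,
  count: number,
): Promise<string[]> {
  const prompt =
    `Write ${count} ${platform} captions about "${topic}". ` +
    `Return one caption per line.`;
  const raw = await callGemini(
    client,
    prompt,
    'You are a social media copywriter.',
    0.8, // creative temperature, per the table above
  );
  return raw.split('\n').map((l) => l.trim()).filter(Boolean).slice(0, count);
}
```

Injecting the client keeps the wrapper unit-testable with a fake that returns canned text, without hitting the Gemini API.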
## API Endpoints

All AI routes are under `/api/v1/ai`:

| Endpoint | Method | Function Called | Rate Limited |
|---|---|---|---|
| `/api/v1/ai/caption/generate` | POST | `generateCaption()` | `aiLimiter` + `requireQuota` |
| `/api/v1/ai/caption/rewrite` | POST | `rewriteCaption()` | `aiLimiter` + `requireQuota` |
| `/api/v1/ai/hashtags/generate` | POST | `generateHashtags()` | `aiLimiter` + `requireQuota` |
| `/api/v1/ai/cta/generate` | POST | `generateCTA()` | `aiLimiter` + `requireQuota` |
| `/api/v1/ai/translate` | POST | `translateCaption()` | `aiLimiter` + `requireQuota` |
| `/api/v1/ai/image/generate` | POST | `generateImage()` | `aiLimiter` + `requireQuota` |
| `/api/v1/ai/chat` | POST | AI chat | `aiLimiter` |
| `/api/v1/ai/suggestions` | GET | Background suggestions | `requireFeature` |
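A caption-generation call might look like the following. The exact body fields and response shape are assumptions inferred from the inputs and outputs listed for `generateCaption()`, not a verified API contract:

```
POST /api/v1/ai/caption/generate
Authorization: Bearer <token>
Content-Type: application/json

{
  "topic": "summer sale",
  "platform": "instagram",
  "brandVoice": "playful",
  "language": "en",
  "count": 3
}

// 200 OK (response shape is an assumption)
{
  "captions": ["...", "...", "..."]
}
```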
## Where AI Is Used Across the Platform
| Feature | Service | Description | Temperature |
|---|---|---|---|
| Caption generation | ai.service | Generate social media captions | 0.8 (creative) |
| Caption rewriting | ai.service | Rewrite with specific instructions | 0.7 |
| Hashtag generation | ai.service | Platform-optimized hashtags | 0.5 |
| CTA generation | ai.service | Call-to-action text | 0.7 |
| Translation | ai.service | Multi-language caption translation | 0.3 (precise) |
| Brand voice application | brand-voice.service | Apply trained brand voice style | 0.6 |
| Content calendar | content-calendar.service | Plan content schedules with AI | 0.8 |
| Content repurposing | content-repurpose.service | Transform content across formats | 0.7 |
| Trend analysis | trend-scanner.service | Detect and analyze trending topics | 0.5 |
| Performance advisor | ai-advisor.service | Generate performance recommendations | 0.4 |
| Conversation replies | conversation-brain.service | Generate context-aware replies | 0.6 |
| Intent classification | intent-classifier.service | Classify message intent and sentiment | 0.2 (deterministic) |
| Post classification | classification.service | Categorize post content | 0.2 |
| A/B test evaluation | ab-test.service | Analyze test results with AI | 0.3 |
| Gap analysis | competitor.service | Competitive content gap analysis | 0.5 |
| Predictions | prediction.service | Engagement and revenue forecasting | 0.3 |
## Usage Tracking

All AI calls are tracked per workspace via `incrementUsage()`:

```typescript
// After each successful AI call
await incrementUsage(workspaceId, 'ai_generations');
```

Usage is stored in the `WorkspaceUsage` model and checked by the `requireQuota` middleware before each AI request. The `AiPromptLog` model records individual API calls for debugging and cost analysis:

| Field | Purpose |
|---|---|
| `workspaceId` | Workspace that made the call |
| `promptType` | Type of prompt (caption, hashtag, reply, etc.) |
| `tokens` | Token count (input + output) |
| `model` | Model used (`gemini-2.5-flash`) |
| `latencyMs` | Response time |
| `createdAt` | Timestamp |
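The quota flow (check in `requireQuota` before the call, increment only after success) can be sketched with an in-memory store. The `Map` below is a stand-in for the Prisma-backed `WorkspaceUsage` model, and the limit parameter is an assumption (in practice it would come from the workspace's plan):

```typescript
// In-memory stand-in for the WorkspaceUsage model (assumption: the real
// implementation reads/writes these counters via Prisma).
const usage = new Map<string, number>();

function getUsage(workspaceId: string, metric: string): number {
  return usage.get(`${workspaceId}:${metric}`) ?? 0;
}

// requireQuota-style check: runs BEFORE the AI call and rejects the
// request if the workspace is at its plan limit.
function checkQuota(workspaceId: string, metric: string, limit: number): boolean {
  return getUsage(workspaceId, metric) < limit;
}

// incrementUsage(): called only AFTER a successful AI call, so failed
// calls never consume quota.
function incrementUsage(workspaceId: string, metric: string): void {
  const key = `${workspaceId}:${metric}`;
  usage.set(key, (usage.get(key) ?? 0) + 1);
}
```

The ordering is the important part mirrored from the docs: check first, call Gemini, then increment on success.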
## Prompt Type Index (PROMPT_CONFIGS)

All prompt types are indexed in `PROMPT_CONFIGS` for consistent configuration:

```typescript
const PROMPT_CONFIGS = {
  caption_generate: { model: 'gemini-2.5-flash', temperature: 0.8, maxTokens: 2000 },
  caption_rewrite:  { model: 'gemini-2.5-flash', temperature: 0.7, maxTokens: 2000 },
  hashtag_generate: { model: 'gemini-2.5-flash', temperature: 0.5, maxTokens: 500 },
  cta_generate:     { model: 'gemini-2.5-flash', temperature: 0.7, maxTokens: 500 },
  translate:        { model: 'gemini-2.5-flash', temperature: 0.3, maxTokens: 2000 },
  intent_classify:  { model: 'gemini-2.5-flash', temperature: 0.2, maxTokens: 200 },
  conversation:     { model: 'gemini-2.5-flash', temperature: 0.6, maxTokens: 1000 },
  // ... more prompt types
};
```
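A small lookup helper keeps call sites from hard-coding model and temperature values. This is a sketch, not the service's actual code; the fallback defaults for unregistered prompt types are assumptions:

```typescript
interface PromptConfig {
  model: string;
  temperature: number;
  maxTokens: number;
}

// Subset of PROMPT_CONFIGS, typed for the example.
const PROMPT_CONFIGS: Record<string, PromptConfig> = {
  caption_generate: { model: 'gemini-2.5-flash', temperature: 0.8, maxTokens: 2000 },
  intent_classify:  { model: 'gemini-2.5-flash', temperature: 0.2, maxTokens: 200 },
};

// Resolve a prompt type to its config, falling back to a conservative
// default for unregistered types (the default values are assumptions).
function getPromptConfig(type: string): PromptConfig {
  return (
    PROMPT_CONFIGS[type] ?? {
      model: 'gemini-2.5-flash',
      temperature: 0.5,
      maxTokens: 1000,
    }
  );
}
```

Centralizing the lookup also gives one place to log the `promptType` and `model` fields that `AiPromptLog` records.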
## Rate Limiting & Cost Control
| Control | Implementation |
|---|---|
| Per-IP rate limiting | aiLimiter in rateLimiter.ts |
| Per-workspace quotas | requireQuota('ai_generations') middleware |
| Plan-based limits | PlanFeature defines max AI calls per billing period |
| Usage dashboard | Admin panel shows AI consumption per workspace |
| Cost tracking | AiPromptLog records tokens for cost analysis |
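The per-IP layer (`aiLimiter`) is presumably an express-rate-limit style middleware; its counting logic can be sketched as a fixed-window limiter. The window size and limit below are assumptions for illustration:

```typescript
// Minimal fixed-window rate limiter keyed by IP (or any string key).
// Only the counting logic is shown; the real aiLimiter wraps this kind
// of check in Express middleware.
class FixedWindowLimiter {
  private counts = new Map<string, { windowStart: number; count: number }>();

  constructor(
    private readonly limit: number,
    private readonly windowMs: number,
  ) {}

  allow(key: string, now: number = Date.now()): boolean {
    const entry = this.counts.get(key);
    // Start a fresh window if none exists or the current one has expired.
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { windowStart: now, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

Note how this layer is independent of `requireQuota`: the limiter throttles bursts per IP, while quotas cap total per-workspace consumption per billing period.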
## Environment Variables

| Variable | Required | Description |
|---|---|---|
| `GEMINI_API_KEY` | Yes (for AI features) | Google AI API key from ai.google.dev |

`GEMINI_API_KEY` must never be exposed to the frontend. All AI calls go through the backend API, which adds authentication, rate limiting, and usage tracking before calling Gemini.
## Related Documentation

- Conversation Engine -- ICE 3-step LLM pipeline
- Prompt Templates -- prompt engineering patterns
- Embeddings -- vector embedding details
- Memory System -- conversation memory tiers
- Services -- AI service layer
- Queue System -- async AI queues