Embeddings & Classification

UniPulse uses Google's text-embedding-004 model to generate vector embeddings for content similarity, classification, and audience clustering.

Embedding Model

Property	Value
Model	`text-embedding-004`
Provider	Google AI (via `@google/genai`)
Dimensions	768
Storage	`PostEmbedding` table in PostgreSQL

Embedding Service

embedding.service.ts provides vector embedding operations:

Function	Description	Input	Output
`generate()`	Generate embedding for text	Text string	Float array (768 dims)
`search()`	Find similar content by embedding	Query embedding, threshold	Ranked results
`similarity()`	Compute cosine similarity between two texts	Text A, Text B	Score (0-1)

// Generate embedding for a post
const embedding = await embeddingService.generate(post.caption);

// Store in database
await prisma.postEmbedding.create({
  data: {
    postId: post.id,
    embedding: embedding, // Float array
    model: 'text-embedding-004',
  },
});

// Find posts similar to the one just embedded
const similar = await embeddingService.search(embedding, {
  workspaceId,
  threshold: 0.8,
  limit: 10,
});
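The threshold and limit options passed to search() can be illustrated with a small in-memory ranking sketch. This is not the service's actual implementation, which is not shown here. The names `StoredEmbedding`, `rankBySimilarity`, and the injected `score` function are hypothetical and exist only for illustration.

```typescript
// Hypothetical shape of a stored row; the real PostEmbedding row may carry more fields.
interface StoredEmbedding {
  postId: string;
  embedding: number[];
}

interface RankOptions {
  threshold: number; // minimum similarity score to keep a candidate
  limit: number;     // maximum number of results to return
}

interface RankedResult {
  postId: string;
  score: number;
}

// Score every candidate against the query, drop those below the threshold,
// sort best-first, and cap the list at `limit`.
// `score` is injected so any vector similarity function can be used.
function rankBySimilarity(
  query: number[],
  rows: StoredEmbedding[],
  opts: RankOptions,
  score: (a: number[], b: number[]) => number,
): RankedResult[] {
  return rows
    .map((row) => ({ postId: row.postId, score: score(query, row.embedding) }))
    .filter((result) => result.score >= opts.threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, opts.limit);
}
```

In practice the `score` argument would be a cosine similarity function, and the candidate rows would be restricted to the workspace before ranking.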

Use Cases

Content Similarity

Embedding similarity between posts supports several applications:

Application	Description
Content recommendations	Suggest similar, well-performing posts for inspiration
Duplicate detection	Detect near-duplicate content before publishing
Content clustering	Group posts by semantic topic

Post Classification

classification.service.ts uses embeddings combined with Gemini to classify post content:

Classification	Categories	Purpose
Content type	Educational, promotional, entertainment, inspirational, news, behind-the-scenes	Analytics breakdown by type
Topic tags	Auto-generated topic labels	Content organization
Sentiment	Positive, negative, neutral	Sentiment analysis
Performance bucket	High, medium, low predicted engagement	Content strategy

The PostClassification model stores results:

model PostClassification {
  id         String   @id @default(cuid())
  postId     String   @unique
  category   String   // Content type
  tags       String[] // Topic tags
  sentiment  String   // Sentiment
  confidence Float    // Classification confidence
}
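Before a classification result is written to PostClassification, it may be worth checking it against the categories listed in the table above. The following is a minimal sketch of such a check. It assumes lowercase category strings and a confidence between 0 and 1, neither of which is confirmed by the schema, and `parseClassification` is a hypothetical helper name.

```typescript
// Allowed values, taken from the classification table. Lowercase spelling is an assumption.
const CONTENT_TYPES = [
  'educational', 'promotional', 'entertainment',
  'inspirational', 'news', 'behind-the-scenes',
] as const;
const SENTIMENTS = ['positive', 'negative', 'neutral'] as const;

type ContentType = (typeof CONTENT_TYPES)[number];
type Sentiment = (typeof SENTIMENTS)[number];

interface ClassificationResult {
  category: ContentType;
  tags: string[];
  sentiment: Sentiment;
  confidence: number;
}

// Validate an untrusted object (for example, parsed model output) and
// return a typed result, or throw with a descriptive message.
function parseClassification(raw: unknown): ClassificationResult {
  if (typeof raw !== 'object' || raw === null) {
    throw new Error('Classification must be an object');
  }
  const r = raw as Record<string, unknown>;

  const category = typeof r['category'] === 'string' ? r['category'].toLowerCase() : '';
  if (!(CONTENT_TYPES as readonly string[]).includes(category)) {
    throw new Error(`Unknown content type: ${String(r['category'])}`);
  }

  const sentiment = typeof r['sentiment'] === 'string' ? r['sentiment'].toLowerCase() : '';
  if (!(SENTIMENTS as readonly string[]).includes(sentiment)) {
    throw new Error(`Unknown sentiment: ${String(r['sentiment'])}`);
  }

  // Keep only string tags; a missing tags field becomes an empty list.
  const rawTags = r['tags'];
  const tags = Array.isArray(rawTags)
    ? rawTags.filter((t): t is string => typeof t === 'string')
    : [];

  // Assumed range: 0 (no confidence) to 1 (full confidence).
  const confidence = typeof r['confidence'] === 'number' ? r['confidence'] : NaN;
  if (!(confidence >= 0 && confidence <= 1)) {
    throw new Error('confidence must be a number between 0 and 1');
  }

  return { category: category as ContentType, tags, sentiment: sentiment as Sentiment, confidence };
}
```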

Audience Clustering

Embeddings of audience interactions are used to cluster similar audience members:

Application	Description
Segment discovery	Auto-discover audience segments based on behavior patterns
Interest profiling	Map audience members to interest topics
Lookalike audiences	Find audience members similar to high-value customers

Brand Voice Matching

Measure how closely generated content matches the workspace's brand voice:

const brandVoiceEmbedding = await embeddingService.generate(brandVoice.samples.join(' '));
const captionEmbedding = await embeddingService.generate(generatedCaption);
const matchScore = cosineSimilarity(brandVoiceEmbedding, captionEmbedding);
// matchScore: 0.0 (no match) to 1.0 (perfect match)
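The snippet above calls a cosineSimilarity helper that is not defined on this page. A minimal sketch of one possible implementation is shown below. Raw cosine similarity ranges from -1 to 1, so the hypothetical `toMatchScore` helper clamps the value into the 0.0 to 1.0 range described above. Whether the real helper clamps is an assumption.

```typescript
// Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|).
// Returns 0 when either vector has zero length to avoid dividing by zero.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denominator = Math.sqrt(normA) * Math.sqrt(normB);
  return denominator === 0 ? 0 : dot / denominator;
}

// Hypothetical helper: clamp a raw cosine value into [0, 1] for use as a match score.
function toMatchScore(similarity: number): number {
  return Math.min(1, Math.max(0, similarity));
}
```

Text embeddings rarely produce strongly negative cosine values, so the clamp mostly guards against edge cases rather than changing typical scores.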

Async Processing

Embedding generation runs asynchronously via the post-classify queue, so it does not slow down post creation.
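The separation between the request path and the worker path can be sketched with a tiny in-memory queue. This is an illustration only: the real post-classify queue is backed by the project's queue system, and `InMemoryQueue`, `createPost`, and `handleClassifyJob` are hypothetical names.

```typescript
interface ClassifyJob {
  postId: string;
  caption: string;
}

// Minimal in-memory stand-in for a job queue, for illustration only.
class InMemoryQueue<T> {
  private jobs: T[] = [];

  add(job: T): void {
    this.jobs.push(job);
  }

  // Process all pending jobs in FIFO order and return how many were handled.
  async drain(handler: (job: T) => Promise<void>): Promise<number> {
    let processed = 0;
    while (this.jobs.length > 0) {
      const job = this.jobs.shift() as T;
      await handler(job);
      processed++;
    }
    return processed;
  }
}

const postClassifyQueue = new InMemoryQueue<ClassifyJob>();

// Request path: build the post, enqueue a job, and return immediately.
// No embedding call happens here.
function createPost(id: string, caption: string): { id: string; caption: string } {
  const post = { id, caption };
  postClassifyQueue.add({ postId: post.id, caption: post.caption });
  return post;
}

// Worker path: records which posts were processed.
// A real worker would generate and store the embedding here instead.
const embeddedPostIds: string[] = [];
async function handleClassifyJob(job: ClassifyJob): Promise<void> {
  embeddedPostIds.push(job.postId);
}
```

Calling `createPost` returns before any job runs. The embedding work happens only when a worker drains the queue with `postClassifyQueue.drain(handleClassifyJob)`.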

Database Models

model PostEmbedding {
  id        String   @id @default(cuid())
  postId    String   @unique
  embedding Float[]  // 768-dimensional vector
  model     String   // "text-embedding-004"
  createdAt DateTime @default(now())

  @@index([postId])
}
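A Float[] column does not by itself constrain the array length, so an application-level check before writing can catch malformed vectors. The sketch below assumes the 768-dimension size stated above. `assertValidEmbedding` is a hypothetical helper, not part of the documented service.

```typescript
const EMBEDDING_DIMENSIONS = 768;

// Verify that a value is an array of finite numbers with the expected length.
// Returns the same array typed as number[], or throws with a descriptive message.
function assertValidEmbedding(vec: unknown, dims: number = EMBEDDING_DIMENSIONS): number[] {
  if (!Array.isArray(vec)) {
    throw new Error('Embedding must be an array');
  }
  if (vec.length !== dims) {
    throw new Error(`Expected ${dims} dimensions, got ${vec.length}`);
  }
  for (const value of vec) {
    if (typeof value !== 'number' || !Number.isFinite(value)) {
      throw new Error('Embedding contains a non-finite or non-numeric value');
    }
  }
  return vec as number[];
}
```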

Cross-Reference

Gemini API -- text-embedding-004 model details
Conversation Engine -- embeddings in audience context
Queue System -- post-classify queue
Schema Overview -- PostEmbedding and PostClassification models
