
Embeddings & Classification

UniPulse uses Google's text-embedding-004 model to generate vector embeddings for content similarity, classification, and audience clustering.


Embedding Model

| Property | Value |
| --- | --- |
| Model | text-embedding-004 |
| Provider | Google AI (via @google/genai) |
| Dimensions | 768 |
| Storage | PostEmbedding table in PostgreSQL |

Embedding Service

embedding.service.ts provides vector embedding operations:

| Function | Description | Input | Output |
| --- | --- | --- | --- |
| generate() | Generate embedding for text | Text string | Float array (768 dims) |
| search() | Find similar content by embedding | Query embedding, threshold | Ranked results |
| similarity() | Compute cosine similarity between two texts | Text A, Text B | Score (0-1) |

```typescript
// Generate embedding for a post
const embedding = await embeddingService.generate(post.caption);

// Store in database
await prisma.postEmbedding.create({
  data: {
    postId: post.id,
    embedding: embedding, // Float array
    model: 'text-embedding-004',
  },
});

// Find similar posts
const similar = await embeddingService.search(queryEmbedding, {
  workspaceId,
  threshold: 0.8,
  limit: 10,
});
```
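How search() ranks its results is not shown on this page. As a rough sketch, assuming results are ranked by cosine similarity over vectors already loaded into memory (the real service may well do this inside PostgreSQL instead), the threshold and limit options could behave as follows. StoredEmbedding, SearchOptions, cosine, and rankBySimilarity are hypothetical names used only for illustration.

```typescript
// Hypothetical shape of a stored embedding row; not taken from the real service.
interface StoredEmbedding {
  postId: string;
  embedding: number[];
}

interface SearchOptions {
  threshold: number; // minimum cosine similarity for a result to be kept
  limit: number; // maximum number of results to return
}

// Cosine similarity of two equal-length vectors; returns 0 if either has zero magnitude.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  return denom === 0 ? 0 : dot / denom;
}

// Scores every candidate, drops those below the threshold,
// sorts by descending score, and keeps at most `limit` results.
function rankBySimilarity(
  query: number[],
  candidates: StoredEmbedding[],
  options: SearchOptions,
): { postId: string; score: number }[] {
  return candidates
    .map((c) => ({ postId: c.postId, score: cosine(query, c.embedding) }))
    .filter((r) => r.score >= options.threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, options.limit);
}
```

A linear scan like this is only practical for small workspaces; for larger ones a vector index in the database would be the more usual choice.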

Use Cases

Content Similarity

Find posts similar to a given post for content recommendations:

| Application | Description |
| --- | --- |
| Content recommendations | Suggest similar performing posts for inspiration |
| Duplicate detection | Detect near-duplicate content before publishing |
| Content clustering | Group posts by semantic topic |

Post Classification

classification.service.ts uses embeddings combined with Gemini to classify post content:

| Classification | Categories | Purpose |
| --- | --- | --- |
| Content type | Educational, promotional, entertainment, inspirational, news, behind-the-scenes | Analytics breakdown by type |
| Topic tags | Auto-generated topic labels | Content organization |
| Sentiment | Positive, negative, neutral | Sentiment analysis |
| Performance bucket | High, medium, low predicted engagement | Content strategy |

The PostClassification model stores results:

```prisma
model PostClassification {
  id         String   @id @default(cuid())
  postId     String   @unique
  category   String   // Content type
  tags       String[] // Topic tags
  sentiment  String   // Sentiment
  confidence Float    // Classification confidence
}
```
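How the Gemini output is turned into a PostClassification row is not shown here. A minimal sketch, assuming the model is asked to reply in JSON with the same field names as the Prisma model, might validate the reply before storing it. The JSON shape, parseClassification, the lowercase label spelling, and the fallback values are all assumptions made for illustration.

```typescript
// Label sets taken from the classification table; lowercase spelling is an assumption.
const CATEGORIES = [
  "educational",
  "promotional",
  "entertainment",
  "inspirational",
  "news",
  "behind-the-scenes",
];
const SENTIMENTS = ["positive", "negative", "neutral"];

interface ClassificationResult {
  category: string;
  tags: string[];
  sentiment: string;
  confidence: number;
}

// Parses a JSON string and returns a validated result, or null if the
// payload is malformed or uses a label outside the known sets.
function parseClassification(raw: string): ClassificationResult | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch (err) {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const d = data as Record<string, unknown>;

  const category = String(d.category ?? "").toLowerCase();
  const sentiment = String(d.sentiment ?? "").toLowerCase();
  if (CATEGORIES.indexOf(category) === -1) return null;
  if (SENTIMENTS.indexOf(sentiment) === -1) return null;

  // Keep only string tags; anything else is silently dropped.
  const tags: string[] = [];
  if (Array.isArray(d.tags)) {
    for (const t of d.tags) {
      if (typeof t === "string") tags.push(t);
    }
  }

  // Clamp confidence into [0, 1]; fall back to 0 if it is missing or not a finite number.
  const confidence =
    typeof d.confidence === "number" && isFinite(d.confidence)
      ? Math.min(1, Math.max(0, d.confidence))
      : 0;

  return { category, tags, sentiment, confidence };
}
```

Returning null instead of throwing lets the caller decide whether to retry the classification or skip the post.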

Audience Clustering

Embeddings of audience interactions are used to cluster similar audience members:

| Application | Description |
| --- | --- |
| Segment discovery | Auto-discover audience segments based on behavior patterns |
| Interest profiling | Map audience members to interest topics |
| Lookalike audiences | Find audience members similar to high-value customers |
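The clustering method itself is not described on this page. As one possible illustration of the lookalike case, the sketch below averages the embeddings of a seed group (for example, high-value customers) into a single profile vector and ranks other members by cosine similarity to it. AudienceMember, normalize, centroid, and findLookalikes are hypothetical names, and the centroid approach is an assumption rather than the documented algorithm.

```typescript
interface AudienceMember {
  id: string;
  embedding: number[];
}

// Scales a vector to unit length so that dot products equal cosine similarity.
function normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  return norm === 0 ? v.slice() : v.map((x) => x / norm);
}

// Averages the unit vectors of the seed group and re-normalizes the result.
function centroid(vectors: number[][]): number[] {
  if (vectors.length === 0) throw new Error("need at least one seed vector");
  const dims = vectors[0].length;
  const sum: number[] = new Array<number>(dims).fill(0);
  for (const v of vectors) {
    const unit = normalize(v);
    for (let i = 0; i < dims; i++) sum[i] += unit[i];
  }
  return normalize(sum);
}

// Ranks candidates by cosine similarity to the seed profile, highest first.
function findLookalikes(
  seeds: AudienceMember[],
  candidates: AudienceMember[],
  limit: number,
): { id: string; score: number }[] {
  const profile = centroid(seeds.map((s) => s.embedding));
  return candidates
    .map((c) => {
      const unit = normalize(c.embedding);
      const score = unit.reduce((sum, x, i) => sum + x * profile[i], 0);
      return { id: c.id, score };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, limit);
}
```

A single centroid works when the seed group is fairly homogeneous; a seed group with several distinct interests would be better served by clustering it first.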

Brand Voice Matching

Measure how closely generated content matches the workspace's brand voice:

```typescript
const brandVoiceEmbedding = await embeddingService.generate(brandVoice.samples.join(' '));
const captionEmbedding = await embeddingService.generate(generatedCaption);
const matchScore = cosineSimilarity(brandVoiceEmbedding, captionEmbedding);
// matchScore: 0.0 (no match) to 1.0 (perfect match)
```
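The cosineSimilarity helper is used above but not defined on this page. A minimal standalone version, assuming both embeddings come from the same model and therefore have the same length, could look like this; the error handling and the zero-vector behavior are illustrative choices, not the project's documented behavior.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|).
// Mathematically the result lies in [-1, 1]; for text embeddings it is
// typically positive, which is why it is often read as a 0-to-1 match score.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Cosine similarity is undefined for zero vectors; treat that case as "no match".
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

Because the score depends only on direction, a long brand-voice sample and a short caption can still score highly if they point the same way in embedding space.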

Async Processing

Embedding generation runs asynchronously via the post-classify queue, so it doesn't slow down post creation.
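The queue implementation is not shown on this page. To illustrate the decoupling, the sketch below uses a plain in-memory array as a stand-in for the post-classify queue: creating a post only enqueues a job, and a separate worker step generates the embedding later. ClassifyJob, pendingJobs, embeddingsByPost, fakeEmbed, createPost, and drainQueue are all hypothetical names; a production queue would also need persistence and retries.

```typescript
interface ClassifyJob {
  postId: string;
  caption: string;
}

// Stand-in for the real queue: a plain array of pending jobs.
const pendingJobs: ClassifyJob[] = [];

// Stand-in for the PostEmbedding table.
const embeddingsByPost = new Map<string, number[]>();

// Hypothetical embedder; the real one would call the embedding model.
async function fakeEmbed(text: string): Promise<number[]> {
  return [text.length, 1];
}

// Post creation only enqueues work and returns immediately,
// so the caller never waits on embedding generation.
function createPost(postId: string, caption: string): string {
  pendingJobs.push({ postId, caption });
  return postId;
}

// Worker step: processes queued jobs one at a time, outside the request path.
// Returns the number of jobs processed.
async function drainQueue(): Promise<number> {
  let processed = 0;
  while (pendingJobs.length > 0) {
    const job = pendingJobs.shift()!;
    embeddingsByPost.set(job.postId, await fakeEmbed(job.caption));
    processed++;
  }
  return processed;
}
```

One consequence of this design is that a freshly created post has no embedding for a short time, so similarity features need to tolerate missing rows.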


Database Models

```prisma
model PostEmbedding {
  id        String   @id @default(cuid())
  postId    String   @unique
  embedding Float[]  // 768-dimensional vector
  model     String   // "text-embedding-004"
  createdAt DateTime @default(now())

  @@index([postId])
}
```
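A Float[] column does not itself enforce a vector length, so a check before writing can catch truncated or malformed embeddings. The sketch below is one way to do that, assuming the 768-dimension figure and model name given above; toPostEmbeddingInput, PostEmbeddingInput, and the two constants are hypothetical names, not part of the documented service.

```typescript
const EMBEDDING_MODEL = "text-embedding-004";
const EMBEDDING_DIMENSIONS = 768;

// Mirrors the writable fields of the PostEmbedding model.
interface PostEmbeddingInput {
  postId: string;
  embedding: number[];
  model: string;
}

// Builds the payload for a PostEmbedding row, rejecting vectors that are
// the wrong length or contain NaN or Infinity.
function toPostEmbeddingInput(postId: string, embedding: number[]): PostEmbeddingInput {
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(
      `expected ${EMBEDDING_DIMENSIONS} dimensions, got ${embedding.length}`,
    );
  }
  if (!embedding.every((x) => isFinite(x))) {
    throw new Error("embedding contains non-finite values");
  }
  return { postId, embedding, model: EMBEDDING_MODEL };
}
```

The returned object could then be passed as the data argument of a create call like the one shown under Embedding Service.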

Cross-Reference