Embeddings & Classification
UniPulse uses Google's text-embedding-004 model to generate vector embeddings for content similarity, classification, and audience clustering.
Embedding Model
| Property | Value |
|---|---|
| Model | text-embedding-004 |
| Provider | Google AI (via @google/genai) |
| Dimensions | 768 |
| Storage | PostEmbedding table in PostgreSQL |
Embedding Service
embedding.service.ts provides vector embedding operations:
| Function | Description | Input | Output |
|---|---|---|---|
| generate() | Generate embedding for text | Text string | Float array (768 dims) |
| search() | Find similar content by embedding | Query embedding, threshold | Ranked results |
| similarity() | Compute cosine similarity between two texts | Text A, Text B | Score (0-1) |
// Generate embedding for a post
const embedding = await embeddingService.generate(post.caption);
// Store in database
await prisma.postEmbedding.create({
  data: {
    postId: post.id,
    embedding: embedding, // Float array
    model: 'text-embedding-004',
  },
});
// Find similar posts
const similar = await embeddingService.search(queryEmbedding, {
  workspaceId,
  threshold: 0.8,
  limit: 10,
});
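How search() ranks its results is not shown in this section. As a rough illustration only, the sketch below applies a threshold and a limit to candidates held in memory. The names StoredEmbedding, SearchHit, cosine and rankBySimilarity are hypothetical, and the real service may instead query the database or a vector index.

```typescript
// Hypothetical shapes for illustration; not the actual embedding.service.ts types.
interface StoredEmbedding {
  postId: string;
  embedding: number[];
}

interface SearchHit {
  postId: string;
  score: number;
}

// Cosine similarity between two vectors of equal length.
function cosine(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Score every candidate, drop those below the threshold,
// sort best-first, and keep at most `limit` hits.
function rankBySimilarity(
  query: readonly number[],
  candidates: readonly StoredEmbedding[],
  options: { threshold: number; limit: number },
): SearchHit[] {
  return candidates
    .map((c) => ({ postId: c.postId, score: cosine(query, c.embedding) }))
    .filter((hit) => hit.score >= options.threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, options.limit);
}
```

A linear scan like this is only reasonable for small candidate sets. Scoping candidates by workspaceId, as the call above does, would happen before this ranking step.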
Use Cases
Content Similarity
Embeddings make it possible to find posts similar to a given post, which supports several applications:
| Application | Description |
|---|---|
| Content recommendations | Suggest similar high-performing posts for inspiration |
| Duplicate detection | Detect near-duplicate content before publishing |
| Content clustering | Group posts by semantic topic |
Post Classification
classification.service.ts uses embeddings combined with Gemini to classify post content:
| Classification | Categories | Purpose |
|---|---|---|
| Content type | Educational, promotional, entertainment, inspirational, news, behind-the-scenes | Analytics breakdown by type |
| Topic tags | Auto-generated topic labels | Content organization |
| Sentiment | Positive, negative, neutral | Sentiment analysis |
| Performance bucket | High, medium, low predicted engagement | Content strategy |
The PostClassification model stores results:
model PostClassification {
  id         String   @id @default(cuid())
  postId     String   @unique
  category   String   // Content type
  tags       String[] // Topic tags
  sentiment  String   // Sentiment
  confidence Float    // Classification confidence
}
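Because category and sentiment are stored as plain strings, it may help to validate a classification result before writing it. The sketch below checks a raw result against the categories listed in the table above. The names parseClassification and isOneOf are hypothetical, and the lowercase normalization and the 0 to 1 confidence range are assumptions rather than documented behavior.

```typescript
// Allowed values taken from the classification table; lowercase form is an assumption.
const CONTENT_TYPES = [
  "educational",
  "promotional",
  "entertainment",
  "inspirational",
  "news",
  "behind-the-scenes",
] as const;
const SENTIMENTS = ["positive", "negative", "neutral"] as const;

type ContentType = (typeof CONTENT_TYPES)[number];
type Sentiment = (typeof SENTIMENTS)[number];

interface ClassificationResult {
  category: ContentType;
  tags: string[];
  sentiment: Sentiment;
  confidence: number;
}

// Type guard: narrows a string to one of the allowed literal values.
function isOneOf<T extends string>(allowed: readonly T[], value: string): value is T {
  return allowed.some((item) => item === value);
}

// Hypothetical helper: validates an untrusted result (for example, parsed model output).
function parseClassification(raw: unknown): ClassificationResult {
  if (typeof raw !== "object" || raw === null) {
    throw new Error("Classification must be an object");
  }
  const record = raw as Record<string, unknown>;
  const category = String(record["category"]).trim().toLowerCase();
  const sentiment = String(record["sentiment"]).trim().toLowerCase();
  const tags = record["tags"];
  const confidence = record["confidence"];

  if (!isOneOf(CONTENT_TYPES, category)) {
    throw new Error(`Unknown category: ${category}`);
  }
  if (!isOneOf(SENTIMENTS, sentiment)) {
    throw new Error(`Unknown sentiment: ${sentiment}`);
  }
  if (!Array.isArray(tags) || !tags.every((t) => typeof t === "string")) {
    throw new Error("tags must be an array of strings");
  }
  // Assumed range: confidence is a probability-like score in [0, 1]. NaN is rejected too.
  if (typeof confidence !== "number" || !(confidence >= 0 && confidence <= 1)) {
    throw new Error("confidence must be a number between 0 and 1");
  }
  return { category, tags: tags as string[], sentiment, confidence };
}
```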
Audience Clustering
Embeddings of audience interactions are used to cluster similar audience members:
| Application | Description |
|---|---|
| Segment discovery | Auto-discover audience segments based on behavior patterns |
| Interest profiling | Map audience members to interest topics |
| Lookalike audiences | Find audience members similar to high-value customers |
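The clustering method itself is not described here. As one possible illustration of the lookalike case, the sketch below averages the embeddings of known high-value members into a centroid and ranks the remaining members by cosine similarity to it. MemberEmbedding, centroid, cosine and findLookalikes are hypothetical names, and the centroid approach is an assumption, not necessarily what UniPulse does.

```typescript
// Hypothetical shape: one embedding per audience member.
interface MemberEmbedding {
  memberId: string;
  embedding: number[];
}

// Component-wise mean of a non-empty set of equal-length vectors.
function centroid(vectors: readonly (readonly number[])[]): number[] {
  const first = vectors[0];
  if (first === undefined) throw new Error("Need at least one vector");
  const sum = first.map(() => 0);
  for (const v of vectors) {
    if (v.length !== first.length) throw new Error("Dimension mismatch");
    v.forEach((x, i) => {
      sum[i] = (sum[i] ?? 0) + x;
    });
  }
  return sum.map((x) => x / vectors.length);
}

function cosine(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) throw new Error("Dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Rank candidates by similarity to the average "seed" member, skipping the seeds themselves.
function findLookalikes(
  seeds: readonly MemberEmbedding[],
  candidates: readonly MemberEmbedding[],
  limit: number,
): { memberId: string; score: number }[] {
  const center = centroid(seeds.map((s) => s.embedding));
  const seedIds = seeds.map((s) => s.memberId);
  return candidates
    .filter((c) => seedIds.indexOf(c.memberId) === -1)
    .map((c) => ({ memberId: c.memberId, score: cosine(center, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, limit);
}
```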
Brand Voice Matching
Measure how closely generated content matches the workspace's brand voice:
const brandVoiceEmbedding = await embeddingService.generate(brandVoice.samples.join(' '));
const captionEmbedding = await embeddingService.generate(generatedCaption);
const matchScore = cosineSimilarity(brandVoiceEmbedding, captionEmbedding);
// matchScore: 0.0 (no match) to 1.0 (perfect match)
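The snippet above calls cosineSimilarity, which is not defined in this section. A minimal sketch of such a helper follows. The function name matches the call above, but the implementation is an assumption, not UniPulse's actual code.

```typescript
// Hypothetical helper: cosine similarity of two equal-length vectors.
function cosineSimilarity(a: readonly number[], b: readonly number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  // A zero vector has no direction; treat it as "no match" instead of dividing by zero.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}
```

In general, cosine similarity ranges from -1 to 1. A caller that relies on the 0.0 to 1.0 range described above may want to clamp negative scores to 0.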
Async Processing
Embedding generation runs asynchronously on the post-classify queue, so it does not slow down post creation.
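The queue implementation is covered in the Queue System page and is not reproduced here. To illustrate the decoupling only, the sketch below uses an in-memory array as a stand-in for the post-classify queue. InMemoryClassifyQueue, ClassifyJob, EmbedFn and createPost are all hypothetical names, and a real worker would call embeddingService.generate() and persist the result.

```typescript
// Hypothetical stand-ins; the real system uses the post-classify queue.
type EmbedFn = (text: string) => Promise<number[]>;

interface ClassifyJob {
  postId: string;
  caption: string;
}

class InMemoryClassifyQueue {
  private readonly jobs: ClassifyJob[] = [];
  private readonly embed: EmbedFn;
  // Stands in for the PostEmbedding table.
  readonly embeddings = new Map<string, number[]>();

  constructor(embed: EmbedFn) {
    this.embed = embed;
  }

  // Called on the post-creation path: records the job and returns immediately.
  enqueue(job: ClassifyJob): void {
    this.jobs.push(job);
  }

  // Called by a background worker: processes pending jobs one at a time
  // and resolves with the number of jobs handled.
  async drain(): Promise<number> {
    let processed = 0;
    let job = this.jobs.shift();
    while (job !== undefined) {
      this.embeddings.set(job.postId, await this.embed(job.caption));
      processed++;
      job = this.jobs.shift();
    }
    return processed;
  }
}

// Post creation only enqueues work; no embedding call happens here.
function createPost(
  queue: InMemoryClassifyQueue,
  postId: string,
  caption: string,
): { postId: string } {
  queue.enqueue({ postId, caption });
  return { postId };
}
```

The point of the design is that createPost never awaits the embedding call, so a slow or failing embedding request cannot delay the response to the user.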
Database Models
model PostEmbedding {
  id        String   @id @default(cuid())
  postId    String   @unique
  embedding Float[]  // 768-dimensional vector
  model     String   // "text-embedding-004"
  createdAt DateTime @default(now())

  @@index([postId])
}
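Since the embedding column is a plain Float[] with no length constraint, a guard before writing can catch truncated or malformed vectors. The following is a minimal sketch. assertValidEmbedding and EMBEDDING_DIMENSIONS are hypothetical names, and the 768 value comes from the Embedding Model table.

```typescript
// 768 is the dimension listed for text-embedding-004 in the Embedding Model table.
const EMBEDDING_DIMENSIONS = 768;

// Hypothetical guard: throws if the vector has the wrong length or contains NaN/Infinity.
function assertValidEmbedding(
  vector: readonly number[],
  expected: number = EMBEDDING_DIMENSIONS,
): void {
  if (vector.length !== expected) {
    throw new Error(`Expected ${expected} dimensions, got ${vector.length}`);
  }
  for (let i = 0; i < vector.length; i++) {
    const value = vector[i];
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error(`Non-finite value at index ${i}`);
    }
  }
}
```

Calling this just before prisma.postEmbedding.create() would be one place to apply it.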
Cross-Reference
- Gemini API -- text-embedding-004 model details
- Conversation Engine -- embeddings in audience context
- Queue System -- post-classify queue
- Schema Overview -- PostEmbedding and PostClassification models