Summarization Pipelines
Problem
LLM summarization models work best with clean, focused content. Raw HTML includes navigation, ads, and boilerplate that waste tokens and reduce summary quality.
Solution
Extract clean content first, then send it to your summarization model. This reduces token usage, improves summary quality, and ensures summaries focus on the main content.
Example Workflow
- Extract clean content from HTML using the Content Extractor API
- Send clean content to your LLM summarization model
- Generate summary from focused content
- Store summary with original article metadata
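The last step, storing the summary alongside the article's metadata, could look roughly like the sketch below. The in-memory `summaryStore`, the helper names `buildSummaryRecord` and `saveSummary`, and the record fields (`url`, `title`, `summary`, `summarizedAt`) are illustrative assumptions, not part of the Content Extractor API; replace them with your own database or index.

```javascript
// Hypothetical in-memory store; swap in your database or search index.
const summaryStore = new Map();

// Builds the record to persist: the summary plus metadata about the source article.
// The field names here are illustrative assumptions.
function buildSummaryRecord({ url, title, summary }) {
  return {
    url,
    title,
    summary,
    summarizedAt: new Date().toISOString(),
  };
}

// Saves the record keyed by article URL and returns it.
function saveSummary(record) {
  summaryStore.set(record.url, record);
  return record;
}

const record = saveSummary(
  buildSummaryRecord({
    url: 'https://example.com/articles/42',
    title: 'Example article',
    summary: 'A short summary produced by your LLM.',
  })
);
console.log(summaryStore.get(record.url).title);
```

Keying records by URL keeps a re-summarized article from producing duplicates; a different key may suit your data better.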
Example Request
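// Extract clean content (rawHtml holds the page's HTML string)
const extractResponse = await fetch('https://api.content-extractor.devstools.net/v1/extract', {
  method: 'POST',
  headers: {
    'Authorization': 'Bearer ce_your_token_here',
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({ html: rawHtml })
});
// fetch() does not reject on HTTP error statuses, so check explicitly
if (!extractResponse.ok) {
  throw new Error(`Extraction failed with status ${extractResponse.status}`);
}
const { title, content } = await extractResponse.json();
// Summarize clean content (llm is a placeholder for your own LLM client)
const summary = await llm.summarize(`Title: ${title}
Content: ${content}`);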
console.log('Summary:', summary);
Benefits
- Reduced token usage (no navigation/ads)
- Higher quality summaries focused on main content
- Faster processing with less content
- Lower costs per summary
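To get a rough sense of the token savings, you can compare the size of the raw HTML with the size of the extracted content. The sketch below assumes about 4 characters per token, a common rule of thumb for English text rather than an exact count; `estimateTokens` and `estimateSavings` are hypothetical helpers, and your model's own tokenizer will give precise numbers.

```javascript
// Rough heuristic: about 4 characters per token for English text.
// Real tokenizers differ; use your model's tokenizer for exact counts.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text) {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Compares the estimated token count of raw HTML with that of the extracted content.
function estimateSavings(rawHtml, cleanContent) {
  const rawTokens = estimateTokens(rawHtml);
  const cleanTokens = estimateTokens(cleanContent);
  const savedTokens = rawTokens - cleanTokens;
  const savedPercent = rawTokens === 0 ? 0 : Math.round((savedTokens / rawTokens) * 100);
  return { rawTokens, cleanTokens, savedTokens, savedPercent };
}

const sampleHtml =
  '<html><body><nav>Home | About | Contact</nav>' +
  '<article><p>Main text.</p></article><footer>Ads</footer></body></html>';
const sampleContent = 'Main text.';
console.log(estimateSavings(sampleHtml, sampleContent));
```

Multiplying `savedTokens` by your model's per-token input price gives an approximate cost saving per summary.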