A web page reader that processes web pages and extracts clean text, metadata, and structured content. It suits content analysis, research, and data processing workflows.
Choose between single-URL extraction and batch processing of multiple URLs at once.
Get started quickly with these code examples for both TypeScript and cURL.
import { Zapserp, Page, PageMetadata } from 'zapserp'

const extractContent = async () => {
  const zapserp = new Zapserp({
    apiKey: 'YOUR_API_KEY'
  })

  const result: Page = await zapserp.reader({
    url: 'https://example.com/article'
  })

  console.log('Title:', result.title)
  console.log('Content:', result.content)
  console.log('Content Length:', result.contentLength)
  console.log('URL:', result.url)

  // Access metadata properties
  if (result.metadata) {
    console.log('Description:', result.metadata.description)
    console.log('Author:', result.metadata.author)
    console.log('Published Time:', result.metadata.publishedTime)
    console.log('Keywords:', result.metadata.keywords)
    console.log('OG Title:', result.metadata.ogTitle)
    console.log('OG Description:', result.metadata.ogDescription)
    console.log('OG Image:', result.metadata.ogImage)
  }

  return result
}
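// Helper function to safely access metadata.
// The value is coerced with String() so the declared return type holds
// even if a field (for example keywords) is not a plain string.
const getMetadataValue = (metadata: PageMetadata | undefined, key: keyof PageMetadata): string => {
  const value = metadata?.[key]
  return value ? String(value) : 'Not available'
}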
// Usage with metadata handling
extractContent()
  .then(content => {
    if (content) {
      console.log('Content extracted successfully!')
      console.log('Author:', getMetadataValue(content.metadata, 'author'))
      console.log('Keywords:', getMetadataValue(content.metadata, 'keywords'))
      console.log('OG Image:', getMetadataValue(content.metadata, 'ogImage'))
    }
  })
  .catch(error => {
    // Report request or extraction failures instead of leaving the rejection unhandled
    console.error('Content extraction failed:', error)
  })
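The batch mode mentioned at the top of this page is not shown in the example above. As a rough client-side sketch, several URLs can be read concurrently while keeping one failure from discarding the rest. The names readMany, ReadPage, and BatchOutcome are hypothetical and not part of the SDK; readPage stands in for any single-page reader, for instance a wrapper around the zapserp.reader call used above. This is not the service's own batch endpoint, whose exact interface is not documented here.

```typescript
// Hypothetical signature for a single-page reader (an assumption, not an SDK type).
type ReadPage<T> = (url: string) => Promise<T>

// Per-URL result: exactly one of `page` or `error` is set.
interface BatchOutcome<T> {
  url: string
  page?: T
  error?: string
}

// Reads all URLs concurrently and records failures per URL instead of throwing,
// so one broken link or timeout does not lose the other pages.
async function readMany<T>(urls: string[], readPage: ReadPage<T>): Promise<BatchOutcome<T>[]> {
  return Promise.all(
    urls.map(async (url): Promise<BatchOutcome<T>> => {
      try {
        return { url, page: await readPage(url) }
      } catch (err) {
        return { url, error: err instanceof Error ? err.message : String(err) }
      }
    })
  )
}
```

With the client from the example above, a call might look like readMany(urls, url => zapserp.reader({ url })). Outcomes come back in the same order as the input URLs.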
Advanced content extraction with comprehensive analysis and metadata
Advanced algorithms identify and extract the main content while filtering out ads, navigation, and clutter.
Extract titles, descriptions, authors, publication dates, and other structured metadata from web pages.
Estimated reading time is calculated automatically from content length and complexity.
Error handling covers broken links, timeouts, and inaccessible content.
Optimized extraction engine processes most pages in under 3 seconds with high accuracy.
Supports websites in multiple languages and character sets, with proper encoding handling.
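To make the reading-time feature concrete, here is a minimal sketch of how such an estimate could be derived from extracted text. The formula the service actually uses is not specified on this page. The function name estimateReadingMinutes and the figure of 200 words per minute are assumptions for illustration, and the sketch uses word count only, ignoring the "complexity" factor.

```typescript
// Assumed average reading speed; not a value published by the service.
const WORDS_PER_MINUTE = 200

// Estimates reading time in whole minutes from whitespace-separated words.
// Empty content yields 0; any non-empty content yields at least 1 minute.
function estimateReadingMinutes(content: string, wordsPerMinute: number = WORDS_PER_MINUTE): number {
  const words = content.trim().split(/\s+/).filter(word => word.length > 0)
  if (words.length === 0) return 0
  return Math.max(1, Math.ceil(words.length / wordsPerMinute))
}
```

Splitting on whitespace undercounts languages written without spaces between words, such as Chinese or Japanese, so a multilingual estimate would need a different word-counting strategy.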
Extract articles, blog posts, and research papers for analysis, summarization, or competitive research.
Gather content from multiple sources for training datasets, content aggregation, or market research.
Extract content from legacy websites or platforms to migrate it to new systems or content management platforms.
Analyze competitor content, extract metadata, and understand content structure for SEO optimization.
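For the SEO use case, one simple check is to flag pages whose metadata lacks fields that search and social previews rely on. The sketch below assumes the metadata fields are optional strings with the names logged in the quick-start example. SeoMetadata, REQUIRED_FIELDS, and findMissingFields are hypothetical names for illustration, not SDK exports, and which fields count as required is an arbitrary choice here.

```typescript
// Local stand-in for the metadata shape; field types are an assumption.
interface SeoMetadata {
  description?: string
  author?: string
  publishedTime?: string
  keywords?: string
  ogTitle?: string
  ogDescription?: string
  ogImage?: string
}

// Fields treated as required for this audit (an illustrative choice).
const REQUIRED_FIELDS: (keyof SeoMetadata)[] = ['description', 'ogTitle', 'ogDescription', 'ogImage']

// Returns the required fields that are absent or blank, in the order given.
function findMissingFields(
  metadata: SeoMetadata | undefined,
  required: (keyof SeoMetadata)[] = REQUIRED_FIELDS
): (keyof SeoMetadata)[] {
  return required.filter(field => {
    const value = metadata?.[field]
    return value === undefined || value.trim() === ''
  })
}
```

Running this over the metadata of each extracted page gives a per-page list of gaps that can be compared across competitors.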