Web Page Reader

Extract Clean Content from Any URL

The Web Page Reader fetches web pages and extracts clean text, metadata, and structured content. Perfect for content analysis, research, and data processing workflows.

Any Website
Real-time Processing
Clean & Safe

Two Powerful Crawling Methods

Choose between single URL extraction and batch processing of multiple URLs at once.

Single URL Reader

Read a single web page and get its content back with detailed analysis and metadata.

Clean, readable text extraction
Metadata extraction (title, description, author)
Reading time estimation
Content length analysis
Fast processing (most pages in under 3 seconds)

Batch URL Reader

Process multiple URLs simultaneously for efficient bulk content extraction.

Process up to 10 URLs at once
Parallel processing for speed
Bulk content analysis
Error handling per URL
Cost-effective for large datasets
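
The batch behavior listed above (up to 10 URLs at once, parallel processing, error handling per URL) can be approximated on the client side. The following is a minimal sketch, not the SDK's batch API: `readInBatches`, `chunk`, `ReadFn`, and `BatchItem` are hypothetical names introduced for illustration, and the reader function is passed in by the caller so that no batch method signature is assumed.

```typescript
// Hypothetical helper types: the SDK's real batch method and result shape
// are not shown in this document, so the reader is supplied by the caller.
type ReadFn<T> = (url: string) => Promise<T>

type BatchItem<T> =
  | { url: string; ok: true; page: T }
  | { url: string; ok: false; error: string }

// Split a list into consecutive chunks of at most `size` items.
const chunk = <T>(items: T[], size: number): T[][] => {
  const out: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size))
  }
  return out
}

// Read every URL, at most `batchSize` in parallel, and record failures
// per URL instead of letting one bad link reject the whole run.
const readInBatches = async <T>(
  urls: string[],
  read: ReadFn<T>,
  batchSize = 10 // mirrors the "up to 10 URLs at once" limit stated above
): Promise<BatchItem<T>[]> => {
  const results: BatchItem<T>[] = []
  for (const group of chunk(urls, batchSize)) {
    const settled = await Promise.allSettled(group.map(url => read(url)))
    settled.forEach((outcome, i) => {
      const url = group[i]
      if (outcome.status === 'fulfilled') {
        results.push({ url, ok: true, page: outcome.value })
      } else {
        results.push({ url, ok: false, error: String(outcome.reason) })
      }
    })
  }
  return results
}
```

With the single URL reader shown later on this page, a call might look like `readInBatches(urls, url => zapserp.reader({ url }))`. Results come back in the same order as the input URLs, so failed entries can be retried or reported individually.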

Implementation Examples

Get started quickly with the TypeScript code example below.

Single URL Reader

Extract content from a single URL with detailed metadata and analysis.

Code Example

Copy and customize the code for your integration

import { Zapserp, Page, PageMetadata } from 'zapserp'

const extractContent = async () => {
  const zapserp = new Zapserp({
    apiKey: 'YOUR_API_KEY'
  })
  
  const result: Page = await zapserp.reader({
    url: 'https://example.com/article'
  })
  
  console.log('Title:', result.title)
  console.log('Content:', result.content)
  console.log('Content Length:', result.contentLength)
  console.log('URL:', result.url)
  
  // Access metadata properties
  if (result.metadata) {
    console.log('Description:', result.metadata.description)
    console.log('Author:', result.metadata.author)
    console.log('Published Time:', result.metadata.publishedTime)
    console.log('Keywords:', result.metadata.keywords)
    console.log('OG Title:', result.metadata.ogTitle)
    console.log('OG Description:', result.metadata.ogDescription)
    console.log('OG Image:', result.metadata.ogImage)
  }
  
  return result
}

// Helper function to safely access metadata
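const getMetadataValue = (metadata: PageMetadata | undefined, key: keyof PageMetadata): string => {
  // Normalize to a string in case a field is missing or is not a plain string
  const value = metadata?.[key]
  return value ? String(value) : 'Not available'
}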

// Usage with metadata handling
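extractContent()
  .then(content => {
    if (content) {
      console.log('Content extracted successfully!')
      console.log('Author:', getMetadataValue(content.metadata, 'author'))
      console.log('Keywords:', getMetadataValue(content.metadata, 'keywords'))
      console.log('OG Image:', getMetadataValue(content.metadata, 'ogImage'))
    }
  })
  .catch(error => {
    // Report failures (broken links, timeouts) instead of leaving the rejection unhandled
    console.error('Content extraction failed:', error)
  })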

Powerful Features

Advanced content extraction with comprehensive analysis and metadata

Smart Content Extraction

Advanced algorithms identify and extract the main content while filtering out ads, navigation, and clutter.

Rich Metadata

Extract titles, descriptions, authors, publication dates, and other structured metadata from web pages.

Reading Time Analysis

Automatic calculation of estimated reading time based on content length and complexity.
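
To give a sense of how a reading time estimate can be derived, here is a minimal sketch based on word count alone. It is not the service's actual formula, which is not documented here and is said to also weigh complexity. The 200 words-per-minute rate and the names `countWords` and `estimateReadingMinutes` are assumptions made for illustration.

```typescript
// Assumed average reading speed; the service's real formula is not documented here.
const WORDS_PER_MINUTE = 200

// Count whitespace-separated words; an empty or blank string has zero words.
const countWords = (text: string): number => {
  const trimmed = text.trim()
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length
}

// Round up to whole minutes, with a floor of 1 minute for any non-empty text.
const estimateReadingMinutes = (text: string, wpm = WORDS_PER_MINUTE): number => {
  const words = countWords(text)
  return words === 0 ? 0 : Math.max(1, Math.ceil(words / wpm))
}
```

Applied to the `content` string returned by the reader, this gives a rough figure for cross-checking or for display when no server-side estimate is at hand.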

Error Handling

Robust error handling for broken links, timeout issues, and inaccessible content.
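
On the calling side, a client can also guard against slow pages with its own deadline. The sketch below wraps any promise in a timeout using `Promise.race`. `withTimeout` is a hypothetical helper, not part of the SDK, and the error name `TimeoutError` is an assumption chosen for illustration.

```typescript
// Reject with a named error if `work` does not settle within `ms` milliseconds.
// Hypothetical helper: it only stops waiting, it does not cancel the request.
const withTimeout = <T>(work: Promise<T>, ms: number): Promise<T> => {
  let timer: ReturnType<typeof setTimeout> | undefined
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => {
      const error = new Error(`Timed out after ${ms} ms`)
      error.name = 'TimeoutError' // lets callers tell timeouts from other failures
      reject(error)
    }, ms)
  })
  // Whichever settles first wins; always clear the timer afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer))
}
```

For example, `withTimeout(zapserp.reader({ url }), 5000)` would stop waiting after five seconds, and the caller can check `error.name === 'TimeoutError'` to decide whether to retry.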

Fast Processing

Optimized extraction engine processes most pages in under 3 seconds with high accuracy.

Global Support

Support for websites in multiple languages and character sets with proper encoding handling.
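
As an illustration of what encoding handling involves, the sketch below decodes a raw response body using the charset declared in a `Content-Type` header, falling back to UTF-8. It is a simplified assumption of the general technique, not a description of the service's internals. `charsetFromContentType` and `decodeBody` are hypothetical names, and `TextDecoder` is the standard decoder available in browsers and Node.js.

```typescript
// Pull the charset label out of a Content-Type header, defaulting to UTF-8.
const charsetFromContentType = (contentType: string | null): string => {
  const match = /charset\s*=\s*"?([^";\s]+)"?/i.exec(contentType ?? '')
  return match ? match[1].toLowerCase() : 'utf-8'
}

// Decode raw bytes with the declared charset; unknown labels fall back to UTF-8.
const decodeBody = (bytes: Uint8Array, contentType: string | null): string => {
  const label = charsetFromContentType(contentType)
  try {
    return new TextDecoder(label).decode(bytes)
  } catch {
    // TextDecoder throws a RangeError for labels it does not recognize
    return new TextDecoder('utf-8').decode(bytes)
  }
}
```

Which legacy encodings `TextDecoder` accepts depends on the runtime, so a production pipeline would typically also inspect `<meta charset>` declarations in the page itself.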

Perfect for These Use Cases

Content Research

Extract articles, blog posts, and research papers for analysis, summarization, or competitive research.

Data Collection

Gather content from multiple sources for training datasets, content aggregation, or market research.

Content Migration

Extract content from old websites or platforms for migration to new systems or content management platforms.

SEO Analysis

Analyze competitor content, extract metadata, and understand content structure for SEO optimization.

Accelerate Your Research Workflow

Enterprise-grade search aggregation and content extraction for researchers, analysts, and content teams

Enterprise Security

99.9% Uptime

24/7 Support