
Overview

The json_schema option lets you define a custom extraction schema so the parser returns only the structured fields you specify. Use it to extract specific data points from any web page without writing custom parsing logic.

Schema Format

The schema is an object with a properties array. Each property defines a field to extract:
{
  "properties": [
    {
      "name": "field_name",
      "type": "string"
    }
  ]
}
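
If you assemble schemas in code rather than by hand, small helpers can keep the structure consistent. The following is a minimal Python sketch, not part of any official client: the function names field and schema are hypothetical, chosen only for illustration. It produces the same object shown above.

```python
import json


def field(name, type_="string", **extra):
    """Return one property entry: a name, a type, and any extra keys
    (for example properties or items on nested fields)."""
    return {"name": name, "type": type_, **extra}


def schema(*fields):
    """Wrap property entries in the top-level object with a properties array."""
    return {"properties": list(fields)}


# Hypothetical helpers; the output matches the schema format shown above.
example = schema(field("field_name"))
print(json.dumps(example, indent=2))
```

Extra keyword arguments pass straight through, so field("contact", "map", properties=[...]) yields a nested entry.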

Supported Types

Type      Description
string    Text content
number    Numeric values
boolean   True/false values
map       Nested object; use properties for nested fields
array     List of items; use items to define the item schema
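
A schema can be checked locally against the type list above before it is sent. The sketch below is an illustration under stated assumptions, not the service's own validation: validate_properties is a hypothetical name, and the rules it enforces (map needs properties, array needs an items object) are inferred from the table and the examples on this page.

```python
SUPPORTED_TYPES = {"string", "number", "boolean", "map", "array"}


def validate_properties(properties, path=""):
    """Return a list of problems found in a properties array (empty if none)."""
    if not isinstance(properties, list):
        return [f"{path or 'schema'}: properties must be a list"]
    problems = []
    for prop in properties:
        if not isinstance(prop, dict):
            problems.append(f"{path or 'schema'}: each property must be an object")
            continue
        name = prop.get("name")
        label = name if isinstance(name, str) and name else "<unnamed>"
        here = f"{path}.{label}" if path else label
        if label == "<unnamed>":
            problems.append(f"{here}: missing name")
        kind = prop.get("type")
        if kind not in SUPPORTED_TYPES:
            problems.append(f"{here}: unsupported type {kind!r}")
        elif kind == "map":
            # Assumption: a map carries its nested fields in "properties".
            if "properties" not in prop:
                problems.append(f"{here}: map needs a properties array")
            else:
                problems.extend(validate_properties(prop["properties"], here))
        elif kind == "array":
            # Assumption: an array carries its item schema in "items".
            items = prop.get("items")
            if not isinstance(items, dict):
                problems.append(f"{here}: array needs an items object")
            elif "properties" in items:
                problems.extend(validate_properties(items["properties"], here + "[]"))
    return problems
```

An empty list means the schema passed these local checks; it does not guarantee that the service will accept it.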

Examples

Extract Article Metadata

{
  "url": "https://example.com/blog/article",
  "options": {
    "json_schema": {
      "properties": [
        { "name": "title", "type": "string" },
        { "name": "author", "type": "string" },
        { "name": "publish_date", "type": "string" },
        { "name": "word_count", "type": "number" }
      ]
    }
  }
}
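
The request body above can also be built programmatically. This is a minimal sketch assuming only the body shape shown in the example (url plus options.json_schema); build_request is a hypothetical helper, and the endpoint and authentication are not covered on this page, so the sketch stops at serializing the body.

```python
import json


def build_request(url, properties):
    """Return a request body with the given URL and a json_schema option."""
    return {
        "url": url,
        "options": {"json_schema": {"properties": properties}},
    }


payload = build_request(
    "https://example.com/blog/article",
    [
        {"name": "title", "type": "string"},
        {"name": "author", "type": "string"},
        {"name": "publish_date", "type": "string"},
        {"name": "word_count", "type": "number"},
    ],
)

# Serialize to a JSON string, ready to send with the HTTP client of your choice.
body = json.dumps(payload)
```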

Extract Product Data

{
  "url": "https://example.com/product/12345",
  "options": {
    "json_schema": {
      "properties": [
        { "name": "product_name", "type": "string" },
        { "name": "price", "type": "number" },
        { "name": "in_stock", "type": "boolean" },
        {
          "name": "reviews",
          "type": "array",
          "items": {
            "properties": [
              { "name": "rating", "type": "number" },
              { "name": "text", "type": "string" },
              { "name": "author", "type": "string" }
            ]
          }
        }
      ]
    }
  }
}

Nested Objects

{
  "url": "https://example.com/company",
  "options": {
    "json_schema": {
      "properties": [
        { "name": "company_name", "type": "string" },
        {
          "name": "contact",
          "type": "map",
          "properties": [
            { "name": "email", "type": "string" },
            { "name": "phone", "type": "string" },
            { "name": "address", "type": "string" }
          ]
        }
      ]
    }
  }
}
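
To see at a glance which leaf fields a nested schema requests, you can flatten it into dotted paths. The sketch below is illustrative only: field_paths is a hypothetical helper, and the dotted and [] notation is a convention chosen here, not something the parser itself uses.

```python
def field_paths(properties, prefix=""):
    """List leaf field paths: map children as 'a.b', array item fields as 'a[].b'."""
    paths = []
    for prop in properties:
        path = prefix + prop["name"]
        if prop["type"] == "map":
            nested = field_paths(prop.get("properties", []), path + ".")
            paths.extend(nested or [path])
        elif prop["type"] == "array":
            item_props = prop.get("items", {}).get("properties", [])
            nested = field_paths(item_props, path + "[].")
            paths.extend(nested or [path + "[]"])
        else:
            paths.append(path)
    return paths


company = [
    {"name": "company_name", "type": "string"},
    {
        "name": "contact",
        "type": "map",
        "properties": [
            {"name": "email", "type": "string"},
            {"name": "phone", "type": "string"},
            {"name": "address", "type": "string"},
        ],
    },
]

print(field_paths(company))
# ['company_name', 'contact.email', 'contact.phone', 'contact.address']
```

Applied to the reviews array from the product example, the same function would yield paths such as reviews[].rating.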

Notes

  • JSON schema extraction works with any URL, not just supported platforms
  • The parser uses AI to match your schema fields to page content
  • Field names should be descriptive, because the parser uses them to understand what data to extract
  • JSON schema extraction costs 1 token (standard) unless combined with AI options