PDF Ingestion

Upload and process documents directly inside Snowflake.

Use this page when you want to add new PDF or ZIP files to your dataset.


Upload & Processing Interface

Upload files and select the appropriate processing mode.

Supported formats:

  • Single PDF file
  • ZIP archive containing multiple PDFs

Processing Modes

Choose a mode based on document quality and structure.

Mode          | Best For             | Use When
Simple ($)    | Clean digital PDFs   | Text is selectable and well-formatted
Moderate ($$) | Mixed layouts        | Tables, headers, multi-column text
Complex ($$$) | Scanned / noisy PDFs | Images, poor OCR, complex layouts

Start with Simple. Upgrade only if extraction quality is poor.
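
The "start with Simple, upgrade only if needed" rule can be sketched as a small escalation loop. This is only an illustration of the decision order, not part of the application: the `extraction_ok` callback is a hypothetical stand-in for however you judge extraction quality (for example, a manual review of the output).

```python
from typing import Callable, Optional

# Modes in order of increasing cost, as listed in the table above.
MODES = ["Simple", "Moderate", "Complex"]


def next_mode(current: str) -> Optional[str]:
    """Return the next more thorough mode, or None if already at Complex."""
    index = MODES.index(current)  # raises ValueError for an unknown mode
    return MODES[index + 1] if index + 1 < len(MODES) else None


def pick_mode(extraction_ok: Callable[[str], bool]) -> str:
    """Try modes from cheapest to most expensive until extraction looks good.

    `extraction_ok` is a hypothetical callback: it takes a mode name and
    returns True when the output produced with that mode is acceptable.
    """
    mode = MODES[0]
    while not extraction_ok(mode):
        upgraded = next_mode(mode)
        if upgraded is None:
            break  # Complex is the last option; nothing left to try
        mode = upgraded
    return mode
```

For example, `pick_mode(lambda mode: mode != "Simple")` settles on "Moderate", because Simple is rejected and the next mode is accepted.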


File Upload Options

Single PDF

Use when uploading one document.

ZIP File

Use when uploading multiple PDFs in batch.

Each PDF inside the ZIP is processed separately.

Limit: 200 MB per uploaded file (PDF or ZIP).
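
A quick local check before uploading can tell you whether a file fits the limit or needs the staging route described under Large File Processing. The sketch below is a minimal helper under one stated assumption: "200MB" is interpreted as 200 * 1024 * 1024 bytes, since this page does not say whether the limit is counted in decimal or binary megabytes. The function names are illustrative.

```python
import os

# Assumption: "200MB" is read as 200 * 1024 * 1024 bytes; the page does not
# say whether the limit uses decimal or binary megabytes.
UPLOAD_LIMIT_BYTES = 200 * 1024 * 1024


def upload_route(size_bytes: int, limit: int = UPLOAD_LIMIT_BYTES) -> str:
    """Return "interface" if the file fits the upload limit, else "stage"."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    return "interface" if size_bytes <= limit else "stage"


def route_for_file(path: str) -> str:
    """Look up the size of a local file and choose the upload route."""
    return upload_route(os.path.getsize(path))
```

Calling `route_for_file("reports.zip")` on a local archive returns "interface" when the file can go through the upload page and "stage" when it should be staged with SnowSQL first.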


Requirements

Before uploading, ensure you have:

  • The CREATE TABLE privilege on your target schema
  • If the tables already exist, sufficient data access grants on them, as outlined in Data Access Grants

Large File Processing

Staging from Local Environment

For PDF or ZIP files larger than 200 MB, stage the file with SnowSQL first and then process it from the application:

  1. Stage the file using SnowSQL:

    PUT file://path/to/your/large-file @my_stage;

  2. Process the staged files through the ZettaQuant application interface.

Note: For detailed information on staging files from your local environment, refer to the Snowflake documentation on the PUT command.
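
If you generate the PUT statement from a script, the main detail to get right is the file URI. The sketch below assumes a Linux or macOS client, where an absolute path such as /tmp/docs/batch.zip becomes file:///tmp/docs/batch.zip, and it wraps the URI in single quotes when the path contains spaces, as the Snowflake PUT documentation requires. The helper name and the default stage name `my_stage` (taken from the example above) are illustrative; Windows paths use a different URI form and are not handled here.

```python
from pathlib import PurePosixPath


def build_put_statement(local_path: str, stage: str = "my_stage") -> str:
    """Build a SnowSQL PUT statement for an absolute Linux/macOS path."""
    path = PurePosixPath(local_path)
    if not path.is_absolute():
        raise ValueError("use an absolute path, for example /tmp/docs/batch.zip")
    if "'" in local_path:
        raise ValueError("single quotes in the path are not handled by this sketch")
    uri = f"file://{path}"  # an absolute path yields file:///...
    # The URI must be single-quoted when it contains spaces.
    if any(ch.isspace() for ch in uri):
        uri = f"'{uri}'"
    return f"PUT {uri} @{stage};"


if __name__ == "__main__":
    print(build_put_statement("/tmp/docs/batch.zip"))
```

The printed statement can be pasted into a SnowSQL session; the script itself does not connect to Snowflake.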

Best Practices for Large Files:

  • Compress files before staging to reduce transfer time
  • Use internal stages for better performance
  • Consider splitting very large archives into smaller batches
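
Splitting a large collection into smaller batches can be done locally before staging. The following is a minimal sketch, not a ZettaQuant utility: it groups PDFs greedily by their on-disk size and writes one ZIP per group. The raw file sizes are used as an estimate of the archive size, the 200 * 1024 * 1024 byte default is an assumed reading of "200MB", and the function names and `batch_001.zip` naming are illustrative.

```python
import os
import zipfile
from typing import List

# Assumption: 200 MB is read as 200 * 1024 * 1024 bytes.
DEFAULT_BATCH_BYTES = 200 * 1024 * 1024


def plan_batches(pdf_paths: List[str], max_bytes: int = DEFAULT_BATCH_BYTES) -> List[List[str]]:
    """Group files so each batch's combined on-disk size stays within max_bytes.

    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches: List[List[str]] = []
    current: List[str] = []
    current_size = 0
    for path in pdf_paths:
        size = os.path.getsize(path)
        # Close the current batch if adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def write_batches(batches: List[List[str]], out_dir: str, prefix: str = "batch") -> List[str]:
    """Write each batch to its own ZIP archive and return the archive paths."""
    os.makedirs(out_dir, exist_ok=True)
    archives = []
    for number, batch in enumerate(batches, start=1):
        archive = os.path.join(out_dir, f"{prefix}_{number:03d}.zip")
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for path in batch:
                # Store only the file name so the archive has a flat layout.
                zf.write(path, arcname=os.path.basename(path))
        archives.append(archive)
    return archives
```

Because entries are stored by file name only, two PDFs with the same name in different folders would collide inside one archive; rename such files before batching.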

Troubleshooting

  • Upload or ingestion fails → Check execution details in Telemetry & Logs

  • Permission errors → Confirm required privileges are applied in Data Access Grants

  • No output tables created → Verify Data Configuration is completed and tables exist in Snowflake

  • Poor text extraction / missing tables → Try switching to Moderate or Complex processing mode

  • ZIP file not processed → Ensure the ZIP contains valid .pdf files only

  • Duplicate document warning → File already exists in the dataset and was skipped

  • Slow or stalled processing → Verify warehouse size and GPU compute pool status

  • PDF preview not loading → Refresh the page and reselect the uploaded file
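
For the "ZIP file not processed" case, you can inspect the archive locally before re-uploading. This sketch lists entries that do not look like PDFs. It assumes a "valid .pdf file" means a name ending in .pdf whose content starts with the standard "%PDF-" signature; the application's own validation rules are not documented on this page and may differ. The function name is illustrative.

```python
import zipfile
from typing import List


def non_pdf_entries(zip_path: str) -> List[str]:
    """Return the names of archive entries that do not look like PDFs.

    An entry counts as a PDF when its name ends with .pdf and its first
    bytes are the "%PDF-" signature. Directory entries are ignored.
    """
    problems = []
    with zipfile.ZipFile(zip_path) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            with zf.open(info) as entry:
                header = entry.read(5)
            if not info.filename.lower().endswith(".pdf") or header != b"%PDF-":
                problems.append(info.filename)
    return problems
```

An empty result means every file in the archive passed both checks. Any names returned (for example hidden files added by the operating system when the archive was created) are candidates to remove before uploading again.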


Next Steps

After ingestion:

  1. ZQ Classify — Run analysis
  2. Telemetry — Monitor processing