Data Configuration
Configure your data sources and processing options for the ZettaQuant SEAL pipeline.
Data Source Options
ZettaQuant supports multiple data ingestion methods to fit your workflow:
Existing Snowflake Tables
Connect to your current document and sentence tables with minimal setup.
Prerequisites:
- Document table with a `document_id` column
- Sentence table with `document_id`, `sentence_id`, and `sentence_text` columns
- Proper access grants configured
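The prerequisites above can be illustrated with a minimal pair of table definitions. The table names and the extra `title` column are placeholders; your existing tables only need to expose the required columns listed above.

```sql
-- Illustrative schemas; only the required columns matter.
CREATE TABLE documents (
    document_id VARCHAR NOT NULL,
    title       VARCHAR          -- additional metadata columns are fine
);

CREATE TABLE sentences (
    document_id   VARCHAR NOT NULL,
    sentence_id   VARCHAR NOT NULL,
    sentence_text VARCHAR NOT NULL
);
```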
PDF Ingestion
Upload and process PDF documents directly through the Streamlit interface.
Features:
- Drag-and-drop interface for single PDF files or ZIP files containing PDF files
- Automatic text extraction and table creation
- Batch processing capabilities
Requirements:
- `CREATE TABLE` permissions on your target schema
- Sufficient data access grants if the tables are already created, as outlined in access grants
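As a sketch of what these grants might look like, assuming ZettaQuant is installed as a Snowflake Native App named `zettaquant` (the database, schema, and application names below are placeholders for your environment):

```sql
-- Hypothetical names; substitute your own database, schema, and app name.
GRANT USAGE ON DATABASE my_db TO APPLICATION zettaquant;
GRANT USAGE ON SCHEMA my_db.my_schema TO APPLICATION zettaquant;

-- Needed for PDF ingestion, which creates tables in the target schema.
GRANT CREATE TABLE ON SCHEMA my_db.my_schema TO APPLICATION zettaquant;

-- Needed when connecting to tables that already exist.
GRANT SELECT ON ALL TABLES IN SCHEMA my_db.my_schema TO APPLICATION zettaquant;
```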
Large File Processing
Staging from Local Environment
For files (PDF or ZIP) larger than 200 MB, stage them with SnowSQL instead of uploading them through the browser:

1. Stage the file using SnowSQL:

   ```sql
   PUT file://path/to/your/large-file @my_stage;
   ```

2. Process the staged files through the ZettaQuant application interface.

Note: For detailed information on staging files from your local environment, refer to the Snowflake documentation on the PUT command.
Best Practices for Large Files:
- Compress files before staging to reduce transfer time
- Use internal stages for better performance
- Consider splitting very large archives into smaller batches
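A fuller staging sequence, following the practices above, might look like this. The stage name is a placeholder, and `PUT` must be run from SnowSQL (or another client driver), not from a Snowsight worksheet.

```sql
-- Create an internal stage if you do not already have one.
CREATE STAGE IF NOT EXISTS my_stage;

-- Stage the file; SnowSQL gzips uploads by default (AUTO_COMPRESS = TRUE),
-- so pre-compressed archives transfer without extra work.
PUT file://path/to/your/large-file @my_stage;

-- Verify the upload before processing it in the application.
LIST @my_stage;
```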
Troubleshooting
Common Issues
- Permission Errors: Verify data access grants
- Processing Failures: Check compute pool status
- Format Errors: Validate input document structure
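For the first two issues, a few standard Snowflake commands can help narrow things down. The pool, database, schema, and table names below are placeholders for your own objects.

```sql
-- Processing failures: inspect compute pool state and node counts.
SHOW COMPUTE POOLS;
DESCRIBE COMPUTE POOL my_compute_pool;

-- Format errors: confirm the expected columns exist on a source table.
DESCRIBE TABLE my_db.my_schema.sentences;
```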