Data Import Log Retriever Guide#
The Data Import Log Retriever (get-di-logs) is a diagnostic tool that retrieves error logs and associated MARC records from failed Data Import jobs by querying FOLIO’s PostgreSQL database directly.
Overview#
When MARC records fail to import through FOLIO’s Data Import system, the error information is stored in the database but can be difficult to access through the UI. This tool:
Queries the PostgreSQL database for failed records
Extracts the original MARC record that caused the error
Generates a TSV error report with error messages
Creates a MARC file containing all failed records for reprocessing
Supports SSH tunneling for secure database access
Note
This tool requires direct PostgreSQL database access, which is typically only available to system administrators or in hosted environments with proper credentials.
Prerequisites#
PostgreSQL Dependencies#
The get-di-logs command requires the PostgreSQL optional dependencies:
# Using pip
pip install 'folio_data_import[postgres]'
# Using uv
uv add 'folio_data_import[postgres]'
Required Access#
FOLIO API credentials - Username/password with read access
PostgreSQL database access - Direct connection to the FOLIO database
SSH access (optional) - If the database is only accessible via a bastion/jump host
Job Execution IDs#
You need the Job Execution IDs from the Data Import jobs you want to analyze. These can be found:
In FOLIO’s Data Import logs UI
In the
marc_import_job_ids.txtfile generated by themarccommandVia the FOLIO API at
/change-manager/jobExecutions
Basic Usage#
Command Line#
folio-data-import get-di-logs \
--folio-url https://okapi.folio.example.com \
--folio-tenant my_tenant \
--folio-username admin \
--db-config db_config.json \
--job-ids-file marc_import_job_ids.txt
Using the Standalone Script#
folio-di-logs \
--folio-url https://okapi.folio.example.com \
--folio-tenant my_tenant \
--folio-username admin \
--db-config db_config.json \
--job-ids-file marc_import_job_ids.txt
Configuration Files#
Database Configuration (db_config.json)#
The database configuration file specifies how to connect to the PostgreSQL database:
{
"host": "folio-db.cluster-abc123.us-east-1.rds.amazonaws.com",
"port": 5432,
"database": "folio",
"user": "folio_readonly",
"password": "your_password_here"
}
Field |
Required |
Default |
Description |
|---|---|---|---|
|
Yes |
- |
PostgreSQL server hostname |
|
No |
|
PostgreSQL port |
|
Yes |
- |
Database name (usually |
|
Yes |
- |
Database username |
|
No |
- |
Database password (can be omitted for passwordless auth) |
SSH Tunnel Configuration (ssh_config.json)#
If the database is only accessible through an SSH bastion host, provide an SSH configuration:
{
"ssh_tunnel": true,
"ssh_host": "bastion.example.com",
"ssh_user": "tunnel_user",
"ssh_private_key_path": "~/.ssh/id_rsa",
"use_ssh_config": true,
"ssh_path": "/usr/bin/ssh"
}
Field |
Required |
Default |
Description |
|---|---|---|---|
|
Yes |
|
Enable SSH tunneling |
|
If tunneling |
- |
SSH bastion/jump host |
|
No |
- |
SSH username (uses SSH config if not set) |
|
No |
- |
Path to SSH private key |
|
No |
|
Use |
|
No |
|
Path to SSH binary |
AWS SSM Session Manager Example#
For AWS environments using SSM Session Manager:
{
"ssh_tunnel": true,
"ssh_host": "i-0abc123def456789",
"use_ssh_config": true
}
With corresponding ~/.ssh/config:
Host i-*
ProxyCommand sh -c "aws ssm start-session --target %h --document-name AWS-StartSSHSession --parameters 'portNumber=%p'"
User ec2-user
Job IDs File#
A plain text file with one Job Execution ID per line:
a1b2c3d4-e5f6-7890-abcd-ef1234567890
b2c3d4e5-f6a7-8901-bcde-f12345678901
c3d4e5f6-a7b8-9012-cdef-123456789012
Lines starting with # or empty lines are ignored.
Command-Line Parameters#
Connection Parameters#
Parameter |
Environment Variable |
Description |
|---|---|---|
|
|
FOLIO Gateway/Okapi URL |
|
|
FOLIO tenant ID |
|
|
FOLIO username |
|
|
FOLIO password |
Database Parameters#
Parameter |
Default |
Description |
|---|---|---|
|
Required |
Path to database configuration JSON file |
|
(none) |
Path to SSH tunnel configuration JSON file |
Input/Output Parameters#
Parameter |
Default |
Description |
|---|---|---|
|
|
Path to file containing Job Execution IDs |
|
|
Path for the error report TSV output |
|
|
Path for the MARC records output |
Other Options#
Parameter |
Default |
Description |
|---|---|---|
|
|
Disable progress bar display |
|
|
Enable debug logging |
Output Files#
Error Report (TSV)#
A tab-separated file with error details:
'Error Log' 'MARC Record'
'"Error processing record: Invalid indicator value"' '00732nam a2200241...'
'"Duplicate record detected"' '00845cam a2200289...'
The report can be opened in Excel or any spreadsheet application for analysis.
MARC Records File#
A binary MARC21 file (.mrc) containing all the records that failed to import. This file can be:
Edited to fix issues using MarcEdit or similar tools
Reprocessed through Data Import after corrections
Analyzed with pymarc or other MARC processing tools
Workflow Examples#
Basic Error Retrieval#
# After running a MARC import that had errors
folio-data-import get-di-logs \
--db-config db_config.json \
--job-ids-file marc_import_job_ids.txt \
--report-file-path errors_2024.tsv \
--marc-file-path failed_records_2024.mrc
With SSH Tunnel#
folio-data-import get-di-logs \
--db-config db_config.json \
--ssh-config ssh_config.json \
--job-ids-file job_ids.txt
Using Environment Variables#
export FOLIO_URL="https://okapi.folio.example.com"
export FOLIO_TENANT="my_tenant"
export FOLIO_USERNAME="admin"
export FOLIO_PASSWORD="secret"
folio-data-import get-di-logs \
--db-config db_config.json \
--job-ids-file job_ids.txt
Complete Migration Error Analysis Workflow#
# 1. Run the MARC import
folio-data-import marc \
--marc-file-path migration_records.mrc \
--import-profile-name "Default - Create Instance and SRS MARC Bib" \
--job-ids-file-path migration_job_ids.txt
# 2. After import completes, retrieve any errors
folio-data-import get-di-logs \
--db-config db_config.json \
--job-ids-file migration_job_ids.txt \
--report-file-path migration_errors.tsv \
--marc-file-path migration_failed.mrc
# 3. Review errors, fix MARC records, and re-import
folio-data-import marc \
--marc-file-path migration_failed_fixed.mrc \
--import-profile-name "Default - Create Instance and SRS MARC Bib"
Programmatic Usage#
The DILogRetriever class can be used directly in Python scripts:
import folioclient
from folio_data_import.DILogRetriever import DILogRetriever
from folio_data_import._postgres import PostgresConfig, SSHTunnelConfig
from folio_data_import._progress import NoOpProgressReporter
# Create FOLIO client
folio_client = folioclient.FolioClient(
gateway_url="https://okapi.folio.example.com",
tenant_id="my_tenant",
username="admin",
password="secret",
)
# Database configuration
db_config = PostgresConfig(
host="folio-db.example.com",
port=5432,
database="folio",
user="folio",
password="db_password",
)
# SSH tunnel (optional)
ssh_config = SSHTunnelConfig(ssh_tunnel=False)
# Create retriever
retriever = DILogRetriever(
folio_client=folio_client,
db_config=db_config,
ssh_tunnel_config=ssh_config,
progress_reporter=NoOpProgressReporter(),
)
# Retrieve errors
job_ids = ["a1b2c3d4-e5f6-7890-abcd-ef1234567890"]
error_logs = retriever.retrieve_errors_with_marc(job_ids)
# Process results
for error_message, marc_record in error_logs:
print(f"Error: {error_message}")
print(f"Title: {marc_record.title}")
# Or generate report files
retriever.generate_error_report_and_marc_file(
error_logs=error_logs,
report_file_path="errors.tsv",
marc_file_path="failed.mrc",
)
Troubleshooting#
“PostgreSQL support requires the ‘postgres’ optional dependencies”#
Install the PostgreSQL dependencies:
pip install 'folio_data_import[postgres]'
Connection Refused / Timeout#
Verify the database host and port are correct
Check firewall rules allow the connection
If using SSH tunneling, verify the SSH configuration
Ensure the database is accepting connections from your IP
“ssh_host is required when ssh_tunnel is enabled”#
Your ssh_config.json has "ssh_tunnel": true but is missing ssh_host. Add the bastion host:
{
"ssh_tunnel": true,
"ssh_host": "your-bastion-host.example.com"
}
No Records Found#
Verify the Job Execution IDs are correct
Check that the jobs actually had errors (not just warnings)
Ensure you’re connecting to the correct tenant’s database
Permission Denied#
The database user needs SELECT permissions on:
{tenant}_mod_source_record_manager.journal_records{tenant}_mod_source_record_manager.incoming_records
Security Considerations#
Warning
This tool requires direct database access. Follow your organization’s security policies:
Store credentials securely (use environment variables or secrets management)
Use read-only database accounts when possible
Prefer SSH tunneling over direct database exposure
Rotate passwords regularly
Audit access to database credentials
See Also#
MARC Data Import Guide - MARC import documentation
Troubleshooting - General troubleshooting guide