Core Concepts#

This page explains the key concepts and architecture behind folio-data-import.

Architecture Overview#

folio-data-import provides three specialized tools for loading data into FOLIO:

MARC Data Import: Loads bibliographic MARC records via FOLIO’s Data Import change-manager APIs
User Import: Loads user records from JSON Lines files via FOLIO’s /users and /service-points-users APIs
Batch Poster: Posts Inventory records (Instances, Holdings, Items) via batch storage APIs

Each tool targets a single data type and calls the FOLIO APIs designed for it.

MARC Data Import#

FOLIO Data Import System#

The MARC tool uses FOLIO’s Data Import system via the change-manager APIs. This requires:

Job Profiles: Pre-configured workflows in FOLIO that define how MARC records are processed
Match Profiles: Criteria for identifying existing records to update
Action Profiles: Instructions for creating or updating FOLIO records
Mapping Profiles: Rules for extracting data from MARC fields to FOLIO fields

Note

Job Profiles are configured in FOLIO, not in this tool. The tool uploads MARC records to FOLIO’s Data Import system using a selected Job Profile. All field mapping, matching, and action logic is defined in FOLIO.

Workflow#

Initialize a job execution via /change-manager/jobExecutions
Associate the selected Job Profile with the job
Upload MARC records in chunks via /change-manager/jobExecutions/{id}/records
FOLIO processes records asynchronously according to the Job Profile
Job status can be monitored in FOLIO’s Data Import logs
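
The chunked upload in step 3 can be illustrated with a small, network-free sketch. This is not the tool's code: `chunk_records` and its `is_last` flag are hypothetical names, and the flag is included only on the assumption that an asynchronous receiver needs to know which chunk ends the upload.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunk_records(
    records: Iterable[bytes], chunk_size: int
) -> Iterator[Tuple[List[bytes], bool]]:
    """Yield (chunk, is_last) pairs of at most chunk_size raw records."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    iterator = iter(records)
    current = list(islice(iterator, chunk_size))
    while current:
        # Read one chunk ahead so the final chunk can be flagged.
        upcoming = list(islice(iterator, chunk_size))
        yield current, not upcoming
        current = upcoming


for chunk, is_last in chunk_records([b"rec1", b"rec2", b"rec3"], 2):
    print(len(chunk), is_last)
```

Each yielded chunk would correspond to one request against /change-manager/jobExecutions/{id}/records; the request body itself is deliberately left out here.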

MARC Preprocessors#

Optional preprocessors modify MARC records before upload to FOLIO. Available preprocessors include:

clean_999_fields: Removes 999 fields with first indicator ‘f’ and second indicator ‘f’; moves other 999 fields to 945
strip_999_ff_fields: Removes only 999 fields with indicators ‘f’, ‘f’
clean_non_ff_999_fields: Moves non-ff 999 fields to 945 with indicators ‘9’, ‘9’
prepend_prefix_001: Adds a custom prefix to the 001 field (requires configuration)
prepend_ppn_prefix_001: Adds “(PPN)” prefix to the 001 field
prepend_abes_prefix_001: Adds “(ABES)” prefix to the 001 field
fix_bib_leader: Corrects invalid MARC leader values
clean_empty_fields: Removes fields with no subfields or empty subfield values
sudoc_supercede_prep: SUDOC-specific processing for superseded records

See the MARC Preprocessors Guide for detailed documentation.
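
To make the 999-field descriptions concrete, here is a minimal sketch of the two narrower preprocessors. It assumes a simplified, hypothetical `Field` dataclass rather than the MARC library the tool actually uses, and it keeps moved fields in their original position, which may differ from the real behavior.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """Simplified stand-in for a MARC data field (hypothetical)."""
    tag: str
    ind1: str = " "
    ind2: str = " "
    subfields: Dict[str, str] = field(default_factory=dict)


def is_999_ff(f: Field) -> bool:
    return f.tag == "999" and f.ind1 == "f" and f.ind2 == "f"


def strip_999_ff_fields(fields: List[Field]) -> List[Field]:
    """Drop 999 fields whose indicators are both 'f'."""
    return [f for f in fields if not is_999_ff(f)]


def clean_non_ff_999_fields(fields: List[Field]) -> List[Field]:
    """Re-tag every other 999 field as 945 with indicators '9', '9'."""
    result = []
    for f in fields:
        if f.tag == "999" and not is_999_ff(f):
            result.append(Field("945", "9", "9", dict(f.subfields)))
        else:
            result.append(f)
    return result
```

Applying both functions in sequence approximates what the description of clean_999_fields suggests, although the indicators that preprocessor assigns to moved fields are not stated above.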

User Import#

Input Format#

The User tool reads JSON Lines files where each line is a complete user object following the mod-user-import schema:

{"username": "jdoe", "externalSystemId": "12345", "patronGroup": "undergraduate", "personal": {"lastName": "Doe", "firstName": "John"}}
{"username": "asmith", "externalSystemId": "12346", "patronGroup": "faculty", "personal": {"lastName": "Smith", "firstName": "Alice"}}

This tool is an alternative to FOLIO’s built-in /user-import API, offering additional features and more granular control.
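
Reading a JSON Lines file of this shape needs nothing beyond the standard library. The sketch below is illustrative only; `parse_json_lines` is a hypothetical helper, and skipping blank lines is an assumption rather than documented behavior.

```python
import json
from typing import Iterable, Iterator, Tuple


def parse_json_lines(lines: Iterable[str]) -> Iterator[Tuple[int, dict]]:
    """Yield (line_number, user_object) for each non-blank line."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # assumption: blank lines are ignored
        yield line_number, json.loads(line)


sample = [
    '{"username": "jdoe", "externalSystemId": "12345"}\n',
    '{"username": "asmith", "externalSystemId": "12346"}\n',
]
for number, user in parse_json_lines(sample):
    print(number, user["username"])
```

Keeping the line number alongside each object makes it easier to report which input line failed to parse or import.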

User Matching#

The tool matches incoming users against existing FOLIO users to determine whether to create or update:

Default: Match by externalSystemId
Alternative: Match by username or barcode via --user-match-key
ID Override: If the input record contains an id field, that UUID is always used for matching (takes precedence over --user-match-key)
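
The precedence among these three rules can be sketched as a small function. `select_match` is a hypothetical name, and raising an error when the chosen key is missing from the record is an assumption; the tool may handle that case differently.

```python
from typing import Tuple

ALLOWED_MATCH_KEYS = {"externalSystemId", "username", "barcode"}


def select_match(user: dict, match_key: str = "externalSystemId") -> Tuple[str, str]:
    """Return the (field, value) pair used to look up an existing user."""
    # An explicit id in the input always wins over the configured key.
    if user.get("id"):
        return "id", user["id"]
    if match_key not in ALLOWED_MATCH_KEYS:
        raise ValueError(f"Unsupported match key: {match_key!r}")
    value = user.get(match_key)
    if not value:
        raise ValueError(f"Record has no value for {match_key!r}")
    return match_key, value


print(select_match({"username": "jdoe", "externalSystemId": "12345"}))
```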

Reference Data Resolution#

The tool automatically resolves human-readable names to FOLIO UUIDs:

Patron group names → Patron group UUIDs
Service point codes → Service point UUIDs
Address type names → Address type UUIDs
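
Conceptually, each resolution is a dictionary lookup from a name to a UUID. The sketch below shows the patron group case only; the helper names and the placeholder UUIDs are hypothetical, and failing loudly on an unknown name is an assumption about how such a lookup might behave.

```python
from typing import Dict


def resolve_reference(value: str, lookup: Dict[str, str], kind: str) -> str:
    """Translate a human-readable name into its UUID."""
    try:
        return lookup[value]
    except KeyError:
        raise ValueError(f"Unknown {kind}: {value!r}") from None


def resolve_patron_group(user: dict, patron_groups: Dict[str, str]) -> dict:
    """Return a copy of the user with patronGroup replaced by its UUID."""
    resolved = dict(user)
    resolved["patronGroup"] = resolve_reference(
        user["patronGroup"], patron_groups, "patron group"
    )
    return resolved


# Placeholder UUIDs; in practice the lookup table would be fetched from FOLIO.
groups = {"undergraduate": "00000000-0000-0000-0000-000000000001"}
print(resolve_patron_group({"username": "jdoe", "patronGroup": "undergraduate"}, groups))
```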

Service Points#

Service point assignments are handled separately from the user record via FOLIO’s /service-points-users API:

{
  "username": "jdoe",
  "servicePointsUser": {
    "servicePointsIds": ["MAIN-CIRC", "BRANCH-REF"],
    "defaultServicePointId": "MAIN-CIRC"
  }
}

The servicePointsUser object is extracted from the input and processed separately after the user is created/updated.
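
The extraction step amounts to splitting one input object into two payloads. A minimal sketch, with `split_service_points` as a hypothetical helper name:

```python
from typing import Optional, Tuple


def split_service_points(record: dict) -> Tuple[dict, Optional[dict]]:
    """Separate the user payload from its servicePointsUser object."""
    user = dict(record)  # shallow copy so the input is left untouched
    service_points_user = user.pop("servicePointsUser", None)
    return user, service_points_user


record = {
    "username": "jdoe",
    "servicePointsUser": {
        "servicePointsIds": ["MAIN-CIRC", "BRANCH-REF"],
        "defaultServicePointId": "MAIN-CIRC",
    },
}
user, service_points_user = split_service_points(record)
print(user)
print(service_points_user)
```

The first payload would go to /users; the second would be sent to /service-points-users once the service point codes have been resolved to UUIDs.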

Field Protection#

Two mechanisms protect fields from being overwritten during updates:

CLI option: --fields-to-protect username,email,barcode
Per-user setting: customFields.protectedFields in the existing user record (comma-separated string)

Both sources are combined when determining which fields to preserve.
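
How the two sources might be combined and applied is sketched below. The function names are hypothetical, and the sketch handles top-level fields only; nested fields (for example, values inside personal) are outside its scope.

```python
from typing import Set


def protected_fields(cli_value: str, existing_user: dict) -> Set[str]:
    """Union of the CLI list and the per-user customFields.protectedFields."""
    custom_fields = existing_user.get("customFields") or {}
    per_user = custom_fields.get("protectedFields") or ""
    names = cli_value.split(",") + per_user.split(",")
    return {name.strip() for name in names if name.strip()}


def apply_protection(incoming: dict, existing: dict, protected: Set[str]) -> dict:
    """Keep the existing value for every protected top-level field."""
    merged = dict(incoming)
    for name in protected:
        if name in existing:
            merged[name] = existing[name]
    return merged


existing = {"username": "jdoe", "barcode": "111",
            "customFields": {"protectedFields": "barcode"}}
incoming = {"username": "john.doe", "barcode": "222"}
fields = protected_fields("username", existing)
print(apply_protection(incoming, existing, fields))
```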

Workflow#

JSON Lines File (.jsonl)
    ↓
Parse user objects
    ↓
Resolve reference data (patron groups, address types)
    ↓
Query existing user (by externalSystemId/username/barcode/id)
    ↓
If exists: Update (preserving protected fields)
If new: Create user
    ↓
POST/PUT via /users API
    ↓
Handle servicePointsUser separately via /service-points-users API

Batch Poster#

Inventory Records Only#

Batch Poster works with FOLIO Inventory storage records:

Instances: Bibliographic records
Holdings: Holdings records attached to instances
Items: Item records attached to holdings
ShadowInstances: Consortium shadow copies (ECS environments)

Input Format#

Input files are JSON Lines format with one record per line:

{"id": "uuid-1", "title": "The Great Gatsby", "instanceTypeId": "uuid", "source": "FOLIO"}
{"id": "uuid-2", "title": "1984", "instanceTypeId": "uuid", "source": "FOLIO"}

Batch Storage APIs#

The tool uses FOLIO’s batch synchronous storage endpoints:

/item-storage/batch/synchronous
/holdings-storage/batch/synchronous
/instance-storage/batch/synchronous

These endpoints accept arrays of records and process them synchronously.

Upsert Mode#

When --upsert is enabled:

Records are matched by id field
Existing records are fetched to get their _version for optimistic locking
New records are created; existing records are updated
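
The role of _version can be shown with a short sketch. It assumes the existing records have already been fetched and indexed by id; `prepare_upsert_batch` is a hypothetical helper, not the tool's function.

```python
from typing import Dict, List


def prepare_upsert_batch(batch: List[dict], existing_by_id: Dict[str, dict]) -> List[dict]:
    """Copy the stored _version onto each record that already exists."""
    prepared = []
    for record in batch:
        outgoing = dict(record)
        existing = existing_by_id.get(record.get("id"))
        if existing is not None and "_version" in existing:
            # Optimistic locking: the update must carry the stored version.
            outgoing["_version"] = existing["_version"]
        prepared.append(outgoing)
    return prepared


existing = {"uuid-1": {"id": "uuid-1", "_version": 3}}
batch = [{"id": "uuid-1", "title": "The Great Gatsby"}, {"id": "uuid-2", "title": "1984"}]
print(prepare_upsert_batch(batch, existing))
```

Records without a match are left without _version and are therefore treated as new.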

Preservation Options (Upsert Only)#

When updating existing records, you can preserve specific data:

--preserve-statistical-codes: Keep existing statistical codes (merged with new)
--preserve-administrative-notes: Keep existing administrative notes (merged with new)
--preserve-temporary-locations: Keep existing temporary location (Items only)
--preserve-temporary-loan-types: Keep existing temporary loan type (Items only)
Item status is preserved by default; use --overwrite-item-status to overwrite it instead
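
The "merged with new" behavior of the first two options can be pictured as an order-preserving union of two lists. This is a sketch under assumptions: the helper names are hypothetical, and placing existing values before new ones is a guess at the ordering.

```python
from typing import List


def merge_unique(existing: List[str], incoming: List[str]) -> List[str]:
    """Union of both lists, without duplicates, keeping first-seen order."""
    return list(dict.fromkeys([*existing, *incoming]))


def preserve_list_field(existing: dict, incoming: dict, field_name: str) -> dict:
    """Return the incoming record with one list field merged from both sides."""
    merged = dict(incoming)
    merged[field_name] = merge_unique(
        existing.get(field_name, []), incoming.get(field_name, [])
    )
    return merged


existing = {"id": "uuid-1", "statisticalCodeIds": ["code-a", "code-b"]}
incoming = {"id": "uuid-1", "statisticalCodeIds": ["code-b", "code-c"]}
print(preserve_list_field(existing, incoming, "statisticalCodeIds"))
```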

Always-Protected Fields#

Certain fields are always preserved from existing records, regardless of configuration:

hrid (human-readable ID): Changing it would break external references
lastCheckIn (Items only): Circulation data that should not be overwritten

MARC Source Protection#

For Instance records with a MARC source (e.g., source: "MARC" or source: "CONSORTIUM-MARC"), Batch Poster automatically restricts patching to only these fields:

discoverySuppress, staffSuppress, deleted (suppression flags)
statisticalCodeIds, administrativeNotes, instanceStatusId

This prevents overwriting MARC-managed fields like title or contributors, which would be reverted on the next SRS update anyway.
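
The restriction behaves like an allow-list applied to the incoming record. In the sketch below, `patchable_changes` is a hypothetical helper; checking the source of the existing record (rather than the incoming one) and limiting the MARC sources to the two named above are both assumptions.

```python
MARC_SOURCES = {"MARC", "CONSORTIUM-MARC"}
MARC_PATCHABLE_FIELDS = {
    "discoverySuppress", "staffSuppress", "deleted",
    "statisticalCodeIds", "administrativeNotes", "instanceStatusId",
}


def patchable_changes(existing: dict, incoming: dict) -> dict:
    """Return the incoming fields that may be applied to the existing instance."""
    if existing.get("source") not in MARC_SOURCES:
        return dict(incoming)
    return {key: value for key, value in incoming.items()
            if key in MARC_PATCHABLE_FIELDS}


existing = {"id": "uuid-1", "source": "MARC", "title": "Original title"}
incoming = {"id": "uuid-1", "title": "Edited title", "staffSuppress": True}
print(patchable_changes(existing, incoming))
```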

Selective Patching#

For fine-grained updates, use --patch-existing-records with --patch-paths:

--upsert --patch-existing-records --patch-paths "barcode,materialTypeId"

This updates only the specified fields while preserving all others from the existing record.
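
As an illustration of the effect, the sketch below applies a comma-separated path list to an existing record. `patch_record` is a hypothetical name, and only top-level field names are handled; whether --patch-paths accepts nested paths is not covered here.

```python
def patch_record(existing: dict, incoming: dict, patch_paths: str) -> dict:
    """Copy only the listed top-level fields from incoming onto existing."""
    paths = [path.strip() for path in patch_paths.split(",") if path.strip()]
    patched = dict(existing)
    for path in paths:
        if path in incoming:
            patched[path] = incoming[path]
    return patched


existing = {"id": "uuid-1", "barcode": "111", "materialTypeId": "book", "copyNumber": "1"}
incoming = {"id": "uuid-1", "barcode": "222", "materialTypeId": "dvd", "copyNumber": "9"}
print(patch_record(existing, incoming, "barcode,materialTypeId"))
```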

Consortium Shadow Instances#

For FOLIO ECS (consortium) environments, use --object-type ShadowInstances to post shadow copies to member tenants. This automatically converts the source field to consortium format:

MARC → CONSORTIUM-MARC
FOLIO → CONSORTIUM-FOLIO

Use --member-tenant-id to specify the target member tenant.
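
The source conversion is a simple mapping. In this sketch `to_shadow_instance` is a hypothetical helper, and leaving any other source value unchanged is an assumption.

```python
SHADOW_SOURCE_MAP = {"MARC": "CONSORTIUM-MARC", "FOLIO": "CONSORTIUM-FOLIO"}


def to_shadow_instance(instance: dict) -> dict:
    """Return a copy of the instance with its source in consortium format."""
    shadow = dict(instance)
    source = instance.get("source")
    # Assumption: sources outside the two documented values pass through as-is.
    shadow["source"] = SHADOW_SOURCE_MAP.get(source, source)
    return shadow


print(to_shadow_instance({"id": "uuid-1", "title": "1984", "source": "FOLIO"}))
```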

Rerunning Failed Records#

When --rerun-failed-records is enabled (along with --failed-records-file), the tool automatically reprocesses any failed records one at a time after the main batch run completes:

folio-data-import batch-poster --object-type Items \
  --file-path items.jsonl --upsert \
  --failed-records-file failed.jsonl --rerun-failed-records

This streams through the failed records file, giving each record an individual retry. Records that still fail are written to a new file with a _rerun suffix (e.g., failed_rerun.jsonl).
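
The naming rule for the second file can be reproduced with pathlib. `rerun_path` is a hypothetical helper written to match the failed_rerun.jsonl example, not a function exposed by the tool.

```python
from pathlib import Path


def rerun_path(failed_records_file: str) -> Path:
    """Insert a _rerun suffix before the file extension."""
    path = Path(failed_records_file)
    return path.with_name(f"{path.stem}_rerun{path.suffix}")


print(rerun_path("failed.jsonl"))
```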

Workflow#

JSON Lines File (.jsonl)
    ↓
Parse records
    ↓
Batch records (default: 100 per batch)
    ↓
If upsert: Fetch existing records for _version
    ↓
POST to /{type}-storage/batch/synchronous?upsert=true
    ↓
Track success/failure per batch
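
The batching and per-batch tracking steps can be sketched without any network code by injecting the posting function. `post_in_batches` and `post_batch` are hypothetical names; the callable stands in for the request to the batch endpoint and is assumed to raise on failure.

```python
from typing import Callable, List, Tuple


def post_in_batches(
    records: List[dict],
    post_batch: Callable[[List[dict]], None],
    batch_size: int = 100,
) -> Tuple[int, List[dict]]:
    """Post records in batches; return (success_count, failed_records)."""
    succeeded = 0
    failed: List[dict] = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        try:
            post_batch(batch)
            succeeded += len(batch)
        except Exception:
            # A failed batch is recorded as a whole; processing continues.
            failed.extend(batch)
    return succeeded, failed


def fake_post(batch: List[dict]) -> None:
    """Stand-in for the HTTP call: rejects any batch containing a bad record."""
    if any(record.get("bad") for record in batch):
        raise RuntimeError("batch rejected")


records = [{"id": str(i), "bad": i == 150} for i in range(250)]
print(post_in_batches(records, fake_post)[0])
```

Because a single bad record fails its whole batch, retrying the failed records one at a time afterwards narrows the failure down to the records that are actually invalid.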

Common Patterns#

Batch Processing#

All tools process records in batches:

Configurable batch size
Efficient API usage
Progress tracking per batch
Failed record tracking

Progress Tracking#

Real-time progress bars show:

Total records to process
Records completed
Success/failure counts

Disable progress bars in CI/CD environments with --no-progress

Error Handling#

MARC Import:

Errors are tracked in FOLIO’s Data Import system
Job IDs are saved locally to marc_import_job_ids.txt
Check job status and errors in FOLIO UI (Data Import logs)

User Import:

Failed user records are written to failed_user_import_TIMESTAMP.txt
The file contains only the raw user JSON (errors are logged to the console or log file)
Processing continues after failures

Batch Poster:

Failed batches are written to the file specified by --failed-records-file
Detailed error messages are logged
Processing continues after batch failures

Authentication#

All tools authenticate via FOLIO’s login API:

--gateway-url https://folio-snapshot-okapi.dev.folio.org \
--tenant-id diku \
--username diku_admin \
--password admin

Or via environment variables:

export FOLIO_GATEWAY_URL="https://folio-snapshot-okapi.dev.folio.org"
export FOLIO_TENANT_ID="diku"
export FOLIO_USERNAME="diku_admin"
export FOLIO_PASSWORD="admin"

Configuration Files#

All tools accept JSON configuration files:

folio-data-import marc config.json
folio-data-import users config.json
folio-data-import batch-poster config.json

Configuration files use snake_case keys matching the CLI parameter names.
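
The relationship between a CLI flag and its configuration key can be expressed as a one-line conversion. `flag_to_config_key` is a hypothetical helper that illustrates the naming rule only; it makes no claim about how the configuration file is structured beyond that.

```python
def flag_to_config_key(flag: str) -> str:
    """Convert a CLI flag such as --failed-records-file to snake_case."""
    return flag.lstrip("-").replace("-", "_")


for flag in ("--failed-records-file", "--object-type", "--upsert"):
    print(flag, "->", flag_to_config_key(flag))
```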
