API Reference

Auto-generated reference documentation for the FOLIO Data Import Python API.

Main Modules

BatchPoster

The core module for batch posting inventory records to FOLIO.

BatchPoster module for FOLIO inventory batch operations.

This module provides functionality for batch posting of Instances, Holdings, and Items to FOLIO’s inventory storage endpoints with support for upsert operations.

class folio_data_import.BatchPoster.BatchPosterStats(**data)

Bases: BaseModel

Statistics for batch posting operations.

records_processed: int
records_posted: int
records_created: int
records_updated: int
records_failed: int
batches_posted: int
batches_failed: int
rerun_succeeded: int
rerun_still_failed: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

folio_data_import.BatchPoster.get_api_info(object_type)

Get API endpoint information for a given object type.

Parameters:

object_type (str) – The type of object (Instances, Holdings, Items)

Return type:

Dict[str, Any]

Returns:

Dictionary containing API endpoint information

Raises:

ValueError – If object_type is not supported

folio_data_import.BatchPoster.deep_update(target, source)

Recursively update target dictionary with values from source dictionary.

Parameters:
  • target (dict) – The dictionary to update

  • source (dict) – The dictionary to merge into target

Return type:

None
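A minimal sketch of the recursive merge described above, assuming nested dictionaries are merged key by key and any other value in source (scalars, lists, new keys) replaces the one in target. This is an illustration of the behavior, not the library's implementation; its handling of lists may differ.

```python
from typing import Any, Dict


def deep_update(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    """Recursively merge source into target, modifying target in place."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides hold a dict: merge recursively instead of replacing.
            deep_update(target[key], value)
        else:
            # Scalars, lists, and new keys overwrite whatever target had.
            target[key] = value


record = {"status": {"name": "Available"}, "hrid": "it001"}
deep_update(record, {"status": {"date": "2024-01-01"}, "barcode": "123"})
print(record)
```

Because the merge happens in place, the function returns None, matching the documented return type.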

folio_data_import.BatchPoster.extract_paths(record, paths)

Extract specified paths from a record.

Parameters:
  • record (dict) – The record to extract from

  • paths (List[str]) – List of JSON paths to extract (e.g., ['statisticalCodeIds', 'status'])

Return type:

dict

Returns:

Dictionary containing only the specified paths
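One way to picture extract_paths, as a sketch: this version assumes each path is either a top-level key or a dot-separated path into nested dictionaries (e.g. "status.name"), and it silently skips paths missing from the record. The library may support a richer JSON Path syntax.

```python
from typing import Any, Dict, List


def extract_paths(record: Dict[str, Any], paths: List[str]) -> Dict[str, Any]:
    """Return a new dict containing only the requested (possibly nested) paths."""
    result: Dict[str, Any] = {}
    for path in paths:
        keys = path.split(".")
        value: Any = record
        found = True
        for key in keys:
            if isinstance(value, dict) and key in value:
                value = value[key]
            else:
                found = False
                break
        if not found:
            continue  # Missing paths are skipped rather than raising.
        # Rebuild the nested structure in the result.
        target = result
        for key in keys[:-1]:
            target = target.setdefault(key, {})
        target[keys[-1]] = value
    return result


item = {"id": "1", "statisticalCodeIds": ["sc1"], "status": {"name": "Available"}}
print(extract_paths(item, ["statisticalCodeIds", "status.name", "notes"]))
```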

class folio_data_import.BatchPoster.BatchPoster(folio_client, config, failed_records_file=None, reporter=None)

Bases: object

Handles batch posting of inventory records to FOLIO.

This class provides functionality for posting Instances, Holdings, and Items to FOLIO’s batch inventory endpoints with support for upsert operations.

class Config(**data)

Bases: BaseModel

Configuration for BatchPoster operations.

object_type: Annotated[Literal['Instances', 'Holdings', 'Items', 'ShadowInstances']]
batch_size: Annotated[int]
upsert: Annotated[bool]
preserve_statistical_codes: Annotated[bool]
preserve_administrative_notes: Annotated[bool]
preserve_temporary_locations: Annotated[bool]
preserve_temporary_loan_types: Annotated[bool]
preserve_item_status: Annotated[bool]
patch_existing_records: Annotated[bool]
patch_paths: Annotated[Optional[List[str]]]
rerun_failed_records: Annotated[bool]
no_progress: Annotated[bool]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

__init__(folio_client, config, failed_records_file=None, reporter=None)

Initialize BatchPoster.

Parameters:
  • folio_client (FolioClient) – Authenticated FOLIO client

  • config (Config) – Configuration for batch posting

  • failed_records_file – Optional file handle or path for writing failed records. Can be an open file handle (managed by caller) or a string/Path (will be opened/closed by BatchPoster).

  • reporter (ProgressReporter | None) – Optional progress reporter. If None, uses NoOpProgressReporter.

async __aenter__()

Async context manager entry.

async __aexit__(exc_type, exc_val, exc_tb)

Async context manager exit.

handle_upsert_for_statistical_codes(updates, keep_existing)

Handle statistical codes during upsert based on configuration.

Parameters:
  • updates (dict) – Dictionary being prepared for update

  • keep_existing (dict) – Dictionary of fields to preserve from existing record

Return type:

None

handle_upsert_for_administrative_notes(updates, keep_existing)

Handle administrative notes during upsert based on configuration.

Parameters:
  • updates (dict) – Dictionary being prepared for update

  • keep_existing (dict) – Dictionary of fields to preserve from existing record

Return type:

None

handle_upsert_for_temporary_locations(updates, keep_existing)

Handle temporary locations during upsert based on configuration.

Parameters:
  • updates (dict) – Dictionary being prepared for update

  • keep_existing (dict) – Dictionary of fields to preserve from existing record

Return type:

None

handle_upsert_for_temporary_loan_types(updates, keep_existing)

Handle temporary loan types during upsert based on configuration.

Parameters:
  • updates (dict) – Dictionary being prepared for update

  • keep_existing (dict) – Dictionary of fields to preserve from existing record

Return type:

None

keep_existing_fields(updates, existing_record)

Preserve specific fields from existing record during upsert.

Always preserves hrid (human-readable ID) and lastCheckIn (circulation data) from existing records to prevent data loss. Optionally preserves status based on configuration.

Parameters:
  • updates (dict) – Dictionary being prepared for update

  • existing_record (dict) – The existing record in FOLIO

Return type:

None
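A minimal sketch of the preservation rule described above. The field names hrid, lastCheckIn, and status come from the description; the preserve_status argument is a hypothetical stand-in for the configuration option (Config exposes preserve_item_status).

```python
from typing import Any, Dict


def keep_existing_fields(
    updates: Dict[str, Any],
    existing_record: Dict[str, Any],
    preserve_status: bool = False,  # hypothetical stand-in for the config flag
) -> None:
    """Copy fields that must survive an upsert from the existing record into updates."""
    # hrid and lastCheckIn are always carried over when the existing record has them.
    for field in ("hrid", "lastCheckIn"):
        if field in existing_record:
            updates[field] = existing_record[field]
    # status is carried over only when configured to do so.
    if preserve_status and "status" in existing_record:
        updates["status"] = existing_record["status"]


existing = {
    "hrid": "it001",
    "lastCheckIn": {"dateTime": "2024-05-01T10:00:00Z"},
    "status": {"name": "Checked out"},
}
updates: Dict[str, Any] = {}
keep_existing_fields(updates, existing, preserve_status=True)
```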

patch_record(new_record, existing_record, patch_paths)

Update new_record with values from existing_record according to patch_paths.

Parameters:
  • new_record (dict) – The new record to be updated

  • existing_record (dict) – The existing record to patch from

  • patch_paths (List[str]) – List of fields in JSON Path notation to patch during upsert

Return type:

None

prepare_record_for_upsert(new_record, existing_record)

Prepare a record for upsert by adding version and patching fields.

For MARC-sourced Instance records, only suppression flags, deleted status, statistical codes, administrative notes, and instance status are allowed to be patched. This protects MARC-managed fields from being overwritten.

Parameters:
  • new_record (dict) – The new record to prepare

  • existing_record (dict) – The existing record in FOLIO

Return type:

None

async fetch_existing_records(record_ids)

Fetch existing records from FOLIO by their IDs.

Parameters:

record_ids (List[str]) – List of record IDs to fetch

Return type:

Dict[str, dict]

Returns:

Dictionary mapping record IDs to their full records

static set_consortium_source(record)

Convert source field for consortium shadow instances.

For shadow instances in ECS/consortium environments, the source field must be prefixed with “CONSORTIUM-” to distinguish them from local records.

Parameters:

record (dict) – The record to modify (modified in place)

Return type:

None
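A sketch of the conversion, assuming it simply prefixes the existing source value (so "MARC" becomes "CONSORTIUM-MARC"). Leaving values that already carry the prefix untouched is our assumption, added so the function is safe to call twice.

```python
from typing import Any, Dict

CONSORTIUM_PREFIX = "CONSORTIUM-"


def set_consortium_source(record: Dict[str, Any]) -> None:
    """Prefix the record's source with CONSORTIUM- in place."""
    source = record.get("source")
    # Skip records with no source, and (by assumption) already-prefixed ones.
    if source and not source.startswith(CONSORTIUM_PREFIX):
        record["source"] = CONSORTIUM_PREFIX + source


shadow = {"id": "abc", "source": "MARC"}
set_consortium_source(shadow)
```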

async set_versions_for_upsert(batch)

Fetch existing record versions and prepare batch for upsert.

Only records that already exist in FOLIO will have their _version set and be prepared for update. New records will not have _version set.

Parameters:

batch (List[dict]) – List of records to prepare for upsert

Return type:

None
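The version-stamping step can be sketched as follows. Here existing is assumed to be the mapping returned by fetch_existing_records (record ID to full record), and set_versions is a hypothetical name. The real method also prepares each matched record via prepare_record_for_upsert, which this sketch omits.

```python
from typing import Any, Dict, List


def set_versions(batch: List[Dict[str, Any]], existing: Dict[str, Dict[str, Any]]) -> int:
    """Copy _version onto records that already exist; return how many were stamped."""
    stamped = 0
    for record in batch:
        current = existing.get(record.get("id"))
        if current is None or "_version" not in current:
            continue  # New record: leave _version unset so FOLIO creates it.
        record["_version"] = current["_version"]
        stamped += 1
    return stamped


batch = [{"id": "a", "title": "Updated"}, {"id": "b", "title": "New"}]
existing = {"a": {"id": "a", "title": "Old", "_version": 3}}
count = set_versions(batch, existing)
```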

async post_batch(batch)

Post a batch of records to FOLIO.

Parameters:

batch (List[dict]) – List of records to post

Return type:

tuple[Response, int, int]

Returns:

Tuple of (response data dict, number of creates, number of updates)

Raises:
  • folioclient.FolioClientError – If FOLIO API returns an error

  • folioclient.FolioConnectionError – If connection to FOLIO fails

async post_records(records)

Post records in batches.

Failed records will be written to the file handle provided during initialization.

Parameters:

records – Records to post. Can be:

  • A list of dict records

  • A file-like object containing JSON lines (one record per line)

  • A string or Path pointing to a file containing JSON lines

Return type:

None
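post_records accepts three input shapes. A sketch of how they might be normalized into a single stream of batches, assuming JSON-lines input and skipping blank lines; iter_records and iter_batches are hypothetical helper names, not part of the documented API.

```python
import io
import json
from itertools import islice
from pathlib import Path
from typing import Any, Dict, Iterator, List, TextIO, Union

RecordSource = Union[List[Dict[str, Any]], TextIO, str, Path]


def iter_records(records: RecordSource) -> Iterator[Dict[str, Any]]:
    """Yield dict records from a list, an open JSONL file, or a JSONL path."""
    if isinstance(records, list):
        yield from records
    elif isinstance(records, (str, Path)):
        # A path: open it here and close it when iteration finishes.
        with open(records, encoding="utf-8") as handle:
            yield from iter_records(handle)
    else:
        # A file-like object managed by the caller.
        for line in records:
            if line.strip():  # Skip blank lines.
                yield json.loads(line)


def iter_batches(records: RecordSource, batch_size: int) -> Iterator[List[Dict[str, Any]]]:
    """Group records into lists of at most batch_size."""
    it = iter_records(records)
    while batch := list(islice(it, batch_size)):
        yield batch


jsonl = io.StringIO('{"id": "1"}\n{"id": "2"}\n\n{"id": "3"}\n')
print([len(b) for b in iter_batches(jsonl, 2)])
```

Streaming with islice keeps memory bounded by batch_size, which matters for large JSONL files.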

async do_work(file_paths)

Main orchestration method for processing files.

This is the primary entry point for batch posting from files. It handles:

  • Single or multiple file processing

  • Progress tracking and logging

  • Failed record collection

  • Statistics reporting

Mimics the folio_migration_tools BatchPoster.do_work() workflow.

Note

To write failed records, pass a file handle or path to the BatchPoster constructor’s failed_records_file parameter.

Parameters:

file_paths (Union[str, Path, List[Union[str, Path]]]) – Path(s) to JSONL file(s) to process

Return type:

BatchPosterStats

Returns:

Final statistics from the posting operation

Example:

import asyncio

from folio_data_import.BatchPoster import BatchPoster


async def run(folio_client):
    # folio_client: an authenticated folioclient.FolioClient
    config = BatchPoster.Config(
        object_type="Items",
        batch_size=100,
        upsert=True
    )

    # RichProgressReporter must also be imported (its module is not shown here)
    reporter = RichProgressReporter(enabled=True)

    # With failed records file (handle managed by the caller)
    with open("failed_items.jsonl", "w") as failed_file:
        poster = BatchPoster(
            folio_client, config,
            failed_records_file=failed_file,
            reporter=reporter
        )
        async with poster:
            stats = await poster.do_work(["items1.jsonl", "items2.jsonl"])

    # Or let BatchPoster open and close the file
    poster = BatchPoster(
        folio_client, config,
        failed_records_file="failed_items.jsonl",
        reporter=reporter
    )
    async with poster:
        stats = await poster.do_work("items.jsonl")

    print(f"Posted: {stats.records_posted}, Failed: {stats.records_failed}")


asyncio.run(run(folio_client))

async rerun_failed_records_one_by_one()

Reprocess failed records one at a time.

Streams through the failed records file, processing each record individually. Records that still fail are written to a new file with ‘_rerun’ suffix. This gives each record a second chance with individual error handling.

Return type:

None

get_stats()

Get current posting statistics.

Return type:

BatchPosterStats

Returns:

Current statistics

folio_data_import.BatchPoster.get_human_readable_size(size, precision=2)

Convert bytes to human-readable format.

Parameters:
  • size (int) – Size in bytes

  • precision (int) – Number of decimal places

Return type:

str

Returns:

Human-readable size string
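A sketch of such a conversion, assuming binary (1024-based) units and the unit labels shown; the library's exact labels and rounding may differ.

```python
def get_human_readable_size(size: int, precision: int = 2) -> str:
    """Format a byte count using 1024-based units."""
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(size)
    index = 0
    # Divide down until the value fits the current unit or we run out of units.
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.{precision}f} {units[index]}"


print(get_human_readable_size(1536))  # 1.50 KB
```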

folio_data_import.BatchPoster.get_req_size(response)
folio_data_import.BatchPoster.main(config_file=None, *, gateway_url=None, tenant_id=None, username=None, password=None, member_tenant_id=None, object_type=None, file_paths=None, batch_size=100, upsert=False, preserve_statistical_codes=False, preserve_administrative_notes=False, preserve_temporary_locations=False, preserve_temporary_loan_types=False, overwrite_item_status=False, patch_existing_records=False, patch_paths=None, failed_records_file=None, rerun_failed_records=False, no_progress=False, debug=False)

Command-line interface to batch post inventory records to FOLIO

Parameters:
  • config_file (Path | None) – Path to JSON config file (overrides CLI parameters).

  • gateway_url (str | None) – The FOLIO API Gateway URL.

  • tenant_id (str | None) – The tenant id.

  • username (str | None) – The FOLIO username.

  • password (str | None) – The FOLIO password.

  • member_tenant_id (str | None) – The FOLIO ECS member tenant id (if applicable).

  • object_type (Optional[Literal['Instances', 'Holdings', 'Items', 'ShadowInstances']]) – Type of inventory object (Instances, Holdings, Items, or ShadowInstances).

  • file_paths (tuple[Path, ...] | None) – Path(s) to JSONL file(s) to post.

  • batch_size (int) – Number of records to include in each batch (1-1000).

  • upsert (bool) – Enable upsert mode to update existing records.

  • preserve_statistical_codes (bool) – Preserve existing statistical codes during upsert.

  • preserve_administrative_notes (bool) – Preserve existing administrative notes during upsert.

  • preserve_temporary_locations (bool) – Preserve temporary location assignments during upsert.

  • preserve_temporary_loan_types (bool) – Preserve temporary loan type assignments during upsert.

  • overwrite_item_status (bool) – Overwrite item status during upsert.

  • patch_existing_records (bool) – Enable selective field patching during upsert.

  • patch_paths (str | None) – Comma-separated list of field paths to patch.

  • failed_records_file (Path | None) – Path to file for writing failed records.

  • rerun_failed_records (bool) – After the main run, reprocess failed records one at a time.

  • no_progress (bool) – Disable progress bar display.

  • debug (bool) – Enable debug logging.

Return type:

None

folio_data_import.BatchPoster.parse_config_file(config_file)
folio_data_import.BatchPoster.parse_patch_paths(patch_paths)
folio_data_import.BatchPoster.expand_file_paths(file_paths)
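On the command line, patch_paths is a comma-separated string, while Config.patch_paths is a list. A sketch of that conversion, assuming surrounding whitespace and empty entries should be dropped; it is named after parse_patch_paths for readability, but the library's version may behave differently.

```python
from typing import List, Optional


def parse_patch_paths(patch_paths: Optional[str]) -> Optional[List[str]]:
    """Split a comma-separated path string into a clean list, or None if empty."""
    if not patch_paths:
        return None
    paths = [path.strip() for path in patch_paths.split(",")]
    cleaned = [path for path in paths if path]  # Drop empty entries such as ",,".
    return cleaned or None


print(parse_patch_paths("statisticalCodeIds, administrativeNotes,,status"))
```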
async folio_data_import.BatchPoster.run_batch_poster(folio_client, config, files_to_process, failed_records_file)

Run the batch poster operation.

Parameters:
  • folio_client (FolioClient) – Authenticated FOLIO client

  • config (Config) – BatchPoster configuration

  • files_to_process (List[Path]) – List of file paths to process

  • failed_records_file (Path | None) – Optional path for failed records

folio_data_import.BatchPoster.log_final_stats(poster)

Log the final statistics after batch posting.

Parameters:

poster (BatchPoster) – The BatchPoster instance containing the stats

Return type:

None

MARCDataImport

Module for importing MARC records using FOLIO’s Data Import APIs.

class folio_data_import.MARCDataImport.MARCImportStats(**data)

Bases: BaseModel

Statistics for MARC import operations.

records_sent: int
records_processed: int
created: int
updated: int
discarded: int
error: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class folio_data_import.MARCDataImport.MARCImportJob(folio_client, config, reporter=None)

Bases: object

Class to manage importing MARC data (Bib, Authority) into FOLIO using the Change Manager APIs (folio-org/mod-source-record-manager), rather than file-based Data Import. When executed in an interactive environment, it can provide progress bars for tracking the number of records both uploaded and processed.

Parameters:
  • folio_client (FolioClient) – An instance of the FolioClient class.

  • marc_files (list) – A list of Path objects representing the MARC files to import.

  • import_profile_name (str) – The name of the data import job profile to use.

  • batch_size (int) – The number of source records to include in a record batch (default=10).

  • batch_delay (float) – The number of seconds to wait between record batches (default=0).

  • no_progress (bool) – Disable progress bars (e.g., for running in a CI environment).

  • marc_record_preprocessors (list or str) – A list of callables, or a string representing a comma-separated list of MARC record preprocessor names to apply to each record before import.

  • preprocessor_args (dict) – A dictionary of arguments to pass to the MARC record preprocessor(s).

  • let_summary_fail (bool) – If True, will not retry or fail the import if the final job summary cannot be retrieved (default=False).

  • split_files (bool) – If True, splits each file into smaller jobs of split_size records each.

  • split_size (int) – The number of records to include in each split file (default=1000).

  • split_offset (int) – The number of split files to skip before starting processing (default=0).

  • job_ids_file_path (str) – The path to the file where job IDs will be saved (default="marc_import_job_ids.txt").

  • show_file_names_in_data_import_logs (bool) – If True, will set the file name for each job in the data import logs.

class Config(**data)

Bases: BaseModel

Configuration for MARC import operations.

marc_files: Annotated[List[Path]]
import_profile_name: Annotated[str]
batch_size: Annotated[int]
batch_delay: Annotated[float]
marc_record_preprocessors: Annotated[Union[List[Callable], str, None]]
preprocessors_args: Annotated[Optional[Dict[str, Dict]]]
no_progress: Annotated[bool]
no_summary: Annotated[bool]
let_summary_fail: Annotated[bool]
split_files: Annotated[bool]
split_size: Annotated[int]
split_offset: Annotated[int]
job_ids_file_path: Annotated[Path | None]
show_file_names_in_data_import_logs: Annotated[bool]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

bad_records_file: BinaryIO
failed_batches_file: BinaryIO
task_sent: str
task_imported: str
http_client: Client
current_file: Union[List[Path], List[BinaryIO]]
record_batch: List[bytes]
last_current: int = 0
total_records_sent: int = 0
finished: bool = False
job_id: str = ''
job_ids: List[str]
job_hrid: int = 0
reporter: ProgressReporter
async do_work()

Performs the necessary work for data import.

This method initializes an HTTP client and the files used to store records that fail to send, then calls the appropriate method to import the MARC files based on the configuration.

Return type:

None

Returns:

None

async process_split_files()

Process the import of files in smaller batches. This method is called when split_files is set to True. It splits each file into smaller chunks and processes them one by one.

async wrap_up()

Wraps up the data import process.

This method is called after the import process is complete. It removes the bad records and error files if they are empty.

Return type:

None

Returns:

None

async get_job_status()

Retrieves the status of a job execution.

Return type:

None

Returns:

None

Raises:

IndexError – If the job execution with the specified ID is not found.

async set_job_file_name()

Sets the file name for the current job execution.

Return type:

None

Returns:

None

async create_folio_import_job()

Creates a job execution for importing data into FOLIO.

Return type:

None

Returns:

None

Raises:

FolioHTTPError – If there is an error creating the job.

property import_profile: dict

Returns the import profile for the current job execution.

Returns:

The import profile for the current job execution.

Return type:

dict

async set_job_profile()

Sets the job profile for the current job execution.

Return type:

None

Returns:

The response from the HTTP request to set the job profile.

async static read_total_records(files)

Count records from files with per-file logging.

Parameters:

files (list) – List of files to read.

Returns:

The total number of records found in the files.

Return type:

int

async process_record_batch(batch_payload)

Processes a record batch.

Parameters:

batch_payload (dict) – A records payload containing the current batch of MARC records.

Return type:

None

async process_records(files, total_records)

Process records from the given files.

Parameters:
  • files (list) – List of files to process.

  • total_records (int) – Total number of records to process.

Return type:

None

Returns:

None

move_file_to_complete(file_path)
Return type:

None

async create_batch_payload(counter, total_records, is_last)

Create a batch payload for data import.

Parameters:
  • counter (int) – The current counter value.

  • total_records (int) – The total number of records.

  • is_last (bool) – Indicates if this is the last batch.

Returns:

The batch payload containing the ID, records metadata, and initial records.

Return type:

dict
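A sketch of the payload shape, assuming it follows the RawRecordsDto schema of mod-source-record-manager as we understand it: an id, a recordsMetadata object with last, counter, total, and contentType, and a list of initialRecords. The field names reflect our reading of that schema, the UTF-8 decoding is an assumption, and build_payload is a hypothetical helper.

```python
import uuid
from typing import Any, Dict, List


def build_payload(records: List[bytes], counter: int, total_records: int, is_last: bool) -> Dict[str, Any]:
    """Assemble a raw-records payload for one batch of binary MARC records."""
    return {
        "id": str(uuid.uuid4()),
        "recordsMetadata": {
            "last": is_last,          # True only for the final batch of the job
            "counter": counter,       # running count of records sent so far
            "total": total_records,
            "contentType": "MARC_RAW",
        },
        # Assumes the MARC bytes are UTF-8 encoded (MARC-8 data would need conversion first).
        "initialRecords": [{"record": record.decode("utf-8")} for record in records],
    }


payload = build_payload([b"rec1\x1d"], counter=1, total_records=1, is_last=True)
```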

static split_marc_file(file_path, batch_size)

Generator to iterate over MARC records in batches, yielding BytesIO objects.

Return type:

Generator[BytesIO, None, None]
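A sketch of the generator, assuming binary MARC (ISO 2709) input in which every record ends with the 0x1D record terminator. It reads the whole file into memory for brevity, and the library may instead use pymarc. The usage lines also show how a split_offset could be applied with itertools.islice; that pairing is our illustration.

```python
import io
from itertools import islice
from typing import Generator

RECORD_TERMINATOR = b"\x1d"


def split_marc_bytes(data: bytes, batch_size: int) -> Generator[io.BytesIO, None, None]:
    """Yield BytesIO chunks holding at most batch_size MARC records each."""
    # Split on the terminator and re-attach it; bytes after the last terminator are dropped.
    records = [chunk + RECORD_TERMINATOR for chunk in data.split(RECORD_TERMINATOR)[:-1]]
    for start in range(0, len(records), batch_size):
        yield io.BytesIO(b"".join(records[start:start + batch_size]))


def split_marc_file(file_path: str, batch_size: int) -> Generator[io.BytesIO, None, None]:
    """File-based wrapper around split_marc_bytes."""
    with open(file_path, "rb") as handle:
        yield from split_marc_bytes(handle.read(), batch_size)


data = b"rec1\x1drec2\x1drec3\x1d"
chunks = list(split_marc_bytes(data, 2))
# Skip the first chunk, as split_offset=1 would.
remaining = list(islice(split_marc_bytes(data, 2), 1, None))
```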

async import_marc_file()

Imports MARC file into the system.

This method performs the following steps: 1. Creates a FOLIO import job. 2. Retrieves the import profile. 3. Sets the job profile. 4. Opens the MARC file(s) and reads the total number of records. 5. Displays progress bars for imported and sent records. 6. Processes the records and updates the progress bars. 7. Checks the job status periodically until the import is finished.

Note: This method assumes that the necessary instance attributes are already set.

Return type:

None

Returns:

None

async cancel_job()

Cancels the current job execution.

This method sends a request to cancel the job execution and logs the result.

Return type:

None

Returns:

None

async log_job_summary()
async get_job_summary()

Retrieves the job summary for the current job execution.

Returns:

The job summary for the current job execution.

Return type:

dict

folio_data_import.MARCDataImport.main(config_file=None, *, gateway_url=None, tenant_id=None, username=None, password=None, marc_file_paths=None, member_tenant_id=None, import_profile_name=None, batch_size=10, batch_delay=0.0, preprocessors=None, preprocessors_config=None, file_names_in_di_logs=False, split_files=False, split_size=1000, split_offset=0, no_progress=False, no_summary=False, let_summary_fail=False, job_ids_file_path=None, debug=False)

Command-line interface to batch import MARC records into FOLIO using FOLIO Data Import

Parameters:
  • config_file (Path | None) – Path to JSON config file for the import job, overrides other parameters if provided.

  • gateway_url (str) – The FOLIO API Gateway URL.

  • tenant_id (str) – The tenant id.

  • username (str) – The FOLIO username.

  • password (str) – The FOLIO password.

  • marc_file_paths (List[Path]) – The MARC file(s) or glob pattern(s) to import.

  • member_tenant_id (str) – The FOLIO ECS member tenant id (if applicable).

  • import_profile_name (str) – The name of the import profile to use.

  • batch_size (int) – The number of records to send in each batch.

  • batch_delay (float) – The delay (in seconds) between sending each batch.

  • preprocessors (str) – Comma-separated list of MARC record preprocessors to use.

  • preprocessors_config (str) – Path to JSON config file for the preprocessors.

  • file_names_in_di_logs (bool) – Show file names in data import logs.

  • split_files (bool) – Split files into smaller batches.

  • split_size (int) – The number of records per split batch.

  • split_offset (int) – The number of split batches to skip before starting import.

  • no_progress (bool) – Disable progress bars.

  • no_summary (bool) – Skip the final job summary.

  • let_summary_fail (bool) – Let the final summary check fail without exiting.

  • job_ids_file_path (str) – Path to file to write job IDs to.

  • debug (bool) – Enable debug logging.

Return type:

None

folio_data_import.MARCDataImport.select_import_profile(folio_client)
folio_data_import.MARCDataImport.collect_marc_file_paths(marc_file_paths)
async folio_data_import.MARCDataImport.run_job(job)

UserImport

Module for importing user data into FOLIO.

class folio_data_import.UserImport.UserImporterStats(**data)

Bases: BaseModel

Statistics for user import operations.

created: int
updated: int
failed: int
deleted: int
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

class folio_data_import.UserImport.UserImporter(folio_client, config, reporter=None)

Bases: object

Class to import mod-user-import compatible user objects (e.g., from the folio_migration_tools UserTransformer task) from a JSON-lines file into FOLIO.

class Config(**data)

Bases: BaseModel

Configuration for UserImporter operations.

library_name: Annotated[str]
batch_size: Annotated[int]
user_match_key: Annotated[Literal['externalSystemId', 'username', 'barcode']]
only_update_present_fields: Annotated[bool]
default_preferred_contact_type: Annotated[Literal['001', '002', '003', '004', '005', 'mail', 'email', 'text', 'phone', 'mobile']]
fields_to_protect: Annotated[List[str]]
limit_simultaneous_requests: Annotated[int]
user_file_paths: Annotated[Union[Path, List[Path], None]]
no_progress: Annotated[bool]
delete_all: Annotated[bool]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

logfile: AsyncTextIOWrapper
errorfile: AsyncTextIOWrapper
http_client: AsyncClient
static build_ref_data_id_map(folio_client, endpoint, key, name)

Builds a map of reference data IDs.

Parameters:
  • folio_client (folioclient.FolioClient) – A FolioClient object.

  • endpoint (str) – The endpoint to retrieve the reference data from.

  • key (str) – The key to use as the map key.

Returns:

A dictionary mapping reference data keys to their corresponding IDs.

Return type:

dict
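The map-building step can be sketched on already-fetched reference data, assuming each object carries an id plus the field named by key (for patron groups, FOLIO uses the group field). Fetching from the endpoint through FolioClient is left out so the sketch stays self-contained; build_id_map is a hypothetical name.

```python
from typing import Any, Dict, List


def build_id_map(ref_data: List[Dict[str, Any]], key: str) -> Dict[str, str]:
    """Map each object's key value (e.g. a patron group name) to its id."""
    # Objects missing either the key or an id are skipped.
    return {obj[key]: obj["id"] for obj in ref_data if key in obj and "id" in obj}


groups = [{"id": "g1", "group": "staff"}, {"id": "g2", "group": "undergrad"}]
patron_group_map = build_id_map(groups, "group")
```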

static validate_uuid(uuid_string)

Validate a UUID string.

Parameters:

uuid_string (str) – The UUID string to validate.

Returns:

True if the UUID is valid, otherwise False.

Return type:

bool
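A minimal sketch using the standard library's uuid module. Whether the library also enforces a particular UUID version or the hyphenated canonical form is not stated, so this version accepts any string that uuid.UUID can parse (including unhyphenated hex).

```python
import uuid


def validate_uuid(uuid_string: str) -> bool:
    """Return True if uuid_string parses as a UUID, otherwise False."""
    try:
        uuid.UUID(uuid_string)
    except (ValueError, TypeError, AttributeError):
        # ValueError: malformed string; TypeError/AttributeError: non-string input.
        return False
    return True
```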

async setup(error_file_path)

Sets up the importer by initializing necessary resources.

Parameters:

error_file_path (Path) – The path to the error file.

Return type:

None

async close()

Closes the importer by releasing any resources.

Return type:

None

async do_import()

Main method to import users.

This method triggers the process of importing users by calling the process_file method. Supports both single file path and list of file paths.

Return type:

None

async get_existing_user(user_obj)

Retrieves an existing user from FOLIO based on the provided user object.

Parameters:

user_obj – The user object containing the information to match against existing users.

Return type:

dict

Returns:

The existing user object if found, otherwise an empty dictionary.

async get_existing_rp(user_obj, existing_user)

Retrieves the existing request preferences for a given user.

Parameters:
  • user_obj (dict) – The user object.

  • existing_user (dict) – The existing user object.

Returns:

The existing request preferences for the user.

Return type:

dict

async get_existing_pu(user_obj, existing_user)

Retrieves the existing permission user for a given user.

Parameters:
  • user_obj (dict) – The user object.

  • existing_user (dict) – The existing user object.

Returns:

The existing permission user object.

Return type:

dict

async map_address_types(user_obj, line_number)

Maps address type names in the user object to the corresponding ID in the address_type_map.

Parameters:
  • user_obj (dict) – The user object containing personal information.

  • line_number (int) – The line number in the input file, for logging purposes.

Return type:

None

Returns:

None

Raises:

KeyError – If an address type name in the user object is not found in address_type_map.

async map_patron_groups(user_obj, line_number)

Maps the patron group name in a user object to its ID using the patron group map.

Parameters:
  • user_obj (dict) – The user object to update.

  • line_number (int) – The line number in the input file, for logging purposes.

Return type:

None

Returns:

None

async map_departments(user_obj, line_number)

Maps the department names in a user object to their IDs using the department map.

Parameters:
  • user_obj (dict) – The user object to update.

  • line_number (int) – The line number in the input file, for logging purposes.

Return type:

None

Returns:

None

async update_existing_user(user_obj, existing_user, protected_fields)

Updates an existing user with the provided user object.

Parameters:
  • user_obj (dict) – The user object containing the updated user information.

  • existing_user (dict) – The existing user object to be updated.

  • protected_fields (dict) – A dictionary containing the protected fields and their values.

Returns:

A tuple containing the updated existing user object and the API response.

Return type:

tuple

Raises:

None

async create_new_user(user_obj)

Creates a new user in the system.

Parameters:

user_obj (dict) – A dictionary containing the user information.

Returns:

A dictionary representing the response from the server.

Return type:

dict

Raises:

HTTPError – If the HTTP request to create the user fails.

async set_preferred_contact_type(user_obj, existing_user)

Sets the preferred contact type for a user object. If the provided preferred contact type is valid, it is used. Otherwise, the existing user’s preferred contact type is kept if it is valid; if not, the default preferred contact type is used.

Return type:

None
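A sketch of the fallback order described above. The mapping of names to FOLIO contact-type codes (001 mail, 002 email, 003 text, 004 phone, 005 mobile) is our understanding of FOLIO's convention, reading the value from personal.preferredContactTypeId is an assumption, and resolve_contact_type is a hypothetical helper.

```python
from typing import Any, Dict, Optional

# Assumed mapping from contact-type names to FOLIO codes.
CONTACT_TYPES = {"mail": "001", "email": "002", "text": "003", "phone": "004", "mobile": "005"}
VALID_CODES = set(CONTACT_TYPES.values())


def normalize(value: Optional[str]) -> Optional[str]:
    """Return the code for a valid name or code, or None if it is not valid."""
    if value in VALID_CODES:
        return value
    return CONTACT_TYPES.get(value) if value else None


def resolve_contact_type(user_obj: Dict[str, Any], existing_user: Dict[str, Any], default: str) -> str:
    requested = normalize(user_obj.get("personal", {}).get("preferredContactTypeId"))
    if requested:
        return requested
    # Invalid or missing: prefer the existing user's valid value, then the default.
    existing = normalize(existing_user.get("personal", {}).get("preferredContactTypeId"))
    # Last resort mirrors the CLI default of "email".
    return existing or normalize(default) or CONTACT_TYPES["email"]
```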

async create_or_update_user(user_obj, existing_user, protected_fields, line_number)

Creates or updates a user based on the given user object and existing user.

Parameters:
  • user_obj (dict) – The user object containing the user details.

  • existing_user (dict) – The existing user object to be updated, if available.

  • protected_fields (dict) – A dictionary containing the protected fields and their values.

  • line_number (int) – The line number in the input file, for logging purposes.

Returns:

The updated or created user object, or an empty dictionary if an error occurs.

Return type:

dict

async process_user_obj(user)

Process a user object. If no type is found in the source object, type is set to “patron”.

Parameters:

user (str) – The user data to be processed, as a JSON string.

Returns:

The processed user object.

Return type:

dict
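The defaulting rule can be sketched in a few lines, assuming each input line is a JSON object; parse_user_line is a hypothetical name, and the real method may do further processing.

```python
import json
from typing import Any, Dict


def parse_user_line(user: str) -> Dict[str, Any]:
    """Parse one JSON line and default the user type to "patron"."""
    user_obj = json.loads(user)
    # Missing or empty type falls back to "patron" (treating empty as missing is an assumption).
    if not user_obj.get("type"):
        user_obj["type"] = "patron"
    return user_obj
```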

async get_protected_fields(existing_user)

Retrieves the protected fields from the existing user object, combining both the customFields.protectedFields list and any fields_to_protect passed on the CLI.

Parameters:

existing_user (dict) – The existing user object.

Returns:

A dictionary containing the protected fields and their values.

Return type:

dict
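A sketch of the merge, assuming protected field names are top-level user fields and that only fields present on the existing user are returned. The description calls customFields.protectedFields a list; also accepting a comma-separated string is our assumption.

```python
from typing import Any, Dict, List


def get_protected_fields(existing_user: Dict[str, Any], fields_to_protect: List[str]) -> Dict[str, Any]:
    """Return {field: value} for every protected field the existing user has."""
    raw = existing_user.get("customFields", {}).get("protectedFields", [])
    # Accept either a list or a comma-separated string (the string form is an assumption).
    if isinstance(raw, str):
        raw = [name.strip() for name in raw.split(",")]
    # Union of per-user protected fields and those passed on the CLI.
    names = set(raw) | set(fields_to_protect)
    return {name: existing_user[name] for name in names if name in existing_user}


user = {"username": "jdoe", "barcode": "123", "customFields": {"protectedFields": ["barcode"]}}
protected = get_protected_fields(user, ["username", "phone"])
```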

async process_existing_user(user_obj)

Process an existing user.

Parameters:

user_obj (dict) – The user object to process.

Returns:

A tuple containing the request preference object (rp_obj), the service points user object (spu_obj), the existing user object, the protected fields, the existing request preference object (existing_rp), the existing permission user object (existing_pu), and the existing service points user object (existing_spu).

Return type:

tuple

async create_or_update_rp(rp_obj, existing_rp, new_user_obj)

Creates or updates a request preference object based on the given parameters.

Parameters:
  • rp_obj (object) – A new request preference object.

  • existing_rp (object) – The existing request preference object, if it exists.

  • new_user_obj (object) – The new user object.

Returns:

None

async create_new_rp(new_user_obj)

Creates a new request preference for a user.

Parameters:

new_user_obj (dict) – The user object containing the user’s ID.

Raises:

HTTPError – If there is an error in the HTTP request.

Returns:

None

async update_existing_rp(rp_obj, existing_rp)

Updates an existing request preference with the provided request preference object.

Parameters:
  • rp_obj (dict) – The request preference object containing the updated values.

  • existing_rp (dict) – The existing request preference object to be updated.

Raises:

HTTPError – If the PUT request to update the request preference fails.

Return type:

None

Returns:

None

async create_perms_user(new_user_obj)

Creates a permissions user object for the given new user.

Parameters:

new_user_obj (dict) – A dictionary containing the details of the new user.

Raises:

HTTPError – If there is an error while making the HTTP request.

Return type:

None

Returns:

None

async delete_user(existing_user, existing_rp, existing_pu, existing_spu, line_number)

Deletes a user and associated objects.

Parameters:
  • existing_user (dict) – The existing user object to be deleted.

  • existing_rp (dict) – The existing request preference object associated with the user.

  • existing_pu (dict) – The existing permission user object associated with the user.

  • existing_spu (dict) – The existing service points user object associated with the user.

  • line_number (int) – The line number in the input file for logging purposes.

Return type:

None

Returns:

None

async process_line(user, line_number)

Process a single line of user data.

Parameters:
  • user (str) – The user data to be processed.

  • line_number (int) – The line number in the input file, for logging purposes.

Return type:

None

Returns:

None

Raises:

Any exceptions that occur during the processing.

async map_service_points(spu_obj, existing_user)

Maps the service points of a user object using the provided service point map.

Parameters:
  • spu_obj (dict) – The service-points-user object to update.

  • existing_user (dict) – The existing user object associated with the spu_obj.

Returns:

None

async handle_service_points_user(spu_obj, existing_spu, existing_user)

Handles processing a service-points-user object for a user.

Parameters:
  • spu_obj (dict) – The service-points-user object to process.

  • existing_spu (dict) – The existing service-points-user object, if it exists.

  • existing_user (dict) – The existing user object associated with the spu_obj.

async get_existing_spu(existing_user)

Retrieves the existing service-points-user object for a given user.

Parameters:

existing_user (dict) – The existing user object.

Returns:

The existing service-points-user object.

Return type:

dict

async create_new_spu(spu_obj, existing_user)

Creates a new service-points-user object for a given user.

Parameters:
  • spu_obj (dict) – The service-points-user object to create.

  • existing_user (dict) – The existing user object.

Returns:

None

async update_existing_spu(spu_obj, existing_spu)

Updates an existing service-points-user object with the provided service-points-user object.

Parameters:
  • spu_obj (dict) – The service-points-user object containing the updated values.

  • existing_spu (dict) – The existing service-points-user object to be updated.

Returns:

None

async process_file(openfile)

Process the user object file.

Parameters:

openfile (TextIOWrapper) – The file or file-like object to process.

Return type:

None

get_stats()

Get current import statistics.

Return type:

UserImporterStats

Returns:

Current statistics

folio_data_import.UserImport.main(config_file=None, *, gateway_url=None, tenant_id=None, username=None, password=None, library_name=None, user_file_paths=None, member_tenant_id=None, delete_all=False, fields_to_protect=None, update_only_present_fields=False, limit_async_requests=10, batch_size=250, report_file_base_path=None, user_match_key='externalSystemId', default_preferred_contact_type='email', no_progress=False, yes=False, debug=False)

Command-line interface to batch import users into FOLIO.

Parameters:
  • config_file (Path | None) – Path to a JSON configuration file. Overrides job configuration parameters if provided.

  • gateway_url (str) – The FOLIO gateway URL.

  • tenant_id (str) – The FOLIO tenant ID.

  • username (str) – The FOLIO username.

  • password (str) – The FOLIO password.

  • library_name (str) – The library name associated with the job.

  • user_file_paths (Tuple[Path, ...]) – Path(s) to the user data file(s). Use --user-file-paths or --user-file-path (deprecated; will be removed in a future version).

  • member_tenant_id (str) – The FOLIO ECS member tenant id (if applicable).

  • delete_all (bool) – Whether to delete existing users, rather than create/update.

  • fields_to_protect (str) – Comma-separated list of fields to protect during update.

  • update_only_present_fields (bool) – Whether to update only fields present in the input.

  • limit_async_requests (int) – The maximum number of concurrent async HTTP requests.

  • batch_size (int) – The number of users to process in each batch.

  • report_file_base_path (Path) – The base path for report files.

  • user_match_key (str) – The key to match users (externalSystemId, username, barcode).

  • default_preferred_contact_type (str) – The default preferred contact type for users.

  • no_progress (bool) – Whether to disable the progress bar.

  • yes (bool) – Skip the confirmation prompt for destructive operations (e.g. --delete-all).

  • debug (bool) – Enable debug logging.

Return type:

None
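The limit_async_requests and batch_size options suggest a common asyncio pattern: read the file in batches and cap the number of in-flight requests with a semaphore. The sketch below is a minimal illustration under that assumption; run_in_batches and handle_line are hypothetical names, handle_line stands in for per-line work such as process_line, and UserImporter's actual scheduling may differ.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Tuple


async def run_in_batches(
    lines: Iterable[str],
    handle_line: Callable[[str, int], Awaitable[None]],
    batch_size: int = 250,
    limit_async_requests: int = 10,
) -> None:
    """Process lines one batch at a time, with at most limit_async_requests calls in flight."""
    semaphore = asyncio.Semaphore(limit_async_requests)

    async def guarded(line: str, line_number: int) -> None:
        # Wait for a free slot before starting the (normally HTTP-bound) work.
        async with semaphore:
            await handle_line(line, line_number)

    batch: List[Tuple[str, int]] = []
    for line_number, line in enumerate(lines, start=1):
        batch.append((line, line_number))
        if len(batch) == batch_size:
            await asyncio.gather(*(guarded(text, num) for text, num in batch))
            batch = []
    if batch:
        # Flush the final, possibly partial, batch.
        await asyncio.gather(*(guarded(text, num) for text, num in batch))
```

Because each batch is fully gathered before the next one starts, the number of concurrent calls never exceeds the smaller of batch_size and limit_async_requests.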

folio_data_import.UserImport.pathify_user_file_paths(user_file_paths)

async folio_data_import.UserImport.run_user_importer(importer, error_file_path)

Custom Exceptions

Exception classes used throughout the toolkit.

Custom exceptions for the Folio Data Import module.

exception folio_data_import.custom_exceptions.FolioDataImportError

Bases: Exception

Base class for all exceptions in the Folio Data Import module.

exception folio_data_import.custom_exceptions.FolioDataImportBatchError(batch_id, message, exception=None)

Bases: FolioDataImportError

Exception raised for errors in the Folio Data Import batch process.

Parameters:
  • batch_id – ID of the batch that caused the error.

  • message – Explanation of the error.

exception folio_data_import.custom_exceptions.FolioDataImportJobError(job_id, message, exception=None)

Bases: FolioDataImportError

Exception raised for errors in the Folio Data Import job process.

Parameters:
  • job_id – ID of the job that caused the error.

  • message – Explanation of the error.
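Because both specific exceptions derive from FolioDataImportError, one except clause on the base class can handle either. The sketch below defines minimal stand-ins with the documented constructor signatures so that it runs on its own; in real code you would import the classes from folio_data_import.custom_exceptions. Storing batch_id, job_id, message and exception as attributes is an assumption of the stand-ins, and describe_failure is a hypothetical helper.

```python
from typing import Optional


class FolioDataImportError(Exception):
    """Stand-in for folio_data_import.custom_exceptions.FolioDataImportError."""


class FolioDataImportBatchError(FolioDataImportError):
    """Stand-in with the documented (batch_id, message, exception=None) signature."""

    def __init__(self, batch_id: str, message: str, exception: Optional[BaseException] = None) -> None:
        self.batch_id = batch_id
        self.message = message
        self.exception = exception  # underlying cause, if any (assumed attribute)
        super().__init__(message)


class FolioDataImportJobError(FolioDataImportError):
    """Stand-in with the documented (job_id, message, exception=None) signature."""

    def __init__(self, job_id: str, message: str, exception: Optional[BaseException] = None) -> None:
        self.job_id = job_id
        self.message = message
        self.exception = exception
        super().__init__(message)


def describe_failure(exc: FolioDataImportError) -> str:
    # A single handler for the whole family; isinstance picks out the subtype.
    if isinstance(exc, FolioDataImportBatchError):
        return f"batch {exc.batch_id} failed: {exc.message}"
    if isinstance(exc, FolioDataImportJobError):
        return f"job {exc.job_id} failed: {exc.message}"
    return f"import failed: {exc}"


try:
    raise FolioDataImportBatchError("b-42", "HTTP 422", ValueError("bad record"))
except FolioDataImportError as err:  # the base class catches both subclasses
    summary = describe_failure(err)
```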

Progress Reporting

Progress tracking and reporting utilities.

Progress reporting abstraction for FOLIO data import tasks.

This module provides a UI-agnostic progress reporting system that can be used across all import tasks (BatchPoster, UserImport, MARCDataImport, etc.) with support for multiple simultaneous tasks and easy backend swapping.

class folio_data_import._progress.TaskStatus(*values)

Bases: Enum

Status of a progress task.

PENDING = 'pending'
RUNNING = 'running'
COMPLETED = 'completed'
FAILED = 'failed'

class folio_data_import._progress.ProgressReporter(*args, **kwargs)

Bases: Protocol

Protocol defining the interface for progress reporters.

This protocol allows for easy swapping between different UI implementations (CLI, GUI, web) without changing the core business logic.

start_task(name, total=None, description=None)

Start a new progress task.

Parameters:
  • name (str) – Unique identifier for the task

  • total (int | None) – Total number of items to process (None for indeterminate)

  • description (str | None) – Human-readable task description

Return type:

str

Returns:

Task ID that can be used to update this task

update_task(task_id, advance=0, total=None, description=None, **stats)

Update an existing task’s progress and statistics.

Parameters:
  • task_id (str) – ID of the task to update

  • advance (int) – Number of items to advance by

  • total (int | None) – New total (if changed)

  • description (str | None) – New description (if changed)

  • **stats (Any) – Additional statistics to track (created, updated, failed, etc.)

Return type:

None

finish_task(task_id, status=TaskStatus.COMPLETED)

Mark a task as finished.

Parameters:
  • task_id (str) – ID of the task to finish

  • status (TaskStatus) – Final status of the task

Return type:

None

is_active()

Check if progress reporting is active.

Return type:

bool
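Since ProgressReporter is a Protocol, any object with matching methods satisfies it structurally, without inheriting from it. The sketch below restates the documented signatures as local stand-ins (TaskStatus and ProgressReporter mirror the reference above) and adds a hypothetical InMemoryReporter that records progress in a dict instead of drawing a UI. Accumulating the **stats counters is a choice made for this sketch; the library's reporters may store them differently.

```python
import itertools
from enum import Enum
from typing import Any, Dict, Optional, Protocol


class TaskStatus(Enum):
    # Stand-in mirroring the documented members.
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


class ProgressReporter(Protocol):
    # Stand-in restating the documented method signatures.
    def start_task(self, name: str, total: Optional[int] = None,
                   description: Optional[str] = None) -> str: ...
    def update_task(self, task_id: str, advance: int = 0, total: Optional[int] = None,
                    description: Optional[str] = None, **stats: Any) -> None: ...
    def finish_task(self, task_id: str, status: TaskStatus = TaskStatus.COMPLETED) -> None: ...
    def is_active(self) -> bool: ...


class InMemoryReporter:
    """Hypothetical reporter that keeps task state in a dict."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def start_task(self, name, total=None, description=None):
        task_id = f"{name}-{next(self._ids)}"
        self.tasks[task_id] = {"completed": 0, "total": total, "description": description or name,
                               "status": TaskStatus.RUNNING, "stats": {}}
        return task_id

    def update_task(self, task_id, advance=0, total=None, description=None, **stats):
        task = self.tasks[task_id]
        task["completed"] += advance
        if total is not None:
            task["total"] = total
        if description is not None:
            task["description"] = description
        # Accumulate counters such as created=1 or failed=1.
        for key, value in stats.items():
            task["stats"][key] = task["stats"].get(key, 0) + value

    def finish_task(self, task_id, status=TaskStatus.COMPLETED):
        self.tasks[task_id]["status"] = status

    def is_active(self):
        return True


def import_users(reporter: ProgressReporter, n: int) -> str:
    # Business logic depends only on the protocol, never on a concrete UI.
    task_id = reporter.start_task("users", total=n, description="Importing users")
    for _ in range(n):
        reporter.update_task(task_id, advance=1, created=1)
    reporter.finish_task(task_id)
    return task_id
```

Swapping InMemoryReporter for a terminal, Redis, or no-op reporter would not require any change to import_users.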

class folio_data_import._progress.BaseProgressReporter(enabled=True)

Bases: ABC

Abstract base class for progress reporters.

Provides common functionality and enforces the interface contract.

__init__(enabled=True)

Initialize the progress reporter.

Parameters:

enabled (bool) – Whether progress reporting is enabled

abstractmethod start_task(name, total=None, description=None)

Start a new progress task.

Return type:

str

abstractmethod update_task(task_id, advance=0, total=None, description=None, **stats)

Update an existing task.

Return type:

None

abstractmethod finish_task(task_id, status=TaskStatus.COMPLETED)

Finish a task.

Return type:

None

is_active()

Check if reporter is active.

Return type:

bool

get_stats(task_id)

Get statistics for a task.

Return type:

dict[str, Any] | None

abstractmethod __enter__()

Enter context manager.

Return type:

BaseProgressReporter

abstractmethod __exit__(exc_type, exc_val, exc_tb)

Exit context manager.

Return type:

None

class folio_data_import._progress.ItemsPerSecondColumn(table_column=None)

Bases: ProgressColumn

Renders the speed in items per second.

render(task)

Render the task's speed in items per second.

Return type:

Text

class folio_data_import._progress.UserStatsColumn(table_column=None)

Bases: ProgressColumn

render(task)

Render the user import statistics for the task.

Return type:

Text

class folio_data_import._progress.BatchPosterStatsColumn(table_column=None)

Bases: ProgressColumn

Renders statistics for batch posting operations.

render(task)

Render the batch posting statistics for the task.

Return type:

Text

class folio_data_import._progress.GenericStatsColumn(table_column=None)

Bases: ProgressColumn

Renders generic statistics for any task.

The stat_configs class attribute controls which stats are displayed and how they are styled; each entry is a (stat key, display label, Rich style) tuple. Customize it by subclassing or by modifying the class attribute directly.

Example:

from folio_data_import._progress import GenericStatsColumn

# Customize via subclass
class CustomStatsColumn(GenericStatsColumn):
    stat_configs = [
        ("imported", "Imported", "bright_blue"),
        ("skipped", "Skipped", "yellow"),
        ("failed", "Failed", "red"),
    ]

# Or modify the class attribute in place (affects every GenericStatsColumn)
GenericStatsColumn.stat_configs.append(("custom_stat", "Custom", "magenta"))

stat_configs: list[tuple[str, str, str]] = [('posted', 'Posted', 'bright_green'), ('created', 'Created', 'green'), ('updated', 'Updated', 'cyan'), ('deleted', 'Deleted', 'yellow'), ('failed', 'Failed', 'red'), ('processed', 'Processed', 'blue')]

render(task)

Render statistics based on configured stats.

Return type:

Text
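To make the stat_configs format concrete, the sketch below turns a dict of task statistics into Rich console markup, following the configured order and skipping stats the task never reported. format_stats is a hypothetical helper; the real render method returns a rich Text object and may format or filter stats differently.

```python
from typing import Dict, List, Tuple

# Same shape as the documented default: (stat key, display label, Rich style).
STAT_CONFIGS: List[Tuple[str, str, str]] = [
    ("posted", "Posted", "bright_green"),
    ("created", "Created", "green"),
    ("updated", "Updated", "cyan"),
    ("deleted", "Deleted", "yellow"),
    ("failed", "Failed", "red"),
    ("processed", "Processed", "blue"),
]


def format_stats(stats: Dict[str, int], stat_configs: List[Tuple[str, str, str]]) -> str:
    """Hypothetical helper: render configured stats as Rich markup, in config order."""
    parts = []
    for key, label, style in stat_configs:
        if key in stats:  # skip stats the task never reported
            parts.append(f"[{style}]{label}: {stats[key]}[/{style}]")
    return " | ".join(parts)
```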

class folio_data_import._progress.RichProgressReporter(enabled=True, show_speed=True, show_time=True)

Bases: BaseProgressReporter

Rich terminal-based progress reporter.

Displays progress in the terminal using the Rich library, with support for multiple simultaneous tasks, live updates, and logging.

__init__(enabled=True, show_speed=True, show_time=True)

Initialize the Rich progress reporter.

Parameters:
  • enabled (bool) – Whether progress reporting is enabled

  • show_speed (bool) – Whether to show items/second

  • show_time (bool) – Whether to show elapsed/remaining time

start_task(name, total=None, description=None)

Start a new progress task.

Return type:

str

update_task(task_id, advance=0, total=None, description=None, **stats)

Update an existing task.

Return type:

None

finish_task(task_id, status=TaskStatus.COMPLETED)

Finish a task.

Return type:

None

__enter__()

Enter context manager.

Return type:

RichProgressReporter

__exit__(exc_type, exc_val, exc_tb)

Exit context manager.

Return type:

None

class folio_data_import._progress.RedisProgressReporter(enabled=True, redis_url='redis://localhost:6379', session_id=None, ttl=3600)

Bases: BaseProgressReporter

Progress reporter that stores state in Redis for distributed access.

Stores progress updates in Redis so that separate processes or API endpoints can read them. Requires the redis package to be installed.

Example

>>> from folio_data_import._progress import RedisProgressReporter
>>> reporter = RedisProgressReporter(
...     redis_url="redis://localhost:6379",
...     session_id="import-123"
... )
>>> with reporter:
...     task_id = reporter.start_task("users", total=100)
...     reporter.update_task(task_id, advance=10, created=5)
...     reporter.finish_task(task_id)

__init__(enabled=True, redis_url='redis://localhost:6379', session_id=None, ttl=3600)

Initialize the Redis progress reporter.

Parameters:
  • enabled (bool) – Whether progress reporting is enabled

  • redis_url (str) – Redis connection URL

  • session_id (str | None) – Unique identifier for this progress session

  • ttl (int) – Time-to-live for session data in seconds (default: 1 hour)

Raises:

ImportError – If redis package is not installed

start_task(name, total=None, description=None)

Start a new progress task.

Return type:

str

update_task(task_id, advance=0, total=None, description=None, **stats)

Update an existing task.

Return type:

None

finish_task(task_id, status=TaskStatus.COMPLETED)

Finish a task.

Return type:

None

classmethod get_session(session_id, redis_url='redis://localhost:6379')

Get the current state of a session from Redis.

Parameters:
  • session_id (str) – The session ID to retrieve

  • redis_url (str) – Redis connection URL

Return type:

dict[str, Any] | None

Returns:

Session data dictionary or None if not found

Raises:

ImportError – If redis package is not installed

classmethod delete_session(session_id, redis_url='redis://localhost:6379')

Delete a session from Redis.

Parameters:
  • session_id (str) – The session ID to delete

  • redis_url (str) – Redis connection URL

Return type:

bool

Returns:

True if deleted, False if not found

Raises:

ImportError – If redis package is not installed

__enter__()

Enter context manager.

Return type:

RedisProgressReporter

__exit__(exc_type, exc_val, exc_tb)

Exit context manager.

Return type:

None

class folio_data_import._progress.NoOpProgressReporter

Bases: BaseProgressReporter

No-operation progress reporter for when progress display is disabled.

__init__()

Initialize the no-op reporter.

start_task(name, total=None, description=None)

Start a task (no-op).

Return type:

str

update_task(task_id, advance=0, total=None, description=None, **stats)

Update a task (no-op).

Return type:

None

finish_task(task_id, status=TaskStatus.COMPLETED)

Finish a task (no-op).

Return type:

None

__enter__()

Enter context manager.

Return type:

NoOpProgressReporter

__exit__(exc_type, exc_val, exc_tb)

Exit context manager.

Return type:

None

MARC Preprocessors

MARC record preprocessor functions.

class folio_data_import.marc_preprocessors._preprocessors.MARCPreprocessor(preprocessors, **kwargs)

Bases: object

A class to preprocess MARC records for data import into FOLIO.

__init__(preprocessors, **kwargs)

Initialize the MARCPreprocessor with a list of preprocessors.

Parameters:

preprocessors (Union[str, List[Callable]]) – A string of comma-separated function names or a list of callable preprocessor functions to apply.

do_work(record)

Preprocess the MARC record.

Return type:

Record
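The constructor accepts either a comma-separated string of function names or a list of callables, and every preprocessor takes (record, **kwargs). A plausible reading is that names are resolved to functions and the constructor's keyword arguments are forwarded to each one in turn. The sketch below shows that pattern with a plain dict standing in for a pymarc Record; REGISTRY, PreprocessorPipeline, strip_whitespace_001 and add_prefix_001 are hypothetical and not the library's implementation.

```python
from typing import Any, Callable, Dict, List, Union

Record = Dict[str, str]  # hypothetical stand-in for a pymarc Record
Preprocessor = Callable[..., Record]


def strip_whitespace_001(record: Record, **kwargs: Any) -> Record:
    # Ignores keyword arguments it does not need, like every function in the chain.
    record["001"] = record["001"].strip()
    return record


def add_prefix_001(record: Record, prefix: str = "", **kwargs: Any) -> Record:
    record["001"] = prefix + record["001"]
    return record


REGISTRY: Dict[str, Preprocessor] = {
    "strip_whitespace_001": strip_whitespace_001,
    "add_prefix_001": add_prefix_001,
}


class PreprocessorPipeline:
    """Sketch of a MARCPreprocessor-style chain."""

    def __init__(self, preprocessors: Union[str, List[Preprocessor]], **kwargs: Any) -> None:
        if isinstance(preprocessors, str):
            # "a, b" -> [REGISTRY["a"], REGISTRY["b"]]; unknown names raise KeyError.
            names = [name.strip() for name in preprocessors.split(",") if name.strip()]
            preprocessors = [REGISTRY[name] for name in names]
        self.preprocessors = list(preprocessors)
        self.kwargs = kwargs

    def do_work(self, record: Record) -> Record:
        # Apply each function in order, forwarding the shared keyword arguments.
        for func in self.preprocessors:
            record = func(record, **self.kwargs)
        return record
```

Forwarding one shared set of keyword arguments to every function is why each preprocessor must tolerate arguments it does not use.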

folio_data_import.marc_preprocessors._preprocessors.prepend_prefix_001(record, prefix)

Prepend a prefix to the record’s 001 field.

Parameters:
  • record (Record) – The MARC record to preprocess.

  • prefix (str) – The prefix to prepend to the 001 field.

Returns:

The preprocessed MARC record.

Return type:

Record

folio_data_import.marc_preprocessors._preprocessors.prepend_ppn_prefix_001(record, **kwargs)

Prepend the PPN prefix to the record’s 001 field. Useful when importing records from the ABES SUDOC catalog.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record

folio_data_import.marc_preprocessors._preprocessors.prepend_abes_prefix_001(record, **kwargs)

Prepend the ABES prefix to the record’s 001 field. Useful when importing records from the ABES SUDOC catalog.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record

folio_data_import.marc_preprocessors._preprocessors.strip_999_ff_fields(record, **kwargs)

Strip all 999 fields with ff indicators from the record. Useful when importing records exported from another FOLIO system.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record

folio_data_import.marc_preprocessors._preprocessors.clean_999_fields(record, **kwargs)

The presence of 999 fields, with or without ff indicators, can cause issues with data import mapping in FOLIO. This function calls strip_999_ff_fields to remove 999 fields with ff indicators and then copies the remaining 999 fields to 945 fields.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record

folio_data_import.marc_preprocessors._preprocessors.clean_non_ff_999_fields(record, **kwargs)

When loading migrated MARC records from folio_migration_tools, 999 fields other than those set by the migration process can cause the record to fail to load properly. This preprocessor moves all 999 fields with non-ff indicators to 945 fields with 99 indicators.

Return type:

Record
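The move described above can be sketched with a minimal field model: every 999 field whose indicators are not ff is re-tagged as 945 with 99 indicators, and 999 ff fields are left untouched. The Field dataclass is a stand-in for pymarc's richer Field class, and keeping the moved field at its original position is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Field:
    # Minimal stand-in for a MARC data field.
    tag: str
    indicators: Tuple[str, str]
    subfields: List[Tuple[str, str]] = field(default_factory=list)


def move_non_ff_999_to_945(fields: List[Field]) -> List[Field]:
    """Re-tag 999 fields whose indicators are not 'ff' as 945 with '99' indicators."""
    result = []
    for f in fields:
        if f.tag == "999" and f.indicators != ("f", "f"):
            result.append(Field("945", ("9", "9"), list(f.subfields)))
        else:
            result.append(f)  # 999 ff fields and all other tags are kept as-is
    return result
```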

folio_data_import.marc_preprocessors._preprocessors.sudoc_supercede_prep(record, **kwargs)

Preprocesses a record from the ABES SUDOC catalog by copying 035 fields whose $9 subfield value is ‘sudoc’ to 935 fields with a $a subfield prefixed with “(ABES)”. This is useful when importing newly merged records from the SUDOC catalog and you want the new record to replace the old one in FOLIO. The function also applies prepend_ppn_prefix_001 to the record.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record

folio_data_import.marc_preprocessors._preprocessors.clean_empty_fields(record, **kwargs)

Remove empty fields and subfields from the record. These can cause data import mapping issues in FOLIO. Removals are logged at custom log level 26, which is used by folio_migration_tools to populate the data issues report.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record
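A minimal sketch of the cleanup described above: blank subfields are removed, then data fields left with no subfields are dropped, and each removal is logged at level 26. The (tag, [(code, value), ...]) tuples stand in for pymarc fields; treating whitespace-only values as empty and the logger name are assumptions, and control fields are not covered.

```python
import logging
from typing import List, Tuple

DATA_ISSUES_LEVEL = 26  # the custom log level named in the description above
logger = logging.getLogger("clean_empty_fields_sketch")  # hypothetical logger name

DataField = Tuple[str, List[Tuple[str, str]]]  # (tag, [(code, value), ...])


def drop_empty(fields: List[DataField]) -> List[DataField]:
    """Remove blank subfields, then drop data fields left with no subfields."""
    cleaned: List[DataField] = []
    for tag, subfields in fields:
        kept = []
        for code, value in subfields:
            if value.strip():
                kept.append((code, value))
            else:
                logger.log(DATA_ISSUES_LEVEL, "Removed empty subfield $%s from field %s", code, tag)
        if kept:
            cleaned.append((tag, kept))
        else:
            logger.log(DATA_ISSUES_LEVEL, "Removed empty field %s", tag)
    return cleaned
```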

folio_data_import.marc_preprocessors._preprocessors.fix_bib_leader(record, **kwargs)

Fixes the leader of the record by setting the record status to ‘c’ (modified record) and the type of record to ‘a’ (language material).

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record
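In MARC 21 the leader is a fixed 24-character string; position 05 holds the record status and position 06 the type of record. The sketch below shows the position-level edit that fix_bib_leader describes, using a plain string rather than a pymarc Leader object; set_leader_positions is a hypothetical helper.

```python
from typing import Dict


def set_leader_positions(leader: str, changes: Dict[int, str]) -> str:
    """Return a copy of a 24-character MARC leader with single positions replaced."""
    if len(leader) != 24:
        raise ValueError("MARC leaders are 24 characters long")
    chars = list(leader)
    for position, value in changes.items():
        if len(value) != 1:
            raise ValueError("each leader position holds exactly one character")
        chars[position] = value
    return "".join(chars)


# Record status 'c' (position 05) and type of record 'a' (position 06).
fixed = set_leader_positions("00000nmm a2200000 a 4500", {5: "c", 6: "a"})
```

The same helper with {5: "d"} performs the edit that mark_deleted describes.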

folio_data_import.marc_preprocessors._preprocessors.move_authority_subfield_9_to_0_all_controllable_fields(record, **kwargs)

Move subfield $9 to subfield $0 in all authority-controllable fields. This is useful when importing records from the ABES SUDOC catalog.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record
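A sketch of the subfield move described above: within authority-controllable fields, each $9 is re-coded as $0 while subfield order and values are preserved, and other fields (such as a 035 carrying $9 sudoc) are left alone. The tag set below is a hypothetical subset chosen for illustration; the library's own list of controllable fields may differ.

```python
from typing import List, Tuple

Subfields = List[Tuple[str, str]]  # [(code, value), ...]

# Hypothetical set of authority-controllable bibliographic tags.
CONTROLLABLE_TAGS = {
    "100", "110", "111", "130",
    "600", "610", "611", "630", "650", "651",
    "700", "710", "711", "730",
    "800", "810", "811", "830",
}


def move_9_to_0(fields: List[Tuple[str, Subfields]]) -> List[Tuple[str, Subfields]]:
    """Re-code $9 as $0 in controllable fields, keeping subfield order and values."""
    result = []
    for tag, subfields in fields:
        if tag in CONTROLLABLE_TAGS:
            subfields = [("0" if code == "9" else code, value) for code, value in subfields]
        result.append((tag, subfields))
    return result
```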

folio_data_import.marc_preprocessors._preprocessors.mark_deleted(record, **kwargs)

Mark the record as deleted by setting the record status to ‘d’.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record

folio_data_import.marc_preprocessors._preprocessors.remove_non_numeric_fields(record, **kwargs)

Remove all fields from the record that have non-numeric tags (not matching pattern 001-999). Also removes field 000, which is invalid in MARC records.

Parameters:

record (Record) – The MARC record to preprocess.

Returns:

The preprocessed MARC record.

Return type:

Record
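The tag test described above can be expressed as a full match against exactly three ASCII digits plus an explicit exclusion of 000. Using [0-9] rather than str.isdigit() avoids accepting other Unicode digit characters. The (tag, value) tuples and keep_numeric_tags are hypothetical stand-ins for pymarc fields and the library function.

```python
import re
from typing import List, Tuple

# Exactly three ASCII digits.
NUMERIC_TAG = re.compile(r"[0-9]{3}")


def keep_numeric_tags(fields: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Drop fields whose tag is not in the range 001-999 (this also drops 000)."""
    return [(tag, value) for tag, value in fields
            if NUMERIC_TAG.fullmatch(tag) and tag != "000"]
```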

folio_data_import.marc_preprocessors._preprocessors.ordinal(n)

Return type:

str

See Also