API Reference
Auto-generated reference documentation for the FOLIO Data Import Python API.
Main Modules
BatchPoster
The core module for batch posting inventory records to FOLIO.
BatchPoster module for FOLIO inventory batch operations.
This module provides functionality for batch posting of Instances, Holdings, and Items to FOLIO’s inventory storage endpoints with support for upsert operations.
- class folio_data_import.BatchPoster.BatchPosterStats(**data)
Bases: BaseModel
Statistics for batch posting operations.
- records_processed: int
- records_posted: int
- records_created: int
- records_updated: int
- records_failed: int
- batches_posted: int
- batches_failed: int
- rerun_succeeded: int
- rerun_still_failed: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- folio_data_import.BatchPoster.get_api_info(object_type)
Get API endpoint information for a given object type.
- Parameters:
object_type (str) – The type of object (Instances, Holdings, Items)
- Return type:
Dict[str, Any]
- Returns:
Dictionary containing API endpoint information
- Raises:
ValueError – If object_type is not supported
- folio_data_import.BatchPoster.deep_update(target, source)
Recursively update target dictionary with values from source dictionary.
- Parameters:
target (dict) – The dictionary to update
source (dict) – The dictionary to merge into target
- Return type:
None
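The recursive merge described for deep_update can be illustrated with a minimal sketch. The name deep_update_sketch is hypothetical, and the rule that non-dict values from source overwrite those in target is an assumption; the library's actual implementation may differ.

```python
from typing import Any, Dict


def deep_update_sketch(target: Dict[str, Any], source: Dict[str, Any]) -> None:
    """Recursively merge source into target, modifying target in place."""
    for key, value in source.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            # Both sides hold dicts: merge the nested level instead of replacing it.
            deep_update_sketch(target[key], value)
        else:
            # Assumption: any other value from source overwrites target's value.
            target[key] = value


record = {"status": {"name": "Available", "date": "2024-01-01"}, "barcode": "123"}
deep_update_sketch(record, {"status": {"name": "Checked out"}, "notes": []})
```

In this sketch the nested status dictionary keeps its date key while name is replaced, which is the practical difference from a plain dict.update call.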
- folio_data_import.BatchPoster.extract_paths(record, paths)
Extract specified paths from a record.
- Parameters:
record (dict) – The record to extract from
paths (List[str]) – List of JSON paths to extract (e.g., ['statisticalCodeIds', 'status'])
- Return type:
dict
- Returns:
Dictionary containing only the specified paths
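A rough sketch of the extraction step follows. It assumes each path names a top-level key and that keys absent from the record are skipped; the name extract_paths_sketch is hypothetical, and the real function may support richer path syntax.

```python
from typing import Any, Dict, List


def extract_paths_sketch(record: Dict[str, Any], paths: List[str]) -> Dict[str, Any]:
    """Return a new dict holding only the requested keys that exist in record."""
    # Assumption: each path names a top-level key, and absent keys are skipped.
    return {path: record[path] for path in paths if path in record}


item = {"id": "abc", "status": {"name": "Available"}, "barcode": "123"}
subset = extract_paths_sketch(item, ["statisticalCodeIds", "status"])
```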
- class folio_data_import.BatchPoster.BatchPoster(folio_client, config, failed_records_file=None, reporter=None)
Bases: object
Handles batch posting of inventory records to FOLIO.
This class provides functionality for posting Instances, Holdings, and Items to FOLIO’s batch inventory endpoints with support for upsert operations.
- class Config(**data)
Bases: BaseModel
Configuration for BatchPoster operations.
- object_type: Annotated[Literal['Instances', 'Holdings', 'Items', 'ShadowInstances']]
- batch_size: Annotated[int]
- upsert: Annotated[bool]
- preserve_statistical_codes: Annotated[bool]
- preserve_administrative_notes: Annotated[bool]
- preserve_temporary_locations: Annotated[bool]
- preserve_temporary_loan_types: Annotated[bool]
- preserve_item_status: Annotated[bool]
- patch_existing_records: Annotated[bool]
- patch_paths: Annotated[Optional[List[str]]]
- rerun_failed_records: Annotated[bool]
- no_progress: Annotated[bool]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- __init__(folio_client, config, failed_records_file=None, reporter=None)
Initialize BatchPoster.
- Parameters:
folio_client (FolioClient) – Authenticated FOLIO client
config (Config) – Configuration for batch posting
failed_records_file – Optional file handle or path for writing failed records. Can be an open file handle (managed by caller) or a string/Path (will be opened/closed by BatchPoster).
reporter (ProgressReporter | None) – Optional progress reporter. If None, uses NoOpProgressReporter.
- async __aenter__()
Async context manager entry.
- async __aexit__(exc_type, exc_val, exc_tb)
Async context manager exit.
- handle_upsert_for_statistical_codes(updates, keep_existing)
Handle statistical codes during upsert based on configuration.
- Parameters:
updates (dict) – Dictionary being prepared for update
keep_existing (dict) – Dictionary of fields to preserve from existing record
- Return type:
None
- handle_upsert_for_administrative_notes(updates, keep_existing)
Handle administrative notes during upsert based on configuration.
- Parameters:
updates (dict) – Dictionary being prepared for update
keep_existing (dict) – Dictionary of fields to preserve from existing record
- Return type:
None
- handle_upsert_for_temporary_locations(updates, keep_existing)
Handle temporary locations during upsert based on configuration.
- Parameters:
updates (dict) – Dictionary being prepared for update
keep_existing (dict) – Dictionary of fields to preserve from existing record
- Return type:
None
- handle_upsert_for_temporary_loan_types(updates, keep_existing)
Handle temporary loan types during upsert based on configuration.
- Parameters:
updates (dict) – Dictionary being prepared for update
keep_existing (dict) – Dictionary of fields to preserve from existing record
- Return type:
None
- keep_existing_fields(updates, existing_record)
Preserve specific fields from existing record during upsert.
Always preserves hrid (human-readable ID) and lastCheckIn (circulation data) from existing records to prevent data loss. Optionally preserves status based on configuration.
- Parameters:
updates (dict) – Dictionary being prepared for update
existing_record (dict) – The existing record in FOLIO
- Return type:
None
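The preservation rule described for keep_existing_fields might look roughly like the sketch below. The standalone function and its preserve_item_status argument are hypothetical stand-ins for the method and its configuration flag, and copying a field only when the existing record has it is an assumption.

```python
from typing import Any, Dict


def keep_existing_fields_sketch(
    updates: Dict[str, Any],
    existing_record: Dict[str, Any],
    preserve_item_status: bool = False,
) -> None:
    """Copy fields that must survive an upsert from existing_record into updates."""
    # hrid and lastCheckIn are always carried over when the existing record has them.
    for field in ("hrid", "lastCheckIn"):
        if field in existing_record:
            updates[field] = existing_record[field]
    # Assumption: status is carried over only when the flag is enabled.
    if preserve_item_status and "status" in existing_record:
        updates["status"] = existing_record["status"]


existing = {"hrid": "it0001", "status": {"name": "Checked out"}, "barcode": "old"}
updates = {}
keep_existing_fields_sketch(updates, existing, preserve_item_status=True)
```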
- patch_record(new_record, existing_record, patch_paths)
Update new_record with values from existing_record according to patch_paths.
- Parameters:
new_record (dict) – The new record to be updated
existing_record (dict) – The existing record to patch from
patch_paths (List[str]) – List of fields in JSON Path notation to patch during upsert
- Return type:
None
- prepare_record_for_upsert(new_record, existing_record)
Prepare a record for upsert by adding version and patching fields.
For MARC-sourced Instance records, only suppression flags, deleted status, statistical codes, administrative notes, and instance status are allowed to be patched. This protects MARC-managed fields from being overwritten.
- Parameters:
new_record (dict) – The new record to prepare
existing_record (dict) – The existing record in FOLIO
- Return type:
None
- async fetch_existing_records(record_ids)
Fetch existing records from FOLIO by their IDs.
- Parameters:
record_ids (List[str]) – List of record IDs to fetch
- Return type:
Dict[str, dict]
- Returns:
Dictionary mapping record IDs to their full records
- static set_consortium_source(record)
Convert source field for consortium shadow instances.
For shadow instances in ECS/consortium environments, the source field must be prefixed with “CONSORTIUM-” to distinguish them from local records.
- Parameters:
record (dict) – The record to modify (modified in place)
- Return type:
None
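A minimal sketch of the source conversion is shown here. The name set_consortium_source_sketch is hypothetical, and leaving records untouched when they lack a source or already carry the prefix is an assumption rather than documented behavior.

```python
from typing import Any, Dict

CONSORTIUM_PREFIX = "CONSORTIUM-"


def set_consortium_source_sketch(record: Dict[str, Any]) -> None:
    """Prefix the record's source field in place, e.g. MARC -> CONSORTIUM-MARC."""
    source = record.get("source", "")
    # Assumption: records without a source or already prefixed are left untouched.
    if source and not source.startswith(CONSORTIUM_PREFIX):
        record["source"] = CONSORTIUM_PREFIX + source


shadow = {"id": "abc", "source": "MARC"}
set_consortium_source_sketch(shadow)
set_consortium_source_sketch(shadow)  # a second call does not double the prefix
```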
- async set_versions_for_upsert(batch)
Fetch existing record versions and prepare batch for upsert.
Only records that already exist in FOLIO will have their _version set and be prepared for update. New records will not have _version set.
- Parameters:
batch (List[dict]) – List of records to prepare for upsert
- Return type:
None
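To make the "existing records only" rule concrete, here is a small sketch that works on plain dictionaries. The helper apply_versions_sketch is hypothetical; it assumes records are matched on an id key and that the lookup has the shape returned by fetch_existing_records (record ID mapped to full record).

```python
from typing import Any, Dict, List


def apply_versions_sketch(
    batch: List[Dict[str, Any]], existing_by_id: Dict[str, Dict[str, Any]]
) -> None:
    """Set _version on records that already exist; leave new records alone."""
    for record in batch:
        # Assumption: records are matched to existing ones by their "id" value.
        existing = existing_by_id.get(record.get("id"))
        if existing is not None and "_version" in existing:
            record["_version"] = existing["_version"]


batch = [{"id": "a", "barcode": "1"}, {"id": "b", "barcode": "2"}]
apply_versions_sketch(batch, {"a": {"id": "a", "_version": 3}})
```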
- async post_batch(batch)
Post a batch of records to FOLIO.
- Parameters:
batch (List[dict]) – List of records to post
- Return type:
tuple[Response, int, int]
- Returns:
Tuple of (response data dict, number of creates, number of updates)
- Raises:
folioclient.FolioClientError – If FOLIO API returns an error
folioclient.FolioConnectionError – If connection to FOLIO fails
- async post_records(records)
Post records in batches.
Failed records will be written to the file handle provided during initialization.
- Parameters:
records – Records to post. Can be:
    - List of dict records
    - File-like object containing JSON lines (one record per line)
    - String/Path to a file containing JSON lines
- Return type:
None
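Reading JSON lines and grouping them into batches can be sketched as follows. The helpers read_json_lines and chunked are hypothetical names; they illustrate the general idea and are not the library's internals.

```python
import io
import json
from typing import Any, Dict, Iterable, Iterator, List, TextIO


def read_json_lines(handle: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty line of a JSON-lines stream."""
    for line in handle:
        if line.strip():
            yield json.loads(line)


def chunked(
    records: Iterable[Dict[str, Any]], batch_size: int
) -> Iterator[List[Dict[str, Any]]]:
    """Group records into lists of at most batch_size items."""
    batch: List[Dict[str, Any]] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


source = io.StringIO('{"id": "1"}\n{"id": "2"}\n\n{"id": "3"}\n')
batches = list(chunked(read_json_lines(source), batch_size=2))
```

Because both helpers are generators, a large file is streamed one batch at a time instead of being loaded whole.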
- async do_work(file_paths)
Main orchestration method for processing files.
This is the primary entry point for batch posting from files. It handles:
    - Single or multiple file processing
    - Progress tracking and logging
    - Failed record collection
    - Statistics reporting
Mimics the folio_migration_tools BatchPoster.do_work() workflow.
Note
To write failed records, pass a file handle or path to the BatchPoster constructor’s failed_records_file parameter.
- Parameters:
file_paths (Union[str, Path, List[Union[str, Path]]]) – Path(s) to JSONL file(s) to process
- Return type:
BatchPosterStats
- Returns:
Final statistics from the posting operation
Example:
    config = BatchPosterConfig(
        object_type="Items",
        batch_size=100,
        upsert=True
    )

    reporter = RichProgressReporter(enabled=True)

    # With failed records file
    with open("failed_items.jsonl", "w") as failed_file:
        poster = BatchPoster(
            folio_client, config,
            failed_records_file=failed_file,
            reporter=reporter
        )
        async with poster:
            stats = await poster.do_work(["items1.jsonl", "items2.jsonl"])

    # Or let BatchPoster manage the file
    poster = BatchPoster(
        folio_client, config,
        failed_records_file="failed_items.jsonl",
        reporter=reporter
    )
    async with poster:
        stats = await poster.do_work("items.jsonl")

    print(f"Posted: {stats.records_posted}, Failed: {stats.records_failed}")
- async rerun_failed_records_one_by_one()
Reprocess failed records one at a time.
Streams through the failed records file, processing each record individually. Records that still fail are written to a new file with ‘_rerun’ suffix. This gives each record a second chance with individual error handling.
- Return type:
None
- get_stats()
Get current posting statistics.
- Return type:
BatchPosterStats
- Returns:
Current statistics
- folio_data_import.BatchPoster.get_human_readable_size(size, precision=2)
Convert bytes to human-readable format.
- Parameters:
size (int) – Size in bytes
precision (int) – Number of decimal places
- Return type:
str
- Returns:
Human-readable size string
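A byte-count formatter of this kind could be sketched as below. The name human_readable_size_sketch, the 1024-based steps, and the unit labels are all assumptions; the library's output format may differ.

```python
def human_readable_size_sketch(size: int, precision: int = 2) -> str:
    """Format a byte count with a binary-scaled unit suffix."""
    # Assumption: 1024-based steps and these unit labels.
    units = ["B", "KB", "MB", "GB", "TB"]
    value = float(size)
    index = 0
    while value >= 1024 and index < len(units) - 1:
        value /= 1024
        index += 1
    return f"{value:.{precision}f} {units[index]}"


label = human_readable_size_sketch(1536)
```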
- folio_data_import.BatchPoster.get_req_size(response)
- folio_data_import.BatchPoster.main(config_file=None, *, gateway_url=None, tenant_id=None, username=None, password=None, member_tenant_id=None, object_type=None, file_paths=None, batch_size=100, upsert=False, preserve_statistical_codes=False, preserve_administrative_notes=False, preserve_temporary_locations=False, preserve_temporary_loan_types=False, overwrite_item_status=False, patch_existing_records=False, patch_paths=None, failed_records_file=None, rerun_failed_records=False, no_progress=False, debug=False)
Command-line interface to batch post inventory records to FOLIO
- Parameters:
config_file (Path | None) – Path to JSON config file (overrides CLI parameters).
gateway_url (str | None) – The FOLIO API Gateway URL.
tenant_id (str | None) – The tenant id.
username (str | None) – The FOLIO username.
password (str | None) – The FOLIO password.
member_tenant_id (str | None) – The FOLIO ECS member tenant id (if applicable).
object_type (Optional[Literal['Instances', 'Holdings', 'Items', 'ShadowInstances']]) – Type of inventory object (Instances, Holdings, Items, or ShadowInstances).
file_paths (tuple[Path, ...] | None) – Path(s) to JSONL file(s) to post.
batch_size (int) – Number of records to include in each batch (1-1000).
upsert (bool) – Enable upsert mode to update existing records.
preserve_statistical_codes (bool) – Preserve existing statistical codes during upsert.
preserve_administrative_notes (bool) – Preserve existing administrative notes during upsert.
preserve_temporary_locations (bool) – Preserve temporary location assignments during upsert.
preserve_temporary_loan_types (bool) – Preserve temporary loan type assignments during upsert.
overwrite_item_status (bool) – Overwrite item status during upsert.
patch_existing_records (bool) – Enable selective field patching during upsert.
patch_paths (str | None) – Comma-separated list of field paths to patch.
failed_records_file (Path | None) – Path to file for writing failed records.
rerun_failed_records (bool) – After the main run, reprocess failed records one at a time.
no_progress (bool) – Disable progress bar display.
debug (bool) – Enable debug logging.
- Return type:
None
- folio_data_import.BatchPoster.parse_config_file(config_file)
- folio_data_import.BatchPoster.parse_patch_paths(patch_paths)
- folio_data_import.BatchPoster.expand_file_paths(file_paths)
- async folio_data_import.BatchPoster.run_batch_poster(folio_client, config, files_to_process, failed_records_file)
Run the batch poster operation.
- Parameters:
folio_client (FolioClient) – Authenticated FOLIO client
config (Config) – BatchPoster configuration
files_to_process (List[Path]) – List of file paths to process
failed_records_file (Path | None) – Optional path for failed records
- folio_data_import.BatchPoster.log_final_stats(poster)
Log the final statistics after batch posting.
- Parameters:
poster (BatchPoster) – The BatchPoster instance containing the stats
- Return type:
None
MARCDataImport
Module for importing MARC records using FOLIO’s Data Import APIs.
- class folio_data_import.MARCDataImport.MARCImportStats(**data)
Bases: BaseModel
Statistics for MARC import operations.
- records_sent: int
- records_processed: int
- created: int
- updated: int
- discarded: int
- error: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class folio_data_import.MARCDataImport.MARCImportJob(folio_client, config, reporter=None)
Bases: object
Class to manage importing MARC data (Bib, Authority) into FOLIO using the Change Manager APIs (folio-org/mod-source-record-manager), rather than file-based Data Import. When executed in an interactive environment, it can provide progress bars for tracking the number of records both uploaded and processed.
- Parameters:
folio_client (FolioClient) – An instance of the FolioClient class.
marc_files (list) – A list of Path objects representing the MARC files to import.
import_profile_name (str) – The name of the data import job profile to use.
batch_size (int) – The number of source records to include in a record batch (default=10).
batch_delay (float) – The number of seconds to wait between record batches (default=0).
no_progress (bool) – Disable progress bars (e.g., for running in a CI environment).
marc_record_preprocessors (list or str) – A list of callables, or a string representing a comma-separated list of MARC record preprocessor names to apply to each record before import.
preprocessor_args (dict) – A dictionary of arguments to pass to the MARC record preprocessor(s).
let_summary_fail (bool) – If True, will not retry or fail the import if the final job summary cannot be retrieved (default=False).
split_files (bool) – If True, will split each file into smaller jobs of size split_size.
split_size (int) – The number of records to include in each split file (default=1000).
split_offset (int) – The number of split files to skip before starting processing (default=0).
job_ids_file_path (str) – The path to the file where job IDs will be saved (default="marc_import_job_ids.txt").
show_file_names_in_data_import_logs (bool) – If True, will set the file name for each job in the data import logs.
- class Config(**data)
Bases: BaseModel
Configuration for MARC import operations.
- marc_files: Annotated[List[Path]]
- import_profile_name: Annotated[str]
- batch_size: Annotated[int]
- batch_delay: Annotated[float]
- marc_record_preprocessors: Annotated[Union[List[Callable], str, None]]
- preprocessors_args: Annotated[Optional[Dict[str, Dict]]]
- no_progress: Annotated[bool]
- no_summary: Annotated[bool]
- let_summary_fail: Annotated[bool]
- split_files: Annotated[bool]
- split_size: Annotated[int]
- split_offset: Annotated[int]
- job_ids_file_path: Annotated[Path | None]
- show_file_names_in_data_import_logs: Annotated[bool]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- bad_records_file: BinaryIO
- failed_batches_file: BinaryIO
- task_sent: str
- task_imported: str
- http_client: Client
- current_file: Union[List[Path], List[BinaryIO]]
- record_batch: List[bytes]
- last_current: int = 0
- total_records_sent: int = 0
- finished: bool = False
- job_id: str = ''
- job_ids: List[str]
- job_hrid: int = 0
- reporter: ProgressReporter
- async do_work()
Performs the necessary work for data import.
This method initializes an HTTP client, files to store records that fail to send, and calls the appropriate method to import MARC files based on the configuration.
- Return type:
None
- Returns:
None
- async process_split_files()
Process the import of files in smaller batches. This method is called when split_files is set to True. It splits each file into smaller chunks and processes them one by one.
- async wrap_up()
Wraps up the data import process.
This method is called after the import process is complete. It checks for empty bad records and error files and removes them.
- Return type:
None
- Returns:
None
- async get_job_status()
Retrieves the status of a job execution.
- Return type:
None
- Returns:
None
- Raises:
IndexError – If the job execution with the specified ID is not found.
- async set_job_file_name()
Sets the file name for the current job execution.
- Return type:
None
- Returns:
None
- async create_folio_import_job()
Creates a job execution for importing data into FOLIO.
- Return type:
None
- Returns:
None
- Raises:
FolioHTTPError – If there is an error creating the job.
- property import_profile: dict
Returns the import profile for the current job execution.
- Returns:
The import profile for the current job execution.
- Return type:
dict
- async set_job_profile()
Sets the job profile for the current job execution.
- Return type:
None
- Returns:
The response from the HTTP request to set the job profile.
- async static read_total_records(files)
Count records from files with per-file logging.
- Parameters:
files (list) – List of files to read.
- Returns:
The total number of records found in the files.
- Return type:
int
- async process_record_batch(batch_payload)
Processes a record batch.
- Parameters:
batch_payload (dict) – A records payload containing the current batch of MARC records.
- Return type:
None
- async process_records(files, total_records)
Process records from the given files.
- Parameters:
files (list) – List of files to process.
total_records (int) – Total number of records to process.
pbar_sent – Progress bar for tracking the number of records sent.
- Return type:
None
- Returns:
None
- move_file_to_complete(file_path)
- Return type:
None
- async create_batch_payload(counter, total_records, is_last)
Create a batch payload for data import.
- Parameters:
counter (int) – The current counter value.
total_records (int) – The total number of records.
is_last (bool) – Indicates if this is the last batch.
- Returns:
The batch payload containing the ID, records metadata, and initial records.
- Return type:
dict
- static split_marc_file(file_path, batch_size)
Generator to iterate over MARC records in batches, yielding BytesIO objects.
- Return type:
Generator[BytesIO,None,None]
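The batching generator can be approximated without a MARC parser by splitting on the ISO 2709 record terminator byte (0x1D). This is only a sketch under stated assumptions: split_marc_stream is a hypothetical helper that takes an open binary stream rather than a file path, reads the whole input into memory, and does not validate records. The library's method may work differently.

```python
import io
from typing import BinaryIO, Generator

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 end-of-record byte


def split_marc_stream(
    stream: BinaryIO, batch_size: int
) -> Generator[io.BytesIO, None, None]:
    """Yield BytesIO chunks holding at most batch_size raw MARC records each."""
    # Assumption: the whole stream fits in memory, which is fine for a sketch.
    raw_records = [
        raw + RECORD_TERMINATOR
        for raw in stream.read().split(RECORD_TERMINATOR)
        if raw
    ]
    for start in range(0, len(raw_records), batch_size):
        yield io.BytesIO(b"".join(raw_records[start:start + batch_size]))


# Placeholder bytes stand in for real MARC records here.
fake_marc = io.BytesIO(b"rec1\x1drec2\x1drec3\x1d")
chunks = [chunk.getvalue() for chunk in split_marc_stream(fake_marc, batch_size=2)]
```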
- async import_marc_file()
Imports MARC file into the system.
This method performs the following steps:
    1. Creates a FOLIO import job.
    2. Retrieves the import profile.
    3. Sets the job profile.
    4. Opens the MARC file(s) and reads the total number of records.
    5. Displays progress bars for imported and sent records.
    6. Processes the records and updates the progress bars.
    7. Checks the job status periodically until the import is finished.
Note: This method assumes that the necessary instance attributes are already set.
- Return type:
None
- Returns:
None
- async cancel_job()
Cancels the current job execution.
This method sends a request to cancel the job execution and logs the result.
- Return type:
None
- Returns:
None
- async log_job_summary()
- async get_job_summary()
Retrieves the job summary for the current job execution.
- Returns:
The job summary for the current job execution.
- Return type:
dict
- folio_data_import.MARCDataImport.main(config_file=None, *, gateway_url=None, tenant_id=None, username=None, password=None, marc_file_paths=None, member_tenant_id=None, import_profile_name=None, batch_size=10, batch_delay=0.0, preprocessors=None, preprocessors_config=None, file_names_in_di_logs=False, split_files=False, split_size=1000, split_offset=0, no_progress=False, no_summary=False, let_summary_fail=False, job_ids_file_path=None, debug=False)
Command-line interface to batch import MARC records into FOLIO using FOLIO Data Import
- Parameters:
config_file (Path | None) – Path to JSON config file for the import job, overrides other parameters if provided.
gateway_url (str) – The FOLIO API Gateway URL.
tenant_id (str) – The tenant id.
username (str) – The FOLIO username.
password (str) – The FOLIO password.
marc_file_paths (List[Path]) – The MARC file(s) or glob pattern(s) to import.
member_tenant_id (str) – The FOLIO ECS member tenant id (if applicable).
import_profile_name (str) – The name of the import profile to use.
batch_size (int) – The number of records to send in each batch.
batch_delay (float) – The delay (in seconds) between sending each batch.
preprocessors (str) – Comma-separated list of MARC record preprocessors to use.
preprocessors_config (str) – Path to JSON config file for the preprocessors.
file_names_in_di_logs (bool) – Show file names in data import logs.
split_files (bool) – Split files into smaller batches.
split_size (int) – The number of records per split batch.
split_offset (int) – The number of split batches to skip before starting import.
no_progress (bool) – Disable progress bars.
no_summary (bool) – Skip the final job summary.
let_summary_fail (bool) – Let the final summary check fail without exiting.
job_ids_file_path (str) – Path to file to write job IDs to.
debug (bool) – Enable debug logging.
- Return type:
None
- folio_data_import.MARCDataImport.select_import_profile(folio_client)
- folio_data_import.MARCDataImport.collect_marc_file_paths(marc_file_paths)
- async folio_data_import.MARCDataImport.run_job(job)
UserImport
Module for importing user data into FOLIO.
- class folio_data_import.UserImport.UserImporterStats(**data)
Bases: BaseModel
Statistics for user import operations.
- created: int
- updated: int
- failed: int
- deleted: int
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class folio_data_import.UserImport.UserImporter(folio_client, config, reporter=None)
Bases: object
Class to import mod-user-import compatible user objects (e.g., from the folio_migration_tools UserTransformer task) from a JSON-lines file into FOLIO.
- class Config(**data)
Bases: BaseModel
Configuration for UserImporter operations.
- library_name: Annotated[str]
- batch_size: Annotated[int]
- user_match_key: Annotated[Literal['externalSystemId', 'username', 'barcode']]
- only_update_present_fields: Annotated[bool]
- default_preferred_contact_type: Annotated[Literal['001', '002', '003', '004', '005', 'mail', 'email', 'text', 'phone', 'mobile']]
- fields_to_protect: Annotated[List[str]]
- limit_simultaneous_requests: Annotated[int]
- user_file_paths: Annotated[Union[Path, List[Path], None]]
- no_progress: Annotated[bool]
- delete_all: Annotated[bool]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- logfile: AsyncTextIOWrapper
- errorfile: AsyncTextIOWrapper
- http_client: AsyncClient
- static build_ref_data_id_map(folio_client, endpoint, key, name)
Builds a map of reference data IDs.
- Parameters:
folio_client (folioclient.FolioClient) – A FolioClient object.
endpoint (str) – The endpoint to retrieve the reference data from.
key (str) – The key to use as the map key.
- Returns:
A dictionary mapping reference data keys to their corresponding IDs.
- Return type:
dict
- static validate_uuid(uuid_string)
Validate a UUID string.
- Parameters:
uuid_string (str) – The UUID string to validate.
- Returns:
True if the UUID is valid, otherwise False.
- Return type:
bool
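One common way to perform this kind of check uses the standard library's uuid module, sketched below. The name validate_uuid_sketch is hypothetical, and whether the library validates the same way is an assumption. Note that uuid.UUID is lenient: it also accepts forms without hyphens or wrapped in braces.

```python
import uuid


def validate_uuid_sketch(uuid_string: str) -> bool:
    """Return True when uuid_string parses as a UUID, otherwise False."""
    try:
        uuid.UUID(uuid_string)
    except (ValueError, AttributeError, TypeError):
        # Malformed text or a non-string argument is treated as invalid.
        return False
    return True


ok = validate_uuid_sketch("123e4567-e89b-12d3-a456-426614174000")
bad = validate_uuid_sketch("not-a-uuid")
```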
- async setup(error_file_path)
Sets up the importer by initializing necessary resources.
- Parameters:
log_file_path (Path) – The path to the log file.
error_file_path (Path) – The path to the error file.
- Return type:
None
- async close()
Closes the importer by releasing any resources.
- Return type:
None
- async do_import()
Main method to import users.
This method triggers the process of importing users by calling the process_file method. Supports both single file path and list of file paths.
- Return type:
None
- async get_existing_user(user_obj)
Retrieves an existing user from FOLIO based on the provided user object.
- Parameters:
user_obj – The user object containing the information to match against existing users.
- Return type:
dict
- Returns:
The existing user object if found, otherwise an empty dictionary.
- async get_existing_rp(user_obj, existing_user)
Retrieves the existing request preferences for a given user.
- Parameters:
user_obj (dict) – The user object.
existing_user (dict) – The existing user object.
- Returns:
The existing request preferences for the user.
- Return type:
dict
- async get_existing_pu(user_obj, existing_user)
Retrieves the existing permission user for a given user.
- Parameters:
user_obj (dict) – The user object.
existing_user (dict) – The existing user object.
- Returns:
The existing permission user object.
- Return type:
dict
- async map_address_types(user_obj, line_number)
Maps address type names in the user object to the corresponding ID in the address_type_map.
- Parameters:
user_obj (dict) – The user object containing personal information.
address_type_map (dict) – A dictionary mapping address type names to their ID values.
- Return type:
None
- Returns:
None
- Raises:
KeyError – If an address type name in the user object is not found in address_type_map.
- async map_patron_groups(user_obj, line_number)
Maps the patron group of a user object using the provided patron group map.
- Parameters:
user_obj (dict) – The user object to update.
patron_group_map (dict) – A dictionary mapping patron group names.
- Return type:
None
- Returns:
None
- async map_departments(user_obj, line_number)
Maps the departments of a user object using the provided department map.
- Parameters:
user_obj (
user_obj (dict) – The user object to update.
department_map (dict) – A dictionary mapping department names.
- Return type:
None
- Returns:
None
- async update_existing_user(user_obj, existing_user, protected_fields)
Updates an existing user with the provided user object.
- Parameters:
user_obj (dict) – The user object containing the updated user information.
existing_user (dict) – The existing user object to be updated.
protected_fields (dict) – A dictionary containing the protected fields and their values.
- Returns:
A tuple containing the updated existing user object and the API response.
- Return type:
tuple
- Raises:
None –
- async create_new_user(user_obj)
Creates a new user in the system.
- Parameters:
user_obj (dict) – A dictionary containing the user information.
- Returns:
A dictionary representing the response from the server.
- Return type:
dict
- Raises:
HTTPError – If the HTTP request to create the user fails.
- async set_preferred_contact_type(user_obj, existing_user)
Sets the preferred contact type for a user object. If the provided preferred contact type is not valid, the default preferred contact type is used, unless the previously existing user object has a valid preferred contact type set. In that case, the existing preferred contact type is used.
- Return type:
None
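The fallback order described for set_preferred_contact_type can be sketched as a pure function. Several details are assumptions: the helper name choose_preferred_contact_type is hypothetical, the value is assumed to live at personal.preferredContactTypeId, the set of valid values is taken from the Config literal for default_preferred_contact_type, and the default of "email" mirrors the CLI default. The real method mutates the user object and may also normalize names to codes.

```python
from typing import Any, Dict

VALID_CONTACT_TYPES = {
    "001", "002", "003", "004", "005",
    "mail", "email", "text", "phone", "mobile",
}


def choose_preferred_contact_type(
    user_obj: Dict[str, Any], existing_user: Dict[str, Any], default: str = "email"
) -> str:
    """Pick the contact type following the fallback order described above."""
    # Assumption: the value is stored under personal.preferredContactTypeId.
    provided = user_obj.get("personal", {}).get("preferredContactTypeId")
    if provided in VALID_CONTACT_TYPES:
        return provided
    existing = existing_user.get("personal", {}).get("preferredContactTypeId")
    if existing in VALID_CONTACT_TYPES:
        # An invalid incoming value falls back to a valid existing one first.
        return existing
    return default


incoming = {"personal": {"preferredContactTypeId": "fax"}}
stored = {"personal": {"preferredContactTypeId": "002"}}
chosen = choose_preferred_contact_type(incoming, stored)
```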
- async create_or_update_user(user_obj, existing_user, protected_fields, line_number)
Creates or updates a user based on the given user object and existing user.
- Parameters:
user_obj (dict) – The user object containing the user details.
existing_user (dict) – The existing user object to be updated, if available.
logs (dict) – A dictionary to keep track of the number of updates and failures.
- Returns:
The updated or created user object, or an empty dictionary if an error occurs.
- Return type:
dict
- async process_user_obj(user)
Process a user object. If no type is found in the source object, type is set to “patron”.
- Parameters:
user (str) – The user data to be processed, as a JSON string.
- Returns:
The processed user object.
- Return type:
dict
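The type-defaulting step can be illustrated with a short sketch. The helper parse_user_line is hypothetical, and treating an empty type the same as a missing one is an assumption; the real method may do additional processing.

```python
import json
from typing import Any, Dict


def parse_user_line(user: str) -> Dict[str, Any]:
    """Parse one JSON-lines entry and default the user type to "patron"."""
    user_obj = json.loads(user)
    # Assumption: an absent or empty type counts as "not found".
    if not user_obj.get("type"):
        user_obj["type"] = "patron"
    return user_obj


parsed = parse_user_line('{"username": "jdoe", "externalSystemId": "42"}')
```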
- async get_protected_fields(existing_user)
Retrieves the protected fields from the existing user object, combining both the customFields.protectedFields list and any fields_to_protect passed on the CLI.
- Parameters:
existing_user (dict) – The existing user object.
- Returns:
A dictionary containing the protected fields and their values.
- Return type:
dict
- async process_existing_user(user_obj)
Process an existing user.
- Parameters:
user_obj (dict) – The user object to process.
- Returns:
A tuple containing the request preference object (rp_obj), the service points user object (spu_obj), the existing user object, the protected fields, the existing request preference object (existing_rp), the existing permission user object (existing_pu), and the existing service-points-user object (existing_spu).
- Return type:
tuple
- async create_or_update_rp(rp_obj, existing_rp, new_user_obj)
Creates or updates a request preference object based on the given parameters.
- Parameters:
rp_obj (object) – A new request preference object.
existing_rp (object) – The existing request preference object, if it exists.
new_user_obj (object) – The new user object.
- Returns:
None
- async create_new_rp(new_user_obj)
Creates a new request preference for a user.
- Parameters:
new_user_obj (dict) – The user object containing the user’s ID.
- Raises:
HTTPError – If there is an error in the HTTP request.
- Returns:
None
- async update_existing_rp(rp_obj, existing_rp)
Updates an existing request preference with the provided request preference object.
- Parameters:
rp_obj (dict) – The request preference object containing the updated values.
existing_rp (dict) – The existing request preference object to be updated.
- Raises:
HTTPError – If the PUT request to update the request preference fails.
- Return type:
None
- Returns:
None
- async create_perms_user(new_user_obj)
Creates a permissions user object for the given new user.
- Parameters:
new_user_obj (dict) – A dictionary containing the details of the new user.
- Raises:
HTTPError – If there is an error while making the HTTP request.
- Return type:
None
- Returns:
None
- async delete_user(existing_user, existing_rp, existing_pu, existing_spu, line_number)
Deletes a user and associated objects.
- Parameters:
existing_user (dict) – The existing user object to be deleted.
existing_rp (dict) – The existing request preference object associated with the user.
existing_pu (dict) – The existing permission user object associated with the user.
existing_spu (dict) – The existing service points user object associated with the user.
line_number (int) – The line number in the input file for logging purposes.
- Return type:
None
- Returns:
None
- async process_line(user, line_number)
Process a single line of user data.
- Parameters:
user (str) – The user data to be processed.
logs (dict) – A dictionary to store logs.
- Return type:
None
- Returns:
None
- Raises:
Any exceptions that occur during the processing.
- async map_service_points(spu_obj, existing_user)
Maps the service points of a user object using the provided service point map.
- Parameters:
spu_obj (dict) – The service-points-user object to update.
existing_user (dict) – The existing user object associated with the spu_obj.
- Returns:
None
- async handle_service_points_user(spu_obj, existing_spu, existing_user)
Handles processing a service-points-user object for a user.
- Parameters:
spu_obj (dict) – The service-points-user object to process.
existing_spu (dict) – The existing service-points-user object, if it exists.
existing_user (dict) – The existing user object associated with the spu_obj.
- async get_existing_spu(existing_user)
Retrieves the existing service-points-user object for a given user.
- Parameters:
existing_user (
dict) – The existing user object.- Returns:
The existing service-points-user object.
- Return type:
dict
- async create_new_spu(spu_obj, existing_user)
Creates a new service-points-user object for a given user.
- Parameters:
spu_obj (
dict) – The service-points-user object to create.existing_user (
dict) – The existing user object.
- Returns:
None
- async update_existing_spu(spu_obj, existing_spu)
Updates an existing service-points-user object with the provided service-points-user object.
- Parameters:
spu_obj (
dict) – The service-points-user object containing the updated values.existing_spu (
dict) – The existing service-points-user object to be updated.
- Returns:
None
- async process_file(openfile)
Process the user object file.
- Parameters:
openfile (
TextIOWrapper) – The file or file-like object to process.- Return type:
None
- get_stats()
Get current import statistics.
- Return type:
UserImporterStats- Returns:
Current statistics
- folio_data_import.UserImport.main(config_file=None, *, gateway_url=None, tenant_id=None, username=None, password=None, library_name=None, user_file_paths=None, member_tenant_id=None, delete_all=False, fields_to_protect=None, update_only_present_fields=False, limit_async_requests=10, batch_size=250, report_file_base_path=None, user_match_key='externalSystemId', default_preferred_contact_type='email', no_progress=False, yes=False, debug=False)
Command-line interface to batch import users into FOLIO.
- Parameters:
config_file (
Path | None) – Path to a JSON configuration file. Overrides job configuration parameters if provided.gateway_url (
str) – The FOLIO gateway URL.tenant_id (
str) – The FOLIO tenant ID.username (
str) – The FOLIO username.password (
str) – The FOLIO password.library_name (
str) – The library name associated with the job.user_file_paths (
Tuple[Path,]) – Path(s) to the user data file(s). Use --user-file-paths or --user-file-path (deprecated, will be removed in future versions).member_tenant_id (
str) – The FOLIO ECS member tenant id (if applicable).delete_all (
bool) – Whether to delete existing users, rather than create/update.fields_to_protect (
str) – Comma-separated list of fields to protect during update.update_only_present_fields (
bool) – Whether to update only fields present in the input.limit_async_requests (
int) – The maximum number of concurrent async HTTP requests.batch_size (
int) – The number of users to process in each batch.report_file_base_path (
Path) – The base path for report files.user_match_key (
str) – The key to match users (externalSystemId, username, barcode).default_preferred_contact_type (
str) – The default preferred contact type for users.no_progress (
bool) – Whether to disable the progress bar.yes (
bool) – Skip confirmation prompt for destructive operations (e.g. --delete-all).debug (
bool) – Enable debug logging.
- Return type:
None
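The batch_size and limit_async_requests parameters describe batching and a cap on concurrent requests. The following is a minimal sketch of that general pattern using asyncio.Semaphore. It is not the importer's actual implementation; process_in_batches and ConcurrencyProbe are hypothetical names introduced for illustration only.

```python
import asyncio


async def process_in_batches(items, worker, batch_size=250, limit_async_requests=10):
    """Run worker(item) for every item, batch by batch, with bounded concurrency."""
    semaphore = asyncio.Semaphore(limit_async_requests)

    async def bounded(item):
        # At most `limit_async_requests` workers hold the semaphore at once.
        async with semaphore:
            return await worker(item)

    results = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        # Wait for the whole batch to finish before starting the next one.
        results.extend(await asyncio.gather(*(bounded(item) for item in batch)))
    return results


class ConcurrencyProbe:
    """Hypothetical stand-in worker that records peak concurrency."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def __call__(self, item):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0)  # yield control, as a real HTTP call would
        self.active -= 1
        return item * 2


probe = ConcurrencyProbe()
doubled = asyncio.run(
    process_in_batches(list(range(25)), probe, batch_size=10, limit_async_requests=3)
)
```

In this sketch the semaphore limits how many workers run at the same moment, while the batch loop limits how many are scheduled together.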
- folio_data_import.UserImport.pathify_user_file_paths(user_file_paths)
- async folio_data_import.UserImport.run_user_importer(importer, error_file_path)
Custom Exceptions#
Exception classes used throughout the toolkit.
Custom exceptions for the Folio Data Import module.
- exception folio_data_import.custom_exceptions.FolioDataImportError
Bases:
Exception
Base class for all exceptions in the Folio Data Import module.
- exception folio_data_import.custom_exceptions.FolioDataImportBatchError(batch_id, message, exception=None)
Bases:
FolioDataImportError
Exception raised for errors in the Folio Data Import batch process.
- batch_id -- ID of the batch that caused the error
- message -- explanation of the error
- exception folio_data_import.custom_exceptions.FolioDataImportJobError(job_id, message, exception=None)
Bases:
FolioDataImportError
Exception raised for errors in the Folio Data Import job process.
- job_id -- ID of the job that caused the error
- message -- explanation of the error
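Because both specific exceptions derive from FolioDataImportError, a caller can catch the base class to handle either one. The sketch below uses stand-in classes that mirror the documented names and constructor signature so that it runs without the package installed; in real code the classes would be imported from folio_data_import.custom_exceptions. The message format and the post_batch and safe_post helpers are assumptions made for illustration.

```python
class FolioDataImportError(Exception):
    """Stand-in for the documented base exception."""


class FolioDataImportBatchError(FolioDataImportError):
    """Stand-in mirroring the documented (batch_id, message, exception=None) signature."""

    def __init__(self, batch_id, message, exception=None):
        self.batch_id = batch_id
        self.message = message
        self.exception = exception
        # The string format below is an assumption, not the library's.
        super().__init__(f"Batch {batch_id}: {message}")


def post_batch(batch_id, records):
    """Hypothetical helper that fails on an empty batch."""
    if not records:
        raise FolioDataImportBatchError(batch_id, "no records to post")
    return len(records)


def safe_post(batch_id, records):
    """Return the record count, or a failure string if any toolkit error occurs."""
    try:
        return post_batch(batch_id, records)
    except FolioDataImportError as err:
        # Catching the base class also catches the batch-specific subclass.
        return f"failed: {err}"
```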
Progress Reporting#
Progress tracking and reporting utilities.
Progress reporting abstraction for FOLIO data import tasks.
This module provides a UI-agnostic progress reporting system that can be used across all import tasks (BatchPoster, UserImport, MARCDataImport, etc.) with support for multiple simultaneous tasks and easy backend swapping.
- class folio_data_import._progress.TaskStatus(*values)
Bases:
Enum
Status of a progress task.
- PENDING = 'pending'
- RUNNING = 'running'
- COMPLETED = 'completed'
- FAILED = 'failed'
- class folio_data_import._progress.ProgressReporter(*args, **kwargs)
Bases:
Protocol
Protocol defining the interface for progress reporters.
This protocol allows for easy swapping between different UI implementations (CLI, GUI, web) without changing the core business logic.
- start_task(name, total=None, description=None)
Start a new progress task.
- Parameters:
name (
str) – Unique identifier for the tasktotal (
int|None) – Total number of items to process (None for indeterminate)description (
str|None) – Human-readable task description
- Return type:
str- Returns:
Task ID that can be used to update this task
- update_task(task_id, advance=0, total=None, description=None, **stats)
Update an existing task’s progress and statistics.
- Parameters:
task_id (
str) – ID of the task to updateadvance (
int) – Number of items to advance bytotal (
int|None) – New total (if changed)description (
str|None) – New description (if changed)**stats (
Any) – Additional statistics to track (created, updated, failed, etc.)
- Return type:
None
- finish_task(task_id, status=TaskStatus.COMPLETED)
Mark a task as finished.
- Parameters:
task_id (
str) – ID of the task to finishstatus (
TaskStatus) – Final status of the task
- Return type:
None
- is_active()
Check if progress reporting is active.
- Return type:
bool
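Since ProgressReporter is a protocol, any object with matching methods should be usable as a reporter. The following is a minimal sketch of such an object that keeps task state in a dictionary. InMemoryProgressReporter is a hypothetical name; the use of plain status strings instead of TaskStatus and the meaning given to is_active (any task still running) are assumptions of this sketch.

```python
from typing import Any, Dict, Optional


class InMemoryProgressReporter:
    """Hypothetical reporter that keeps task state in a plain dict."""

    def __init__(self) -> None:
        self.tasks: Dict[str, Dict[str, Any]] = {}

    def start_task(self, name: str, total: Optional[int] = None,
                   description: Optional[str] = None) -> str:
        self.tasks[name] = {
            "completed": 0,
            "total": total,
            "description": description or name,
            "status": "running",
            "stats": {},
        }
        return name  # the task name doubles as the task ID in this sketch

    def update_task(self, task_id: str, advance: int = 0,
                    total: Optional[int] = None,
                    description: Optional[str] = None, **stats: Any) -> None:
        task = self.tasks[task_id]
        task["completed"] += advance
        if total is not None:
            task["total"] = total
        if description is not None:
            task["description"] = description
        task["stats"].update(stats)  # e.g. created, updated, failed

    def finish_task(self, task_id: str, status: str = "completed") -> None:
        self.tasks[task_id]["status"] = status

    def is_active(self) -> bool:
        return any(t["status"] == "running" for t in self.tasks.values())


reporter = InMemoryProgressReporter()
task_id = reporter.start_task("users", total=100)
reporter.update_task(task_id, advance=10, created=7, updated=3)
reporter.finish_task(task_id)
```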
- class folio_data_import._progress.BaseProgressReporter(enabled=True)
Bases:
ABC
Abstract base class for progress reporters.
Provides common functionality and enforces the interface contract.
- __init__(enabled=True)
Initialize the progress reporter.
- Parameters:
enabled (
bool) – Whether progress reporting is enabled
- abstractmethod start_task(name, total=None, description=None)
Start a new progress task.
- Return type:
str
- abstractmethod update_task(task_id, advance=0, total=None, description=None, **stats)
Update an existing task.
- Return type:
None
- abstractmethod finish_task(task_id, status=TaskStatus.COMPLETED)
Finish a task.
- Return type:
None
- is_active()
Check if reporter is active.
- Return type:
bool
- get_stats(task_id)
Get statistics for a task.
- Return type:
dict[str,Any] |None
- abstractmethod __enter__()
Enter context manager.
- Return type:
BaseProgressReporter
- abstractmethod __exit__(exc_type, exc_val, exc_tb)
Exit context manager.
- Return type:
None
- class folio_data_import._progress.ItemsPerSecondColumn(table_column=None)
Bases:
ProgressColumn
Renders the speed in items per second.
- render(task)
Should return a renderable object.
- Return type:
Text
- class folio_data_import._progress.UserStatsColumn(table_column=None)
Bases:
ProgressColumn
- render(task)
Should return a renderable object.
- Return type:
Text
- class folio_data_import._progress.BatchPosterStatsColumn(table_column=None)
Bases:
ProgressColumn
Renders statistics for batch posting operations.
- render(task)
Should return a renderable object.
- Return type:
Text
- class folio_data_import._progress.GenericStatsColumn(table_column=None)
Bases:
ProgressColumn
Renders generic statistics for any task.
The stat_configs class attribute can be customized by subclassing or direct assignment to change which stats are displayed and their styling.
Example:
    # Customize via subclass
    class CustomStatsColumn(GenericStatsColumn):
        stat_configs = [
            ("imported", "Imported", "bright_blue"),
            ("skipped", "Skipped", "yellow"),
            ("failed", "Failed", "red"),
        ]

    # Or modify directly
    GenericStatsColumn.stat_configs.append(("custom_stat", "Custom", "magenta"))
- stat_configs: list[tuple[str, str, str]] = [('posted', 'Posted', 'bright_green'), ('created', 'Created', 'green'), ('updated', 'Updated', 'cyan'), ('deleted', 'Deleted', 'yellow'), ('failed', 'Failed', 'red'), ('processed', 'Processed', 'blue')]
- render(task)
Render statistics based on configured stats.
- Return type:
Text
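To illustrate how a list of (key, label, style) tuples can drive a display, here is a minimal sketch that builds a plain-text summary from the default stat_configs values listed above. It is not the library's render method, which returns a Rich Text object. The format_stats helper, the " | " separator, and the choice to skip stats that were never reported are assumptions of this sketch, and the style element is ignored.

```python
STAT_CONFIGS = [
    ("posted", "Posted", "bright_green"),
    ("created", "Created", "green"),
    ("updated", "Updated", "cyan"),
    ("deleted", "Deleted", "yellow"),
    ("failed", "Failed", "red"),
    ("processed", "Processed", "blue"),
]


def format_stats(fields, stat_configs=STAT_CONFIGS):
    """Build a plain-text summary from (key, label, style) tuples."""
    parts = []
    for key, label, _style in stat_configs:
        # Skip stats the task never reported.
        if key in fields:
            parts.append(f"{label}: {fields[key]}")
    return " | ".join(parts)


summary = format_stats({"created": 5, "failed": 1, "posted": 6})
```

The output order follows the configuration list rather than the order of the input dictionary, which is why reordering or extending stat_configs changes the display.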
- class folio_data_import._progress.RichProgressReporter(enabled=True, show_speed=True, show_time=True)
Bases:
BaseProgressReporter
Rich terminal-based progress reporter.
Provides a beautiful CLI progress display using the Rich library with support for multiple simultaneous tasks, live updates, and logging.
- __init__(enabled=True, show_speed=True, show_time=True)
Initialize the Rich progress reporter.
- Parameters:
enabled (
bool) – Whether progress reporting is enabledshow_speed (
bool) – Whether to show items/secondshow_time (
bool) – Whether to show elapsed/remaining time
- start_task(name, total=None, description=None)
Start a new progress task.
- Return type:
str
- update_task(task_id, advance=0, total=None, description=None, **stats)
Update an existing task.
- Return type:
None
- finish_task(task_id, status=TaskStatus.COMPLETED)
Finish a task.
- Return type:
None
- __enter__()
Enter context manager.
- Return type:
RichProgressReporter
- __exit__(exc_type, exc_val, exc_tb)
Exit context manager.
- Return type:
None
- class folio_data_import._progress.RedisProgressReporter(enabled=True, redis_url='redis://localhost:6379', session_id=None, ttl=3600)
Bases:
BaseProgressReporter
Progress reporter that stores state in Redis for distributed access.
Stores progress updates in Redis that can be accessed by separate processes or API endpoints. Requires redis package to be installed.
Example
    >>> reporter = RedisProgressReporter(
    ...     redis_url="redis://localhost:6379",
    ...     session_id="import-123"
    ... )
    >>> with reporter:
    ...     task_id = reporter.start_task("users", total=100)
    ...     reporter.update_task(task_id, advance=10, created=5)
- __init__(enabled=True, redis_url='redis://localhost:6379', session_id=None, ttl=3600)
Initialize the Redis progress reporter.
- Parameters:
enabled (
bool) – Whether progress reporting is enabledredis_url (
str) – Redis connection URLsession_id (
str|None) – Unique identifier for this progress sessionttl (
int) – Time-to-live for session data in seconds (default: 1 hour)
- Raises:
ImportError – If redis package is not installed
- start_task(name, total=None, description=None)
Start a new progress task.
- Return type:
str
- update_task(task_id, advance=0, total=None, description=None, **stats)
Update an existing task.
- Return type:
None
- finish_task(task_id, status=TaskStatus.COMPLETED)
Finish a task.
- Return type:
None
- classmethod get_session(session_id, redis_url='redis://localhost:6379')
Get the current state of a session from Redis.
- Parameters:
session_id (
str) – The session ID to retrieveredis_url (
str) – Redis connection URL
- Return type:
dict[str,Any] |None- Returns:
Session data dictionary or None if not found
- Raises:
ImportError – If redis package is not installed
- classmethod delete_session(session_id, redis_url='redis://localhost:6379')
Delete a session from Redis.
- Parameters:
session_id (
str) – The session ID to deleteredis_url (
str) – Redis connection URL
- Return type:
bool- Returns:
True if deleted, False if not found
- Raises:
ImportError – If redis package is not installed
- __enter__()
Enter context manager.
- Return type:
RedisProgressReporter
- __exit__(exc_type, exc_val, exc_tb)
Exit context manager.
- Return type:
None
- class folio_data_import._progress.NoOpProgressReporter
Bases:
BaseProgressReporter
No-operation progress reporter for when progress display is disabled.
- __init__()
Initialize the no-op reporter.
- start_task(name, total=None, description=None)
Start a task (no-op).
- Return type:
str
- update_task(task_id, advance=0, total=None, description=None, **stats)
Update a task (no-op).
- Return type:
None
- finish_task(task_id, status=TaskStatus.COMPLETED)
Finish a task (no-op).
- Return type:
None
- __enter__()
Enter context manager.
- Return type:
NoOpProgressReporter
- __exit__(exc_type, exc_val, exc_tb)
Exit context manager.
- Return type:
None
MARC Preprocessors#
MARC record preprocessor functions.
- class folio_data_import.marc_preprocessors._preprocessors.MARCPreprocessor(preprocessors, **kwargs)
Bases:
object
A class to preprocess MARC records for data import into FOLIO.
- __init__(preprocessors, **kwargs)
Initialize the MARCPreprocessor with a list of preprocessors.
- Parameters:
preprocessors (
Union[str,List[Callable]]) – A string of comma-separated function names or a list of callable preprocessor functions to apply.
- do_work(record)
Preprocess the MARC record.
- Return type:
Record
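The constructor accepts either a comma-separated string of function names or a list of callables, and do_work applies them to a record. A minimal sketch of that kind of pipeline follows. It uses plain dictionaries in place of pymarc Record objects so that it runs on its own. PreprocessorPipeline, REGISTRY, add_prefix, and drop_empty are hypothetical names, and passing the same keyword arguments to every function is an assumption, not a statement about how MARCPreprocessor resolves names or arguments.

```python
def add_prefix(record, prefix="", **kwargs):
    """Hypothetical preprocessor: prefix the '001' value of a dict-based record."""
    record["001"] = f"{prefix}{record['001']}"
    return record


def drop_empty(record, **kwargs):
    """Hypothetical preprocessor: remove entries whose value is empty."""
    return {tag: value for tag, value in record.items() if value}


# Name-to-function lookup used when preprocessors are given as a string.
REGISTRY = {"add_prefix": add_prefix, "drop_empty": drop_empty}


class PreprocessorPipeline:
    def __init__(self, preprocessors, **kwargs):
        if isinstance(preprocessors, str):
            names = [n.strip() for n in preprocessors.split(",") if n.strip()]
            preprocessors = [REGISTRY[n] for n in names]
        self.preprocessors = list(preprocessors)
        self.kwargs = kwargs

    def do_work(self, record):
        # Apply each preprocessor in order, feeding the result to the next.
        for func in self.preprocessors:
            record = func(record, **self.kwargs)
        return record


pipeline = PreprocessorPipeline("add_prefix, drop_empty", prefix="(PPN)")
result = pipeline.do_work({"001": "12345", "245": "Title", "500": ""})
```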
- folio_data_import.marc_preprocessors._preprocessors.prepend_prefix_001(record, prefix)
Prepend a prefix to the record’s 001 field.
- Parameters:
record (
Record) – The MARC record to preprocess.prefix (
str) – The prefix to prepend to the 001 field.
- Returns:
The preprocessed MARC record.
- Return type:
Record
- folio_data_import.marc_preprocessors._preprocessors.prepend_ppn_prefix_001(record, **kwargs)
Prepend the PPN prefix to the record’s 001 field. Useful when importing records from the ABES SUDOC catalog.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
- folio_data_import.marc_preprocessors._preprocessors.prepend_abes_prefix_001(record, **kwargs)
Prepend the ABES prefix to the record’s 001 field. Useful when importing records from the ABES SUDOC catalog.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
- folio_data_import.marc_preprocessors._preprocessors.strip_999_ff_fields(record, **kwargs)
Strip all 999 fields with ff indicators from the record. Useful when importing records exported from another FOLIO system.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
- folio_data_import.marc_preprocessors._preprocessors.clean_999_fields(record, **kwargs)
The presence of 999 fields, with or without ff indicators, can cause issues with data import mapping in FOLIO. This function calls strip_999_ff_fields to remove 999 fields with ff indicators and then copies the remaining 999 fields to 945 fields.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
- folio_data_import.marc_preprocessors._preprocessors.clean_non_ff_999_fields(record, **kwargs)
When loading migrated MARC records from folio_migration_tools, the presence of 999 fields other than those set by the migration process can cause the record to fail to load properly. This preprocessor function moves all 999 fields with non-ff indicators to 945 fields with 99 indicators.
- Return type:
Record
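The retagging described above can be sketched with plain dictionaries standing in for MARC fields. This is an illustration of the rule only, not the function's implementation, which works on pymarc records. The move_non_ff_999 name, the dictionary layout, and keeping each field at its original position are assumptions.

```python
def move_non_ff_999(fields):
    """Retag 999 fields whose indicators are not ('f', 'f') as 945 with ('9', '9')."""
    result = []
    for field in fields:
        if field["tag"] == "999" and field["indicators"] != ("f", "f"):
            # Copy so the caller's field dict is left untouched.
            field = {**field, "tag": "945", "indicators": ("9", "9")}
        result.append(field)
    return result


fields = [
    {"tag": "245", "indicators": ("1", "0"), "subfields": {"a": "Title"}},
    {"tag": "999", "indicators": ("f", "f"), "subfields": {"i": "instance-uuid"}},
    {"tag": "999", "indicators": (" ", " "), "subfields": {"a": "local note"}},
]
cleaned = move_non_ff_999(fields)
```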
- folio_data_import.marc_preprocessors._preprocessors.sudoc_supercede_prep(record, **kwargs)
Preprocesses a record from the ABES SUDOC catalog to copy 035 fields with a $9 subfield value of ‘sudoc’ to 935 fields with a $a subfield prefixed with “(ABES)”. This is useful when importing newly-merged records from the SUDOC catalog when you want the new record to replace the old one in FOLIO. This also applies the prepend_ppn_prefix_001 function to the record.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
- folio_data_import.marc_preprocessors._preprocessors.clean_empty_fields(record, **kwargs)
Remove empty fields and subfields from the record. These can cause data import mapping issues in FOLIO. Removals are logged at custom log level 26, which is used by folio_migration_tools to populate the data issues report.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
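Messages logged at a custom numeric level can be captured separately from ordinary log output. The sketch below shows one way to collect only level 26 messages using the standard logging module. The level name "DATA_ISSUES", the logger name, and the ListHandler and ExactLevelFilter classes are assumptions introduced for illustration; only the number 26 comes from the description above.

```python
import logging

DATA_ISSUES_LEVEL = 26  # the custom level named in the description above
logging.addLevelName(DATA_ISSUES_LEVEL, "DATA_ISSUES")  # level name is an assumption


class ListHandler(logging.Handler):
    """Collect formatted messages in memory."""

    def __init__(self, level):
        super().__init__(level)
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


class ExactLevelFilter(logging.Filter):
    """Let through only records logged at exactly one level."""

    def __init__(self, level):
        super().__init__()
        self.level = level

    def filter(self, record):
        return record.levelno == self.level


logger = logging.getLogger("preprocessor_sketch")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.propagate = False

handler = ListHandler(DATA_ISSUES_LEVEL)
handler.addFilter(ExactLevelFilter(DATA_ISSUES_LEVEL))
logger.addHandler(handler)

logger.info("routine message")
logger.log(DATA_ISSUES_LEVEL, "Empty subfield removed: 245 $b")
logger.warning("unrelated warning")
```

Level 26 sits between INFO (20) and WARNING (30), so a handler set to that level would also receive warnings unless an exact-level filter like the one here is attached.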
- folio_data_import.marc_preprocessors._preprocessors.fix_bib_leader(record, **kwargs)
Fixes the leader of the record by setting the record status to ‘c’ (modified record) and the type of record to ‘a’ (language material).
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
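In a MARC 21 leader, character position 05 holds the record status and position 06 holds the type of record. The following sketch applies the change described above to a plain 24-character leader string. It is an illustration only: fix_leader is a hypothetical helper, and the real function operates on a pymarc Record rather than a string.

```python
def fix_leader(leader: str) -> str:
    """Set Leader/05 (record status) to 'c' and Leader/06 (type of record) to 'a'."""
    if len(leader) != 24:
        raise ValueError("a MARC leader is 24 characters long")
    # Keep positions 00-04 and 07-23, replace positions 05 and 06.
    return leader[:5] + "ca" + leader[7:]


fixed = fix_leader("00714nxm a2200205 a 4500")
```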
- folio_data_import.marc_preprocessors._preprocessors.move_authority_subfield_9_to_0_all_controllable_fields(record, **kwargs)
Move subfield 9 from authority fields to subfield 0. This is useful when importing records from the ABES SUDOC catalog.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
- folio_data_import.marc_preprocessors._preprocessors.mark_deleted(record, **kwargs)
Mark the record as deleted by setting the record status to ‘d’.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
- folio_data_import.marc_preprocessors._preprocessors.remove_non_numeric_fields(record, **kwargs)
Remove all fields from the record that have non-numeric tags (not matching pattern 001-999). Also removes field 000, which is invalid in MARC records.
- Parameters:
record (
Record) – The MARC record to preprocess.- Returns:
The preprocessed MARC record.
- Return type:
Record
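The tag rule described above (keep only three-digit tags from 001 to 999) can be expressed as a small predicate. This is a sketch of the rule on plain tag strings, not the function's implementation; keep_tag and NUMERIC_TAG are hypothetical names.

```python
import re

# Exactly three ASCII digits.
NUMERIC_TAG = re.compile(r"[0-9]{3}")


def keep_tag(tag: str) -> bool:
    """Return True for tags 001-999, False for 000 and non-numeric tags."""
    return NUMERIC_TAG.fullmatch(tag) is not None and tag != "000"


tags = ["000", "001", "245", "LDR", "9XX", "999", "24a"]
kept = [tag for tag in tags if keep_tag(tag)]
```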
- folio_data_import.marc_preprocessors._preprocessors.ordinal(n)
- Return type:
str