User Guide
Introduction
About NLNZ Tools Scripts Ingestion
NLNZ Tools Scripts Ingestion is a set of scripts related to the processing of SIPs for ingestion into the Rosetta archiving system. The aim is to provide useful tools to help in that processing.
Contents of this document
After this introduction, this User Guide contains the following sections:
- Fairfax ingestion related scripts - Covers Fairfax ingestion related scripts.
- Reports scripts - Covers reports-related scripts.
- Utilities scripts - Covers useful utility scripts.
- Running requirements - Covers running requirements.
Reports scripts
reports/daily-file-usage-report.py
Provides a daily usage report of a set of subfolders of a given root folder.
Arguments
-h, --help show this help message and exit
--source_folder SOURCE_FOLDER
The root source-folder for the report.
--reports_folder REPORTS_FOLDER
The folder where reports exist and get written.
--number_previous_days NUMBER_PREVIOUS_DAYS
The number of previous days to include in the report.
The default is 0.
--create_reports_folder
Indicates that the reports folder will get created. Otherwise it must already exist.
--include_file_details_in_console_output
Indicates that individual file details will output to the console as well as the reports file.
--calculate_md5_hash Calculate and report the md5 hash of individual files (this is a very intensive I/O operation).
--include_dot_directories
Include first-level root subdirectories that start with a '.'
--ignore_unchanged_directories
Do not report changes for directories that haven't changed.
--verbose Indicates that operations will be done in a verbose manner.
NOTE: This means that no csv report file will be generated.
--debug Indicates that operations will include debug output.
--test Indicates that only tests will be run.
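The --calculate_md5_hash option is described as I/O intensive because hashing requires reading every byte of every file. A minimal sketch of chunked MD5 hashing with the standard library is shown here for illustration. The function name `md5_of_file` and the chunk size are assumptions, and this is not taken from the script itself.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so that large files are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Even with chunked reads the whole file is still read from disk, which is why the option is best left off for large source folders unless the hashes are needed.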
Usage
daily-file-usage-report.py [-h] --source_folder SOURCE_FOLDER
--reports_folder REPORTS_FOLDER
[--number_previous_days NUMBER_PREVIOUS_DAYS]
[--create_reports_folder]
[--include_file_details_in_console_output]
[--calculate_md5_hash]
[--include_dot_directories]
[--ignore_unchanged_directories] [--verbose]
[--debug] [--test]
Example usage
scriptsFolder="/go/repos-nlnzdigitalpreservation/nlnz-tools-scripts-ingestion/reports"
sourceFolder="/media/legaldep-ftp"
reportsFolder="/media/sf_a-laptop-shared-work/ftp-daily-usage-reports"
${scriptsFolder}/daily-file-usage-report.py \
--source_folder "${sourceFolder}" \
--reports_folder "${reportsFolder}" \
--ignore_unchanged_directories \
--number_previous_days 21
Report output
The console output of the report can be saved as a csv file. A csv file is also generated in the reports_folder;
it contains a detailed listing of the source folders. This report csv file is then used as input for the
next report, as long as it was generated within the number_previous_days.
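The reuse of an earlier report can be pictured as a lookup for the newest csv file that falls inside the number_previous_days window. The sketch below is only an illustration under stated assumptions: it uses the file modification time as the "generated" date and accepts any file ending in .csv, whereas the real script may use a naming convention or a different date source. The function name `find_previous_report` is hypothetical.

```python
import os
import time


def find_previous_report(reports_folder, number_previous_days, now=None):
    """Return the newest .csv in reports_folder within the window, or None.

    Assumption: 'generated within number_previous_days' is approximated by
    the file's modification time. The real script may decide differently.
    """
    now = time.time() if now is None else now
    cutoff = now - number_previous_days * 24 * 60 * 60
    candidates = []
    for name in os.listdir(reports_folder):
        path = os.path.join(reports_folder, name)
        if name.lower().endswith(".csv") and os.path.isfile(path):
            modified = os.path.getmtime(path)
            if modified >= cutoff:
                candidates.append((modified, path))
    # The newest qualifying report wins; None means there is nothing to compare against.
    return max(candidates)[1] if candidates else None
```

With this reading, a larger number_previous_days makes it more likely that an earlier report is found and used for comparison.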
Utilities scripts
utilities/bulk-file-rename.py
Simple utility for renaming files in bulk.
Arguments
-h, --help show this help message and exit
--source_folder SOURCE_FOLDER
The root source-folder containing the files to rename.
--file_name_portion_to_replace FILE_NAME_PORTION_TO_REPLACE
The portion of the filename that will be replaced.
--file_name_portion_replacement FILE_NAME_PORTION_REPLACEMENT
The replacement portion of the filename. If not specified, then an empty string is used.
--verbose Indicates that operations will be done in a verbose
manner. NOTE: This means that no csv report file will
be generated.
--debug Indicates that operations will include debug output.
--test Indicates that only tests will be run.
Usage
bulk-file-rename.py [-h] --source_folder SOURCE_FOLDER
--file_name_portion_to_replace FILE_NAME_PORTION_TO_REPLACE
[--file_name_portion_replacement FILE_NAME_PORTION_REPLACEMENT]
[--verbose] [--debug] [--test]
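The effect of the two filename arguments can be illustrated with a short sketch. It is not the script's source: the function name `bulk_rename`, the recursive walk of subfolders, and the `dry_run` flag are all assumptions made for the example.

```python
import os


def bulk_rename(source_folder, portion_to_replace, replacement="", dry_run=True):
    """Rename files under source_folder, returning (old_path, new_path) pairs.

    Hypothetical helper: recursion into subfolders and the dry_run flag
    are assumptions, not documented behaviour of bulk-file-rename.py.
    """
    if not portion_to_replace:
        raise ValueError("portion_to_replace must not be empty")
    renamed = []
    for folder, _subfolders, file_names in os.walk(source_folder):
        for file_name in file_names:
            if portion_to_replace not in file_name:
                continue
            # An empty replacement simply removes the matched portion.
            new_name = file_name.replace(portion_to_replace, replacement)
            old_path = os.path.join(folder, file_name)
            new_path = os.path.join(folder, new_name)
            if not dry_run:
                os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Calling the function with `dry_run=True` first lists the planned renames without touching any file, which is a cautious way to check the effect of a replacement before applying it.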
utilities/bulk-file-replace.groovy
Replaces a set of files that match a given regex with a replacement file. Use of this script may require editing of the groovy file. So far the script has been used to bulk-replace test PDF files that have the same hash but different names.
Arguments
targetFolder the target folder containing the files that will be matched.
Note that all the files in the target folder will be checked
(i.e. subdirectories will be searched as well).
replacementFile the file to replace the matched file with. The replacement file will be copied
over the matching file.
Edited values
These are values that require editing in the groovy script itself.
regexPattern - the pattern used to match the target file
expectedMd5Hash - the MD5 hash of the target file.
Usage example
utilities/bulk-file-replace.groovy /path/to/target/folder utilities/resources/minimal-jhove-acceptable.pdf
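The interplay of targetFolder, replacementFile, regexPattern and expectedMd5Hash can be sketched as follows. The sketch is written in Python rather than Groovy and is not a port of the script. It assumes that the pattern is matched against the file name only, and that a file is replaced only when its MD5 hash also equals the expected hash. The function name `replace_matching_files` is hypothetical.

```python
import hashlib
import os
import re
import shutil


def replace_matching_files(target_folder, replacement_file, regex_pattern, expected_md5_hash):
    """Copy replacement_file over matching files and return the replaced paths.

    Assumptions: the regex is applied to the file name (not the full path),
    and both the name and the MD5 hash must match before a file is replaced.
    """
    pattern = re.compile(regex_pattern)
    replacement_path = os.path.abspath(replacement_file)
    replaced = []
    # os.walk also visits subdirectories of the target folder.
    for folder, _subfolders, file_names in os.walk(target_folder):
        for file_name in file_names:
            path = os.path.join(folder, file_name)
            if os.path.abspath(path) == replacement_path:
                continue  # never copy the replacement file over itself
            if not pattern.match(file_name):
                continue
            with open(path, "rb") as handle:
                actual_hash = hashlib.md5(handle.read()).hexdigest()
            if actual_hash == expected_md5_hash:
                shutil.copyfile(replacement_file, path)
                replaced.append(path)
    return replaced
```

Checking the hash as well as the name guards against overwriting a file that happens to match the pattern but has different content.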
Running requirements
Python-based scripts
Those scripts with a .py extension are Python-based scripts. Currently these scripts run with Python 2.7. None
of the scripts have been upgraded to Python 3.
Groovy-based scripts
Those scripts with a .groovy extension are Groovy-based. Currently these scripts run with Groovy 2.5.4 or later and
Java OpenJDK 11.
Operating system
These scripts have only been tested and run on Ubuntu Linux 18.