Class DerbyConsumer
- All Implemented Interfaces:
IDataConsumer
public class DerbyConsumer extends Object implements IDataConsumer
Derby is an Open Source Apache Software Foundation database. It runs without installation, and has a small footprint, making it a good choice for bundling with applications. For more information see the Derby website.
This class works with an IDataset (e.g. directory) containing one
or more IRecordHolders (e.g. files) containing
IRecords (e.g. rows) of values. It writes these to a
store (in this case a database) as record stores (tables), along with
an appropriate set of metadata tables.
IReportingListener objects may be registered with objects of
this class to receive suitable progress reporting and messaging. In general,
exceptions not dealt with internally are re-thrown for calling objects to
deal with. Messages are user-friendly.
If developers have a specific database and set of table names
they'd like to use, they should first call setStore
and setRecordStoreNames. If these are missing when
initialise is finally called, initialise
will build a directory in the user's space (called after
defaultDatabaseDirectory in application.properties)
within which will be a database named after the dataset or
DEFAULT and tables named after the record holders or
DEFAULTx where x is an integer.
Either way, all developers should call initialise with
a dataset that at least has the fields and fieldtypes in it. This will
build the database structures with headers and set up progress monitoring.
They can then get data written to the database by either calling
bulkLoad with a complete dataset object or
load with a collection of records.
Note that because instance variables will hold a wide variety of information on pervious writes, it is essential that for each new set of files / dataset a new instance of this class is used.
- Author:
- Andy Evans
- Version: 1.0 01 Mar 2021
-
Field Summary
Fields Modifier and Type Field Description private intchunkRecordsByDeprecated.private ConnectionconnectionConnection for SQL requests.private booleanCREATE_SMTConstant for sql statement running.private booleandebugDebugging flag, set by System variable passed in-Ddebug=truerather than setting here / with accessor.private StringDEF_DB_DIRDefault directory to create within store.private StringDEF_FILE_DIRDefault directory for temp files.private booleanINSERT_SMTConstant for sql statement running.private booleanMETADATA_TABLEConstant proveLoaded.private booleanNORMAL_TABLEConstant proveLoaded.private intprogressUsed for progress monitoring.private ArrayList<String>recordStoreNamesRecord store (table) names used.private ArrayList<IReportingListener>reportingListenersListeners interested in updates on progress.private intSANITISE_NAMEConstant for database object name sanitisation.private intSANITISE_VIGOROUSConstant for aggresive sanitisation.private intSANITISE_WEAKConstant for weak sanitisation to remove a few problematic chars.private StringstorePath to database used. -
Constructor Summary
Constructors Constructor Description DerbyConsumer()Default constructor. -
Method Summary
Modifier and Type Method Description voidaddReportingListener(IReportingListener reportingListener)For objects wishing to get progress reports on data reading.private voidbuildMetadataTable(IMetadata metadata, String metadataTableName)Sets up a metadata table for each table.private voidbuildTable(IDataset dataset, int index)Sets up the relevant database table.voidbulkLoad(IDataset dataset)This method bulk-loads a whole dataset into the database the quickest (but memory hungry) way possible - via a CSV file.voidconnectStore()Connects to current store / database.voiddisconnectStore()Disconnects from current store / database and garbage collects.protected StringfindTitle(IMetadata metadata)Finds the title category in an unknown IMetadata object.private voidgapFillLocalisedGUIText()Sets the defaults for warnings and exceptions in English if an appropriate language properties file is missing.ArrayList<String>getRecordStoreNames()Gets the record store names.StringgetStore()Gets the location for the store / database.voidinitialise(IDataset dataset)Sets up the data store (database and tables).voidload(IDataset dataset)Deprecated.This method adds a whole dataset, using INSERT to add a record at a time.voidload(ArrayList<IRecord> records)Adds multiple records to current database.private voidproveLoaded(String tableName, boolean metadataTable)Deprecated.Not needed for this class, but a convenient add-in for proving it works without the need for unit testing or database interrogation software.voidreportMessage(String message)Reports message to reportingListeners.voidreportProgress(int progress, int total)Reports progress to reportingListeners.voidreportProgress(int progress, IDataset dataset)Reports progress to reportingListeners.private voidrunStatement(String sql, String tableName, boolean statementType)Runs SQL statements and closes them.protected Stringsanitise(String string, int level)Sanitises Strings.voidsetRecordStoreNames(ArrayList<String> recordStoreNames)Sets the record store (table) names for the store (database).voidsetStore(String store)Sets the location for the store / database.private voidstoreRecords(ArrayList<IRecord> records)INSERTs a collection of records into the database.
-
Field Details
-
connection
Connection for SQL requests. -
debug
private boolean debugDebugging flag, set by System variable passed in-Ddebug=truerather than setting here / with accessor. -
store
Path to database used. -
recordStoreNames
Record store (table) names used. -
DEF_DB_DIR
Default directory to create within store.- See Also:
- Constant Field Values
-
DEF_FILE_DIR
Default directory for temp files.- See Also:
- Constant Field Values
-
SANITISE_VIGOROUS
private final int SANITISE_VIGOROUSConstant for aggresive sanitisation.- See Also:
- Constant Field Values
-
SANITISE_NAME
private final int SANITISE_NAMEConstant for database object name sanitisation.- See Also:
- Constant Field Values
-
SANITISE_WEAK
private final int SANITISE_WEAKConstant for weak sanitisation to remove a few problematic chars.- See Also:
- Constant Field Values
-
INSERT_SMT
private final boolean INSERT_SMTConstant for sql statement running.- See Also:
- Constant Field Values
-
CREATE_SMT
private final boolean CREATE_SMTConstant for sql statement running.- See Also:
- Constant Field Values
-
METADATA_TABLE
private final boolean METADATA_TABLEConstant proveLoaded.- See Also:
- Constant Field Values
-
NORMAL_TABLE
private final boolean NORMAL_TABLEConstant proveLoaded.- See Also:
- Constant Field Values
-
progress
private int progressUsed for progress monitoring. -
reportingListeners
Listeners interested in updates on progress. -
chunkRecordsBy
Deprecated.Seeload(IDataset dataset)docs for details of deprecation.Used in memory management. Pushing an entire dataset to Derby via INSERTs risks the database holding very large data structures in memory, so we chunk it up where this is the case. This would increase processing time, but there's a time gain from not hitting memory limits; profiling suggests an optimal at ~2000 records for a 8Gb PC, but with a wide enough optimal range that there is little point in, e.g. setting this in-run using a greedy algorithm.- See Also:
load(IDataset dataset)
-
-
Constructor Details
-
DerbyConsumer
public DerbyConsumer()Default constructor.
-
-
Method Details
-
initialise
Sets up the data store (database and tables).If a store (database) path and/or record store (table) names haven't been set using the
setSource/setRecordStoreNamesmethods, this method deals with this. It pulls the titles from the dataset passed in for database name and from the dataset record holders for table names. The default location for the database is a directory within the user's home directory (see class docs). All names and paths are sanitised.It then creates the relevant database and tables, or connects to the database if it already exists. The connection remains until
disconnectStorecalled.To reset objects of this class, be sure to call
setStoreand theninitialise- just callinginitialisewill result in recursion of the store directories.- Specified by:
initialisein interfaceIDataConsumer- Parameters:
dataset- Dataset to store - note that this need not be filled with records as long as it has recordHolders, metadata, and field data.- Throws:
DBCreationException- If issues arise.
-
buildTable
Sets up the relevant database table.- Parameters:
dataset- The dataset containing the table to be built. Record holders can be empty of data at this point, but must contain field information and metadata.index- The index of the table to build inrecordStoreNames.- Throws:
DBCreationException- If there's an issue creating table.
-
buildMetadataTable
private void buildMetadataTable(IMetadata metadata, String metadataTableName) throws DBCreationExceptionSets up a metadata table for each table.- Parameters:
metadata- The metadata to be written to the metadata table.metadataTableName- The name of the metadata table.- Throws:
DBCreationException- If there's an issue making table.
-
bulkLoad
This method bulk-loads a whole dataset into the database the quickest (but memory hungry) way possible - via a CSV file.It will attempt to write the dataset as one or more temp files to the user's home space, and then load the files into the database. For a file of 60000 records, this is an order of magnitude faster than loading by INSERTs, but it does require write access.
initialisemust have been called first to prepare the database.This method calls
disconnectStorewhen done to clean up and garbage collect.- Specified by:
bulkLoadin interfaceIDataConsumer- Parameters:
dataset- The dataset to load.- Throws:
DBCreationException- If there is an issue loading.- To Do:
- Remove code that proves table filled. This is just for this specific application.
-
load
Adds multiple records to current database.This little at the moment than call
storeRecordswith the records, but we keep it as a separate method as it acts as a gateway for data in the supplier/consumer push model and we may want to add functionality to the gateway in the future.Users of the push model should call
disconnectStorewhen done, as the consumer keeps the connection open under this model.- Specified by:
loadin interfaceIDataConsumer- Parameters:
records- ArrayList of records.- Throws:
DBCreationException- Only if there is an issue.
-
storeRecords
INSERTs a collection of records into the database.This is very slow compared with bulk loading, but used judiciously can use less memory. A 60,000 4-value records file loaded 120 records at a time used 10% of the memory of bulk-loading, but took an order of magnitude more time (30 mins).
- Parameters:
records- ArrayList of records to write.- Throws:
DBCreationException- Only if there is an issue.
-
setStore
Sets the location for the store / database.If not set the parameters can be set from the dataset via the
initialisemethod.Sanitises input with SANITISE_WEAK.
- Specified by:
setStorein interfaceIDataConsumer- Parameters:
store- File path of database to connect to.- Throws:
DBCreationException- Not used in this implementation.- See Also:
initialise(IDataset dataset)
-
getStore
Gets the location for the store / database.For testing.
- Returns:
- String Store location.
-
setRecordStoreNames
Sets the record store (table) names for the store (database).If not set the parameters can be set from the dataset via the
initialisemethod.Sanitises input with SANITISE_NAME.
If set here directly (for example, the user enters the name/s) it's the implementers responsibility to make sure the number of record stores named and number supplied in the dataset to
initialisematch up orinitialisewill throw an exception.- Specified by:
setRecordStoreNamesin interfaceIDataConsumer- Parameters:
recordStoreNames- Names of record stores (tables) to make.- Throws:
DBCreationException- Not used in this implementation.- See Also:
initialise(IDataset dataset)
-
getRecordStoreNames
Gets the record store names.For testing.
- Returns:
- ArrayList Record store names.
-
addReportingListener
For objects wishing to get progress reports on data reading.- Specified by:
addReportingListenerin interfaceIDataConsumer- Parameters:
reportingListener- Object wishing to gain reports.- See Also:
IReportingListener
-
findTitle
Finds the title category in an unknown IMetadata object.If it doesn't exist, defaults to "DEFAULT".
- Parameters:
metadata- Metadata object of unknown schema.- Returns:
- String Title, if found.
-
connectStore
Connects to current store / database.- Specified by:
connectStorein interfaceIDataConsumer- Throws:
DBCreationException- If there is an issue.
-
disconnectStore
Disconnects from current store / database and garbage collects.- Specified by:
disconnectStorein interfaceIDataConsumer- Throws:
DBCreationException- Not thrown in this implementation.
-
runStatement
private void runStatement(String sql, String tableName, boolean statementType) throws DBCreationExceptionRuns SQL statements and closes them.If the statement requires creation the method will try to connect to the table, and if it exists it will DROP it and rebuild.
- Parameters:
sql- The SQL statement.tableName- The table involved; this is for reporting, not action as the table should be in the SQL if needed.statementType- For clarity, use INSERT_SMT or CREATE_SMT.- Throws:
DBCreationException- If there's an issue.
-
sanitise
Sanitises Strings.Vigorous sanitisation removes all non-alphanumeric characters in the range
a-z,A-Z,0-9, commas, spaces and tabs.Weak sanitisation removes characters likely to be problematic in databases (
" ' ; () =).Name sanitisation sanitises Strings to potential database object names fairly aggresively (removes spaces, capitalises, makes sure it starts with a alphabetic character and not "ii", removes all but 0 through 9, #, @, and $, and makes sure they're 256 or less bytes*).
*Assumes Java-like representation, i.e. two bytes per char.
- Parameters:
string- String to sanitise.level- One of SANITISE_VIGOROUS, SANITISE_WEAK, SANITISE_NAME.- Returns:
- String Sanitised String.
-
gapFillLocalisedGUIText
private void gapFillLocalisedGUIText()Sets the defaults for warnings and exceptions in English if an appropriate language properties file is missing. -
reportProgress
Reports progress to reportingListeners.Reports if progress is a multiple of total records / 100. If progress is zero or less, reports progress as 0 of 1.
- Parameters:
progress- Progress in record processing.dataset- Dataset to extract estimate of processing to be done.
-
reportProgress
public void reportProgress(int progress, int total)Reports progress to reportingListeners.Reports for an arbitrary progress and total worked towards.
- Parameters:
progress- Value indicating progress through work total.total- Value indicating total work to do.
-
reportMessage
Reports message to reportingListeners.- Parameters:
message- Message to reporting listeners.
-
load
Deprecated.This method adds a whole dataset, using INSERT to add a record at a time. While it chunks the dataset into blocks to reduce memory use on the database side, it is still memory heavy and an order of magnitude slower thanbulkLoad.This method was an early element of the data consumer built on top of the line-by-line reader to test the whole dataset, and INSERTs a whole dataset a line at a time.It is both memory heavy and slow. The only reason for having it here is that it is imaginable that some uses might want to avoid writing the temp file/s needed for bulk loading.
- Specified by:
loadin interfaceIDataConsumer- Parameters:
dataset- Dataset of data.- Throws:
DBCreationException- If is there is an issue.- See Also:
IDataConsumer.load(ArrayList<IRecord> records)
-
proveLoaded
Deprecated.Not needed for this class, but a convenient add-in for proving it works without the need for unit testing or database interrogation software.Messages out to ReportingListeners the first and last entries in the named table.Will disconnect and connect to current store to prove persistence.
- Parameters:
tableName- Table to use.metadataTable- One of METADATA_TABLE or NORMAL_TABLE.- Throws:
DBCreationException- If an issue with connecting.
-
load(IDataset dataset)docs for details of deprecation.