Class DerbyConsumer
- All Implemented Interfaces:
IDataConsumer
public class DerbyConsumer extends Object implements IDataConsumer
Derby is an Open Source Apache Software Foundation database. It runs without installation, and has a small footprint, making it a good choice for bundling with applications. For more information see the Derby website.
This class works with an IDataset
(e.g. directory) containing one
or more IRecordHolders
(e.g. files) containing
IRecords
(e.g. rows) of values. It writes these to a
store (in this case a database) as record stores (tables), along with
an appropriate set of metadata tables.
IReportingListener
objects may be registered with objects of
this class to receive suitable progress reporting and messaging. In general,
exceptions not dealt with internally are re-thrown for calling objects to
deal with. Messages are user-friendly.
If developers have a specific database and set of table names
they'd like to use, they should first call setStore
and setRecordStoreNames
. If these are missing when
initialise
is finally called, initialise
will build a directory in the user's space (called after
defaultDatabaseDirectory
in application.properties
)
within which will be a database named after the dataset or
DEFAULT
and tables named after the record holders or
DEFAULTx
where x
is an integer.
Either way, all developers should call initialise
with
a dataset that at least has the fields and fieldtypes in it. This will
build the database structures with headers and set up progress monitoring.
They can then get data written to the database by either calling
bulkLoad
with a complete dataset object or
load
with a collection of records.
Note that because instance variables will hold a wide variety of information on pervious writes, it is essential that for each new set of files / dataset a new instance of this class is used.
- Author:
- Andy Evans
- Version: 1.0 01 Mar 2021
-
Field Summary
Fields Modifier and Type Field Description private int
chunkRecordsBy
Deprecated.private Connection
connection
Connection for SQL requests.private boolean
CREATE_SMT
Constant for sql statement running.private boolean
debug
Debugging flag, set by System variable passed in-Ddebug=true
rather than setting here / with accessor.private String
DEF_DB_DIR
Default directory to create within store.private String
DEF_FILE_DIR
Default directory for temp files.private boolean
INSERT_SMT
Constant for sql statement running.private boolean
METADATA_TABLE
Constant proveLoaded.private boolean
NORMAL_TABLE
Constant proveLoaded.private int
progress
Used for progress monitoring.private ArrayList<String>
recordStoreNames
Record store (table) names used.private ArrayList<IReportingListener>
reportingListeners
Listeners interested in updates on progress.private int
SANITISE_NAME
Constant for database object name sanitisation.private int
SANITISE_VIGOROUS
Constant for aggresive sanitisation.private int
SANITISE_WEAK
Constant for weak sanitisation to remove a few problematic chars.private String
store
Path to database used. -
Constructor Summary
Constructors Constructor Description DerbyConsumer()
Default constructor. -
Method Summary
Modifier and Type Method Description void
addReportingListener(IReportingListener reportingListener)
For objects wishing to get progress reports on data reading.private void
buildMetadataTable(IMetadata metadata, String metadataTableName)
Sets up a metadata table for each table.private void
buildTable(IDataset dataset, int index)
Sets up the relevant database table.void
bulkLoad(IDataset dataset)
This method bulk-loads a whole dataset into the database the quickest (but memory hungry) way possible - via a CSV file.void
connectStore()
Connects to current store / database.void
disconnectStore()
Disconnects from current store / database and garbage collects.protected String
findTitle(IMetadata metadata)
Finds the title category in an unknown IMetadata object.private void
gapFillLocalisedGUIText()
Sets the defaults for warnings and exceptions in English if an appropriate language properties file is missing.ArrayList<String>
getRecordStoreNames()
Gets the record store names.String
getStore()
Gets the location for the store / database.void
initialise(IDataset dataset)
Sets up the data store (database and tables).void
load(IDataset dataset)
Deprecated.This method adds a whole dataset, using INSERT to add a record at a time.void
load(ArrayList<IRecord> records)
Adds multiple records to current database.private void
proveLoaded(String tableName, boolean metadataTable)
Deprecated.Not needed for this class, but a convenient add-in for proving it works without the need for unit testing or database interrogation software.void
reportMessage(String message)
Reports message to reportingListeners.void
reportProgress(int progress, int total)
Reports progress to reportingListeners.void
reportProgress(int progress, IDataset dataset)
Reports progress to reportingListeners.private void
runStatement(String sql, String tableName, boolean statementType)
Runs SQL statements and closes them.protected String
sanitise(String string, int level)
Sanitises Strings.void
setRecordStoreNames(ArrayList<String> recordStoreNames)
Sets the record store (table) names for the store (database).void
setStore(String store)
Sets the location for the store / database.private void
storeRecords(ArrayList<IRecord> records)
INSERTs a collection of records into the database.
-
Field Details
-
connection
Connection for SQL requests. -
debug
private boolean debugDebugging flag, set by System variable passed in-Ddebug=true
rather than setting here / with accessor. -
store
Path to database used. -
recordStoreNames
Record store (table) names used. -
DEF_DB_DIR
Default directory to create within store.- See Also:
- Constant Field Values
-
DEF_FILE_DIR
Default directory for temp files.- See Also:
- Constant Field Values
-
SANITISE_VIGOROUS
private final int SANITISE_VIGOROUSConstant for aggresive sanitisation.- See Also:
- Constant Field Values
-
SANITISE_NAME
private final int SANITISE_NAMEConstant for database object name sanitisation.- See Also:
- Constant Field Values
-
SANITISE_WEAK
private final int SANITISE_WEAKConstant for weak sanitisation to remove a few problematic chars.- See Also:
- Constant Field Values
-
INSERT_SMT
private final boolean INSERT_SMTConstant for sql statement running.- See Also:
- Constant Field Values
-
CREATE_SMT
private final boolean CREATE_SMTConstant for sql statement running.- See Also:
- Constant Field Values
-
METADATA_TABLE
private final boolean METADATA_TABLEConstant proveLoaded.- See Also:
- Constant Field Values
-
NORMAL_TABLE
private final boolean NORMAL_TABLEConstant proveLoaded.- See Also:
- Constant Field Values
-
progress
private int progressUsed for progress monitoring. -
reportingListeners
Listeners interested in updates on progress. -
chunkRecordsBy
Deprecated.Seeload(IDataset dataset)
docs for details of deprecation.Used in memory management. Pushing an entire dataset to Derby via INSERTs risks the database holding very large data structures in memory, so we chunk it up where this is the case. This would increase processing time, but there's a time gain from not hitting memory limits; profiling suggests an optimal at ~2000 records for a 8Gb PC, but with a wide enough optimal range that there is little point in, e.g. setting this in-run using a greedy algorithm.- See Also:
load(IDataset dataset)
-
-
Constructor Details
-
DerbyConsumer
public DerbyConsumer()Default constructor.
-
-
Method Details
-
initialise
Sets up the data store (database and tables).If a store (database) path and/or record store (table) names haven't been set using the
setSource
/setRecordStoreNames
methods, this method deals with this. It pulls the titles from the dataset passed in for database name and from the dataset record holders for table names. The default location for the database is a directory within the user's home directory (see class docs). All names and paths are sanitised.It then creates the relevant database and tables, or connects to the database if it already exists. The connection remains until
disconnectStore
called.To reset objects of this class, be sure to call
setStore
and theninitialise
- just callinginitialise
will result in recursion of the store directories.- Specified by:
initialise
in interfaceIDataConsumer
- Parameters:
dataset
- Dataset to store - note that this need not be filled with records as long as it has recordHolders, metadata, and field data.- Throws:
DBCreationException
- If issues arise.
-
buildTable
Sets up the relevant database table.- Parameters:
dataset
- The dataset containing the table to be built. Record holders can be empty of data at this point, but must contain field information and metadata.index
- The index of the table to build inrecordStoreNames
.- Throws:
DBCreationException
- If there's an issue creating table.
-
buildMetadataTable
private void buildMetadataTable(IMetadata metadata, String metadataTableName) throws DBCreationExceptionSets up a metadata table for each table.- Parameters:
metadata
- The metadata to be written to the metadata table.metadataTableName
- The name of the metadata table.- Throws:
DBCreationException
- If there's an issue making table.
-
bulkLoad
This method bulk-loads a whole dataset into the database the quickest (but memory hungry) way possible - via a CSV file.It will attempt to write the dataset as one or more temp files to the user's home space, and then load the files into the database. For a file of 60000 records, this is an order of magnitude faster than loading by INSERTs, but it does require write access.
initialise
must have been called first to prepare the database.This method calls
disconnectStore
when done to clean up and garbage collect.- Specified by:
bulkLoad
in interfaceIDataConsumer
- Parameters:
dataset
- The dataset to load.- Throws:
DBCreationException
- If there is an issue loading.- To Do:
- Remove code that proves table filled. This is just for this specific application.
-
load
Adds multiple records to current database.This little at the moment than call
storeRecords
with the records, but we keep it as a separate method as it acts as a gateway for data in the supplier/consumer push model and we may want to add functionality to the gateway in the future.Users of the push model should call
disconnectStore
when done, as the consumer keeps the connection open under this model.- Specified by:
load
in interfaceIDataConsumer
- Parameters:
records
- ArrayList of records.- Throws:
DBCreationException
- Only if there is an issue.
-
storeRecords
INSERTs a collection of records into the database.This is very slow compared with bulk loading, but used judiciously can use less memory. A 60,000 4-value records file loaded 120 records at a time used 10% of the memory of bulk-loading, but took an order of magnitude more time (30 mins).
- Parameters:
records
- ArrayList of records to write.- Throws:
DBCreationException
- Only if there is an issue.
-
setStore
Sets the location for the store / database.If not set the parameters can be set from the dataset via the
initialise
method.Sanitises input with SANITISE_WEAK.
- Specified by:
setStore
in interfaceIDataConsumer
- Parameters:
store
- File path of database to connect to.- Throws:
DBCreationException
- Not used in this implementation.- See Also:
initialise(IDataset dataset)
-
getStore
Gets the location for the store / database.For testing.
- Returns:
- String Store location.
-
setRecordStoreNames
Sets the record store (table) names for the store (database).If not set the parameters can be set from the dataset via the
initialise
method.Sanitises input with SANITISE_NAME.
If set here directly (for example, the user enters the name/s) it's the implementers responsibility to make sure the number of record stores named and number supplied in the dataset to
initialise
match up orinitialise
will throw an exception.- Specified by:
setRecordStoreNames
in interfaceIDataConsumer
- Parameters:
recordStoreNames
- Names of record stores (tables) to make.- Throws:
DBCreationException
- Not used in this implementation.- See Also:
initialise(IDataset dataset)
-
getRecordStoreNames
Gets the record store names.For testing.
- Returns:
- ArrayList Record store names.
-
addReportingListener
For objects wishing to get progress reports on data reading.- Specified by:
addReportingListener
in interfaceIDataConsumer
- Parameters:
reportingListener
- Object wishing to gain reports.- See Also:
IReportingListener
-
findTitle
Finds the title category in an unknown IMetadata object.If it doesn't exist, defaults to "DEFAULT".
- Parameters:
metadata
- Metadata object of unknown schema.- Returns:
- String Title, if found.
-
connectStore
Connects to current store / database.- Specified by:
connectStore
in interfaceIDataConsumer
- Throws:
DBCreationException
- If there is an issue.
-
disconnectStore
Disconnects from current store / database and garbage collects.- Specified by:
disconnectStore
in interfaceIDataConsumer
- Throws:
DBCreationException
- Not thrown in this implementation.
-
runStatement
private void runStatement(String sql, String tableName, boolean statementType) throws DBCreationExceptionRuns SQL statements and closes them.If the statement requires creation the method will try to connect to the table, and if it exists it will DROP it and rebuild.
- Parameters:
sql
- The SQL statement.tableName
- The table involved; this is for reporting, not action as the table should be in the SQL if needed.statementType
- For clarity, use INSERT_SMT or CREATE_SMT.- Throws:
DBCreationException
- If there's an issue.
-
sanitise
Sanitises Strings.Vigorous sanitisation removes all non-alphanumeric characters in the range
a-z
,A-Z
,0-9
, commas, spaces and tabs.Weak sanitisation removes characters likely to be problematic in databases (
" ' ; () =
).Name sanitisation sanitises Strings to potential database object names fairly aggresively (removes spaces, capitalises, makes sure it starts with a alphabetic character and not "ii", removes all but 0 through 9, #, @, and $, and makes sure they're 256 or less bytes*).
*Assumes Java-like representation, i.e. two bytes per char.
- Parameters:
string
- String to sanitise.level
- One of SANITISE_VIGOROUS, SANITISE_WEAK, SANITISE_NAME.- Returns:
- String Sanitised String.
-
gapFillLocalisedGUIText
private void gapFillLocalisedGUIText()Sets the defaults for warnings and exceptions in English if an appropriate language properties file is missing. -
reportProgress
Reports progress to reportingListeners.Reports if progress is a multiple of total records / 100. If progress is zero or less, reports progress as 0 of 1.
- Parameters:
progress
- Progress in record processing.dataset
- Dataset to extract estimate of processing to be done.
-
reportProgress
public void reportProgress(int progress, int total)Reports progress to reportingListeners.Reports for an arbitrary progress and total worked towards.
- Parameters:
progress
- Value indicating progress through work total.total
- Value indicating total work to do.
-
reportMessage
Reports message to reportingListeners.- Parameters:
message
- Message to reporting listeners.
-
load
Deprecated.This method adds a whole dataset, using INSERT to add a record at a time. While it chunks the dataset into blocks to reduce memory use on the database side, it is still memory heavy and an order of magnitude slower thanbulkLoad
.This method was an early element of the data consumer built on top of the line-by-line reader to test the whole dataset, and INSERTs a whole dataset a line at a time.It is both memory heavy and slow. The only reason for having it here is that it is imaginable that some uses might want to avoid writing the temp file/s needed for bulk loading.
- Specified by:
load
in interfaceIDataConsumer
- Parameters:
dataset
- Dataset of data.- Throws:
DBCreationException
- If is there is an issue.- See Also:
IDataConsumer.load(ArrayList<IRecord> records)
-
proveLoaded
Deprecated.Not needed for this class, but a convenient add-in for proving it works without the need for unit testing or database interrogation software.Messages out to ReportingListeners the first and last entries in the named table.Will disconnect and connect to current store to prove persistence.
- Parameters:
tableName
- Table to use.metadataTable
- One of METADATA_TABLE or NORMAL_TABLE.- Throws:
DBCreationException
- If an issue with connecting.
-
load(IDataset dataset)
docs for details of deprecation.