Class DerbyConsumer

java.lang.Object
io.github.ajevans.dbcode.dbconsumers.DerbyConsumer
All Implemented Interfaces:
IDataConsumer

public class DerbyConsumer
extends Object
implements IDataConsumer
Class for connecting to and manipulating a Derby database.

Derby is an Open Source Apache Software Foundation database. It runs without installation, and has a small footprint, making it a good choice for bundling with applications. For more information see the Derby website.

This class works with an IDataset (e.g. directory) containing one or more IRecordHolders (e.g. files) containing IRecords (e.g. rows) of values. It writes these to a store (in this case a database) as record stores (tables), along with an appropriate set of metadata tables.

IReportingListener objects may be registered with objects of this class to receive suitable progress reporting and messaging. In general, exceptions not dealt with internally are re-thrown for calling objects to deal with. Messages are user-friendly.

If developers have a specific database and set of table names they'd like to use, they should first call setStore and setRecordStoreNames. If these are missing when initialise is finally called, initialise will build a directory in the user's space (called after defaultDatabaseDirectory in application.properties) within which will be a database named after the dataset or DEFAULT and tables named after the record holders or DEFAULTx where x is an integer.

Either way, all developers should call initialise with a dataset that at least has the fields and fieldtypes in it. This will build the database structures with headers and set up progress monitoring.

They can then get data written to the database by either calling bulkLoad with a complete dataset object or load with a collection of records.

Note that because instance variables will hold a wide variety of information on pervious writes, it is essential that for each new set of files / dataset a new instance of this class is used.

Author:
Andy Evans
Version: 1.0 01 Mar 2021
  • Field Details

    • connection

      private Connection connection
      Connection for SQL requests.
    • debug

      private boolean debug
      Debugging flag, set by System variable passed in -Ddebug=true rather than setting here / with accessor.
    • store

      private String store
      Path to database used.
    • recordStoreNames

      private ArrayList<String> recordStoreNames
      Record store (table) names used.
    • DEF_DB_DIR

      private final String DEF_DB_DIR
      Default directory to create within store.
      See Also:
      Constant Field Values
    • DEF_FILE_DIR

      private final String DEF_FILE_DIR
      Default directory for temp files.
      See Also:
      Constant Field Values
    • SANITISE_VIGOROUS

      private final int SANITISE_VIGOROUS
      Constant for aggresive sanitisation.
      See Also:
      Constant Field Values
    • SANITISE_NAME

      private final int SANITISE_NAME
      Constant for database object name sanitisation.
      See Also:
      Constant Field Values
    • SANITISE_WEAK

      private final int SANITISE_WEAK
      Constant for weak sanitisation to remove a few problematic chars.
      See Also:
      Constant Field Values
    • INSERT_SMT

      private final boolean INSERT_SMT
      Constant for sql statement running.
      See Also:
      Constant Field Values
    • CREATE_SMT

      private final boolean CREATE_SMT
      Constant for sql statement running.
      See Also:
      Constant Field Values
    • METADATA_TABLE

      private final boolean METADATA_TABLE
      Constant proveLoaded.
      See Also:
      Constant Field Values
    • NORMAL_TABLE

      private final boolean NORMAL_TABLE
      Constant proveLoaded.
      See Also:
      Constant Field Values
    • progress

      private int progress
      Used for progress monitoring.
    • reportingListeners

      private ArrayList<IReportingListener> reportingListeners
      Listeners interested in updates on progress.
    • chunkRecordsBy

      @Deprecated private int chunkRecordsBy
      Deprecated.
      See load(IDataset dataset) docs for details of deprecation.
      Used in memory management. Pushing an entire dataset to Derby via INSERTs risks the database holding very large data structures in memory, so we chunk it up where this is the case. This would increase processing time, but there's a time gain from not hitting memory limits; profiling suggests an optimal at ~2000 records for a 8Gb PC, but with a wide enough optimal range that there is little point in, e.g. setting this in-run using a greedy algorithm.
      See Also:
      load(IDataset dataset)
  • Constructor Details

    • DerbyConsumer

      public DerbyConsumer()
      Default constructor.
  • Method Details

    • initialise

      public void initialise​(IDataset dataset) throws DBCreationException
      Sets up the data store (database and tables).

      If a store (database) path and/or record store (table) names haven't been set using the setSource / setRecordStoreNames methods, this method deals with this. It pulls the titles from the dataset passed in for database name and from the dataset record holders for table names. The default location for the database is a directory within the user's home directory (see class docs). All names and paths are sanitised.

      It then creates the relevant database and tables, or connects to the database if it already exists. The connection remains until disconnectStore called.

      To reset objects of this class, be sure to call setStore and then initialise - just calling initialise will result in recursion of the store directories.

      Specified by:
      initialise in interface IDataConsumer
      Parameters:
      dataset - Dataset to store - note that this need not be filled with records as long as it has recordHolders, metadata, and field data.
      Throws:
      DBCreationException - If issues arise.
    • buildTable

      private void buildTable​(IDataset dataset, int index) throws DBCreationException
      Sets up the relevant database table.
      Parameters:
      dataset - The dataset containing the table to be built. Record holders can be empty of data at this point, but must contain field information and metadata.
      index - The index of the table to build in recordStoreNames.
      Throws:
      DBCreationException - If there's an issue creating table.
    • buildMetadataTable

      private void buildMetadataTable​(IMetadata metadata, String metadataTableName) throws DBCreationException
      Sets up a metadata table for each table.
      Parameters:
      metadata - The metadata to be written to the metadata table.
      metadataTableName - The name of the metadata table.
      Throws:
      DBCreationException - If there's an issue making table.
    • bulkLoad

      public void bulkLoad​(IDataset dataset) throws DBCreationException
      This method bulk-loads a whole dataset into the database the quickest (but memory hungry) way possible - via a CSV file.

      It will attempt to write the dataset as one or more temp files to the user's home space, and then load the files into the database. For a file of 60000 records, this is an order of magnitude faster than loading by INSERTs, but it does require write access. initialise must have been called first to prepare the database.

      This method calls disconnectStore when done to clean up and garbage collect.

      Specified by:
      bulkLoad in interface IDataConsumer
      Parameters:
      dataset - The dataset to load.
      Throws:
      DBCreationException - If there is an issue loading.
      To Do:
      Remove code that proves table filled. This is just for this specific application.
    • load

      public void load​(ArrayList<IRecord> records) throws DBCreationException
      Adds multiple records to current database.

      This little at the moment than call storeRecords with the records, but we keep it as a separate method as it acts as a gateway for data in the supplier/consumer push model and we may want to add functionality to the gateway in the future.

      Users of the push model should call disconnectStore when done, as the consumer keeps the connection open under this model.

      Specified by:
      load in interface IDataConsumer
      Parameters:
      records - ArrayList of records.
      Throws:
      DBCreationException - Only if there is an issue.
    • storeRecords

      private void storeRecords​(ArrayList<IRecord> records) throws DBCreationException
      INSERTs a collection of records into the database.

      This is very slow compared with bulk loading, but used judiciously can use less memory. A 60,000 4-value records file loaded 120 records at a time used 10% of the memory of bulk-loading, but took an order of magnitude more time (30 mins).

      Parameters:
      records - ArrayList of records to write.
      Throws:
      DBCreationException - Only if there is an issue.
    • setStore

      public void setStore​(String store) throws DBCreationException
      Sets the location for the store / database.

      If not set the parameters can be set from the dataset via the initialise method.

      Sanitises input with SANITISE_WEAK.

      Specified by:
      setStore in interface IDataConsumer
      Parameters:
      store - File path of database to connect to.
      Throws:
      DBCreationException - Not used in this implementation.
      See Also:
      initialise(IDataset dataset)
    • getStore

      public String getStore()
      Gets the location for the store / database.

      For testing.

      Returns:
      String Store location.
    • setRecordStoreNames

      public void setRecordStoreNames​(ArrayList<String> recordStoreNames) throws DBCreationException
      Sets the record store (table) names for the store (database).

      If not set the parameters can be set from the dataset via the initialise method.

      Sanitises input with SANITISE_NAME.

      If set here directly (for example, the user enters the name/s) it's the implementers responsibility to make sure the number of record stores named and number supplied in the dataset to initialise match up or initialise will throw an exception.

      Specified by:
      setRecordStoreNames in interface IDataConsumer
      Parameters:
      recordStoreNames - Names of record stores (tables) to make.
      Throws:
      DBCreationException - Not used in this implementation.
      See Also:
      initialise(IDataset dataset)
    • getRecordStoreNames

      public ArrayList<String> getRecordStoreNames()
      Gets the record store names.

      For testing.

      Returns:
      ArrayList Record store names.
    • addReportingListener

      public void addReportingListener​(IReportingListener reportingListener)
      For objects wishing to get progress reports on data reading.
      Specified by:
      addReportingListener in interface IDataConsumer
      Parameters:
      reportingListener - Object wishing to gain reports.
      See Also:
      IReportingListener
    • findTitle

      protected String findTitle​(IMetadata metadata)
      Finds the title category in an unknown IMetadata object.

      If it doesn't exist, defaults to "DEFAULT".

      Parameters:
      metadata - Metadata object of unknown schema.
      Returns:
      String Title, if found.
    • connectStore

      public void connectStore() throws DBCreationException
      Connects to current store / database.
      Specified by:
      connectStore in interface IDataConsumer
      Throws:
      DBCreationException - If there is an issue.
    • disconnectStore

      public void disconnectStore() throws DBCreationException
      Disconnects from current store / database and garbage collects.
      Specified by:
      disconnectStore in interface IDataConsumer
      Throws:
      DBCreationException - Not thrown in this implementation.
    • runStatement

      private void runStatement​(String sql, String tableName, boolean statementType) throws DBCreationException
      Runs SQL statements and closes them.

      If the statement requires creation the method will try to connect to the table, and if it exists it will DROP it and rebuild.

      Parameters:
      sql - The SQL statement.
      tableName - The table involved; this is for reporting, not action as the table should be in the SQL if needed.
      statementType - For clarity, use INSERT_SMT or CREATE_SMT.
      Throws:
      DBCreationException - If there's an issue.
    • sanitise

      protected String sanitise​(String string, int level)
      Sanitises Strings.

      Vigorous sanitisation removes all non-alphanumeric characters in the range a-z, A-Z, 0-9, commas, spaces and tabs.

      Weak sanitisation removes characters likely to be problematic in databases (" ' ; () =).

      Name sanitisation sanitises Strings to potential database object names fairly aggresively (removes spaces, capitalises, makes sure it starts with a alphabetic character and not "ii", removes all but 0 through 9, #, @, and $, and makes sure they're 256 or less bytes*).

      *Assumes Java-like representation, i.e. two bytes per char.

      Parameters:
      string - String to sanitise.
      level - One of SANITISE_VIGOROUS, SANITISE_WEAK, SANITISE_NAME.
      Returns:
      String Sanitised String.
    • gapFillLocalisedGUIText

      private void gapFillLocalisedGUIText()
      Sets the defaults for warnings and exceptions in English if an appropriate language properties file is missing.
    • reportProgress

      public void reportProgress​(int progress, IDataset dataset)
      Reports progress to reportingListeners.

      Reports if progress is a multiple of total records / 100. If progress is zero or less, reports progress as 0 of 1.

      Parameters:
      progress - Progress in record processing.
      dataset - Dataset to extract estimate of processing to be done.
    • reportProgress

      public void reportProgress​(int progress, int total)
      Reports progress to reportingListeners.

      Reports for an arbitrary progress and total worked towards.

      Parameters:
      progress - Value indicating progress through work total.
      total - Value indicating total work to do.
    • reportMessage

      public void reportMessage​(String message)
      Reports message to reportingListeners.
      Parameters:
      message - Message to reporting listeners.
    • load

      @Deprecated public void load​(IDataset dataset) throws DBCreationException
      Deprecated.
      This method adds a whole dataset, using INSERT to add a record at a time. While it chunks the dataset into blocks to reduce memory use on the database side, it is still memory heavy and an order of magnitude slower than bulkLoad.
      This method was an early element of the data consumer built on top of the line-by-line reader to test the whole dataset, and INSERTs a whole dataset a line at a time.

      It is both memory heavy and slow. The only reason for having it here is that it is imaginable that some uses might want to avoid writing the temp file/s needed for bulk loading.

      Specified by:
      load in interface IDataConsumer
      Parameters:
      dataset - Dataset of data.
      Throws:
      DBCreationException - If is there is an issue.
      See Also:
      IDataConsumer.load(ArrayList<IRecord> records)
    • proveLoaded

      private void proveLoaded​(String tableName, boolean metadataTable) throws DBCreationException
      Deprecated.
      Not needed for this class, but a convenient add-in for proving it works without the need for unit testing or database interrogation software.
      Messages out to ReportingListeners the first and last entries in the named table.

      Will disconnect and connect to current store to prove persistence.

      Parameters:
      tableName - Table to use.
      metadataTable - One of METADATA_TABLE or NORMAL_TABLE.
      Throws:
      DBCreationException - If an issue with connecting.