Class CruTs2pt1Supplier

java.lang.Object
io.github.ajevans.dbcode.filesuppliers.CruTs2pt1Supplier
All Implemented Interfaces:
IDataSupplier

@Deprecated
public class CruTs2pt1Supplier
extends Object
implements IDataSupplier
Deprecated.
This data has been superseded (see this page), and it's therefore likely this class will be removed at some point.
Parser for Climate Research Unit Time Series 2.1 gridded files.

The class reads a set of recordHolders (files) in a source (directory) and provides the recordHolders as an IDataset containing IRecordHolder objects, each containing IRecord objects (rows). Each dataset and recordHolder comes with an IMetadata object.

IReportingListener objects may be registered with objects of this class to receive suitable progress reporting and messaging. In general, exceptions not dealt with internally are re-thrown for calling objects to deal with. Messages are user-friendly.

Note that because instance variables will hold a wide variety of information on previous writes, it is essential that a new instance of this class is used for each new set of files / dataset.

This parser works on climate data files produced by Dr Tim Mitchell (archived homepage) while at the Tyndall Centre for Climate Change Research and released 23rd January 2004. It is designed for CRU TS 2.1, but will work for CRU TS 2.0 (indeed, some 2.1 is distributed in 2.0 files).

The data comprises climate records for the global land surface interpolated from real observations to a 0.5 degree grid as a monthly time series (see note 1). Data for nine observed and derived variables are available (temperature, diurnal temperature range, daily minimum and maximum temperatures, precipitation, wet-day frequency, frost-day frequency, vapour pressure, and cloud cover), but each file contains only one.

The data format is bespoke (see note 2). A description can be found via the data homepage.

NB: The data has been superseded, most recently by CRU TS 4.04 (24 April 2020: data, associated papers, and GoogleEarth visualisations).

Notes:

1. Timothy D. Mitchell and Philip D. Jones (2005) An improved method of constructing a database of monthly climate observations and associated high-resolution grids. International Journal of Climatology, 25 (6), 693-712 [online] https://doi.org/10.1002/joc.1181 (alternative). Accessed 7th February 2021.

2. The headers note these are "grim" files, and there is some belief online that they are "GPS Receiver Interface Module (GRIM)" files, but this is likely to be a typo for "grid" files, which is how they are described by Mitchell elsewhere. They are similar to other multi-channel raster formats and flat ASCII grid formats like ESRI's ARCINFO GRID format.

Author:
Andy Evans
To Do:
It's likely we could make a more abstract file parser at some point, set up by a properties file. Localise metadata notes?
Version: 1.0 01 Mar 2021
  • Field Details

    • debug

      private boolean debug
      Deprecated.
      Debugging flag, set by System variable passed in -Ddebug=true rather than setting here / with accessor.
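A minimal sketch of how a flag like this can be read from a system property, assuming the property is named `debug` as in the `-Ddebug=true` example. The class name `DebugFlag` is hypothetical and not part of this package.

```java
/** Hypothetical helper showing how a -Ddebug=true system property can be read. */
final class DebugFlag {

    /** Returns true only if the "debug" system property is set to "true" (case-insensitive). */
    static boolean isDebugEnabled() {
        // Boolean.getBoolean looks up the named system property and parses it as a boolean.
        return Boolean.getBoolean("debug");
    }

    public static void main(String[] args) {
        // Run with: java -Ddebug=true DebugFlag
        System.out.println("debug = " + isDebugEnabled());
    }
}
```

Reading the property once at start-up avoids the need for a setter or a recompile to switch debugging on.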
    • source

      private File source
      Deprecated.
      Source directory for reading.
    • recordHolderNames

      private ArrayList<String> recordHolderNames
      Deprecated.
      Source filenames for reading.
    • tabulatedDataset

      private TabulatedDataset tabulatedDataset
      Deprecated.
      Store for all tables in this dataset.
    • fieldNames

      private ArrayList<String> fieldNames
      Deprecated.
      Output field names for this file type.
    • fieldTypes

      private ArrayList<Class> fieldTypes
      Deprecated.
      Output field classes for this file type.
    • buffer

      private BufferedReader buffer
      Deprecated.
      Main file connection.
    • listeners

      private ArrayList<IDataConsumer> listeners
      Deprecated.
      Register for data consumers wishing to listen for pushed data.
    • reportingListeners

      private ArrayList<IReportingListener> reportingListeners
      Deprecated.
      Listeners interested in updates on progress.
    • numberOfHeaderLines

      private int numberOfHeaderLines
      Deprecated.
      Number of lines to read for data source header.
    • metadataDatePattern

      private String metadataDatePattern
      Deprecated.
      Format for any dates in source header.
    • startYear

      private int startYear
      Deprecated.
      First year of the data. Held here as the data isn't marked up for date beyond info in the header.
    • endYear

      private int endYear
      Deprecated.
      Last year of the data. Held here as the data isn't marked up for date beyond info in the header.
    • years

      private int years
      Deprecated.
      Number of years covered by the data. Could calculate this when needed locally, but better to do it once.
    • valuesPerYear

      private int valuesPerYear
      Deprecated.
      Number of values in each year's row. Held here as the data isn't marked up for date beyond info in the header.
      To Do:
      Calculate this from file.
    • dataTokenWidth

      private int dataTokenWidth
      Deprecated.
      Width of a data column in width-delimited data.
    • progress

      private int progress
      Deprecated.
      For monitoring progress at reading.
  • Constructor Details

    • CruTs2pt1Supplier

      public CruTs2pt1Supplier()
      Deprecated.
      Generic constructor.
  • Method Details

    • initialise

      public void initialise() throws ParseFailedException
      Deprecated.
      Sets up the data supplier ready to read data.

      It creates the relevant internal data structures in preparation for reading the data, including reading in and parsing file headers.

      Note that this method is kept separate from the setSource and setRecordHolderNames methods to enable piped data processing implementations where a series of suppliers and consumers are set up prior to activation. However, setSource and setRecordHolderNames must be called before this method so that it has something to initialise.

      If a source path and record holder names haven't been set using the setSource / setRecordHolderNames methods, this method throws a ParseFailedException.

      Specified by:
      initialise in interface IDataSupplier
      Throws:
      ParseFailedException - Usually if there is an issue reading a file; e.g. the wrong file type, the file has no data, or the source file is missing. It makes sense for callers to cancel further attempts at reading at this point.
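The call order described above can be sketched with a small stand-in class. This is not the real supplier: `LifecycleSketch` is hypothetical, it throws an unchecked `IllegalStateException` where the real class throws `ParseFailedException`, and the directory and file names are made-up samples.

```java
import java.io.File;
import java.util.ArrayList;

/** Hypothetical stand-in illustrating the "set source and names, then initialise" contract. */
final class LifecycleSketch {
    private File source;
    private ArrayList<String> recordHolderNames;
    private boolean initialised;

    void setSource(File source) {
        this.source = source;
    }

    void setRecordHolderNames(ArrayList<String> recordHolderNames) {
        this.recordHolderNames = recordHolderNames;
    }

    /** Refuses to initialise until both the source and the record holder names are set. */
    void initialise() {
        if (source == null || recordHolderNames == null || recordHolderNames.isEmpty()) {
            throw new IllegalStateException("Set the source and record holder names before initialising.");
        }
        initialised = true; // The real class would read and parse file headers here.
    }

    boolean isInitialised() {
        return initialised;
    }

    public static void main(String[] args) {
        LifecycleSketch supplier = new LifecycleSketch();
        try {
            supplier.initialise(); // Nothing set yet, so this is rejected.
        } catch (IllegalStateException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        supplier.setSource(new File("data")); // Hypothetical directory name.
        ArrayList<String> names = new ArrayList<>();
        names.add("example.pre"); // Hypothetical file name.
        supplier.setRecordHolderNames(names);
        supplier.initialise();
        System.out.println("Initialised: " + supplier.isInitialised());
    }
}
```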
    • initialiseFields

      private void initialiseFields()
      Deprecated.
      Sets up the fields for this data type.

      For this data type, fields, and their type in the system, are:

      Xref
      java.math.BigDecimal
      Yref
      java.math.BigDecimal
      Date
      java.util.GregorianCalendar
      Value
      java.math.BigDecimal
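One way to hold the field list above is as two parallel lists, mirroring the fieldNames and fieldTypes members. This is a sketch only: the class name `CruFieldsSketch` and the helper `addField` are hypothetical, and it uses `Class<?>` where the documented signature uses the raw `Class` type.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.GregorianCalendar;

/** Hypothetical sketch of parallel name/type lists for the four output fields. */
final class CruFieldsSketch {
    final ArrayList<String> fieldNames = new ArrayList<>();
    final ArrayList<Class<?>> fieldTypes = new ArrayList<>();

    CruFieldsSketch() {
        addField("Xref", BigDecimal.class);
        addField("Yref", BigDecimal.class);
        addField("Date", GregorianCalendar.class);
        addField("Value", BigDecimal.class);
    }

    /** Adds a name and its type at the same index, keeping the two lists aligned. */
    private void addField(String name, Class<?> type) {
        fieldNames.add(name);
        fieldTypes.add(type);
    }

    public static void main(String[] args) {
        CruFieldsSketch fields = new CruFieldsSketch();
        for (int i = 0; i < fields.fieldNames.size(); i++) {
            System.out.println(fields.fieldNames.get(i) + " : " + fields.fieldTypes.get(i).getName());
        }
    }
}
```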
    • setupDataset

      private void setupDataset()
      Deprecated.
      Sets up a data structure ready for the data.
    • estimateRecordCount

      private int estimateRecordCount​(int index)
      Deprecated.
      This estimates the records in the file.

      It is an accurate estimate here, but in general within this system the estimate should only be used for progress measurement and reporting on data files, not as an accurate count of records actually read.

      Parameters:
      index - The position of the data to connect to in recordHolderNames.
      Returns:
      int Estimate of record count.
    • readLines

      private ArrayList<String> readLines​(int numberOfLines) throws ParseFailedException
      Deprecated.
      Reads a set of lines and returns them as an unparsed ArrayList of Strings.

      Returns null only if every line requested is null, that is, if the end of the file has already been reached. It is therefore possible to get a smaller than expected ArrayList at the end of a file whose line count is not an exact multiple of numberOfLines. The next call will then return null.

      Parameters:
      numberOfLines - Number of lines to read.
      Returns:
      ArrayList Strings, one per line, or null at the end of the file.
      Throws:
      ParseFailedException - If there is an issue.
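The chunked-read behaviour described for readLines can be sketched as follows. This is an illustration under assumptions, not the class's own code: `ChunkedLineReader` is a hypothetical name, the reader is passed in rather than held as a field, and `IOException` is thrown where the real method throws `ParseFailedException`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;

/** Hypothetical sketch of reading a fixed number of lines at a time. */
final class ChunkedLineReader {

    /** Returns up to numberOfLines lines, or null if no lines were left to read. */
    static ArrayList<String> readLines(BufferedReader reader, int numberOfLines) throws IOException {
        ArrayList<String> lines = new ArrayList<>();
        for (int i = 0; i < numberOfLines; i++) {
            String line = reader.readLine();
            if (line == null) {
                break; // End of file reached part-way through the chunk.
            }
            lines.add(line);
        }
        // An empty chunk means everything requested was null, so signal the end with null.
        return lines.isEmpty() ? null : lines;
    }

    public static void main(String[] args) throws IOException {
        // Five lines read two at a time: two full chunks, one short chunk, then null.
        BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne"));
        System.out.println(readLines(reader, 2)); // [a, b]
        System.out.println(readLines(reader, 2)); // [c, d]
        System.out.println(readLines(reader, 2)); // [e]
        System.out.println(readLines(reader, 2)); // null
    }
}
```

Callers therefore need to handle both a short final chunk and the null that follows it.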
    • readData

      public void readData() throws ParseFailedException
      Deprecated.
      Fills the dataset with data.

      Primitives are boxed.

      Specified by:
      readData in interface IDataSupplier
      Throws:
      ParseFailedException - If there is an issue.
    • readTable

      public void readTable​(Table table) throws ParseFailedException
      Deprecated.
      Fills a table with data.
      Parameters:
      table - Table to add rows to.
      Throws:
      ParseFailedException - If there is an issue.
    • parseHeader

      private void parseHeader​(int index) throws ParseFailedException
      Deprecated.
      Parses the header of the data source.

      The data is used for internal data parsing, but is also written to the dataset metadata "notes" category.

      Probably the most significant thing this method does is set the dataset metadata tag "title" to the third line of the header, which should be the data type "CRU TS 2.1" (if you read in 2.0 files or a mix it will be whatever is read in last). This becomes the dataset name when processed. It also sets each record holder (file / table) "title" to the second line, which should be the shortened observation type, for example ".pre = precipitation (mm)" becomes "pre", adding the following information:

      • start year
      • end year
      • number of the file read, starting with one
      ...just in case there's more than one file of the same type read. This becomes the record holder name when processed.
      Parameters:
      index - The position of the table to connect to in recordHolderNames.
      Throws:
      ParseFailedException - This exception should get passed back to the caller of initialise to end attempts at reading. Contains the message "Having difficulty reading this file. Are you sure it is CRU TS 2.x format?"
      To Do:
      Detailed reporting of poor quality header information. Need to get time metadata from the files.
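The record holder title handling described for parseHeader can be sketched like this. Only the ".pre = precipitation (mm)" to "pre" shortening comes from the description above. The class and method names are hypothetical, the underscore-separated layout of the final title is an assumption (the documentation does not state how the parts are joined), and the years in the demo are sample values.

```java
/** Hypothetical sketch of building a record holder title from a header line. */
final class HeaderTitleSketch {

    /** Shortens a line such as ".pre = precipitation (mm)" to "pre". */
    static String shortObservationType(String headerLine) {
        String trimmed = headerLine.trim();
        int equals = trimmed.indexOf('=');
        // Keep only the part before '=', if there is one.
        String code = (equals >= 0 ? trimmed.substring(0, equals) : trimmed).trim();
        if (code.startsWith(".")) {
            code = code.substring(1); // Drop the leading file-extension dot.
        }
        if (code.isEmpty()) {
            throw new IllegalArgumentException("No observation type found in: " + headerLine);
        }
        return code;
    }

    /** Joins the short type with start year, end year and file number (separator is an assumption). */
    static String recordHolderTitle(String headerLine, int startYear, int endYear, int fileNumber) {
        return shortObservationType(headerLine) + "_" + startYear + "_" + endYear + "_" + fileNumber;
    }

    public static void main(String[] args) {
        // File numbers start at one, as described above.
        System.out.println(recordHolderTitle(".pre = precipitation (mm)", 1901, 2002, 1)); // pre_1901_2002_1
    }
}
```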
    • getParsedDataBlockAsRows

      private ArrayList<IRecord> getParsedDataBlockAsRows​(Table table) throws ParseFailedException
      Deprecated.
      Reads a data block and turns it into records.

      In this file format a data block is a Xref/Yref header plus a set of rows representing years. Values across a row are monthly. We therefore read a block at a time rather than a row at a time.

      Reports progress to any ReportingListeners.

      Parameters:
      table - This is used to connect rows with parent tables.
      Returns:
      ArrayList An ArrayList of rows, each row containing data in the appropriate field order.
      Throws:
      ParseFailedException - If there's an issue.
      See Also:
      getFieldNames(), getFieldTypes()
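A block-at-a-time parse of the kind described for getParsedDataBlockAsRows might look like the sketch below. Several details are assumptions: the block header is taken to look like "Grid-ref=   1, 148", values are taken to be fixed-width integer tokens, and each row is returned as a plain `Object[]` in Xref, Yref, Date, Value order rather than as an IRecord. The class name `DataBlockSketch` is hypothetical, and the demo uses three values per year instead of twelve to keep the sample short.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.GregorianCalendar;
import java.util.List;

/** Hypothetical sketch of turning one grid-cell block into date-stamped rows. */
final class DataBlockSketch {

    /** Parses a header line plus one line per year into rows of {Xref, Yref, Date, Value}. */
    static List<Object[]> parseBlock(List<String> block, int startYear, int valuesPerYear, int tokenWidth) {
        // Assumed header layout: "Grid-ref=   1, 148".
        String header = block.get(0);
        String[] refs = header.substring(header.indexOf('=') + 1).split(",");
        BigDecimal xref = new BigDecimal(refs[0].trim());
        BigDecimal yref = new BigDecimal(refs[1].trim());

        List<Object[]> rows = new ArrayList<>();
        for (int line = 1; line < block.size(); line++) {
            int year = startYear + line - 1; // Years are implied by row position.
            String data = block.get(line);
            for (int i = 0; i < valuesPerYear; i++) {
                // Width-delimited data: slice a fixed-width column, then trim the padding.
                String token = data.substring(i * tokenWidth, (i + 1) * tokenWidth).trim();
                // GregorianCalendar months are zero-based, so i == 0 is January.
                GregorianCalendar date = new GregorianCalendar(year, i, 1);
                rows.add(new Object[] {xref, yref, date, new BigDecimal(token)});
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> block = List.of(
                "Grid-ref=   1, 148",
                "  101  102  103",
                "  201  202  203");
        List<Object[]> rows = parseBlock(block, 1901, 3, 5);
        System.out.println(rows.size() + " rows parsed");
    }
}
```

Because the year and month come only from a value's position in the block, the start year and values per year have to be known before any data line is read.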
    • pushData

      public void pushData() throws ParseFailedException
      Deprecated.
      Pushes data to consumers registered as data listeners.

      The method reads a data block at a time and pushes it to registered data consumers for processing by calling their load(ArrayList<IRecord> records) method once the block has been read.

      Garbage collects at the end of each push.

      Specified by:
      pushData in interface IDataSupplier
      Throws:
      ParseFailedException - If there is an issue.
      See Also:
      addDataListener(IDataConsumer consumer)
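The push model described for pushData is a listener pattern, which can be sketched as below. The `BlockConsumer` interface is a hypothetical stand-in for IDataConsumer, records are plain Strings rather than IRecord objects, and the blocks are supplied ready-made rather than read from a file.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of pushing data blocks to registered consumers. */
final class PushSketch {

    /** Stand-in for a data consumer; the real interface takes IRecord objects. */
    interface BlockConsumer {
        void load(ArrayList<String> records);
    }

    private final List<BlockConsumer> listeners = new ArrayList<>();

    void addDataListener(BlockConsumer consumer) {
        listeners.add(consumer);
    }

    /** Hands each block in turn to every registered consumer. */
    void pushData(List<ArrayList<String>> blocks) {
        for (ArrayList<String> block : blocks) {
            for (BlockConsumer listener : listeners) {
                listener.load(block);
            }
        }
    }

    public static void main(String[] args) {
        PushSketch supplier = new PushSketch();
        supplier.addDataListener(records -> System.out.println("Received " + records.size() + " records"));
        List<ArrayList<String>> blocks = new ArrayList<>();
        blocks.add(new ArrayList<>(List.of("r1", "r2")));
        blocks.add(new ArrayList<>(List.of("r3")));
        supplier.pushData(blocks);
    }
}
```

Pushing a block at a time means a consumer never needs the whole file in memory at once.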
    • setSource

      public void setSource​(File source) throws ParseFailedException
      Deprecated.
      Connect to a File.

      Reading begins at initialisation.

      Specified by:
      setSource in interface IDataSupplier
      Parameters:
      source - Source file to read.
      Throws:
      ParseFailedException - Not used in this implementation.
    • getSource

      public File getSource()
      Deprecated.
      Gets the source file.
      Specified by:
      getSource in interface IDataSupplier
      Returns:
      File The source file.
    • setRecordHolderNames

      public void setRecordHolderNames​(ArrayList<String> recordHolderNames) throws ParseFailedException
      Deprecated.
      Sets the names of files to read.
      Specified by:
      setRecordHolderNames in interface IDataSupplier
      Parameters:
      recordHolderNames - ArrayList of names.
      Throws:
      ParseFailedException - Not used in this implementation.
      See Also:
      IDataSupplier.initialise()
    • getRecordHolderNames

      public ArrayList<String> getRecordHolderNames()
      Deprecated.
      Gets the names of files to read.
      Specified by:
      getRecordHolderNames in interface IDataSupplier
      Returns:
      recordHolderNames ArrayList of names.
    • getFieldNames

      public ArrayList<String> getFieldNames()
      Deprecated.
      Gets the names of fields.
      Returns:
      ArrayList ArrayList of names.
    • getFieldTypes

      public ArrayList<Class> getFieldTypes()
      Deprecated.
      Gets the type of fields.

      Primitives are boxed.

      Returns:
      ArrayList ArrayList of Classes.
    • getDataset

      public IDataset getDataset()
      Deprecated.
      Gets the dataset.

      Note that the dataset will not be implemented and filled with fields and metadata until initialise is called. It will not be filled with data until readData is called.

      Specified by:
      getDataset in interface IDataSupplier
      Returns:
      IDataset The dataset.
      See Also:
      IDataSupplier.pushData()
    • addDataListener

      public void addDataListener​(IDataConsumer consumer)
      Deprecated.
      Register for data pushes.
      Specified by:
      addDataListener in interface IDataSupplier
      Parameters:
      consumer - Data consumer.
      See Also:
      IDataConsumer.load(ArrayList<IRecord> records), IDataSupplier.pushData()
    • addReportingListener

      public void addReportingListener​(IReportingListener reportingListener)
      Deprecated.
      For objects wishing to get progress reports on data reading.
      Specified by:
      addReportingListener in interface IDataSupplier
      Parameters:
      reportingListener - Object wishing to gain reports.
      See Also:
      IReportingListener
    • connectSource

      public void connectSource​(int index) throws ParseFailedException
      Deprecated.
      Connects to a record holder (e.g. file) in the current source (directory).
      Specified by:
      connectSource in interface IDataSupplier
      Parameters:
      index - Index of record holder to connect to in collection set using setRecordHolderNames.
      Throws:
      ParseFailedException - Only if there is an issue.
    • disconnectSource

      public void disconnectSource() throws ParseFailedException
      Deprecated.
      Disconnects from current source and any file.

      Forces a garbage collection.

      Specified by:
      disconnectSource in interface IDataSupplier
      Throws:
      ParseFailedException - Not thrown in this implementation.
    • gapFillLocalisedGUIText

      private void gapFillLocalisedGUIText()
      Deprecated.
      Sets the defaults for warnings and exceptions in English if an appropriate language properties file is missing.
    • reportProgress

      public void reportProgress​(int progress, IDataset dataset)
      Deprecated.
      Reports progress to reportingListeners.

      Reports if progress is a multiple of total records / 100. If progress is zero or less, reports progress as 0 of 1.

      Parameters:
      progress - Progress in record processing.
      dataset - Dataset to extract estimate of processing to be done.
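The throttling rule described above can be sketched as a small decision function. The names are hypothetical, the function returns the pair to report (or null for no report) instead of notifying listeners, and the guard for totals below 100 is an assumption added to avoid dividing by a zero step.

```java
/** Hypothetical sketch of deciding when a progress report should be sent. */
final class ProgressSketch {

    /** Returns {progress, total} to report, {0, 1} for non-positive progress, or null to stay silent. */
    static int[] progressToReport(int progress, int totalRecords) {
        if (progress <= 0) {
            return new int[] {0, 1}; // Report "0 of 1" as described above.
        }
        // Assumption: use a step of at least one so small totals do not give a zero divisor.
        int step = Math.max(1, totalRecords / 100);
        return (progress % step == 0) ? new int[] {progress, totalRecords} : null;
    }

    public static void main(String[] args) {
        int reports = 0;
        for (int progress = 1; progress <= 500; progress++) {
            if (progressToReport(progress, 500) != null) {
                reports++;
            }
        }
        System.out.println(reports + " reports for 500 records"); // 100 reports for 500 records
    }
}
```

Reporting roughly once per percent keeps listeners informed without a notification for every record.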
    • reportProgress

      public void reportProgress​(int progress, int total)
      Deprecated.
      Reports progress to reportingListeners.

      Reports for an arbitrary progress and total worked towards.

      Parameters:
      progress - Value indicating progress through work total.
      total - Value indicating total work to do.
    • reportMessage

      public void reportMessage​(String message)
      Deprecated.
      Reports message to reportingListeners.
      Parameters:
      message - Message to reporting listeners.