Class PersistentDataStoreFactory

  • All Implemented Interfaces:
    DataStoreFactory

    public class PersistentDataStoreFactory
    extends java.lang.Object
    implements DataStoreFactory
    DataStoreFactory implementation that stores cached columns in the file system. These files are not cleaned up automatically, so they persist between JVM invocations.

    Use with caution, since this may leave large files in the cache directory. To mitigate this, a note about the persistent files that were written is printed to stderr on JVM shutdown.
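    The shutdown report described above could be produced along the following lines. This is a minimal sketch using only the standard library; the class name CacheReport and its methods are hypothetical and are not part of this API, and the actual message format used by this class is not documented here.

```java
import java.io.File;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

/** Hypothetical helper, for illustration only; not part of this API. */
public class CacheReport {

    // Thread-safe list of files written during this JVM session.
    private static final List<File> WRITTEN = new CopyOnWriteArrayList<>();

    /** Records a file that will outlive the JVM. */
    public static void noteWritten(File file) {
        WRITTEN.add(file);
    }

    /** Builds the report text; returns an empty string if nothing was written. */
    public static String buildReport() {
        if (WRITTEN.isEmpty()) {
            return "";
        }
        long total = 0;
        StringBuilder sb = new StringBuilder("Persistent cache files written:\n");
        for (File f : WRITTEN) {
            total += f.length();  // length() is 0 for a file that does not exist
            sb.append("  ").append(f.getPath()).append('\n');
        }
        sb.append("Total: ").append(total).append(" bytes\n");
        return sb.toString();
    }

    /** Registers a hook that prints the report to stderr at JVM shutdown. */
    public static void installHook() {
        Runtime.getRuntime().addShutdownHook(
            new Thread(() -> System.err.print(buildReport())));
    }
}
```

    Calling installHook() once at startup and noteWritten(file) after each cache write would be enough to get a summary on exit in this sketch.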

    An instance of this class is safe for use from concurrent threads. It is also safe for multiple PersistentDataStoreFactory instances to share the same cache directory, since ColumnStorage uses MoveFileByteStores to ensure that cached columns only appear in the cache once they are fully populated. However, multiple PersistentDataStoreFactory instances may end up caching the same input data at the same time, which duplicates work.
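    The general write-then-move technique that makes a shared cache directory safe can be sketched with the standard library as follows. This illustrates the idea only and is not the MoveFileByteStore implementation; the class name AtomicPublish, the "col-" prefix and the ".tmp" suffix are assumptions made for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper, for illustration only; not part of this API. */
public class AtomicPublish {

    /**
     * Writes data to a temporary file in the cache directory, then moves it
     * to its final name in a single step, so that readers never see a
     * partially written file under the final name.
     */
    public static Path publish(Path cacheDir, String name, byte[] data) {
        try {
            Files.createDirectories(cacheDir);
            Path target = cacheDir.resolve(name);
            // Create the temp file in the same directory, so the move
            // stays within one file system.
            Path tmp = Files.createTempFile(cacheDir, "col-", ".tmp");
            try {
                Files.write(tmp, data);
                Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
            } finally {
                // No-op after a successful move; cleans up after a failure.
                Files.deleteIfExists(tmp);
            }
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Two writers producing the same column would each write a complete temporary file in this sketch, which matches the duplicated work noted above. What happens if the target already exists when ATOMIC_MOVE is used is platform-dependent.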

    Since:
    7 Jan 2020
    Author:
    Mark Taylor
    • Constructor Detail

      • PersistentDataStoreFactory

        public PersistentDataStoreFactory(DiskCache cache,
                                          TupleRunner tupleRunner)
        Constructor.
        Parameters:
        cache - persistent storage cache; if null, a default instance will be used
        tupleRunner - tuple runner; if null, a default instance will be used
      • PersistentDataStoreFactory

        public PersistentDataStoreFactory()
        Default constructor.
    • Method Detail

      • readDataStore

        public DataStore readDataStore(DataSpec[] dataSpecs,
                                       DataStore store0)
                                throws java.io.IOException
        Description copied from interface: DataStoreFactory
        Generates a DataStore capable of supplying the data for a given list of DataSpec objects. The store0 argument (called prevStore in the interface documentation) may optionally supply the result of a previous invocation of this method. The implementation may choose to make use of the internal state of such an instance for efficiency, for instance by re-using data that has already been read.

        Since the bulk data is managed by the DataStore object, care should be taken about what happens to the DataStore objects supplied to and returned from this method. In particular, code both invoking and implementing this method should usually make sure not to keep a reference to the store0 argument.

        This method may perform the actual reading, and therefore take time. It is not intended to be invoked on the event dispatch thread.

        Specified by:
        readDataStore in interface DataStoreFactory
        Parameters:
        dataSpecs - data specifications; some elements may be null
        store0 - previously obtained DataStore, or null
        Returns:
        new data store
        Throws:
        java.io.IOException
      • toCacheDir

        public static java.io.File toCacheDir(java.io.File baseDir)
        Returns a suitable cache directory for use with this class, given a base directory.
        Parameters:
        baseDir - base directory; if null, the directory named by the java.io.tmpdir system property is used
        Returns:
        directory to which cache files can be written
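
        The null fallback described for baseDir can be sketched as follows. This is an illustration under stated assumptions, not the implementation of toCacheDir: the class name CacheDirs and the subdirectory name "example-cache" are hypothetical, and the actual directory name chosen by this class is not documented here.

```java
import java.io.File;

/** Hypothetical helper, for illustration only; not part of this API. */
public class CacheDirs {

    // Assumed subdirectory name; the name used by the real class may differ.
    private static final String SUBDIR = "example-cache";

    /**
     * Returns a cache directory beneath the given base directory,
     * falling back to the java.io.tmpdir system property if base is null.
     * The directory is not created by this method.
     */
    public static File cacheDirFor(File baseDir) {
        File base = baseDir != null
                  ? baseDir
                  : new File(System.getProperty("java.io.tmpdir"));
        return new File(base, SUBDIR);
    }
}
```

        With the real class, the corresponding call would be PersistentDataStoreFactory.toCacheDir(null) to get a directory under the JVM's temporary directory.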