parquet module

Module providing the ParquetData class.


ParquetData class

ParquetData(
    wrapper,
    data,
    single_key=True,
    classes=None,
    level_name=None,
    fetch_kwargs=None,
    returned_kwargs=None,
    last_index=None,
    delisted=None,
    tz_localize=None,
    tz_convert=None,
    missing_index=None,
    missing_columns=None,
    **kwargs
)

Data class for fetching Parquet data using PyArrow or FastParquet.


fetch_feature class method

ParquetData.fetch_feature(
    feature,
    **kwargs
)

Fetch the Parquet file of a feature.

Uses ParquetData.fetch_key().


fetch_key class method

ParquetData.fetch_key(
    key,
    path=None,
    tz=None,
    squeeze=None,
    keep_partition_cols=None,
    engine=None,
    **read_kwargs
)

Fetch the Parquet file of a feature or symbol.

Args

key : hashable
Feature or symbol.
path : str

Path to the Parquet file.

If path is None, key is used as the path to the Parquet file.

tz : any

Target timezone.

See to_timezone().

squeeze : bool
Whether to squeeze a DataFrame with one column into a Series.
keep_partition_cols : bool

Whether to return partitioning columns (if any).

If None, removes only the partitioning columns named "group" or "group_{index}" and keeps all others.

The list of partitioning columns is retrieved with ParquetData.list_partition_cols().

engine : str
See pd.read_parquet.
**read_kwargs
Other keyword arguments passed to pd.read_parquet.

See https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html for other arguments.

For defaults, see custom.parquet in data.
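The effect of squeeze can be illustrated on an in-memory DataFrame. This is a minimal sketch that assumes squeezing simply returns the only column as a Series; maybe_squeeze is a hypothetical helper, not part of ParquetData, and no Parquet file is read.

```python
import pandas as pd


def maybe_squeeze(obj, squeeze=True):
    """Hypothetical helper: return the only column of a DataFrame as a Series."""
    if squeeze and isinstance(obj, pd.DataFrame) and obj.shape[1] == 1:
        # Selecting the first (and only) column yields a Series named after it
        return obj.iloc[:, 0]
    return obj


index = pd.date_range("2024-01-01", periods=3, tz="UTC")

# One column: squeezed into a Series
single = maybe_squeeze(pd.DataFrame({"Close": [1.0, 2.0, 3.0]}, index=index))

# Two columns: left as a DataFrame
multi = maybe_squeeze(
    pd.DataFrame({"Open": [1.0, 2.0, 3.0], "Close": [1.5, 2.5, 3.5]}, index=index)
)
```

Here single is a Series named "Close", while multi stays a DataFrame because it has more than one column.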


fetch_symbol class method

ParquetData.fetch_symbol(
    symbol,
    **kwargs
)

Fetch the Parquet file of a symbol.

Uses ParquetData.fetch_key().


is_default_partition_col class method

ParquetData.is_default_partition_col(
    level
)

Return whether a partitioning column is a default partitioning column.
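The description of keep_partition_cols in fetch_key names "group" and "group_{index}" as the columns removed by default. A check along those lines might look like the sketch below. It assumes that {index} is a non-negative integer and that keep_partition_cols=False drops every partitioning column; both are assumptions, and cols_to_drop is a hypothetical helper rather than the library's code.

```python
import re

# Assumption: a default partition column is "group" or "group_<non-negative integer>"
DEFAULT_PARTITION_COL = re.compile(r"group(_\d+)?")


def is_default_partition_col(level):
    """Return True for names such as "group", "group_0", "group_12"."""
    return DEFAULT_PARTITION_COL.fullmatch(str(level)) is not None


def cols_to_drop(partition_cols, keep_partition_cols=None):
    """Hypothetical helper: choose which partitioning columns to remove."""
    if keep_partition_cols is None:
        # Default: remove only the default partitioning columns
        return [c for c in partition_cols if is_default_partition_col(c)]
    if keep_partition_cols:
        # Keep everything
        return []
    # Assumed behavior for False: remove all partitioning columns
    return list(partition_cols)


dropped_by_default = cols_to_drop(["group", "year", "group_1"])
```

With these assumptions, a user-defined partitioning column such as "year" survives the default setting, while "group" and "group_1" are removed.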


is_parquet_dir class method

ParquetData.is_parquet_dir(
    path
)

Return whether the path is a directory that is a group itself or contains groups of Parquet partitions.


is_parquet_file class method

ParquetData.is_parquet_file(
    path
)

Return whether the path is a Parquet file.


is_parquet_group_dir class method

ParquetData.is_parquet_group_dir(
    path
)

Return whether the path is a directory that is a group of Parquet partitions.

Note

Assumes the Hive partitioning scheme.


list_partition_cols class method

ParquetData.list_partition_cols(
    path
)

List partitioning columns under a path.

Note

Assumes the Hive partitioning scheme.
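Under the Hive partitioning scheme, each partition is a directory named column=value, so the partitioning columns can be read off the directory names. The following is a rough sketch of that idea using only the standard library. list_hive_partition_cols is a hypothetical function, not the method above, and the files it walks over are empty placeholders rather than real Parquet files.

```python
import tempfile
from pathlib import Path


def list_hive_partition_cols(root):
    """Collect column names from `column=value` directories under root."""
    cols = []
    # sorted() gives a deterministic order in which parents precede children
    for p in sorted(Path(root).rglob("*")):
        if p.is_dir() and "=" in p.name:
            name = p.name.split("=", 1)[0]
            if name not in cols:
                cols.append(name)
    return cols


with tempfile.TemporaryDirectory() as tmp:
    # Build a small Hive-style layout: data/group=<g>/year=<y>/part-0.parquet
    for group, year in [(0, 2023), (0, 2024), (1, 2024)]:
        leaf = Path(tmp) / "data" / f"group={group}" / f"year={year}"
        leaf.mkdir(parents=True)
        (leaf / "part-0.parquet").touch()  # empty placeholder file
    partition_cols = list_hive_partition_cols(Path(tmp) / "data")
```

For this layout the outer level "group" is listed before the nested level "year".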


update_feature method

ParquetData.update_feature(
    feature,
    **kwargs
)

Update data of a feature.

Uses ParquetData.update_key() with key_is_feature=True.


update_key method

ParquetData.update_key(
    key,
    key_is_feature=False,
    **kwargs
)

Update data of a feature or symbol.


update_symbol method

ParquetData.update_symbol(
    symbol,
    **kwargs
)

Update data of a symbol.

Uses ParquetData.update_key() with key_is_feature=False.