parquet module

ParquetData class

ParquetData(
    wrapper,
    data,
    single_key=True,
    classes=None,
    level_name=None,
    fetch_kwargs=None,
    returned_kwargs=None,
    last_index=None,
    delisted=None,
    tz_localize=None,
    tz_convert=None,
    missing_index=None,
    missing_columns=None,
    **kwargs
)

Data class for fetching Parquet data using PyArrow or FastParquet.

fetch_feature class method

ParquetData.fetch_feature(
    feature,
    **kwargs
)

Fetch the Parquet file of a feature.

Uses ParquetData.fetch_key().

fetch_key class method

ParquetData.fetch_key(
    key,
    path=None,
    tz=None,
    squeeze=None,
    keep_partition_cols=None,
    engine=None,
    **read_kwargs
)

Fetch the Parquet file of a feature or symbol.

Args

key : hashable

Feature or symbol.

path : str

Path to the Parquet file.

If path is None, uses key as the path to the Parquet file.

tz : any

Target timezone.

See to_timezone().

squeeze : bool

Whether to squeeze a DataFrame with one column into a Series.

keep_partition_cols : bool

Whether to return partitioning columns (if any).

If None, will remove any partitioning column that is "group" or "group_{index}".

Retrieves the list of partitioning columns with ParquetData.list_partition_cols().

engine : str

Parquet engine to use. See the engine argument of pd.read_parquet.

**read_kwargs

Other keyword arguments passed to pd.read_parquet.

See https://pandas.pydata.org/docs/reference/api/pandas.read_parquet.html for other arguments.

For defaults, see custom.parquet in data.
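The path fallback, the pass-through of engine and **read_kwargs, and the squeeze step described above can be approximated with plain pandas. This is a minimal sketch, not the library's implementation: `fetch_key_sketch` and `squeeze_if_single` are hypothetical names, the default values are assumptions, and timezone and partition-column handling are left out.

```python
import pandas as pd


def squeeze_if_single(df: pd.DataFrame, squeeze: bool = True):
    """Return a Series when the frame has exactly one column, else the frame."""
    if squeeze and df.shape[1] == 1:
        return df.iloc[:, 0]
    return df


def fetch_key_sketch(key, path=None, squeeze=True, engine="auto", **read_kwargs):
    """Hypothetical stand-in for ParquetData.fetch_key(); not the real method."""
    if path is None:
        path = key  # the key doubles as the path to the Parquet file
    # engine and read_kwargs are forwarded to pandas unchanged
    df = pd.read_parquet(path, engine=engine, **read_kwargs)
    return squeeze_if_single(df, squeeze)
```

Reading an actual file through `fetch_key_sketch` requires PyArrow or FastParquet to be installed, as with pd.read_parquet itself.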

fetch_symbol class method

ParquetData.fetch_symbol(
    symbol,
    **kwargs
)

Fetch the Parquet file of a symbol.

Uses ParquetData.fetch_key().

is_default_partition_col class method

ParquetData.is_default_partition_col(
    level
)

Return whether a partitioning column is a default partitioning column.
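The fetch_key description names "group" and "group_{index}" as the partitioning columns removed by default. Assuming those are the default partitioning columns meant here, and that {index} is a non-negative integer, the check could look roughly like this (`is_default_partition_col_sketch` is a hypothetical name, not the library's function):

```python
import re

# Assumption: "{index}" in "group_{index}" is a non-negative integer
_DEFAULT_COL = re.compile(r"group(_\d+)?")


def is_default_partition_col_sketch(level: str) -> bool:
    """Return True for "group" or "group_<digits>", False otherwise."""
    return _DEFAULT_COL.fullmatch(level) is not None
```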

is_parquet_dir class method

ParquetData.is_parquet_dir(
    path
)

Return whether the path is a directory that is a group itself or contains groups of Parquet partitions.

is_parquet_file class method

ParquetData.is_parquet_file(
    path
)

Return whether the path is a Parquet file.
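The page does not say how the check is performed; the library may simply look at the file extension. One format-level way to recognize a Parquet file is its magic bytes: an unencrypted Parquet file begins and ends with the four bytes `PAR1`. The sketch below uses that property; `looks_like_parquet_file` is a hypothetical helper and does not validate the file contents.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # leading and trailing marker of a Parquet file


def looks_like_parquet_file(path) -> bool:
    """Check the leading and trailing magic bytes; does not parse the footer."""
    p = Path(path)
    if not p.is_file() or p.stat().st_size < 2 * len(PARQUET_MAGIC):
        return False
    with p.open("rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        f.seek(-len(PARQUET_MAGIC), 2)  # 2 == os.SEEK_END
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```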

is_parquet_group_dir class method

ParquetData.is_parquet_group_dir(
    path
)

Return whether the path is a directory that is a group of Parquet partitions.

Note

Assumes the Hive partitioning scheme.
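Under the Hive partitioning scheme, each partition directory is named `<column>=<value>`. A rough sketch of a group-directory check built on that convention follows. The name `is_group_dir_sketch`, the `.parquet` suffix test, and the requirement that a group contain Parquet files or nested groups are assumptions for illustration, not the library's documented behavior.

```python
import re
from pathlib import Path

# Hive-style partition directories are named "<column>=<value>"
HIVE_DIR = re.compile(r"(.+)=(.+)")


def is_group_dir_sketch(path) -> bool:
    """Return True for a "<column>=<value>" directory holding Parquet data."""
    p = Path(path)
    if not p.is_dir() or HIVE_DIR.fullmatch(p.name) is None:
        return False
    # Assumption: a group must hold Parquet files or nested groups
    return any(
        (c.is_file() and c.suffix == ".parquet") or is_group_dir_sketch(c)
        for c in p.iterdir()
    )
```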

list_partition_cols class method

ParquetData.list_partition_cols(
    path
)

List partitioning columns under a path.

Note

Assumes the Hive partitioning scheme.
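Because Hive-style directories encode the column name before the `=` sign, the partitioning columns can be recovered from directory names alone. The following is a minimal sketch under that assumption; `list_partition_cols_sketch` is a hypothetical name, and the real method may collect or order the columns differently.

```python
from pathlib import Path


def list_partition_cols_sketch(path):
    """Collect column names from "<column>=<value>" directories, outermost first."""
    cols = []
    # Sorted traversal visits every directory before its descendants
    for p in sorted(Path(path).rglob("*")):
        if p.is_dir() and "=" in p.name:
            col = p.name.split("=", 1)[0]
            if col not in cols:
                cols.append(col)
    return cols
```

For a layout such as `data/year=2023/month=01/part-0.parquet`, calling the function on `data` would yield the columns in nesting order.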

update_feature method

ParquetData.update_feature(
    feature,
    **kwargs
)

Update data of a feature.

Uses ParquetData.update_key() with key_is_feature=True.

update_key method

ParquetData.update_key(
    key,
    key_is_feature=False,
    **kwargs
)

Update data of a feature or symbol.

update_symbol method

ParquetData.update_symbol(
    symbol,
    **kwargs
)

Update data of a symbol.

Uses ParquetData.update_key() with key_is_feature=False.
