mapped_array module¶
Base class for working with mapped arrays.
This class takes the mapped array and the corresponding column and (optionally) index arrays, and offers features to directly process the mapped array without converting it to pandas; for example, to compute various statistics by column, such as standard deviation.
Consider the following example:
>>> from vectorbtpro import *
>>> a = np.array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
>>> col_arr = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
>>> idx_arr = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2])
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y', 'z'],
...     columns=['a', 'b', 'c'], ndim=2, freq='1 day')
>>> ma = vbt.MappedArray(wrapper, a, col_arr, idx_arr=idx_arr)
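To make the layout concrete, here is a minimal plain-Python sketch (it does not use vectorbtpro, and the variable name `triples` is only illustrative) that pairs each value with the row and column labels it points to:

```python
# Plain-Python illustration of the layout; not vectorbtpro code.
values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # column position of each value
idx_arr = [0, 1, 2, 0, 1, 2, 0, 1, 2]  # row position of each value
index = ['x', 'y', 'z']
columns = ['a', 'b', 'c']

# Pair every value with the row and column labels it points to.
triples = [
    (index[i], columns[c], v)
    for v, c, i in zip(values, col_arr, idx_arr)
]
for row_label, col_label, value in triples[:4]:
    print(row_label, col_label, value)
```

Only one entry per value is stored, whereas an unstacked frame needs a cell for every row and column combination.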
Reducing¶
Using MappedArray, we can then reduce by column as follows:
- Use already provided reducers such as MappedArray.mean():
- Use MappedArray.to_pd() to map to pandas and then reduce manually (expensive):
- Use MappedArray.reduce() to reduce using a custom function:
>>> # Reduce to a scalar
>>> @njit
... def pow_mean_reduce_nb(a, pow):
...     return np.mean(a ** pow)
>>> ma.reduce(pow_mean_reduce_nb, 2)
a 121.666667
b 196.666667
c 289.666667
dtype: float64
>>> # Reduce to an array
>>> @njit
... def min_max_reduce_nb(a):
...     return np.array([np.min(a), np.max(a)])
>>> ma.reduce(min_max_reduce_nb, returns_array=True,
...     wrap_kwargs=dict(name_or_index=['min', 'max']))
        a     b     c
min  10.0  13.0  16.0
max  12.0  15.0  18.0
>>> # Reduce to an array of indices
>>> @njit
... def idxmin_idxmax_reduce_nb(a):
...     return np.array([np.argmin(a), np.argmax(a)])
>>> ma.reduce(idxmin_idxmax_reduce_nb, returns_array=True,
...     returns_idx=True, wrap_kwargs=dict(name_or_index=['idxmin', 'idxmax']))
        a  b  c
idxmin  x  x  x
idxmax  z  z  z
>>> # Reduce using a meta function to combine multiple mapped arrays
>>> @njit
... def mean_ratio_reduce_meta_nb(idxs, col, a, b):
...     return np.mean(a[idxs]) / np.mean(b[idxs])
>>> vbt.MappedArray.reduce(mean_ratio_reduce_meta_nb,
...     ma.values - 1, ma.values + 1, col_mapper=ma.col_mapper)
a 0.833333
b 0.866667
c 0.888889
Name: reduce, dtype: float64
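Conceptually, reducing by column means grouping the flat values by their column position and collapsing each group to a scalar. The following is a minimal plain-Python sketch of that idea, not the library's Numba routines; the helper `reduce_by_column` is hypothetical:

```python
from statistics import mean

values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]
columns = ['a', 'b', 'c']

def reduce_by_column(values, col_arr, columns, reduce_func, *args):
    """Group values by column position, then reduce each group to a scalar."""
    groups = {}
    for v, c in zip(values, col_arr):
        groups.setdefault(c, []).append(v)
    return {columns[c]: reduce_func(vals, *args) for c, vals in sorted(groups.items())}

def pow_mean(vals, power):
    # Same arithmetic as pow_mean_reduce_nb, written without NumPy.
    return mean([v ** power for v in vals])

col_means = reduce_by_column(values, col_arr, columns, mean)
pow_means = reduce_by_column(values, col_arr, columns, pow_mean, 2)
print(col_means)
print(pow_means)
```

The second result reproduces the per-column figures printed by the `pow_mean_reduce_nb` example.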
Mapping¶
Use MappedArray.apply() to apply a function on each column/group:
>>> @njit
... def cumsum_apply_nb(a):
...     return np.cumsum(a)
>>> ma.apply(cumsum_apply_nb)
<vectorbtpro.records.mapped_array.MappedArray at 0x7ff061382198>
>>> ma.apply(cumsum_apply_nb).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])
>>> group_by = np.array(['first', 'first', 'second'])
>>> ma.apply(cumsum_apply_nb, group_by=group_by, apply_per_group=True).values
array([10., 21., 33., 46., 60., 75., 16., 33., 51.])
>>> # Apply using a meta function
>>> @njit
... def cumsum_apply_meta_nb(ridxs, col, a):
...     return np.cumsum(a[ridxs])
>>> vbt.MappedArray.apply(cumsum_apply_meta_nb, ma.values, col_mapper=ma.col_mapper).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])
Notice how cumsum resets at each column in the first example and at each group in the second example.
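The reset behavior can be reproduced with a short plain-Python sketch. It assumes that elements of the same column (or group) sit next to each other, as they do in the example arrays; `cumsum_per_key` is a hypothetical helper, not a vectorbtpro function:

```python
from itertools import accumulate, groupby

values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]
group_by = ['first', 'first', 'second']  # one group label per column

def cumsum_per_key(values, keys):
    """Cumulative sum that restarts whenever the key changes (keys must be contiguous)."""
    out = []
    for _, chunk in groupby(zip(keys, values), key=lambda kv: kv[0]):
        out.extend(accumulate(v for _, v in chunk))
    return out

per_column = cumsum_per_key(values, col_arr)
per_group = cumsum_per_key(values, [group_by[c] for c in col_arr])
print(per_column)
print(per_group)
```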
Conversion¶
We can unstack any MappedArray instance to pandas:
- Given idx_arr was provided:
Note
Will throw a warning if there are multiple values pointing to the same position.
- In case group_by was provided, index can be ignored, or there are position conflicts:
>>> ma.to_pd(group_by=np.array(['first', 'first', 'second']), ignore_index=True)
   first  second
0   10.0    16.0
1   11.0    17.0
2   12.0    18.0
3   13.0     NaN
4   14.0     NaN
5   15.0     NaN
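The two unstacking modes can be sketched in plain Python as follows. This is only an illustration of the idea under simplifying assumptions (nested lists instead of a DataFrame); the helpers `unstack` and `unstack_ignore_index` are hypothetical names:

```python
nan = float('nan')
values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]
idx_arr = [0, 1, 2, 0, 1, 2, 0, 1, 2]

def unstack(values, col_arr, idx_arr, n_rows, n_cols):
    """Place each value at its (row, column) position; empty cells stay NaN."""
    out = [[nan] * n_cols for _ in range(n_rows)]
    for v, c, i in zip(values, col_arr, idx_arr):
        out[i][c] = v  # a later value overwrites an earlier one at the same position
    return out

def unstack_ignore_index(values, col_arr, n_cols):
    """Stack each column's values from the top, ignoring row positions."""
    cols = [[] for _ in range(n_cols)]
    for v, c in zip(values, col_arr):
        cols[c].append(v)
    height = max(len(col) for col in cols)
    return [[col[r] if r < len(col) else nan for col in cols] for r in range(height)]

dense = unstack(values, col_arr, idx_arr, n_rows=3, n_cols=3)

# Columns 0 and 1 form group 0 ('first'), column 2 forms group 1 ('second').
col_to_group = [0, 0, 1]
stacked = unstack_ignore_index(values, [col_to_group[c] for c in col_arr], n_cols=2)
```

Here `stacked` has six rows: the first group fills all of them, while the second group runs out after three and is padded with NaN.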
Resolving conflicts¶
Sometimes, we may encounter multiple values for the same index and column combination. In such a case, we can use MappedArray.reduce_segments() to aggregate the "duplicate" elements. For example, let's sum up the duplicate values for each index and column combination:
>>> ma_conf = ma.replace(idx_arr=np.array([0, 0, 0, 1, 1, 1, 2, 2, 2]))
>>> ma_conf.to_pd()
UserWarning: Multiple values are pointing to the same position. Only the latest value is used.
      a     b     c
x  12.0   NaN   NaN
y   NaN  15.0   NaN
z   NaN   NaN  18.0
>>> @njit
... def sum_reduce_nb(a):
...     return np.sum(a)
>>> ma_no_conf = ma_conf.reduce_segments(
...     (ma_conf.idx_arr, ma_conf.col_arr),
...     sum_reduce_nb
... )
>>> ma_no_conf.to_pd()
      a     b     c
x  33.0   NaN   NaN
y   NaN  42.0   NaN
z   NaN   NaN  51.0
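The difference between "latest value wins" and summing duplicates can be illustrated with a plain-Python sketch. It keys a dictionary by (row, column) position, which is a simplification that works here because duplicates are adjacent; it is not how the library is implemented:

```python
values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]
idx_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]  # three values per position: conflicts

latest = {}  # what unstacking without a reducer keeps
summed = {}  # what summing the duplicates yields
for v, c, i in zip(values, col_arr, idx_arr):
    latest[(i, c)] = v
    summed[(i, c)] = summed.get((i, c), 0.0) + v

print(latest)
print(summed)
```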
Filtering¶
Use MappedArray.apply_mask() to filter elements per column/group:
>>> mask = [True, False, True, False, True, False, True, False, True]
>>> filtered_ma = ma.apply_mask(mask)
>>> filtered_ma.count()
a 2
b 1
c 2
dtype: int64
>>> filtered_ma.id_arr
array([0, 2, 4, 6, 8])
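As a rough plain-Python sketch of what filtering does (assuming, as in the example, that ids default to a simple range), the mask selects elements from every parallel array at once:

```python
from collections import Counter

values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]
id_arr = list(range(len(values)))  # default ids: a simple range
mask = [True, False, True, False, True, False, True, False, True]

# Keep only the elements whose mask entry is True, in every parallel array.
kept_values = [v for v, m in zip(values, mask) if m]
kept_cols = [c for c, m in zip(col_arr, mask) if m]
kept_ids = [i for i, m in zip(id_arr, mask) if m]

counts = Counter(kept_cols)  # number of surviving values per column position
print(kept_ids, dict(counts))
```

The surviving ids and per-column counts match the output of the example above.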
Grouping¶
One of the key features of MappedArray is that we can perform reducing operations on a group of columns as if they were a single column. Groups can be specified by group_by, which can be anything from positions or names of column levels, to a NumPy array with actual groups.
There are multiple ways to define grouping:
- When creating MappedArray, pass group_by to ArrayWrapper:
>>> group_by = np.array(['first', 'first', 'second'])
>>> grouped_wrapper = wrapper.replace(group_by=group_by)
>>> grouped_ma = vbt.MappedArray(grouped_wrapper, a, col_arr, idx_arr=idx_arr)
>>> grouped_ma.mean()
first 12.5
second 17.0
dtype: float64
- Regroup an existing MappedArray:
- Pass group_by directly to the reducing method:
In the same way, we can disable or modify any existing grouping:
Note
Grouping applies only to reducing operations; the underlying arrays are not changed.
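A minimal plain-Python sketch of grouped reduction follows. It assumes grouping is just a lookup from column position to group label that is applied at reduction time, leaving `values` and `col_arr` untouched:

```python
from statistics import mean

values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]
group_by = ['first', 'first', 'second']  # group label of each column

# Collect values under the group of their column, then reduce each group.
group_values = {}
for v, c in zip(values, col_arr):
    group_values.setdefault(group_by[c], []).append(v)
group_means = {g: mean(vals) for g, vals in group_values.items()}
print(group_means)
```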
Operators¶
MappedArray implements arithmetic, comparison, and logical operators. We can perform basic operations (such as addition) on mapped arrays as if they were NumPy arrays.
>>> ma ** 2
<vectorbtpro.records.mapped_array.MappedArray at 0x7f97bfc49358>
>>> ma * np.array([1, 2, 3, 4, 5, 6, 7, 8, 9])
<vectorbtpro.records.mapped_array.MappedArray at 0x7f97bfc65e80>
>>> ma + ma
<vectorbtpro.records.mapped_array.MappedArray at 0x7fd638004d30>
Note
Ensure that your MappedArray operand is on the left if the other operand is an array.
If two MappedArray operands have different metadata, the metadata is copied from the first one. In any case, their id_arr and col_arr must match.
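The rules above can be sketched with a small stand-in class. `TinyMapped` is hypothetical and list-based; it only illustrates element-wise combination, the matching check, and copying metadata from the left operand, and says nothing about how vectorbtpro implements its operators:

```python
import operator

class TinyMapped:
    """Hypothetical stand-in: values plus the column and id arrays."""

    def __init__(self, values, col_arr, id_arr=None):
        self.values = list(values)
        self.col_arr = list(col_arr)
        self.id_arr = list(range(len(self.values))) if id_arr is None else list(id_arr)

    def _combine(self, other, func):
        if isinstance(other, TinyMapped):
            # Two mapped operands must describe the same elements.
            if (other.col_arr, other.id_arr) != (self.col_arr, self.id_arr):
                raise ValueError("col_arr and id_arr must match")
            other_values = other.values
        elif isinstance(other, (int, float)):
            other_values = [other] * len(self.values)
        else:
            other_values = list(other)
            if len(other_values) != len(self.values):
                raise ValueError("operand must have one entry per value")
        # Metadata is copied from the left operand.
        combined = [func(x, y) for x, y in zip(self.values, other_values)]
        return TinyMapped(combined, self.col_arr, self.id_arr)

    def __add__(self, other):
        return self._combine(other, operator.add)

    def __mul__(self, other):
        return self._combine(other, operator.mul)

    def __pow__(self, other):
        return self._combine(other, operator.pow)

tiny = TinyMapped([10., 11., 12.], [0, 0, 0])
squared = tiny ** 2
doubled = tiny + tiny
scaled = tiny * [1, 2, 3]
```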
Indexing¶
As with any other subclass of Wrapping, we can apply pandas indexing to a MappedArray instance; the indexing operation is forwarded to each object with columns:
>>> ma['a'].values
array([10., 11., 12.])
>>> grouped_ma['first'].values
array([10., 11., 12., 13., 14., 15.])
Note
Changing index (time axis) is not supported. The object should be treated as a Series rather than a DataFrame; for example, use some_field.iloc[0] instead of some_field.iloc[:, 0] to get the first column.
Indexing behavior depends solely upon ArrayWrapper. For example, if group_select is enabled, indexing will be performed on groups; otherwise, on single columns.
Caching¶
MappedArray supports caching. If a method or a property requires heavy computation, it's wrapped with cached_method() and cached_property respectively. Caching can be disabled globally in caching.
Note
Because of caching, the class is meant to be immutable and all properties are read-only. To change any attribute, use the MappedArray.replace() method and pass changes as keyword arguments.
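Why caching and immutability go together can be shown with a generic standard-library sketch. It uses a frozen dataclass, `functools.cached_property`, and `dataclasses.replace` as stand-ins; the class `TinyMapped` is hypothetical and unrelated to vectorbtpro's own caching machinery:

```python
from dataclasses import dataclass, replace
from functools import cached_property

@dataclass(frozen=True)
class TinyMapped:
    values: tuple
    col_arr: tuple

    @cached_property
    def total(self):
        # Computed once per instance, then stored in the instance __dict__.
        return sum(self.values)

ma = TinyMapped(values=(10., 11., 12.), col_arr=(0, 0, 0))
first_total = ma.total

# Instead of mutating, build a new instance; its cache starts out empty.
new_ma = replace(ma, values=(1., 2., 3.))
new_total = new_ma.total
```

If attributes could be mutated in place, a cached result such as `total` would silently go stale, which is why changes go through a new instance.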
Saving and loading¶
As with any other subclass of Pickleable, we can save a MappedArray instance to disk with Pickleable.save() and load it with Pickleable.load().
Stats¶
Hint
Metrics for mapped arrays are similar to those for GenericAccessor.
>>> ma.stats(column='a')
Start x
End z
Period 3 days 00:00:00
Count 3
Mean 11.0
Std 1.0
Min 10.0
Median 11.0
Max 12.0
Min Index x
Max Index z
Name: a, dtype: object
The main difference unfolds once the mapped array has a mapping: values are then considered as categorical and usual statistics are meaningless to compute. For this case, StatsBuilderMixin.stats() returns the value counts:
>>> mapping = {v: "test_" + str(v) for v in np.unique(ma.values)}
>>> ma.stats(column='a', settings=dict(mapping=mapping))
Start x
End z
Period 3 days 00:00:00
Count 3
Value Counts: test_10.0 1
Value Counts: test_11.0 1
Value Counts: test_12.0 1
Value Counts: test_13.0 0
Value Counts: test_14.0 0
Value Counts: test_15.0 0
Value Counts: test_16.0 0
Value Counts: test_17.0 0
Value Counts: test_18.0 0
Name: a, dtype: object
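The value counts above can be reproduced by hand. This plain-Python sketch assumes that every label in the mapping is reported, with a count of zero when the selected column does not contain it:

```python
from collections import Counter

values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]

# Same mapping as above, built without NumPy.
mapping = {v: "test_" + str(v) for v in sorted(set(values))}

# Values of column 'a' (column position 0), translated to their labels.
col_a = [v for v, c in zip(values, col_arr) if c == 0]
counts = Counter(mapping[v] for v in col_a)

# Report every label of the mapping, including those absent from the column.
value_counts = {label: counts[label] for label in mapping.values()}
print(value_counts)
```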
MappedArray.stats() also supports (re-)grouping:
>>> grouped_ma.stats(column='first')
Start x
End z
Period 3 days 00:00:00
Count 6
Mean 12.5
Std 1.870829
Min 10.0
Median 12.5
Max 15.0
Min Index x
Max Index z
Name: first, dtype: object
Plots¶
We can build histograms and boxplots of MappedArray directly:
To use scatterplots or any other plots that require index, convert to pandas first:
Hint
MappedArray class has a single subplot based on MappedArray.to_pd() and GenericAccessor.plot().
combine_mapped_with_other function¶
Combine MappedArray with another compatible object.
If the other object is also a MappedArray, their id_arr and col_arr must match.
MappedArray class¶
MappedArray(
wrapper,
mapped_arr,
col_arr,
idx_arr=None,
id_arr=None,
mapping=None,
col_mapper=None,
**kwargs
)
Exposes methods for reducing, converting, and plotting arrays mapped by the Records class.
Args
wrapper : ArrayWrapper
    Array wrapper.
    See ArrayWrapper.
mapped_arr : array_like
    A one-dimensional array of mapped record values.
col_arr : array_like
    A one-dimensional column array.
    Must be of the same size as mapped_arr.
id_arr : array_like
    A one-dimensional id array. Defaults to a simple range.
    Must be of the same size as mapped_arr.
idx_arr : array_like
    A one-dimensional index array. Optional.
    Must be of the same size as mapped_arr.
mapping : namedtuple, dict or callable
    Mapping.
col_mapper : ColumnMapper
    Column mapper if already known.
    Note
    It depends upon wrapper and col_arr, so make sure to invalidate col_mapper upon creating a MappedArray instance with a modified wrapper or col_arr. MappedArray.replace() does it automatically.
**kwargs
    Custom keyword arguments passed to the config.
    Useful if any subclass wants to extend the config.
Superclasses
- Analyzable
- AttrResolverMixin
- Cacheable
- Chainable
- Comparable
- Configured
- ExtPandasIndexer
- HasSettings
- IndexApplier
- IndexingBase
- Itemable
- PandasIndexer
- Paramable
- Pickleable
- PlotsBuilderMixin
- Prettified
- StatsBuilderMixin
- Wrapping
Inherited members
- Analyzable.cls_dir
- Analyzable.column_only_select
- Analyzable.config
- Analyzable.group_select
- Analyzable.iloc
- Analyzable.indexing_kwargs
- Analyzable.loc
- Analyzable.range_only_select
- Analyzable.rec_state
- Analyzable.self_aliases
- Analyzable.wrapper
- Analyzable.xloc
- AttrResolverMixin.deep_getattr()
- AttrResolverMixin.post_resolve_attr()
- AttrResolverMixin.pre_resolve_attr()
- AttrResolverMixin.resolve_attr()
- AttrResolverMixin.resolve_shortcut_attr()
- Cacheable.get_ca_setup()
- Chainable.pipe()
- Configured.copy()
- Configured.equals()
- Configured.get_writeable_attrs()
- Configured.prettify()
- Configured.resolve_merge_kwargs()
- Configured.update_config()
- HasSettings.get_path_setting()
- HasSettings.get_path_settings()
- HasSettings.get_setting()
- HasSettings.get_settings()
- HasSettings.has_path_setting()
- HasSettings.has_path_settings()
- HasSettings.has_setting()
- HasSettings.has_settings()
- HasSettings.reset_settings()
- HasSettings.resolve_setting()
- HasSettings.resolve_settings_paths()
- HasSettings.set_settings()
- IndexApplier.add_levels()
- IndexApplier.drop_duplicate_levels()
- IndexApplier.drop_levels()
- IndexApplier.drop_redundant_levels()
- IndexApplier.rename_levels()
- IndexApplier.select_levels()
- IndexingBase.indexing_setter_func()
- PandasIndexer.xs()
- Pickleable.decode_config()
- Pickleable.decode_config_node()
- Pickleable.dumps()
- Pickleable.encode_config()
- Pickleable.encode_config_node()
- Pickleable.file_exists()
- Pickleable.getsize()
- Pickleable.load()
- Pickleable.loads()
- Pickleable.modify_state()
- Pickleable.resolve_file_path()
- Pickleable.save()
- PlotsBuilderMixin.build_subplots_doc()
- PlotsBuilderMixin.override_subplots_doc()
- PlotsBuilderMixin.plots()
- StatsBuilderMixin.build_metrics_doc()
- StatsBuilderMixin.override_metrics_doc()
- StatsBuilderMixin.stats()
- Wrapping.apply_to_index()
- Wrapping.as_param()
- Wrapping.items()
- Wrapping.regroup()
- Wrapping.resolve_column_stack_kwargs()
- Wrapping.resolve_row_stack_kwargs()
- Wrapping.resolve_self()
- Wrapping.resolve_stack_kwargs()
- Wrapping.select_col()
- Wrapping.select_col_from_obj()
- Wrapping.split()
- Wrapping.split_apply()
apply class method¶
MappedArray.apply(
apply_func_nb,
*args,
group_by=None,
apply_per_group=False,
dtype=None,
jitted=None,
chunked=None,
col_mapper=None,
**kwargs
)
Apply function on mapped array per column/group. Returns a new mapped array.
Applies per group of columns if apply_per_group is True.
See apply_nb().
For details on the meta version, see apply_meta_nb().
**kwargs are passed to MappedArray.replace().
apply_mapping method¶
Apply mapping on each element.
apply_mask method¶
Return a new class instance, filtered by mask.
**kwargs are passed to MappedArray.replace().
bottom_n method¶
Filter bottom N elements from each column/group.
bottom_n_mask method¶
Return mask of bottom N elements in each column/group.
boxplot method¶
Plot box plot by column/group.
col_arr property¶
Column array.
col_mapper property¶
Column mapper.
See ColumnMapper.
column_stack class method¶
Stack multiple MappedArray instances along columns.
Uses ArrayWrapper.column_stack() to stack the wrappers.
get_indexer_kwargs are passed to pandas.Index.get_indexer to translate old indices to new ones after the reindexing operation.
Note
Will produce a column-sorted array.
count method¶
Return number of values by column/group.
coverage_map method¶
describe method¶
MappedArray.describe(
percentiles=None,
ddof=1,
group_by=None,
jitted=None,
chunked=None,
wrap_kwargs=None,
**kwargs
)
Return statistics by column/group.
get_pd_mask method¶
Get mask in form of a Series/DataFrame from row and column indices.
has_conflicts method¶
See mapped_has_conflicts_nb().
histplot method¶
Plot histogram by column/group.
id_arr property¶
Id array.
idx_arr property¶
Index array.
idxmax method¶
Return index of max by column/group.
idxmin method¶
Return index of min by column/group.
indexing_func method¶
Perform indexing on MappedArray.
indexing_func_meta method¶
Perform indexing on MappedArray and return metadata.
is_sorted method¶
Check whether mapped array is sorted.
mapped_arr property¶
Mapped array.
mapped_readable property¶
MappedArray.to_readable() with default arguments.
mapping property¶
Mapping.
max method¶
Return max by column/group.
mean method¶
Return mean by column/group.
median method¶
Return median by column/group.
metrics class variable¶
Metrics supported by MappedArray.
HybridConfig(
start_index=dict(
title='Start Index',
calc_func=<function MappedArray.<lambda> at 0x132577880>,
agg_func=None,
tags='wrapper'
),
end_index=dict(
title='End Index',
calc_func=<function MappedArray.<lambda> at 0x132577920>,
agg_func=None,
tags='wrapper'
),
total_duration=dict(
title='Total Duration',
calc_func=<function MappedArray.<lambda> at 0x1325779c0>,
apply_to_timedelta=True,
agg_func=None,
tags='wrapper'
),
count=dict(
title='Count',
calc_func='count',
tags='mapped_array'
),
mean=dict(
title='Mean',
calc_func='mean',
inv_check_has_mapping=True,
tags=[
'mapped_array',
'describe'
]
),
std=dict(
title='Std',
calc_func='std',
inv_check_has_mapping=True,
tags=[
'mapped_array',
'describe'
]
),
min=dict(
title='Min',
calc_func='min',
inv_check_has_mapping=True,
tags=[
'mapped_array',
'describe'
]
),
median=dict(
title='Median',
calc_func='median',
inv_check_has_mapping=True,
tags=[
'mapped_array',
'describe'
]
),
max=dict(
title='Max',
calc_func='max',
inv_check_has_mapping=True,
tags=[
'mapped_array',
'describe'
]
),
idx_min=dict(
title='Min Index',
calc_func='idxmin',
inv_check_has_mapping=True,
agg_func=None,
tags=[
'mapped_array',
'index'
]
),
idx_max=dict(
title='Max Index',
calc_func='idxmax',
inv_check_has_mapping=True,
agg_func=None,
tags=[
'mapped_array',
'index'
]
),
value_counts=dict(
title='Value Counts',
calc_func=<function MappedArray.<lambda> at 0x132577a60>,
resolve_value_counts=True,
check_has_mapping=True,
tags=[
'mapped_array',
'value_counts'
]
)
)
Returns MappedArray._metrics, which gets (hybrid-) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable MappedArray._metrics.
min method¶
Return min by column/group.
nth method¶
Return n-th element of each column/group.
nth_index method¶
Return index of n-th element of each column/group.
pd_mask property¶
MappedArray.get_pd_mask() with default arguments.
plots_defaults property¶
Defaults for PlotsBuilderMixin.plots().
Merges PlotsBuilderMixin.plots_defaults and plots from mapped_array.
readable property¶
MappedArray.to_readable() with default arguments.
reduce class method¶
MappedArray.reduce(
reduce_func_nb,
*args,
idx_arr=None,
returns_array=False,
returns_idx=False,
to_index=True,
fill_value=nan,
jitted=None,
chunked=None,
col_mapper=None,
group_by=None,
wrap_kwargs=None
)
Reduce mapped array by column/group.
Set returns_array to True if reduce_func_nb returns an array.
Set returns_idx to True if reduce_func_nb returns row index/position. Must pass idx_arr.
Set to_index to True to return labels instead of positions.
Use fill_value to set the default value.
For implementation details, see
- reduce_mapped_nb() if returns_array is False and returns_idx is False
- reduce_mapped_to_idx_nb() if returns_array is False and returns_idx is True
- reduce_mapped_to_array_nb() if returns_array is True and returns_idx is False
- reduce_mapped_to_idx_array_nb() if returns_array is True and returns_idx is True
For implementation details on the meta versions, see
- reduce_mapped_meta_nb() if returns_array is False and returns_idx is False
- reduce_mapped_to_idx_meta_nb() if returns_array is False and returns_idx is True
- reduce_mapped_to_array_meta_nb() if returns_array is True and returns_idx is False
- reduce_mapped_to_idx_array_meta_nb() if returns_array is True and returns_idx is True
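To illustrate why returns_idx requires idx_arr and what to_index changes, here is a simplified plain-Python sketch. The helper `idxmax_by_column` is hypothetical and works on flat positions, which may differ from how the Numba routines are organized:

```python
values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]
idx_arr = [0, 1, 2, 0, 1, 2, 0, 1, 2]
index = ['x', 'y', 'z']

def idxmax_by_column(values, col_arr, idx_arr, index, to_index=True):
    """Locate the maximum of each column and report where it sits."""
    best = {}  # column position -> position of the max within the flat arrays
    for pos, (v, c) in enumerate(zip(values, col_arr)):
        if c not in best or v > values[best[c]]:
            best[c] = pos
    # idx_arr translates a flat position into a row position.
    rows = {c: idx_arr[pos] for c, pos in best.items()}
    # to_index=True swaps row positions for row labels.
    return {c: index[r] for c, r in rows.items()} if to_index else rows

labels = idxmax_by_column(values, col_arr, idx_arr, index)
positions = idxmax_by_column(values, col_arr, idx_arr, index, to_index=False)
print(labels, positions)
```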
reduce_segments method¶
MappedArray.reduce_segments(
segment_arr,
reduce_func_nb,
*args,
idx_arr=None,
group_by=None,
apply_per_group=False,
dtype=None,
jitted=None,
chunked=None,
**kwargs
)
Reduce each segment of values in mapped array. Returns a new mapped array.
segment_arr must be an array of integers increasing per column, each indicating a segment. It must have the same length as the mapped array. You can also pass a list of such arrays. In this case, each unique combination of values will be considered a single segment. Can also pass the string "idx" to use the index array.
reduce_func_nb can be a string denoting the suffix of a reducing function from vectorbtpro.generic.nb. For example, "sum" will refer to "sum_reduce_nb".
Warning
Each segment or combination of segments in segment_arr is assumed to be coherent and non-repeating. That is, np.array([0, 1, 0]) for a single column annotates three different segments, not two. See index_repeating_rows_nb().
Hint
Use MappedArray.sort() to bring the mapped array to the desired order, if required.
Applies per group of columns if apply_per_group is True.
See reduce_mapped_segments_nb().
**kwargs are passed to MappedArray.replace().
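The warning about non-repeating segments can be illustrated for a single column with `itertools.groupby`, which likewise treats each run of equal, adjacent keys as its own segment. The helper `reduce_runs` is a hypothetical plain-Python sketch, not the library routine:

```python
from itertools import groupby

def reduce_runs(values, segment_arr, reduce_func):
    """Reduce each run of equal, adjacent segment ids to a single value."""
    out = []
    for seg, run in groupby(zip(segment_arr, values), key=lambda p: p[0]):
        out.append((seg, reduce_func([v for _, v in run])))
    return out

# Segment 0 appears twice but not contiguously: three segments, not two.
three_segments = reduce_runs([1., 2., 3.], [0, 1, 0], sum)

# With the ids in coherent order, the first two values share one segment.
two_segments = reduce_runs([1., 2., 3.], [0, 0, 1], sum)
print(three_segments, two_segments)
```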
replace method¶
See Configured.replace().
Also, makes sure that MappedArray.col_mapper is not passed to the new instance.
resample method¶
Perform resampling on MappedArray.
resample_meta method¶
Perform resampling on MappedArray and return metadata.
resolve_mapping method¶
Resolve mapping.
Set mapping to False to disable mapping completely.
row_stack class method¶
Stack multiple MappedArray instances along rows.
Uses ArrayWrapper.row_stack() to stack the wrappers.
Note
Will produce a column-sorted array.
sort method¶
Sort mapped array by column array (primary) and id array (secondary, optional).
**kwargs are passed to MappedArray.replace().
stats_defaults property¶
Defaults for StatsBuilderMixin.stats().
Merges StatsBuilderMixin.stats_defaults and stats from mapped_array.
std method¶
Return std by column/group.
subplots class variable¶
Subplots supported by MappedArray.
HybridConfig(
to_pd_plot=dict(
check_is_not_grouped=True,
plot_func='to_pd.vbt.plot',
pass_trace_names=False,
tags='mapped_array'
)
)
Returns MappedArray._subplots, which gets (hybrid-) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable MappedArray._subplots.
sum method¶
MappedArray.sum(
fill_value=0.0,
group_by=None,
jitted=None,
chunked=None,
wrap_kwargs=None,
**kwargs
)
Return sum by column/group.
to_columns method¶
Convert to columns.
to_index method¶
Convert to index.
If minus_one_to_zero is True, index -1 will automatically become 0. Otherwise, will throw an error.
to_pd method¶
MappedArray.to_pd(
idx_arr=None,
reduce_func_nb=None,
reduce_args=None,
dtype=None,
ignore_index=False,
repeat_index=False,
fill_value=nan,
mapping=False,
mapping_kwargs=None,
group_by=None,
jitted=None,
chunked=None,
wrap_kwargs=None,
silence_warnings=False
)
Unstack mapped array to a Series/DataFrame.
If reduce_func_nb is not None, will use it to reduce conflicting index segments using MappedArray.reduce_segments().
- If ignore_index, will ignore the index and place values on top of each other in every column/group. See ignore_unstack_mapped_nb().
- If repeat_index, will repeat any index pointed from multiple values. Otherwise, in case of positional conflicts, will throw a warning and use the latest value. See repeat_unstack_mapped_nb().
- Otherwise, see unstack_mapped_nb().
Note
If multiple values point to the same position, a warning is emitted and only the latest value is used. To avoid this, set ignore_index or repeat_index to True, or pass reduce_func_nb.
Warning
Mapped arrays represent information in the most memory-friendly format. Mapping back to pandas may occupy lots of memory if records are sparse.
to_readable method¶
Get values in a human-readable format.
top_n method¶
Filter top N elements from each column/group.
top_n_mask method¶
Return mask of top N elements in each column/group.
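As a rough idea of what such a mask looks like, the following plain-Python sketch marks the N largest values of each column. The helper mirrors the method name only for readability; how ties are broken here is an assumption and may differ from the library:

```python
def top_n_mask(values, col_arr, n):
    """Return a boolean list marking the n largest values of each column."""
    by_col = {}
    for pos, c in enumerate(col_arr):
        by_col.setdefault(c, []).append(pos)
    mask = [False] * len(values)
    for positions in by_col.values():
        # Rank this column's positions by value, largest first, and keep n of them.
        for pos in sorted(positions, key=lambda p: values[p], reverse=True)[:n]:
            mask[pos] = True
    return mask

values = [10., 11., 12., 13., 14., 15., 16., 17., 18.]
col_arr = [0, 0, 0, 1, 1, 1, 2, 2, 2]
mask = top_n_mask(values, col_arr, 2)
print(mask)
```

Applying such a mask to the values keeps two elements per column.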
value_counts method¶
MappedArray.value_counts(
axis=1,
idx_arr=None,
normalize=False,
sort_uniques=True,
sort=False,
ascending=False,
dropna=False,
group_by=None,
mapping=None,
incl_all_keys=False,
jitted=None,
chunked=None,
wrap_kwargs=None,
**kwargs
)
See GenericAccessor.value_counts().
values property¶
Mapped array.