base module¶
Base class for working with records.
vectorbt works with two different representations of data: matrices and records.
A matrix, in this context, is just an array of one-dimensional arrays, each corresponding to a separate feature. The matrix itself holds only one kind of information (one attribute). For example, one can create a matrix for entry signals, with columns being different strategy configurations. But what if the matrix is huge and sparse? What if there is more information we would like to attach to each element? Creating multiple matrices would be a waste of memory.
Records make it possible to represent complex, sparse information in a dense format. They are a one-dimensional structured array with a fixed schema, where each field holds a different kind of information. You can imagine records as a DataFrame, where each row represents a record and each column represents a specific attribute. Read more on structured arrays in the NumPy documentation.
For example, let's represent two DataFrames as a single record array:
              a     b
        0   1.0   5.0
attr1 = 1   2.0   NaN
        2   NaN   7.0
        3   4.0   8.0

              a     b
        0   9.0  13.0
attr2 = 1  10.0   NaN
        2   NaN  15.0
        3  12.0  16.0

           |
           v

   id  col  idx  attr1  attr2
0   0    0    0      1      9
1   1    0    1      2     10
2   2    0    3      4     12
3   0    1    0      5     13
4   1    1    2      7     15
5   2    1    3      8     16
Another advantage of records is that they are not constrained by the shape of a matrix: multiple records can map to a single element. For example, one can define multiple orders at the same timestamp, which is impossible to represent in matrix form without duplicating index entries or using complex data types.
Consider the following example:
>>> from vectorbtpro import *
>>> example_dt = np.dtype([
... ('id', np.int_),
... ('col', np.int_),
... ('idx', np.int_),
... ('some_field', np.float_)
... ])
>>> records_arr = np.array([
... (0, 0, 0, 10.),
... (1, 0, 1, 11.),
... (2, 0, 2, 12.),
... (0, 1, 0, 13.),
... (1, 1, 1, 14.),
... (2, 1, 2, 15.),
... (0, 2, 0, 16.),
... (1, 2, 1, 17.),
... (2, 2, 2, 18.)
... ], dtype=example_dt)
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y', 'z'],
... columns=['a', 'b', 'c'], ndim=2, freq='1 day')
>>> records = vbt.Records(wrapper, records_arr)
Printing¶
There are two ways to print records:
- Raw DataFrame that preserves field names and data types:
>>> records.records
id col idx some_field
0 0 0 0 10.0
1 1 0 1 11.0
2 2 0 2 12.0
3 0 1 0 13.0
4 1 1 1 14.0
5 2 1 2 15.0
6 0 2 0 16.0
7 1 2 1 17.0
8 2 2 2 18.0
- Readable DataFrame that takes Records.field_config into account:
>>> records.readable
Id Column Timestamp some_field
0 0 a x 10.0
1 1 a y 11.0
2 2 a z 12.0
3 0 b x 13.0
4 1 b y 14.0
5 2 b z 15.0
6 0 c x 16.0
7 1 c y 17.0
8 2 c z 18.0
Mapping¶
Records are just structured arrays with a set of methods and properties for processing them. Their main feature is the ability to map the records array and reduce it by column (similar to the MapReduce paradigm). The main advantage is that this all happens without converting to the matrix form, and thus without wasting memory.
Records can be mapped to MappedArray in several ways:
- Use Records.map_field() to map a record field:
>>> records.map_field('some_field')
<vectorbtpro.records.mapped_array.MappedArray at 0x7ff49bd31a58>
>>> records.map_field('some_field').values
array([10., 11., 12., 13., 14., 15., 16., 17., 18.])
- Use Records.map() to map records using a custom function:
>>> @njit
... def power_map_nb(record, pow):
... return record.some_field ** pow
>>> records.map(power_map_nb, 2)
<vectorbtpro.records.mapped_array.MappedArray at 0x7ff49c990cf8>
>>> records.map(power_map_nb, 2).values
array([100., 121., 144., 169., 196., 225., 256., 289., 324.])
>>> # Map using a meta function
>>> @njit
... def power_map_meta_nb(ridx, records, pow):
... return records[ridx].some_field ** pow
>>> vbt.Records.map(power_map_meta_nb, records.values, 2, col_mapper=records.col_mapper).values
array([100., 121., 144., 169., 196., 225., 256., 289., 324.])
- Use Records.map_array() to convert an array to MappedArray:
>>> records.map_array(records_arr['some_field'] ** 2)
<vectorbtpro.records.mapped_array.MappedArray object at 0x7fe9bccf2978>
>>> records.map_array(records_arr['some_field'] ** 2).values
array([100., 121., 144., 169., 196., 225., 256., 289., 324.])
- Use Records.apply() to apply a function on each column/group:
>>> @njit
... def cumsum_apply_nb(records):
... return np.cumsum(records.some_field)
>>> records.apply(cumsum_apply_nb)
<vectorbtpro.records.mapped_array.MappedArray at 0x7ff49c990cf8>
>>> records.apply(cumsum_apply_nb).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])
>>> group_by = np.array(['first', 'first', 'second'])
>>> records.apply(cumsum_apply_nb, group_by=group_by, apply_per_group=True).values
array([10., 21., 33., 46., 60., 75., 16., 33., 51.])
>>> # Apply using a meta function
>>> @njit
... def cumsum_apply_meta_nb(idxs, col, records):
... return np.cumsum(records[idxs].some_field)
>>> vbt.Records.apply(cumsum_apply_meta_nb, records.values, col_mapper=records.col_mapper).values
array([10., 21., 33., 13., 27., 42., 16., 33., 51.])
Notice how cumsum resets at each column in the first example and at each group in the second example.
Filtering¶
Use Records.apply_mask() to filter elements per column/group:
>>> mask = [True, False, True, False, True, False, True, False, True]
>>> filtered_records = records.apply_mask(mask)
>>> filtered_records.records
id col idx some_field
0 0 0 0 10.0
1 2 0 2 12.0
2 1 1 1 14.0
3 0 2 0 16.0
4 2 2 2 18.0
Grouping¶
One of the key features of Records is that you can perform reducing operations on a group of columns as if they were a single column. Groups can be specified by group_by, which can be anything from positions or names of column levels, to a NumPy array with actual groups.
There are multiple ways to define grouping:
- When creating Records, pass group_by to ArrayWrapper:
>>> group_by = np.array(['first', 'first', 'second'])
>>> grouped_wrapper = wrapper.replace(group_by=group_by)
>>> grouped_records = vbt.Records(grouped_wrapper, records_arr)
>>> grouped_records.map_field('some_field').mean()
first 12.5
second 17.0
dtype: float64
- Regroup an existing Records:
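For example, using Wrapping.regroup() (a sketch; the output should match the grouped example above):
>>> records.regroup(group_by).map_field('some_field').mean()
first     12.5
second    17.0
dtype: float64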
- Pass group_by directly to the mapping method:
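For example (a sketch, assuming map_field() forwards group_by to the resulting MappedArray):
>>> records.map_field('some_field', group_by=group_by).mean()
first     12.5
second    17.0
dtype: float64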
- Pass group_by directly to the reducing method:
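For example (a sketch; reducing methods of MappedArray accept group_by):
>>> records.map_field('some_field').mean(group_by=group_by)
first     12.5
second    17.0
dtype: float64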
Note
Grouping applies only to reducing operations; the arrays themselves remain unchanged.
Indexing¶
Like any other class subclassing Wrapping, we can use pandas indexing on a Records instance, which forwards the indexing operation to each object with columns:
>>> records['a'].records
id col idx some_field
0 0 0 0 10.0
1 1 0 1 11.0
2 2 0 2 12.0
>>> grouped_records['first'].records
id col idx some_field
0 0 0 0 10.0
1 1 0 1 11.0
2 2 0 2 12.0
3 0 1 0 13.0
4 1 1 1 14.0
5 2 1 2 15.0
Note
Changing index (time axis) is not supported. The object should be treated as a Series rather than a DataFrame; for example, use some_field.iloc[0] instead of some_field.iloc[:, 0] to get the first column.
Indexing behavior depends solely upon ArrayWrapper. For example, if group_select is enabled, indexing is performed on groups when grouped; otherwise, on single columns.
Caching¶
Records supports caching. If a method or a property requires heavy computation, it's wrapped with cached_method() and cached_property respectively. Caching can be disabled globally via the caching settings.
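For example, a minimal sketch of disabling caching globally (assuming the global settings expose a caching config with a disable flag; the exact key may differ between versions):
>>> vbt.settings.caching['disable'] = True  # assumption: 'disable' key exists in this version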
Note
Because of caching, the class is meant to be immutable and all properties are read-only. To change any attribute, use the Records.replace() method and pass changes as keyword arguments.
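For example, a minimal sketch that derives a new instance holding only the records of the first column:
>>> records_a = records.replace(records_arr=records_arr[records_arr['col'] == 0])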
Saving and loading¶
Like any other class subclassing Pickleable, we can save a Records instance to the disk with Pickleable.save() and load it with Pickleable.load().
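For example (the file name 'records.pickle' is an arbitrary choice):
>>> records.save('records.pickle')
>>> records = vbt.Records.load('records.pickle')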
Stats¶
Hint
See StatsBuilderMixin.stats() and Records.metrics.
>>> records.stats(column='a')
Start x
End z
Period 3 days 00:00:00
Total Records 3
Name: a, dtype: object
StatsBuilderMixin.stats() also supports (re-)grouping:
>>> grouped_records.stats(column='first')
Start x
End z
Period 3 days 00:00:00
Total Records 6
Name: first, dtype: object
Plots¶
Hint
This class is too generic to have any subplots, but feel free to add custom subplots to your subclass.
Extending¶
The Records class can be extended by subclassing.
In case some of our fields have the same meaning but different naming (such as the base field idx) or other properties, we can override field_config using override_field_config(). It will look for configs of all base classes and merge our config on top of them. This preserves any base class property that is not explicitly listed in our config.
>>> from vectorbtpro.records.decorators import override_field_config
>>> my_dt = np.dtype([
... ('my_id', np.int_),
... ('my_col', np.int_),
... ('my_idx', np.int_)
... ])
>>> my_fields_config = dict(
... dtype=my_dt,
... settings=dict(
... id=dict(name='my_id'),
... col=dict(name='my_col'),
... idx=dict(name='my_idx')
... )
... )
>>> @override_field_config(my_fields_config)
... class MyRecords(vbt.Records):
... pass
>>> records_arr = np.array([
... (0, 0, 0),
... (1, 0, 1),
... (0, 1, 0),
... (1, 1, 1)
... ], dtype=my_dt)
>>> wrapper = vbt.ArrayWrapper(index=['x', 'y'],
... columns=['a', 'b'], ndim=2, freq='1 day')
>>> my_records = MyRecords(wrapper, records_arr)
>>> my_records.id_arr
array([0, 1, 0, 1])
>>> my_records.col_arr
array([0, 0, 1, 1])
>>> my_records.idx_arr
array([0, 1, 0, 1])
Alternatively, we can override the _field_config class attribute.
>>> @override_field_config
... class MyRecords(vbt.Records):
... _field_config = dict(
... dtype=my_dt,
... settings=dict(
... id=dict(name='my_id'),
... idx=dict(name='my_idx'),
... col=dict(name='my_col')
... )
... )
Note
Don't forget to decorate the class with @override_field_config to inherit configs from base classes.
You can stop inheritance by not decorating the class, or by passing merge_configs=False to the decorator.
MetaFields class¶
Meta class that exposes a read-only class property MetaFields.field_config.
Superclasses
- builtins.type
field_config property¶
Field config.
MetaRecords class¶
Meta class that exposes a read-only class property StatsBuilderMixin.metrics.
Superclasses
- MetaAnalyzable
- MetaFields
- MetaPlotsBuilderMixin
- MetaStatsBuilderMixin
- builtins.type
Records class¶
Wraps the actual records array (such as trades) and exposes methods for mapping it to some array of values (such as PnL of each trade).
Args
wrapper : ArrayWrapper
    Array wrapper. See ArrayWrapper.
records_arr : array_like
    A structured NumPy array of records.
    Must have the fields id (record index) and col (column index).
col_mapper : ColumnMapper
    Column mapper if already known.
    Note: it depends on records_arr, so make sure to invalidate col_mapper upon creating a Records instance with a modified records_arr. Records.replace() does it automatically.
**kwargs
    Custom keyword arguments passed to the config.
    Useful if any subclass wants to extend the config.
Superclasses
- Analyzable
- AttrResolverMixin
- Cacheable
- Chainable
- Comparable
- Configured
- ExtPandasIndexer
- HasSettings
- IndexApplier
- IndexingBase
- Itemable
- PandasIndexer
- Paramable
- Pickleable
- PlotsBuilderMixin
- Prettified
- RecordsWithFields
- StatsBuilderMixin
- Wrapping
Inherited members
- Analyzable.cls_dir
- Analyzable.column_only_select
- Analyzable.config
- Analyzable.group_select
- Analyzable.iloc
- Analyzable.indexing_kwargs
- Analyzable.loc
- Analyzable.range_only_select
- Analyzable.rec_state
- Analyzable.self_aliases
- Analyzable.wrapper
- Analyzable.xloc
- AttrResolverMixin.deep_getattr()
- AttrResolverMixin.post_resolve_attr()
- AttrResolverMixin.pre_resolve_attr()
- AttrResolverMixin.resolve_attr()
- AttrResolverMixin.resolve_shortcut_attr()
- Cacheable.get_ca_setup()
- Chainable.pipe()
- Configured.copy()
- Configured.equals()
- Configured.get_writeable_attrs()
- Configured.prettify()
- Configured.resolve_merge_kwargs()
- Configured.update_config()
- HasSettings.get_path_setting()
- HasSettings.get_path_settings()
- HasSettings.get_setting()
- HasSettings.get_settings()
- HasSettings.has_path_setting()
- HasSettings.has_path_settings()
- HasSettings.has_setting()
- HasSettings.has_settings()
- HasSettings.reset_settings()
- HasSettings.resolve_setting()
- HasSettings.resolve_settings_paths()
- HasSettings.set_settings()
- IndexApplier.add_levels()
- IndexApplier.drop_duplicate_levels()
- IndexApplier.drop_levels()
- IndexApplier.drop_redundant_levels()
- IndexApplier.rename_levels()
- IndexApplier.select_levels()
- IndexingBase.indexing_setter_func()
- PandasIndexer.xs()
- Pickleable.decode_config()
- Pickleable.decode_config_node()
- Pickleable.dumps()
- Pickleable.encode_config()
- Pickleable.encode_config_node()
- Pickleable.file_exists()
- Pickleable.getsize()
- Pickleable.load()
- Pickleable.loads()
- Pickleable.modify_state()
- Pickleable.resolve_file_path()
- Pickleable.save()
- PlotsBuilderMixin.build_subplots_doc()
- PlotsBuilderMixin.override_subplots_doc()
- PlotsBuilderMixin.plots()
- StatsBuilderMixin.build_metrics_doc()
- StatsBuilderMixin.override_metrics_doc()
- StatsBuilderMixin.stats()
- Wrapping.apply_to_index()
- Wrapping.as_param()
- Wrapping.items()
- Wrapping.regroup()
- Wrapping.resolve_column_stack_kwargs()
- Wrapping.resolve_row_stack_kwargs()
- Wrapping.resolve_self()
- Wrapping.resolve_stack_kwargs()
- Wrapping.select_col()
- Wrapping.select_col_from_obj()
- Wrapping.split()
- Wrapping.split_apply()
apply class method¶
Records.apply(
apply_func_nb,
*args,
group_by=None,
apply_per_group=False,
dtype=None,
jitted=None,
chunked=None,
col_mapper=None,
**kwargs
)
Apply function on records per column/group. Returns mapped array.
Applies per group if apply_per_group is True.
See apply_nb().
For details on the meta version, see apply_meta_nb().
**kwargs are passed to Records.map_array().
apply_mask method¶
Return a new class instance, filtered by mask.
build_field_config_doc class method¶
Build field config documentation.
col_arr property¶
Get column array.
col_mapper property¶
Column mapper.
See ColumnMapper.
column_stack class method¶
Stack multiple Records instances along columns.
Uses ArrayWrapper.column_stack() to stack the wrappers and Records.column_stack_records_arrs() to stack the record arrays.
get_indexer_kwargs are passed to pandas.Index.get_indexer to translate old indices to new ones after the reindexing operation.
Note
Will produce a column-sorted array.
column_stack_records_arrs class method¶
Stack multiple record arrays along columns.
count method¶
Get count by column.
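For example, using the records built above (output formatting may differ):
>>> records.count()
a    3
b    3
c    3
dtype: int64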
coverage_map method¶
See MappedArray.coverage_map().
field_config class variable¶
Field config of Records.
HybridConfig(
dtype=None,
settings=dict(
id=dict(
name='id',
title='Id',
mapping='ids'
),
col=dict(
name='col',
title='Column',
mapping='columns',
as_customdata=False
),
idx=dict(
name='idx',
title='Index',
mapping='index'
)
)
)
Returns Records._field_config, which gets (hybrid-) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change fields, you can either change the config in-place, override this property, or overwrite the instance variable Records._field_config.
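For example, a sketch of an in-place, instance-level change (assuming the settings layout shown above):
>>> records.field_config['settings']['idx']['title'] = 'Timestamp'  # affects only this instance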
field_names property¶
Field names.
first_n method¶
Return the first N records in each column.
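For example, taking the first two records of each column from the records built above:
>>> records.first_n(2).records
   id  col  idx  some_field
0   0    0    0        10.0
1   1    0    1        11.0
2   0    1    0        13.0
3   1    1    1        14.0
4   0    2    0        16.0
5   1    2    1        17.0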
get_apply_mapping_arr method¶
Get the mapped array on the field, with mapping applied. Uses Records.field_config.
get_apply_mapping_str_arr method¶
Get the mapped array on the field, with mapping applied and stringified. Uses Records.field_config.
get_column_stack_record_indices class method¶
Get the indices that map concatenated record arrays into the column-stacked record array.
get_field_arr method¶
Get the array of the field. Uses Records.field_config.
get_field_mapping method¶
Get the mapping of the field. Uses Records.field_config.
get_field_name method¶
Get the name of the field. Uses Records.field_config.
get_field_setting method¶
Get any setting of the field. Uses Records.field_config.
get_field_title method¶
Get the title of the field. Uses Records.field_config.
get_map_field method¶
Get the mapped array of the field. Uses Records.field_config.
get_map_field_to_columns method¶
Get the mapped array on the field, with columns applied. Uses Records.field_config.
get_map_field_to_index method¶
Get the mapped array on the field, with index applied. Uses Records.field_config.
get_pd_mask method¶
Get a mask in the form of a Series/DataFrame from row and column indices.
get_row_stack_record_indices class method¶
Get the indices that map concatenated record arrays into the row-stacked record array.
has_conflicts method¶
See MappedArray.has_conflicts().
id_arr property¶
Get id array.
idx_arr property¶
Get index array.
indexing_func method¶
Perform indexing on Records.
indexing_func_meta method¶
Perform indexing on Records and return metadata.
By default, all fields that are mapped to index are indexed. To avoid indexing on some fields, set their setting noindex to True.
is_sorted method¶
Check whether records are sorted.
last_n method¶
Return the last N records in each column.
map class method¶
Map each record to a scalar value. Returns mapped array.
See map_records_nb().
For details on the meta version, see map_records_meta_nb().
**kwargs are passed to Records.map_array().
map_array method¶
Convert array to mapped array.
The length of the array must match that of the records.
map_field method¶
Convert field to mapped array.
**kwargs are passed to Records.map_array().
metrics class variable¶
Metrics supported by Records.
HybridConfig(
start_index=dict(
title='Start Index',
calc_func=<function Records.<lambda> at 0x132587c40>,
agg_func=None,
tags='wrapper'
),
end_index=dict(
title='End Index',
calc_func=<function Records.<lambda> at 0x132587ce0>,
agg_func=None,
tags='wrapper'
),
total_duration=dict(
title='Total Duration',
calc_func=<function Records.<lambda> at 0x132587d80>,
apply_to_timedelta=True,
agg_func=None,
tags='wrapper'
),
count=dict(
title='Count',
calc_func='count',
tags='records'
)
)
Returns Records._metrics, which gets (hybrid-) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change metrics, you can either change the config in-place, override this property, or overwrite the instance variable Records._metrics.
override_field_config_doc class method¶
Call this method on each subclass that overrides Records.field_config.
pd_mask property¶
Records.get_pd_mask() with default arguments.
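For example, the mask of filtered_records from the Filtering section (a sketch; True marks elements covered by at least one record):
>>> filtered_records.pd_mask
       a      b      c
x   True  False   True
y  False   True  False
z   True  False   True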
plots_defaults property¶
Defaults for PlotsBuilderMixin.plots().
Merges PlotsBuilderMixin.plots_defaults and plots from the global records settings.
prepare_customdata method¶
Prepare customdata and hoverinfo for Plotly.
Will display all fields in the data type, or only those in incl_fields, unless a field has the field config setting as_customdata disabled or is listed in excl_fields. Additionally, you can define hovertemplate in the field config, for example by using Sub, where $title is substituted by the field's title and $index by the field's (final) index in the customdata. If provided as a string, it will be wrapped with Sub. Defaults to "$title: %{customdata[$index]}". Mapped fields will be stringified automatically.
To append one or more custom arrays, provide append_info as a list of tuples, each consisting of a one-dimensional NumPy array, a title, and optionally hoverinfo. If the array's data type is object, it will be treated as strings, otherwise as numbers.
random_n method¶
Return random N records in each column.
readable property¶
Records.to_readable() with default arguments.
recarray property¶
Records array viewed as a NumPy record array (np.recarray).
records property¶
Records as a DataFrame.
records_arr property¶
Records array.
records_readable property¶
Records.to_readable() with default arguments.
replace method¶
See Configured.replace().
Also makes sure that Records.col_mapper is not passed to the new instance.
resample method¶
Perform resampling on Records.
resample_meta method¶
Perform resampling on Records and return metadata.
resample_records_arr method¶
Perform resampling on the record array.
row_stack class method¶
Stack multiple Records instances along rows.
Uses ArrayWrapper.row_stack() to stack the wrappers and Records.row_stack_records_arrs() to stack the record arrays.
Note
Will produce a column-sorted array.
row_stack_records_arrs class method¶
Stack multiple record arrays along rows.
select_cols method¶
Select columns.
Returns indices and new record array. Automatically decides whether to use column lengths or column map.
sort method¶
Sort records by columns (primary) and ids (secondary, optional).
Note
Sorting is expensive. A better approach is to append records already in the correct order.
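For example (a sketch; the example records are already sorted by column):
>>> records.sort().is_sorted()
True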
stats_defaults property¶
Defaults for StatsBuilderMixin.stats().
Merges StatsBuilderMixin.stats_defaults and stats from the global records settings.
subplots class variable¶
Subplots supported by Records.
Returns Records._subplots, which gets (hybrid-) copied upon creation of each instance. Thus, changing this config won't affect the class.
To change subplots, you can either change the config in-place, override this property, or overwrite the instance variable Records._subplots.
to_readable method¶
Get records in a human-readable format.
values property¶
Records array.
RecordsWithFields class¶
Class that exposes a read-only class property RecordsWithFields.field_config.
field_config function¶
Field config of ${cls_name}.