
decorators module

Decorators for splitting.


cv_split function

cv_split(
    *args,
    parameterized_kwargs=None,
    selection='max',
    return_grid=False,
    skip_errored=False,
    raise_no_results=True,
    template_context=None,
    **split_kwargs
)

Decorator that combines split() and parameterized() for cross-validation.

Creates a new apply function that is decorated with split() and is therefore applied to each range via Splitter.apply(). Inside this apply function, it checks whether the current range belongs to the first (training) set. If it does, the underlying function is parameterized and run on the entire parameter grid, and the results are stored in a global list. The other (testing) sets in the same split then read these results. If selection is a template, it can evaluate the grid results (available as grid_results) and return the best parameter combination. Each set (including the training set) then executes this parameter combination.
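
The train/select/test flow described above can be modeled in plain Python. This is a simplified sketch, not vectorbtpro's implementation: `cv_split_sketch` is hypothetical, each split is assumed to be a list of sets whose first entry is the training set, a local dict stands in for the global results list, and selection is fixed to the parameter with the highest training result.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def cv_split_sketch(
    func: Callable[[Sequence[int], int], int],
    splits: List[List[Sequence[int]]],
    params: Sequence[int],
) -> Dict[Tuple[int, int], Tuple[int, int]]:
    """Hypothetical model of cv_split: grid-search on the training set, reuse the winner on every set."""
    results: Dict[Tuple[int, int], Tuple[int, int]] = {}
    for split_idx, sets in enumerate(splits):
        # The first (training) set evaluates the whole parameter grid
        grid_results = [func(sets[0], p) for p in params]
        # Stand-in for selection="max": first parameter with the highest grid result
        best_param = params[grid_results.index(max(grid_results))]
        # Every set, including the training set, then runs only the selected parameter
        for set_idx, data in enumerate(sets):
            results[(split_idx, set_idx)] = (best_param, func(data, best_param))
    return results


# Toy usage: each set is a list, and the "parameter" is the position to pick
splits = [[[5, 1, 9], [2, 7, 3]], [[4, 8, 6], [0, 3, 1]]]
print(cv_split_sketch(lambda data, p: data[p], splits, [0, 1, 2]))
```

Keys are (split_idx, set_idx) and values are (selected parameter, result), loosely mirroring the split/set/seed levels of the usage examples below.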

The selection argument also accepts "min" for np.argmin and "max" for np.argmax.
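
A minimal sketch of how the "max" and "min" shortcuts could be resolved, using plain Python stand-ins for np.argmax and np.argmin so it runs without NumPy. `select_best` and `SELECTIONS` are hypothetical names. Like np.argmax, the stand-ins return the first position on ties, which is consistent with seed 41 winning over seed 42 for split 0 in the usage example below, where both score 22 on the training set.

```python
from typing import Callable, Dict, Sequence


def argmax(values: Sequence[float]) -> int:
    # Like np.argmax: position of the first occurrence of the maximum
    return max(range(len(values)), key=values.__getitem__)


def argmin(values: Sequence[float]) -> int:
    # Like np.argmin: position of the first occurrence of the minimum
    return min(range(len(values)), key=values.__getitem__)


# Hypothetical lookup table mirroring the "max"/"min" shortcuts
SELECTIONS: Dict[str, Callable[[Sequence[float]], int]] = {"max": argmax, "min": argmin}


def select_best(grid_results: Sequence[float], params: Sequence[int], selection: str = "max") -> int:
    """Return the parameter whose grid result wins under the given selection."""
    return params[SELECTIONS[selection](grid_results)]


print(select_best([22, 22, 2], [41, 42, 43]))         # 41: the tie with 42 goes to the first seed
print(select_best([22, 22, 2], [41, 42, 43], "min"))  # 43
```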

Keyword arguments in parameterized_kwargs are passed to parameterized(), with their templates substituted using a context that also includes the split-related context (such as split_idx and set_idx; see Splitter.apply()).
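
The substitution step can be pictured with a toy template class. `RepKey`, `substitute`, and `resolve_parameterized_kwargs` are hypothetical stand-ins for vectorbtpro's own template machinery, the option_* keys are placeholders rather than real parameterized() arguments, and the precedence (split keys overriding user-supplied template_context keys) is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class RepKey:
    """Hypothetical template: replaced by context[key] during substitution."""
    key: str


def substitute(kwargs: Dict[str, Any], context: Dict[str, Any]) -> Dict[str, Any]:
    # Replace each RepKey value with its context entry; other values pass through unchanged
    return {k: context[v.key] if isinstance(v, RepKey) else v for k, v in kwargs.items()}


def resolve_parameterized_kwargs(
    parameterized_kwargs: Dict[str, Any],
    template_context: Dict[str, Any],
    split_idx: int,
    set_idx: int,
) -> Dict[str, Any]:
    # Assumption: split-related keys are layered on top of the user-supplied template_context
    context = {**template_context, "split_idx": split_idx, "set_idx": set_idx}
    return substitute(parameterized_kwargs, context)


# The option_* keys are placeholders, not real parameterized() arguments
print(resolve_parameterized_kwargs(
    {"option_a": RepKey("split_idx"), "option_b": RepKey("label"), "option_c": 5},
    {"label": "cv"},
    split_idx=2,
    set_idx=0,
))
```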

If return_grid is True or 'first', returns both the grid (evaluated on the first set) and the selection. If return_grid is 'all', executes the grid on each set and returns it along with the selection. Otherwise, returns only the selection.
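
A single-split sketch of the three return_grid modes, assuming, as the wording suggests, that True and 'first' return the grid evaluated on the training set only. `run_cv` is hypothetical; it returns (grid, selection) in the same order as the tuple in the usage example below.

```python
from typing import Callable, Dict, Sequence, Tuple, Union

Grid = Dict[Tuple[int, int], int]


def run_cv(
    func: Callable[[Sequence[int], int], int],
    sets: Sequence[Sequence[int]],
    params: Sequence[int],
    return_grid: Union[bool, str] = False,
):
    """Hypothetical single-split model of how return_grid shapes the output."""
    # The grid is always evaluated on the first (training) set to pick the parameter
    grid_first: Grid = {(0, p): func(sets[0], p) for p in params}
    best = max(params, key=lambda p: grid_first[(0, p)])
    # Keys are (set_idx, param), loosely mirroring the (set, seed) levels in the usage example
    selection: Grid = {(i, best): func(data, best) for i, data in enumerate(sets)}
    if return_grid is True or return_grid == "first":
        return grid_first, selection
    if return_grid == "all":
        # 'all' executes the full grid on every set, not just the training set
        grid_all: Grid = {(i, p): func(data, p) for i, data in enumerate(sets) for p in params}
        return grid_all, selection
    return selection


grid, sel = run_cv(lambda d, p: d[p], [[5, 1, 9], [2, 7, 3]], [0, 1, 2], return_grid="all")
print(grid)
print(sel)
```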

If NoResultsException is raised, or if skip_errored is True and any exception is raised, the current iteration is skipped and removed from the final index.
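
The skipping rule can be sketched as a loop that drops failed iterations from the result index. `NoResultsSketch` is a hypothetical stand-in for NoResultsException, and `apply_with_skips` is not a vectorbtpro function.

```python
from typing import Any, Callable, List, Sequence, Tuple


class NoResultsSketch(Exception):
    """Hypothetical stand-in for NoResultsException."""


def apply_with_skips(
    func: Callable[[Any], Any],
    items: Sequence[Any],
    skip_errored: bool = False,
) -> Tuple[List[int], List[Any]]:
    index: List[int] = []
    values: List[Any] = []
    for i, item in enumerate(items):
        try:
            value = func(item)
        except NoResultsSketch:
            # Always skipped: the iteration is dropped from the final index
            continue
        except Exception:
            if not skip_errored:
                raise
            # With skip_errored=True, any other exception also drops the iteration
            continue
        index.append(i)
        values.append(value)
    return index, values


def pick(x: int) -> int:
    if x == 0:
        raise NoResultsSketch("nothing to return")  # always skipped
    if x < 0:
        raise ValueError("bad input")  # skipped only with skip_errored=True
    return x * 10


print(apply_with_skips(pick, [1, 0, 2, -1, 3], skip_errored=True))  # ([0, 2, 4], [10, 20, 30])
```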

Usage

  • Permute a series and pick the first value. Make the seed parameterizable. Cross-validate based on the highest picked value:
>>> from vectorbtpro import *

>>> @vbt.cv_split(
...     splitter="from_n_rolling",
...     splitter_kwargs=dict(n=3, split=0.5),
...     takeable_args=["sr"],
...     merge_func="concat",
... )
... def f(sr, seed):
...     np.random.seed(seed)
...     return np.random.permutation(sr)[0]

>>> index = pd.date_range("2020-01-01", "2020-02-01")
>>> np.random.seed(0)
>>> sr = pd.Series(np.random.permutation(np.arange(len(index))), index=index)
>>> f(sr, vbt.Param([41, 42, 43]))
split  set    seed
0      set_0  41      22
       set_1  41      28
1      set_0  43       8
       set_1  43      31
2      set_0  43      19
       set_1  43       0
dtype: int64
  • Extend the example above to also return the grid results of each set:
>>> f(sr, vbt.Param([41, 42, 43]), _return_grid="all")
(split  set    seed
 0      set_0  41      22
               42      22
               43       2
        set_1  41      28
               42      28
               43      20
 1      set_0  41       5
               42       5
               43       8
        set_1  41      23
               42      23
               43      31
 2      set_0  41      18
               42      18
               43      19
        set_1  41      27
               42      27
               43       0
 dtype: int64,
 split  set    seed
 0      set_0  41      22
        set_1  41      28
 1      set_0  43       8
        set_1  43      31
 2      set_0  43      19
        set_1  43       0
 dtype: int64)

split function

split(
    *args,
    splitter=None,
    splitter_cls=None,
    splitter_kwargs=None,
    index=None,
    index_from=None,
    takeable_args=None,
    template_context=None,
    forward_kwargs_as=None,
    return_splitter=False,
    apply_kwargs=None,
    **var_kwargs
)

Decorator that splits the inputs of a function.

Does the following:

  1. Resolves the splitter of type Splitter using the argument splitter. It can be an already built splitter instance, the name of a factory method (such as "from_n_rolling"), or the factory method itself. If splitter is None, the appropriate method is guessed from the supplied arguments using Splitter.guess_method(). To construct a splitter, index and **splitter_kwargs are passed to the factory method. The index is resolved from an explicitly provided index, by parsing the argument under the name/position given in index_from, or by parsing the first argument in takeable_args (checked in this order).
  2. Wraps the arguments listed in takeable_args with Takeable.
  3. Runs Splitter.apply() with the arguments passed to the function as args and kwargs, as well as apply_kwargs (those passed to the decorator).
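
The index resolution order from step 1 can be sketched with inspect.signature, which binds the call's arguments to parameter names. `resolve_index_sketch` is hypothetical and returns the raw argument; per the description above, the decorator then parses that argument into an index, a step this sketch omits.

```python
import inspect
from typing import Any, Callable, Dict, Optional, Sequence, Union


def resolve_index_sketch(
    func: Callable[..., Any],
    args: Sequence[Any],
    kwargs: Dict[str, Any],
    index: Any = None,
    index_from: Optional[Union[str, int]] = None,
    takeable_args: Optional[Sequence[str]] = None,
) -> Any:
    """Hypothetical model of split()'s index resolution order (parsing into an index is omitted)."""
    # 1) An explicitly provided index always wins
    if index is not None:
        return index
    sig = inspect.signature(func)
    bound = sig.bind(*args, **kwargs)
    # 2) Otherwise take the argument named (str) or positioned (int) by index_from
    if index_from is not None:
        name = list(sig.parameters)[index_from] if isinstance(index_from, int) else index_from
        return bound.arguments[name]
    # 3) Otherwise fall back to the first takeable argument
    if takeable_args:
        return bound.arguments[takeable_args[0]]
    raise ValueError("Cannot resolve the index")


def demo(sr, other):
    return None


print(resolve_index_sketch(demo, ([1, 2], [3]), {}, takeable_args=["sr"]))  # [1, 2]
print(resolve_index_sketch(demo, ([1, 2],), {"other": [3]}, index_from=1))  # [3]
```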

Keyword arguments in splitter_kwargs are passed to the factory method, and keyword arguments in apply_kwargs are passed to Splitter.apply(). If variable keyword arguments are provided, they are used as splitter_kwargs if apply_kwargs is already set, and as apply_kwargs if splitter_kwargs is already set. If neither splitter_kwargs nor apply_kwargs is set, they are used as splitter_kwargs if a splitter instance hasn't been built yet, and as apply_kwargs otherwise. If both arguments are set, an error is raised.
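
The routing rules for variable keyword arguments can be summarized as a small function. `route_var_kwargs` and its `splitter_is_instance` flag are hypothetical, and raising only when variable keyword arguments are actually present is an assumption about when the error applies.

```python
from typing import Any, Dict, Optional, Tuple

Kwargs = Optional[Dict[str, Any]]


def route_var_kwargs(
    var_kwargs: Dict[str, Any],
    splitter_kwargs: Kwargs = None,
    apply_kwargs: Kwargs = None,
    splitter_is_instance: bool = False,
) -> Tuple[Kwargs, Kwargs]:
    """Hypothetical model of where split() sends variable keyword arguments.

    Returns the final (splitter_kwargs, apply_kwargs) pair.
    """
    if not var_kwargs:
        return splitter_kwargs, apply_kwargs
    if splitter_kwargs is not None and apply_kwargs is not None:
        raise ValueError("Both splitter_kwargs and apply_kwargs are set")
    if apply_kwargs is not None:
        # apply_kwargs already set: extras go to the factory method
        return dict(var_kwargs), apply_kwargs
    if splitter_kwargs is not None:
        # splitter_kwargs already set: extras go to Splitter.apply()
        return splitter_kwargs, dict(var_kwargs)
    if splitter_is_instance:
        # Splitter already built: extras go to Splitter.apply()
        return None, dict(var_kwargs)
    # Splitter still to be built: extras go to the factory method
    return dict(var_kwargs), None


print(route_var_kwargs({"n": 2}))                             # ({'n': 2}, None)
print(route_var_kwargs({"n": 2}, splitter_is_instance=True))  # (None, {'n': 2})
```

Under these rules, a call such as `@vbt.split(splitter="from_n_rolling", n=2, takeable_args=["sr"])` would route `n=2` to the factory method, the same as passing `splitter_kwargs=dict(n=2)`.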

Usage

  • Split a Series and return its sum:
>>> from vectorbtpro import *

>>> @vbt.split(
...     splitter="from_n_rolling",
...     splitter_kwargs=dict(n=2),
...     takeable_args=["sr"]
... )
... def f(sr):
...     return sr.sum()

>>> index = pd.date_range("2020-01-01", "2020-01-06")
>>> sr = pd.Series(np.arange(len(index)), index=index)
>>> f(sr)
split
0     3
1    12
dtype: int64
  • Perform a split manually:
>>> @vbt.split(
...     splitter="from_n_rolling",
...     splitter_kwargs=dict(n=2),
...     takeable_args=["index"]
... )
... def f(index, sr):
...     return sr[index].sum()

>>> f(index, sr)
split
0     3
1    12
dtype: int64
  • Construct the splitter and mark arguments as "takeable" manually:
>>> splitter = vbt.Splitter.from_n_rolling(index, n=2)
>>> @vbt.split(splitter=splitter)
... def f(sr):
...     return sr.sum()

>>> f(vbt.Takeable(sr))
split
0     3
1    12
dtype: int64
  • Split multiple timeframes using a custom index:
>>> @vbt.split(
...     splitter="from_n_rolling",
...     splitter_kwargs=dict(n=2),
...     index=index,
...     takeable_args=["h12_sr", "d2_sr"]
... )
... def f(h12_sr, d2_sr):
...     return h12_sr.sum() + d2_sr.sum()

>>> h12_index = pd.date_range("2020-01-01", "2020-01-06", freq="12H")
>>> d2_index = pd.date_range("2020-01-01", "2020-01-06", freq="2D")
>>> h12_sr = pd.Series(np.arange(len(h12_index)), index=h12_index)
>>> d2_sr = pd.Series(np.arange(len(d2_index)), index=d2_index)
>>> f(h12_sr, d2_sr)
split
0    15
1    42
dtype: int64
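
The first split() example and its manual Takeable variant can be reproduced with plain Python lists to show what marking an argument as takeable means: only marked arguments are sliced for each range. This is a sketch, not vectorbtpro's implementation; `TakeableSketch`, `n_rolling_ranges`, and `apply_over_ranges` are hypothetical, and the window layout (equal, non-overlapping windows of `len(index) // n` points when no length is given) is an assumption that matches the sums 3 and 12 shown in those examples.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence, Tuple


@dataclass
class TakeableSketch:
    """Hypothetical stand-in for vbt.Takeable: marks an argument to be sliced per range."""
    obj: Sequence[Any]


def n_rolling_ranges(length: int, n: int) -> List[Tuple[int, int]]:
    # Assumption: n equal, non-overlapping windows of length // n elements each
    size = length // n
    return [(i * size, (i + 1) * size) for i in range(n)]


def apply_over_ranges(func: Callable[..., Any], ranges: Sequence[Tuple[int, int]], *args: Any) -> List[Any]:
    results = []
    for start, stop in ranges:
        # Only marked arguments are sliced; everything else is passed through unchanged
        call_args = [a.obj[start:stop] if isinstance(a, TakeableSketch) else a for a in args]
        results.append(func(*call_args))
    return results


values = list(range(6))  # plays the role of np.arange(len(index)) over 6 daily points
ranges = n_rolling_ranges(len(values), 2)
print(apply_over_ranges(lambda xs: sum(xs), ranges, TakeableSketch(values)))  # [3, 12]
```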