Optimization improvements
Version 1.8.12.99 of backtrader includes improvements in how data feeds and results are managed during multiprocessing (i.e. during optimization).
Note
The behavior of both improvements can be controlled through two new Cerebro parameters:
- optdatas (default: True)

  If True and optimizing (and the system can preload and use runonce), data preloading will be done only once in the main process, to save time and resources.

- optreturn (default: True)

  If True, the optimization results will not be full Strategy objects (with all datas, indicators, observers ...) but objects with the following attributes (the same as in Strategy):

  - params (or p) the strategy had for the execution
  - analyzers the strategy has executed

  In most occasions, only the analyzers and the params with which the strategy was run are needed to evaluate its performance. If a detailed analysis of the generated values (for example those of indicators) is needed, turn this off.
Data Feed Management

In an optimization scenario this is a likely combination of Cerebro parameters:
- preload=True (default)

  Data feeds will be preloaded before running any backtesting code.

- runonce=True (default)

  Indicators will be calculated in batch mode in a tight for loop, instead of step by step.
If both conditions are True and optdatas=True, then:
- The Data Feeds will be preloaded in the main process before spawning new subprocesses (the ones in charge of executing the backtesting)
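The effect can be sketched in plain Python (the names below are hypothetical, not backtrader's actual internals): an expensive load happens exactly once up front, and every parameter combination then reuses the already-loaded bars. In the real system, subprocesses spawned after preloading inherit this work instead of repeating it.

```python
LOAD_CALLS = 0  # counts how often the expensive load runs


def load_feed():
    """Stand-in for parsing/preloading a data feed (the expensive step)."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return list(range(252))  # 252 bars, as in the article's sample


def run_backtest(bars, smaperiod):
    """Stand-in for one optimization run over already-preloaded bars."""
    return sum(bars[-smaperiod:]) / smaperiod


# optdatas=True behavior: preload once in the main process ...
bars = load_feed()

# ... then every parameter combination reuses the same preloaded data
results = [run_backtest(bars, p) for p in range(10, 30)]

assert LOAD_CALLS == 1     # the expensive load ran exactly once
assert len(results) == 20  # one result per parameter value
```

With optdatas=False each run would call load_feed itself, multiplying the loading cost by the number of parameter combinations.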
Results management
In an optimization scenario, two things play the most important role when evaluating the different parameters with which each strategy was run:

- strategy.params (or strategy.p)

  The actual set of values used for the backtesting.

- strategy.analyzers

  The objects in charge of evaluating how the strategy has actually performed. Example: SharpeRatio_A (the annualized SharpeRatio).
When optreturn=True, instead of returning full strategy instances, placeholder objects are created which carry the two aforementioned attributes, so that the evaluation can still take place.

This avoids passing back large amounts of generated data, such as the values produced by indicators during the backtesting.

Should the full strategy objects be desired, simply set optreturn=False during cerebro instantiation or when calling cerebro.run.
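The placeholder idea can be sketched with a minimal class (hypothetical, not backtrader's actual implementation): only params and analyzers survive the trip back to the caller, while the heavyweight per-bar data is dropped.

```python
class OptReturnSketch:
    """Lightweight stand-in for a full strategy result (hypothetical)."""

    def __init__(self, strategy):
        self.params = strategy.params        # what was run
        self.p = self.params                 # backtrader-style alias
        self.analyzers = strategy.analyzers  # how it performed
        # note: no datas, indicators or observers are carried along


class FakeStrategy:
    """Stand-in for a finished strategy instance (hypothetical)."""

    def __init__(self, smaperiod):
        self.params = {'smaperiod': smaperiod}
        self.analyzers = {'sharpe': 1.1}
        self.datas = [list(range(252))]       # heavyweight: dropped
        self.indicators = [list(range(252))]  # heavyweight: dropped


placeholder = OptReturnSketch(FakeStrategy(smaperiod=10))
assert placeholder.params['smaperiod'] == 10
assert placeholder.analyzers['sharpe'] == 1.1
assert not hasattr(placeholder, 'datas')  # the bulk never crosses processes
```

Because results from subprocesses must be pickled and sent back to the main process, shedding the per-bar payload is what produces the measured savings.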
Some test runs
The optimization sample in the backtrader sources has been extended to add control for optdatas and optreturn (actually to disable them).
Single Core Run
As a reference, this is what happens when the number of CPUs is limited to 1 and the multiprocessing module is not used:
$ ./optimization.py --maxcpus 1
==================================================
**************************************************
--------------------------------------------------
OrderedDict([(u'smaperiod', 10), (u'macdperiod1', 12), (u'macdperiod2', 26), (u'macdperiod3', 9)])
**************************************************
--------------------------------------------------
OrderedDict([(u'smaperiod', 10), (u'macdperiod1', 13), (u'macdperiod2', 26), (u'macdperiod3', 9)])
...
...
OrderedDict([(u'smaperiod', 29), (u'macdperiod1', 19), (u'macdperiod2', 29), (u'macdperiod3', 14)])
==================================================
Time used: 184.922727833
Multiple Core Runs
Without limiting the number of CPUs, the Python multiprocessing module will try to use all of them. optdatas and optreturn will then be disabled in turn to gauge their impact.
Both optdatas and optreturn active
The default behavior:
$ ./optimization.py
...
...
...
==================================================
Time used: 56.5889185394
The total improvement from using multiple cores together with the data feed and results management improvements means going down from 184.92 to 56.58 seconds.
Take into account that the sample uses only 252 bars and the indicators generate values with a length of just 252 points. This is just an example.

The real question is how much of this is attributable to the new behavior.
optreturn deactivated

Let's pass full strategy objects back to the caller:
$ ./optimization.py --no-optreturn
...
...
...
==================================================
Time used: 67.056914007
The execution time has increased by 18.50% (equivalently, a speed-up of 15.62%).
optdatas deactivated

Each subprocess is forced to load its own set of values for the data feeds:
$ ./optimization.py --no-optdatas
...
...
...
==================================================
Time used: 72.7238112637
The execution time has increased by 28.52% (equivalently, a speed-up of 22.19%).
Both deactivated
Still using multicore but with the old non-improved behavior:
$ ./optimization.py --no-optdatas --no-optreturn
...
...
...
==================================================
Time used: 83.6246643786
The execution time has increased by 47.79% (equivalently, a speed-up of 32.34%).
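The "increase" and "speed-up" figures quoted above can be reproduced from the reported times: the increase is measured against the fast (optimized) run, the speed-up against the slow (deactivated) run. A quick check (the helper names are ours, not part of the sample):

```python
# Reported times from the runs above (seconds)
t_optimized = 56.5889185394  # both optdatas and optreturn active
runs = {
    'no-optreturn': 67.056914007,
    'no-optdatas': 72.7238112637,
    'both deactivated': 83.6246643786,
}


def increase_pct(fast, slow):
    """Slowdown of the deactivated run, relative to the optimized run."""
    return (slow - fast) / fast * 100.0


def speedup_pct(fast, slow):
    """Time saved by the optimized run, relative to the deactivated run."""
    return (slow - fast) / slow * 100.0


for label, t in runs.items():
    print('%-16s increase: %5.2f%%  speed-up: %5.2f%%'
          % (label, increase_pct(t_optimized, t), speedup_pct(t_optimized, t)))
```

The computed values agree with the quoted figures to within about 0.02 percentage points (the article appears to round slightly differently).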
This shows that the use of multiple cores is the major contributor to the time improvement.
Note
The executions have been done on a laptop with an i7-4710HQ (4 cores / 8 logical) and 16 GB of RAM under Windows 10 64-bit. Mileage may vary under other conditions.
Concluding
- The greatest factor in the time reduction during optimization is the use of multiple cores.

- The sample runs with optdatas and optreturn show additional speed-ups of around 22.19% and 15.62% respectively (32.34% with both together in the test).
Sample Usage
$ ./optimization.py --help
usage: optimization.py [-h] [--data DATA] [--fromdate FROMDATE]
                       [--todate TODATE] [--maxcpus MAXCPUS] [--no-runonce]
                       [--exactbars EXACTBARS] [--no-optdatas]
                       [--no-optreturn] [--ma_low MA_LOW] [--ma_high MA_HIGH]
                       [--m1_low M1_LOW] [--m1_high M1_HIGH]
                       [--m2_low M2_LOW] [--m2_high M2_HIGH]
                       [--m3_low M3_LOW] [--m3_high M3_HIGH]

Optimization

optional arguments:
  -h, --help            show this help message and exit
  --data DATA, -d DATA  data to add to the system
  --fromdate FROMDATE, -f FROMDATE
                        Starting date in YYYY-MM-DD format
  --todate TODATE, -t TODATE
                        Starting date in YYYY-MM-DD format
  --maxcpus MAXCPUS, -m MAXCPUS
                        Number of CPUs to use in the optimization
                          - 0 (default): use all available CPUs
                          - 1 -> n: use as many as specified
  --no-runonce          Run in next mode
  --exactbars EXACTBARS
                        Use the specified exactbars still compatible with preload
                          0 No memory savings
                          -1 Moderate memory savings
                          -2 Less moderate memory savings
  --no-optdatas         Do not optimize data preloading in optimization
  --no-optreturn        Do not optimize the returned values to save time
  --ma_low MA_LOW       SMA range low to optimize
  --ma_high MA_HIGH     SMA range high to optimize
  --m1_low M1_LOW       MACD Fast MA range low to optimize
  --m1_high M1_HIGH     MACD Fast MA range high to optimize
  --m2_low M2_LOW       MACD Slow MA range low to optimize
  --m2_high M2_HIGH     MACD Slow MA range high to optimize
  --m3_low M3_LOW       MACD Signal range low to optimize
  --m3_high M3_HIGH     MACD Signal range high to optimize