Datafeed Development
Adding a new CSV-based data feed is easy. The existing base class CSVDataBase provides the framework and takes most of the work off the subclasses, which in most cases can simply do:
    def _loadline(self, linetokens):

        # parse the linetokens here and put them in self.lines.close,
        # self.lines.high, etc

        return True  # if data was parsed, else ... return False
This was shown in CSV Data Feed Development.
The base class takes care of the parameters, initialization, opening of files, reading lines, splitting the lines into tokens and additional things like skipping lines which don’t fit into the date range (fromdate, todate) which the end user may have defined.
Developing a non-CSV datafeed follows the same pattern without going down to the already split line tokens.
Things to do:
- Derive from backtrader.feed.DataBase
- Add any parameters you may need
- Should initialization be needed, override __init__(self) and/or start(self)
- Should any clean-up code be needed, override stop(self)
- The work happens inside the method which MUST always be overridden: _load(self)
Let’s look at the parameters already provided by backtrader.feed.DataBase:
    class DataBase(six.with_metaclass(MetaDataBase, dataseries.OHLCDateTime)):

        params = (('dataname', None),
                  ('fromdate', datetime.datetime.min),
                  ('todate', datetime.datetime.max),
                  ('name', ''),
                  ('compression', 1),
                  ('timeframe', TimeFrame.Days),
                  ('sessionend', None))
Having the following meanings:
- dataname is what allows the data feed to identify how to fetch the data. In the case of the CSVDataBase this parameter is meant to be a path to a file or already a file-like object.
- fromdate and todate define the date range which will be passed to strategies. Any value provided by the feed outside of this range will be ignored
- name is cosmetic for plotting purposes
- timeframe and compression are cosmetic and informative. They really play a role in Data Resampling and Data Replaying.
- sessionend if passed (a datetime.time object) will be added to the datafeed datetime line which allows identifying the end of the session
Sample binary datafeed
backtrader already defines a CSV datafeed (VChartCSVData) for the exports of VisualChart, but it is also possible to directly read the binary data files.
Let’s do it (full data feed code can be found at the bottom)
Initialization
The binary VisualChart data files can contain either daily (.fd extension) or intraday data (.min extension). Here the informative parameter timeframe will be used to distinguish which type of file is being read.
During __init__ constants which differ for each type are set up.
    def __init__(self):
        super(VChartData, self).__init__()

        # Use the informative "timeframe" parameter to understand if the
        # code passed as "dataname" refers to an intraday or daily feed
        if self.p.timeframe >= TimeFrame.Days:
            self.barsize = 28
            self.dtsize = 1
            self.barfmt = 'IffffII'
        else:
            self.dtsize = 2
            self.barsize = 32
            self.barfmt = 'IIffffII'
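The sizes set in __init__ can be cross-checked against the format strings with struct.calcsize. The following is a small sketch and not part of the feed: the dictionary name BAR_FORMATS is made up for illustration, and the expected sizes assume a platform where "I" and "f" both take 4 bytes, which is the usual case.

```python
import struct

# Format strings copied from VChartData.__init__; the dict is illustrative only
BAR_FORMATS = {
    'daily': 'IffffII',      # date, open, high, low, close, volume, openinterest
    'intraday': 'IIffffII',  # date, time, open, high, low, close, volume, openinterest
}

# calcsize reports how many bytes one bar occupies for each format
bar_sizes = {name: struct.calcsize(fmt) for name, fmt in BAR_FORMATS.items()}

for name in sorted(bar_sizes):
    print(name, bar_sizes[name])
```

If the reported sizes ever differ from the barsize constants, the read in _load would fall out of step with the unpack call.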
Start
The Datafeed will be started when backtesting commences (it can actually be started several times during optimizations)
In the start method the binary file is opened unless a file-like object has been passed.
    def start(self):
        # the feed must start ... get the file open (or see if it was open)
        self.f = None
        if hasattr(self.p.dataname, 'read'):
            # A file has been passed in (ex: from a GUI)
            self.f = self.p.dataname
        else:
            # Let an exception propagate
            self.f = open(self.p.dataname, 'rb')
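The "path or file-like object" check can be tried on its own with an in-memory buffer. This is a minimal sketch outside of backtrader; the helper name open_source and the opened_here flag are hypothetical and do not exist in the library.

```python
import io


def open_source(dataname):
    """Return (file_object, opened_here) for a path or a file-like object."""
    if hasattr(dataname, 'read'):
        # Already file-like (e.g. io.BytesIO): use it as it is
        return dataname, False
    # Otherwise treat it as a path and let any exception propagate
    return open(dataname, 'rb'), True


buffer = io.BytesIO(b'\x01\x02\x03\x04')
f, opened_here = open_source(buffer)

print(f is buffer, opened_here)
print(f.read(2))
```

The second return value is an extra which VChartData does not have. It would allow clean-up code to leave alone a file that the caller opened.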
Stop
Called when backtesting is finished.
If a file was open, it will be closed
    def stop(self):
        # Close the file if any
        if self.f is not None:
            self.f.close()
            self.f = None
Actual Loading
The actual work is done in _load. It is called to load the next set of data, in this case the next bar: datetime, open, high, low, close, volume, openinterest. In backtrader the “current” moment corresponds to index 0.
A number of bytes will be read from the open file (determined by the constants set up during __init__), parsed with the struct module, further processed if needed (like with divmod operations for date and time) and stored in the lines of the data feed: datetime, open, high, low, close, volume, openinterest.
If no data can be read from the file it is assumed that the End Of File (EOF) has been reached:
- False is returned to indicate the fact no more data is available
Else if data has been loaded and parsed:
- True is returned to indicate the loading of the data set was a success
    def _load(self):
        if self.f is None:
            # if no file ... no parsing
            return False

        # Read the needed amount of binary data
        bardata = self.f.read(self.barsize)
        if not bardata:
            # if no data was read ... game over say "False"
            return False

        # use struct to unpack the data
        bdata = struct.unpack(self.barfmt, bardata)

        # Years are stored as if they had 500 days
        y, md = divmod(bdata[0], 500)
        # Months are stored as if they had 32 days
        m, d = divmod(md, 32)
        # put y, m, d in a datetime
        dt = datetime.datetime(y, m, d)

        if self.dtsize > 1:  # Minute Bars
            # Daily Time is stored in seconds
            hhmm, ss = divmod(bdata[1], 60)
            hh, mm = divmod(hhmm, 60)
            # add the time to the existing datetime
            dt = dt.replace(hour=hh, minute=mm, second=ss)

        self.lines.datetime[0] = date2num(dt)

        # Get the rest of the unpacked data
        o, h, l, c, v, oi = bdata[self.dtsize:]
        self.lines.open[0] = o
        self.lines.high[0] = h
        self.lines.low[0] = l
        self.lines.close[0] = c
        self.lines.volume[0] = v
        self.lines.openinterest[0] = oi

        # Say success
        return True
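The divmod arithmetic in _load can be checked in isolation. The sketch below pulls it out into two helper functions; the names decode_date and decode_time and the sample values are made up for illustration and are not taken from a real VisualChart file.

```python
import datetime


def decode_date(value):
    """Decode a date stored as year * 500 + month * 32 + day."""
    y, md = divmod(value, 500)
    m, d = divmod(md, 32)
    return datetime.datetime(y, m, d)


def decode_time(dt, seconds):
    """Add a time of day, given in seconds since midnight, to dt."""
    hhmm, ss = divmod(seconds, 60)
    hh, mm = divmod(hhmm, 60)
    return dt.replace(hour=hh, minute=mm, second=ss)


# Illustrative raw values built with the same encoding
raw_date = 2006 * 500 + 3 * 32 + 15   # 1003111
raw_time = 9 * 3600 + 30 * 60 + 5     # 34205

dt = decode_time(decode_date(raw_date), raw_time)
print(dt)  # 2006-03-15 09:30:05
```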
Other Binary Formats
The same model can be applied to any other binary source:
- Database
- Hierarchical data storage
- Online source
The steps again:
- __init__ -> Any init code for the instance, only once
- start -> start of backtesting (one or more times if optimization will be run). This would for example open the connection to the database or a socket to an online service
- stop -> clean-up like closing the database connection or open sockets
- _load -> query the database or online source for the next set of data and load it into the lines of the object. The standard fields being: datetime, open, high, low, close, volume, openinterest
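As a rough illustration of these steps for a database, the sketch below walks through the same life cycle with the standard sqlite3 module. It is a plain stand-in class and deliberately does not derive from backtrader.feed.DataBase; the class name SQLiteBarSource, the table name bars, its columns and the sample rows are all assumptions made for the example.

```python
import os
import sqlite3
import tempfile


class SQLiteBarSource(object):
    """Plain stand-in mirroring the __init__/start/stop/_load life cycle."""

    def __init__(self, dataname):
        self.dataname = dataname  # path to the database file
        self.conn = None
        self.cursor = None
        self.current = None       # last row fetched by _load

    def start(self):
        # Open the connection and issue the query once per run
        self.conn = sqlite3.connect(self.dataname)
        self.cursor = self.conn.execute(
            'SELECT dt, open, high, low, close, volume, openinterest '
            'FROM bars ORDER BY dt')

    def stop(self):
        # Clean up: close the connection if there is one
        if self.conn is not None:
            self.conn.close()
            self.conn = None
            self.cursor = None

    def _load(self):
        if self.cursor is None:
            return False
        row = self.cursor.fetchone()
        if row is None:
            return False  # no more rows available
        self.current = row  # a real feed would fill self.lines.* here
        return True


# Build a small throwaway database with made-up bars
path = os.path.join(tempfile.mkdtemp(), 'bars.db')
conn = sqlite3.connect(path)
conn.execute('CREATE TABLE bars (dt TEXT, open REAL, high REAL, low REAL, '
             'close REAL, volume INTEGER, openinterest INTEGER)')
conn.executemany('INSERT INTO bars VALUES (?, ?, ?, ?, ?, ?, ?)', [
    ('2006-01-02', 10.0, 11.0, 9.5, 10.5, 1000, 0),
    ('2006-01-03', 10.5, 12.0, 10.0, 11.5, 1500, 0),
])
conn.commit()
conn.close()

source = SQLiteBarSource(path)
source.start()
loaded = []
while source._load():
    loaded.append(source.current)
source.stop()

print(len(loaded))
```

In an actual data feed the row would be converted and stored in the lines inside _load, in the same way VChartData does after unpacking a bar.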
VChartData Test
The test has VChartData load data from a local “.fd” file for Google for the year 2006.
It’s only about loading the data, so not even a subclass of Strategy is needed.
    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)

    import datetime

    import backtrader as bt
    from vchart import VChartData


    if __name__ == '__main__':
        # Create a cerebro entity
        cerebro = bt.Cerebro(stdstats=False)

        # Add a strategy
        cerebro.addstrategy(bt.Strategy)

        # Create a Data Feed
        datapath = '../datas/goog.fd'
        data = VChartData(
            dataname=datapath,
            fromdate=datetime.datetime(2006, 1, 1),
            todate=datetime.datetime(2006, 12, 31),
            timeframe=bt.TimeFrame.Days
        )

        # Add the Data Feed to Cerebro
        cerebro.adddata(data)

        # Run over everything
        cerebro.run()

        # Plot the result
        cerebro.plot(style='bar')
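If no VisualChart file is at hand, a synthetic one in the daily layout that _load assumes can be built with struct. This is only a sketch with made-up bars: it writes them into an in-memory buffer and reads them back with the same read-until-empty logic. Since start accepts anything with a read method, an object like f could in principle be passed as dataname.

```python
import io
import struct

BARFMT = 'IffffII'                  # daily layout used by VChartData
BARSIZE = struct.calcsize(BARFMT)

# Made-up bars: year, month, day, open, high, low, close, volume, openinterest
sample_bars = [
    (2006, 1, 2, 10.0, 11.0, 9.5, 10.5, 1000, 0),
    (2006, 1, 3, 10.5, 12.0, 10.0, 11.5, 1500, 0),
]

# Write the bars with the date packed as year * 500 + month * 32 + day
f = io.BytesIO()
for y, m, d, o, h, l, c, v, oi in sample_bars:
    f.write(struct.pack(BARFMT, y * 500 + m * 32 + d, o, h, l, c, v, oi))
f.seek(0)

# Read them back like _load does: one bar per read until nothing is left
closes = []
while True:
    bardata = f.read(BARSIZE)
    if not bardata:
        break  # end of file reached
    bdata = struct.unpack(BARFMT, bardata)
    closes.append(bdata[4])  # index 4 holds the close in the daily layout

print(closes)  # [10.5, 11.5]
```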
VChartData Full Code
    from __future__ import (absolute_import, division, print_function,
                            unicode_literals)

    import datetime
    import struct

    from backtrader.feed import DataBase
    from backtrader import date2num
    from backtrader import TimeFrame


    class VChartData(DataBase):
        def __init__(self):
            super(VChartData, self).__init__()

            # Use the informative "timeframe" parameter to understand if the
            # code passed as "dataname" refers to an intraday or daily feed
            if self.p.timeframe >= TimeFrame.Days:
                self.barsize = 28
                self.dtsize = 1
                self.barfmt = 'IffffII'
            else:
                self.dtsize = 2
                self.barsize = 32
                self.barfmt = 'IIffffII'

        def start(self):
            # the feed must start ... get the file open (or see if it was open)
            self.f = None
            if hasattr(self.p.dataname, 'read'):
                # A file has been passed in (ex: from a GUI)
                self.f = self.p.dataname
            else:
                # Let an exception propagate
                self.f = open(self.p.dataname, 'rb')

        def stop(self):
            # Close the file if any
            if self.f is not None:
                self.f.close()
                self.f = None

        def _load(self):
            if self.f is None:
                # if no file ... no parsing
                return False

            # Read the needed amount of binary data
            bardata = self.f.read(self.barsize)
            if not bardata:
                # if no data was read ... game over say "False"
                return False

            # use struct to unpack the data
            bdata = struct.unpack(self.barfmt, bardata)

            # Years are stored as if they had 500 days
            y, md = divmod(bdata[0], 500)
            # Months are stored as if they had 32 days
            m, d = divmod(md, 32)
            # put y, m, d in a datetime
            dt = datetime.datetime(y, m, d)

            if self.dtsize > 1:  # Minute Bars
                # Daily Time is stored in seconds
                hhmm, ss = divmod(bdata[1], 60)
                hh, mm = divmod(hhmm, 60)
                # add the time to the existing datetime
                dt = dt.replace(hour=hh, minute=mm, second=ss)

            self.lines.datetime[0] = date2num(dt)

            # Get the rest of the unpacked data
            o, h, l, c, v, oi = bdata[self.dtsize:]
            self.lines.open[0] = o
            self.lines.high[0] = h
            self.lines.low[0] = l
            self.lines.close[0] = c
            self.lines.volume[0] = v
            self.lines.openinterest[0] = oi

            # Say success
            return True