MoreNotesToSelf

Technology, Finance, and Life

Posts Tagged ‘matplotlib’

Use python and sqlite3 to build a database

Posted by DK on August 5, 2009

The previous post showed how matplotlib can pull market data from Yahoo. Using some python-fu, you can easily create CSV files for given stocks. CSV files are great as they are easy to parse and don’t require a lot of overhead (in terms of setting things up, you can just open the file directly). Of course, Yahoo, Google, and others offer decent tools to view stock prices, so creating CSV files might be a bit redundant. Furthermore, if you really want to go whole hog and pull prices for thousands of assets, I imagine one would have to think carefully about an appropriate filesystem.

On the other hand, python includes a built in wrapper for sqlite, the ubiquitous file-based database that’s embedded in so many mobile and desktop applications. Sqlite can be run on any platform and is a completely open source, flyweight piece of software (which explains it’s popularity). This post will outline how to create a simple two table sqlite database to store stock prices.

First if all, sqlite has it’s own interpreter you can access from the command line, so if you’re an sql cowboy, you can instantly execute sql statements from the prompt to explore a given database. Here are a few examples:

The .help command lists some of the sqlite specific commands available at the command line.
sqlite_help.jpg
The .schema command prints the tables in the database (just as you might think).
sqlite_schema.jpg
You can also input sql statements directly.
sqlite_select.jpg
Anyway, enough about sqlite itself, let’s get to the python that built the database above. The attached script used four functions to package data for a defined set of tickers, using quotes_historical_yahoo(), and inserts the data into an sqlite database. The key is using a dictionary of tuples to pass around data.

stocks = {'data':[('GOOG', 'stock'), ('AAPL','stock')],
 'headers':('ticker', 'tag')}

In the example above, the ‘data’ key references a list of tuples that provide information on the ticker and type of asset in the tuple. The ‘header’ key references the table field names (as defined in the database schema) associated with the ‘data’ key. The same structure is used to write market data to the database, where the ‘data’ key refers to numeric data (e.g., foreign keys, date string, open, close, etc.) and the ‘header’ key refers to the formal fields to which each piece of data is associated.

raw_quotes = quotes_historical_yahoo(ticker, start, end) #list of tuples
data = []
for quote in raw_quotes:
 date_raw = datetime.datetime.fromordinal(int(quote[0]))
 year, month, day = date_raw.year, date_raw.month, date_raw.day
 date_string = str(year)+'-'+str(month)+'-'+str(day)
 record = (f_key, quote[0], date_string, year, month, day,
 quote[1], quote[2], quote[3], quote[4], quote[5])
 data.append(record)    

headers = ('asset_id',
 'gregorian_day',
 'date_string',
 'year',
 'month',
 'day',
 'open',
 'close',
 'high',
 'low',
 'volume')
return {'data':data, 'headers':headers}

I use a fair amount of string substitution in the code, which isn’t strictly recommended, but I’m not particularly worried about security. Anyway, the write2sql() function needs to know how many fields are associated with the data. Rather than create two separate functions for adding stocks to the asset table and market data to the prices table, I used string operations to expand or contract the substitution marks to match the number of headers. I’m sure there are more elegant ways to do it, but, hey, it works.

Finally, I’ll plug the Sqlite Manager add-on for Firefox. It’s a basic and easy way to inspect sqlite files.
SQLiteManager.jpg

Here’s the complete code:


'''
YahooSqlite.py

This module pulls stock price history for designated tickers using matplotlib.
Matplotlib.finance returns a tuple (d, open, close, high, low, volume).

The functions below write data to a sqlite database using the sqlite3 module.
'''

from matplotlib.finance import quotes_historical_yahoo
import sqlite3
import datetime
import os

date1 = datetime.datetime(2009,1,1)
date2 = datetime.datetime.now()
path = os.path.expanduser('~') + '/Dev/Data/AssetPrices/'
db = path + 'SQLite/assetprices.sqlite'

stocks = {'data':[('GOOG', 'stock'), ('AAPL','stock')],
 'headers':('ticker', 'tag')}
etfs = {'data':[('DBP', 'etf'), ('TIP','etf'),('DLS','etf'),('BND','etf'),
 ('EWC','etf'),('VBR','etf'),('RWX','etf'),('SPY','etf')],
 'headers':('ticker','tag')}

#Create a function to create a csv file for a given ticker.

def package_data(db=None, ticker=None, start=None, end=None):
 '''
 package_data() almost the same as write2csv() without the CSV section.
 It uses quotes_historical_yahoo() to create a data set for a given stock's
 price history. Date_string, Year, month, and day fields are included for
 added flexibility. Returns a dictionary of tuples.
 '''
 con = sqlite3.connect(db)
 c = con.cursor()
 sql = "select asset_id from assets where ticker='%s'" % ticker #change this!
 c.execute(sql)
 id_list = c.fetchall()
 if len(id_list)==1:
 f_key = id_list[0][0]
 else:
 print 'Error: asset has %s IDs' % str(len(key_list)) 

 raw_quotes = quotes_historical_yahoo(ticker, start, end) #list of tuples
 data = []
 for quote in raw_quotes:
 date_raw = datetime.datetime.fromordinal(int(quote[0]))
 year, month, day = date_raw.year, date_raw.month, date_raw.day
 date_string = str(year)+'-'+str(month)+'-'+str(day)
 record = (f_key, quote[0], date_string, year, month, day,
 quote[1], quote[2], quote[3], quote[4], quote[5])
 data.append(record)    

 headers = ('asset_id',
 'gregorian_day',
 'date_string',
 'year',
 'month',
 'day',
 'open',
 'close',
 'high',
 'low',
 'volume')
 return {'data':data, 'headers':headers}

def write2sqlite(db=None, table_name=None, data_dict=None):
 '''
 write2sqlite() takes a given dictionary of tuples and writes them to a
 designated sqlite database table.
 '''
 header_string = ', '.join([header for header in data_dict['headers']])
 marks = len(data_dict['headers'])*'?,'
 marks_string = marks[:-1]
 con = sqlite3.connect(db)
 c = con.cursor()
 c.executemany('insert into ' + table_name + ' (' + header_string + ') '
 'values (' + marks_string +')', data_dict['data'])
 con.commit()
 # don't forget to add 'date modified' field at some pt
 c.close()
 return

#Need to create a database to work with.

def build_database(db=None, assets=None, start=None, end=None):
 '''
 build_database() creates an sqlite database populated by stocks defined in
 the code below. Change date1 to adjust how much price history you want.
 '''
 con = sqlite3.connect(db)
 c = con.cursor()
 c.execute('''CREATE TABLE assets
 (asset_id integer not null primary key,
 ticker text,
 tag text)
 ''')

 c.execute('''CREATE TABLE prices
 (price_id integer not null primary key,
 asset_id integer not null,
 gregorian_day integer,
 date_string date,
 year integer,
 month integer,
 day integer,
 open real,
 close real,
 high real,
 low real,
 volume integer,
 FOREIGN KEY (asset_id) REFERENCES assets(asset_id))
 ''')
 con.commit()
 c.close()

 #add stocks to asset table
 write2sqlite(db=db, table_name='assets', data_dict=assets)

 #package price data and write to price table
 for asset in assets['data']:
 prices = package_data(db=db, ticker=asset[0], start=start, end=end)
 write2sqlite(db=db, table_name='prices', data_dict=prices)
 return

def remove_existing_db(path=None):
 if os.path.exists(path) == True:
 os.remove(path)
 print 'old db removed'
 else:
 pass

remove_existing_db(path=db)
build_database(db=db, assets=stocks, start=date1, end=date2)

if (__name__ == '__main__'):
 main()

Posted in Python, Tech | Tagged: , , , , | 1 Comment »

quotes_historical_yahoo from matplotlib.finance

Posted by DK on August 3, 2009

I thought matplotlib was purely a visualization tool,  but the rabbit hole is deeper than I thought. One handy module I recently ran across is matplotlib.finance. It isn’t featured in the documentation (as far as I know), but contains functions that allow the user to pull stock prices from yahoo as a list of tuples or as array objects.

The ‘quotes_historical_yahoo’ function pulls price and volume data given a ticker and date range. Here’s an iPython example:

In [1]: from matplotlib.finance import quotes_historical_yahoo
In [2]: import datetime
In [3]: ticker = 'SPY'
In [4]: start_date = datetime.datetime(2009, 7, 1)
In [5]: end_date = datetime.datetime(2009, 7, 30)
In [6]: SPYlist = quotes_historical_yahoo(ticker, start_date, end_date)

SPYlist now contains a list of tuples that represent daily price data (date, open, close, high, low, volume). Let’s just take a look at the first two records using standard Python slicing syntax:

In [8]: SPYlist[0:2]
Out[8]:
[(733589.0,
92.340000000000003,
92.329999999999998,
93.230000000000004,
92.209999999999994,
173041100),
(733590.0,
91.129999999999995,
89.810000000000002,
92.359999999999999,
89.760000000000005,
212309900)]

The first number in each tuple is the date, but matplotlib pulls the date as a gregorian ordinal number. To covert it back to a datetime object, you need to use datetime.datetime.fromordinal. Note that python expects the ordinal to be an integer, not a float (as generated by matplotlib). The function will still work, but you’ll get a warning.

In [10]: datetime.datetime.fromordinal(int(SPYlist[0][0]))
Out[10]: datetime.datetime(2009, 7, 1, 0, 0)

It’s also possible to use the ‘asobject’ optional parameter to pull the data as array objects. This essentially splits the data from rows(tuples) to columns(arrays). Line 12 below shows the different attributes of the SPYobjects variable. As you can see, there are now array objects (e.g. SPYobjects.close) for each field of data.

In [11]: SPYobjects = quotes_historical_yahoo(ticker, start_date, end_date, asobject=True)
In [12]: SPYobjects.
SPYobjects.__class__   SPYobjects.__init__    SPYobjects.close       SPYobjects.high        SPYobjects.open
SPYobjects.__doc__     SPYobjects.__module__  SPYobjects.date        SPYobjects.low         SPYobjects.volume

In [12]: SPYobjects.close
Out[12]:
array([ 92.33,  89.81,  89.8 ,  88.06,  88.  ,  88.17,  87.96,  90.1 ,
90.61,  93.26,  93.11,  94.13,  95.13,  95.57,  95.55,  97.66,
98.06,  98.35,  97.89,  97.65,  98.67])

Very convenient! Arrays, of course, can also be sliced.

In [13]: SPYobjects.close[0:5]
Out[13]: array([ 92.33,  89.81,  89.8 ,  88.06,  88.  ])

There are all sorts of ‘easter eggs’ in matplotlib!

Posted in Python, Tech | Tagged: , , | Leave a Comment »