Sunday, April 5

M1: surge in money supply - meaningful or meaningless

The last couple of months have seen a sudden jump in liquid money supply (M1).





Saturday, February 21

Load ABS files to MySQL

When I am working in R, I tend to have my working data in a MySQL database. I found that R did not always play nicely (and quickly) with complex Microsoft Excel files.

Previously, I had quite a complex bit of Python code to read Excel files and upload them to MySQL. I have now retooled the way I load files from the Australian Bureau of Statistics (ABS) into MySQL, using Python pandas. The code is much simpler.

First, I store my MySQL database username and password in a file MysqlConnect.py (it is used by a number of different programs). It lives in the bin directory (../bin from where I do my work). And, just in case you are wondering: no, it is not my password.

host     = 'localhost'
user     = 'root'
password = 'BigRedCar'
database = 'dbase1'
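
These four values are later joined into a single connection URL. A password containing characters such as @ or / would break that URL, so here is a minimal sketch of escaping the credentials first. The helper name build_url and the sample password are made up for illustration; quote_plus comes from the standard library.

```python
try:
    from urllib.parse import quote_plus   # Python 3
except ImportError:
    from urllib import quote_plus         # Python 2.7

def build_url(user, password, host, database):
    """Assemble a SQLAlchemy-style MySQL URL, escaping the credentials.

    build_url is a hypothetical helper, not part of the original code.
    """
    return 'mysql+pymysql://{}:{}@{}/{}'.format(
        quote_plus(user), quote_plus(password), host, database)

# Hypothetical password containing characters that are special in a URL
url = build_url('root', 'p@ss/word', 'localhost', 'dbase1')
print(url)
```

With a plain alphanumeric password the escaped and unescaped URLs are identical, so this only matters once the password contains punctuation.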

Now let's move on to the function that loads ABS files into MySQL. It lives in the same bin directory, in a file named LoadABSToMySQL.py.

import pandas as pd
import pymysql
from sqlalchemy import create_engine
import os.path
import re

# local imports - a file that contains database login details
import MysqlConnect as MSC

def LoadABSToMySQL(pathName):
    """ Read an Excel file from the Australian Bureau of Statistics
        and load it into a MySQL database"""

    # --- 1 --- open MySQL
    s = 'mysql+pymysql://'+MSC.user+':'+MSC.password+'@'+MSC.host+'/'+MSC.database
    engine = create_engine(s)

    # --- 2 --- identify proposed table name from file name
    (head,tail) = os.path.split(pathName)
    tail = re.split(r'\.', tail)  # raw string, so the backslash reaches the regex engine intact
    tablename = tail[0]

    # --- 3 --- open the XL file
    wb = pd.ExcelFile(pathName)

    # --- 4 --- load XL workbooks into a single DataFrame
    df = pd.DataFrame()
    for name in wb.sheet_names:

        # -- ignore junk
        if 'Data' not in name:
            continue

        # -- read
        tmp = wb.parse(sheetname=name, header=9, index_col=0, na_values=['', '-', ' '])

        # -- amalgamate
        df = pd.merge(left=df, right=tmp, how='outer', left_index=True, right_index=True)
        tmp = None

    # --- 5 --- write this DataFrame to MySQL
    df.to_sql(tablename, engine, index=True, if_exists='replace')
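
The amalgamation in step 4 relies on an outer join over the row index, so sheets that cover different date ranges still line up. A minimal sketch of that behaviour in isolation, assuming two made-up sheets and using pd.concat with axis=1 as a stand-in for the repeated pd.merge calls in the function:

```python
import pandas as pd

# Two hypothetical "Data" sheets covering different, overlapping periods
sheet1 = pd.DataFrame({'gdp': [1.0, 2.0]}, index=['2014Q1', '2014Q2'])
sheet2 = pd.DataFrame({'cpi': [10.0, 20.0]}, index=['2014Q2', '2014Q3'])

# axis=1 places the sheets side by side; join='outer' keeps the union
# of the two indexes and fills the gaps with NaN
combined = pd.concat([sheet1, sheet2], axis=1, join='outer').sort_index()
print(combined)
```

Rows that exist in only one sheet end up with missing values in the other sheet's columns, which MySQL then stores as NULL.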

Finally, an example code snippet to load some of the ABS National Accounts files to MySQL. This file sits in my national accounts directory and has the rather unimaginative name py-load.py. The ABS Microsoft Excel files live in the ./raw-data sub-directory.

import sys
sys.path.append( '../bin' )

from LoadABSToMySQL import LoadABSToMySQL

dataDirectory = './raw-data/'
dataFiles = [
    '5206001_key_aggregates',
    '5206002_expenditure_volume_measures',
    '5206003_expenditure_current_price',
    '5206004_expenditure_price_indexes',
    '5206006_industry_gva',
    '5206008_household_final_consumption_expenditure',
    '5206022_taxes',
    '5206023_social_assistance_benefits',
    '5206024_selected_analytical_series'
]
dataSuffix = '.xls'

for f in dataFiles :
    LoadABSToMySQL(dataDirectory + f + dataSuffix)
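
If one of the listed files has not been downloaded yet, the loop above stops at the first missing file. A hedged sketch of one way to check up front: the helper name existing_paths is hypothetical, and it only builds and tests the paths, leaving the actual load to LoadABSToMySQL.

```python
import os.path

def existing_paths(directory, names, suffix):
    """Return (found, missing) lists of full paths for the given file stems.

    existing_paths is a hypothetical helper, not part of the original code.
    """
    found, missing = [], []
    for name in names:
        path = os.path.join(directory, name + suffix)
        if os.path.isfile(path):
            found.append(path)
        else:
            missing.append(path)
    return found, missing

# Example with a directory that is assumed not to exist
found, missing = existing_paths('./no-such-directory', ['5206001_key_aggregates'], '.xls')
print(found, missing)
```

The found list can then be fed to the load loop, and the missing list printed as a reminder of what still needs downloading.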

To run this Python load script, I have a Bash shell script, which I use on my iMac. It has another unimaginative name: run-load.sh.

#!/bin/bash

# mac os x fix ...
cd "$(dirname "$0")"

python ./py-load.py
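
The cd line moves into the script's own directory before anything runs, presumably so that relative paths such as ./raw-data resolve no matter where the script is launched from. The same idea can be sketched inside Python itself; the helper name script_directory is made up for this example.

```python
import os

def script_directory(script_path):
    """Return the absolute directory that contains script_path.

    script_directory is a hypothetical helper, not part of the original code.
    """
    return os.path.dirname(os.path.abspath(script_path))

# At the top of a script one would typically call:
#     os.chdir(script_directory(__file__))
example = script_directory(os.path.join(os.getcwd(), 'national-accounts', 'py-load.py'))
print(example)
```

Doing the directory change in Python would make the shell wrapper optional, at the cost of one extra line in every load script.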

Friday, February 13

Wednesday, February 11

An Extra Dry Baltic Index

The Baltic Dry Index provides a window on world trade. The view out the window ain't that good at the moment.



The Baltic Dry Index was at 663 on 5 December 2008 (its post-GFC low point).

It was at 554 on 9 February 2015. But no need to worry, it rebounded to 556 on 10 February.

Sunday, December 28

Updated cheat sheets: python, pandas and matplotlib

I have updated my cheat sheets for python, pandas and matplotlib.

You can find them here: bit.ly/python_cs

Sunday, October 5

Sunday, August 3

Pandas 0.14.1

I have just upgraded to Pandas 0.14.1.

It was a pain. At first, none of my graphics programs worked. It looks like a change to the API for parsing Microsoft Excel files was the problem. I am not sure whether my previous approach was wrong (but worked serendipitously), or the API was deliberately changed to break old code (an unusual practice for a point release). If someone knows, I'd appreciate something in the comments below.
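
For what it is worth, in recent pandas releases the header argument is counted from the rows that remain after skiprows has been applied. A minimal sketch of that convention, using read_csv on made-up text as a stand-in for an Excel sheet (the assumption being that the Excel parser follows the same rule):

```python
from io import StringIO
import pandas as pd

# Hypothetical file: two lines of preamble, then a header row, then data
text = "preamble 1\npreamble 2\ndate,value\n2014-01,1.5\n2014-02,2.5\n"

# Skip the preamble, then treat the first remaining row (row 0) as the header
df = pd.read_csv(StringIO(text), skiprows=2, header=0, index_col=0)
print(df)
```

Under this convention, skipping n rows and asking for header=0 picks up the row at position n in the original file.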

What follows are the classes I use to upload Australian Bureau of Statistics (ABS) and Reserve Bank of Australia (RBA) data, with the updates to the parsing stage commented.

And yes, I use Python 2.7, not Python 3 (it's what comes with the Apple Mac).

### ABSExcelLoader.py
### written in python 2.7 and pandas 0.14.1

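from distutils.version import LooseVersion
import pandas as pd
# compare versions numerically; as plain strings, '0.9.0' would rank above '0.14.1'
assert LooseVersion(pd.__version__) >= LooseVersion('0.14.1')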

class ABSExcelLoader:

    def load(self, pathName, freq='M', index_name=None, verbose=False):
        """return a dictionary of pandas DataFrame objects for
           each Data work sheet in an ABS Excel spreadsheet"""

        wb = pd.ExcelFile(pathName)
        returnDict = {}

        for name in wb.sheet_names:
            if 'Data' not in name:
                continue

            # ExcelFile.parse: API behaviour change with pandas 0.14.1
            #df = wb.parse(sheetname=name, skiprows=8, header=9, index_col=0, na_values=['', '-', ' '])
            df = wb.parse(sheetname=name, skiprows=9, header=0, index_col=0, na_values=['', '-', ' '])

            periods = pd.PeriodIndex(pd.Series(df.index), freq=freq)
            df.set_index(keys=periods, inplace=True)
            df.index.name = index_name
            returnDict[name] = df

            if verbose:
                print ("\nFile: '{}', sheet: '{}'".format(pathName, name))
                print (df.iloc[:min(5, len(df)), :min(5, len(df.columns))])

        return returnDict
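
A note on version checks like the one at the top of this file: they are safest when done number by number, because raw strings compare character by character. A small sketch of the difference, using only the standard library; the helper name version_tuple is made up for this example.

```python
import re

def version_tuple(version):
    """Turn a string such as '0.14.1' into (0, 14, 1) for numeric comparison.

    version_tuple is a hypothetical helper, not part of the original code.
    """
    return tuple(int(n) for n in re.findall(r'\d+', version)[:3])

# Character-by-character: '9' sorts after '1', so 0.9.0 looks newer
string_check = '0.9.0' >= '0.14.1'

# Number-by-number: 9 is less than 14, so 0.9.0 is correctly older
numeric_check = version_tuple('0.9.0') >= version_tuple('0.14.1')

print(string_check, numeric_check)
```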

### RBAExcelLoader.py
### written in python 2.7 and pandas 0.14.1

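from distutils.version import LooseVersion
import pandas as pd
# compare versions numerically; as plain strings, '0.9.0' would rank above '0.14.1'
assert LooseVersion(pd.__version__) >= LooseVersion('0.14.1')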

class RBAExcelLoader:

    def load(self, pathName, freq='M', index_name=None, verbose=False):
        """return a pandas DataFrame for an RBA Excel spreadsheet"""
        wb = pd.ExcelFile(pathName)
        sheetname = 'Data'

        # ExcelFile.parse: API behaviour change with pandas 0.14.1
        #df = wb.parse(sheetname, skiprows=9, header=10, index_col=0, na_values=['', '-', ' '])
        df = wb.parse(sheetname, skiprows=10, header=0, index_col=0, na_values=['', '-', ' '])

        periods = pd.PeriodIndex(pd.Series(df.index), freq=freq)
        df.set_index(keys=periods, inplace=True)
        df.index.name = index_name  # honour the index_name argument, as the ABS loader does

        if verbose:
            print("\nFile: '{}', sheet: '{}'".format(pathName, sheetname))
            print('Columns: {}'.format(df.columns.tolist()))
            print('Top left hand corner ...')
            print('------------------------')
            print(df.iloc[:min(5, len(df)), :min(5, len(df.columns))])

        return df
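
Both loaders finish by converting the date index to a PeriodIndex, so each row is labelled by its month (or quarter) rather than by a specific day. A minimal sketch of that conversion in isolation, assuming made-up month-end dates and using DatetimeIndex.to_period rather than the PeriodIndex constructor the loaders call:

```python
import pandas as pd

# Hypothetical month-end dates, as they might appear in the first spreadsheet column
dates = pd.to_datetime(['2014-01-31', '2014-02-28', '2014-03-31'])
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]}, index=dates)

# Convert timestamps to monthly periods, so each row is labelled by its month
df.index = dates.to_period('M')
labels = [str(p) for p in df.index]
print(labels)
```

Passing 'Q' instead of 'M' would give quarterly periods, which corresponds to the freq argument the loaders expose.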