Previously, I had quite a complex bit of python code to read Excel files and upload them to MySQL. I have now retooled the way in which I load files from the Australian Bureau of Statistics (ABS) to MySQL using Python pandas. The code is much simpler.
First, I store my MySQL database username and password in a file MysqlConnect.py (it is used by a number of different programs). It lives in the bin directory (../bin from where I do my work). And, just in case you are wondering: no, it is not my password.
host = 'localhost' user = 'root' password = 'BigRedCar' database = 'dbase1'
Now let's move on to the function to load ABS files into MySQL. It lives in the bin directory (../bin from where I do my work), in a file named LoadABSToMySQL.py.
import pandas as pd
import pymysql
from sqlalchemy import create_engine
import os.path
import re
# local imports - a file that contains database login details
import MysqlConnect as MSC
def LoadABSToMySQL(pathName):
""" Read an Excel file from the Australian Bureau of Statistics
and load it into a MySQL database"""
# --- 1 --- open MySQL
s = 'mysql+pymysql://'+MSC.user+':'+MSC.password+'@'+MSC.host+'/'+MSC.database
engine = create_engine(s)
# --- 2 --- identify proposed table name from file name
(head,tail) = os.path.split(pathName)
tail = re.split('\.', tail)
tablename = tail[0]
# --- 3 --- open the XL file
wb = pd.ExcelFile(pathName)
# --- 4 --- load XL workbooks into a single DataFrame
df = pd.DataFrame()
for name in wb.sheet_names:
# -- ignore junk
if not 'Data' in name:
continue
# -- read
tmp = wb.parse(sheetname=name, header=9, index_col=0, na_values=['', '-', ' '])
# -- amalgamate
df = pd.merge(left=df, right=tmp, how='outer', left_index=True, right_index=True)
tmp = None
# --- 5 --- write this DataFrame to MySQL
df.to_sql(tablename, engine, index=True, if_exists='replace')
Finally, an example code snippet to load some of the ABS National Account files to MySQL. This files sits in my national accounts directory and has the rather unimaginative name py-load.py. The ABS Microsoft Excel files live in the ./raw-data sub-directory.
import sys
sys.path.append( '../bin' )
from LoadABSToMySQL import LoadABSToMySQL
dataDirectory = './raw-data/'
dataFiles = [
'5206001_key_aggregates',
'5206002_expenditure_volume_measures',
'5206003_expenditure_current_price',
'5206004_expenditure_price_indexes',
'5206006_industry_gva',
'5206008_household_final_consumption_expenditure',
'5206022_taxes',
'5206023_social_assistance_benefits',
'5206024_selected_analytical_series'
]
dataSuffix = '.xls'
for f in dataFiles :
LoadABSToMySQL(dataDirectory + f + dataSuffix)
To run this python load file, I have a BASH shell script, which I use on my iMac. It has another unimaginative name: run-load.sh.
#!/bin/bash # mac os x fix ... cd "$(dirname "$0")" python ./py-load.py
No comments:
Post a Comment