Previously, I had quite a complex bit of python code to read Excel files and upload them to MySQL. I have now retooled the way in which I load files from the Australian Bureau of Statistics (ABS) to MySQL using Python pandas. The code is much simpler.
First, I store my MySQL database username and password in a file MysqlConnect.py (it is used by a number of different programs). It lives in the bin directory (../bin from where I do my work). And, just in case you are wondering: no, it is not my password.
host = 'localhost' user = 'root' password = 'BigRedCar' database = 'dbase1'
Now let's move on to the function to load ABS files into MySQL. It lives in the bin directory (../bin from where I do my work), in a file named LoadABSToMySQL.py.
import pandas as pd import pymysql from sqlalchemy import create_engine import os.path import re # local imports - a file that contains database login details import MysqlConnect as MSC def LoadABSToMySQL(pathName): """ Read an Excel file from the Australian Bureau of Statistics and load it into a MySQL database""" # --- 1 --- open MySQL s = 'mysql+pymysql://'+MSC.user+':'+MSC.password+'@'+MSC.host+'/'+MSC.database engine = create_engine(s) # --- 2 --- identify proposed table name from file name (head,tail) = os.path.split(pathName) tail = re.split('\.', tail) tablename = tail[0] # --- 3 --- open the XL file wb = pd.ExcelFile(pathName) # --- 4 --- load XL workbooks into a single DataFrame df = pd.DataFrame() for name in wb.sheet_names: # -- ignore junk if not 'Data' in name: continue # -- read tmp = wb.parse(sheetname=name, header=9, index_col=0, na_values=['', '-', ' ']) # -- amalgamate df = pd.merge(left=df, right=tmp, how='outer', left_index=True, right_index=True) tmp = None # --- 5 --- write this DataFrame to MySQL df.to_sql(tablename, engine, index=True, if_exists='replace')
Finally, an example code snippet to load some of the ABS National Account files to MySQL. This files sits in my national accounts directory and has the rather unimaginative name py-load.py. The ABS Microsoft Excel files live in the ./raw-data sub-directory.
import sys sys.path.append( '../bin' ) from LoadABSToMySQL import LoadABSToMySQL dataDirectory = './raw-data/' dataFiles = [ '5206001_key_aggregates', '5206002_expenditure_volume_measures', '5206003_expenditure_current_price', '5206004_expenditure_price_indexes', '5206006_industry_gva', '5206008_household_final_consumption_expenditure', '5206022_taxes', '5206023_social_assistance_benefits', '5206024_selected_analytical_series' ] dataSuffix = '.xls' for f in dataFiles : LoadABSToMySQL(dataDirectory + f + dataSuffix)
To run this python load file, I have a BASH shell script, which I use on my iMac. It has another unimaginative name: run-load.sh.
#!/bin/bash # mac os x fix ... cd "$(dirname "$0")" python ./py-load.py
No comments:
Post a Comment