Extract Excel attributes to CSV
This is a python solution to pull specific cells from a bunch of excel files into one CSV file. The source excel files have a consistent format – that is, a given cell in every excel file contains the same information.
I wrote this script becauseĀ I needed one compiled table with spatial information for geocoding but the information existed across dozens of excel files. This script was used to pull project information including name and location coordinates out of each individual project information excel file.
So, it was either open each excel file and copy the information needed to a new table that I could geocode, or write a script to pull the information into one table so that I could geocode them all at simultaneously. With dozens of excel files, I decided to script it.
The script depends on the xlrd and glob modules. The xlrd library can be downloaded here.
The script below is commented thoroughly so I wont go into much detail, comment if you have specific questions.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 | #import modules needed import glob import xlrd #target csv to wrtie lines to text=r"C:\Users\brian\Desktop\xls\00_combined.csv" #folder with source excel files dir = r"C:\Users\brian\Desktop\xls" # create list of excel files in source folder to pass to excel reader xls = glob.glob(dir+"/*.xls*") #create empty linefull = [] #populates the linefull list with the attributes needed from each excel file - iterates through each excel file in the xls list for path in xls: #new empty list for each excel file's attributes line = [] #open the current excel file book = xlrd.open_workbook(path) #target the first sheet first_sheet = book.sheet_by_index(0) #list the rows corresponding to the data needed in each excel file, 0-based index, ie row 5 is 4 rows = [9,20,21,22,44,45] #all values in same column (2 is C, A is 0, B is 1, C is 2) col = 2 #one attribute is not in column 2, that attribute is pulled in the following line line.append(first_sheet.cell_value(38,8)) #this row pulls the attribute for each cell in column 2 at each row in the rows list for row in rows: #pulls the attribute for the current row in the rows list line.append(first_sheet.cell_value(int(row),int(col))) #appends each list of attributes for the current excel file to the master linefull list linefull.append(line) #this loop formats and writes each line of attributes from the linefull list with open(text,"w") as text_file: #writes the column headers text_file.write("number,project,area,state,country,latitude,longitude") #loops through each list of attributes within the list of all excel file attributes for x in linefull: #new empty list that will contain each list of attributes in csv format newlist = [] #loops through each individual attribute for each excel file's list of attributes for y in x: #formats the attribute as string z = str(y) #passes the string conversion of each attribute to the newlist newlist.append(z) #creates a comma separated string of attributes for the current excel file t = ",".join(newlist) #print to confirm formatting print t #variable to write a new line and the comma separated attributes to the csv file ln = "\n"+t #writes the line to the csv/text file defined at the beginning of the script text_file.write(ln) |
Here’s the code without the comments, a little clearer to read the code.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | import glob import xlrd text=r"C:\Users\greer\Desktop\xls\00_combined.csv" dir = r"C:\Users\greer\Desktop\xls" xls = glob.glob(dir+"/*.xls*") linefull = [] for path in xls: line = [] book = xlrd.open_workbook(path) first_sheet = book.sheet_by_index(0) rows = [9,20,21,22,44,45] col = 2 line.append(first_sheet.cell_value(38,8)) for row in rows: line.append(first_sheet.cell_value(int(row),int(col))) linefull.append(line) with open(text,"w") as text_file: text_file.write("number,project,area,state,country,latitude,longitude") for x in linefull: newlist = [] for y in x: z = str(y) newlist.append(z) t = ",".join(newlist) print t ln = "\n"+t text_file.write(ln) |
Thanks!