Shared Chat•
Default (GPT-3.5)

Data Rearrangement Code (Python)

August 9, 2023
I have a large number of .xls files laid out in multiple directories. I need some code that will read all the data, rearrange it, and output the data in the desired arrangement in a format that Google Sheets can read, like .xls or .csv. I'd like the code to be in Go, Python, or Ruby. My data is currently organized as follows. I have directories named 2006-2007 through 2020-2021. Each directory has three files, instructional.xls, support.xls, and all.xls. In each file, after some header rows, column 2 consists of school district names. Each school district name is repeated 6 times. Every 3rd occurrence of a district name is in a row where column 4 says "Per Funded Pupil Count". What I would like is a unique sheet per expenditure type, where each sheet has a grid of all the school districts and all the school years (top level directory names) and the values in the grids are the $ amount spent per funded pupil for the given district, year, and expenditure type. The expenditure types are in rows 1-7, columns G and beyond, in each input sheet. For example, in sheets titled "all.xls", column H, rows 1-4 are blank, but row 5 says "Total", row 6 says "Support", and row 7 says "Services" so this represents the "Total Support Services" expenditure type. Would you kindly provide code that does this. If anything is unclear or ambiguous, please ask clarifying questions.
ChatGPT

Sure, I can provide you with Python code to achieve the data rearrangement and export it in a format that Google Sheets can read, such as .csv files. You'll need to install the pandas library, which is a powerful data manipulation library in Python.

python
import os import pandas as pd def process_files(directory): # Initialize a dictionary to store data for each expenditure type expenditure_data = {} for root, _, files in os.walk(directory): for file in files: if file.endswith('.xls'): full_path = os.path.join(root, file) # Read the Excel file into a DataFrame df = pd.read_excel(full_path) # Extract the year from the directory name year = os.path.basename(root) # Find the columns with the expenditure types expenditure_cols = df.columns[df.iloc[6] == "Services"].tolist() # Loop through each expenditure type column for col in expenditure_cols: expenditure_type = df.iloc[5][col] # Filter rows where the expenditure type is "Per Funded Pupil Count" filtered_df = df[df.iloc[:, 3] == "Per Funded Pupil Count"] # Create a unique key for the expenditure type key = f"{year}_{expenditure_type}" # Get the school districts and corresponding expenditure values districts = filtered_df.iloc[:, 1].values expenditures = filtered_df.iloc[:, col].values # Store the data in the dictionary if key not in expenditure_data: expenditure_data[key] = {} expenditure_data[key][year] = dict(zip(districts, expenditures)) return expenditure_data def export_to_csv(expenditure_data): for expenditure_type, data in expenditure_data.items(): # Create a DataFrame with the data df = pd.DataFrame(data) # Export the DataFrame to a CSV file output_file = f"{expenditure_type}.csv" df.to_csv(output_file) if __name__ == "__main__": input_directory = "/path/to/your/input/directory" expenditure_data = process_files(input_directory) export_to_csv(expenditure_data)

Before running the code, make sure to replace /path/to/your/input/directory with the actual path to the directory containing your .xls files organized as you described. The code will read all the .xls files, process the data, and create individual .csv files for each expenditure type with the desired arrangement that Google Sheets can read.

Please note that this code assumes that the expenditure types are always listed in rows 1-7, columns G and beyond, as you described. If the layout of the input files changes, some modifications might be necessary.

Continue this conversation
|Terms of use|Privacy policy