Analyze bank statements in .xlsx format with Python and openpyxl

Disclaimer

Achtung! This case was written by a beginner: the solutions given can be overly crude, and the syntactic solutions are lacking in elegance.





Nowhere in the practice of a lawyer is there such an urgent need for data analysis as in bankruptcy cases: in such cases, it is sometimes necessary to analyze large amounts of information from bank statements as soon as possible in order to find suspicious transactions or restore destroyed / hidden / corrected accounting records.





Since most of the statements are provided by banks in the format of good old excel tables, there was a desire to automate the search for the necessary information in them. It was required to develop such a tool that allows:





  1. Open the required excel-table and carry out line-by-line and sheet-by-sheet search for the value by 1-3 keywords, without bothering with the sorting and filtering tools of MO Excel itself. A line-by-line search is preferred to allow the entire transaction of interest to be displayed and search across all columns.





  2. Having found rows with keywords, transfer their values ​​to a new table along with the number of the corresponding row.





The Python module openpyxl was chosen as a tool for such operations.





import openpyxl 
from openpyxl import Workbook

bankstatetment = input('     ') #      
#     
obj1= input('   ') 
obj2= input('   ')
obj3= input('   ')
wb = openpyxl.load_workbook(bankstatetment) #   /
results_string_list = list() # ,        
results_stringrow_list = list() # ,        
      
      



, openpyxl - , . , , , " 1", " 2", " 3".





. : results_string_list results_stringrow_list . , , .. .





sheet = wb['1'] #    .
for row in sheet: #   
    string = ''
    for cell in row:
        string = string + str(cell.value) + ' ' #    
        string_row = str(cell.row)+ ' '#  
    if obj1 in string: 
        results_string_list.append (string) #     
        results_stringrow_list.append (string_row) #     
    if obj2 in string:
        results_string_list.append (string)
        results_stringrow_list.append (string_row)
    if obj3 in string:
        results_string_list.append (string)
        results_stringrow_list.append (string_row)
      
      



. :





wb = Workbook() #  
ws = wb.active #   
a1 = ws['A1']
a1.value = ' ' #    ""
b1 = ws['B1']
b1.value = '   ' #    "B"
      
      



, : , , , openpyxl , . , , .





a2 = ws['A2']
a3 = ws['A3']
a4 = ws['A4']
a5 = ws['A5']
...
b2 = ws['B2']
b3 = ws['B3']
b4 = ws['B4']
b5 = ws['B5']
      
      



for "" .





for i in results_string_list[0:1]:
    a2.value = i
for i in results_string_list[1:2]:
    a3.value = i
for i in results_string_list[2:3]:
    a4.value = i
for i in results_string_list[3:4]:
    a5.value = i
...
for i in results_stringrow_list[0:1]:
    b2.value = i
for i in results_stringrow_list[1:2]:
    b3.value = i
for i in results_stringrow_list[2:3]:
    b4.value = i
for i in results_stringrow_list[3:4]:
    b5.value = i
...
wb.save(' .xlsx') 
      
      



, openpyxl, , , .





PS When using openpyxl, the search for keywords was not case sensitive, so this problem did not have to be solved.








All Articles