Sunday, March 29, 2015

Python (ArcPy) - Find Non-Matching Rows Between 2 Text Files


Let's say we have generated two CSV files from different runs of the script at the mentioned link.

The script below compares the two files and creates two output text files, uniquetoA.txt and uniquetoB.txt. Each output file lists the rows (GDB, FD, FC) that appear in one input file but are missing from the other.

Here is the Python (ArcPy) code:


import arcpy, os

fileA = arcpy.GetParameterAsText(0)
fileB = arcpy.GetParameterAsText(1)
outputDir = arcpy.GetParameterAsText(2)

fileADict = {}
fileAUniqueList = []
fileBUniqueList = []

# Loop through fileA rows, adding each to a dictionary
fileAObj = open(fileA, 'r')
for fileARow in fileAObj:
    fileADict[fileARow] = False
fileAObj.close()

# Loop through fileB, checking against dictionary
fileBObj = open(fileB, 'r')
for fileBRow in fileBObj:
    # If match in fileA dictionary, set dictionary match value to true
    # (test the dictionary directly; in Python 2, .keys() builds a list and searches it row by row)
    if fileBRow in fileADict:
        fileADict[fileBRow] = True
    # If no match, then unique to B.
    else:
        fileBUniqueList.append(fileBRow)
fileBObj.close()

# Loop through fileA dictionary, checking match value. Unique to A if match is false 
for key, match in fileADict.iteritems():
    if not match:
        fileAUniqueList.append(key)

# Write files containing unique values
fileAUnique = open(os.path.join(outputDir, "uniquetoA.txt"), "w")
for fileAItem in fileAUniqueList:
    fileAUnique.write(fileAItem)
fileAUnique.close()

fileBUnique = open(os.path.join(outputDir, "uniquetoB.txt"), "w")
for fileBItem in fileBUniqueList:
    fileBUnique.write(fileBItem)

fileBUnique.close()
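
One thing to watch for with the comparison above: rows are compared with their line endings attached, so if the last row of one file has no trailing newline, it will not match the same row in the other file. Below is a minimal sketch of the same comparison that strips line endings first. The function name find_unique_rows and the sample rows are made up for illustration, and unlike the dictionary approach it keeps duplicate rows and file order.

```python
def find_unique_rows(rows_a, rows_b):
    """Return (unique_to_a, unique_to_b), ignoring line-ending differences."""
    # Strip only line endings, so "x\n" and a final "x" compare as equal
    cleaned_a = [row.rstrip("\r\n") for row in rows_a]
    cleaned_b = [row.rstrip("\r\n") for row in rows_b]

    # Sets give fast membership tests; the lists keep the original order
    set_a = set(cleaned_a)
    set_b = set(cleaned_b)

    unique_to_a = [row for row in cleaned_a if row not in set_b]
    unique_to_b = [row for row in cleaned_b if row not in set_a]
    return unique_to_a, unique_to_b


# Hypothetical sample rows; the last row of A has no trailing newline
rows_a = ["gdb1,fd1,fc1\n", "gdb1,fd1,fc2\n", "gdb2,fd1,fc1"]
rows_b = ["gdb1,fd1,fc1\n", "gdb2,fd1,fc1\n", "gdb3,fd1,fc1\n"]

only_a, only_b = find_unique_rows(rows_a, rows_b)
print(only_a)  # ['gdb1,fd1,fc2']
print(only_b)  # ['gdb3,fd1,fc1']
```

Because the line endings are stripped, a newline has to be added back to each row when writing the results to a file.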

----------------------------------------------------------------------------------------------------------------------------------------------

#Note that arcpy isn't actually necessary for the core code above. I use it because I like the ArcGIS Toolbox method for obtaining the initial parameters (in fact, as written, the code only runs properly through an ArcGIS Toolbox). If you prefer, you could change it to read the inputs from the command line, or use something like tkinter to pick the input files and output location.
#Also note that I used iteritems() to iterate through the dictionary because I'm using a 2.X version of Python. In Python 3.X, iteritems() no longer exists, so change it to items().
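
As a rough sketch of the command-line option mentioned in the note, the version below uses argparse from the standard library instead of arcpy.GetParameterAsText. The function names parse_args and compare_files and the argument names are my own choices for illustration, not part of the original script. It uses items(), so it should run on Python 2.7 and 3.X.

```python
import argparse
import os


def parse_args(argv=None):
    """Read the two input files and the output folder from the command line."""
    parser = argparse.ArgumentParser(
        description="Find rows unique to each of two text files.")
    parser.add_argument("file_a", help="first text/csv file")
    parser.add_argument("file_b", help="second text/csv file")
    parser.add_argument("output_dir",
                        help="folder for uniquetoA.txt and uniquetoB.txt")
    return parser.parse_args(argv)


def compare_files(file_a, file_b, output_dir):
    """Write rows unique to each file; same dictionary logic as the script."""
    # Every row of A starts out unmatched
    with open(file_a, "r") as f:
        matched = {row: False for row in f}

    # Rows of B either mark a row of A as matched or are unique to B
    unique_to_b = []
    with open(file_b, "r") as f:
        for row in f:
            if row in matched:
                matched[row] = True
            else:
                unique_to_b.append(row)

    # Rows of A that were never matched are unique to A
    unique_to_a = [row for row, found in matched.items() if not found]

    with open(os.path.join(output_dir, "uniquetoA.txt"), "w") as out:
        out.writelines(unique_to_a)
    with open(os.path.join(output_dir, "uniquetoB.txt"), "w") as out:
        out.writelines(unique_to_b)
    return unique_to_a, unique_to_b


# To run as a script, add these two lines at the bottom:
#   args = parse_args()
#   compare_files(args.file_a, args.file_b, args.output_dir)
```

Saved under a hypothetical name such as compare_rows.py, it would be called as: python compare_rows.py fileA.txt fileB.txt outputFolder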


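Here is the original script again, with the input and output paths hard-coded instead of read from Toolbox parameters:

# arcpy is not imported here because nothing below uses it
import os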

fileA = r'T:\SanFrancisco\Legacy\Projects\CHSR\B-P\GIS\GDBs\GDB2csv_SFO2.txt'
fileB = r'I:\GDBs\GDB2csv_SFO2.txt'
outputDir = r'I:\GDBs'

fileADict = {}
fileAUniqueList = []
fileBUniqueList = []

# Loop through fileA rows, adding each to a dictionary
fileAObj = open(fileA, 'r')
for fileARow in fileAObj:
    fileADict[fileARow] = False
fileAObj.close()

# Loop through fileB, checking against dictionary
fileBObj = open(fileB, 'r')
for fileBRow in fileBObj:
    # If match in fileA dictionary, set dictionary match value to true
    # (test the dictionary directly; in Python 2, .keys() builds a list and searches it row by row)
    if fileBRow in fileADict:
        fileADict[fileBRow] = True
    # If no match, then unique to B.
    else:
        fileBUniqueList.append(fileBRow)
fileBObj.close()

# Loop through fileA dictionary, checking match value. Unique to A if match is false 
for key, match in fileADict.iteritems():
    if not match:
        fileAUniqueList.append(key)

# Write files containing unique values
fileAUnique = open(os.path.join(outputDir, "uniquetoA.txt"), "w")
for fileAItem in fileAUniqueList:
    fileAUnique.write(fileAItem)
fileAUnique.close()

fileBUnique = open(os.path.join(outputDir, "uniquetoB.txt"), "w")
for fileBItem in fileBUniqueList:
    fileBUnique.write(fileBItem)

fileBUnique.close()
