Creating a new list when a condition with a pair of values and a list is met
This is my first question here. I've tried to find a solution for about a week now, but I finally have to ask. I am also open to suggestions on the title of this question.
I am using Python 3.
I have a csv file (legend.csv) that contains 2 headers (keys), one for numbers and one for abbreviations.
Each abbr has a corresponding number and this is represented in the csv file.
I also have a list of names (list.txt); the first part of each name is usually an abbreviation of some type.
Program idea: I want to parse the csv file and prepend to each name in list.txt the number that corresponds to its abbr. The output should be a new text file if possible.
example of list.txt:
addg-stuff
cbdd-stuff
abc-stuff
add-stuff
example of legend.csv:
number,abbr
0001,addg
0002,cbdd
0003,abc
0004,add
example of desired output:
0003-abc-stuff
0001-addg-stuff
0004-add-stuff
0002-cbdd-stuff
The following finds abbr, but I'm stuck on how to add the corresponding number to the name. I found how to pull the relevant lines in the question "Easiest way to cross-reference a CSV file with a text file for common strings", but I'm not sure where to go from here.
import csv
with open("legend.csv") as csvfile:
    reader = csv.reader(csvfile)
    searchstrings = {row[1] for row in reader}
    num = {row[0] for row in reader}
with open("list.txt") as txtfile:
    for name in txtfile:
        for i in searchstrings:
            if i in name:
                matching = (name)  # not sure where to go from here. If matching is printed, the names are found that contain the abbr.
Definitely new to this, just started messing around with python for a month or so. Any help would be much appreciated, especially if you have good resources for situations like this or python in general.
You can try this:
import csv
with open('legend.csv') as csvfile:
    f1 = list(csv.reader(csvfile))  # rows split at commas: [number, abbr]
with open('list.txt') as txtfile:
    f2 = txtfile.read().splitlines()  # reading every line in the txt file
for i in f2:
    for b in f1[1:]:  # skip the header row
        if i.split("-")[0] == b[1]:
            print(b[0] + "-" + i)
Output:
0001-addg-stuff
0002-cbdd-stuff
0003-abc-stuff
0004-add-stuff
The nested for-loops take each line from the txt file and compare it with each row of the csv file. Note that f1[1:] slices the list: it starts after the header row of the csv file, which is not needed to solve the problem. The code then checks whether the first part of the line (stored as i), up to the first hyphen, equals the abbreviation. If so, the number and the line are printed in the style of the desired output.
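The equality test on the first part of the name matters with this sample data, because a plain substring test such as the question's `if i in name` would also find "add" inside "addg-stuff". A minimal sketch of the difference, using the names from the question (the helper name `prefix_of` is hypothetical, introduced only for illustration):

```python
def prefix_of(name):
    """Return the text before the first hyphen, e.g. 'addg' for 'addg-stuff'."""
    return name.split("-")[0]

# Sample names copied from the question's list.txt.
names = ["addg-stuff", "cbdd-stuff", "abc-stuff", "add-stuff"]

# Substring test: 'add' also occurs inside 'addg-stuff'.
substring_hits = [n for n in names if "add" in n]

# Prefix equality: only the name whose first part is exactly 'add'.
exact_hits = [n for n in names if prefix_of(n) == "add"]

print(substring_hits)  # ['addg-stuff', 'add-stuff']
print(exact_hits)      # ['add-stuff']
```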
Sets do not have any implicit ordering. When you create two separate sets, you lose the correspondence between each abbreviation and its number. Assuming your acronyms are unique, you can create a dictionary mapping <name : number>.
lookup = {row[1] : row[0] for row in reader }
This also has the added benefit of only reading your csv once.
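As a rough sketch of that idea, the mapping could also be built with `csv.DictReader`, which consumes the header row so it does not end up in the dictionary. The sample data below is copied from the question and fed through `io.StringIO` only to keep the example self-contained; with the real file you would pass `open("legend.csv", newline="")` instead.

```python
import csv
import io

# Sample legend copied from the question; stands in for the real legend.csv.
legend_csv = io.StringIO("number,abbr\n0001,addg\n0002,cbdd\n0003,abc\n0004,add\n")

# DictReader uses the first row as field names, so each row is
# a mapping like {'number': '0001', 'abbr': 'addg'}.
reader = csv.DictReader(legend_csv)

# Single pass over the file: abbreviation -> number.
lookup = {row["abbr"]: row["number"] for row in reader}

print(lookup)
# {'addg': '0001', 'cbdd': '0002', 'abc': '0003', 'add': '0004'}
```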
You can easily check membership in the dictionary using the keyword in. Your code for looking up names just becomes:
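matches = []
with open("list.txt") as txtfile:
    for name in txtfile:
        name = name.rstrip()  # drop the trailing newline
        abbr = name.split('-')[0]  # the part before the first hyphen
        if abbr in lookup:
            matches.append((name, lookup[abbr]))  # this will append (name, num) pairs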
If you want to condense your code even more, you can use a list comprehension, for example:
with open("list.txt") as txtfile:
    matches = [(name.rstrip(), lookup[name.split('-')[0]]) for name in txtfile if name.split('-')[0] in lookup]
Pretty-printing it gives:
[('addg-stuff', '0001'),
('cbdd-stuff', '0002'),
('abc-stuff', '0003'),
('add-stuff', '0004')]
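The question also asked for the result in the number-name form, written to a new text file. One possible sketch under those assumptions follows; the function name `number_names` is hypothetical, and `io.StringIO` objects stand in for the real files so the example runs on its own.

```python
import csv
import io

def number_names(legend_file, names_file):
    """Yield 'number-name' for each name whose first part is in the legend."""
    # Build abbreviation -> number once; DictReader skips the header row.
    lookup = {row["abbr"]: row["number"] for row in csv.DictReader(legend_file)}
    for line in names_file:
        name = line.strip()
        abbr = name.split("-")[0]  # text before the first hyphen
        if abbr in lookup:
            yield f"{lookup[abbr]}-{name}"

# Sample data copied from the question.
legend = io.StringIO("number,abbr\n0001,addg\n0002,cbdd\n0003,abc\n0004,add\n")
names = io.StringIO("addg-stuff\ncbdd-stuff\nabc-stuff\nadd-stuff\n")

# 'out' stands in for a writable output file.
out = io.StringIO()
out.writelines(line + "\n" for line in number_names(legend, names))

print(out.getvalue(), end="")
# 0001-addg-stuff
# 0002-cbdd-stuff
# 0003-abc-stuff
# 0004-add-stuff
```

With real files, the three `io.StringIO` objects would be replaced by `open("legend.csv", newline="")`, `open("list.txt")`, and `open("output.txt", "w")`, where output.txt is an assumed file name. Names whose first part is not in the legend are silently skipped in this sketch.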