Python 3.4 filtering text file into lists

Question

I am having trouble filtering a .txt file into sublists, which I can then turn into a dictionary. Sample from text.txt:
A2.-B4-...C4-.-.D3-..E1.F4..-.G3--.H4....75--...85---..95----.05-----.6.-.-.-,6--..--?6..--..!5..--.

There are no spaces or line breaks; it is basically one line of text.
A2.- means that the character "A" has 2 characters in its Morse code, and they are .-, and so on.

What I would like to do is split this long string into sublists so that I can then zip them into a dictionary, which I can then use to create a Morse code translator. Specifically, I want a keyList containing the keys A, B, C, ...,?,.,
and another list, ValueList, which contains the values for those keys.
However, since not all of the keys are letters, I have trouble filtering them out of the file.
What I have tried:

import re
with open("text.txt", "r") as r:
    ss = r.read()
p = re.compile(r'\w' + r'\w')
keyList = p.findall(ss)
ValueList = p.split(ss)
print(keyList)
print(ValueList)

keyList = ['A2', 'B4', 'C4', 'D3',..., '75', '85', '95', '05']
ValueList = ['', '.-', '-...', '-.-.', '-..', ..., '-----.6.-.-.-,6--..--?6..--..!5..--.']

As you can see, ValueList does not split properly because '\w' + '\w' only matches two consecutive alphanumeric (word) characters, so keys such as '.', ',', '?' and '!' are never matched. I tried changing the argument to re.compile but couldn't find anything that worked. Any help? Is re.compile the best way to do this, or is there another way to filter through text?

EDIT: expected / desired output:

keyList = ['A','B','C','D',...,'.','?',',']
ValueList = ['.-','-...','-.-.','-..',...,'.-.-.-','..--..','--..--']
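One way to get exactly these two lists is to let the length digit decide where each value ends, instead of guessing from the characters. A minimal sketch, assuming every length is a single digit (true for the sample) and that the data is in a string called ss as in the attempt above; parse_morse is a hypothetical helper name:

```python
import re

# One record header: any single character (the key) followed by one digit (the length).
HEADER = re.compile(r'(.)(\d)')

def parse_morse(ss):
    """Split the packed string into parallel key and value lists."""
    keyList, ValueList = [], []
    pos = 0
    while pos < len(ss):
        m = HEADER.match(ss, pos)           # header must start exactly at pos
        if m is None:
            raise ValueError('malformed record at position %d' % pos)
        key, length = m.group(1), int(m.group(2))
        start = m.end()
        value = ss[start:start + length]    # exactly `length` dots and dashes
        if len(value) != length:
            raise ValueError('truncated value for key %r' % key)
        keyList.append(key)
        ValueList.append(value)
        pos = start + length                # jump straight to the next record
    return keyList, ValueList

ss = 'A2.-B4-...C4-.-.D3-..E1.F4..-.G3--.H4....75--...85---..95----.05-----.6.-.-.-,6--..--?6..--..!5..--.'
keyList, ValueList = parse_morse(ss)
print(keyList)
print(ValueList)
```

If the string comes from the file, strip the trailing newline first (ss = f.read().strip()), because . in the pattern does not match a newline. The keys come out in file order ('.', ',', '?', '!'), and dict(zip(keyList, ValueList)) then gives the lookup table.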

kroneckersdelta · 2014-12-01 17:46

4 answers

Cuadue · Answer 1 · 2014-12-01T18:55:55+0000

To create an encoder / decoder, you probably want to use dictionaries, not lists.
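A minimal sketch of why a dictionary fits: encoding is a lookup, and decoding is a lookup in the inverted dictionary. It assumes the table has already been parsed into a dict called morse (a few entries from the sample are hard-coded here) and that encoded letters are separated by single spaces, a convention the sample file does not specify; encode and decode are hypothetical names:

```python
# A few entries from the sample file, hard-coded for illustration.
morse = {'A': '.-', 'B': '-...', 'C': '-.-.', 'D': '-..', 'E': '.'}

# Invert the table once so decoding is also a dictionary lookup.
reverse = {code: char for char, code in morse.items()}

def encode(text):
    # Look up each character and join the codes with single spaces (assumed convention).
    return ' '.join(morse[ch] for ch in text.upper())

def decode(signal):
    # Split on whitespace and look each code up in the inverted table.
    return ''.join(reverse[code] for code in signal.split())

print(encode('bad'))           # -... .- -..
print(decode('-.-. .- -...'))  # CAB
```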

When it comes to parsing, a straightforward, naive approach is probably best suited here.

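result = {}
with open('morse.txt', 'r') as f:
    while True:
        key = f.read(1)           # the character being defined
        length_str = f.read(1)    # its code length (a single digit)

        if len(key) != 1 or len(length_str) != 1:
            break                 # reached the end of the file

        try:
            length = int(length_str)
        except ValueError:
            break                 # not a digit: stop parsing

        value = f.read(length)    # the dots and dashes themselves

        if len(value) == length:
            result[key] = value

for k, v in result.items():
    print(k, v)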

This prints each character followed by its Morse code, one pair per line.
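To try the loop without creating a file, you can feed it an in-memory file. A sketch, assuming the loop is wrapped in a hypothetical function parse that accepts any file-like object; io.StringIO stands in for open('morse.txt'):

```python
import io

def parse(f):
    """Cuadue's loop, taking an open file-like object instead of a filename."""
    result = {}
    while True:
        key = f.read(1)            # the character being defined
        length_str = f.read(1)     # its code length, a single digit
        if len(key) != 1 or len(length_str) != 1:
            break                  # end of input
        try:
            length = int(length_str)
        except ValueError:
            break                  # not a length digit: stop parsing
        value = f.read(length)     # exactly `length` dots and dashes
        if len(value) == length:
            result[key] = value
    return result

sample = 'A2.-B4-...C4-.-.D3-..E1.F4..-.G3--.H4....75--...85---..95----.05-----.6.-.-.-,6--..--?6..--..!5..--.'
table = parse(io.StringIO(sample))
print(len(table))   # 16
print(table['.'])   # .-.-.-
```

Because the length digit bounds each read, the '.' key after '0' is picked up correctly, and a trailing newline in a real file simply ends the loop.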

Pierre alex · Answer 2 · 2014-12-01T18:59:07+0000

You can try this:

import re
items = re.findall(r'(.\d)([\.-]+)', ss)
keys = [s[0][0] for s in items]
values = [s[1] for s in items]

I got:

>>> keys
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', '7', '8', '9', '0', ',', '?', '!']
>>> values
['.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....', '--...', '---..', '----.', '-----.', '--..--', '..--..', '..--.']
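In this output '0' maps to '-----.' and the '.' key is missing: [\.-]+ is greedy and ignores the length digit, so it runs on into the '.' key that follows. A sketch of one regex-only workaround, assuming (as in the sample) that values contain only dots and dashes and that each header is one character plus one digit followed by a non-digit: make the value lazy and require it to be followed by the next header or the end of the string. RECORD is a hypothetical name:

```python
import re

# Key, length digit, then the shortest run of dots/dashes that is followed by
# either another "character + digit + non-digit" header or the end of the string.
RECORD = re.compile(r'(.)\d([.-]+?)(?=.\d\D|$)')

ss = 'A2.-B4-...C4-.-.D3-..E1.F4..-.G3--.H4....75--...85---..95----.05-----.6.-.-.-,6--..--?6..--..!5..--.'
items = RECORD.findall(ss)          # list of (key, value) tuples
keys = [k for k, _ in items]
values = [v for _, v in items]
print(keys)
print(values)
```

The \D in the lookahead is what stops a value such as '....' in H4....75 from ending early: a dot followed by 75 is not a header, because 5 is a digit.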

darthbith · Answer 3 · 2014-12-01T19:10:21+0000

Like Cuadue's answer, I would use a loop to parse it, but I would work through the string in reverse:

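morse_str = 'A2.-B4-...C4-.-.D3-..E1.F4..-.G3--.H4....75--...85---..95----.05-----.6.-.-.-,6--..--?6..--..!5..--.'
morse_list = list(morse_str)
morse_dict = {}
while morse_list:
    morse = ''
    while True:
        sym = morse_list.pop()      # take the last remaining character
        try:
            int(sym)                # is it a length digit?
        except ValueError:
            morse += sym            # no: part of the code, collected backwards
        else:
            key = morse_list.pop()  # yes: the character before the digit is the key
            morse_dict[key] = morse[::-1]
            break

print(morse_dict)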

Kasramvd · Answer 4 · 2014-12-01T19:17:20+0000

You can use a positive lookahead in a regular expression to find the keys:

>>> import re
>>> s = 'A2.-B4-...C4-.-.D3-..E1.F4..-.G3--.H4....75--...85---..95----.05-----.6.-.-.-,6--..--?6..--..!5..--.'
>>> keys = re.findall(r'[\w|\W](?=\d\W)', s)
>>> keys
['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', '7', '8', '9', '0', '.', ',', '?', '!']
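A lookahead (?=...) checks what follows a position without consuming it, so each key character is returned on its own. The \W after the digit matters: with only (?=\d), the last dot of a value is also picked up whenever the next key is a digit, as in H4....75. A small comparison on a fragment of the sample; it uses . in place of [\w|\W], which is equivalent here because the string has no newlines (inside a character class the | is just a literal pipe):

```python
import re

fragment = 'H4....75--...'

# Any character followed by a digit: also matches the dot just before the key '7'.
loose = re.findall(r'.(?=\d)', fragment)
# Any character followed by a digit and then a non-word character: only the keys.
strict = re.findall(r'.(?=\d\W)', fragment)

print(loose)    # ['H', '.', '7']
print(strict)   # ['H', '7']
```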

Since your keys and values contain non-alphanumeric characters such as '!', ',' and '.', you cannot use just one regex to obtain the expected values. Instead, you can use re.split() to split the string on your keys, so that each piece is the expected value with its length digit at the beginning, and then remove that digit with re.sub():

>>> values = [re.sub(r'\d', '', i) for i in re.split(r'[\w|\W](?=\d\W)', s) if len(i)]
>>> values
['.-', '-...', '-.-.', '-..', '.', '..-.', '--.', '....', '--...', '---..', '----.', '-----', '.-.-.-', '--..--', '..--..', '..--.']

The split uses the same (?=\d\W) lookahead as the keys; with only (?=\d), the last dot before a digit key (as in H4....75) would also be consumed as a separator, truncating the values of H, 7, 8 and 9.

It is important that keys and values have the same length:

>>> len(keys)
16
>>> len(values)
16

And finally, zip them:

>>> dict(zip(keys, values))
{'A': '.-', '!': '..--.', 'C': '-.-.', 'B': '-...', 'E': '.', 'D': '-..', 'G': '--.', 'F': '..-.', 'H': '....', ',': '--..--', '.': '.-.-.-', '0': '-----', '7': '--...', '9': '----.', '8': '---..', '?': '..--..'}
