How do I handle a small data grid in Python?
I need to process all the MP3 files in a folder, reading and changing some ID3 tags and collecting details such as file size. The end goal is to generate an RSS file so that these MP3s can be published as a regular podcast. I expect up to 200 files (rows?) with 5 or 6 pieces of data (columns?) for each file. I need to read all the data, use it to determine the sort order, and create an RSS/XML file. I'm unsure of the best way to hold and process this data in Python.
I saw this idea for a dictionary of dictionaries, but it looks a little awkward:
mydict = {
    'MP3_File_1.mp3':
        {'SIZE': '123456789', 'MODDATE': '20120508', 'TRKNUM': '152'},
    'MP3_File_2.mp3':
        {'SIZE': '45689654', 'MODDATE': '20120515', 'TRKNUM': '003'},
    'MP3_File_3.mp3':
        {'SIZE': '98754651', 'MODDATE': '20130101', 'TRKNUM': '062'}}
Either a real database or pyTables seems like overkill. I'm also considering creating a custom class, but I don't have enough experience in Python (yet). Is there a module / best practice that I am missing?
I would create a custom class representing one MP3 file, with one attribute for each field. This way, you can easily write methods to change these fields. Then I would construct one object per file (passing the filename to the constructor and letting the constructor populate the fields) and put all the objects in a list. The class would also define the comparison methods needed to sort the objects. Finally, I would write a function to generate an XML file from this list.
This is not the only way to do it, but it is how I would do it.
class Mp3file(object):
    def __init__(self, filename):
        # read the file and populate one attribute per field
        self.name = filename
        self.size = ...
        self.moddate = ...
        self.track_num = ...
        ...
    def to_xml(self):
        # return the XML fragment for this file
        return ...
    def __lt__(self, other):
        # compare on whichever field defines the sort order
        ...
    def __eq__(self, other):
        ...
    ...
mp3list = []
for filename in directory:  # directory: an iterable of MP3 file names
    mp3list.append(Mp3file(filename))
def mp3list_to_xml(mylist):
    # write preamble
    for mf in sorted(mylist):
        x = mf.to_xml()
        # Add x to xml
    # write footer
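For the XML step, here is a minimal sketch using the standard library's xml.etree.ElementTree. The function name mp3list_to_rss, its parameters, the example URL and the (name, size) tuples are all made up for illustration; a real feed would also need at least the channel's link and description elements.

```python
import xml.etree.ElementTree as ET

def mp3list_to_rss(items, title, base_url):
    """Build a bare-bones RSS 2.0 string from (name, size) pairs.

    All parameter names here are hypothetical, chosen for this sketch only.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    for name, size in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = name
        # The enclosure element carries the file URL, its size in bytes
        # and its MIME type.
        ET.SubElement(item, "enclosure", url=base_url + name,
                      length=str(size), type="audio/mpeg")
    # encoding="unicode" makes tostring() return a str instead of bytes
    return ET.tostring(rss, encoding="unicode")

feed = mp3list_to_rss([("MP3_File_1.mp3", 123456789)],
                      "My podcast", "http://example.com/")
```

Building the tree with ElementTree rather than concatenating strings means special characters in titles get escaped for you.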
A list of dictionaries makes more sense to me.
mp3s = [
    {'NAME': 'lalala.mp3', 'SIZE': '123456789', 'MODDATE': '20120508', 'TRKNUM': '152'},
    {'NAME': 'lelele.mp3', 'SIZE': '45689654', 'MODDATE': '20120515', 'TRKNUM': '003'},
    {'NAME': 'lululu.mp3', 'SIZE': '98754651', 'MODDATE': '20130101', 'TRKNUM': '062'}]
If you want to keep the sorting simple:
sor = sorted(mp3s, key=lambda x: x['NAME'])
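The same pattern works for the other fields. As a sketch, assuming you would want to order by track number or by modification date (the question does not say which field decides the order), note that the values are stored as strings, so the track number is converted to int before comparing:

```python
from operator import itemgetter

mp3s = [
    {'NAME': 'lalala.mp3', 'SIZE': '123456789', 'MODDATE': '20120508', 'TRKNUM': '152'},
    {'NAME': 'lelele.mp3', 'SIZE': '45689654', 'MODDATE': '20120515', 'TRKNUM': '003'},
    {'NAME': 'lululu.mp3', 'SIZE': '98754651', 'MODDATE': '20130101', 'TRKNUM': '062'}]

# Numeric sort on the track number (the stored values are strings).
by_track = sorted(mp3s, key=lambda x: int(x['TRKNUM']))

# YYYYMMDD strings sort correctly as plain text; newest first here.
by_date = sorted(mp3s, key=itemgetter('MODDATE'), reverse=True)

track_order = [m['NAME'] for m in by_track]
date_order = [m['NAME'] for m in by_date]
```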
Why not just use sqlite? It comes bundled with Python for free.
You get sorting and searching built in, and each database is a single file, so there are no additional processes to manage.
Also, chances are that as your code evolves you will want to add more attributes, and then dicts etc. become unmanageable.
It helps to be able to run a select on the database and think: "Yes, my data looks good, the next part should be simple."
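A rough sketch of what that could look like with the sqlite3 module from the standard library. The table name, column names and the in-memory database are assumptions for the example; the rows reuse the sample values from the question.

```python
import sqlite3

# ":memory:" keeps the example self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mp3 (name TEXT PRIMARY KEY, size INTEGER, "
    "moddate TEXT, trknum INTEGER)")

rows = [
    ('MP3_File_1.mp3', 123456789, '20120508', 152),
    ('MP3_File_2.mp3', 45689654, '20120515', 3),
    ('MP3_File_3.mp3', 98754651, '20130101', 62),
]
# Parameter placeholders (?) avoid building SQL by string formatting.
conn.executemany("INSERT INTO mp3 VALUES (?, ?, ?, ?)", rows)

# Let the database do the sorting.
ordered = conn.execute(
    "SELECT name, trknum FROM mp3 ORDER BY trknum").fetchall()
conn.close()
```

Adding another attribute later is then a matter of adding a column rather than touching every dict.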
A list of namedtuples, perhaps? Tuples should be one of the least memory-hungry types in Python, AFAIK.
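Something along these lines, as a sketch. The type name Mp3 and its field names are made up for the example, and the values are the sample data from the question converted to ints where that makes sense.

```python
from collections import namedtuple
from operator import attrgetter

# One lightweight, immutable record per file.
Mp3 = namedtuple("Mp3", ["name", "size", "moddate", "trknum"])

mp3s = [
    Mp3("MP3_File_1.mp3", 123456789, "20120508", 152),
    Mp3("MP3_File_2.mp3", 45689654, "20120515", 3),
    Mp3("MP3_File_3.mp3", 98754651, "20130101", 62),
]

# Fields are readable by name, and attrgetter gives a sort key.
by_track = sorted(mp3s, key=attrgetter("trknum"))
first = by_track[0].name
total_size = sum(m.size for m in mp3s)
```

Keep in mind that namedtuples are immutable, so changing a field means building a new record with _replace().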