What config file format should I use to conveniently store strings of arbitrary bytes?
So I made a short Python script that runs files with ambiguous extensions on Windows by looking at their magic number / file signature first:
I would like to compile it to an .exe to make the file association easier (using bbfreeze or by rewriting it in C), but I need a convenient config file to specify the byte strings and the matching program paths. Basically, I want to put this information into a text file:
magic_numbers = {
    # TINA
    'OBSS': r'%PROGRAMFILES(X86)%\DesignSoft\Tina 9 - TI\TINA.EXE',
    # PSpice
    '*version': r'%PROGRAMFILES(X86)%\Orcad\Capture\Capture.exe',
    'x100\x88\xce\xcf\xcfOrCAD ': '',  # PSpice?
    # Protel
    'DProtel': r'%PROGRAMFILES(X86)%\Altium Designer S09 Viewer\dxp.exe',
    # Eagle
    '\x10\x80': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',
    '\x10\x00': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',
    '<?xml version="1.0" encoding="utf-8"?>\n<!DOCTYPE eagle ': r'%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe',
    # PADS Logic
    '\x00\xFE': r'C:\MentorGraphics\9.3PADS\SDD_HOME\Programs\powerlogic.exe',
}
(The hex escapes are just arbitrary bytes, not Unicode characters.)
I suppose a .py file in this format would work, but then I would have to leave it uncompiled and somehow import it into the compiled program, and it still contains extraneous syntax like { and , for users to get confused by or screw up.
I looked at YAML and it would be great, except that binary data has to be base64-encoded first, which I don't want. I would prefer the config file to contain hex representations of the bytes, but also ASCII representations where the file signature is plain text. Or maybe regular expressions :D (in case the XML-based format can be written with a different number of spaces, for example).
Any ideas?
You already have the answer: YAML.
The data above stores textual representations of binary data; that is fine for YAML, you just need to decode it properly. Usually you would use something from the binascii module; in this case, probably binascii.a2b_qp.
import binascii
magic_id_str = 'x100\x88\xce\xcf\xcfOrCAD '
magic_id = binascii.a2b_qp(magic_id_str)
To demonstrate, I'll use a Unicode character as an easy way to enter binary data in the REPL (Python 2.7):
>>> import binascii
>>> import yaml
>>> a = 'Φ'
>>> a
'\xce\xa6'
>>> binascii.b2a_qp(a)
'=CE=A6'
>>> magic_text = yaml.load("""
... magic_string: '=CE=A6'
... """)
>>> magic_text
{'magic_string': '=CE=A6'}
>>> binascii.a2b_qp(magic_text['magic_string'])
'\xce\xa6'
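The same idea works in Python 3, with one difference: binascii.a2b_qp only accepts a str made entirely of ASCII characters, so the config has to hold the quoted-printable form (=CE=A6) rather than the raw bytes. Below is a minimal sketch that assumes the YAML file has already been loaded (for example with yaml.safe_load) into a dict whose keys are quoted-printable signatures. The names raw_config, decode_signatures and find_launcher are hypothetical, and the "longest matching signature wins" rule is my assumption, not something from the question.

```python
import binascii

# Hypothetical data: what yaml.safe_load() might return for a config whose
# keys are quoted-printable signatures ("=XX" is one raw byte; a literal "="
# would have to be written as "=3D") and whose values are program paths.
raw_config = {
    "OBSS": r"%PROGRAMFILES(X86)%\DesignSoft\Tina 9 - TI\TINA.EXE",
    "=10=80": r"%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe",
    "=00=FE": r"C:\MentorGraphics\9.3PADS\SDD_HOME\Programs\powerlogic.exe",
}


def decode_signatures(config):
    """Turn quoted-printable keys into raw byte signatures."""
    return {binascii.a2b_qp(key): path for key, path in config.items()}


def find_launcher(header, signatures):
    """Return the program for the longest signature that prefixes header."""
    matches = [sig for sig in signatures if header.startswith(sig)]
    if not matches:
        return None
    return signatures[max(matches, key=len)]
```

Plain ASCII signatures such as OBSS pass through a2b_qp unchanged, so the config stays readable for text signatures and only non-printable bytes need the =XX escapes.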
I would suggest doing it a little differently, by splitting the mapping into two separate steps:
- magic number signature ==> MIME type
- MIME type ==> launcher
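A minimal standard-library sketch of that split, with plain dicts standing in for libmagic and for the launcher config. The MIME type names (application/x-tina, application/x-eagle) are hypothetical, invented for illustration. The benefit shows up with Eagle: both of its signatures map to one MIME type, so the program path only has to be written (and later changed) in one place.

```python
# Stage 1: magic-number signature ==> MIME type.
# The MIME type names below are hypothetical placeholders.
SIGNATURE_TO_MIME = {
    b"OBSS": "application/x-tina",
    b"\x10\x80": "application/x-eagle",
    b"\x10\x00": "application/x-eagle",
}

# Stage 2: MIME type ==> launcher program (one entry per program).
MIME_TO_LAUNCHER = {
    "application/x-tina": r"%PROGRAMFILES(X86)%\DesignSoft\Tina 9 - TI\TINA.EXE",
    "application/x-eagle": r"%PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe",
}


def detect_mime(header):
    """Return the MIME type of the longest signature that prefixes header."""
    by_length = sorted(SIGNATURE_TO_MIME.items(), key=lambda kv: len(kv[0]), reverse=True)
    for signature, mime in by_length:
        if header.startswith(signature):
            return mime
    return None


def launcher_for(header):
    """Chain both stages; returns None if either lookup fails."""
    return MIME_TO_LAUNCHER.get(detect_mime(header))
```

In a real program, header would be the first few dozen bytes of the file, read in binary mode.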
For the first part, I would use python-magic, a Python binding to libmagic. python-magic can load a custom magic file:
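import magic
# mime=True makes from_file()/from_buffer() return MIME types
# instead of human-readable descriptions
m = magic.Magic(magic_file='/path/to/magic.file', mime=True)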
Your users can then write custom magic files that map magic numbers to MIME types. The syntax for magic files is documented. Here is an example showing the magic file entries for the TIFF format:
# Tag Image File Format, from Daniel Quinlan (quinlan@yggdrasil.com)
# The second word of TIFF files is the TIFF version number, 42, which has
# never changed. The TIFF specification recommends testing for it.
0 string MM\x00\x2a TIFF image data, big-endian
!:mime image/tiff
0 string II\x2a\x00 TIFF image data, little-endian
!:mime image/tiff
The second part is pretty straightforward, since now you only need to store text. You can use INI or YAML as suggested by others, or just a simple tab-separated file like this:
image/tiff	C:\Program Files\imageviewer.exe
application/json	C:\Program Files\notepad.exe
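A sketch of reading that tab-separated file, assuming one "MIME type, tab, program path" pair per line. Treating lines that start with # as comments is my own addition, not part of the format above. Splitting on the first tab only keeps the spaces in paths like C:\Program Files intact.

```python
def parse_launchers(text):
    """Parse 'mime/type<TAB>program path' lines into a dict.

    Blank lines are skipped; lines starting with '#' are treated as
    comments (an assumption, not part of the original format).
    """
    launchers = {}
    for lineno, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        try:
            # Split on the first tab only, so spaces in paths survive.
            mimetype, program = line.split("\t", 1)
        except ValueError:
            raise ValueError(
                f"line {lineno}: expected a tab between MIME type and path"
            ) from None
        launchers[mimetype.strip()] = program.strip()
    return launchers
```

If the magic.Magic object is created with mime=True, m.from_file(path) returns a MIME type string that can be looked up directly in the dict this returns.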
I have used several packages to generate config files, including YAML ones. I recommend ConfigParser or ConfigObj.
Of the two, ConfigObj is the best option: if you want to create a user-friendly config file with comments, I highly recommend it.
Enjoy!
ConfigObj example
You can also use ConfigObj to store them. With this code:
import configobj
def createConfig(path):
    config = configobj.ConfigObj()
    config.filename = path
    config["Sony"] = {}
    config["Sony"]["product"] = "Sony PS3"
    config["Sony"]["accessories"] = ['controller', 'eye', 'memory stick']
    config["Sony"]["retail price"] = "$400"
    config["Sony"]["binary one"] = bin(173)
    config.write()
You get this file:
[Sony]
product = Sony PS3
accessories = controller, eye, memory stick
retail price = $400
binary one = 0b10101101
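For comparison, a sketch with the standard library's configparser (the Python 3 name of ConfigParser), storing the question's signatures as hex bytes. Two things are my own assumptions rather than part of this answer: the "hex bytes separated by spaces" key format, and the section name signatures. Note that interpolation=None matters here: with the default interpolation, the % in %PROGRAMFILES(X86)% would be treated as a variable reference and raise an error when the value is read.

```python
import configparser

# Hypothetical config: hex-encoded signature = program to launch.
CONFIG_TEXT = r"""
[signatures]
# hex bytes = program to launch
4F 42 53 53 = %PROGRAMFILES(X86)%\DesignSoft\Tina 9 - TI\TINA.EXE
10 80 = %PROGRAMFILES(X86)%\EAGLE-5.11.0\bin\eagle.exe
00 FE = C:\MentorGraphics\9.3PADS\SDD_HOME\Programs\powerlogic.exe
"""


def load_signatures(text):
    """Return {signature bytes: program path} from the [signatures] section."""
    # interpolation=None keeps '%' characters in values literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    # Option names are lower-cased by default; bytes.fromhex accepts
    # lower-case hex and ignores the spaces between byte pairs.
    return {bytes.fromhex(key): value for key, value in parser.items("signatures")}
```

The trade-off compared with quoted-printable is that text signatures like OBSS have to be spelled out in hex (4F 42 53 53), so they are less readable, but every byte is written the same way.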