How to hash a string in Python?
I am trying to implement the following algorithm in Python [1] :
Problem : (row compression) Let
A
an arrayn x m
limited degree d (i.e., each element ofA
an integerA
that0<a<d
.), And mayk
be a number of different rowsA
. Search for an arrayk x m
,A'
so that each lineA
appears inA'
.Method : (Hashing) Collapse the rows
A
into a table, skipping duplicates and adding the rest toA'
.
I don't understand how to hash a string in Python?
Well, I would like to understand what it means, "hash of strings A to table". I understand the following. Suppose I have a matrix like this:
A = [[1, 2, 3, 4],
[1, 2, 3, 4],
[6, 7, 5, 4]]
So, I haveh its lines (somehow) and I get:
B = [x1, x2, x3]
where xi
is the hash of the string i
. Here I will x1=x2
have since line 1 and line 2 are equal. As I get x1=x2
, I will keep one and I will finally have:
A' = [[1, 2, 3, 4],
[6, 7, 5, 4]]
I'm right? If so, how can I implement such an algorithm in Python?
Thank.
[1] D. Comer, βRemoving Duplicate Rows from a Bounded Degree Array Using Attempts,β Purdue University, 1977.
source to share
First of all, you need to remove duplicate lines. You can use for this set
, but first you need to convert all your strings to immutable object types.
You can convert them to tuple
:
>>> A = [[1, 2, 3, 4],
... [1, 2, 3, 4],
... [6, 7, 5, 4]]
>>>
>>> map(tuple,A)
[(1, 2, 3, 4), (1, 2, 3, 4), (6, 7, 5, 4)]
Then you can use set
. And since it set
uses a hash function, the result will be hashed automatically:
>>> set(map(tuple,A))
set([(1, 2, 3, 4), (6, 7, 5, 4)])
source to share