Why am I not restoring in python variables the same values as SQL from table

Question

Why am I not restoring in python variables the same values as SQL from table

This is my first post on Stack Overflow and I try to be as succinct as possible. I have some experience with SQL, but I am just getting started with python coding. I have a strange result getting SQL data in python variables and it seems like I am doing something wrong that I cannot find.

I got row 26244 in my SQLite table as the SQL query shows:

table = 'DatosLaboratorio' 
SQLQuery = "SELECT COUNT(*) FROM %s" % table
rows = cursor.execute(SQLQuery).fetchone()[0]
print(rows)

26244

However, when I try to sum the table, python does not recover the same numbers:

SQLQuery = "SELECT familia, COUNT(*) as num FROM %s GROUP BY familia ORDER BY familia" % table
rows = cursor.execute(SQLQuery).fetchall()
conn.commit()
# sum totals previously grouped in field 1 (num)
count=0
for row in rows:
    count=count+row[1]
print(count)

8862

I have verified that a direct SQL query against SQLite gives the correct numbers:

select sum(num) as total from 
(select familia, count (*) as num from DatosLaboratorio group by familia)

total 
26244

Worse, when I try to get the data in a DataFrame using pandas, I don't get the same counts, it seems that pandas only reads 33 valid rows, but I have values in all 26244 records:

SQLQuery = "SELECT * FROM %s" % table
df = pd.read_sql (SQLQuery,conn)
conn.commit()
df.count()

ID                     33
seccion                 0
fecha                  33
familia                33
codigo                 33
extractoseco           33
materiagrasa           33
sal                    33
ph                     33
observaciones          33
phsalmuera              0
temperaturasalmuera     4
densidadsalmuera        4

What am I missing? Thank you in advance for your help!

@Hrabal: adding output

This is the SQL of the query on SQLite:

select familia, count (*) as num from DatosLaboratorio group by familia

RecNo   familia num
1   CABRA BARRA TIERNO  297
2   CABRA MADURADO 3 KG 29
3   CABRA MADURADO MINI 44
4   CABRA TIERNO 3 KGS  140
5   CABRA TIERNO BARRA 4,2  50
6   CABRA TIERNO MINI   258
7   GRAN CAPITAN 3 KGS  2
8   MADURADO 3 KG SL    2588
9   MADURADO 3 KGS IQM  315
10  MADURADO 3 KGS S/LIS    308
11  MADURADO 3KG CL 1229
12  MADURADO BARRA  1585
13  MADURADO BARRA 4,2  523
14  MADURADO BARRA IQM  60
15  MADURADO BARRA IQM 4,2  41
16  MADURADO MINI   1393
...
50  TIERNO MINI IQM 142
51  TIERNO MINI LIGHT   572
52  TIERNO PÑO  323
53  TIERNO PÑO IQM  2124
54  TIERNO SOJA 3 KGS   3
55  TIERNO SOJA BARRA   14
56  TIERNO SOJA MINI    4

So the result is 56 lines of data grouped by "familia" and sum ("num") = 26244

When I print from python it doesn't seem to read all the data:

SQLQuery = "SELECT familia, COUNT(*) as num FROM %s GROUP BY familia ORDER BY familia" % table
rows = cursor.execute(SQLQuery).fetchall()
conn.commit()
columns = [column[0] for column in cursor.description]
print(columns)
for row in rows:
    print (row[0],row[1])

['familia', 'num']
CABRA BARRA TIERNO 297
CABRA MADURADO 3 KG 29
CABRA MADURADO MINI 44
CABRA TIERNO 3 KGS 140
CABRA TIERNO BARRA 4,2 50
CABRA TIERNO MINI 258
GRAN CAPITAN 3 KGS 2
MADURADO 3 KG SL 2588
MADURADO 3 KGS IQM 315
MADURADO 3 KGS S/LIS 308
MADURADO 3KG CL 1229
MADURADO BARRA 1585
MADURADO BARRA 4,2 523
MADURADO BARRA IQM 60
MADURADO BARRA IQM 4,2 41
MADURADO MINI 1393

All python data reads, apparently: the first 16 lines, or at least I can't get the rest of the data. It should read 56 lines. And pandas doesn't read all the data.

+3

python sql pandas

jrp 05 oct. 14 at 15:16

source to share

1 answer

Hrabal · Answer 1 · 2014-10-07T07:41:56+0000

All I can think of is the problem in .fetchall()

. Since python only gives you the first 16 lines, fetchall () doesn't work, try using .fetchone()

(if you have a small dataset) or a generator with .fetchmany()

:

def ResultGenerator(cursor, arraysize=8):
  while True:
    results = cursor.fetchmany(arraysize)
    if not results:
        break
    for result in results:
        yield result

cursor = con.cursor()
SQLQuery = "SELECT familia, COUNT(*) as num FROM %s GROUP BY familia ORDER BY familia" % table
cursor.execute(SQLQuery)

for row in ResultGenerator(cursor):
    print (row[0],row[1])

This way python only fetches 8 lines at a time, consuming less memory (maybe this is the problem here?).

Try playing with the arraysize variable to see if something has changed.

resources: Python generators are fun

Why am I not restoring in python variables the same values ​​as SQL from table

More articles:

Why am I not restoring in python variables the same values as SQL from table