Why am I not restoring in python variables the same values ββas SQL from table
This is my first post on Stack Overflow and I try to be as succinct as possible. I have some experience with SQL, but I am just getting started with python coding. I have a strange result getting SQL data in python variables and it seems like I am doing something wrong that I cannot find.
I got row 26244 in my SQLite table as the SQL query shows:
table = 'DatosLaboratorio'
SQLQuery = "SELECT COUNT(*) FROM %s" % table
rows = cursor.execute(SQLQuery).fetchone()[0]
print(rows)
26244
However, when I try to sum the table, python does not recover the same numbers:
SQLQuery = "SELECT familia, COUNT(*) as num FROM %s GROUP BY familia ORDER BY familia" % table
rows = cursor.execute(SQLQuery).fetchall()
conn.commit()
# sum totals previously grouped in field 1 (num)
count=0
for row in rows:
count=count+row[1]
print(count)
8862
I have verified that a direct SQL query against SQLite gives the correct numbers:
select sum(num) as total from
(select familia, count (*) as num from DatosLaboratorio group by familia)
total
26244
Worse, when I try to get the data in a DataFrame using pandas, I don't get the same counts, it seems that pandas only reads 33 valid rows, but I have values ββin all 26244 records:
SQLQuery = "SELECT * FROM %s" % table
df = pd.read_sql (SQLQuery,conn)
conn.commit()
df.count()
ID 33
seccion 0
fecha 33
familia 33
codigo 33
extractoseco 33
materiagrasa 33
sal 33
ph 33
observaciones 33
phsalmuera 0
temperaturasalmuera 4
densidadsalmuera 4
What am I missing? Thank you in advance for your help!
@Hrabal: adding output
This is the SQL of the query on SQLite:
select familia, count (*) as num from DatosLaboratorio group by familia
RecNo familia num
1 CABRA BARRA TIERNO 297
2 CABRA MADURADO 3 KG 29
3 CABRA MADURADO MINI 44
4 CABRA TIERNO 3 KGS 140
5 CABRA TIERNO BARRA 4,2 50
6 CABRA TIERNO MINI 258
7 GRAN CAPITAN 3 KGS 2
8 MADURADO 3 KG SL 2588
9 MADURADO 3 KGS IQM 315
10 MADURADO 3 KGS S/LIS 308
11 MADURADO 3KG CL 1229
12 MADURADO BARRA 1585
13 MADURADO BARRA 4,2 523
14 MADURADO BARRA IQM 60
15 MADURADO BARRA IQM 4,2 41
16 MADURADO MINI 1393
...
50 TIERNO MINI IQM 142
51 TIERNO MINI LIGHT 572
52 TIERNO PΓO 323
53 TIERNO PΓO IQM 2124
54 TIERNO SOJA 3 KGS 3
55 TIERNO SOJA BARRA 14
56 TIERNO SOJA MINI 4
So the result is 56 lines of data grouped by "familia" and sum ("num") = 26244
When I print from python it doesn't seem to read all the data:
SQLQuery = "SELECT familia, COUNT(*) as num FROM %s GROUP BY familia ORDER BY familia" % table
rows = cursor.execute(SQLQuery).fetchall()
conn.commit()
columns = [column[0] for column in cursor.description]
print(columns)
for row in rows:
print (row[0],row[1])
['familia', 'num']
CABRA BARRA TIERNO 297
CABRA MADURADO 3 KG 29
CABRA MADURADO MINI 44
CABRA TIERNO 3 KGS 140
CABRA TIERNO BARRA 4,2 50
CABRA TIERNO MINI 258
GRAN CAPITAN 3 KGS 2
MADURADO 3 KG SL 2588
MADURADO 3 KGS IQM 315
MADURADO 3 KGS S/LIS 308
MADURADO 3KG CL 1229
MADURADO BARRA 1585
MADURADO BARRA 4,2 523
MADURADO BARRA IQM 60
MADURADO BARRA IQM 4,2 41
MADURADO MINI 1393
All python data reads, apparently: the first 16 lines, or at least I can't get the rest of the data. It should read 56 lines. And pandas doesn't read all the data.
source to share
All I can think of is the problem in .fetchall()
. Since python only gives you the first 16 lines, fetchall () doesn't work, try using .fetchone()
(if you have a small dataset) or a generator with .fetchmany()
:
def ResultGenerator(cursor, arraysize=8):
while True:
results = cursor.fetchmany(arraysize)
if not results:
break
for result in results:
yield result
cursor = con.cursor()
SQLQuery = "SELECT familia, COUNT(*) as num FROM %s GROUP BY familia ORDER BY familia" % table
cursor.execute(SQLQuery)
for row in ResultGenerator(cursor):
print (row[0],row[1])
This way python only fetches 8 lines at a time, consuming less memory (maybe this is the problem here?).
Try playing with the arraysize variable to see if something has changed.
resources: Python generators are fun
source to share