Pyspark show dataframe as horizontally scrolling table in ipython notebook

Question

A pyspark.sql.DataFrame displays messily with DataFrame.show(): long lines wrap instead of scrolling horizontally.

By contrast, a pandas.DataFrame displays as a clean table with pandas.DataFrame.head().

I have tried these options:

import IPython
IPython.auto_scroll_threshold = 9999

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display

but with no luck. Scrolling does work, however, when the same code runs in the Atom editor with the Jupyter plugin.

pandas ipython jupyter-notebook pyspark pyspark-sql

muon, Apr 15 '17 at 14:17

3 answers

I created the small function below, and it works well:

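from IPython.display import HTML

def printDf(sprkDF):
    # Convert the Spark DataFrame to pandas (collects all rows to the driver)
    newdf = sprkDF.toPandas()
    # Render the pandas DataFrame as an HTML table in the notebook
    return HTML(newdf.to_html())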

You can use it directly on your Spark SQL queries, or on any Spark DataFrame:

printDf(spark.sql('''
select * from employee
'''))
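Because printDf calls toPandas() on the whole DataFrame, every row is collected to the driver. A minimal sketch of a variant, assuming you only need to preview the first n rows: the helper name scrollable_html, its n parameter, and the overflow-x wrapper are hypothetical additions, not part of the answer above. It limits rows on the Spark side and wraps the table in a div that scrolls horizontally instead of wrapping.

```python
import pandas as pd


def scrollable_html(spark_df, n=20):
    """Return HTML for the first n rows of a Spark DataFrame, wrapped in a
    container that scrolls horizontally instead of wrapping long rows.

    Hypothetical helper, not from the original answer.
    """
    # limit() is applied on the Spark side, so toPandas() collects only n rows
    pdf: pd.DataFrame = spark_df.limit(n).toPandas()
    # Render the rows as a plain HTML table without the pandas index column
    table = pdf.to_html(index=False)
    # overflow-x: auto adds a horizontal scrollbar when the table is too wide;
    # white-space: nowrap keeps each cell's contents on one line
    return f'<div style="overflow-x: auto; white-space: nowrap;">{table}</div>'
```

In a notebook cell you would then call, for example, display(HTML(scrollable_html(spark.sql("select * from employee")))) after importing display and HTML from IPython.display. Returning a plain string keeps the helper usable outside IPython as well.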

Mbhatt, Jun 20 '17 at 17:21

This is now possible natively as of Spark 2.4.0 by setting the configuration parameter spark.sql.repl.eagerEval.enabled to True.
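A minimal sketch of that setting, assuming you create the SparkSession yourself in the notebook. The maxNumRows and truncate lines are optional companion settings (both default to 20) and are not part of the original answer.

```python
from pyspark.sql import SparkSession

# Enable eager evaluation so that a DataFrame left as the last expression
# in a notebook cell is rendered as an HTML table (Spark 2.4.0+).
spark = (
    SparkSession.builder
    .config("spark.sql.repl.eagerEval.enabled", "true")
    # Optional: maximum number of rows to render (default 20)
    .config("spark.sql.repl.eagerEval.maxNumRows", "20")
    # Optional: maximum characters shown per cell (default 20)
    .config("spark.sql.repl.eagerEval.truncate", "20")
    .getOrCreate()
)
```

If a session already exists, the same key can be set at runtime with spark.conf.set("spark.sql.repl.eagerEval.enabled", "true"). After that, ending a cell with a DataFrame such as df (without calling show()) renders it as an HTML table.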

Kyle Barron, Nov 30 '18 at 21:44

muon · Accepted Answer · 2018-09-28T17:42:39+0000

This is a workaround:

spark_df.limit(5).toPandas().head()

However, I don't know the computational cost of this approach. I believe limit() is not expensive, but corrections are welcome.
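Once the rows are in pandas, the notebook renders them through pandas' own HTML display, which may elide columns (shown as "...") once a frame exceeds the display.max_columns option. A small sketch, assuming you want every column of a wide frame rendered; this pandas setting is an addition, not part of the original workaround.

```python
import pandas as pd

# Render every column of a wide frame instead of eliding the middle ones as "..."
pd.set_option("display.max_columns", None)

# Then, in a notebook cell (spark_df as in the workaround):
# spark_df.limit(5).toPandas()
```

To keep the change local to one cell instead of global, the same option can be scoped with pd.option_context("display.max_columns", None) in a with block.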
