Pyspark show dataframe as horizontally scrolling table in ipython notebook

A pyspark.sql.DataFrame displays messily with DataFrame.show(): lines wrap instead of scrolling.

But it displays fine with pandas.DataFrame.head.

I have tried these options:

import IPython
IPython.auto_scroll_threshold = 9999

from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
from IPython.display import display

      

but no luck. Scrolling does work, however, when the notebook is used in the Atom editor with the Jupyter plugin.



3 answers


This is a workaround:

spark_df.limit(5).toPandas().head()

      



Although I don't know the computational burden of this query, I think limit() is not expensive. Corrections are welcome.
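On the cost question, here is a rough sketch of the same workaround as a helper. Because limit() is applied before toPandas(), at most n rows are collected to the driver, though any expensive upstream steps in the query (sorts, joins) may still have to run. The name preview and the default n=5 are made up for illustration, and the sketch assumes df is a pyspark.sql.DataFrame with pandas installed.

```python
def preview(df, n=5):
    """Return at most n rows of a Spark DataFrame as a pandas DataFrame.

    `preview` and `n` are illustrative names, not part of PySpark.
    """
    # limit() is a lazy transformation; toPandas() triggers the job and
    # collects the (at most n) resulting rows to the driver.
    return df.limit(n).toPandas()
```

In a notebook cell, preview(spark_df) as the last expression should then render with the usual pandas table display.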



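I created the little function below and it works great:

from IPython.display import HTML

def printDf(sprkDF):
    # Collect the Spark DataFrame to the driver as a pandas DataFrame,
    # then render it as an HTML table in the notebook.
    newdf = sprkDF.toPandas()
    return HTML(newdf.to_html())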

      



You can use it directly on your Spark queries if you like, or on any Spark DataFrame:

printDf(spark.sql('''
select * from employee
'''))
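If the rendered table still wraps rather than scrolls, one option is to wrap the HTML from to_html() in a container with overflow-x: auto. This is only a minimal sketch: scrollable_html is a hypothetical helper name, and the white-space: nowrap style is an assumption about what keeps wide cells on a single line in a given notebook front end.

```python
def scrollable_html(table_html):
    """Wrap an HTML table string in a horizontally scrolling <div>."""
    # overflow-x: auto shows a scrollbar only when the table is wider
    # than the output cell; nowrap stops cell text from wrapping.
    return (
        '<div style="overflow-x: auto; white-space: nowrap;">'
        + table_html
        + "</div>"
    )
```

Inside printDf, the last line could then become return HTML(scrollable_html(newdf.to_html())).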

      



This is now possible natively as of Spark 2.4.0 by setting the parameter spark.sql.repl.eagerEval.enabled to True.
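As a hedged sketch of how that setting might be applied from a notebook: the helper below sets it on an existing SparkSession through spark.conf.set. The names EAGER_EVAL_CONF and enable_eager_eval are invented for this example, and the two extra keys (maxNumRows, truncate) are optional related settings shown with example values only.

```python
# Eager-evaluation settings for notebooks (Spark 2.4.0+).
# Values are passed as strings; the last two are optional examples.
EAGER_EVAL_CONF = {
    "spark.sql.repl.eagerEval.enabled": "true",
    "spark.sql.repl.eagerEval.maxNumRows": "20",
    "spark.sql.repl.eagerEval.truncate": "20",
}


def enable_eager_eval(spark):
    """Apply the settings above to an existing SparkSession.

    `enable_eager_eval` is an illustrative helper, not a PySpark API.
    """
    for key, value in EAGER_EVAL_CONF.items():
        spark.conf.set(key, value)
    return spark
```

After calling enable_eager_eval(spark), a DataFrame left as the last expression in a cell should render as an HTML table without an explicit show().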
