Jython Processor

Sparkflows has a Jython Processor.

The Jython Processor runs user-written Jython code against the incoming DataFrame and produces a resulting DataFrame. In the examples on this page, the result is assigned to the variable outDF.

In the Jython node, the following variables are available:

inDF : Incoming Spark DataFrame
spark : The Spark Session object
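
As a minimal sketch of how these variables might be used, the transformation can be kept in a small function that takes the DataFrame as an argument. The function name count_by is hypothetical and not part of Sparkflows. That the node picks up its result from a variable named outDF is an assumption, inferred from the examples on this page.

```python
# Hypothetical helper (not part of Sparkflows): groups a DataFrame by one
# column and counts the rows in each group.
def count_by(df, column):
    return df.groupBy(column).count()

# Inside the Jython node, where inDF is already defined, the call would be:
# outDF = count_by(inDF, "c2")
```

Keeping the logic in a function makes it possible to exercise it outside the node with any object that offers the same methods.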

Example Jython Code

Below are some examples of Jython code that can be used in the node.

Select a specific column from the DataFrame

outDF = inDF.select("c2")

Count the number of records after grouping them

outDF = inDF.groupBy("c2").count()

Run a SQL query on the input DataFrame

The Jython Processor registers the incoming DataFrame as a temporary table with a configurable name.

The following Jython script runs a SELECT against the registered temporary table, which is named fire_temp_table in this example.

outDF = spark.sql("SELECT c1, c2 FROM fire_temp_table")
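
Because the temporary table name is configurable, it can help to keep it in one place rather than repeating it in each query. The sketch below assumes the node is configured with the name fire_temp_table. The constant TEMP_TABLE and the helper build_select are hypothetical names introduced for illustration only.

```python
# Assumed to match the temporary table name configured in the node.
TEMP_TABLE = "fire_temp_table"

# Hypothetical helper: builds a SELECT statement for the given columns.
def build_select(columns, table=TEMP_TABLE):
    return "SELECT %s FROM %s" % (", ".join(columns), table)

query = build_select(["c1", "c2"])

# Inside the Jython node, where spark is already defined:
# outDF = spark.sql(query)
```

The helper only concatenates strings, so column and table names should come from the script itself rather than from untrusted input.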

Run a SQL query followed by further grouping and count

outDF = spark.sql("SELECT c1, c2 FROM fire_temp_table")
outDF = outDF.groupBy("c2").count()

Read from HDFS and create a new DataFrame

The following Jython script reads a JSON file from HDFS.

outDF = spark.read().json("data/people.json")