How do I protect the username and password in Spark (for example, for a JDBC connection / RDBMS database access)?

We have a use case where we need to export data from HDFS to an RDBMS. I've seen an example where the username and password are stored directly in the code. Is there a way to hide the password when exporting data, similar to the password alias option in Sqoop?

+3




3 answers


Password setting

On the command line, as a plaintext config:

spark-submit --conf spark.jdbc.password=test_pass ... 


Using an environment variable:

export jdbc_password=test_pass_export
spark-submit --conf spark.jdbc.password=$jdbc_password ...


Using a properties file:

echo "spark.jdbc.password=test_pass_prop" > credentials.properties
spark-submit --properties-file credentials.properties




Base64 encoded for "obfuscation":

echo "spark.jdbc.b64password=$(echo -n test_pass_prop | base64)" > credentials_b64.properties
spark-submit --properties-file credentials_b64.properties


Using a password in code

import java.util.Base64
import java.nio.charset.StandardCharsets

val properties = new java.util.Properties()
properties.put("driver", "com.mysql.jdbc.Driver")
properties.put("url", "jdbc:mysql://mysql-host:3306")
properties.put("user", "test_user")

// Decode the base64-obfuscated password passed via --conf or --properties-file
val password = new String(
  Base64.getDecoder.decode(spark.conf.get("spark.jdbc.b64password")),
  StandardCharsets.UTF_8)
properties.put("password", password)

val models = spark.read.jdbc(properties.get("url").toString, "ml_models", properties)
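For the original HDFS-to-RDBMS export use case, the same properties object also works for writing. This is a sketch, not part of the original answer: the DataFrame df and the table name "export_table" are placeholders.

```scala
// Export a DataFrame over JDBC, reusing the properties built above
// (including the decoded password). `df` and "export_table" are
// illustrative names, not from the original answer.
df.write
  .mode("append")
  .jdbc(properties.get("url").toString, "export_table", properties)
```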


Edit: the spark-submit help text for --conf and --properties-file:

  --conf PROP=VALUE           Arbitrary Spark configuration property.
  --properties-file FILE      Path to a file from which to load extra properties. If not
                              specified, this will look for conf/spark-defaults.conf.


The property name used in the file is arbitrary.

+2




If you look at the documentation, you will see the arguments to spark-submit:

./bin/spark-submit \
  --class <main-class> \
  --master <master-url> \
  --deploy-mode <deploy-mode> \
  --conf <key>=<value> \
  ... # other options
  <application-jar> \
  [application-arguments]




Arguments placed at the end, after <application-jar>, in the [application-arguments] position are passed as args to the main method. You can use this mechanism to provide a username and password on the command line when launching the job, if it is a one-off thing.

If you want a more durable solution, you can store the password (encrypted or obfuscated in some way) in a file with restricted permissions. The location of that file can then be passed to your job in [application-arguments].
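The restricted-file approach can be sketched as follows. The file path, its layout (password on the first line), and the object name ExportJob are assumptions for illustration, not from the original answer.

```scala
import scala.io.Source

object ExportJob {
  def main(args: Array[String]): Unit = {
    // The restricted file's path arrives as the first application argument,
    // e.g. /secure/jdbc.pass with mode 0400 (path and layout are assumptions).
    val passwordFile = args(0)
    val source = Source.fromFile(passwordFile)
    val password = try source.getLines().next().trim finally source.close()
    // ... build the JDBC properties with `password` and run the export
  }
}
```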

+1




When launching your application from the console with spark-submit, you can read the password interactively via the Java API:

Console console = System.console(); // null if no interactive terminal is attached (e.g. cluster deploy mode)
char[] passwordArray = console.readPassword("Enter your secret password: ");
account.setPassword(passwordArray); // "account" stands for whatever object holds your JDBC credentials


0








