How to protect password and username in Spark (for example for JDBC connection / RDBMS database access)?
Password setting
On the command line, as a plaintext config:
spark-submit --conf spark.jdbc.password=test_pass ...
Using environment variable:
export jdbc_password=test_pass_export
spark-submit --conf spark.jdbc.password=$jdbc_password ...
Using a config properties file:
echo "spark.jdbc.password=test_pass_prop" > credentials.properties
spark-submit --properties-file credentials.properties
Base64 encoded for "obfuscation":
echo "spark.jdbc.b64password=$(echo -n test_pass_prop | base64)" > credentials_b64.properties
spark-submit --properties-file credentials_b64.properties
Using a password in code
import java.util.Base64 // for base64
import java.nio.charset.StandardCharsets // for base64
// JDBC connection properties; host, user and table are example values
val properties = new java.util.Properties()
properties.put("driver", "com.mysql.jdbc.Driver")
properties.put("url", "jdbc:mysql://mysql-host:3306")
properties.put("user", "test_user")
// Decode the Base64 value set via --properties-file (obfuscation only, not encryption)
val password = new String(Base64.getDecoder().decode(spark.conf.get("spark.jdbc.b64password")), StandardCharsets.UTF_8)
properties.put("password", password)
// Read the table through JDBC using the properties above
val models = spark.read.jdbc(properties.get("url").toString, "ml_models", properties)
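Since the password may arrive either as plaintext or as Base64 depending on which of the options above was used, the lookup can be wrapped in a small helper. This is only a sketch: the object name JdbcPassword and the rule "prefer the Base64 property, fall back to the plaintext one" are my own assumptions, and the property names are simply the ones used in this answer.

```scala
import java.nio.charset.StandardCharsets
import java.util.Base64

// Hypothetical helper, not part of Spark.
object JdbcPassword {
  // Property names as used in this answer; they are arbitrary.
  val PlainKey = "spark.jdbc.password"
  val B64Key   = "spark.jdbc.b64password"

  /** Decode a Base64 string into UTF-8 text. */
  def decodeB64(encoded: String): String =
    new String(Base64.getDecoder.decode(encoded.trim), StandardCharsets.UTF_8)

  /** Prefer the Base64 property; fall back to the plaintext one; None if neither is set. */
  def resolve(lookup: String => Option[String]): Option[String] =
    lookup(B64Key).map(decodeB64).orElse(lookup(PlainKey))
}
```

In a Spark job this could be called as JdbcPassword.resolve(key => spark.conf.getOption(key)). Taking a lookup function instead of the session keeps the helper usable without a running Spark session.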
Edit: spark-submit command line help for --conf and --properties-file:
--conf PROP=VALUE Arbitrary Spark configuration property.
--properties-file FILE Path to a file from which to load extra properties. If not
specified, this will look for conf/spark-defaults.conf.
The properties file name is arbitrary.
If you look at the documentation, you will see the arguments that spark-submit accepts:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
Arguments placed in [application-arguments], at the end after <application-jar>, are passed as args to the main method. You can use this mechanism to supply a username and password by hand on the command line when launching a job, if it is a one-off thing.
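A minimal sketch of reading the credentials from args follows. The object name JdbcJob, the Credentials case class and the argument order (user first, then password) are assumptions made for illustration. Keep in mind that anything typed on the command line is usually visible in the process list and shell history.

```scala
// Hypothetical job skeleton; names and argument order are assumptions.
object JdbcJob {
  /** Credentials taken from the application arguments. */
  final case class Credentials(user: String, password: String)

  /** Needs at least two arguments; the first two are read as <user> <password>. */
  def parseCredentials(args: Array[String]): Either[String, Credentials] =
    args match {
      case Array(user, password, _*) => Right(Credentials(user, password))
      case _ => Left("usage: <application-jar> <user> <password>")
    }

  def main(args: Array[String]): Unit =
    parseCredentials(args) match {
      case Right(creds) =>
        val properties = new java.util.Properties()
        properties.put("user", creds.user)
        properties.put("password", creds.password)
        // properties can now be handed to spark.read.jdbc(url, table, properties)
        println(s"Connecting as ${creds.user}")
      case Left(usage) =>
        System.err.println(usage)
        sys.exit(1)
    }
}
```

It would be launched along the lines of spark-submit --class JdbcJob <application-jar> test_user test_pass.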
If you want a more durable solution, you can store the password (encrypted or encoded in some way, since a one-way hash could not be turned back into a usable password) in a file with restricted permissions. The location of that file can then be passed to your job in [application-arguments].
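The restricted-file approach could look roughly like this. It is a sketch under several assumptions: the object name PasswordFile is hypothetical, the file sits on a POSIX filesystem readable by the driver, and it holds only the password. Any decryption or decoding step would go after the read.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}
import java.nio.file.attribute.PosixFilePermission

// Hypothetical helper; assumes a POSIX filesystem local to the driver.
object PasswordFile {
  // Permissions that would let anyone other than the owner touch the file.
  private val tooOpen: Set[PosixFilePermission] = Set(
    PosixFilePermission.GROUP_READ, PosixFilePermission.GROUP_WRITE,
    PosixFilePermission.GROUP_EXECUTE, PosixFilePermission.OTHERS_READ,
    PosixFilePermission.OTHERS_WRITE, PosixFilePermission.OTHERS_EXECUTE
  )

  /** Return the trimmed file content, or an error if group/others have any access. */
  def read(path: Path): Either[String, String] = {
    val perms = Files.getPosixFilePermissions(path) // java.util.Set
    if (tooOpen.exists(p => perms.contains(p)))
      Left(s"$path is accessible by group or others; restrict it to the owner (e.g. chmod 600)")
    else
      Right(new String(Files.readAllBytes(path), StandardCharsets.UTF_8).trim)
  }
}
```

Inside main, the path would come from the arguments, for example PasswordFile.read(java.nio.file.Paths.get(args(0))). Refusing to read a file that is too open is a design choice borrowed from tools such as ssh, not something Spark enforces.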