I recommend reading "How to create scala jar using sbt" before this article.
This article covers the following topics:
- What is ProGuard
- sbt-proguard
- How to create obfuscated spark jar file
- Settings of ProGuard in build.sbt
What is ProGuard
ProGuard is an open-source Java (and Scala) class file shrinker, optimizer, obfuscator, and preverifier. As a result, ProGuard-processed applications and libraries are smaller, faster, and somewhat hardened against reverse engineering. It processes class files in four steps:

- Shrinking step: detects and removes unused classes, fields, methods, and attributes.
- Optimization step: optimizes the bytecode and removes unused instructions.
- Obfuscation step: renames the remaining classes, fields, and methods using short, meaningless names.
- Preverification step: adds preverification information to the classes, which is required for Java Micro Edition and for Java 6 and higher.
This article mainly focuses on the obfuscation step.
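To make the step selection concrete, here is a minimal sketch showing how the other steps could be disabled through proguardOptions so that only obfuscation runs (the ProGuard flags are standard; whether you actually want to disable shrinking depends on your application):
// A minimal sketch: disable every step except obfuscation
Proguard / proguardOptions ++= Seq(
  "-dontshrink",    // skip the shrinking step
  "-dontoptimize",  // skip the optimization step
  "-dontpreverify"  // skip preverification (not needed for standard Java SE targets)
)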
sbt-proguard
sbt-proguard is an sbt plugin for running ProGuard, and it is the plugin used in this article.
How to create obfuscated spark jar file
This article uses the following Spark code as an example.
Classes in the org.apache.spark.sql package will be obfuscated by ProGuard.
Prepare files
src/main/scala/Main.scala (Scala source file)
import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder
      .master("local")
      .appName("test")
      .config("spark.ui.enabled", "false")
      .getOrCreate()
    import spark.implicits._
    import org.apache.spark.sql.functions._
    val df = spark.createDataset((0 until 10).toList)
      .filter(col("value") > 5)
      .withColumn("word", lit("life is beautiful"))
    df.show()
  }
}
project/plugins.sbt
Add the plugin to project/plugins.sbt.
addSbtPlugin("com.github.sbt" % "sbt-proguard" % "0.5.0")
build.sbt
name := "test"
organization := "gotoqcode"
version := "1.0"
scalaVersion := "2.11.12"
// Dependencies
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.6"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6"
// Enable sbt-proguard plugin
enablePlugins(SbtProguard)
// Specify the proguard version
Proguard / proguardVersion := "7.2.1"
// Set javaOptions
Proguard / proguard / javaOptions := Seq("-XX:-UseGCOverheadLimit","-Xmx4G")
// Set proguardOptions
// Specify the entry point
Proguard / proguardOptions += ProguardOptions.keepMain("Main")
// Configure proguard for scala
Proguard / proguardOptions ++= Seq(
  // Specify not to warn about unresolved references
  "-dontwarn",
  // Specify not to optimize the input class files
  "-dontoptimize",
  // Specify all attributes to be preserved
  "-keepattributes **",
  // Specify classes and class members (fields and methods) to be preserved
  "-keep class !org.apache.spark.sql.**, ** {*;}",
  "-keep class org.apache.spark.sql.api.** {*;}",
  "-keep class org.apache.spark.sql.catalog.** {*;}",
  "-keep class org.apache.spark.sql.catalyst.** {*;}",
  "-keep class org.apache.spark.sql.execution.** {*;}",
  "-keep class org.apache.spark.sql.expressions.** {*;}",
  "-keep class org.apache.spark.sql.internal.** {*;}",
  "-keep class org.apache.spark.sql.jdbc.** {*;}",
  "-keep class org.apache.spark.sql.sources.** {*;}",
  "-keep class org.apache.spark.sql.streaming.** {*;}",
  "-keep class org.apache.spark.sql.test.** {*;}",
  "-keep class org.apache.spark.sql.types.** {*;}",
  "-keep class org.apache.spark.sql.util.** {*;}",
  "-keep class org.apache.spark.sql.vectorized.** {*;}",
  // Specify to exhaustively list classes and class members matched by the various -keep options
  "-printseeds seeds.txt")
// Set proguardInputs
Proguard / proguardInputs := (Compile / dependencyClasspath).value.files
// Set proguardFilteredInputs
Proguard / proguardFilteredInputs ++= ProguardOptions.noFilter((Compile / packageBin).value)
// Set proguardInputFilter
Proguard / proguardInputFilter := { file =>
  file.name match {
    case _ => Some("!META-INF/MANIFEST.MF,!META-INF/DUMMY.DSA,!META-INF/DUMMY.SF,!com/google/protobuf25/**,!org/apache/spark/unused/UnusedStubClass.class,!org/apache/orc/storage/**,!javax/inject/**,!org/apache/hadoop/yarn/factories/package-info.class,!org/apache/hadoop/yarn/factory/providers/package-info.class,!org/apache/hadoop/yarn/util/package-info.class, !org/aopalliance/aop/**, !com/sun/activation/registries/**, !org/apache/hadoop/yarn/client/api/impl/package-info.class, !org/apache/hadoop/yarn/client/api/package-info.class, !org/aopalliance/intercept/**, !org/apache/commons/collections/FastHashMap.class, !org/apache/commons/collections/FastHashMap$Values.class, !org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class, !org/apache/commons/collections/FastHashMap$1.class, !org/apache/commons/collections/Buffer.class, !org/apache/commons/collections/BufferUnderflowException.class, !org/apache/commons/collections/FastHashMap$KeySet.class, !org/apache/commons/collections/FastHashMap$CollectionView.class, !org/apache/commons/collections/FastHashMap$EntrySet.class, !org/apache/commons/collections/ArrayStack.class")
  }
}
Run ProGuard
sbt proguard
You can confirm that the fat jar has been created at the following path.
- target/scala-2.11/proguard/test_2.11-1.0.jar
Run the jar file
java -jar target/scala-2.11/proguard/test_2.11-1.0.jar
+-----+-----------------+
|value| word|
+-----+-----------------+
| 6|life is beautiful|
| 7|life is beautiful|
| 8|life is beautiful|
| 9|life is beautiful|
+-----+-----------------+
Check the jar file
You can check that the jar file is obfuscated using the jar tvf <jar's path> command or JD-GUI.
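For example, the following command lists the Spark SQL entries in the jar; obfuscated classes typically show up with short, meaningless names (the exact names vary from build to build):
jar tvf target/scala-2.11/proguard/test_2.11-1.0.jar | grep "org/apache/spark/sql"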


You can also check target/scala-2.11/proguard/configuration.pro, which contains the full ProGuard configuration generated by the plugin.
Settings of ProGuard in build.sbt
I’d like to explain the ProGuard settings in build.sbt in a little more detail.
javaOptions
Set javaOptions like this to avoid an “OutOfMemoryError” during the ProGuard run.
Proguard / proguard / javaOptions := Seq("-XX:-UseGCOverheadLimit","-Xmx4G")
proguardOptions
> keepMain
Specifies the Main class as the entry point.
Proguard / proguardOptions += ProguardOptions.keepMain("Main")
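Under the hood, keepMain("Main") expands to a -keep rule along the following lines; you can confirm the exact text in the generated configuration.pro:
-keep public class Main {
  public static void main(java.lang.String[]);
}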
> -dontwarn and -dontoptimize
- -dontwarn: specifies not to warn about unresolved references.
- -dontoptimize: specifies not to optimize the input class files.
※Skipping optimization is not a problem because optimization isn’t necessary in this case.
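If suppressing every warning feels too broad, ProGuard also accepts class name filters on -dontwarn; the packages below are just examples:
// Sketch: suppress warnings only for selected packages instead of globally
Proguard / proguardOptions ++= Seq(
  "-dontwarn org.apache.hadoop.**",
  "-dontwarn io.netty.**"
)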
> -keepattributes
The -keepattributes ** option keeps all attributes, including annotations, from being removed during obfuscation.
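Keeping every attribute is the safe choice here. A narrower alternative (a sketch, using attribute names from the ProGuard manual) keeps only the attributes that Scala and reflection-heavy libraries commonly need at runtime:
"-keepattributes Signature,*Annotation*,InnerClasses,EnclosingMethod"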
> -keep
These -keep options mean that every class outside org.apache.spark.sql is preserved, and within org.apache.spark.sql only the listed subpackages (api, catalog, catalyst, …) are preserved; the remaining org.apache.spark.sql classes and their members are obfuscated. The leading ! negates a filter, so !org.apache.spark.sql.**, ** matches all classes except those under org.apache.spark.sql.
Kindly check the ProGuard manual for details.
// Specify classes and class members (fields and methods) to be preserved
"-keep class !org.apache.spark.sql.**, ** {*;}",
"-keep class org.apache.spark.sql.api.** {*;}",
"-keep class org.apache.spark.sql.catalog.** {*;}",
"-keep class org.apache.spark.sql.catalyst.** {*;}",
"-keep class org.apache.spark.sql.execution.** {*;}",
"-keep class org.apache.spark.sql.expressions.** {*;}",
"-keep class org.apache.spark.sql.internal.** {*;}",
"-keep class org.apache.spark.sql.jdbc.** {*;}",
"-keep class org.apache.spark.sql.sources.** {*;}",
"-keep class org.apache.spark.sql.streaming.** {*;}",
"-keep class org.apache.spark.sql.test.** {*;}",
"-keep class org.apache.spark.sql.types.** {*;}",
"-keep class org.apache.spark.sql.util.** {*;}",
"-keep class org.apache.spark.sql.vectorized.** {*;}",
proguardInputs
Sets all dependency jar files as input files.
Proguard / proguardInputs := (Compile / dependencyClasspath).value.files
You can inspect them with the following command in the sbt shell.
show Compile / dependencyClasspath
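If some dependencies are provided at runtime (for example, Spark on a cluster), a common variation is to pass those jars as libraries instead of inputs, so ProGuard can resolve references to them without processing them. A sketch, assuming the Spark jars can be identified by file name:
// Sketch: feed only non-Spark dependencies to ProGuard and treat Spark jars as libraries
Proguard / proguardInputs := (Compile / dependencyClasspath).value.files
  .filterNot(_.getName.startsWith("spark-"))
Proguard / proguardLibraries ++= (Compile / dependencyClasspath).value.files
  .filter(_.getName.startsWith("spark-"))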
proguardFilteredInputs
Sets target/scala-2.11/test_2.11-1.0.jar, the jar containing only the Main class, as an input file to which proguardInputFilter is not applied.
Proguard / proguardFilteredInputs ++= ProguardOptions.noFilter((Compile / packageBin).value)
You can inspect it with the following command in the sbt shell.
show Compile / packageBin
proguardInputFilter
You can use proguardInputFilter to exclude files from the output jar, for example to drop duplicate or unwanted entries coming from the dependency jars.
// Set proguardInputFilter
Proguard / proguardInputFilter := { file =>
  file.name match {
    case _ => Some("!META-INF/MANIFEST.MF,!META-INF/DUMMY.DSA,!META-INF/DUMMY.SF,!com/google/protobuf25/**,!org/apache/spark/unused/UnusedStubClass.class,!org/apache/orc/storage/**,!javax/inject/**,!org/apache/hadoop/yarn/factories/package-info.class,!org/apache/hadoop/yarn/factory/providers/package-info.class,!org/apache/hadoop/yarn/util/package-info.class, !org/aopalliance/aop/**, !com/sun/activation/registries/**, !org/apache/hadoop/yarn/client/api/impl/package-info.class, !org/apache/hadoop/yarn/client/api/package-info.class, !org/aopalliance/intercept/**, !org/apache/commons/collections/FastHashMap.class, !org/apache/commons/collections/FastHashMap$Values.class, !org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class, !org/apache/commons/collections/FastHashMap$1.class, !org/apache/commons/collections/Buffer.class, !org/apache/commons/collections/BufferUnderflowException.class, !org/apache/commons/collections/FastHashMap$KeySet.class, !org/apache/commons/collections/FastHashMap$CollectionView.class, !org/apache/commons/collections/FastHashMap$EntrySet.class, !org/apache/commons/collections/ArrayStack.class")
  }
}
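Because the filter function receives each input jar, the match on file.name can return a different filter per jar instead of one long shared list. A sketch with hypothetical patterns:
// Sketch: per-jar filters keyed on the input jar's file name
Proguard / proguardInputFilter := { file =>
  file.name match {
    case name if name.startsWith("hadoop-") => Some("!META-INF/**,!org/apache/hadoop/yarn/**")
    case name if name.startsWith("spark-")  => Some("!META-INF/**")
    case _ => None // no filter for other jars
  }
}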
Alternatively, you can use the plugin's Merging feature to resolve duplicate entries across the input jars instead of filtering them.
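A sketch of that approach, based on the merge settings described in the sbt-proguard README (verify the exact keys against the README for your plugin version):
// Sketch: merge the inputs before running ProGuard and resolve conflicts explicitly
Proguard / proguardMerge := true
Proguard / proguardMergeStrategies += ProguardMerge.append("reference.conf")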