How to create an obfuscated Spark fat jar file using ProGuard

I recommend reading How to create scala jar using sbt before this article.
This article covers the following:

  • What is ProGuard
  • sbt-proguard
  • How to create obfuscated spark jar file
  • Settings of ProGuard in build.sbt

What is ProGuard

ProGuard is an open-source Java (and Scala) class file shrinker, optimizer, obfuscator, and preverifier. As a result, ProGuard-processed applications and libraries are smaller, faster, and somewhat hardened against reverse engineering.

  • shrinking step :
    detects and removes unused classes, fields, methods, and attributes.
  • optimization step :
    optimizes bytecode and removes unused instructions.
  • obfuscation step :
    renames the remaining classes, fields, and methods using short meaningless names.
  • preverification step :
    adds preverification information to the classes, which is required for Java Micro Edition and for Java 6 and higher.

This article mainly focuses on the obfuscation step.

sbt-proguard

sbt-proguard is an sbt plugin for running ProGuard; it is the plugin used in this article.

How to create obfuscated spark jar file

This article uses the following Spark code as an example.
Methods in the Spark SQL package will be obfuscated by ProGuard.

Prepare files

src/main/scala/Main.scala (Scala source file)

import org.apache.spark.sql.SparkSession

object Main {
  def main(args: Array[String]): Unit = {

    val spark = SparkSession
    .builder
    .master("local")
    .appName("test")
    .config("spark.ui.enabled","false")
    .getOrCreate()

    import spark.implicits._  
    import org.apache.spark.sql.functions._

    val df = spark.createDataset((0 until 10).toList)
    .filter(col("value") > 5)
    .withColumn("word",lit("life is beautiful"))
    df.show()
  }
}

project/plugins.sbt

Add plugin to project/plugins.sbt.

addSbtPlugin("com.github.sbt" % "sbt-proguard" % "0.5.0")

build.sbt

name := "test"
organization := "gotoqcode"
version := "1.0"
scalaVersion := "2.11.12"

// Dependencies
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.6"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.6"

// Enable sbt-proguard plugin
enablePlugins(SbtProguard)
// Specify the proguard version
Proguard / proguardVersion := "7.2.1"
// Set javaOptions
Proguard / proguard / javaOptions := Seq("-XX:-UseGCOverheadLimit","-Xmx4G")
// Set proguardOptions
// Specify the entry point
Proguard / proguardOptions += ProguardOptions.keepMain("Main")
// Configure proguard for scala
Proguard / proguardOptions ++= Seq(
    // Specify not to warn
    "-dontwarn",
    // Specify not to optimize the input class files
    "-dontoptimize",
    // Specify all attributes to be preserved
    "-keepattributes **",
    // Specify classes and class members (fields and methods) to be preserved
    "-keep class !org.apache.spark.sql.**, ** {*;}", 
    "-keep class org.apache.spark.sql.api.** {*;}",
    "-keep class org.apache.spark.sql.catalog.** {*;}",
    "-keep class org.apache.spark.sql.catalyst.** {*;}",
    "-keep class org.apache.spark.sql.execution.** {*;}",
    "-keep class org.apache.spark.sql.expressions.** {*;}",
    "-keep class org.apache.spark.sql.internal.** {*;}",
    "-keep class org.apache.spark.sql.jdbc.** {*;}",
    "-keep class org.apache.spark.sql.sources.** {*;}",
    "-keep class org.apache.spark.sql.streaming.** {*;}",
    "-keep class org.apache.spark.sql.test.** {*;}",
    "-keep class org.apache.spark.sql.types.** {*;}",
    "-keep class org.apache.spark.sql.util.** {*;}",
    "-keep class org.apache.spark.sql.vectorized.** {*;}",
    // Specify to exhaustively list classes and class members matched by the various -keep options
    "-printseeds seeds.txt")
// Set proguardInputs
Proguard / proguardInputs := (Compile / dependencyClasspath).value.files
// Set proguardFilteredInputs
Proguard / proguardFilteredInputs ++= ProguardOptions.noFilter((Compile / packageBin).value)
// Set proguardInputFilter 
Proguard / proguardInputFilter := { file =>
  file.name match {
    case _ => Some("!META-INF/MANIFEST.MF,!META-INF/DUMMY.DSA,!META-INF/DUMMY.SF,!com/google/protobuf25/**,!org/apache/spark/unused/UnusedStubClass.class,!org/apache/orc/storage/**,!javax/inject/**,!org/apache/hadoop/yarn/factories/package-info.class,!org/apache/hadoop/yarn/factory/providers/package-info.class,!org/apache/hadoop/yarn/util/package-info.class, !org/aopalliance/aop/**, !com/sun/activation/registries/**, !org/apache/hadoop/yarn/client/api/impl/package-info.class, !org/apache/hadoop/yarn/client/api/package-info.class, !org/aopalliance/intercept/**, !org/apache/commons/collections/FastHashMap.class, !org/apache/commons/collections/FastHashMap$Values.class, !org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class, !org/apache/commons/collections/FastHashMap$1.class, !org/apache/commons/collections/Buffer.class, !org/apache/commons/collections/BufferUnderflowException.class, !org/apache/commons/collections/FastHashMap$KeySet.class, !org/apache/commons/collections/FastHashMap$CollectionView.class, !org/apache/commons/collections/FastHashMap$EntrySet.class, !org/apache/commons/collections/ArrayStack.class")
  }
}

Run ProGuard

sbt proguard

You can check that the fat jar was created at the following path.

  • target/scala-2.11/proguard/test_2.11-1.0.jar

Run jar file

java -jar target/scala-2.11/proguard/test_2.11-1.0.jar
+-----+-----------------+
|value|             word|
+-----+-----------------+
|    6|life is beautiful|
|    7|life is beautiful|
|    8|life is beautiful|
|    9|life is beautiful|
+-----+-----------------+

Check jar file

You can check that the jar file is obfuscated using the jar tvf <jar's path> command or JD-GUI.

You can also check target/scala-2.11/proguard/configuration.pro, which shows the ProGuard configuration that was generated.
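If you want to check programmatically rather than by eye, here is a small sketch of the heuristic: ProGuard's renamed classes typically have single-character simple names. The object name and the sample entry names are assumptions for illustration; feed it the entry names printed by jar tvf.

```scala
object ObfuscationCheck {
  // Heuristic: an obfuscated class entry has a single-character simple name,
  // e.g. "org/apache/spark/sql/a.class".
  def looksObfuscated(entryName: String): Boolean = {
    val simpleName = entryName
      .substring(entryName.lastIndexOf('/') + 1)
      .stripSuffix(".class")
    entryName.endsWith(".class") && simpleName.length == 1
  }

  def main(args: Array[String]): Unit = {
    // A renamed class versus a kept class from a -keep rule:
    println(looksObfuscated("org/apache/spark/sql/a.class"))
    println(looksObfuscated("org/apache/spark/sql/types/IntegerType.class"))
  }
}
```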

Settings of ProGuard in build.sbt

I’d like to explain a little bit more about the settings of ProGuard in build.sbt.

javaOptions

Set javaOptions like this to avoid an OutOfMemoryError.

Proguard / proguard / javaOptions := Seq("-XX:-UseGCOverheadLimit","-Xmx4G")

proguardOptions

> keepMain

Specifies the Main class as the entry point.

Proguard / proguardOptions += ProguardOptions.keepMain("Main")

> -dontwarn and -dontoptimize

  • -dontwarn : Specify not to warn
  • -dontoptimize : Specify not to optimize the input class files
    ※Skipping optimization is not a problem here because it isn't needed in this case.

> -keepattributes

The -keepattributes ** option keeps all attributes, including annotations, from being removed during obfuscation.
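If keeping every attribute is broader than you need, ProGuard also accepts an explicit attribute list. A sketch of a narrower variant, not tested with this article's build:

```scala
// In build.sbt: keep only the attributes Scala and reflection commonly need,
// instead of the catch-all "-keepattributes **".
Proguard / proguardOptions += "-keepattributes Signature,InnerClasses,EnclosingMethod,*Annotation*"
```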

> -keep

These -keep options mean that all classes outside org.apache.spark.sql, together with the listed org.apache.spark.sql subpackages (api, catalog, catalyst, …), are preserved; the remaining org.apache.spark.sql classes, including all their methods, are obfuscated.

Kindly check the ProGuard manual for details.

// Specify classes and class members (fields and methods) to be preserved
"-keep class !org.apache.spark.sql.**, ** {*;}", 
"-keep class org.apache.spark.sql.api.** {*;}",
"-keep class org.apache.spark.sql.catalog.** {*;}",
"-keep class org.apache.spark.sql.catalyst.** {*;}",
"-keep class org.apache.spark.sql.execution.** {*;}",
"-keep class org.apache.spark.sql.expressions.** {*;}",
"-keep class org.apache.spark.sql.internal.** {*;}",
"-keep class org.apache.spark.sql.jdbc.** {*;}",
"-keep class org.apache.spark.sql.sources.** {*;}",
"-keep class org.apache.spark.sql.streaming.** {*;}",
"-keep class org.apache.spark.sql.test.** {*;}",
"-keep class org.apache.spark.sql.types.** {*;}",
"-keep class org.apache.spark.sql.util.** {*;}",
"-keep class org.apache.spark.sql.vectorized.** {*;}",
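The intended semantics of the rules above can be sketched in plain Scala. This is an illustration of how the patterns combine (negation first, then re-kept subpackages), not ProGuard's actual matcher:

```scala
object KeepPatternSketch {
  // Subpackages re-kept by the explicit -keep rules above.
  val keptSubpackages = Seq("api", "catalog", "catalyst", "execution",
    "expressions", "internal", "jdbc", "sources", "streaming", "test",
    "types", "util", "vectorized")

  // True if a class would be preserved: everything outside
  // org.apache.spark.sql is kept, plus the re-kept subpackages.
  def isKept(className: String): Boolean = {
    val prefix = "org.apache.spark.sql."
    if (!className.startsWith(prefix)) true
    else keptSubpackages.exists(p => className.startsWith(prefix + p + "."))
  }

  def main(args: Array[String]): Unit = {
    println(isKept("org.apache.spark.rdd.RDD"))               // outside sql: kept
    println(isKept("org.apache.spark.sql.types.IntegerType")) // re-kept subpackage
    println(isKept("org.apache.spark.sql.Dataset"))           // obfuscated
  }
}
```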

proguardInputs

Set all dependency jar files as input files.

Proguard / proguardInputs := (Compile / dependencyClasspath).value.files

You can check them with the following command in the sbt shell.

show Compile / dependencyClasspath

proguardFilteredInputs

Set target/scala-2.11/test_2.11-1.0.jar (the jar containing only the Main class) as an input file that bypasses proguardInputFilter.

Proguard / proguardFilteredInputs ++= ProguardOptions.noFilter((Compile / packageBin).value)

You can check it with the following command in the sbt shell.

show Compile / packageBin

proguardInputFilter

You can exclude files from the output jar file with proguardInputFilter.

// Set proguardInputFilter
Proguard / proguardInputFilter := { file =>
  file.name match {
    case _ => Some("!META-INF/MANIFEST.MF,!META-INF/DUMMY.DSA,!META-INF/DUMMY.SF,!com/google/protobuf25/**,!org/apache/spark/unused/UnusedStubClass.class,!org/apache/orc/storage/**,!javax/inject/**,!org/apache/hadoop/yarn/factories/package-info.class,!org/apache/hadoop/yarn/factory/providers/package-info.class,!org/apache/hadoop/yarn/util/package-info.class, !org/aopalliance/aop/**, !com/sun/activation/registries/**, !org/apache/hadoop/yarn/client/api/impl/package-info.class, !org/apache/hadoop/yarn/client/api/package-info.class, !org/aopalliance/intercept/**, !org/apache/commons/collections/FastHashMap.class, !org/apache/commons/collections/FastHashMap$Values.class, !org/apache/commons/collections/FastHashMap$CollectionView$CollectionViewIterator.class, !org/apache/commons/collections/FastHashMap$1.class, !org/apache/commons/collections/Buffer.class, !org/apache/commons/collections/BufferUnderflowException.class, !org/apache/commons/collections/FastHashMap$KeySet.class, !org/apache/commons/collections/FastHashMap$CollectionView.class, !org/apache/commons/collections/FastHashMap$EntrySet.class, !org/apache/commons/collections/ArrayStack.class")
  }
}
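The filter can also vary per input jar instead of applying one exclusion list to everything. A hedged sketch — the hadoop- prefix match and the excluded entries are assumptions for illustration:

```scala
// In build.sbt: filter only certain jars, pass the rest through unfiltered.
Proguard / proguardInputFilter := { file =>
  file.name match {
    // Hypothetical: strip signature files only from hadoop jars.
    case name if name.startsWith("hadoop") => Some("!META-INF/*.SF,!META-INF/*.DSA")
    // None means this jar is not filtered at all.
    case _ => None
  }
}
```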

Alternatively, you can use sbt-proguard's Merging feature to deduplicate conflicting entries across input jars instead of filtering them.
