1 Prepare dataset

The raw dataset contains 81 libs, 613 lib GAVs (groupId:artifactId:version coordinates), and 7986 clients, described by the following variables:

##  [1] "ID"                           "Lib_groupId"                 
##  [3] "Lib_artifactId"               "Lib_version"                 
##  [5] "Compile"                      "Debloat"                     
##  [7] "Lib_original_test_error"      "Lib_original_test_failing"   
##  [9] "Lib_original_test_passing"    "Lib_debloat_test_error"      
## [11] "Lib_debloat_test_failing"     "Lib_debloat_test_passing"    
## [13] "Size_original_jar"            "Size_debloat_jar"            
## [15] "Nb_classes_original"          "Nb_methods_original"         
## [17] "Nb_debloated_classes"         "Nb_debloated_methods"        
## [19] "Lib_coverage"                 "Client_groupId"              
## [21] "Client_artifactId"            "Client_Compile"              
## [23] "Client_Debloat"               "Client_original_test_error"  
## [25] "Client_original_test_failing" "Client_original_test_passing"
## [27] "Client_debloat_test_error"    "Client_debloat_test_failing" 
## [29] "Client_debloat_test_passing"  "Client_coverage"             
## [31] "Cover_lib"                    "Lib"                         
## [33] "Lib_gav"                      "Client"
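The three counts above can be reproduced by counting distinct values of the `Lib`, `Lib_gav`, and `Client` columns. The following is a minimal Python sketch, not the code used to produce this report. It assumes that `Lib` is `groupId:artifactId`, `Lib_gav` is `groupId:artifactId:version`, and `Client` is `Client_groupId:Client_artifactId`; the rows and the helper names `add_keys` and `count_distinct` are hypothetical.

```python
# Toy rows standing in for the raw dataset; all values are made up for illustration.
rows = [
    {"Lib_groupId": "com.alibaba", "Lib_artifactId": "fastjson", "Lib_version": "1.2.3",
     "Client_groupId": "org.example", "Client_artifactId": "app-a"},
    {"Lib_groupId": "com.alibaba", "Lib_artifactId": "fastjson", "Lib_version": "1.2.4",
     "Client_groupId": "org.example", "Client_artifactId": "app-b"},
    {"Lib_groupId": "joda-time", "Lib_artifactId": "joda-time", "Lib_version": "2.10",
     "Client_groupId": "org.example", "Client_artifactId": "app-a"},
]


def add_keys(row):
    """Return a copy of row with derived Lib, Lib_gav and Client keys.

    The key formats are an assumption about how these columns were built.
    """
    out = dict(row)
    out["Lib"] = f'{row["Lib_groupId"]}:{row["Lib_artifactId"]}'
    out["Lib_gav"] = f'{out["Lib"]}:{row["Lib_version"]}'
    out["Client"] = f'{row["Client_groupId"]}:{row["Client_artifactId"]}'
    return out


def count_distinct(rows):
    """Count distinct libs, lib GAVs and clients across all rows."""
    keyed = [add_keys(r) for r in rows]
    return {col: len({r[col] for r in keyed}) for col in ("Lib", "Lib_gav", "Client")}


summary = count_distinct(rows)
print(summary)
```

Each row pairs one lib GAV with one client, so the same lib or client can appear on many rows; counting through a set avoids counting it more than once.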

2 Overall lib debloat success

A total of 468 distinct lib GAVs were debloated successfully, whereas 144 failed.

For the rest of the analysis, we exclude the libs for which debloating failed. This results in a dataset with 72 distinct libs, 468 distinct lib GAVs, and 4351 distinct clients.
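The split and the filtering step can be sketched as follows. This is an illustrative Python example under stated assumptions, not the report's own code: it assumes the `Debloat` column holds a boolean success flag (its actual encoding is not shown here), and the rows and the helper names `split_by_debloat` and `keep_successful` are hypothetical.

```python
# Toy rows; values are made up. "Debloat" is assumed to be a boolean success flag.
rows = [
    {"Lib": "g:a", "Lib_gav": "g:a:1.0", "Client": "c:x", "Debloat": True},
    {"Lib": "g:a", "Lib_gav": "g:a:1.1", "Client": "c:y", "Debloat": False},
    {"Lib": "g:b", "Lib_gav": "g:b:2.0", "Client": "c:x", "Debloat": True},
    {"Lib": "g:c", "Lib_gav": "g:c:0.1", "Client": "c:z", "Debloat": False},
]


def split_by_debloat(rows):
    """Return the sets of lib GAVs whose debloat succeeded and failed."""
    ok = {r["Lib_gav"] for r in rows if r["Debloat"]}
    failed = {r["Lib_gav"] for r in rows if not r["Debloat"]}
    return ok, failed


def keep_successful(rows):
    """Keep only the rows whose lib GAV was debloated successfully."""
    return [r for r in rows if r["Debloat"]]


ok, failed = split_by_debloat(rows)
kept = keep_successful(rows)

# Recount distinct libs, lib GAVs and clients on the filtered rows.
remaining = {col: len({r[col] for r in kept}) for col in ("Lib", "Lib_gav", "Client")}
print(len(ok), len(failed), remaining)
```

Dropping the failed rows also drops any client that only depended on a failed lib GAV, which is why the client count shrinks along with the lib counts.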

3 Distribution of versions per library

The lib com.alibaba:fastjson has the largest number of distinct versions (35), followed by joda-time:joda-time (25) and redis.clients:jedis (23).
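A ranking like this one can be obtained by counting distinct lib GAVs per lib. The sketch below is a hedged Python illustration rather than the report's own code; the `(Lib, Lib_gav)` pairs are made up and the helper name `versions_per_lib` is hypothetical.

```python
from collections import Counter

# Toy (Lib, Lib_gav) pairs, one per dataset row; values are made up.
# The first pair is repeated because one lib GAV can have several clients.
pairs = [
    ("g:a", "g:a:1.0"),
    ("g:a", "g:a:1.0"),
    ("g:a", "g:a:1.1"),
    ("g:a", "g:a:1.2"),
    ("g:b", "g:b:2.0"),
    ("g:b", "g:b:2.1"),
    ("g:c", "g:c:0.1"),
]


def versions_per_lib(pairs):
    """Count distinct versions (lib GAVs) for each lib."""
    distinct = set(pairs)  # collapse rows that repeat the same lib GAV
    return Counter(lib for lib, _ in distinct)


# Libs sorted from most to fewest distinct versions.
ranking = versions_per_lib(pairs).most_common()
print(ranking)
```

Deduplicating the pairs first matters: without it, a lib GAV used by many clients would inflate the version count of its lib.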