Automatic calculation of process metrics and their bug prediction capabilities

Identifying fault-prone parts of the code helps developers reduce the time needed to locate bugs. This is usually done by characterizing already known bugs with certain kinds of metrics and building a predictive model from the data. For the characterization of bugs, software prod...

Bibliographic details
Author: Gyimesi Péter
Document type: Article
Published: 2017
Series: Acta cybernetica 23 No. 2
Keywords: Informatics, Cybernetics, Computing, Process metrics
Subjects:
doi:10.14232/actacyb.23.2.2017.7

Online Access: http://acta.bibl.u-szeged.hu/50087
LEADER 02877nab a2200229 i 4500
001 acta50087
005 20220620145055.0
008 180213s2017 hu o 0|| eng d
022 |a 0324-721X 
024 7 |a 10.14232/actacyb.23.2.2017.7  |2 doi 
040 |a SZTE Egyetemi Kiadványok Repozitórium  |b hun 
041 |a eng 
100 1 |a Gyimesi Péter 
245 1 0 |a Automatic calculation of process metrics and their bug prediction capabilities  |h [electronic document] /  |c  Gyimesi Péter 
260 |c 2017 
300 |a 537-559 
490 0 |a Acta cybernetica  |v 23 No. 2 
520 3 |a Identifying fault-prone parts of the code helps developers reduce the time needed to locate bugs. This is usually done by characterizing already known bugs with certain kinds of metrics and building a predictive model from the data. For the characterization of bugs, software product and process metrics are the most popular choices. The calculation of product metrics is supported by many free and commercial software products. However, tools capable of computing process metrics are quite rare. In this study, we present a method of computing software process metrics in a graph database. We describe the schema of the database we created and present a way to readily obtain the process metrics from it. With this technique, process metrics can be calculated at the file, class and method levels. We used GitHub as the source of the change history and selected 5 open-source Java projects for processing. To retrieve positional information about the classes and methods, we used SourceMeter, a static source code analyzer tool. We used Neo4j as the graph database engine and its query language, Cypher, to obtain the process metrics. We published the tools we created as open-source projects on GitHub. To demonstrate the utility of our tools, we selected 25 release versions of the 5 Java projects and calculated the process metrics for all of the source code elements (files, classes and methods) in these versions. Using our previously published bug database, we built bug databases for the selected projects that contain the computed process metrics and the corresponding bug numbers for files and classes. (We published these databases as an online appendix.) We then applied 13 machine learning algorithms to the database we created to find out whether it is suitable for bug prediction. On average, we achieved F-measure values of around 0.7 at the class level and slightly better values, between 0.7 and 0.75, at the file level. RandomForest was the best-performing algorithm in both cases. 
650 4 |a Natural sciences 
650 4 |a Computer and information sciences 
695 |a Informatics, Cybernetics, Computing, Process metrics 
856 4 0 |u http://acta.bibl.u-szeged.hu/50087/1/actacyb_23_2_2017_7.pdf  |z Document access