Meta-Learning based Recommender System to Recommend Developers for Crowdsourcing Software Development
This is the repository for our paper, which proposes a meta-learning based recommender system to recommend reliable developers for crowdsourcing software development (CSD).
The instructions below explain in detail how to use the source code in this project:
- Prepare the system environment
- Run the data crawler
- Construct the input data
- Train the meta models
- Run the baselines and the policy model for the experiments
Minimum machine configuration
- RAM: 256 GB
- CPU: 12 logical cores
- Disk: 1 TB+
- A TITAN Xp NVIDIA GPU is recommended for accelerating computation
- Make sure the bandwidth is at least 1000 Mb/s if the database is not on your development machine
Install the Python environment
The whole system is developed in Python, so we recommend installing an Anaconda (https://www.anaconda.com/) virtual environment with Python 3.6.
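For example, with Anaconda installed, an environment can be created and activated as follows (the environment name csd is our own placeholder):

```
# Create and activate a Python 3.6 virtual environment
conda create -n csd python=3.6
conda activate csd
```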
Install the MySQL database
Install a MySQL database on a Linux machine and configure its IP and port according to the instructions at https://www.mysql.com/.
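On a Debian/Ubuntu system, for example, the server can be installed through the package manager (package names differ on other distributions):

```
# Install the MySQL server (Debian/Ubuntu)
sudo apt-get update
sudo apt-get install mysql-server
```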
Install JDK 8 and the related Java runtime
The crawler program is implemented in Java. Please refer to the topcoder project at https://github.com/lifeloner/topcoder for the newest data crawler and prepare to import the related JAR libraries.
Required Python packages
- machine learning: scikit-learn, lightgbm, xgboost, tensorflow, keras, imbalanced-learn, networkx
- data preprocessing: pymysql, numpy, pandas
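All of these are available from PyPI and can be installed into the activated environment, for example (versions are not pinned here; releases compatible with Python 3.6 may be required):

```
# Install the required Python packages
pip install scikit-learn lightgbm xgboost tensorflow keras imbalanced-learn networkx pymysql numpy pandas
```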
Models required for the tool
Project Check
- The DIG is implemented in the CompetitionGraph package.
- The machine learning algorithms and the policy model are implemented in the ML_Models package.
- For challenge and developer feature encoding and the data preprocessing modules of the system, refer to the DataPre package.
- The Utility package contains personalized tag definitions, user functions, and testing scripts.
- Make sure that the hierarchy of the data folder on your local disk matches the one in this repository.
We maintain a database in our laboratory, but because of its size and its continuous updates it is not practical to publish it here. Instead, we provide the data collection tools so that everyone can gather as much data as they need. If you need our data, contact us via the anonymous email mail@{[email protected]}.
- Install and configure the MySQL database as described above.
- Download the newest Java data crawler from the topcoder project at https://github.com/lifeloner/topcoder.
- After downloading the Java crawler Maven project, use IntelliJ IDEA (https://www.jetbrains.com/idea/) to build and deploy the crawler JAR package on your machine.
- Configure the IP and port of your crawler to match the configuration of the MySQL database.
- Start the crawler with nohup so that it keeps running in the background, as shown below.
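nohup detaches the crawler from the terminal and, by default, appends its output to nohup.out:

```
# Start the crawler in the background and follow its log
nohup java -jar crawler.jar &
tail -f nohup.out
```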
Configure data/dbSetup.xml and set the IP and port to those of the machine running the MySQL database; then run data/viewdef.sql in your MySQL client to create the views for initial data cleaning.
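For example, the view definitions can be loaded with the mysql command-line client (host, port, user, and database name below are placeholders for your own settings):

```
# Create the data-cleaning views from the repository's SQL script
mysql -h 127.0.0.1 -P 3306 -u root -p your_database < data/viewdef.sql
```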
First, you need to encode the developer and challenge features (example commands follow this list):
- Run TaskContent.py in the DataPre package to generate the challenge feature encoding vectors and build the clustering model
- Run UserHistory.py in the DataPre package to generate the developer history data
- Run DIG.py in the CompetitionGraph package to generate the developer rank score data
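Run from the repository root, the invocations might look like this (the exact paths, and whether the scripts must instead be run as modules, are assumptions about the repository layout):

```
# Feature encoding and preprocessing
python DataPre/TaskContent.py        # challenge feature vectors + clustering model
python DataPre/UserHistory.py        # developer history data
python CompetitionGraph/DIG.py       # developer rank scores
```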
Run TaskUserInstances.py in the DataPre package to generate the input data (the full run matrix is summarized below):
- Adjust maxProcessNum in the DataInstances class to suit your computer's CPU and RAM
- For training, set the global variable testInst=False. The global variable mode selects the input data to generate: 0 for registration training data, 1 for submission training data, and 2 for winning training data. You have to run the script once for each of the three values.
- Generate the test input data by setting mode=2 and testInst=True
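Since mode and testInst are module-level globals, edit them in the script before each run; this yields four runs in total (the path is again an assumption):

```
# Four runs of TaskUserInstances.py with different global settings
python DataPre/TaskUserInstances.py   # mode=0, testInst=False (registration training data)
python DataPre/TaskUserInstances.py   # mode=1, testInst=False (submission training data)
python DataPre/TaskUserInstances.py   # mode=2, testInst=False (winning training data)
python DataPre/TaskUserInstances.py   # mode=2, testInst=True  (test data)
```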
After running all of the above scripts, check whether the generated training and test input data are complete by running TopcoderDataset.py.
Run XGBoostModel.py in the ML_Models package
- Feed "keepd" as the key of tasktypes and run the script three times, with mode=0, 1, and 2
- Feed "clustered" as the key of tasktypes and run the script three times, with mode=0, 1, and 2
- After this, the meta models implemented with the XGBoost algorithm can extract the registration, submission, and winning meta-features of all datasets
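That is six runs in total, for example (as before, the tasktypes key and mode are set inside the script before each run):

```
# Six runs of XGBoostModel.py: tasktypes key in {"keepd", "clustered"}, mode in {0, 1, 2}
python ML_Models/XGBoostModel.py   # key="keepd",     mode=0
python ML_Models/XGBoostModel.py   # key="keepd",     mode=1
python ML_Models/XGBoostModel.py   # key="keepd",     mode=2
python ML_Models/XGBoostModel.py   # key="clustered", mode=0
python ML_Models/XGBoostModel.py   # key="clustered", mode=1
python ML_Models/XGBoostModel.py   # key="clustered", mode=2
```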
Run DNNModel.py in the ML_Models package in the same way as XGBoostModel.py
Run EnsembleModel.py in the ML_Models package in the same way as XGBoostModel.py
Generate the performance results of all the winning meta models by running MetaModelTest.py in the ML_Models package
- Readers can build a winning predictor based on these performance results
Run BaselineModel.py in the ML_Models package to build the baseline models mentioned in the paper (a sketch of the test step follows this list)
- After building the baseline models, run MetaModelTest.py in the ML_Models package again, but pass the class names of the baseline models in BaselineModel.py as the model name, to generate their performance results
- Readers can also refer to MetaLearning.py in the ML_Models package, which implements some new learning processes that may not reach the global optimum
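As a sketch, this amounts to one re-run per baseline (how the model name is supplied, here as a setting inside the script, is an assumption):

```
# Re-run the test script once per baseline, setting the model name inside
# MetaModelTest.py to a class name defined in BaselineModel.py before each run
python ML_Models/MetaModelTest.py
```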
Citation
@INPROCEEDINGS{metalearning-recommender,
  author={Zhenyu Zhang and Hailong Sun and Hongyu Zhang},
  title={Developer Recommendation for Topcoder through a Meta-learning based Policy Model},
  year={2019},
  url={https://github.com/zhangzhenyu13/CSDMetalearningRS}
}