Course Project

Suggestions on project topics:

Hoberg-Phillips similarity

Hoberg and Phillips provide pairwise cosine similarity scores between firms, computed from the text of their product descriptions.
The similarity can be used as a kernel function between firms.
Use HP similarity as a proxy for stock return correlation and calculate portfolio weights.
Choose a suitable y variable (an open question) for Kernel SVM, or for Kernel PCA followed by other ML methods.
Improve HP similarity with the doc2vec method (quite a big project), or improve industry classification.
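
As a rough illustration of the correlation-proxy idea above, the sketch below treats a small similarity matrix as if it were a correlation matrix and computes minimum-variance weights, w ∝ Σ⁻¹1. Everything in it is an assumption made for illustration: the three-firm matrix `toy_similarity`, the volatilities `toy_vols`, and the helper names are hypothetical, and minimum variance is only one of several possible weighting schemes.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def min_variance_weights(similarity, vols):
    """Minimum-variance weights, using similarity in place of correlation."""
    n = len(vols)
    # Assumption: covariance_ij = similarity_ij * vol_i * vol_j
    cov = [[similarity[i][j] * vols[i] * vols[j] for j in range(n)]
           for i in range(n)]
    raw = solve_linear(cov, [1.0] * n)
    total = sum(raw)
    return [w / total for w in raw]


# Hypothetical toy inputs, NOT real Hoberg-Phillips data.
toy_similarity = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
toy_vols = [0.2, 0.2, 0.2]

weights = min_variance_weights(toy_similarity, toy_vols)
print([round(w, 4) for w in weights])
```

In this toy case the middle firm, which is similar to both of the others, receives essentially zero weight, while the two dissimilar firms split the portfolio. A real similarity matrix may need to be checked for positive definiteness before it is inverted.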

Sentiment analysis on central bank statements

Train the doc2vec (or word2vec) algorithm on the FOMC's (or PBoC's) statements or minutes to create vector representations.
Measure the distance between the statement and the minutes?
Fit ML models with the policy rate change or the market reaction as the y variable.
Identify the most important vector components (and the corresponding words/phrases) with feature selection/extraction.
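
To make the distance step above concrete, here is a minimal sketch that measures the cosine distance between two documents. It swaps in plain bag-of-words counts for the doc2vec vectors, so it only illustrates the distance computation, not the embedding itself. The two example texts are invented and are not real FOMC wording; the function names are hypothetical.

```python
import re
from collections import Counter
from math import sqrt


def bag_of_words(text):
    """Lower-case the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty document as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Invented example texts, standing in for a statement and its minutes.
statement = "The Committee decided to raise the target range."
minutes = ("Members agreed to raise the target range "
           "and discussed inflation risks.")

distance = cosine_distance(bag_of_words(statement), bag_of_words(minutes))
print(round(distance, 3))
```

With trained document vectors, the same `cosine_distance` idea applies to the dense vectors instead of the word counts.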

Team List (Presentation Order)

Group	Members	Repo
1	Chen Man, Ning Lei	XGBoost model for selecting important stock features and forming nonlinear factors
2	Li Xinsha	Predicting Chinese interest rate by machine learning approach
3	Li Panyu, Li Linxiong	Signal mining based on machine learning
4	Xie Zhonglin, Xu Xinyu	Predicting Rebar Futures Price with Low-mid Frequency Data
5	Zhang Wenchang, Yu Lei	Would Mr. Market sing FED's songs? -- Sentiment Analysis on the FOMC's documents
6	Jiang Yifan, Peng Feng	Discovery of investment opportunities in high-tech industries based on patent information
7	Qi Daifeng	Text-Based Firm Similarity Based on Edgar 10-K Report
8	Guo Xinran, Sun Bo	Quantitative Trading via Machine Learning in the Chinese Stock Market
9	Sihan Zhai, Hu Xueyang	Sentiment Analysis on FOMC Statements and Minutes
10	Wang Zijie, Ye Mengjie	Feature selection in stock prediction and portfolio management by Lasso and LightGBM

Team Formation (10. 24 Sun)

Form a group (up to 3 students) and select a data set.
Designate a repository GITHUB_ID/PHBS_MLF_2021 of one team member for the team project.
Let the TA know which repository will be used for the project.
Put team members' student # and GitHub ID in README.md (for the syntax of .md files, see the markdown cheatsheet).
README.md will be eventually the report of your course project.

Data Selection (10. 28 Thurs)

No restriction on the data set. However, business-related (fin/ma/econ) data is welcome (extra credit for creative data selection and pre-processing).
Put the data under the GITHUB_ID/PHBS_MLF_2021/data folder (if too big, put some samples).
Put a brief description of your data and the goal of the project in README.md (refer to the markdown cheatsheet).

Project Guideline

The report should consist of the summary in README.md and the execution in Python notebooks (.ipynb). (.pdf, .ppt, and .doc are NOT accepted.)
In the README.md summary,
- You may update your proposal file.
- Briefly describe your motivation, goal, data source, result and conclusion.
- A few figures or tables for the summary are recommended.
- Use links to data or .ipynb files (see past year examples below)
In the .ipynb execution,
- Balance code cells and markdown cells (comments). (Do not only put code!)
- Put a brief table of contents with links (example: PML)
- You may break down the code into several .ipynb files by function (e.g., data cleaning, learning, result analysis). In that case, make sure to save intermediate results to files so that I can run the later steps (result analysis) without running the previous steps (data cleaning, learning).
- The use of .py files should be strictly restricted to functions or classes only. (Do not put any learning procedure in .py files.)
- I should be able to reproduce the results from your code. Your code should run with no errors. Code with errors will severely reduce your score. Make sure to run your code in a new session.
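
One possible way to hand intermediate results from one notebook to the next is sketched below, using only the standard library. The file name `clean_prices.csv`, the `intermediate` folder, the column names, and the helper functions are all hypothetical; a temporary directory stands in for the repository's data folder so the example leaves no files behind.

```python
import csv
import tempfile
from pathlib import Path


def save_rows(path, fieldnames, rows):
    """Write a list of dict rows to a CSV file, creating parent folders."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


def load_rows(path):
    """Read a CSV file back into a list of dict rows (values are strings)."""
    with Path(path).open(newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:  # stand-in for the data folder
    out = Path(tmp) / "intermediate" / "clean_prices.csv"

    # End of the data-cleaning notebook: persist the cleaned rows.
    cleaned = [{"date": "2021-10-01", "close": 101.5}]
    save_rows(out, ["date", "close"], cleaned)

    # Start of the analysis notebook: load the file instead of re-cleaning.
    reloaded = load_rows(out)

print(reloaded)
```

Because CSV stores text, numeric columns come back as strings and need converting (for example with `float`) before analysis.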
Other considerations:
- Make sure the workload within the team is balanced. (Add your team members as collaborators, have each team member commit code, etc.)
- There should be no secret component (e.g., stock trading strategy)
- Creative (out-of-textbook) ideas are recommended for better results or result analysis.
The deadline for updating the report is 11.21 Sunday midnight (11:59 PM).
