Deepfakes Meets Markov Chains
Group Name: textov
Group members: Praveen Kalva (spkalva3), Anushri Mittal (anushri6), Akanksha Kumar (kumar65), Arul Viswanathan (arulv2)
Project intro: Our project uses Markov chains to generate sentences from a text dataset. textov will read user-provided text files and generate new, uniquely ordered text that resembles the word patterns in those files. Essentially, it is a deepfake generator for text. It will use parallelism to reduce runtime and sparse matrices to reduce memory usage.
Goals:
- Build a Markov chain map from the words in a text file
- Use that Markov chain to generate sentences
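A minimal sketch of these two goals, assuming Python, whitespace tokenization, and a first-order chain (each next word depends only on the current word). The names `build_chain` and `generate` are hypothetical and are not textov's actual API.

```python
import random
from collections import Counter, defaultdict


def build_chain(words):
    """Map each word to a Counter of the words observed right after it."""
    chain = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        chain[current][nxt] += 1
    return chain


def generate(chain, start, max_words=20, rng=random):
    """Walk the chain from `start`, picking each next word with
    probability proportional to how often it followed the current word."""
    sentence = [start]
    word = start
    while len(sentence) < max_words:
        followers = chain.get(word)  # .get avoids inserting empty entries
        if not followers:            # dead end: word never had a successor
            break
        word = rng.choices(list(followers), weights=list(followers.values()))[0]
        sentence.append(word)
    return " ".join(sentence)


words = "the cat sat on the mat and the cat ran".split()
chain = build_chain(words)
print(chain["the"])  # Counter({'cat': 2, 'mat': 1})
print(generate(chain, "the", rng=random.Random(0)))
```

Passing a seeded `random.Random` makes a run reproducible, which should help when testing the generator.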
Why we chose it:
- We find Markov chains interesting, and we thought applying them to text would make a fun project.
System Overview:
- Read from a text file
- Clean the data
- Build the Markov chain mapping from the cleaned data, stored as a sparse matrix of word-to-word transition counts
- Create a stochastic model that samples each next word with probability weighted by the sparse matrix entries, generating a sentence
- Repeat to generate more sentences
- Add parallelism/concurrency to speed up building the Markov chain
- Eventually, add higher-order Markov chains for different results (ranging from more random to more deterministic)
- If time permits, create a web app/UI
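One way the sparse-matrix and weighted-sampling steps could fit together, sketched in Python under our own assumptions. Rows are states of `order` consecutive words (so `order=2` gives a higher-order chain), columns are next words, and only observed transitions are stored in a hand-rolled CSR (compressed sparse row) layout. Each row keeps cumulative counts instead of raw counts, so sampling a next word is a single binary search. `SparseMarkov` and its methods are hypothetical names.

```python
import bisect
import random
from collections import Counter, defaultdict


class SparseMarkov:
    """Order-k Markov chain stored as a CSR-style sparse count matrix.

    Only nonzero transitions are stored, so memory grows with the number of
    observed transitions rather than with (number of states) x (vocabulary).
    """

    def __init__(self, words, order=1):
        self.order = order
        counts = defaultdict(Counter)
        for i in range(len(words) - order):
            counts[tuple(words[i:i + order])][words[i + order]] += 1

        self.vocab = sorted(set(words))           # column labels
        col = {w: j for j, w in enumerate(self.vocab)}
        self.row = {}                             # state tuple -> row index
        self.indptr = [0]                         # row i spans indptr[i]:indptr[i+1]
        self.indices = []                         # column index of each nonzero
        self.cumulative = []                      # running count total within a row
        for state, followers in counts.items():
            self.row[state] = len(self.indptr) - 1
            running = 0
            for word in sorted(followers):
                running += followers[word]
                self.indices.append(col[word])
                self.cumulative.append(running)
            self.indptr.append(len(self.indices))

    def next_word(self, state, rng=random):
        """Sample a next word with probability proportional to its count."""
        i = self.row.get(tuple(state))
        if i is None:
            return None                           # unseen state: dead end
        lo, hi = self.indptr[i], self.indptr[i + 1]
        r = rng.randrange(self.cumulative[hi - 1])  # uniform in [0, row total)
        k = bisect.bisect_right(self.cumulative, r, lo, hi)
        return self.vocab[self.indices[k]]

    def generate(self, start, max_words=20, rng=random):
        """Extend `start` (a list of `order` words) until a dead end or max_words."""
        out = list(start)
        while len(out) < max_words:
            word = self.next_word(out[-self.order:], rng)
            if word is None:
                break
            out.append(word)
        return " ".join(out)


words = "the cat sat on the mat and the cat ran".split()
print(SparseMarkov(words, order=1).generate(["the"], rng=random.Random(0)))
print(SparseMarkov(words, order=2).generate(["sat", "on"], rng=random.Random(0)))
```

SciPy's `scipy.sparse.csr_matrix` uses the same row-pointer/column-index layout and could replace the hand-rolled arrays. With `order=2`, most states in a small text have a single follower, which is the "more deterministic" end of the range described above.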
Possible Challenges:
- Handling errors when reading data and building the Markov chain
- Using parallelism without running into shared-memory and ownership errors
- Handling edge cases in data formatting and cleaning
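The cleaning and parallelism challenges can be sketched together, assuming Python and a regex-based tokenizer (both our own choices, not settled parts of textov). `clean` handles a few common edge cases (curly apostrophes, punctuation, empty input). `parallel_counts` avoids shared mutable state by giving each worker a read-only view of the tokens and a private `Counter`, then merging results on the calling thread. Chunks read a few tokens past their end so transitions that cross a chunk boundary are still counted exactly once. All function names here are hypothetical.

```python
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat

# A token is a run of letters/digits (Unicode-aware), optionally joined by
# internal apostrophes, so "don't" stays one word but "'quoted'" loses its quotes.
TOKEN = re.compile(r"[^\W_]+(?:'[^\W_]+)*")


def clean(text):
    """Normalize curly apostrophes, lowercase, and keep only word tokens.
    Empty or punctuation-only input yields an empty list instead of failing."""
    return TOKEN.findall(text.replace("\u2019", "'").lower())


def count_chunk(words, start, end, order):
    """Count (state, next_word) transitions whose state starts in [start, end).
    Reading up to `order` tokens past `end` means transitions that straddle a
    chunk boundary are counted once, by the chunk where they start."""
    counts = Counter()
    for i in range(start, min(end, len(words) - order)):
        counts[(tuple(words[i:i + order]), words[i + order])] += 1
    return counts


def parallel_counts(words, order=1, workers=4):
    """Split the token list into chunks, count each chunk in its own task,
    then merge. Workers only read `words` and each returns a private
    Counter, so no mutable state is shared between them."""
    step = max(1, -(-len(words) // workers))  # ceiling division
    starts = list(range(0, len(words), step))
    ends = [s + step for s in starts]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for part in pool.map(count_chunk, repeat(words), starts, ends, repeat(order)):
            total.update(part)  # merging happens on the calling thread only
    return total


words = clean("The cat sat on the mat, and the cat ran. The end.")
print(parallel_counts(words, order=1, workers=3).most_common(1))  # [((('the',), 'cat'), 2)]
```

In CPython, the global interpreter lock means threads here mostly demonstrate the ownership pattern rather than a speedup for this pure-Python loop. Because `count_chunk` is a module-level function, the same structure should also work with `ProcessPoolExecutor` for real CPU parallelism, at the cost of copying `words` to each worker.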
References:
- We drew inspiration from this article: https://chalkdustmagazine.com/features/fun-with-markov-chains/