My domain is Natural Language Processing (NLP).
From 2011 to 2023, I was a postdoctoral research fellow at Trinity College Dublin in the School of Computer Science and Statistics, working at the Adapt Research Centre. Most of my GitHub repositories are related to that research work. Now that I'm a consultant data scientist, my work is no longer public.
About my research activities:
- My professional page.
- My publications on my page, on Google Scholar, on ORCID, on Semantic Scholar, on the ACL Anthology, or on HAL.
- A fair number of repositories, usually experiments linked to one of my papers
About my non-research activities:
- I contribute to DataScience StackExchange.
- I like TiddlyWiki and I wish I could spend more time doing stuff with it.
- I also have a few tools and some other stuff.
A little-known but challenging subdomain of NLP.
- https://github.com/erwanm/CLGTextTools: Perl library containing functions to analyze text documents and especially extract text features.
- https://github.com/erwanm/clg-authorship-analytics: set of scripts and libraries to perform author-identification related tasks (Perl).
- https://github.com/erwanm/clg-authorship-experiments: a set of experiments with detailed documentation for clg-authorship-analytics (Perl + R).
- https://github.com/alfredomg/ADAPT-MWE17: participation in the VMWE17 Shared Task.
- https://github.com/erwanm/adapt-vmwe18: participation in the VMWE18 Shared Task.
See also related Shiny visualizations at https://brainmend.adaptcentre.ie/
- https://github.com/erwanm/tdc-tools: tools for representing and manipulating data in the Tabular Document-Concept (TDC) format. Used in my other LBD repos (Python).
- https://github.com/erwanm/medline-discoveries: a method for "mining impactful discoveries from the biomedical literature" (Python, R)
- https://github.com/erwanm/lbd-contrast: an experimental approach for LBD.
- https://github.com/erwanm/knowledgediscovery: modified fork to extract and apply LBD methods.
- https://github.com/erwanm/PowerGraph: dependency for the above
- https://github.com/erwanm/kd-data-tools: an ad-hoc concept disambiguation system for KD output (Medline and PMC).
- https://github.com/erwanm/elephant-wrapper: wrapper for the Elephant tokenizer, together with several experiments (LREC18 paper)
- https://github.com/erwanm/erw-ml-utils: ML-related scripts, especially for use with Weka
- https://github.com/erwanm/TreeTaggerWrapper: a convenient wrapper to use the venerable POS tagger.
- https://github.com/erwanm/quest: an abandoned fork of Quest (for MT Quality Estimation).
- https://github.com/erwanm/tw-aggregator: a system to automatically aggregate TiddlyWiki content from a collection of public wikis
- https://github.com/erwanm/TW-WhoAmIGame: a simple game meant to be customized with your own questions and answers.
- https://github.com/erwanm/tw-doc: basic in-house documentation generator which adds information extracted from code files to an existing TiddlyWiki file.
- https://github.com/erwanm/TiddlyWiki5: fork
- https://github.com/erwanm/Projectify: fork; I have not started working on it.
- https://github.com/erwanm/TW5-TimeTodo: another fork that I have not worked on.
- https://github.com/erwanm/encfs-util: scripts for linking EncFS (directory encryption) with pass (a command-line password manager) (Bash).
- https://github.com/erwanm/erw-setup: my config and a few scripts (Bash)
- https://github.com/erwanm/erw-tsv-commons: scripts to perform manipulations on TSV files (Perl)
- https://github.com/erwanm/erw-bash-commons: various useful Bash functions, including my "project management" system (Bash)
- https://github.com/erwanm/hugo-chalk: a modified template for Hugo (a framework for building a static website).
- https://github.com/erwanm/indie-coding: various pieces of code (currently about collecting open-license images) (Bash)
- https://github.com/erwanm/Poker-StatsSystem: old attempt at automatic statistics from poker hands, unfinished (Perl).
- https://github.com/erwanm/code-snippets: as the name suggests