Open Source Toolkit for Extraction of Cognates and False Friends (TECFF)

September 30, 2009

Today I released to the community, under the MIT license, the source code of the most interesting algorithms designed for my PhD thesis (implemented in C#):

MMEDR – algorithm for measuring weighted orthographic similarity between Bulgarian and Russian words, taking into account some linguistically motivated Bulgarian-Russian correspondences (currently supports Bulgarian and Russian only)
SemSim – algorithm for measuring semantic similarity between words by searching in Google and analyzing the returned text snippets (currently supports Bulgarian, Russian and English)
CrossSim – algorithm for measuring cross-lingual semantic similarity by searching in Google and analyzing the returned text snippets (currently supports Bulgarian and Russian only)
FFExtract – algorithm for extracting false friends from a parallel corpus: it determines candidates with the MMEDR algorithm and combines statistical and semantic evidence to distinguish between cognates and false friends (currently supports Bulgarian and Russian only)
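
To give a rough feel for what "weighted orthographic similarity" can mean, here is a minimal sketch in Python (the toolkit itself is in C#). It is a generic weighted edit distance normalised by the length of the longer word, not the actual MMEDR code. The substitution table, its cost values, and all function names are assumptions made up for illustration; the real correspondences and weights are in the TECFF sources.

```python
# Hypothetical substitution costs for Bulgarian-Russian letter pairs.
# These values are illustrative assumptions, NOT the weights used in TECFF.
SUBSTITUTION_COST = {
    ("ъ", "о"): 0.5,
    ("е", "э"): 0.5,
    ("щ", "ш"): 0.5,
}


def substitution_cost(a: str, b: str) -> float:
    """Cost of replacing letter a with letter b (symmetric lookup)."""
    if a == b:
        return 0.0
    # Unlisted pairs fall back to the usual unit cost.
    return SUBSTITUTION_COST.get((a, b), SUBSTITUTION_COST.get((b, a), 1.0))


def weighted_edit_distance(s: str, t: str) -> float:
    """Levenshtein distance with letter-dependent substitution costs."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,                          # delete a
                curr[j - 1] + 1.0,                      # insert b
                prev[j - 1] + substitution_cost(a, b),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def orthographic_similarity(s: str, t: str) -> float:
    """Map the distance to [0, 1]; 1.0 means the words are identical."""
    longest = max(len(s), len(t))
    if longest == 0:
        return 1.0
    return 1.0 - weighted_edit_distance(s, t) / longest
```

With these made-up weights, the Bulgarian "сън" and the Russian "сон" are at distance 0.5 instead of 1, so the pair scores higher than it would under plain edit distance. A candidate-selection step like the one in FFExtract could then keep only the word pairs whose similarity exceeds some threshold.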

The project is titled TECFF (Toolkit for Extraction of Cognates and False Friends) and is available for public download from http://code.google.com/p/cognates-and-false-friends-tools/.
