clear solution for fuzzy tasks
TextLib Software
fast text converting and processing components for MS Office and Adobe Acrobat documents
Fuzzy Search Technique versus Stemming

As mentioned, Textolution full-text search and retrieval tools are based on the fuzzy principle: fuzzy search is used as a more advanced alternative to text search and retrieval based on stemming. What is the difference between these two approaches?

During search and retrieval, a stemmer reduces the query to its word root (stem) and matches results containing that stem. For example, for the query 'specially' a stemming algorithm may find results such as "especially", "special", "specialize", "specializing", "specification" and others sharing the root "spec". However, if the query contains an accidental misspelling such as 'spesial' or 'spetial', a search engine based on a stemming algorithm will return zero results. Similarly, for the root "use" a stemmer will additionally match "user" and "useful", but not "using" or "usage".
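To make this failure mode concrete, here is a minimal sketch of stem-based search. The suffix list and the helpers `stem`, `build_index` and `stem_search` are hypothetical and deliberately tiny: this is not the Porter stemmer or any stemmer shipped with a real product, and it does not reproduce every match listed above. It only shows the mechanism: indexed words and the query are both reduced to stems and compared exactly, so a misspelled query yields a stem that matches nothing.

```python
from collections import defaultdict

# Hypothetical toy suffix list, checked in this order (longer suffixes first).
SUFFIXES = ["izing", "ize", "ing", "ful", "ly", "er", "s"]
MIN_STEM = 3  # never strip a word down to fewer than 3 letters


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least MIN_STEM letters."""
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= MIN_STEM:
            return w[: -len(suffix)]
    return w


def build_index(words):
    """Map each stem to the set of indexed words that reduce to it."""
    index = defaultdict(set)
    for word in words:
        index[stem(word)].add(word)
    return index


def stem_search(index, query):
    """Return indexed words whose stem equals the query's stem (exact match only)."""
    return sorted(index.get(stem(query), set()))


corpus = ["special", "specially", "specialize", "specializing", "spectrum"]
index = build_index(corpus)
print(stem_search(index, "specially"))  # ['special', 'specialize', 'specializing', 'specially']
print(stem_search(index, "spesial"))    # [] : the typo produces an unknown stem
```

Because matching happens only on identical stems, no amount of suffix tuning helps when the error sits inside the root itself.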

The fuzzy technique uses approximate full-text search and retrieval. This means it matches results for a search query despite differences in word form or the presence of spelling mistakes, no matter in which part of the word they occur. This way it will retrieve "special" whether your query is "spesial", "spetial" or "spizial". All related results are ranked by relevance and degree of similarity.
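The text does not say which similarity measure Textolution uses. As an assumption, the sketch below uses Levenshtein edit distance (the minimum number of single-character insertions, deletions and substitutions) as a stand-in. The names `levenshtein` and `fuzzy_search` and the `max_distance=2` threshold are hypothetical choices for illustration; a real engine would consult an index instead of scanning every word.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance; every insert/delete/substitute costs 1."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_search(query, words, max_distance=2):
    """Return (word, distance) pairs within max_distance, most similar first."""
    q = query.lower()
    hits = [(w, levenshtein(q, w.lower())) for w in words]
    return sorted((h for h in hits if h[1] <= max_distance),
                  key=lambda h: (h[1], h[0]))


corpus = ["special", "specially", "spectrum", "usage"]
for query in ("spesial", "spetial", "spizial"):
    print(query, fuzzy_search(query, corpus))
# spesial [('special', 1)]
# spetial [('special', 1)]
# spizial [('special', 2)]
```

Each misspelling is one or two edits away from "special", so all three queries retrieve it, and sorting by distance gives a simple similarity ranking.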

Another significant advantage of fuzzy search over stemming is that approximate matching can be applied to multilingual search, whereas stemming cannot handle texts in more than one language. Stemmers exist for most widely spoken languages, but each stemmer works with only one language, i.e. it cannot be applied to index text in another language. This is very inconvenient when working with texts containing citations, passages, remarks and other information in different languages. Moreover, full-text search, retrieval and indexing become impossible for a text written in a language for which no stemmer exists.

At the same time, creating your own stemmer takes a lot of time and investment and requires profound linguistic knowledge. Furthermore, it is usually very hard to fit a stemming algorithm to the specifics of a language so that it works correctly, because inflection principles differ greatly between languages, while most stemmers are based on reducing words to their root. In many languages inflection involves a change in the root itself (for example, "man" and "men" in English).

The language neutrality of the fuzzy technique makes it the best existing solution to this problem. Its full-text search, retrieval and indexing support different languages simultaneously. There is no need to adapt it to a specific language or alphabet: being based on Unicode, fuzzy solutions are linguistically universal. The only requirement is that the text is written from left to right.
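Because approximate matching compares characters rather than applying language rules, one routine can serve several scripts at once. The sketch below uses Python's standard `difflib.get_close_matches` (a Ratcliff/Obershelp-style similarity ratio, not necessarily the measure Textolution uses) after Unicode NFC normalization. The `fuzzy_lookup` helper, the sample vocabulary and the 0.8 cutoff are hypothetical.

```python
import difflib
import unicodedata


def normalize(text: str) -> str:
    """NFC-normalize and lowercase, so precomposed and decomposed letters compare equal."""
    return unicodedata.normalize("NFC", text).lower()


def fuzzy_lookup(query, vocabulary, cutoff=0.8):
    """Return vocabulary words similar to the query, best match first.

    No language-specific rules are used: the comparison works on Unicode
    code points, so Latin and Cyrillic words share one code path.
    """
    by_normal = {normalize(w): w for w in vocabulary}
    hits = difflib.get_close_matches(normalize(query), list(by_normal),
                                     n=5, cutoff=cutoff)
    return [by_normal[h] for h in hits]


vocabulary = ["special", "specially", "специальный", "Straße"]
for query in ("spetial", "спецальный", "strase"):
    print(query, "->", fuzzy_lookup(query, vocabulary))
# spetial -> ['special']
# спецальный -> ['специальный']
# strase -> ['Straße']
```

No per-language configuration is involved: the English, Russian and German misspellings each find their intended word. Normalization matters because the same letter (for example "й") can be stored either as one code point or as a base letter plus a combining mark.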

In conclusion, Textolution tools give you full-text search and retrieval that is very fast thanks to advanced indexing, and flexible and universal thanks to fuzzy approximate matching.