What are the valid types for Elasticsearch's word delimiter filter type table?

Elasticsearch's Word Delimiter filter has a very useful option, type_table, which lets you treat otherwise special characters as legitimate token characters.

It is, however, very sparsely documented:

type_table: A custom type mapping table, for example (when configured using type_table_path):

    # Map the $, %, '.', and ',' characters to DIGIT
    # This might be useful for financial data.
    $ => DIGIT
    % => DIGIT
    . => DIGIT
    \\u002C => DIGIT

    # in some cases you might not want to split on ZWJ
    # this also tests the case where we need a bigger byte[]
    # see http://en.wikipedia.org/wiki/Zero-width_joiner
    \\u200D => ALPHANUM

From that example, we can discern that DIGIT and ALPHANUM are valid options to which we can map characters. What other options are there, and what do they do?


I found the answer by digging into the Lucene documentation, which the Elasticsearch docs are essentially quoting.

The docs for WordDelimiterFilterFactory link to this file in the Subversion repository. The Elasticsearch docs quote it heavily, but it also contains this additional snippet:

    A customized type mapping for WordDelimiterFilterFactory
    the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
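
For illustration, here is a minimal Python sketch of how those types might be used when building index settings. It assumes type_table is given inline as a list of "char => TYPE" strings rather than through type_table_path. The names money_delimiter, money_analyzer and check_type_table are made up for this example, and the check only compares each target type against the list quoted from the Lucene file; it is not how Elasticsearch itself validates the table.

```python
import json

# Allowed target types, copied from the Lucene snippet quoted above.
ALLOWED_TYPES = {"LOWER", "UPPER", "ALPHA", "DIGIT", "ALPHANUM", "SUBWORD_DELIM"}


def check_type_table(mappings):
    """Return the mappings unchanged, or raise ValueError on a malformed entry.

    Hypothetical helper: it only checks the "char => TYPE" shape and that
    TYPE is one of the names in ALLOWED_TYPES.
    """
    for line in mappings:
        char, sep, type_name = line.partition("=>")
        if not sep or not char.strip() or type_name.strip() not in ALLOWED_TYPES:
            raise ValueError(f"bad type_table entry: {line!r}")
    return mappings


# "money_delimiter" and "money_analyzer" are illustrative names only.
settings = {
    "settings": {
        "analysis": {
            "filter": {
                "money_delimiter": {
                    "type": "word_delimiter",
                    # Treat $, % and . as digits so they stay inside numeric tokens.
                    "type_table": check_type_table(
                        ["$ => DIGIT", "% => DIGIT", ". => DIGIT"]
                    ),
                }
            },
            "analyzer": {
                "money_analyzer": {
                    "type": "custom",
                    "tokenizer": "whitespace",
                    "filter": ["money_delimiter"],
                }
            },
        }
    }
}

# Print the JSON body that could be sent when creating an index.
print(json.dumps(settings, indent=2))
```

A misspelled type such as "$ => NUMBER" would make check_type_table raise a ValueError before the settings are ever sent.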

