Elasticsearch's Word Delimiter filter has a very useful option, `type_table`: it lets you treat otherwise special characters as legitimate token characters.
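To make that concrete, here is a minimal sketch of an index settings fragment using `type_table` (the filter and analyzer names are hypothetical, and you would pass this to index creation). It maps `#` to ALPHANUM so that a token like "C#" is not split apart:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "my_word_delimiter": {
          "type": "word_delimiter",
          "type_table": ["# => ALPHANUM"]
        }
      },
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "whitespace",
          "filter": ["my_word_delimiter", "lowercase"]
        }
      }
    }
  }
}
```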
It is, however, very sparsely documented:
> **type_table**
> A custom type mapping table, for example (when configured using `type_table_path`):

```
# Map the $, %, '.', and ',' characters to DIGIT
# This might be useful for financial data.
$ => DIGIT
% => DIGIT
. => DIGIT
\u002C => DIGIT

# in some cases you might not want to split on ZWJ
# this also tests the case where we need a bigger byte
# see http://en.wikipedia.org/wiki/Zero-width_joiner
\u200D => ALPHANUM
```
From that example, we can discern that DIGIT and ALPHANUM are valid types to which we can map characters. But what other types are there, and what do they do?
I found the answer by digging down into the Lucene documentation, from which Elasticsearch is basically quoting.
The docs for WordDelimiterFilterFactory link to this file in the Subversion repository. It's heavily quoted by the Elasticsearch docs, but it contains this additional snippet:
```
A customized type mapping for WordDelimiterFilterFactory
the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
```
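Putting that together, a hedged sketch of a `type_table` that exercises several of these types (filter name hypothetical; the exact effect depends on the rest of your analyzer chain). SUBWORD_DELIM goes the opposite way from the earlier examples: it tells the filter to treat a character as a split point rather than as part of a token:

```json
{
  "type": "word_delimiter",
  "type_table": [
    "$ => DIGIT",
    "% => DIGIT",
    "# => ALPHANUM",
    "_ => SUBWORD_DELIM"
  ]
}
```

A quick way to check the effect of such a mapping is to run sample text through the `_analyze` API against an index that defines the filter and compare the emitted tokens with and without the `type_table` entries.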