What are the valid types for Elasticsearch's word delimiter filter type table?

Elasticsearch's Word Delimiter filter has a very useful option, type_table, which lets you treat otherwise special characters as legitimate token characters.

It is, however, very sparsely documented:

type_table
    A custom type mapping table, for example (when configured using type_table_path):

        # Map the $, %, '.', and ',' characters to DIGIT
        # This might be useful for financial data.
        $ => DIGIT
        % => DIGIT
        . => DIGIT
        \\u002C => DIGIT

        # in some cases you might not want to split on ZWJ
        # this also tests the case where we need a bigger byte[]
        # see http://en.wikipedia.org/wiki/Zero-width_joiner
        \\u200D => ALPHANUM

From that example, we can discern that DIGIT and ALPHANUM are valid options to which we can map characters. What other options are there, and what do they do?


I found the answer by digging into the Lucene documentation, which the Elasticsearch docs largely quote.

The docs for WordDelimiterFilterFactory link to this file in the Subversion repository. The Elasticsearch docs quote it heavily, but it also contains this additional snippet:

    # A customized type mapping for WordDelimiterFilterFactory
    # the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
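
As a rough illustration of how those types might be used, here is a minimal Python sketch that builds index settings with an inline type_table and rejects any type not in the list above. The filter name money_delimiter, the analyzer name money_analyzer, the whitespace tokenizer and the helper functions are all hypothetical choices made for this example, and it assumes your Elasticsearch version accepts type_table as an inline array of "char => TYPE" strings. It only builds and prints the JSON; it does not talk to a cluster.

```python
import json

# The six types named in the Lucene snippet quoted above.
ALLOWED_TYPES = {"LOWER", "UPPER", "ALPHA", "DIGIT", "ALPHANUM", "SUBWORD_DELIM"}


def build_type_table(mappings):
    """Turn {"$": "DIGIT"} into ["$ => DIGIT"], rejecting unknown types."""
    rules = []
    for char, char_type in mappings.items():
        if char_type not in ALLOWED_TYPES:
            raise ValueError(f"unknown type {char_type!r} for {char!r}")
        rules.append(f"{char} => {char_type}")
    return rules


def build_settings(mappings):
    """Build index settings around a word_delimiter filter.

    The filter and analyzer names are hypothetical, chosen for this sketch.
    """
    return {
        "settings": {
            "analysis": {
                "filter": {
                    "money_delimiter": {
                        "type": "word_delimiter",
                        # Assumption: type_table given inline as an array of rules.
                        "type_table": build_type_table(mappings),
                    }
                },
                "analyzer": {
                    "money_analyzer": {
                        "type": "custom",
                        "tokenizer": "whitespace",
                        "filter": ["money_delimiter"],
                    }
                },
            }
        }
    }


if __name__ == "__main__":
    # Treat $ and % as digits, as in the financial-data example above.
    print(json.dumps(build_settings({"$": "DIGIT", "%": "DIGIT"}), indent=2))
```

The printed JSON could then be used as the body of an index-creation request. Checking the type names up front gives a clear error for a misspelled type before anything is sent to the cluster.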

Category:elasticsearch Time:2017-12-06 Views:1


Copyright (C) pcaskme.com, All Rights Reserved.
