Always include letters, numbers and underscores as an Elasticsearch token
Various other regexes here capture tokens in different contexts, but at the very least we should also always greedily capture any series of letters, numbers and underscores. It's OK if another regex already covers this in some cases, since we de-duplicate tokens anyway. The test included in this change is an example of a token we don't capture correctly today; it is a common pattern in Ruby, so we should cover it.
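
As a rough illustration of the idea, here is a minimal plain-Ruby sketch that emulates the behaviour described above. It is not the actual analyzer configuration: `CONTEXT_PATTERNS`, `WORD_PATTERN` and `tokens_for` are hypothetical names, the single context pattern is a stand-in for the existing regexes, and the inputs are not the case from the included test.

```ruby
# Hypothetical stand-in for the existing context-specific regexes.
# Here: capture the name that follows a dot, e.g. "first_name" in "user.first_name".
CONTEXT_PATTERNS = [
  /\.([A-Za-z0-9_]+)/
].freeze

# The catch-all described in this change: a greedy run of
# letters, numbers and underscores.
WORD_PATTERN = /[A-Za-z0-9_]+/

# Collect the matches from every pattern, then de-duplicate, so overlap
# between the catch-all and a context pattern is harmless.
def tokens_for(text)
  patterns = CONTEXT_PATTERNS + [WORD_PATTERN]
  # String#scan returns nested arrays when the regex has capture groups,
  # so flatten before de-duplicating.
  patterns.flat_map { |pattern| text.scan(pattern).flatten }.uniq
end

p tokens_for('def my_method(arg_1)')
p tokens_for('user.first_name')
```

In the second call both patterns match `first_name`, but it appears only once in the result because of the final `uniq`.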