Want to collaborate?

Right now, you can get in touch with me for a few things:
Open Source Contributions
Live Streaming
Mentoring
+ more
Developer, team lead, and technical coach
2021
Jun 23, 2021
Great article on how one can explain technical debt using a "make a cup of tea" metaphor:

Explaining Technical Debt, with Tea | Scrum.org
Jun 20, 2021
Still struggling with tokens and filters and such. The end goal is to aggregate over a list of social media posts that we've gathered and produce a word (aka, tag) cloud. 

The struggle is ignoring words like "then" and "when" (yes, a stop filter will do that) and collapsing plurals and other inflected forms, such as "pumped" and "pumping", into a single term (that's the stemmer's job).
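Once the analysis chain is sorted, the aggregation side is comparatively simple: a terms aggregation bucketed on the analyzed tokens gives the word counts for the cloud. A sketch of that query, assuming a hypothetical `social_posts` index with the post text in a `content` field (both names are placeholders):

```json
GET /social_posts/_search
{
  "size": 0,
  "aggs": {
    "tag_cloud": {
      "terms": {
        "field": "content",
        "size": 50
      }
    }
  }
}
```

Note that a terms aggregation over an analyzed text field requires `"fielddata": true` in the mapping, which is memory-hungry; the bucket keys and doc counts then feed the tag cloud directly.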

I think I've got the right analysis chain for the index now; here it is run through the _analyze API:

GET /_analyze
{
  "tokenizer": "classic",
  "filter": [ 
    {
      "type": "keep_types",
      "types": [ "<EMOJI>", "<NUM>" ],
      "mode": "exclude"
    },
    {
      "type": "length",
      "min": 2
    },
    "kstem", "stop",
    "classic", 
    "asciifolding"
  ],
  "text": "açaí à la carte can't the foxes trumpet trumping trump trumps trump's milk milks milky jumping jumper quicker quickly 1234 attentive attention 🙏"
}

This produces the following tokens, which is just about right...

{
  "tokens" : [
    {
      "token" : "acai",
      "start_offset" : 0,
      "end_offset" : 4,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "la",
      "start_offset" : 7,
      "end_offset" : 9,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "carte",
      "start_offset" : 10,
      "end_offset" : 15,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "can't",
      "start_offset" : 16,
      "end_offset" : 21,
      "type" : "<APOSTROPHE>",
      "position" : 4
    },
    {
      "token" : "fox",
      "start_offset" : 26,
      "end_offset" : 31,
      "type" : "<ALPHANUM>",
      "position" : 6
    },
    {
      "token" : "trumpet",
      "start_offset" : 32,
      "end_offset" : 39,
      "type" : "<ALPHANUM>",
      "position" : 7
    },
    {
      "token" : "trump",
      "start_offset" : 40,
      "end_offset" : 48,
      "type" : "<ALPHANUM>",
      "position" : 8
    },
    {
      "token" : "trump",
      "start_offset" : 49,
      "end_offset" : 54,
      "type" : "<ALPHANUM>",
      "position" : 9
    },
    {
      "token" : "trumps",
      "start_offset" : 55,
      "end_offset" : 61,
      "type" : "<ALPHANUM>",
      "position" : 10
    },
    {
      "token" : "trump",
      "start_offset" : 62,
      "end_offset" : 69,
      "type" : "<APOSTROPHE>",
      "position" : 11
    },
    {
      "token" : "milk",
      "start_offset" : 70,
      "end_offset" : 74,
      "type" : "<ALPHANUM>",
      "position" : 12
    },
    {
      "token" : "milk",
      "start_offset" : 75,
      "end_offset" : 80,
      "type" : "<ALPHANUM>",
      "position" : 13
    },
    {
      "token" : "milky",
      "start_offset" : 81,
      "end_offset" : 86,
      "type" : "<ALPHANUM>",
      "position" : 14
    },
    {
      "token" : "jump",
      "start_offset" : 87,
      "end_offset" : 94,
      "type" : "<ALPHANUM>",
      "position" : 15
    },
    {
      "token" : "jumper",
      "start_offset" : 95,
      "end_offset" : 101,
      "type" : "<ALPHANUM>",
      "position" : 16
    },
    {
      "token" : "quick",
      "start_offset" : 102,
      "end_offset" : 109,
      "type" : "<ALPHANUM>",
      "position" : 17
    },
    {
      "token" : "quick",
      "start_offset" : 110,
      "end_offset" : 117,
      "type" : "<ALPHANUM>",
      "position" : 18
    },
    {
      "token" : "1234",
      "start_offset" : 118,
      "end_offset" : 122,
      "type" : "<ALPHANUM>",
      "position" : 19
    },
    {
      "token" : "attentive",
      "start_offset" : 123,
      "end_offset" : 132,
      "type" : "<ALPHANUM>",
      "position" : 20
    },
    {
      "token" : "attention",
      "start_offset" : 133,
      "end_offset" : 142,
      "type" : "<ALPHANUM>",
      "position" : 21
    }
  ]
}
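For the record, the same chain can be baked into the index settings as a custom analyzer, so it runs at index time rather than only in ad-hoc _analyze calls. A sketch mirroring the request above, with `posts` and the filter/analyzer names as made-up placeholders:

```json
PUT /posts
{
  "settings": {
    "analysis": {
      "filter": {
        "drop_emoji_and_numbers": {
          "type": "keep_types",
          "types": [ "<EMOJI>", "<NUM>" ],
          "mode": "exclude"
        },
        "min_length_2": {
          "type": "length",
          "min": 2
        }
      },
      "analyzer": {
        "tag_cloud_analyzer": {
          "type": "custom",
          "tokenizer": "classic",
          "filter": [
            "drop_emoji_and_numbers",
            "min_length_2",
            "kstem", "stop",
            "classic",
            "asciifolding"
          ]
        }
      }
    }
  }
}
```

The inline filter definitions from the _analyze call become named filters under `analysis.filter`, and the analyzer just references them in the same order.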
Jun 19, 2021
Learning about analyzers and filters in Elasticsearch. 

I continue to be amazed at how quickly one can search and even aggregate data in Elastic indexes.
2018
Jan 01, 2018
Excited to join Empact Development as Chief Developer! 🎉