Taming ElasticSearch
- limbo: my hacky prototype with sonic
- looker and sonic
- yes i really did name a search component after a bumbling incompetent detective and it's likely going to be load bearing when it comes to running things in Vercel.
- parsing the markdown
- both websites use different dialects of markdown
- I mostly wrote a parser that did most of what we needed, and even used it to convert some articles from one format to another
- this code was very annoying to write, and it was even worse to read.
- lust: the realization that we actually need elasticsearch sooner rather than later
- I was horrified by this, I have only ever had bad experiences with this program
- I was very lucky that one of my coworkers used to work at elastic, so he had some experience with these horrors
- apparently there was a feature that could allow us to not have to write that hacky code to index everything, automatic indexing with app search
- gluttony: setting up elasticsearch with elastic cloud
- this was actually pretty easy, it was pretty nice to have something be plug and play throughout this entire philosophical nightmare
- the go experience
- go and elastic search differ on how JSON works. a lot.
- go expects JSON types to be consistent, if something always has a value, go works really well
- there is no such luck with elastic search
- insert some example JSON
- insert example types
- attempts to use generics to make it simpler
- it failed
- manually written types with a lot of omitempty directives
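As a rough sketch of what those manually written types can look like: the field names below follow the usual shape of a search hit, but `Hit`, `Article`, and `decodeHit` are made-up names for illustration, not the real schema. The point is that anything elastic search may leave out or set to null gets a pointer or an omitempty tag.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Article is an assumed document type, not the real schema.
type Article struct {
	Title string `json:"title"`
	Body  string `json:"body,omitempty"`
}

// Hit is a hypothetical shape for one search hit. Fields that may be
// null or missing are modeled as pointers or tagged with omitempty.
type Hit struct {
	ID        string              `json:"_id"`
	Score     *float64            `json:"_score"`              // can be null, for example when sorting by a field
	Source    Article             `json:"_source"`
	Highlight map[string][]string `json:"highlight,omitempty"` // absent when nothing was highlighted
}

// decodeHit parses a single hit from raw JSON.
func decodeHit(data []byte) (Hit, error) {
	var h Hit
	err := json.Unmarshal(data, &h)
	return h, err
}

func main() {
	samples := [][]byte{
		[]byte(`{"_id":"1","_score":1.5,"_source":{"title":"NixOS"},"highlight":{"body":["<em>NixOS</em> rocks"]}}`),
		[]byte(`{"_id":"2","_score":null,"_source":{"title":"Install"}}`),
	}
	for _, raw := range samples {
		h, err := decodeHit(raw)
		if err != nil {
			panic(err)
		}
		fmt.Println(h.ID, "has score:", h.Score != nil, "highlight fields:", len(h.Highlight))
	}
}
```

Both samples decode into the same type, and the caller has to check `Score` and `Highlight` before using them.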
- greed: using appsearch to automatically index the site
- this actually worked pretty well at first
- then we found out it doesn't actually strip off headers and footers from HTML pages
- I would have expected it to use the readability algorithm or
something to automatically strip the stuff out, because it's
usually not relevant from a search perspective
- this made the word Tailscale return search results for every page, which is almost certainly not what we want
- this seemed to not be configurable, even though the UI seemed to have a lot of configuration options
- anger: frustration at the manually written types
- I found out that the elastic search client for go does have a
typed dialect, where it will automatically have all of the types
that elastic search uses to generate cromulent JSON.
- of course, it was not typeful enough for my needs
- It also added 500 go packages to the build, which added about 3
minutes of build time due to our container build tool running
go build -v in CI. This would quickly become untenable. Our side goal was to decrease build time, not increase it.
- my manager suggested using a go template as a joke
- to both of our amazement, it worked
- paste the template
- I'm not questioning it, but it does make it a bit easier to maintain for people that aren't the best at Go types to JSON types conversions, which is a nice bonus.
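This is not the template from the actual code, just a minimal sketch of the idea using text/template. The query shape, the `queryArgs` type, and the `json` helper function are all assumptions made for illustration. The helper matters: it runs user input through json.Marshal so a stray quote can't break the generated document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"text/template"
)

// queryTmpl is an illustrative query body. The "json" helper escapes
// values so that user input always lands in the output as a valid JSON string.
var queryTmpl = template.Must(template.New("query").Funcs(template.FuncMap{
	"json": func(v any) (string, error) {
		b, err := json.Marshal(v)
		return string(b), err
	},
}).Parse(`{
  "query": {
    "multi_match": {
      "query": {{ json .Query }},
      "fields": ["title^2", "body"]
    }
  },
  "highlight": {"fields": {"body": {}}},
  "size": {{ .Size }}
}`))

// queryArgs holds the values the template needs (hypothetical names).
type queryArgs struct {
	Query string
	Size  int
}

// renderQuery executes the template and verifies the result is valid JSON
// before it goes anywhere near the server.
func renderQuery(q string, size int) ([]byte, error) {
	var buf bytes.Buffer
	if err := queryTmpl.Execute(&buf, queryArgs{Query: q, Size: size}); err != nil {
		return nil, err
	}
	if !json.Valid(buf.Bytes()) {
		return nil, fmt.Errorf("template produced invalid JSON")
	}
	return buf.Bytes(), nil
}

func main() {
	body, err := renderQuery(`tailscale "exit node"`, 10)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(body))
}
```

The json.Valid check is cheap insurance: if someone edits the template and drops a comma, the failure happens locally instead of as a confusing server response.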
- heresy: writing my indexer
- I found out that the knowledge base had a side effect where it dumped
everything to a JSON file, this included a plain text version of
every knowledge base article. this was exactly what I needed
- generally you want to search plain text, not HTML. otherwise you can have people's searches matching HTML elements, which is almost certainly not what anyone wants
- I got things working, but then no documents showed up in the index
- the index itself wasn't showing up, even as a hidden index
- then I found out the true horror of the go elastic search library: it doesn't handle errors from elastic search for you.
- violence: adding manual response code checking everywhere
- I had to check the response code everywhere, and the documentation didn't tell me what response code was the correct response code from elastic search. so I had to guess a bit and then run my code a few times to make sure that the response code was correct.
- this is very frustrating, because if I'm using an API client, I
expect this to be handled for me. I can understand this if it's a
low level HTTP client like the one that the go standard library
ships, but if I'm using an API wrapper, I expect this to be
handled for me and errors to automatically be surfaced as errors
at the language level.
- I had to stop working for the rest of the day when I found this out
- code examples on how other apis work
- a lamentation that they're probably not going to be able to fix this because of API compatibility
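A minimal sketch of the kind of check that ends up sprinkled everywhere, written against plain net/http so it runs without the elastic search client. `checkStatus` is a hypothetical helper, and treating every 2xx code as success is an assumption on my part, not something the documentation spelled out.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// checkStatus turns any non-2xx status into a Go error that carries the
// status code and a bounded chunk of the response body for debugging.
func checkStatus(code int, body io.Reader) error {
	if code >= 200 && code < 300 {
		return nil
	}
	var detail []byte
	if body != nil {
		// Cap how much of the error body we keep around.
		detail, _ = io.ReadAll(io.LimitReader(body, 4096))
	}
	return fmt.Errorf("elasticsearch: unexpected status %d: %s", code, detail)
}

func main() {
	// Stand-in server that always answers with a 404, playing the part
	// of a cluster where the index does not exist.
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		http.Error(w, `{"error":"index_not_found_exception"}`, http.StatusNotFound)
	}))
	defer srv.Close()

	resp, err := http.Get(srv.URL)
	if err != nil {
		panic(err) // transport-level failure
	}
	defer resp.Body.Close()

	// err was nil above even though the request failed at the API level.
	if err := checkStatus(resp.StatusCode, resp.Body); err != nil {
		fmt.Println("caught:", err)
	}
}
```

The transport error and the API error are two separate checks, and forgetting the second one is exactly how you end up with an indexer that "works" and an index that doesn't exist.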
- fraud: the input format for everything
- when passing input to the server, you are expected to give it byte
slices. this pattern usually creates "type holes", which are
places where non determinism can get in at the type system, and
make things unsafe. this was most annoying when I was trying to
index documents, where it said I could just pass in a byte slice,
but it didn't say what type of data was in the byte slice.
- I'm sure this pattern makes sense in java, but go is not java. no matter how much the Kubernetes team wants to believe otherwise.
- this is another thing that I expected the API client to handle for me, or at least the documentation to be more explicit about what format you should use here. yes, from context I know that it's supposed to be JSON, but I wanted to be absolutely sure with confirmation.
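One way to plug the type hole on the calling side is to make a single typed function the only thing that ever produces those bytes. This is a sketch under assumptions: `Doc` and `encodeDoc` are hypothetical names, and it takes for granted that the index call wants a JSON document.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Doc is a hypothetical document type for the search index.
type Doc struct {
	URL   string `json:"url"`
	Title string `json:"title"`
	Body  string `json:"body"`
}

// encodeDoc is the one place that turns a typed document into raw bytes.
// Everything upstream of it stays type checked.
func encodeDoc(d Doc) (*bytes.Reader, error) {
	b, err := json.Marshal(d)
	if err != nil {
		return nil, fmt.Errorf("encoding %s: %w", d.URL, err)
	}
	return bytes.NewReader(b), nil
}

func main() {
	r, err := encodeDoc(Doc{URL: "/kb/1", Title: "Install", Body: "plain text goes here"})
	if err != nil {
		panic(err)
	}
	fmt.Println("bytes ready to send:", r.Len())
}
```

A *bytes.Reader satisfies io.Reader, so the result can be handed to anything that accepts a request body.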
- treachery: authentication
- Elastic search allows for a lot of options when it comes to authentication. Almost none of them work.
- the credentials that are used in the cloud panel for creating your elastic search server are not used to authenticate to your elastic search server. this goes contrary to what the documentation says.
- elastic search has a method for using API keys for authentication instead of user names and passwords. this also does not work. the documentation is vague as to how to use an API key, and all of my tests have proven that API keys do not work with the go library.
- the only thing that I got working was to use a user name and
password like some kind of caveman. but at this point, I was happy
that anything worked. this entire process has been fighting uphill
to a level that would be absurd for most people to attempt to
accomplish. I actually started doing this on the knowledge base
because the back end was written in go, and I didn't want to be
fighting my knowledge of typescript at the same time that I was
fighting my knowledge of elastic search. I am not sure if this was
a mistake or not.
- I am told that the typescript client for elastic search is a lot easier to deal with than the go client for elastic search. this makes sense, typescript has a much more flexible type system that allows you to get away with a lot of things that would be impossible in go.
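For reference, here is a sketch of what the two schemes look like at the HTTP level. As far as I know, the API key form is the word ApiKey followed by the base64 of the key id and secret joined with a colon. The helper names and credentials are made up, and none of this explains why the go library misbehaved for me.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// basicAuthHeader builds the value of an Authorization header for
// plain user name and password authentication.
func basicAuthHeader(user, password string) string {
	return "Basic " + base64.StdEncoding.EncodeToString([]byte(user+":"+password))
}

// apiKeyHeader builds the value of an Authorization header for API key
// authentication, assuming the "id:api_key" base64 form.
func apiKeyHeader(id, apiKey string) string {
	return "ApiKey " + base64.StdEncoding.EncodeToString([]byte(id+":"+apiKey))
}

func main() {
	// Placeholder credentials for illustration only.
	fmt.Println("Authorization:", basicAuthHeader("elastic", "hunter2"))
	fmt.Println("Authorization:", apiKeyHeader("key-id", "key-secret"))
}
```

Building the header by hand and sending it with curl is a decent way to tell apart a bad credential from a client library problem.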
- the light at the end of the tunnel: everything working
- at this point I was writing everything very very defensively, but then everything started to work. it was like the sky parted and I felt light rain down on me from above. I have felt a level of peace that I have not felt very many times in my career. and then it was shattered by things not working.
- and then I got things working again. that made me feel a bit better.
- blasphemy: descent into madness again
- indexer fails in prod, github actions debugging
- flattery: random panics from looking up search results, replicating
the panic locally because we don't have logs in www prod
- Turns out that adding highlights to search results means sometimes
you can get matches that don't have a highlighted region.
- Aoi: Ohhhh, is this why array index lookups in Rust return an Option<T> instead of a normal value?
- Cadey (coffee): Yes. Yes it is.
- I got lucky because "NixOS" was one of the search terms that triggered this condition.
- I ended up having to fix this by having it return the first 150 characters of an article instead of a highlighted segment of the article. It was kinda ugly but it worked!
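The shape of that fix, sketched with a hypothetical `snippet` helper (the real code surely looks different): never index into the highlight list without checking its length, and fall back to the start of the article when it is empty.

```go
package main

import "fmt"

// snippet returns the first highlighted fragment if there is one, and
// otherwise falls back to the first maxRunes runes of the article body.
func snippet(highlights []string, body string, maxRunes int) string {
	if len(highlights) > 0 {
		return highlights[0]
	}
	// Slice by runes, not bytes, so multi-byte characters are never cut in half.
	r := []rune(body)
	if len(r) > maxRunes {
		return string(r[:maxRunes])
	}
	return body
}

func main() {
	// A match with a highlighted region.
	fmt.Println(snippet([]string{"install <em>NixOS</em> on a server"}, "full article text", 150))
	// A match without one: this is the case that used to panic.
	fmt.Println(snippet(nil, "full article text", 150))
}
```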
- sorcery: Things work locally but not in CI
- Temporary folder behavior differs somehow
- hypocrisy:
- thievery:
- discord:
- falsification: