rolisz's site

Searching for something?

This post goes out "with dedication for Ciprian from Bistrita", who has been asking for a search feature for some time now.

I haven't had search on my blog for quite some time because it's a static website, without any dynamic backend (except for comments, but those are well isolated on a subdomain). But JavaScript and browsers are getting more features every day, so it's now possible to do all of this client-side. Just go to the search page (also linked in the menu).

I had three options:

The total size of my posts is about 1.5 MB. If I build the inverted index offline, it comes to about 9 MB. Because I didn't want to send that over the network every time the search page loads, I decided to send only a JSON document containing all the posts and build the index on the client side. This was quite simple:

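import io
import json

# View and joinurl come from the blog engine's own API (imports not shown).
class LunrSearch(View):
    def generate(self, conf, env, request):
        # Do nothing when search is disabled in the configuration.
        # A plain return ends the generator; raising StopIteration inside
        # a generator is a RuntimeError since Python 3.7 (PEP 479).
        if not env.options.search:
            return
        docs = []
        for entry in request['entrylist']:
            docs.append({"url": entry.permalink,
                         "date": entry.date.isoformat(' '),
                         "tags": entry.tags,
                         "title": entry.title,
                         "content": entry.content})
        # Write all posts as one JSON array to this view's path in the output dir
        yield (io.StringIO(json.dumps(docs, ensure_ascii=False)),
               joinurl(conf['output_dir'], self.path))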

This view is bound to a route in my configuration and writes all the posts there, in JSON format.
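On the client, this JSON is turned into an inverted index: a map from each term to the documents, and the fields within them, where it occurs. lunr.js builds and scores a much richer structure, but a minimal sketch of the idea could look like the following. The function names and the naive tokenizer (lowercase, split on anything that is not a letter or digit) are assumptions for illustration, not lunr's API.

```javascript
// Toy inverted index: term -> Map(docId -> { field: occurrences }).
// Not lunr.js's data structure; lunr also stems words and drops stop words.

// Naive tokenizer: lowercase, split on anything that is not a Unicode letter or digit.
function tokenize(text) {
  return String(text).toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

function buildInvertedIndex(docs, fields) {
  var index = new Map(); // a Map avoids clashes with terms like "constructor"
  docs.forEach(function (doc, id) {
    // id is the position in docs, the same kind of reference the post uses
    fields.forEach(function (field) {
      var value = doc[field];
      // tags arrive as an array in the JSON dump; join them so they tokenize like text
      var text = Array.isArray(value) ? value.join(" ") : (value || "");
      tokenize(text).forEach(function (term) {
        if (!index.has(term)) {
          index.set(term, new Map());
        }
        var postings = index.get(term);
        var perDoc = postings.get(id) || {};
        perDoc[field] = (perDoc[field] || 0) + 1;
        postings.set(id, perDoc);
      });
    });
  });
  return index;
}
```

Every distinct term keeps a posting for each document it appears in, which gives some intuition for why a prebuilt, serialized index can end up several times larger than the posts themselves.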

The client side is more interesting.

<input id="search"/>
<ul id='results'></ul>

<script src="/static/js/lunr.min.js"></script>
<script>
'use strict';
var index, docs, tasks;
var input = document.getElementById('search'); 
var resultdiv = document.getElementById('results');

We start by defining an input where we can type and a list where the results will be shown.

document.addEventListener("DOMContentLoaded", function() {
    // Download the JSON dump of all posts
    var xhr = new XMLHttpRequest();
    xhr.onreadystatechange = function() {
        // Only act once the request has completed
        if (xhr.readyState === XMLHttpRequest.DONE) {
            if (xhr.status === 200) {
                parseResults(JSON.parse(xhr.responseText));
            } else {
                resultdiv.innerHTML = "<li>Error loading search index! " +
                    "Please tell me about this! :(</li>";
            }
        }
    };
    xhr.open("GET", '/static/js/search.json', true);
    xhr.send();
});
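For comparison, here is roughly what the same request looks like with the fetch API. It is a minimal sketch: loadSearchDocs and its optional fetchImpl parameter are hypothetical names, and the parameter exists only so the function can be exercised without a network; in the browser it is left out and the global fetch is used.

```javascript
// Load the JSON dump of all posts with fetch instead of XMLHttpRequest.
// fetchImpl is a hypothetical test hook; when omitted, the global fetch is used.
function loadSearchDocs(url, fetchImpl) {
  var doFetch = fetchImpl || fetch;
  return doFetch(url).then(function (response) {
    // fetch resolves even for 404/500 responses, so check the status ourselves
    if (!response.ok) {
      throw new Error("Error loading search index (HTTP " + response.status + ")");
    }
    return response.json(); // parse the body as JSON
  });
}
```

In the page, the DOMContentLoaded handler would then call loadSearchDocs('/static/js/search.json').then(parseResults) and add a .catch that writes the error message into resultdiv. Because fetch only rejects on network failures, the explicit status check is what catches a missing file.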

Because I don't use jQuery, I had to load the JSON file with good ol' XMLHttpRequest (I can't wait for the fetch API to become more mainstream!). When the file arrives, I call a function to parse the results, or show an error if the request failed. That function doesn't do much: it initializes the index, schedules the indexing, and adds a listener on the input field to process what the user types. The index is initialized with the fields we want to search on and with the field that serves as each document's reference. We boost the importance of the title and tags fields.

function parseResults(response) {
    docs = response;
    tasks = response.slice(); // Copy the document list to use as a work queue
    // Create index
    index = lunr(function() {
        this.field('content');
        this.field('url');
        // Boost increases the importance of words found in this field.
        // lunr expects the boost in an options object; a bare number is ignored.
        this.field('title', {boost: 5});
        this.field('tags', {boost: 10});
        this.field('date');
        // Documents are referenced by their position in docs
        this.ref('id');
    });
    // Schedule background indexing
    scheduleIndexing();
    // Add search handler
    input.addEventListener("input", search);
}
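To get an intuition for what a boost does, here is a toy scorer in which a match in a field counts as much as boost matches in an unboosted field. It is a simplification under assumed weights that mirror the configuration above (title 5, tags 10, content 1); FIELD_BOOSTS, boostedScore and rank are hypothetical names, and lunr.js's real scoring also weighs how rare a term is across all documents, so its absolute scores will differ.

```javascript
// Toy field-boosted scoring: count each query term in each field, multiply by
// that field's boost, and sum. Not lunr's algorithm; illustration only.
var FIELD_BOOSTS = { title: 5, tags: 10, content: 1 };

// Naive tokenizer: lowercase, split on anything that is not a Unicode letter or digit.
function tokenize(text) {
  return String(text).toLowerCase().split(/[^\p{L}\p{N}]+/u).filter(Boolean);
}

function boostedScore(doc, query, boosts) {
  var terms = tokenize(query);
  var score = 0;
  Object.keys(boosts).forEach(function (field) {
    var value = doc[field];
    var text = Array.isArray(value) ? value.join(" ") : (value || "");
    var tokens = tokenize(text);
    terms.forEach(function (term) {
      var count = tokens.filter(function (t) { return t === term; }).length;
      score += count * boosts[field];
    });
  });
  return score;
}

// Score every document, drop non-matches, and sort best first.
function rank(docs, query, boosts) {
  return docs
    .map(function (doc, ref) { return { ref: ref, score: boostedScore(doc, query, boosts) }; })
    .filter(function (r) { return r.score > 0; })
    .sort(function (a, b) { return b.score - a.score; });
}
```

With these weights, a single tag match counts as much as two title matches or ten matches in the body text.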

Indexing takes about 2-4 seconds, so doing it all at once here would block the UI thread and make the page janky. Instead, we use the shiny new requestIdleCallback API, which lets us run the work in small chunks whenever the main thread is idle. Because this API is also not well supported yet (cough Safari cough), I fall back to doing all the indexing in one go on the main thread, by mocking the API. To add a document to the index, you just call the add function with an object representing the document.

function scheduleIndexing() {
    if ('requestIdleCallback' in window) {
        requestIdleCallback(indexInBackground);
    } else { // Mock the API and do the indexing in the main thread
        indexInBackground({timeRemaining: function() { return 1}});
    }
    function indexInBackground(deadline) {
        // Index documents while the browser says there is idle time left
        while (deadline.timeRemaining() > 0 && tasks.length > 0) {
            var entry = tasks.pop();
            index.add({
                url: entry.url,
                date: entry.date,
                title: entry.title,
                content: entry.content,
                tags: entry.tags,
                // After pop(), tasks.length is this entry's position in docs,
                // so docs[ref] finds it again when showing results
                id: tasks.length
            });
        }
        // Schedule further tasks if necessary
        if (tasks.length > 0) {
            requestIdleCallback(indexInBackground);
        } else if (input.value.trim() !== '') {
            // Indexing is done: rerun the search for anything typed meanwhile
            search();
        }
    }
}

The requestIdleCallback API takes a function that receives a deadline object, which tells you how much time you have left. You are supposed to return before that time runs out, so the API only suits tasks that can be split into small chunks. Indexing is a perfect example: indexing one document takes very little time, on the order of 5 ms, and when we detect that we ran out of time, we stop and request another time slot. When the browser "takes a break" again, it will call us back. For more details on the API, read this post. We keep doing this as long as there are tasks left. When indexing is finished, we check whether the user has already typed anything in the search box and trigger a search if that's the case.
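The same pattern works for any queue of small jobs. Below is a minimal, environment-agnostic sketch of it; processInIdleTime and pickScheduler are hypothetical names, and the scheduler is passed in as a parameter (an assumption for illustration) so the loop can be driven by requestIdleCallback in a browser or by a fake in a test.

```javascript
// Run work(task, position) for each task, but only while the current idle period lasts.
// `schedule` must behave like requestIdleCallback: it takes a callback and later calls it
// with a deadline object exposing timeRemaining().
function processInIdleTime(tasks, work, schedule, onDone) {
  var next = 0; // position of the next task, also passed to work as a stable id
  function step(deadline) {
    while (deadline.timeRemaining() > 0 && next < tasks.length) {
      work(tasks[next], next);
      next++;
    }
    if (next < tasks.length) {
      schedule(step); // out of idle time: ask for another slot
    } else if (onDone) {
      onDone(); // e.g. rerun the search for a query typed during indexing
    }
  }
  schedule(step);
}

// Use requestIdleCallback when it exists; otherwise mimic the post's fallback, a fake
// deadline that never runs out, so everything runs at once on the main thread.
function pickScheduler() {
  if (typeof window !== "undefined" && "requestIdleCallback" in window) {
    return function (cb) { window.requestIdleCallback(cb); };
  }
  return function (cb) {
    cb({ timeRemaining: function () { return 1; } });
  };
}
```

Passing the position to work plays the same role as id: tasks.length in the post: right after tasks.pop(), tasks.length is exactly the popped entry's position in docs, so docs[ref] finds it again.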

function search() {
    var query = input.value;
    if (query.trim().length >= 3) {
        var result = index.search(query); // results come back sorted by score
        // Output it
        if (result.length === 0) {
            resultdiv.innerHTML = "<li>No results found! :(</li>";
        } else {
            resultdiv.innerHTML = '';
            // Show at most 30 results
            var count = Math.min(result.length, 30);
            for (var i = 0; i < count; i++) {
                var doc = docs[result[i].ref];
                var li = document.createElement("li");
                var a = document.createElement("a");
                a.href = doc.url;
                a.textContent = doc.title; // never parse titles as HTML
                li.appendChild(a);
                resultdiv.appendChild(li);
            }
        }
    } else {
        resultdiv.innerHTML = "<li>Query is too short.</li>";
    }
}

Searching is not too complicated. To avoid getting so many results that they stop being useful, we only search when the input holds at least three characters. We then loop over the results and add to the emptied list a link to each post's URL, with its title as the text. We also limit the list to 30 items. Pagination could be added, but I don't think it's that useful to look at the long tail: lunr.js returns results sorted by score, so cutting off like this keeps the best matches.
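Pulled out of the DOM code, the query-length check and the cap fit in a small pure function that is easy to test. A minimal sketch, assuming lunr-style results (objects with a ref field, already sorted by score) and the docs array loaded earlier; searchResultLinks, MIN_QUERY_LENGTH and MAX_RESULTS are hypothetical names.

```javascript
var MIN_QUERY_LENGTH = 3;
var MAX_RESULTS = 30;

// Turn a query into either a message or a list of links.
// searchFn stands in for index.search and must return [{ ref: ... }, ...] sorted by score.
function searchResultLinks(query, searchFn, docs) {
  var trimmed = query.trim();
  if (trimmed.length < MIN_QUERY_LENGTH) {
    return { message: "Query is too short." };
  }
  var results = searchFn(trimmed);
  if (results.length === 0) {
    return { message: "No results found! :(" };
  }
  // Results are already ordered by score, so the first MAX_RESULTS are the best ones
  return {
    links: results.slice(0, MAX_RESULTS).map(function (r) {
      var doc = docs[r.ref];
      return { href: doc.url, text: doc.title };
    })
  };
}
```

Rendering then means creating one a element per link and setting its textContent, so a title containing < or & is shown as text instead of being parsed as HTML.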

Some posts that have inspired me: