Skip to content

Naive String and Boyer Moore Javascript Implementation

Github link:


I need a string search algorithm in one of my frontend project. String.protoype.indexOf( ) is of course good. It also accept start index as a second parameter. I made a simple function with indexOf( ) that return array of indexes. While googling about this topic, I found some interesting algorithms, which naive string search and boyer moore. So, I decided to try implementing both out of curiosity.

Naive String Search

Being the simplest string matching algorithm, I can’t find any research paper who developed the algorithm. Probably this method is the default method every freshmen at CS major use when they’re asked to implement a string search algorithm.

The algorithm goes checking the full text string and searched pattern, from left to right. Two pointer is used. The first pointer used to store initial pattern index, sliding from left to right. While the second pointer is used to check whether the current pattern position, matching the text. Following image explain the algorithm.

Naive string search algorithm

After implementing the algorithm, I wanted to create a unit test for it. But, to make sure that my algorithm works perfectly, I need a ‘valid’ unit test, which then reminds me of JavaScript’s indexOf( ) function. If someone developed indexOf( ) function, meaning that it should have a good unit test 🙂

After did further googling, I stumbled on Google’s Chromium V8 engine repository here. Seems it uses a custom testing script called mjsunit. But since I’m more familiar with jest, so I had to modified the test case file for indexOf( ) into jest version. I uploaded my source code in my github here. I keep the original test case file header intact from the V8’s repository.

Boyer Moore Algorithm

Boyer Moore algorithm was developed by Robert S. Boyer and J Strother Moore in 1977. It was one of the most famous string search algorithm so far. A google scholar search landed me into a pdf file hosted somewhere in the internet. It pre-process the pattern, create two tables, then try to match the pattern from right to left. If it’s a match for all characters, then it will return the index, otherwise, it will slide the index to the right based on the preprocessed pattern table. The pre-process phase produces two tables, firstTable with pattern length size, and a secondTable with all possible characters size. Basically, both tables indicates how ‘far‘ the pattern should ‘jump‘ whenever it encounter a mismatch. I tried to visualize in the following image.

Boyer Moore string search algorithm

In my example here, the pattern checking start from the rightmost character of the pattern, sliding to left (starting from ‘e‘, then goes to ‘l‘, ‘o‘, etc.). But since ‘e’ is not a match with ‘d’, so it slides pattern to the Max value from two values (1) firstTable [pattern.length – 1 – numCharChecked] which equal to 1 and (2) secondTable with the char ‘d‘ which equal to 4. Math max (1, 4) returns 4, so the pattern slides 4 characters to the right. Then try matching the pattern again (from right to left). Because it found a perfect match, so it returns the current index.

Here is my boyer moore javascript implementation at my Github. Again, the algorithm’s unit test is taken from V8’s unit test.

While implementing Boyer Moore, I realized that this algorithm requires to create an array containing all possible alphabet. Such algorithm might be possible in 1977, but with modern character encoding standard, like Unicode, this won’t be enough. In my implementation, I use charCodeAt( ) JavaScript function that unfortunately only have maximum returned value of 65535 (from 2^16 – 1). Which is… very few. The latest Unicode 13.0.0 has a whooping possible 143,859 characters, while actually Unicode standard designed to have at max 1,111,998 characters ( 😮 ). So, that’s why there are many optimization for this search function. Such as the famous Boyer Moore Horspool, it works similar with Boyer Moore, but with much simpler code. Probably I will implement this one next time.


~ tt ~

Published inTech