Github link:

- https://github.com/tealtadpole/tt-boyer-moore-js
- https://github.com/tealtadpole/tt-naive-search-js

I need a string search algorithm in one of my frontend project. String.protoype.indexOf( ) is of course good. It also accept start index as a second parameter. I made a simple function with indexOf( ) that return array of indexes. While googling about this topic, I found some interesting algorithms, which naive string search and boyer moore. So, I decided to try implementing both out of curiosity.

**Naive String Search**

Being the simplest string matching algorithm, I can’t find any research paper who developed the algorithm. Probably this method is the default method every freshmen at CS major use when they’re asked to implement a string search algorithm.

The algorithm goes checking the full **text **string and searched **pattern**, from left to right. Two pointer is used. The first pointer used to store initial pattern index, sliding from left to right. While the second pointer is used to check whether the current pattern position, matching the text. Following image explain the algorithm.

After implementing the algorithm, I wanted to create a unit test for it. But, to make sure that my algorithm works perfectly, I need a ‘valid’ unit test, which then reminds me of JavaScript’s **indexOf( )** function. If someone developed **indexOf( )** function, meaning that it should have a good unit test ðŸ™‚

After did further googling, I stumbled on **Google’s Chromium V8 engine **repository **here**. Seems it uses a custom testing script called **mjsunit**. But since I’m more familiar with **jest**, so I had to modified the test case file for **indexOf( )** into **jest **version. I uploaded my source code in my **github** **here**. I keep the original test case file header intact from the V8’s repository.

**Boyer Moore Algorithm**

Boyer Moore algorithm was developed by Robert S. Boyer and J Strother Moore in 1977. It was one of the most famous string search algorithm so far. A google scholar search landed me into a pdf file hosted somewhere in the internet. It pre-process the pattern, create two tables, then try to match the pattern from **right** to **left**. If it’s a match for all characters, then it will return the index, otherwise, it will **slide the index to the right based on the preprocessed pattern table**. The pre-process phase produces two tables, **firstTable** with pattern length size, and a **secondTable** with **all possible characters **size. Basically, both tables indicates how ‘*far*‘ the pattern should ‘*jump*‘ whenever it encounter a mismatch. I tried to visualize in the following image.

In my example here, the pattern checking start from the **rightmost **character of the pattern, sliding to left (starting from ‘**e**‘, then goes to ‘**l**‘, ‘**o**‘, etc.). But since **‘e’** is not a match with **‘d’**, so it slides pattern to the Max value from two values (1) firstTable [pattern.length – 1 – numCharChecked] which equal to **1** and (2) secondTable with the char ‘**d**‘ which equal to **4**. Math max (**1**, **4**) returns **4**, so the pattern slides **4** characters to the right. Then try matching the pattern again (from right to left). Because it found a perfect match, so it returns the current index.

Here is my **boyer moore javascript **implementation at my **Github**. Again, the algorithm’s unit test is taken from **V8’s unit test**.

While implementing Boyer Moore, I realized that this algorithm requires to create an array containing all possible alphabet. Such algorithm might be possible in 1977, but with modern character encoding standard, like **Unicode**, this won’t be enough. In my implementation, I use **charCodeAt( )** JavaScript function that unfortunately only have maximum returned value of **65535 **(from 2^16 – 1). Which is… very few. The latest **Unicode 13.0.0** has a whooping possible 143,859 characters, while actually Unicode standard designed to have at max **1,111,998 **characters (* ðŸ˜®* ). So, that’s why there are many optimization for this search function. Such as the famous **Boyer Moore Horspool**, it works similar with Boyer Moore, but with much simpler code. Probably I will implement this one next time.

~ tt ~