I need a string search algorithm in one of my frontend project.is of course good. It also accept start index as a second parameter. I made a simple function with indexOf( ) that return array of indexes. While googling about this topic, I found some interesting algorithms, which naive string search and boyer moore. So, I decided to try implementing both out of curiosity.
Naive String Search
Being the simplest string matching algorithm, I can’t find any research paper who developed the algorithm. Probably this method is the default method every freshmen at CS major use when they’re asked to implement a string search algorithm.
The algorithm goes checking the full text string and searched pattern, from left to right. Two pointer is used. The first pointer used to store initial pattern index, sliding from left to right. While the second pointer is used to check whether the current pattern position, matching the text. Following image explain the algorithm.
After did further googling, I stumbled on Google’s Chromium V8 engine repository here. Seems it uses a custom testing script called mjsunit. But since I’m more familiar with jest, so I had to modified the test case file for indexOf( ) into jest version. I uploaded my source code in my github here. I keep the original test case file header intact from the V8’s repository.
Boyer Moore Algorithm
Boyer Moore algorithm was developed by Robert S. Boyer and J Strother Moore in 1977. It was one of the most famous string search algorithm so far. A google scholar search landed me into a pdf file hosted somewhere in the internet. It pre-process the pattern, create two tables, then try to match the pattern from right to left. If it’s a match for all characters, then it will return the index, otherwise, it will slide the index to the right based on the preprocessed pattern table. The pre-process phase produces two tables, firstTable with pattern length size, and a secondTable with all possible characters size. Basically, both tables indicates how ‘far‘ the pattern should ‘jump‘ whenever it encounter a mismatch. I tried to visualize in the following image.
In my example here, the pattern checking start from the rightmost character of the pattern, sliding to left (starting from ‘e‘, then goes to ‘l‘, ‘o‘, etc.). But since ‘e’ is not a match with ‘d’, so it slides pattern to the Max value from two values (1) firstTable [pattern.length – 1 – numCharChecked] which equal to 1 and (2) secondTable with the char ‘d‘ which equal to 4. Math max (1, 4) returns 4, so the pattern slides 4 characters to the right. Then try matching the pattern again (from right to left). Because it found a perfect match, so it returns the current index.
~ tt ~