Naive Mid-Sentence Processing
The first program I ever tried to create was an original process for making sense of a sentence of natural language as it was being spoken or written. With just a year of computer science classes under my belt and some experience tutoring the same material, this project was ambitious enough that I was probably doomed to fail. However, I would learn a lot in the process, not only about programming and working in a group but also about what kinds of projects I would like to work on going forward.
The idea for the project came from the OpenCog Foundation, who graciously offered to mentor me under the auspices of Google Summer of Code 2015. As they pointed out, algorithms already existed for producing syntactic parses of complete sentences, such as the Link Grammar Parser. What makes the Link Grammar Parser attractive to OpenCog, as one of its leaders, Dr. Ben Goertzel, once told me over Slack, is that the words have direct relations of dependency or hierarchy with one another, resulting in simpler parses (fewer nodes). In formal linguistic terminology, it is a syntax parser based on dependency grammar, rather than on the phrase structure grammar that most K-12 grammar textbooks and linguists like Noam Chomsky discuss, in which not only individual words but also groupings of words serving a more general purpose in the sentence may count as nodes in the parse.
To see it work, you can parse a complete sentence here. To understand what the relations in a given linkage mean, there is also a reference dictionary.
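To make the "fewer nodes" point concrete, here is a small sketch that counts nodes for the toy sentence "the dog barks" under both styles. Both parses are written by hand for illustration and are not output from the Link Grammar Parser or any other tool; the labels (S, NP, VP, Det, N, V) and the `count_nodes` helper are my own assumptions.

```ruby
# Hand-written toy parses of "the dog barks"; not produced by any parser.

# Dependency style: the only nodes are the words themselves.
# Each link is a [head, dependent] pair between two words.
words = %w[the dog barks]
dependency_links = [%w[barks dog], %w[dog the]]

# Phrase-structure style: nested [label, *children] arrays.
# Phrases (S, NP, VP) and word classes (Det, N, V) are nodes too.
phrase_tree = ["S",
               ["NP", ["Det", "the"], ["N", "dog"]],
               ["VP", ["V", "barks"]]]

# Count every node in a nested tree: each label and each word counts once.
def count_nodes(tree)
  return 1 if tree.is_a?(String)

  _label, *children = tree
  1 + children.sum { |child| count_nodes(child) }
end

puts "dependency nodes: #{words.length} (#{dependency_links.length} links)"
puts "phrase-structure nodes: #{count_nodes(phrase_tree)}"
```

For this sentence the dependency version has only the three words as nodes, while the phrase-structure tree also counts every phrase and word-class label.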
In the hopes of developing an algorithm to predict syntactic parses mid-sentence on the basis of semantics as a real human brain does, I spent the first half of my summer reading somewhat deeply into Word Grammar, a linguistic system describing on a theoretical level the relationships of many kinds of linguistic terms with each other (including words, spellings, pronunciations, syntactic links, and semantic concepts). I also looked into tweaking the primary algorithm to prioritize checking linkages within a limited portion of a sentence, such as a phrase.
What I didn’t realize at the time was that the Link Grammar Parser is fast enough that it would have been worth first implementing a much simpler, more naive form of mid-sentence parsing: rerunning the parser after every input action. So, as a tribute to my failure so far to make progress on the mid-sentence processing problem, I’ve implemented a small Ruby script for the command line that reruns the Link Grammar Parser every time a character is entered or deleted.
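The shape of that naive loop can be sketched as follows. This is a minimal illustration, not the script from my repository: the helper names (`apply_key`, `reparse_each_keystroke`, `stdin_keys`) are hypothetical, and the parser is passed in as any callable so the loop can be tried without the Link Grammar library installed.

```ruby
require 'io/console' # stdlib; provides IO#getch for single-key reads

# Return the buffer after one keystroke. Backspace/Delete removes the last
# character, printable characters are appended, anything else is ignored.
def apply_key(buffer, key)
  if ["\u007F", "\b"].include?(key)
    buffer.chop
  elsif key.match?(/\A[[:print:]]\z/)
    buffer + key
  else
    buffer
  end
end

# Feed keys one at a time and rerun the parser on the WHOLE buffer after
# every change. Stops at Escape. Yields [text, parse_result] per change
# and returns the final buffer.
def reparse_each_keystroke(keys, parser)
  buffer = ""
  keys.each do |key|
    break if key == "\e"

    updated = apply_key(buffer, key)
    next if updated == buffer # key changed nothing, so skip the reparse

    buffer = updated
    yield buffer, parser.call(buffer)
  end
  buffer
end

# An endless stream of keys read from the terminal, one at a time.
def stdin_keys
  Enumerator.new { |out| loop { out << $stdin.getch } }
end

# Interactive use, assuming the linkparser gem's Dictionary#parse and
# Sentence#diagram calls as shown in its README:
#
#   require 'linkparser'
#   dict   = LinkParser::Dictionary.new
#   parser = ->(text) { dict.parse(text).diagram }
#   reparse_each_keystroke(stdin_keys, parser) { |text, diagram| puts text, diagram }
```

Nothing is cached between keystrokes: every change triggers a full parse of the text so far, which is exactly what makes the approach naive. Keeping the parser as an argument also means a stand-in lambda can replace it when experimenting.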
To run this program on your own Mac or Linux terminal, you’ll need to install ruby, the command line tool gem, and the linkparser gem. On Debian/Ubuntu/derivatives, that should look like:
sudo apt-get install ruby gem
sudo gem install linkparser
When sudo prompts you, enter the password for your own user account; the account needs administrator (sudo) privileges.
Then, you can use git clone from my GitHub repository or copy the code below into a file, and run it like so, assuming the file’s name is mid-sentence.rb:
ruby mid-sentence.rb
Hit Escape to exit the program.