Friday, October 11, 2013

Morse Palindromes?

There was a fun post today on Metafilter about the longest palindrome in Morse code being "intransigence" (along with other odd facts).

This jumped out to me for a few reasons:

So, maybe we can confirm that "intransigence" really is the longest palindrome!


The basic definition for a "palindrome" is a word that "reads the same backward as forward". Given that, it's easy to build a first version that checks this:

: palindrome? ( str -- ? ) dup reverse = ;

However, in our tutorial we suggest building a more robust version that normalizes the input to handle palindromes such as "A man, a plan, a canal: Panama.":

: normalize ( str -- str' ) [ Letter? ] filter >lower ;

: palindrome? ( str -- ? ) normalize dup reverse = ;


For our morse code palindrome detector, we need to convert our string to morse code, removing extra spaces that the morse code vocabulary adds between letters, before checking for palindrome-ness:

: normalize-morse ( str -- str' )
    normalize >morse [ blank? not ] filter ;

: morse-palindrome? ( str -- ? )
    normalize-morse dup reverse = ;

The longest word in the dictionary that is a morse palindrome is:

IN: scratchpad "/usr/share/dict/words" ascii file-lines
               [ morse-palindrome? ] filter longest .

Wait, that isn't "intransigence"!

Well, no, but "incalescence" has the same number of letters (13) and happens to be slightly longer in morse code. So maybe it's a tie, or maybe they should update their trivia!

In fact, there are several "longest" morse palindromes:

IN: scratchpad "/usr/share/dict/words" ascii file-lines
               [ morse-palindrome? ] filter all-longest .

P.S., it looks like "Raphaelesque" might be the longest morse palindrome by morse code length.

P.P.S., for some reason /usr/share/dict/words doesn't contain the word "intransigence".


John Daily said...

Hate to be the bearer of bad news, but each of your alternatives to intransigence is actually only 12 characters long (vs 13 for intransigence).

Thanks for the great blog. Speaking as someone still struggling to get a toehold with Factor, I appreciate it greatly.

John Daily said...

Speaking of toeholds, the example fought me nearly to a standstill, so for any other beginners who might stumble across this...

Listener prompts me to add ascii to the search path while defining the words, which is straightforward enough.

Once when working through the example I failed to "USE: morse" before defining normalize (and Listener couldn't seem to find ">morse" to prompt me to add the vocabulary) and deferring turned out to be a bad idea. Restarting Listener wasn't enough to get it to recognize ">morse" from the vocabulary, even after a restart it was listed as deferred when I ran help, finally had to quit out of Factor entirely. Is that expected?

When I try to run the filter against /usr/share/dict/words, however, the vocabulary it suggested for longest was compiler.codegen.gc-maps, which I suspected was not the vocabulary you're using, because I saw the resulting value as the length (12) instead of the word identified as the longest.

Listener also couldn't identify a matching word all-longest, which the docs site indicated is in the sequences vocabulary.

Finally decided that 0.94 must be out of date, and as it turns out, longest and all-longest must have been added to sequences between 0.94 and the current version, 0.96.

John Daily said...

Blast, well, turns out that all-longest isn't found in sequences; it's actually in sequences.extras, which Listener doesn't suggest - had to re-check the docs site.

The joys of a <1.0 language.

mrjbq7 said...

@John Daily, that's so funny. I must have blinked and double-counted a letter! Thanks for pointing that out.

I'm glad you updated and are using a later version, there have been a number of improvements made since 0.94.

The "extras" vocabularies have been places to experiment with new words that may or may not be useful to everyone or properly documented. In the case of the sequences.extras vocabulary, there are several words that we should probably "promote" to the more standard "sequences" library.

By the way, your issue with "defer" can be best explained as a small gap of understanding. When you DEFER: a word, you are actually creating a word with a "fake" definition that just throws an error if called. It allows code to use a word before it is defined. If you end up having a deferred word that shadows another, you can just "FORGET:" it in the listener to remove the deferred word.