Xah Lee, 2008-07
The following was originally posted to the newsgroup “comp.lang.lisp”.
Fundamental Problems of Lisp — Prelude
Since i'm writing, and i wrote a lot in the past on diverse issues scattered across various essays, i'll sum up some fundamental problems of lisp:
• Lisp relies on a regular nested syntax. However, the lisp syntax has several irregularities that reduce such syntax's power and confuse the language semantics. (e.g. those «' # ; ` ,» chars.) (and whenever i tried to get some technical answer about this, to clarify at least my own understanding, protective lispers muck and obfuscate the truth. (it's likely that few lispers actually completely understand all of lisp's syntactical irregularities.))
• Lisp's syntax irregularities, those «' # ; ` ,» things, are confusing in practice and make the lang less powerful. e.g. in elisp, there's no form of comment with matching delimiters (and consequently no nested comment). The reliance on EOL chars as part of the syntax is one of the major detriments to the power of a pure nested syntax.
• Lisp relies on a regular nested syntax. Because of this regularity, the source code can be transformed by a simple lexical scan. This has powerful ramifications. (in practice, lispers realized just one: lisp macros) For example, since the syntax is regular, one could easily have alternative, easier-to-read syntaxes as a layer on top. (the concept was known in early lisp as the M-expression) Mathematica took this advantage (probably independent of lisp's influence), so that you really have easy-to-read syntax, yet fully retain the advantages of the regular form. In lisp history, such a layer has been tried here and there in various forms or langs (CGOL, Dylan), but never caught on, largely due to social happenings. Part of the reasons are political. (thanks, in part, to sensitive and ignorant lispers here who stop proper discussion of it.)
• One of the advantages of a pure, fully functional syntax is that a programer should never need to format his source code (i.e. pressing tabs, returns) while coding. That saves the hundreds of hours of labor, guides, tutorials, advice, publications, and editor tools spent on what's known as “coding style convention”, because the editor can reformat the source code on the fly based on a simple lexical scan. This was done in Mathematica version 3 (~1996). In coding elisp, i'm pained to no end by the manual process of formatting lisp code. The lisp community established a particular way of formatting lisp code, as exhibited in emacs's lisp modes and written guides of conventions. The recognition of such a convention further erodes any possibility and awareness of automatic, uniform, universal formatting. (e.g. the uniform and universal part of the advantage is exhibited by Python)
• Lisp relies on a regular nested syntax. One of the powers of such a pure syntax is that you could build up layers on top of it, so that the source code can function as markup of conventional mathematical notation (e.g. MathML) and/or as a word-processing-like file that can contain structures and images (e.g. Microsoft Office Open XML), yet lose practically nothing. This was done in Mathematica in ~1996 with the release of Mathematica version 3. (e.g. think of XML, its uniform nested syntax, and its diverse use as a markup lang; some people are now adding computational semantics to it (i.e. a computer language with the syntax of XML, e.g. O:XML). You can think of Mathematica as going the other way: starting with a computer lang with a regular nested syntax, then adding new but inert keywords to it with markup semantics. The compiler just treats these inert keywords like comment syntax when doing computation. When the source code is read by an editor, the editor takes the markup keywords for structural or stylistic representation, with title, chapter heading, tables, images, animations, hyperlinks, typeset math expressions (e.g. think of MathML) etc. The non-marked-up keywords are shown as one-dimensional textual source code, just like source code is normally shown in most languages.)
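To make the “alternative syntax as a layer” point concrete, here is a minimal sketch in Python. It assumes a toy, purely nested syntax with none of lisp's «' # ; ` ,» irregularities. The names tokenize, parse and to_mexpr are made up for illustration and are not from any lisp or from Mathematica. It reads a nested form and re-displays it in a head[arg, ...] style, loosely like an M-expression.

```python
# Hypothetical sketch: tokenize/parse/to_mexpr are illustrative names only.
# Assumes a toy syntax made of nothing but parens and whitespace-separated atoms.

def tokenize(src):
    """Split a purely nested source string into parens and atoms."""
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Turn a token list into nested Python lists (consumes tokens in place)."""
    tok = tokens.pop(0)
    if tok == "(":
        form = []
        while tokens[0] != ")":
            form.append(parse(tokens))
        tokens.pop(0)  # drop the closing paren
        return form
    return tok

def to_mexpr(form):
    """Render a nested form as head[arg, arg, ...]."""
    if isinstance(form, str):
        return form
    if not form:
        return "[]"
    head, *args = form
    return to_mexpr(head) + "[" + ", ".join(to_mexpr(a) for a in args) + "]"

tree = parse(tokenize("(plus (times a b) (f x))"))
print(to_mexpr(tree))  # plus[times[a, b], f[x]]
```

The display layer is a few lines because the underlying form is regular. Nothing about the nested form is lost, so the editor could go back and forth between the two views.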
The above are some of the damages lispers have done to themselves with respect to the nested syntax. The other fundamental problem in the language is the cons business.
• Lisp at its core is based on functional programing on lists. This is, comparatively, a powerful paradigm. However, for historical reasons, lisp's list is based on the hardware concept of the “cons” cell. From a mathematical point of view, what this means is that a lisp list is limited to a max of 2 elements. If you want a longer list, you must nest it and interpret it in a special way. (i.e. effectively creating a mini-protocol of nesting lists, known as proper lists.) The cons fundamentally crippled the development of list processing.
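A rough model of the “max of 2 elements” point, sketched in Python. It assumes a cons cell can be stood in for by a 2-tuple and the empty list by None. The names NIL, proper_list and to_flat are invented for this sketch. A 3-item list is really pairs nested 3 deep, and only the convention of ending in NIL makes it a “proper list”.

```python
# Hypothetical model: a cons cell as a Python 2-tuple, NIL as None.
NIL = None

def cons(a, d):
    return (a, d)

def car(cell):
    return cell[0]

def cdr(cell):
    return cell[1]

def proper_list(*items):
    """Build the nested-pair chain used for a list: (1 . (2 . (3 . NIL)))."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def to_flat(cell):
    """Walk the cdr chain; return the items plus whatever ended the chain."""
    out = []
    while isinstance(cell, tuple):
        out.append(car(cell))
        cell = cdr(cell)
    return out, cell  # the second value is NIL only for a proper list

xs = proper_list(1, 2, 3)
print(xs)                    # (1, (2, (3, None)))
print(to_flat(xs))           # ([1, 2, 3], None)
print(to_flat(cons(1, 2)))   # ([1], 2), an improper "dotted" pair
```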
Lisp has historically been based on the cons for decades. The cons (and cdr, car, caadar etc) is fundamentally rooted in the lisp langs, and is thus not something that can be easily mended. Quite unfortunate. However, this situation could be improved (by, for example, exposing only higher-level list manipulation functions in all newer literature, or even marking cons-related functions as obsolete). But whenever i discuss this, you can see that the lisper slaves here, with their mentality, prevent any possible improvement. (most do not even understand what the issue is. (in general, this is because lispers usually do not have serious experience or investment in other functional langs, such as Mathematica, Haskell, etc.))
One of the myths that is quickly injected into budding lispers is that cons is powerful. Powerful my ass. It is powerful in the sense that any assembly lang is powerful. Lisp's cons is perhaps the greatest fuck-up in the history of computer languages.
2008-07-15, Addendum:
In lisp communities, it is widely recognized that lisp's regular syntax has the property that “code is data; data is code”. However, there was never a clear explanation of what this means exactly.
Here's what it means exactly, in one concise sentence:
A regular nested syntax makes it easy (possible) to do source code transformations with a lexical scan. (think of XML. See XML transformation language)
The consequences of the ability to do such source code transformation are many, as i discussed above.
Among lispers, when people ask what lisp's “code is data; data is code” means, usually nobody is able to explain it exactly, because they don't possess a mathematician's analytic ability. Here i'll repeat it, and please remember it.
A regular nested syntax makes it possible to do systematic source code transformation in a way that's also trivial to implement. There are important consequences. Some examples are: lisp's macros, structural pattern matching, term rewriting, source code dual-functioning as a markup lang for presentation (e.g. word-processor, Mathematica's “Notebook”), mathematical markup (e.g. MathML), in general all benefits of XML, and on-the-fly source code “formatting” for an automatic, uniform, universal, source code display (aka “coding style convention”).
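As a small illustration of “trivial to implement”, here is a sketch of one macro-like rewrite, in Python. The source code is assumed to be already read into nested lists, written out by hand here. The function name expand and the unless-to-if rule are chosen just for this example and are not any lisp's actual macro machinery.

```python
# Hypothetical sketch: source code stands in as nested Python lists.

def expand(form):
    """Rewrite every (unless test body) into (if (not test) body), recursively."""
    if not isinstance(form, list):
        return form  # atoms are left untouched
    form = [expand(sub) for sub in form]  # transform the inner forms first
    if len(form) == 3 and form[0] == "unless":
        _, test, body = form
        return ["if", ["not", test], body]
    return form

code = ["progn", ["unless", ["ready"], ["unless", "x", ["wait"]]]]
print(expand(code))
# ['progn', ['if', ['not', ['ready']], ['if', ['not', 'x'], ['wait']]]]
```

The whole transformation is one recursive walk plus one pattern. With an irregular syntax, the same job would need a real parser first.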
(Note the phrase “automatic, uniform, universal, source code display”. The “uniform” and “universal” aspect is a well-known property of the Python lang's source code. The reason Python's source code has such uniform and universal display formatting is that it is worked into the language's semantics. i.e. the semantics of the code depends on the formatting (i.e. where you press tabs and returns). But also note, Python's source code is not and cannot be automatically formatted, precisely because the semantics and formatting are tied together. A strictly regular nested syntax, such as Mathematica's, can be, and this has been done since 1996. Lisp, despite its bunch of irregularities such as those «` ' # ; ,» chars, can i think still have automatic formatting, at least to a large, practical extent. (one of my future elisp projects would be this) Once lisp has automatic on-the-fly formatting (think of emacs's fill-sexp, auto-fill-sexp), then lisp code will achieve uniform and universal source code formatting display. By “uniform” is meant that there is one simple, mechanical heuristic that determines a canonical way to format any lisp code for human-readable display. By “universal” is meant that all programers will recognize and be habituated to this one canonical way, as a standard. (they can of course set their editor to display it in other ways) The advantage of having an automatic, uniform, universal source code display for a language is that, first of all, it gets rid of the hundreds of hours spent on the labor, tools, guides, and arguments about how one should format his code. (this is partly the situation of Python already) But more importantly, by having such properties, it will actually have an impact on how programers code in the language. i.e. what kind of idioms they choose to use, what type of comments they put in code, and where. This further influences the evolution of the language, i.e. what kind of functions or features are added to the lang.
For some detail on this aspect, see: The Harm of Manual Code Formatting)
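Here is a rough sketch, in Python, of what “one simple, mechanical heuristic” could look like. The rule is invented for this example and is not the emacs lisp-mode convention nor Mathematica's method: keep a form on one line if it fits the width, else put each argument on its own line, indented by nesting depth. The names flat and fmt are hypothetical, and source code is assumed to be already read into nested lists.

```python
# Hypothetical formatter sketch over nested Python lists standing in for code.

def flat(form):
    """Render a form on a single line."""
    if isinstance(form, str):
        return form
    return "(" + " ".join(flat(f) for f in form) + ")"

def fmt(form, width=30, indent=0):
    """Keep a form on one line if it fits, else one child per line."""
    one_line = flat(form)
    if isinstance(form, str) or not form or indent + len(one_line) <= width:
        return one_line
    head, *rest = form
    lines = ["(" + fmt(head, width, indent + 1)]
    for child in rest:
        lines.append(" " * (indent + 2) + fmt(child, width, indent + 2))
    return "\n".join(lines) + ")"

form = ["defun", "area", ["w", "h"], ["times", "w", "h"]]
print(fmt(form, width=80))  # (defun area (w h) (times w h))
print(fmt(form, width=20))
# (defun
#   area
#   (w h)
#   (times w h))
```

Since the output depends only on the nested structure and the width, any two programers running it on the same code get the same display, whatever tabs and returns they typed.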
* * *
The other point in my previous article discussed lisp's cons problem. Here's what it means, from a concise, mathematical perspective:
From a mathematical point of view, what this means is that a lisp list is limited to a max of 2 elements. If you want a longer list, you must nest it and interpret it in a special way. (i.e. effectively creating a mini-protocol of nesting lists, known as proper lists.) The cons fundamentally crippled the development of list processing.
* * *
The cons issue has bugged me for 10 years, since i first tried to learn scheme in ~1999. I never worked with lisp (other than academic reading) until recent years, with emacs lisp since 2005. Several times i tried to explain this problem to lispers, but usually with paragraphs and paragraphs of descriptions, analogies, and examples, frustrated by how hard it is to convey such a simple flaw (made harder by their blind, deep-seated lisp fanaticism). Yesterday it hit me. The above mathematical aspect of lisp's cons is the first time i've formulated the cons problem concisely. (my previous verbose explanation is here: Lisp's List Problem)
Also, about the meaning of lisp's regular syntax being said to be “data is code; code is data”... it was also my first time seeing it clearly. (in my 10 years of comp.lang.lisp reading, what you read is largely just the male nature.)
“If you don't like cons, Common Lisp has arrays and hashmaps, too.”
Suppose there's a lang called gisp. In gisp, there's cons but also fons. Fons are just like cons except they have 3 cells, with car, cbr, cdr. Now, gisp is an old lang, and the fons are deeply rooted in the lang. Every 100 lines of code or so you'll see a use of fons and car, cbr, cdr, or any one of the caar, cdar, cbbar, cdbbar, etc. You get annoyed by this. You, as a critic, complain that fons is bad. But then some gisp fan retorts by saying: “If you don't like fons, gisp has cons, too.”
You see, “having something too” does not solve the problem of pollution. Sure, you can use just cons in gisp, but in every lib or other people's code you encounter, there's an invasion of fons with its cbbar, cdbbar, cbbbr. The problem created by fons cannot be solved by “having cons too”.
“I like the cons concept. Even in functional languages like Haskell it is popular, e.g. when matching in the form of (x:xs), which is the same as car/cdr in Lisp.”
A language having a list datatype and First and Rest functions does not mean it has lisp's cons problem.
One part of the cons problem in lisp is that it forces the programer to think of a list as a low-level nested construction of 2-item pairs, with explicit functions like “cons”, “car”, “cdr”, “caar”, “cadr”, “cdar”, “cddr”, “caaar”, “caadr” etc.
In other langs, the programer is not forced to think of nested 2-items.
The other problem with lisp's cons is that it hinders any development of tree data structures. For example, one might write a function that extracts the leaves of a tree. But because lisp's list is made of cons, there are different interpretations of what's considered a leaf. Similarly, a binary tree in lisp can be implemented either using cons natively, or using the so-called “proper list” that is implemented on top of cons. Worse, any proper list can be mixed with improper lists. So, you can have a list of cons, or cons of lists, cons of cons, list of lists, or any mix. The overall effect of the cons is that it prevents lisp from having a uniform view of tree structure, with the result that functions that work on trees are inconsistent, few, or otherwise hampered in their development.
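The leaf ambiguity can be shown with a small sketch in Python. It assumes cons cells modeled as 2-tuples and NIL as None. The helper names lst, leaves_binary and leaves_list are made up for this sketch. The same structure gives two different answers depending on whether you read every cons as a binary tree node or read cons chains as proper lists.

```python
# Hypothetical model: cons cell as a 2-tuple, NIL as None.
NIL = None

def cons(a, d):
    return (a, d)

def lst(*items):
    """Build a proper list as a chain of cons cells ending in NIL."""
    result = NIL
    for item in reversed(items):
        result = cons(item, result)
    return result

def leaves_binary(x):
    """Read every cons as a binary tree node: car and cdr are both subtrees."""
    if isinstance(x, tuple):
        return leaves_binary(x[0]) + leaves_binary(x[1])
    return [x]

def leaves_list(x):
    """Read cons chains as lists: the cars are children, NIL only ends a chain."""
    if not isinstance(x, tuple):
        return [x]
    out = []
    while isinstance(x, tuple):
        out += leaves_list(x[0])
        x = x[1]
    if x is not NIL:  # improper tail, as in (4 . 5)
        out.append(x)
    return out

tree = lst(1, lst(2, 3), cons(4, 5))
print(leaves_binary(tree))  # [1, 2, 3, None, 4, 5, None]
print(leaves_list(tree))    # [1, 2, 3, 4, 5]
```

Neither reading is wrong. The data itself does not say which one the author of the tree had in mind, and that is the problem.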
Now, a little speculation.
We might wonder why lisp has the cons problem, and why it was never addressed.
I guess in the 1960s and 1970s, the very fact that you could have a concept like a list in a lang and manipulate it as a unit was extremely advanced. The list, being built up from hardware cons, was just something that had to be done.
Having data as a single manipulatable object (list) didn't really become popular until the 1990s (notably with Perl). And today, it is THE most important feature of high-level languages (perl, python, php, javascript... among the langs i'm expert in).
Lisp's cons, as an underlying primitive that builds lists, is a bit cumbersome but works just fine when list structures are simple. Even today, with all the perl, python, php, javascript etc langs that deal with lists, the vast majority of list usage is just simple flat lists, sometimes 2 levels of nesting (list of lists, list of hashes, hash of lists). 3 levels of nesting is seldom used, unless it's 3d matrices, used mostly in computer graphics or linear algebra applications. More than 3 levels is almost never seen. Systematic manipulation and exploitation of nested lists, such as mapping to leaves, mapping to a particular level, transposition by permutation of levels, or list structure pattern matching in today's functional langs, etc, is hardly ever seen. (except in Mathematica, APL)
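For readers who have not seen such systematic manipulation, here is a minimal sketch in Python of three of the operations named above: mapping at a particular level, mapping to leaves, and transposition of the first two levels. The function names are invented here. They are loosely in the spirit of Mathematica's level specification but are not its API.

```python
# Hypothetical helpers over nested Python lists.

def map_at_level(f, tree, level):
    """Apply f to every subtree sitting exactly `level` steps below the root."""
    if level == 0:
        return f(tree)
    if not isinstance(tree, list):
        return tree  # too shallow: atoms above the target level are left alone
    return [map_at_level(f, sub, level - 1) for sub in tree]

def map_leaves(f, tree):
    """Apply f to every atom, at whatever depth it sits."""
    if isinstance(tree, list):
        return [map_leaves(f, sub) for sub in tree]
    return f(tree)

def transpose(matrix):
    """Swap the first two levels of a rectangular nested list."""
    return [list(row) for row in zip(*matrix)]

m = [[1, 2, 3], [4, 5, 6]]
print(map_at_level(sum, m, 1))           # [6, 15]
print(map_leaves(lambda x: x * 10, m))   # [[10, 20, 30], [40, 50, 60]]
print(transpose(m))                      # [[1, 4], [2, 5], [3, 6]]
```

These work because a nested list here has one uniform reading, with each level being a list of children.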
So, in general, when you just deal with simple lists, the cumbersomeness of using cons, car, cdr, caadr etc for lists doesn't really surface. Further, the cons is fundamentally rooted in the language. It's not something that can be easily changed, except by creating a new language. When there is a specific need in an application, there is a haphazard collection of functions that deal with lists at a higher level.
Today, with list manipulation being mainstream, especially with the rise of many functional langs, lisp's historical baggage of cons becomes more apparent and painful.
Mathematica today sells for over 2 thousand dollars. Its sales record, throughout its history, is probably more than ALL commercial lisps combined. Such an illuminating social fact is not known in lisp communities. These fuckheads think that lisp is more popular.
10 years ago, in the dot com days (~1998), when Java, Javascript, and Perl were making the rounds, it was my opinion that lisp would inevitably become popular in the future, simply due to its inherently superior design, simplicity, flexibility, power, whatever its existing problems may be. Now i don't think that'll ever happen as is. Because, due to the tremendous technological advances, in particular in communication (i.e. the internet and its consequences, e.g. Wikipedia, youtube, youporn, social network sites, blogs, instant chat, etc), computer languages are proliferating like never before. (e.g. Erlang, OCaml, Haskell, PHP, Ruby, C#, F#, Perl6, Arc, NewLisp, Scala, Groovy, Lua, Q, Qz, Mercury, Scratch, Flash, Processing, (see Proliferation of Computing Languages) ..., helped by the abundance of tools, libraries, parsers, and existing infrastructure) New langs will either have most of the advantages of lisps, and/or come with modern libraries and idioms that better fit today's needs. I see that, perhaps in the next decade, as communication technologies further hurl us forward, the proliferation of langs will give way to a trend of consolidation (e.g. fueled by virtual machines such as Microsoft's “.NET”. (and, btw, the breaking of programers' social taboo of cross communication of computing languages, led by Xah Lee)).
There is one slight hope for Lisp the lang as we understand it, and that is Emacs Lisp. It is deeply rooted in the industry by way of a powerful text editor, and the embedded lisp has major practical impact in getting people to know lisp and also actually use it for practical needs. The installed base of emacs lisp systems is perhaps 100 or 1000 times larger than the installed base of all Common Lisp and Scheme Lisp combined. (this satisfying prospect is however marred by the tech geekers. e.g. the Common Lispers and Scheme Lispers perennially inject snide remarks and sneers at emacs lisp that harm its progress, while the fucking emacs priests want to coffin emacs in its 1980s UI and terminologies. (see The Modernization of Emacs, Text Editors Popularity.))
