Key Thread from “As We May Think”

When I wrote my first Nugget post, I identified what I there termed “critical portions”. I later realized that because of my bias and focus on the “tool” aspect of the paper, I was seeing just one thread within the paper. There simultaneously exist additional threads, each focused differently, each with its own thesis and supporting statements. Different phrases and sentences comprise the “critical portions” of each of those separate threads. I now see (almost?) ALL the words in the paper as critical and significant, but arranged in different subsets according to the threads they support.

Nonetheless, the portions I originally identified are still particularly significant to me, as they comprise the majority of the tool-requirements thread. I planned to extract and post these together, and comment further. So here goes…

I originally outlined:

I believe the first critical portion is in section 1, paragraphs 3-5. The next is the first sentence of section 2. Next the first three sentences of section 4, and the fourth paragraph of section 4. The second and third paragraphs of section 5 are also key. Then the first two paragraphs of section 6, and the third sentence of the third paragraph: “Selection by association, rather than indexing, may yet be mechanized.” Section 7 describes key aspects of usage, and a couple paragraphs in section 8, perhaps the third and ninth, sum up.

Substituting the sentences from the paper, this yields:

[1] There is a growing mountain of research. But there is increased evidence that we are being bogged down today as specialization extends. The investigator is staggered by the findings and conclusions of thousands of other workers—conclusions which he cannot find time to grasp, much less to remember, as they appear. Yet specialization becomes increasingly necessary for progress, and the effort to bridge between disciplines is correspondingly superficial.

Professionally our methods of transmitting and reviewing the results of research are generations old and by now are totally inadequate for their purpose. If the aggregate time spent in writing scholarly works and in reading them could be evaluated, the ratio between these amounts of time might well be startling. Those who conscientiously attempt to keep abreast of current thought, even in restricted fields, by close and continuous reading might well shy away from an examination calculated to show how much of the previous month’s efforts could be produced on call. Mendel’s concept of the laws of genetics was lost to the world for a generation because his publication did not reach the few who were capable of grasping and extending it; and this sort of catastrophe is undoubtedly being repeated all about us, as truly significant attainments become lost in the mass of the inconsequential.

The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.

[2] A record if it is to be useful to science, must be continuously extended, it must be stored, and above all it must be consulted.

[4] The repetitive processes of thought are not confined however, to matters of arithmetic and statistics. In fact, every time one combines and records facts in accordance with established logical processes, the creative aspect of thinking is concerned only with the selection of the data and the process to be employed and the manipulation thereafter is repetitive in nature and hence a fit matter to be relegated to the machine. Not so much has been done along these lines, beyond the bounds of arithmetic, as might be done, primarily because of the economics of the situation.

It is a far cry from the abacus to the modern keyboard accounting machine. It will be an equal step to the arithmetical machine of the future. But even this new machine will not take the scientist where he needs to go. Relief must be secured from laborious detailed manipulation of higher mathematics as well, if the users of it are to free their brains for something more than repetitive detailed transformations in accordance with established rules. A mathematician is not a man who can readily manipulate figures; often he cannot. He is not even a man who can readily perform the transformations of equations by the use of calculus. He is primarily an individual who is skilled in the use of symbolic logic on a high plane, and especially he is a man of intuitive judgment in the choice of the manipulative processes he employs.

[5] Logic can become enormously difficult, and it would undoubtedly be well to produce more assurance in its use. The machines for higher analysis have usually been equation solvers. Ideas are beginning to appear for equation transformers, which will rearrange the relationship expressed by an equation in accordance with strict and rather advanced logic. Progress is inhibited by the exceedingly crude way in which mathematicians express their relationships. They employ a symbolism which grew like Topsy and has little consistency; a strange fact in that most logical field.

A new symbolism, probably positional, must apparently precede the reduction of mathematical transformations to machine processes. Then, on beyond the strict logic of the mathematician, lies the application of logic in everyday affairs. We may some day click off arguments on a machine with the same assurance that we now enter sales on a cash register. But the machine of logic will not look like a cash register, even of the streamlined model.

[6] The real heart of the matter of selection, however, goes deeper than a lag in the adoption of mechanisms by libraries, or a lack of development of devices for their use. Our ineptitude in getting at the record is largely caused by the artificiality of systems of indexing. When data of any sort are placed in storage, they are filed alphabetically or numerically, and information is found (when it is) by tracing it down from subclass to subclass. It can be in only one place, unless duplicates are used; one has to have rules as to which path will locate it, and the rules are cumbersome. Having found one item, moreover, one has to emerge from the system and re-enter on a new path.

The human mind does not work that way. It operates by association. With one item in its grasp, it snaps instantly to the next that is suggested by the association of thoughts, in accordance with some intricate web of trails carried by the cells of the brain. It has other characteristics, of course; trails that are not frequently followed are prone to fade, items are not fully permanent, memory is transitory. Yet the speed of action, the intricacy of trails, the detail of mental pictures, is awe-inspiring beyond all else in nature.

Selection by association, rather than indexing, may yet be mechanized.

[7] All this is conventional, except for the projection forward of present-day mechanisms and gadgetry. It affords an immediate step, however, to associative indexing, the basic idea of which is a provision whereby any item may be caused at will to select immediately and automatically another. This is the essential feature of the memex. The process of tying two items together is the important thing.

When the user is building a trail, he names it, inserts the name in his code book, and taps it out on his keyboard. Before him are the two items to be joined, projected onto adjacent viewing positions. At the bottom of each there are a number of blank code spaces, and a pointer is set to indicate one of these on each item. The user taps a single key, and the items are permanently joined. In each code space appears the code word. Out of view, but also in the code space, is inserted a set of dots for photocell viewing; and on each item these dots by their positions designate the index number of the other item.

Thereafter, at any time, when one of these items is in view, the other can be instantly recalled merely by tapping a button below the corresponding code space. Moreover, when numerous items have been thus joined together to form a trail, they can be reviewed in turn, rapidly or slowly, by deflecting a lever like that used for turning the pages of a book. It is exactly as though the physical items had been gathered together from widely separated sources and bound together to form a new book. It is more than this, for any item can be joined into numerous trails.

The owner of the memex, let us say, is interested in the origin and properties of the bow and arrow. Specifically he is studying why the short Turkish bow was apparently superior to the English long bow in the skirmishes of the Crusades. He has dozens of possibly pertinent books and articles in his memex. First he runs through an encyclopedia, finds an interesting but sketchy article, leaves it projected. Next, in a history, he finds another pertinent item, and ties the two together. Thus he goes, building a trail of many items. Occasionally he inserts a comment of his own, either linking it into the main trail or joining it by a side trail to a particular item. When it becomes evident that the elastic properties of available materials had a great deal to do with the bow, he branches off on a side trail which takes him through textbooks on elasticity and tables of physical constants. He inserts a page of longhand analysis of his own. Thus he builds a trail of his interest through the maze of materials available to him.

And his trails do not fade. Several years later, his talk with a friend turns to the queer ways in which a people resist innovations, even of vital interest. He has an example, in the fact that the outraged Europeans still failed to adopt the Turkish bow. In fact he has a trail on it. A touch brings up the code book. Tapping a few keys projects the head of the trail. A lever runs through it at will, stopping at interesting items, going off on side excursions. It is an interesting trail, pertinent to the discussion. So he sets a reproducer in action, photographs the whole trail out, and passes it to his friend for insertion in his own memex, there to be linked into the more general trail.

[8] Thus science may implement the ways in which man produces, stores, and consults the record of the race. It might be striking to outline the instrumentalities of the future more spectacularly, rather than to stick closely to methods and elements now known and undergoing rapid development, as has been done here. Technical difficulties of all sorts have been ignored, certainly, but also ignored are means as yet unknown which may come any day to accelerate technical progress as violently as did the advent of the thermionic tube. In order that the picture may not be too commonplace, by reason of sticking to present-day patterns, it may be well to mention one such possibility, not to prophesy but merely to suggest, for prophecy based on extension of the known has substance, while prophecy founded on the unknown is only a doubly involved guess.

Presumably man’s spirit should be elevated if he can better review his shady past and analyze more completely and objectively his present problems. He has built a civilization so complex that he needs to mechanize his records more fully if he is to push his experiment to its logical conclusion and not merely become bogged down part way there by overtaxing his limited memory. His excursions may be more enjoyable if he can reacquire the privilege of forgetting the manifold things he does not need to have immediately at hand, with some assurance that he can find them again if they prove important.

In going back to copy/paste those sections into this post, I note that I’ve left out some closely related ideas which are significant to the tool aspect, and also (especially from section 7) included some that could perhaps be left off. So I see again that Dr Bush’s composition is more complex and subtly crafted than I first perceived!

I think I need to go back and repeat this exercise in a word processor, highlighting the words to indicate the main tool-thread, but using different colors to indicate the core ideas as distinct from related ones.

There’s too much to comment on all at once, so I’ll pick a couple of points that jump out at me tonight.

First, the paragraph from section 5 beginning “A new symbolism” reminds me that the storage format is significant. Dr Bush is describing what’s necessary to (partially) automate certain aspects of logical analysis (thinking). If we are able to store our knowledge/information/thoughts in an appropriate manner, then machines can be programmed to process (reason with) that information to infer and derive new information and conclusions. Douglas Lenat (among others) has done significant research along these lines, and implemented several systems including Cyc which can represent facts, manipulate them logically, and derive new conclusions. But the representation is important: the Word docs, PPTs, GIFs and JPEGs, PDFs, MP3s, MP4s, and other files and formats we use today on the web are not (in themselves) suitable representations of facts and knowledge, with which computers can be programmed to reason.
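To make the point about representation a bit more concrete, here is a toy sketch in Python. It is my own illustration and is not how Cyc actually works: the facts are made-up examples stored as (subject, relation, object) triples, and the single rule (that "is_a" is transitive) is an assumption chosen only because it is easy to follow. Once the facts are in a form like this, even a few lines of code can derive statements nobody typed in.

```python
# Toy illustration only: facts are (subject, relation, object) triples.
# The facts and the one rule below are made-up examples, not Cyc's representation.

def infer_is_a(facts):
    """Apply transitivity of "is_a" repeatedly until no new fact appears."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for (a, r1, b) in list(known):
            for (c, r2, d) in list(known):
                # If a is_a b and b is_a d, conclude a is_a d.
                if r1 == "is_a" and r2 == "is_a" and b == c:
                    new_fact = (a, "is_a", d)
                    if new_fact not in known:
                        known.add(new_fact)
                        changed = True
    return known


facts = {
    ("Turkish bow", "is_a", "composite bow"),
    ("composite bow", "is_a", "bow"),
    ("bow", "is_a", "weapon"),
}

# Only the facts that were derived, not the ones we started with.
derived = infer_is_a(facts) - facts
for fact in sorted(derived):
    print(fact)
```

Nothing comparable can be done with the same three statements locked inside a PDF or a slide deck, at least not without first extracting them into some structured form.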

Several persons who have studied the human mind and our thinking processes have described the key operation as symbol processing. We are able to process abstractions, somehow manipulating symbols in our brains, as we reason and think. About five years ago I read a fantastic book describing this, whose name and author currently escape me. I’m fascinated by the fact that Dr Bush specifically (and presciently) used the term “a new symbolism” to describe what is needed.

Second, the second paragraph of section 6 states that “The human mind … operates by association.” This is very important! How we build and represent associations, or relationships, between and among the facts/thoughts/mental symbols stored in our brains is critical. Somehow our brains are able to make inferences and create relationships within and among our memories, and those relationships are themselves stored and used as part of the associational index with which we retrieve our memories and thoughts. For computers to help us think, this process must be better understood, so that it can be automated within the computers. (Which is exactly what Dr Bush said in the next paragraph: “Man cannot hope fully to duplicate this mental process artificially, but he certainly ought to be able to learn from it.”)

The process of building a link (described in section 7) needs to be semi-automated: perhaps the computer can process the data which is being referenced by the user, and propose a bunch of possible associations, from which the user selects a subset to be saved.
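As a rough sketch of what "propose, then let the user choose" might look like, here is a small Python example. Everything in it is hypothetical: the item names and texts are invented, and scoring by shared words (Jaccard overlap) is only a crude stand-in for whatever analysis a real system would need. The user's choice is simulated with a hard-coded set.

```python
# Hypothetical sketch: propose candidate links for the item in view,
# then keep only the ones the user accepts. Word overlap is a crude
# stand-in for real analysis of the referenced material.

def words(text):
    """Lowercased words longer than three letters, punctuation stripped."""
    cleaned = (w.strip(".,;:!?").lower() for w in text.split())
    return {w for w in cleaned if len(w) > 3}


def suggest_links(current_id, items, top_n=3):
    """Return up to top_n item ids ranked by word overlap with the current item."""
    current = words(items[current_id])
    scored = []
    for item_id, text in items.items():
        if item_id == current_id:
            continue
        other = words(text)
        union = current | other
        score = len(current & other) / len(union) if union else 0.0
        if score > 0:
            scored.append((score, item_id))
    # Highest score first; ties broken alphabetically by id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for _, item_id in scored[:top_n]]


# Invented items, loosely echoing the bow-and-arrow trail.
items = {
    "encyclopedia": "The Turkish bow was short and made of layered materials.",
    "history": "In the Crusades the Turkish bow outperformed the English long bow.",
    "elasticity": "Elastic properties of materials determine how much energy they store.",
    "cookbook": "Simmer the onions slowly before adding stock.",
}

proposals = suggest_links("encyclopedia", items)

# Simulated user decision: accept only some of the proposals.
user_accepts = {"history"}
saved_links = [p for p in proposals if p in user_accepts]
print(proposals, saved_links)
```

The important design point is that the computer only proposes; the links that get saved are still the ones the user judged worth keeping.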

I’m going to stop here for tonight, but there’s lots more to ponder!

Notes for future follow-up:

  • What was that book on the brain I read?
  • Highlight the thread in a word processor, and refine using colors.
  • Comment more fully on key concepts within this thread.
  • Who else besides Lenat made major progress on automated inferencing?
  • Post a blog entry on why I’m including these notes for follow-up.
  • Comment on the use of section referencing, and how much better it would be to have more finely grained references such as those used for religious texts, and especially “purple numbers”. Point at Christina Engelbart’s recent post, as well as Eugene Eric Kim’s implementation and other documentation.
