Haiku, Dog Grammar, and Computer-Generated Text, Part II
To continue from Part I ....
To review, in formal language theory, a language is defined by a set of terminal symbols, a set of non-terminal symbols, a designated start symbol, and a set of rules that describe how to replace, or rewrite, one symbol in terms of other symbols.
To see if a sentence (in the formalism, it need only be a string of symbols) is part of the language, we see if it's possible to apply the rules in such a way as to derive the sentence (string) from the start symbol. If there is such a derivation, the sentence (string) is in the language. Otherwise, not. This process is parsing.
This kind of grammar, however, can be used to generate sentences as well. And, in the previous post, I hinted that if we were clever in the construction of the rules, we might actually get interesting sentences.
And so, here is a first run at a grammar for haiku (note that we're ignoring syllable count and a bunch of other things here):
Haiku -> Season-Reference Haiku-Body
| Haiku-Body Season-Reference
Season-Reference -> Season-Word | Season-Word Noun
Season-Word -> winter | spring | autumn | summer
Haiku-Body -> ExtendedImage | Image Image
ExtendedImage -> Image Preposition Determiner Noun
Image -> Determiner Adjective Noun | Adjective Noun
| Noun
Determiner -> a | the | this | that
Preposition -> about | as | by | down | for | in
| into | of | on | to
Conjunction -> and
Verb -> ate | cut | digs | goes | has | have | keep
| melt | opens | see
Adjective -> bare | black | crane_s | dried | first
| interesting | misty | my | settled
| silent | snowy | tonight_s | weathered
| white | wind-pierced
Noun -> body | bones | branch | chestnut | chewing
| crow | day | end | evening | eye | frost
| fuji | hand | hibiscus | horse | i
| law | legs | mind | moon | moonflower
| moonlight | morning | myself | net | no_one
| outhouse | rain | road | roadside | salmon
| seeing | torchlight | tree | way | worm
The first rule says that a Haiku consists of a Season-Reference followed by a Haiku-Body, or (that's the vertical bar) a Haiku consists of a Haiku-Body followed by a Season-Reference. In turn, a Season-Reference consists of a Season-Word, or (the vertical bar again) a Season-Reference is a Season-Word followed by a Noun.
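To see the rules in action, here is one derivation the grammar allows, rewriting the leftmost non-terminal at each step:
Haiku => Season-Reference Haiku-Body
      => Season-Word Haiku-Body
      => winter Haiku-Body
      => winter Image Image
      => winter Adjective Noun Image
      => winter snowy Noun Image
      => winter snowy branch Image
      => winter snowy branch Noun
      => winter snowy branch moonlight
Since every symbol in the last line is a terminal, "winter snowy branch moonlight" is a string this grammar accepts, and can generate.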
Now, we can use this grammar and Norvig's Lisp code to see if we generate something a little more interesting than random sentences. I translated Norvig's Common Lisp code into a dialect of Lisp called Scheme. And, to make it easier for the program to use the grammar, I also translated the grammar into S-expressions, which are the native form for expressions in Lisp dialects. You can take a look at the code and the translated grammar. (If you want to explore Scheme, probably the best place to start is DrScheme, a free interactive Scheme environment. The tutorials are top-notch.)
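For a sense of what that looks like, here's an illustrative sketch, not the linked code itself: the grammar as one big S-expression (abridged to a few rules) and a Scheme rendering of Norvig's generate. It assumes a Scheme, like DrScheme, that provides random.

; Each rule pairs a non-terminal with its alternatives; each
; alternative is a list of symbols. Terminals are just symbols
; with no rule of their own. (Abridged; the full grammar above
; translates the same way.)
(define *grammar*
  '((Haiku            (Season-Reference Haiku-Body)
                      (Haiku-Body Season-Reference))
    (Season-Reference (Season-Word) (Season-Word Noun))
    (Season-Word      (winter) (spring) (autumn) (summer))
    (Haiku-Body       (ExtendedImage) (Image Image))
    (ExtendedImage    (Image Preposition Determiner Noun))
    (Image            (Determiner Adjective Noun) (Adjective Noun) (Noun))
    (Determiner       (a) (the) (this) (that))
    (Preposition      (by) (on) (of) (in))
    (Adjective        (misty) (silent) (snowy) (weathered))
    (Noun             (branch) (moon) (moonlight) (morning) (worm))))

; The alternatives for a symbol, or #f if it's a terminal.
(define (rewrites symbol)
  (let ((entry (assq symbol *grammar*)))
    (and entry (cdr entry))))

; A random element of a non-empty list.
(define (random-elt lst)
  (list-ref lst (random (length lst))))

; Expand a phrase (a symbol or a list of symbols) into a flat
; list of terminals, choosing among alternatives at random.
(define (generate phrase)
  (cond ((list? phrase) (apply append (map generate phrase)))
        ((rewrites phrase) (generate (random-elt (rewrites phrase))))
        (else (list phrase))))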
Running the generator program, by typing (generate 'Haiku) at the prompt, produces a succession of strings. With some line breaks added, some of them are more interesting:
this silent morning
snowy branch
spring bones
summer torchlight
cranes
moonlight on the eye
autumn
no-one
chestnut by that outhouse
spring
a misty moon
a misty roadside
than others:
winter worm
this tonight's horse
on that evening
spring
the my law
weathered salmon
One obvious thing to do, given that last example, would be to handle the possessive pronouns separately from the adjectives.
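A hypothetical revision (these rules are mine, for illustration, and not part of the grammar above) might pull those words out of Adjective and give them their own category:
Possessive -> crane_s | my | tonight_s
Image -> Possessive Noun | Determiner Adjective Noun
       | Adjective Noun | Noun
so that a Determiner can no longer land in front of a possessive. But first, let's talk about text grammars; that's the road we're headed down with this.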
In 1928, Vladimir Propp wrote what would become known, in translation, as Morphology of the Folktale. The book didn't appear in English until 1958, and the revised translation most readers know dates from 1968. In it, Propp laid out a grammar for the Russian folktale, a text grammar. The grammar included possible actions, character types, events, and rules for combining them into stories. When Propp's work reached the U.S., it set off a search for other possible text grammars as part of the shift toward structuralism in literary studies.
It also held out the possibility of running the grammar backwards and generating folktales—and it didn't work worth a darn. Part of the problem is that the elements of a grammar are relatively independent of each other. There's no way, in our current formulation, to indicate, say, that selecting a plural adjective means we should later select a plural noun. Or that the Prince is already married when he encounters a maiden a second time. We could make more refined categories, but this is usually handled by adding attributes to elements in the grammar, creating what is, not surprisingly, an attribute grammar. This wasn't, however, the first thing tried.
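As a toy illustration of the idea (hypothetical, not from the code above), here determiners and nouns carry a number attribute, and generation picks that attribute once so the two word choices must agree:

; Words grouped by a number attribute: singular or plural.
(define determiners '((sing a this that) (plur these those)))
(define nouns       '((sing moon branch worm) (plur bones legs)))

; A random element of a non-empty list.
(define (random-elt lst)
  (list-ref lst (random (length lst))))

; Choose the number attribute once, then let it constrain
; both the determiner and the noun.
(define (generate-image)
  (let ((num (random-elt '(sing plur))))
    (list (random-elt (cdr (assq num determiners)))
          (random-elt (cdr (assq num nouns))))))

Now (generate-image) can produce (this moon) or (those bones), but never (a bones).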
Enter GPS, the General Problem Solver. The program, despite the hyped expectations set by its developers, wasn't really a generalized problem solver. Instead it was the first of what are known as planning programs. Norvig talks quite a bit about GPS and provides code for a scaled-down version of it. GPS is designed to take a start state and find a series of actions that produce a goal state. Actions have three parts: the preconditions that make the action possible, the action itself, and the conditions that result from the action. The program begins with the start state and searches for a series of actions that produce the goal state. Planning algorithms like GPS are in wide use for creating schedules and structuring processes. They're also of great interest in robotics.
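Here's a minimal sketch in that spirit (the operators and names are invented for illustration; Norvig's scaled-down GPS differs in detail). Each operator carries preconditions, an add-list, and a delete-list, and a depth-limited search chains operators from the start state to the goal:

; An operator: (action preconditions add-list delete-list).
(define ops
  '((get-dressed   (awake)         (dressed) ())
    (eat-breakfast (awake dressed) (fed)     ())
    (drive-to-work (dressed fed)   (at-work) ())))

(define (op-action op)   (car op))
(define (op-preconds op) (cadr op))
(define (op-adds op)     (caddr op))
(define (op-dels op)     (cadddr op))

; True when every item of xs appears in set.
(define (subset? xs set)
  (or (null? xs)
      (and (memq (car xs) set) (subset? (cdr xs) set))))

; Everything in lst except the given items.
(define (remove-all items lst)
  (cond ((null? lst) '())
        ((memq (car lst) items) (remove-all items (cdr lst)))
        (else (cons (car lst) (remove-all items (cdr lst))))))

; The state after an action: add the add-list, drop the delete-list.
(define (apply-op op state)
  (append (op-adds op) (remove-all (op-dels op) state)))

; Depth-limited search for a sequence of actions that reaches a
; state satisfying the goals; returns #f if none is found.
(define (plan state goals steps)
  (cond ((subset? goals state) '())
        ((zero? steps) #f)
        (else
         (let try ((candidates ops))
           (if (null? candidates)
               #f
               (let ((op (car candidates)))
                 (if (and (subset? (op-preconds op) state)
                          (not (subset? (op-adds op) state))) ; skip no-ops
                     (let ((rest (plan (apply-op op state) goals (- steps 1))))
                       (if rest
                           (cons (op-action op) rest)
                           (try (cdr candidates))))
                     (try (cdr candidates)))))))))

; (plan '(awake) '(at-work) 5) => (get-dressed eat-breakfast drive-to-work)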
In 1976, James Meehan wrote Tale-Spin, a program that applied (and elaborated) the GPS algorithm to writing stories. The program took a story start, a goal, and specifications of the characters, the objects, and the actions they could perform. It then found a series of actions that would produce the goal. These actions formed the narrative spine, which another part of the program fleshed out into a story by producing descriptions of the actions and world states. Tale-Spin sidestepped the problems of text grammars by separating the generation of the narrative from the generation of the language that tells it.
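The shape of that separation is easy to sketch (again hypothetically, reusing the toy planner above; Tale-Spin itself was far richer): the planner emits a bare list of actions, and a separate pass renders each action as prose.

; Map each planned action to a canned sentence. A real system
; would consult the world state; this is just the skeleton.
(define (render-action action)
  (case action
    ((get-dressed)   "She dressed in the dark.")
    ((eat-breakfast) "Breakfast was hurried.")
    ((drive-to-work) "Then she drove to work.")
    (else            "Something happened.")))

; Print one sentence per action in the plan.
(define (render-story actions)
  (for-each (lambda (a) (display (render-action a)) (newline))
            actions))

; (render-story (plan '(awake) '(at-work) 5))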
This was in keeping with much that was going on in literary studies at the time surrounding narratology—the study of narrative structure and of how a chronological sequence of events is re-mapped into a story (among other things).
And there I'll stop, for now.