
WordHoard The Classics

Page history last edited by Ryan Cadinha 15 years, 2 months ago

 

 

Research Report: WordHoard The Classics

 

By Ryan Cadinha of The Venetian Project Team

 


 

 

Abstract

 

     WordHoard is an extensively detailed text analysis tool designed for maximum scholarly efficiency without the need for tedious data collection.  The works of Chaucer, Shakespeare, and Spenser are compiled digitally, along with various early Greek epics in the original language and in English translation.  In short, lifetimes of critical work lie a single mouse-click away.  

 

Description 

 

     Created through a collaboration among several departments at Northwestern University, WordHoard is an excellent example of the future of liberal arts research.  The project was funded by a grant from The Andrew W. Mellon Foundation, an organization that supports multidisciplinary approaches to collegiate study.  The humanities are on the cusp of achieving the kind of societal validity usually reserved for the quantitative sciences and mathematics.  This complex tool, a sort of big brother to the TagCrowd engine, is a prime example of how such respectability will come to pass.  TagCrowd is a site that lets one paste any portion of a text into a text box.  The application then visualizes word frequency as what is known as a tag cloud: the largest words in the visual cluster correspond to the most frequent words in the text, and a notation beside each word indicates how often it occurs in the provided sample.  The TagCrowd site is clean, well built, and functions well, but its scope is quite limited.  WordHoard can chart word frequency and even supplies the texts themselves, eliminating hours of data collection.  Although the two are not directly affiliated, one can see how WordHoard could be called an older sibling of TagCrowd.  After all, tag clouds are almost a side note on the WordHoard site, but the usefulness of both applications lies in their ability to tally the occurrences of words.  TagCrowd's only real advantage over WordHoard is the ease with which it turns any pasted text into a word cloud.  Although visually interesting, these artifacts of modern philological study lack any robust scholarly value.  
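
     The counting-and-sizing idea behind a tag cloud can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TagCrowd's actual code: the stop-word list, the regular expression used to split words, and the linear 10 to 40 point font scale are all hypothetical choices made for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; TagCrowd's own filtering rules are not described here.
STOP_WORDS = {"the", "and", "a", "of", "to", "i", "you", "is", "it", "in"}


def word_frequencies(text: str) -> Counter:
    """Lowercase the text, split it into word tokens, and count every non-stop word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS)


def cloud_sizes(freqs: Counter, top_n: int = 50,
                min_pt: int = 10, max_pt: int = 40) -> dict[str, int]:
    """Map the top_n most frequent words to font sizes, scaled linearly by count."""
    top = freqs.most_common(top_n)
    if not top:
        return {}
    hi = top[0][1]   # count of the most frequent word shown
    lo = top[-1][1]  # count of the least frequent word shown
    span = hi - lo or 1  # avoid dividing by zero when all counts are equal
    return {
        word: round(min_pt + (count - lo) * (max_pt - min_pt) / span)
        for word, count in top
    }


sample = "Mercy, mercy, mercy; the quality of mercy is not strained. The rain, the gentle rain."
sizes = cloud_sizes(word_frequencies(sample))
print(sizes["mercy"], sizes["rain"], sizes["gentle"])  # prints: 40 20 10
```

     The most frequent word always receives the largest size, which is exactly the property the paragraph above describes; a real tool might instead use logarithmic scaling so that a handful of very common words do not dwarf everything else.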

     Northwestern’s design is far better suited to serious academic work.  Each of the canonical works in the digital library has undergone lemmatization, a process whereby every word, in all its various forms, is grouped together and filed under its lemma.  In philological terms, a “lemma” is the base or dictionary form of a word.  For example, drove, driving, and driven share the lemma “drive.”  The site’s authors refer to this concept as “deep tagging,” and it is astounding to consider that such a small group of individuals deep tagged so large a collection of canonical works.  To extend the family analogy, WordHoard is the great-grandson of the Biblical concordance, allowing for the same in-depth analysis of word usage.  Nor is it limited to frequency counts and concordance-style searches.  With WordHoard, one can compare word usage among all the characters in a single author’s work, among characters across multiple works, among characters grouped by gender, and even among the authors themselves.  Such rigorous philological analysis was not impossible in the past, but it was usually reserved for religious texts, since no other documents seemed to justify sacrificing entire lifetimes of labor.
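
     Lemmatized counting, and the comparisons by character or by gender that it makes possible, can be sketched as follows. WordHoard's lemmas were assigned by scholars through deep tagging rather than computed, so here a tiny hypothetical lookup table (LEMMAS) stands in for that work. The speakers, lines, and gender labels in the usage example are invented placeholders, not text from any play.

```python
from collections import Counter, defaultdict
from typing import Callable, Iterable

# Hypothetical hand-made lemma table standing in for deep tagging:
# each inflected spelling maps to its lemma (dictionary headword).
LEMMAS = {
    "drive": "drive", "drives": "drive", "drove": "drive",
    "driving": "drive", "driven": "drive",
    "love": "love", "loves": "love", "loved": "love", "loving": "love",
}


def lemmatize(word: str) -> str:
    """Return the lemma for a spelling, or the lowercased word if it is not in the table."""
    w = word.lower()
    return LEMMAS.get(w, w)


def lemma_counts_by_group(lines: Iterable[tuple[str, str]],
                          group_of: Callable[[str], str]) -> dict[str, Counter]:
    """Count lemmas per group.

    lines    -- (speaker, text) pairs
    group_of -- maps a speaker to a group label (the speaker itself, a gender, ...)
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for speaker, text in lines:
        for word in text.split():
            # Strip trailing/leading punctuation before looking up the lemma.
            counts[group_of(speaker)][lemmatize(word.strip(".,;:!?"))] += 1
    return counts


# Invented placeholder data, used only to exercise the functions.
lines = [
    ("Speaker A", "She loved him and drove on."),
    ("Speaker B", "He loves driving."),
    ("Speaker A", "Love driven hard!"),
]
gender = {"Speaker A": "female", "Speaker B": "male"}
by_speaker = lemma_counts_by_group(lines, lambda s: s)
by_gender = lemma_counts_by_group(lines, gender.get)
print(by_speaker["Speaker A"]["drive"], by_gender["male"]["love"])  # prints: 2 1
```

     Because "drove" and "driven" both collapse to "drive", the grouped counts measure how often a speaker uses the word in any form, which is what makes character-to-character or gender-to-gender comparisons meaningful.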

     The WordHoard tool allows a novice to make highly advanced observations.  The caveat is that the program is laden with jargon and has a steep learning curve.  The comforting side of this paradox is that the tool can do things one has yet to think of, making its possible applications seemingly endless with respect to the works it contains.

 

 

Commentary  

 

     While I was using TagCrowd to determine word frequencies for every character in William Shakespeare’s The Merchant of Venice, it became painfully obvious that a much broader program was required to achieve a thorough synthesis.  For example, I envisioned an application that could take each Merchant character’s lines, produce a word frequency analysis for them, and then compare each analysis against all of Shakespeare’s other works, or perhaps contrast it with modern plays in production.  In class, Professor Liu pointed out that such a tool did exist, and that it was in fact linked in the toy box on the class website.  In essence, WordHoard cuts the busywork out of word frequency discovery and raises the bar, allowing for in-depth philological study.  Separating, sorting, cutting, and pasting each character’s dialogue into TagCrowd took many hours, and counting word frequencies by hand with pencil and paper might have taken weeks.  Beyond that, without tools as elaborate as WordHoard, comparing The Merchant of Venice with other dramatic works, or with the English language as a whole, would be practically impossible.  In his book American Architects and Texts, Juan Pablo Bonta provides a potent example of how a writer may step outside the box of familiar, time-honored study and still cover his tracks.  The essential ingredient is quantified data.  Through charts and graphs representing the intersection of literature and architecture, Bonta crafts convincing arguments about the evolution of America’s cities.  This method cuts to the heart of cross-disciplinary study and provides a framework for such departures. 
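
     Comparing one character's word counts against a larger body of text is usually done with a statistic rather than by eye. One standard choice in corpus linguistics is Dunning's log-likelihood, G² = 2 Σ O ln(O / E), where O is a word's observed count in each text and E is the count expected if both texts used the word at the same rate. The sketch below assumes that statistic; it is not WordHoard's implementation, and the function names, the "+"/"-" over/under-use marker, and the toy counts in the usage example are hypothetical.

```python
import math
from collections import Counter


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) for a single word.

    a -- occurrences in the analysis text (e.g. one character's lines)
    b -- occurrences in the reference corpus
    c -- total words in the analysis text
    d -- total words in the reference corpus
    """
    e1 = c * (a + b) / (c + d)  # expected count in the analysis text
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:  # a zero observed count contributes nothing (0 * ln 0 -> 0)
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def compare(analysis: Counter, reference: Counter, top_n: int = 10):
    """Rank words by G2; '+' marks over-use in the analysis text, '-' under-use."""
    c, d = sum(analysis.values()), sum(reference.values())
    if c == 0 or d == 0:
        raise ValueError("both texts must contain at least one word")
    rows = []
    for word in analysis.keys() | reference.keys():
        a, b = analysis[word], reference[word]
        direction = "+" if a / c > b / d else "-"
        rows.append((word, log_likelihood(a, b, c, d), direction))
    rows.sort(key=lambda r: r[1], reverse=True)  # most distinctive words first
    return rows[:top_n]


# Toy counts, invented for illustration only.
character = Counter({"ducats": 5, "the": 5})
reference = Counter({"ducats": 1, "the": 9})
for word, g2, direction in compare(character, reference):
    print(f"{word:10} {g2:6.2f} {direction}")
# prints "ducats" (G2 about 2.91, +) first, then "the" (about 1.16, -)
```

     A larger G² means the difference in usage is less likely to be chance, so the top of the ranked list is where a critic would start reading; the statistic is preferred over raw percentages because it accounts for how much text each side contains.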

     The Venetian Project was the product of two false starts.  As seems common whenever a group creates something together, there were issues of desired scope, personal preference, and familiarity with particular works or with the discipline of English itself.  Group two’s original concept was a three-dimensional representation of Dante’s Inferno.  When similar projects were found to exist online, the idea was abandoned in favor of a more scientific approach.  An online database of Shakespeare’s works was found, and the group decided to analyze the word frequency of each character in The Merchant of Venice and then graph the results in Flash.  Once graphed, each character’s frequencies would be given an explanation, or “reading,” to be presented alongside the animated graphs.  While this method seemed interesting and aesthetically pleasing, I feared that a simple collection of word usage data with explanatory notes would not earn the quantitative respect described above.  Discovering WordHoard opened the door for amateurs to produce results comparable to those of professional critics.  The true test of a modern scholar is how well he can use these brilliant tools.  I must admit that I am still somewhat at a loss as to what to do with all the information WordHoard provides.  I am confident, however, that an interesting, visually pleasing, scholastically respectable, readily usable, and publicly accessible site dealing with quantitative criticism of The Merchant of Venice is realistically attainable. 

 

Resources for further study    

 

1. Bonta, Juan Pablo. American Architects and Texts. Massachusetts: The MIT Press, 1996. 

 

2. "Dante's Inferno: A Virtual Tour of Hell." 15 Jan. 2009 <http://web.eku.edu/flash/inferno/>.

 

3. "Full text - script of the play Merchant of Venice by William Shakespeare." William Shakespeare. 1 Feb. 2009 <http://www.william-shakespeare.info/script-text-merchant-of-venice.htm>.

 

4. "FusionCharts Free - Animated Flash Charts and Graphs for ASP, PHP, ASP.NET, JSP, RoR and other web applications." FusionCharts v3 - Animated Flash Charts & Maps for web applications. 8 Feb. 2009 <http://www.fusioncharts.com/free/>.

 

5. TagCrowd. 10 Feb. 2009. 7 Oct. 2007. 5 Feb. 2009 <http://tagcrowd.com/>.

 

6. WordHoard. 18 Feb. 2009 <http://wordhoard.northwestern.edu/userman/index.html>.

 
