Textual Analysis: The “Text” Part

Another marvelously instructive post from HASTAC scholar Tobias Hrynick, Department of History, Fordham University. Here he outlines some systematic tips and guidelines for creating a digitized version of a text. 

The digital analysis of text constitutes the core of the digital humanities. It is here that Roberto Busa and his team began, and though subsequent scholars have expanded somewhat, exploring the possibilities of digital platforms for applying geographic analysis or presenting scholarship to wider audiences, the humanists’ interest in text has ensured the growth of a healthy trunk directly up from the root, along with the subsequent branches.

Necessary for all such projects at the outset, however, is the creation of a machine-readable text, on which digital analytical tools can be brought to bear. This process is generally more tedious than difficult, but it is nevertheless fundamental to digital scholarship, and a degree of nuance can be applied to it. What follows is intended as a basic introduction to some of the appropriate techniques, intended to highlight useful tools, including some (such as Antconc and Juxta) which also have powerful and user-friendly analytic functionality.

Acquiring Text

The appropriate way of acquiring a machine-readable text file (generally a .txt file, or some format which can be easily converted to .txt), and the difficulty involved in doing so, varies according to several factors. Often, digital versions of the text will already exist, so long as the text is old enough that the copyright has expired, or new enough that it was published digitally. Google Books, Project Gutenberg, and Archive.org all maintain substantial databases of free digital material. These texts, however, are all prone to errors – Google Books and Archive.org texts are generally created with a process of scanning and automated processing that is likely to produce just as many errors as performing this process yourself.

Such automated processing is called Optical Character Recognition (OCR). It requires a great deal of labor-intensive scanning if you are working from a print book – though a purpose-built book scanner with a v-shaped cradle will speed the work considerably, and a pair of headphones will do a great deal to make the process more bearable.

[Image: a purpose-built book scanner with a v-shaped cradle]

Once you have .pdf or other image files of all the relevant text pages, these files can be processed by one of a number of OCR software packages. Unfortunately, while freeware OCR software does exist, most of the best software is paid. Adobe Acrobat (not to be confused with the freely available Adobe Reader) is the most common, but another program, ABBYY FineReader, deserves special mention for its additional flexibility, particularly with more complicated page layouts, and for its free trial version.

As a quick glance through the .html version of any Archive.org book will confirm, the outcome of an OCRing process is far from a clean copy. If a clean copy is required, you will need to expend considerable effort editing the text.

[Image: uncorrected OCR output from a scanned page]

The other option is to simply re-type a given text in print or image format into a text editor – both Apple and Windows machines come with native text editors, but if you are typing at length, you might prefer a product like Atom or Notepad++. Neither of these platforms provides any crucial additional functionality, but both offer tabbed displays, which can be useful for editing multiple files in parallel; line numbers, which are helpful for quickly referencing sections of text; and a range of display options, which can make looking at the screen for long periods of time more pleasant. Alternatively, you can choose to type out text in a word processor and then copy and paste it into a plain-text editor.

Assuming there is no satisfactory digital version of your text already available, the choice between scanning and OCRing on the one hand, and manually retyping on the other, should be made with the following factors in mind:

  1. How long is your text?

This is important for two reasons. First, the longer a text is, the more the time advantage of OCR comes into play. Second, the longer a text is, the more acceptable individual errors within it become, which can sometimes make the time-consuming process of editing OCRed text by hand less critical. Digital textual analysis is best at making the kind of broad arguments about a text in which a large sample size can insulate against the effects of particular errors.

The problem with this argument in favor of OCR is that it assumes the errors produced will be essentially random. When OCR systems make mistakes, however, they are likely to make the same mistake over and over again – confusions between the letters i, n, m, and r, and various combinations thereof, are particularly common – and such errors are likely to cascade across the whole text file. A human typist might make more errors over the course of a text, especially a long text in a clear typeface, but those errors are likely to be more random, and a large sample size can more easily render them irrelevant.

That said, OCR should still generally be favored for longer texts. While automated errors can skew your results more severely than human ones, they are also more amenable to automated correction, as will be discussed in the next section.

  2. What is the quality of your print or image version?

Several features of a text which might cause a human reader to stumble only momentarily will cripple an OCR system's ability to render good text. Some such problems include:

  • A heavily worn type-face.
  • An unusual type-face (such as Fraktur).
  • Thin pages, with ink showing through from the opposite side.

If your text or image has any of these features, you can always try OCRing to check the extent of the problem, but it is wise to prepare yourself for disappointment and typing.

  3. How do you want to analyze the text?

Different kinds of study demand different textual qualities. Would you like to know how many times the definite article occurs relative to the indefinite article in the works of different writers? Probably, you don’t need a terribly high quality file to make such a study feasible. Do you want to create a topic model (a study of which words tend to occur together)? Accuracy is important, but a fair number of mistakes might be acceptable if you have a longer text. Do you intend to make a digital critical edition highlighting differences between successive printings of nineteenth century novels? You will require scrupulous accuracy. None of these possibilities totally preclude OCRing, especially for longer texts, but if you choose to OCR, expect a great deal of post-processing, and if the text is relatively short, you might be better served to simply retype it.

Cleaning Text

Once you have produced a digital text, either manually or automatically, there are several steps you can take to help reduce any errors you may have inadvertently introduced. Ultimately, there is no substitute for reading, slowly and out loud, by an experienced proof-reader. A few automated approaches, however, can help to limit the labor for this proof-reader or, if the required text quality is not high, eliminate the necessity altogether.

  1. Quick and Dirty: Quickly correcting the most prominent mistakes in an OCRed text file.

One good way of correcting some of the most blatant errors which may have been introduced, particularly the recurring errors which are common in the OCRing process, is with the use of concordancing software – software which generates a list of all the words which occur in a text. One such program is AntConc, which is available for free download, and which contains a package of useful visualization tools as well.

[Image: the AntConc interface]

Once you have produced a text file, you can load it using AntConc and click on the tab labeled Word List. This will produce a list of all the words occurring in the text, listed in order of frequency. Read through this list, noting down any non-words, or words whose presence in the text would be particularly surprising. Once you have noted down all the obvious and likely mistakes, you can correct them using the Find and Find-and-Replace tools in your preferred text editor.

This method of correction is far from fool-proof. Some subtle substitutions of one plausible word for another will likely remain. This is, however, a good way of quickly eliminating the most glaring errors from your text file.
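If you prefer scripting to a graphical tool, the same kind of word-frequency list can be generated with a few lines of Python. The following is only a minimal sketch, not a substitute for AntConc; the file name text.txt and the crude definition of a "word" are assumptions made for the sake of illustration.

    import re
    from collections import Counter

    # Read the digitized text (assumes a plain-text file named text.txt).
    with open("text.txt") as f:
        text = f.read().lower()

    # Treat any unbroken run of letters as a word; crude, but serviceable here.
    words = re.findall(r"[a-z]+", text)

    # List every distinct word in order of frequency, so that suspicious
    # non-words are easy to spot when reading down the list.
    counts = Counter(words)
    for word, count in counts.most_common():
        print word, count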

A similar effect can be achieved using the spell-check and grammar-check functions of a word processor, but there are several reasons the concordance method is generally preferable. First, reading through the list of words present in the text will tend to draw your attention to words which are real, but unlikely to be accurate readings in the context of the text, and which would therefore be overlooked by spelling and grammar checks. Second, a concordancer will present all the variants of a given word which occur in the text – through alternate spelling, use of synonyms, or varying grammatical forms (singular vs. plural, past vs. future) – which might be significant for your analysis.

  2. Slow and Clean: Cross-Collating Multiple Text Files

A more extreme way of correcting digitized text is to produce multiple versions and to collate them together. Because of the frequency of similar errors being repeated across OCR versions, comparing two OCR versions is of limited use (although if you have access to more than one version of OCR software, it might be worth trying). It is of greater potential use if you compare two hand-typed versions, or a hand-typed version and an OCRed version, which are much less likely to contain identical errors.

Cross-comparison of two documents can be accomplished even with the document-merge tools in Microsoft Word. A somewhat more sophisticated tool which can accomplish the same task is Juxta. This is an online platform (also available as a downloadable application) designed primarily to help produce editions from multiple varying manuscripts or editions, but it is just as effective as a way of highlighting errors which were introduced in the process of digitization.

[Image: a collation view in Juxta]

This process is a relatively thorough way of identifying errors in digitized text, and it can even identify variations that might escape the attention of human proofreaders. The major weakness of the technique, however, is that it requires you to go through the effort of producing multiple different versions, ideally including one human-typed version. If you need a scrupulously corrected digital text, however, it is a powerful tool in your belt, and in the event that multiple digital versions of your text have already been produced, it is an excellent way of using them in concert with one another – another strength of the Juxta platform is that you can collate many different versions of the same text at once.
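If you would rather compare versions programmatically, Python's standard difflib module can produce a rough version of the same kind of collation. A minimal sketch, assuming two plain-text files named version_a.txt and version_b.txt:

    import difflib

    # Read the two independently digitized versions of the text.
    with open("version_a.txt") as f:
        version_a = f.read().splitlines()
    with open("version_b.txt") as f:
        version_b = f.read().splitlines()

    # unified_diff reports only the lines on which the two versions disagree,
    # which is exactly where digitization errors are likely to be hiding.
    for line in difflib.unified_diff(version_a, version_b,
                                     fromfile="version_a.txt",
                                     tofile="version_b.txt",
                                     lineterm=""):
        print line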

Conclusion

Once you have a digitized and cleaned version of the text in which you are interested, a world of possibilities opens up. At a minimum, you should be able to use computer search functions to quickly locate relevant sections within the text, while at maximum you might choose to perform complex statistical analysis, using a coding language like R or Python.

A good way to start exploring some of the possibilities of digital textual analysis is to go back and consider some of the tools associated with AntConc other than its ability to concordance a text. AntConc can be used to visualize occurrences of a word or phrase throughout a text, and to identify words which frequently occur together. Another useful tool for beginners interested in text analysis is Voyant, which creates topic models – visualizations of words which frequently occur together in the text, which can help to highlight key topics.

Celebrating the Success of NYCDH Week

Something amazing happened in New York City last week. No, I’m not talking about this or this or this. Well…a lot of things happened in the city last week. I digress.

Last week marked the inaugural NYCDH Week.

If you live outside of the city (or in the city and under a rock), here’s what happened: NYCDH Week brought together cross-institutional scholars who engage in digital research, pedagogy, and publication to share their projects and learn and teach digital tools. Over 150 academics and members of the GLAM community shared their expertise and enthusiasm for digital scholarship. The week kicked off with an afternoon of lightning talks from graduate students, professors, archivists, and librarians. This experience was incredible. I still tell friends about Ellen Hoobler’s Digital Zapotec. And I’m eager to track Caroline Catchpole and her team’s Culture in Transit on Twitter. I was grateful to share my own project, The U.S. Goes Postal, (and receive invaluable feedback and support). My friend and colleague, Boyda Johnstone, gave a presentation on the importance of online communities and academic blogging:

[Image: a slide from Boyda Johnstone's presentation on academic blogging]

Each of the presentations showed how digital tools can animate humanistic study. Seeing such an array of good work was encouraging, inspiring, and energizing for all involved. Following the lightning talks were panel and roundtable conversations about the past, present, and future of the digital humanities in NYC with Matt Gold, Jennifer Vinopal, Micki McGee, and many others. Of course, we rounded out the day with a happy hour. Here, Margaret Galvan, Eilleen Clancy, and I (three of this year’s NYCDH Graduate Student Award Winners) compare notes over the din of DH networking at the Digi-bar:

[Photo: the NYCDH Graduate Student Award winners at the happy hour]

Following the Kick-off Event was a week-long series of workshops hosted at campuses, museums, and libraries across NYC. You could visit NYPL for a Digital Maps Primer, Fordham hosted a Typography workshop, and there were a couple of Git workshops at Columbia. The full range of the program was enough to fill your digital toolbox for years to come.

[Image: the NYCDH Week workshop schedule]

I’m not recounting NYCDH Week purely out of nostalgia or to incite FOMO (although, sheesh, you really should have been there!). I am sharing the success of last week’s event because it can be a model for what organizer Alex Gil calls a “low effort, high impact” event. NYCDH Week was successfully implemented with next to no budget. The event was a success because a large group of people pooled their academic and intellectual resources and donated their time to lead workshops and coordinate the events.

The nitty-gritty: when the steering committee (a group of volunteers) put out the call for workshops, presenters had to arrange for their own space (although assistance was given to presenters from outside NYC). Then, presto! All the committee needed to do was confirm with the presenters, loosely arrange the schedule, put it all online (on the site Jesse Merandy built), and advertise the event through any channels at their disposal.

To call NYCDH Week a success would be an understatement. Over the course of a week, I made new connections and reunited with friends, planned future projects, expanded my digital repertoire, and even gave my own tidbits of advice here and there. NYCDH Week fostered the best kind of academic community–one built on mutual respect, generosity, and intellectual inquiry. If you want to learn more about the event, check out the website or #NYCDHWeek on Twitter.

NYCDH Week!

Join us for the first annual NYCDH Week from February 8-12th for a celebration of all things DH in New York City! Digital humanists from Fordham, Columbia, NYU, NYPL, and other area academic and cultural institutions will come together to learn from one another and collaborate across our institutional divides. The program includes networking sessions, social events, and open workshops offered across the city. The first NYCDH Week promises to be full of great experiences for novices and experts alike.

What’s happening at Fordham?

  • The central event of NYCDH Week is an afternoon of networking, lightning talks, and panels held at our very own Fordham Lincoln Center campus. This meeting will take place on Tuesday, February 9th beginning at 12:30 and will be followed by a social outing at a nearby bar.
  • Tobias Hrynick (Fordham Medieval History) is leading an “Introduction to Omeka” workshop on Thursday from 9:30-11:30 at Lincoln Center. This session is intended to equip beginners with sufficient knowledge of Omeka to assess whether it is appropriate for their particular projects, and to describe some resources which they might use to unravel any problems they encounter with the system in the future. (See the NYCDH Week schedule for a full description of Toby’s workshop.)
  • Amy Papaelias (SUNY New Paltz) will shed light on “Typography for [Digital] Humanists” at Fordham Lincoln Center at 10am on Friday. This workshop will provide an overview of basic typographic principles and will focus specifically on issues related to typography for [digital] humanists, such as typeface selection for digital projects, web typography tools, and typography for UI/UX design.

 


Just Enough Python to Get You Into Trouble

Follow along with Tobias Hrynick, Fordham HASTAC Scholar and medievalist extraordinaire, for an incredibly helpful and detailed Python tutorial! If you want to save this material for a future blizzard weekend, download the PDF version: Hrynick Python Trouble.

1. Introduction

Just as the digital humanities remain to some extent separated from non-digital scholarship (though progress is being made), digital scholarship involving active scripting remains separated from that involving manipulation of pre-programmed consoles. This separation is not a function simply of difficulty: basic scripting is relatively easy, while the manipulation of popular consoles like ArcGIS mapping software or the Omeka content management system can be wildly infuriating. Console-based systems, however, tend to be clearly and obviously oriented to particular purposes – a scholar working with geographical data is likely to persistently run up against Geographic Information Systems, until they finally break down and try to learn one. The sheer number of broadly similar programming languages and the breadth of possible uses for each make the decision to learn any particular one spring less automatically out of a particular humanities research topic.

Second, and more importantly, scripting languages tend to resist initial attempts at experimental learning. Consoles offer buttons to push. Knowing the right buttons to push is not always easy – but in many senses this doesn’t matter. At first, push all the buttons! Something is bound to happen, and one can combine observations of what these somethings are into some level of working knowledge. Scripting platforms offer no buttons. It is not just that typing things in and seeing what happens will not work – pressing random buttons on a console generally won’t work either. It is that initial trial and error in scripting will generally fail in uninteresting ways.

The first problem can be solved simply by a willingness to devote some time to some scripting language, even if you randomly select one from a list, on the assumption that it might be helpful eventually. The odds are in your favor, and the same flexibility which makes the choice between popular scripting languages difficult makes the decision less important. This blog entry is intended to help address the second problem, for the language (*dice rattling*) Python – a flexible programming language which has been particularly popular for text analysis. It is intended to demonstrate some basic Python commands, point out some of the things which Python could be used for, equip readers with an understanding sufficient to make the program fail to work in more interesting ways, and provide some basic vocabulary which will make it easier to search for the answers to problems which arise.

 

2. Basic Python

In general, it is better to learn Python by trying to solve problems with it. However, trying to do this without some base-level knowledge is more frustrating than enlightening. The following is a very brief and partial description of Python, intended to help you reach the trial-and-error phase with a minimum of psychological trauma.

Python is a system of constructing instructions in a text editor which a computer can understand and carry out. The name is employed in multiple, slightly different ways – it can refer either to the language in which these instructions are written, or, more loosely, to the system of software which allows them to be implemented (as in the ever-popular remark, “Python just did something weird…”).

Using Python on your computer will require two different programs. The first is a plain-text editor for writing instructions: this can be any text editor already on your machine, but it is probably easier at first to download a text editor specifically designed to facilitate the creation of Python code. The second necessary program is an interpreter (or shell) – this lets the computer understand and execute the instructions you give it. Both of these can be downloaded together – many other options exist, but IDLE is fairly user-friendly, popular (a boon for finding tutorials), and provides a good common point of reference.

It is also important to know that there are several versions of Python in active use, with a basic division between Python 2 and Python 3. Python 3 is likely to become increasingly popular, but because it is not backwards-compatible with Python 2, many users have been reluctant to switch. The continuing preference for Python 2 means that the resources available online for learning it are generally better, which is a strong argument in favor of the older version for a beginner. The information here is for Python 2 (2.7 specifically).

Once you install a version of IDLE, open it and you will be brought to the Python shell. From here, you can run longer scripts, or you can enter single commands, and see how Python evaluates them one step at a time.

[Screenshot: the IDLE Python shell]

3. Data Types

Python is structured around data elements which can all be manipulated in broadly similar ways. However, Python data exist in a number of different classes, which affects some of the particular operations which can be conducted with them. The following are some of the most important types:

  • int (integer): Integers are positive, negative, or zero whole-number values. These can be manipulated using ordinary mathematical operators (like +, -, *, and /), but any operation which would render a non-whole number will be given as a truncated whole number value (so 3/2 becomes 1). Another useful operator is %, which will output the remainder of a division (so 4%2 outputs 0, while 5%2 outputs 1).
  • float (floating point number): A more precise kind of number, though still not a perfectly exact one. Python will understand 3.0 or 4.987876890 as floating point numbers, and will perform division rendering a decimal answer which is adequately precise for the vast majority of uses.
  • str (string): A series of symbols, understood as symbols rather than as any content they might convey. Python understands as strings symbols which are enclosed in either single or double quotation marks. “1” would be a string, as would the entire text of Hamlet, so long as it was enclosed in quotation marks, but 1, with no quotes, would be understood as an integer.
  • list: a meta-set of data elements, signified by square brackets, with the items within it separated by commas, as in [1, “fred”, 3.14, [1, 2, 3], “zebra”]. As shown in the example, the items within a list can be integers, floats, strings, or even other lists. Note that several other broadly list-like classes of data exist: dictionaries and tuples. We will ignore these for the moment, however.

Once you have defined a variable as a list, you can call particular elements from the list, using the command listname[itemnumber], remembering that items within a list are numbered from 0 rather than from 1. So if the example list above were called list1, list1[0] would render 1, list1[1] would render “fred,” and so on. The command len(), with the name of a list within the parentheses, will give the number of elements (length) of the list – len(list1) would give 5.

Commands int(), float(), str(), and list() all exist to convert values to the appropriate type. Try out some commands in the shell – list('hello'), int(1.3), list(str(546+89)), and similar – to test the limits of these commands. Note, for example, that the int() command is smart enough to convert “1” to 1, but not to convert “one” to 1.
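A few commands of this kind, entered one at a time at the shell prompt, illustrate the behavior described above (the >>> is the prompt the shell supplies; the line beneath each command is the value the shell reports back):

    >>> 3/2          # integer division truncates
    1
    >>> 3.0/2        # a float anywhere in the expression produces a float answer
    1.5
    >>> 5%2          # the remainder operator
    1
    >>> list1 = [1, "fred", 3.14, [1, 2, 3], "zebra"]
    >>> list1[1]     # items are numbered from 0, so this is the second item
    'fred'
    >>> len(list1)
    5
    >>> int("1") + 2
    3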

 

4. Variables

Variables are names to which particular values are assigned. Python will generally interpret any string of characters which is not marked out as one of the types above as a variable name, and will react with a confused error message if you enter such a name without previously defining it – note the difference in the shell's reaction when you input a name like “James” with and without quotation marks.

[Screenshot: the shell's response to James entered with and without quotation marks]

Variables are assigned values using the = sign.

[Screenshot: assigning values to variables in the shell]

Try assigning values of various types to variables of your choosing, and then having the shell evaluate them by re-entering the name, as above. Note that while you can use numbers in variable names (e.g. “variable1”), a variable name cannot consist of a number alone, or begin with one (preventing confusing commands like 1=8).
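Since the original screenshots are not reproduced here, a brief shell session along these lines might look like the following (the error message is abbreviated; the shell actually prints a longer traceback):

    >>> James              # an undefined name produces an error
    NameError: name 'James' is not defined
    >>> "James"            # the same characters in quotation marks are simply a string
    'James'
    >>> variable1 = 42     # assigning a value with =
    >>> variable1          # re-entering the name makes the shell report its value
    42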

 

 

5. Some More Basic Commands and Operations

 

  • print: The print command causes something within a Python script to be displayed to the user. Note that the automatic evaluation of variables which occurs in the Python shell does not occur when the shell is running a full script – Python will only display what the script explicitly tells it to. print 'x' will cause Python to display x. print x will cause Python to display the value of the variable x, if one has been assigned, or to give an error message if one has not.
  • input(): The input command prompts the user by printing whatever is contained within the parentheses. The normal way of using this command would be to tie it to a variable, as in age=input('What is your age?'). A similar command is raw_input(), which treats whatever the user types as a string.
  • if: The simplest of Python’s conditional statements. The if-statement evaluates whether a given statement is true or false, and performs the succeeding block of code only if it is true. The condition must be followed by a colon, and all the things which are carried out only if the statement is true must be indented. The indentation can be done either with spaces or tabs, but all indented elements must be indented the same amount.

 

There are several ways of making statements which Python can evaluate to be true or false. The simplest is a numerical evaluation, using one of the following operators:

== is equal to (distinguished from a single =, which declares a variable equal to something rather than checking whether it is)

< is less than

> is greater than

<= is less than or equal to

>= is greater than or equal to

!= is not equal to

 

So, a simple section of script might run:

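The screenshot that originally appeared here is not reproduced; a minimal script in the same spirit (the particular numbers and prompt wording are assumptions) might read:

    # Ask the user for a number and comment on its size.
    number = input("Please enter a number: ")
    if number > 100:
        print "That is a big number."
    if number <= 100:
        print "That is a small number."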

 

 

If you want to try running this script, by the way, select “New File” from the File menu at the top of your Python shell, enter the text of the script in the text editor window which opens, save the file, and click “Run Module” on the “Run” menu.

 

 


Another useful evaluative operation, relevant to lists, is the command “in,” which tests whether a given data element is in a particular list. For instance, a program which tested whether a letter were upper case might run as follows – note that the script also demonstrates the use of an else statement, which allows you to create a block of code which will activate only if the if-statement is not true.

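The original screenshot is likewise not reproduced; a sketch of such a script (the exact prompts and messages are assumptions) might read:

    # Test whether a single letter typed by the user is upper case.
    uppercase = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
                 "N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"]
    letter = raw_input("Please enter a letter: ")
    if letter in uppercase:
        print "That letter is upper case."
    else:
        print "That letter is not upper case."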

 

 

Note that lists can be spaced out over multiple lines like this if it is more convenient, though they may also be put on a single line – the commas are necessary in either case.

 

The preceding information is relatively bare-bones. However, it is already enough to do a great deal. The following section will illustrate a problem which can be solved entirely using the above commands. Some descriptions of commands will be reiterated in the narrative below, as reminders, but you may want to refer back up to this section if you are having difficulty understanding particular commands.

 

 

6. Sample Program

 

Note: Commentary in this section will be written in blue to distinguish it from code.

 

Let us suppose that you wanted to have a program that could rapidly translate between Centigrade and Fahrenheit degrees of temperature. How could you go about this, using the above commands (bearing in mind that the relevant equation is C = (5/9)(F - 32))? If you are feeling reasonably confident about the information above, try scripting your own. If you would rather not try that yet, look through the script below and see if you can follow it.

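The script originally appeared as a screenshot; the following is a sketch consistent with the commentary below (the variable names corf, degreesc, and degreesf are taken from that commentary, but the wording of the prompts is an assumption):

    # A sketch of a Centigrade/Fahrenheit converter, using only the commands above.
    print "This program converts temperatures between Centigrade and Fahrenheit."
    print "Enter c to convert from Centigrade to Fahrenheit,"
    print "or f to convert from Fahrenheit to Centigrade."
    corf = raw_input()

    # First block: the user has a Centigrade value and wants Fahrenheit.
    if corf == "c" or corf == "C":
        print "Please enter the temperature in degrees Centigrade."
        degreesc = float(raw_input())
        degreesf = degreesc * (9.0 / 5.0) + 32.0
        print "Degrees Fahrenheit="
        print degreesf

    # Second block: the user has a Fahrenheit value and wants Centigrade.
    if corf == "f" or corf == "F":
        print "Please enter the temperature in degrees Fahrenheit."
        degreesf = float(raw_input())
        degreesc = (degreesf - 32.0) * (5.0 / 9.0)
        print "Degrees Centigrade="
        print degreesc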

 

 

In case you were having trouble following that, let’s go through it again section by section.

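The opening lines of the sketch above:

    print "This program converts temperatures between Centigrade and Fahrenheit."
    print "Enter c to convert from Centigrade to Fahrenheit,"
    print "or f to convert from Fahrenheit to Centigrade."
    corf = raw_input()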

 

 

Here, we are printing out instructions for the user, and requesting a user input which we will store in the variable “corf,” which the program can later use to determine which conversion it needs to make. We use the “raw_input()” command to ensure that the input will be treated as a string.

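The first conversion block of the sketch above:

    if corf == "c" or corf == "C":
        print "Please enter the temperature in degrees Centigrade."
        degreesc = float(raw_input())
        degreesf = degreesc * (9.0 / 5.0) + 32.0
        print "Degrees Fahrenheit="
        print degreesf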

 

The if-statement in the first line tests to see if the user input was either “c” or “C” – it is good to check both upper and lower case because Python is case sensitive, and users are unpredictable. The second line prints an instruction for the user to input a number of degrees, which the third line prompts for, and stores in the variable “degreesc.” This line also uses the command float() around the raw_input() command, to convert the input from a string into a floating point number.

The fourth line, degreesf=degreesc*(9.0/5.0)+32.0, calculates the equivalent value in degrees Fahrenheit of the input in degrees Centigrade. It is important to include the “.0” on the numbers, because Python interprets whole numbers without “.0” as integers, which it divides by truncating everything after the decimal point (so that 9/5 would yield 1, which would throw off the value considerably).

The final two lines print first an identifier (“Degrees Fahrenheit=”) for the user's benefit, and then the actual value in degrees Fahrenheit, which the program earlier stored in the variable “degreesf.”

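The second conversion block of the sketch above:

    if corf == "f" or corf == "F":
        print "Please enter the temperature in degrees Fahrenheit."
        degreesf = float(raw_input())
        degreesc = (degreesf - 32.0) * (5.0 / 9.0)
        print "Degrees Centigrade="
        print degreesc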

 

The second block of code does for the choice of Fahrenheit to Centigrade exactly what the first block did for the choice of Centigrade to Fahrenheit. This kind of repetition with slight variation is extremely common across many kinds of script.

 

 

7. Going Forward

 

Get all that? Well, if you have never looked at a Python script before, maybe not. But that’s fine. With luck, you will have gotten enough to start messing around, if you want to. If you do, here are a couple of problems which you might try to solve, to get used to the language.

 

  1. Try to modify the above script, or write a whole new script, which can also handle conversions to and from degrees Kelvin.
  2. At the moment, if a careless or malicious user entered a letter other than C or F in the above script, the program would simply fail to respond. Try to modify the above script to produce an error message if this occurs.
  3. Try to write a script which can convert between meters, centimeters, feet, and inches.

 

Or, better still, just do something else entirely, even if it is wholly trivial. Once you can use a scripting language to solve trivial problems, it is only a matter of time before you start using it to solve significant ones, and shifting from strictly mathematical manipulation to manipulation with greater applicability for the study of the humanities.

 

 

Thanks to Renee and Steve Symonds for programming advice on aspects of this entry – readers may be assured that the errors, however, are entirely mine.

Blog Post By: Tobias Hrynick, HASTAC Scholar

 

Social Media and Collaboration in the Digital Age

As the year draws to a close, and the Fordham Graduate Student Digital Humanities group looks forward to another semester of workshops, talks, and wrestling with computers, we realize that we somehow forgot to blog about one of our most successful events. So, here is our ‘better late than never’ post!

On Monday, November 9, fifteen or so students gathered for a panel on Social Media and Collaboration in the Digital Age, presided over by Erin Glass (Digital Fellow and PhD student in English at the CUNY Graduate Center), Evan Misshula (PhD student in Criminal Justice at the CUNY Graduate Center), and Boyda Johnstone (PhD Candidate in English and Campus Digital Scholar at Fordham University). Each presenter took a different approach to the theme, though the panel worked together cohesively as an invigorating introduction to the collaborative possibilities offered by digital technology and social networking sites.

Glass introduced us to her NEH-funded project Social Paper, which she describes as a “site of radical potential” for student writing. The project takes as its basis the acknowledgement that most undergraduate writing is ephemeral and read by almost no one, while network-writing might help students write more, and better. Ongoing feedback and evaluation from a small group of peers is more useful for intellectual development than one-off feedback from a teacher at the end of a project. Moreover, ethical problems arise when teachers publish student writing on public course blogs, creating an archive of work that could feasibly last forever. Social Paper, in response, is a cloud-based networking writing environment that grants students full control over their own privacy settings, facilitating archived peer commentary for multiple courses, and helping students become invigorated and inspired by peer observance and critique. The “egalitarian peer pedagogy” of this project fosters student empowerment and a culture of healthy accountability over and responsibility for one’s work. Social Paper officially launched in December, and Glass hopes that it will soon become available to institutions beyond CUNY.

Misshula’s talk aimed to render complex technological tools more accessible to students who don’t come from computer science backgrounds, observing that, as he says, anyone can write computer programs! A list of free programming manuals can be found here, and other resources to which Misshula introduced us include The Programming Historian, which provides an introduction to Python, and the website Hack*Blossom, engaged in issues of cybersecurity, feminism, tech, and history. He contends that more communication and collaboration between digital humanists and computer scientists would be mutually beneficial. Offering himself as a resource for those in the audience who had ideas for digital projects, Misshula used the Mozilla-powered tool Etherpad to allow participants to communicate with one another and share tools in real time.

Johnstone moved the conversation more into the realm of social media with her talk, “Using Twitter as a Professional Tool.” It is now widely agreed that Twitter can be a useful resource for collaboration, networking, and the sharing of ideas, and for those just starting out in the academic Twitter world, Johnstone shared some advice about who to follow:

[Image: a slide listing recommended accounts to follow]

She also introduced us to the application TweetDeck, which attempts to remedy the linear, rapid progression of Twitter by allowing users to create and maintain multiple columns of feed:

[Image: a TweetDeck display with multiple columns]

One of the cool things about TweetDeck is that users can assemble tweets based around individual interests; so, for instance, Johnstone has created a column for tweets that pertain to the topic of her dissertation, on medieval dreams. She ended her talk by offering some advice on proper Twitter etiquette at academic conferences, based on her previous blog post published here: always ensure each tweet contains attribution for its ideas, treat tweeting as a conversation rather than a monologue, be aware of the physical space you inhabit while tweeting at a conference panel, and try not to sacrifice complexity for simplicity.

We at FGSDH look forward to another year of open-source, interdisciplinary collaboration, pedagogical enhancement through online tools, and digital project-building!

 

 

 

Preparing for the Future of DH

♫ Christmas is coming, the goose is getting fat. ♫
♫ Please put a penny in the old man’s hat. ♫
♫ If you don’t have a penny–

You are probably a graduate student in the humanities.

If you are a graduate student in the humanities, you probably have some level of anxiety about the academic job market. If you’re early on in your program, that anxiety might look something like this:

[Image]

or if you’re in the fifth circle–ah, excuse me, fifth year–the job market might make you feel more like this:

[Image]

In the spirit of the holiday season, the FGSDH Group aimed to ease this job-market-induced anxiety by offering a “DH For the Job Market” panel with Professor Angela R. Bennett Segler (University of Nevada, Reno) and Professor Jean Graham (SUNY Stony Brook). Professors Segler and Graham, recent hires in the digital humanities, offered words of advice for tackling the job-market challenges of the twenty-first century.

If you missed the event, don’t despair, dear graduate student! Here are several helpful tidbits from the panel:

  • Read between the lines of a job-listing: some institutions are happy with a DH-curious candidate, others want a full-fledged DH-wizard.
  • Make friends with techies. Most DH work thrives from interdisciplinary collaboration.
  • If you get an on campus interview, ask who will be there–your answers should be catered to this audience: folks in IT will want a different perspective on your work than people in the humanities.
  • Learn one programming language (even if you won’t use it all the time).
  • Be ready to give an opinion on innovative teaching during your interview.
  • When you are asked about future projects/research/work within a specific university, think about what makes that university stand apart–think outside of your discipline–think about the identity of the university. Give an answer that takes into account the full range of work within the university.
  • DH Positions open up year-round and aren’t always listed in traditional venues. Digital job postings can be found at Digital Humanities Quarterly, Digital Humanities Now, HERC, and University Affairs (if you’re willing/desperate to relocate to Canada).

In the end, Professors Segler and Graham dialed down the hellfire of the job market and encouraged a brighter perspective on academic work. After all, as Professor Graham emphasized, the digital humanities is simply the humanities (which is the best answer to the standard DH job question everyone asks). As academics, we can enhance our research with well-selected digital tools. We can help our students think critically about, and write effectively in, the world around them.

Within this drive toward new literacies and digitization, however, we shouldn’t lose sight of the core of humanistic inquiry. If we are to understand our role as humanists (as we established in a previous workshop with Alex Gil) to be “stewards of human memory,” digital tools can help us arrange, collect, and share this “memory” in new ways. A favorite take-away from this panel is the idea that “understanding James Joyce can teach us more about the Digital Humanities than the Digital Humanities can teach us about James Joyce.” Digital scholarship can work in surprising, multidirectional ways.

In Media Res #2: An Hour-Long Tour of the Digital Humanities in New York City

Blog post by Tobias Hrynick

On Tuesday, November 19th, the New York City Digital Humanities group held their second Media Res session at New York University’s Bobst Library. The hour-long session was packed with twelve five-minute flash presentations, interspersed with two question-and-answer periods. Speakers came from Fordham, NYU, the CUNY Graduate Center, Columbia, and Stony Brook, and presented on topics related to digital mapping, internet archiving, text mining, and text analysis, from historical, literary, anthropological, and theatrical perspectives. Afterwards, presenters and other attendees stayed for conversation and refreshments, courtesy of funds provided by the Fordham GSA.

The five-minute format of the talks and the wide range of subjects involved made the session a whirlwind tour of active lines of inquiry in the digital humanities in New York City, rapidly demonstrating a wide variety of digital tools and potential applications. As the name of the session implies, all of the presentations were on projects which are still ongoing – the question-and-answer periods and the informal gathering following the presentations were used to exchange ideas and share experience, to help further the projects.

Brief summaries of all the talks are listed below. Those interested in the Media Res program can find information on the first session here.

 

Uncanny Seduction: Masculinity, Pickup Artists, and the Uses of Social Media in Social Skills Trainings Communities

Anders Wallace – CUNY Graduate Center, Department of Anthropology

Anders Wallace presented on a project in which he is analyzing the text archives of forums discussing techniques of seduction, examining collectively constructed conceptions of masculinity, and analyzing the networks of forum users in terms of production and influence, as measured through interactions with other forum-goers. Wallace further explored changes in forum activity over time, influenced by the growth of a monetized industry in competition with informal forums. Technically, Wallace discussed Python as a tool for word analysis, based on positive and negative word valences.

 

TWiC (Topic Words in Context)

Jonathan Armoza – NYU, Department of English

Jonathan Armoza presented on a project relating to his Master’s thesis work, in which he is using MALLET (MAchine Learning for LanguagE Toolkit) to explore the works of Emily Dickinson. Armoza is examining the corpus of Emily Dickinson poems, distinguishing topics as indicated by frequently linked words, and relating these topics to the understanding of Emily Dickinson’s poems in traditional scholarship.

 

Exploring Place in the French of Italy

Heather Hill and Tobias Hrynick – Fordham, Medieval Studies Program

Heather Hill and Tobias Hrynick presented on a Fordham Medieval Studies project to examine the corpus of texts written in French on the Italian peninsula during the Middle Ages. The project maps place names mentioned in the texts using CartoDB and presents them through an Omeka website, contextualized with essays and “micro-essays” containing brief observations on patterns present in the maps, designed to encourage engagement with the visualizations by site users.

 

East of East: Mapping Community Narratives in South El Monte and El Monte

Nicholas Juravich and Daniel Morales – Columbia, Department of History

Nicholas Juravich and Daniel Morales presented on a project to support the collection and presentation of history in the communities of El Monte and South El Monte, in Los Angeles County, California. The project explores the possibility of a communal digital space in which to collect and display materials assembled through outreach to the community, and to present resources geographically using Omeka Neatline.

 

CUNY Syllabus Project

Andrew McKinney – CUNY Interdisciplinary Project

Andrew McKinney gave a presentation planned in collaboration with Laura Kane on a project to create a central database of crowd-sourced CUNY syllabi, with tools to search and visualize syllabi, allowing general trends to be revealed without making accessible specific personal information from the text of the syllabi. The project is exploring ways in which the process of integrating syllabi into the database might be automated. More information on the project is available here.

 

Reading histories of New York City women, 1789-1805: The case of the missing Gothic novels

Sara Partridge – NYU Department of English

Sara Partridge presented on a project to organize the lending records of the New York Society Library, the oldest lending library in the state of New York, into a relational database, and to present this data as a website using the CollectiveAccess content management system. Partridge discussed her findings, particularly pointing out the way in which novels, though acquired by the library and commonly borrowed, particularly by women, were less privileged in institutional records, often distinguished only by genre and not, like non-fiction books, also by title.

 

The Independent Crusaders Project

Heather Hill and Alexander Profacci – Fordham Medieval Studies Program

Heather Hill and Alexander Profacci presented on a new project of Fordham’s Medieval Studies program in collaboration with Dr. James Doherty of Lancaster University. The project is intended to process and visualize information from charters concerning crusaders who departed for the Latin East outside of formal, large-scale crusading expeditions. The project will create an Omeka-based website to display this information, and to house CartoDB maps of the points from which crusaders departed.

 

Git-Lit

Jonathan Reeve – Columbia, Department of English

Jonathan Reeve presented on a project designed to avoid the difficulties in maintaining central digital text archives over the long term. Instead of centrally housing its texts, Git-Lit intends to house them in a dispersed manner, using GitHub. The project is working to digitize a corpus of scanned texts from the British Library, and to develop a system for automating text uploads. More information on this project is available here.

 

Reading as Navigation: Mapping the Spatial Affordances of the American Novel

David Rodriguez – Stony Brook, English

David Rodriguez presented on a project intended to incorporate toponym (place name) mapping into literary studies in a novel way, blurring the line between creating and analyzing artistic works, and emphasizing the narrative rather than static aspects of space in literature. Rodriguez’s project draws on a corpus of American novels, and generates visualizations by following a path from an initial point to a place mentioned immediately after it in a randomly selected work, then to a place mentioned immediately after that one in another randomly selected work, and so on.

 

The Roots and Routes of Boylesque

Kalle Westerling – CUNY, Theater

Kalle Westerling described a project examining the boylesque genre of strip-tease through an analysis of texts posted on social media. A corpus of texts was generated through automated collection of Twitter posts, on which Westerling subsequently performed topic-modeling analysis. Westerling is also mapping the regions referenced in these tweets, and has emphasized the relatively itinerant nature of boylesque, as against more established forms of strip-tease. More information on Westerling’s project is available here.

 

Geographic Information Systems in the Humanities

Scott Zukowski – Stony Brook, English

Scott Zukowski presented on his effort to map active nineteenth-century newspapers in Omeka Neatline, displaying operating periodicals at intervals between 1790 and 1850. The map is a continuation of Zukowski’s earlier work analyzing these papers, and adds a layer of macro-level analysis, highlighting the way in which the sources themselves changed over time and helping to inform the ways in which they can be used – Zukowski particularly noted the failure of small-town rural newspapers in favor of urban publications in the mid-nineteenth century, which might reflect the growth of rapid rail transportation, or the growing cultural influence of metropolitan culture. Zukowski intends to expand his data-set to take into account and display the varying political stances of different papers.

Imagining Digital Pedagogy at Fordham

This is your life:

[Image]

You just finished teaching your American History class. You slam-dunked a lecture on the transcontinental railroad’s influence on national commerce, communication, and territorial expansion. Students nodded, took vigorous notes, and were eager to participate in a lively discussion following your lecture. It was a good class. You think to yourself: tweed blazers with elbow-patches do help you scrutinize the past and question mainstream ideas more effectively. As you make a note to add more iron-on patches to your shopping cart on Amazon, you see a particularly eager student waiting to catch your attention after class.

This student–probably two weeks shy of declaring a history major–stays behind to tell you about her family’s connection to the U.S. railroad industry. As you wipe the dry eraseboard clean, she draws insightful connections between your lecture and her family’s experience in Tennessee. Apparently, this student’s family owned a company that helped establish, build, and expand railroad lines in the region in the 1880s. She’s excited about the connection. She wants to understand her family’s influence on railroad growth in a broader historical context. She’s eager to use the research tools you’ve helped her cultivate. You know, there might just be elbow-patches in her future.

You give a passing nod to the frazzled composition instructor who teaches in the room after you; he’s carrying a stack of freshly graded three-paragraph essays and looks tired. In the hall, you continue talking to the student, asking leading questions, and giving insights–just as you begin to encourage her to explore the topic in her final paper, you realize: “I don’t want to read that.”

Let me rephrase. It’s not a question of what you want, exactly. You care about the student’s development as a writer, and you don’t question her ability to make a convincing historical argument. Rather, this student’s project presents a genre problem. An 8-page research essay on a Tennessee railroad, regional geography, and national commerce could indeed be compelling (hell, I’d read it). Academic prose, however, might not be the most appropriate genre for communicating geographical expansion over time; papers are an inherently limited, linear format. This research is perfectly suited for something more dynamic–like a digital map.

Anelise H. Shrout, Postdoctoral Fellow in Digital Studies, shared an experience similar to this in her workshop on Digital Pedagogy on Friday, October 16th. In this session, Shrout encouraged an interdisciplinary group of Fordham graduate students and staff to thoughtfully integrate digital assignments into undergraduate courses.

[Photo from the workshop]

Not only are some assignments better suited for digital media, but, according to Shrout, an online publication platform will give student work a life beyond the classroom. Student research doesn’t have to be limited to a conversation at the dry eraseboard or a document, stapled with one-inch margins. For example, if the aforementioned student created a Neatline map that tracked the growth of her family’s railroad over time, she could share her final product with her family and circulate it to people within the region of influence. Encouraging students to share the fruits of their research with people outside of academia might just spark intellectual curiosity and critical thinking in the vast elsewhere incorporated by the internet. Believe me, as a kid who grew up with spotty dial-up in the middle of nowhere, access and exposure to quality humanistic work can be transformative. And, yes, I’ll go there: if we are truly committed to “the discovery of Wisdom and the transmission of Learning,” as our Jesuit mission would suggest, incorporating digital pedagogy can do a world of good.

Bringing computer power to old questions does not water-down the values humanists hold dear. Instead, digital innovation can help breathe new life into our teaching and research. As Shrout puts it, computers can help free up brain space for us and give us more mental energy to tackle big questions. Why not help our students understand humanistic inquiry through, against, and alongside the digital media that binds many of our social networks together?

Throughout the workshop, Shrout offered useful insights on evaluation and implementation of digital projects based on her extensive experience. She warned teachers that the guidelines need to be clear and evaluation must be explicit and fair. Even if you free yourself from the mountain of three-paragraph essays, you face new obstacles of evaluation. As someone who has enthusiastically embraced digital research and pedagogy, I’m with Shrout–I think these obstacles are worth taking on.

And in case you missed it, she offered several good avenues for the hows of digital pedagogy. I challenge you to take from this grab-bag of stellar digital tools (ranging from easiest implementation to most complex):

Post by: Christy L. Pottroff