Flesch reading ease for stylometry?

The Flesch reading-ease score (FRES, also called FRE – ‘Flesch Reading Ease’) is still a popular measure of the readability of texts, despite some criticism and suggestions for improvement since Rudolf Flesch first proposed it in 1948. (I’ve never read his original paper, though; all my information is taken from Wikipedia.) On a scale that nominally runs from 0 to 100, it indicates how difficult a given text is to understand, based on sentence length and word length: a low score means difficult to read, a high score easy to read.

Sentence length and word length are also popular factors in stylometry, the idea being that some authors (or, more generally, kinds of text) prefer longer sentences and/or words while others prefer shorter ones. Scores based on sentence length and word length might therefore serve as an indicator of how similar two given texts are. In fact, FRES is used in actual stylometry, albeit only as one factor among many (e.g. in Brennan, Afroz and Greenstadt 2012 (PDF)). Compared with other stylometric indicators, FRES has the added benefit that it says something about the text in itself, rather than being a number that only means something in relation to another.

The original FRES formula was developed for English and has been modified for other languages. In the last few stylometry blogposts here, the examples were taken from Japanese manga, but FRES is not well suited for Japanese. The main reason is that syllables don’t play much of a role in Japanese readability. More important factors are the number of characters and the ratio of kanji, as the number of syllables per character varies. A two-kanji compound, for instance, can have fewer syllables than a single-kanji word (e.g. 部長 bu‧chō ‘head of department’ vs. 力 chi‧ka‧ra ‘power’). Therefore, we’re going to use our old English-language X-Men examples from 2017 again.

The comics in question are: Astonishing X-Men #1 (1995) written by Scott Lobdell, Ultimate X-Men #1 (2001) written by Mark Millar, and Civil War: X-Men #1 (2006) written by David Hine. Looking at just the opening sequence of each comic (see the previous X-Men post for some images), we get the following sentence / word / syllable counts:

  • AXM: 3 sentences, 68 words, 100 syllables.
  • UXM: 6 sentences, 82 words, 148 syllables.
  • CW:XM: 7 sentences, 79 words, 114 syllables.

We don’t even need to use Flesch’s formula to get an idea of the readability differences: the sentences in AXM are really long and those in CW:XM are much shorter. As for word length, UXM stands out with rather long words such as “unconstitutional”, which is reflected in the high ratio of syllables per word.
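The two ratios this comparison rests on can be worked out directly from the counts above. The following is only a small sketch; the dictionary `counts` and the helper `ratios` are names made up for this illustration.

```python
# Counts quoted above: (sentences, words, syllables) per opening sequence.
counts = {
    "AXM": (3, 68, 100),
    "UXM": (6, 82, 148),
    "CW:XM": (7, 79, 114),
}


def ratios(sentences, words, syllables):
    """Return (words per sentence, syllables per word)."""
    return words / sentences, syllables / words


for title, (sentences, words, syllables) in counts.items():
    words_per_sentence, syllables_per_word = ratios(sentences, words, syllables)
    print(f"{title}: {words_per_sentence:.1f} words/sentence, "
          f"{syllables_per_word:.2f} syllables/word")
```

AXM comes out at roughly 22.7 words per sentence against 11.3 for CW:XM, while UXM has about 1.80 syllables per word against 1.47 and 1.44 for the other two.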

Applying the formula (cf. Wikipedia), we get the following FRESs:

  • AXM: 59.4
  • UXM: 40.3
  • CW:XM: 73.3
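For reference, here is a minimal sketch of the calculation, assuming the standard English-language formula as given on Wikipedia: 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words). The function name `flesch_reading_ease` and the dictionary `counts` are made up for this illustration.

```python
def flesch_reading_ease(sentences, words, syllables):
    """Standard English FRES formula (constants as given on Wikipedia)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


# Counts from the list further above: (sentences, words, syllables).
counts = {
    "AXM": (3, 68, 100),
    "UXM": (6, 82, 148),
    "CW:XM": (7, 79, 114),
}

for title, (sentences, words, syllables) in counts.items():
    score = flesch_reading_ease(sentences, words, syllables)
    print(f"{title}: {score:.1f}")
```

Rounded to one decimal place, this reproduces the three scores listed above.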

Who would have thought that! It looks like UXM (or at least the selected portion) is harder to read than AXM – a FRES of 40.3 is already ‘College’ level according to Flesch’s table.
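The lookup in Flesch’s table can be sketched as follows. The bands are assumed from the version of the table reproduced on Wikipedia, and how scores that fall exactly on a boundary are treated is my own choice; `BANDS` and `school_level` are hypothetical names.

```python
# (lower bound of the band, school level), highest band first.
# Assumption: bands as in the commonly reproduced version of Flesch's table.
BANDS = [
    (90.0, "5th grade"),
    (80.0, "6th grade"),
    (70.0, "7th grade"),
    (60.0, "8th & 9th grade"),
    (50.0, "10th to 12th grade"),
    (30.0, "College"),
    (10.0, "College graduate"),
]


def school_level(score):
    """Return the school level of the first band whose lower bound is reached."""
    for lower_bound, level in BANDS:
        if score >= lower_bound:
            return level
    return "Professional"  # anything below 10


for title, score in [("AXM", 59.4), ("UXM", 40.3), ("CW:XM", 73.3)]:
    print(f"{title}: {score} -> {school_level(score)}")
```

Under these assumptions, AXM lands in the ‘10th to 12th grade’ band and CW:XM in the ‘7th grade’ band.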

But how do these numbers help us if we’re interested in stylometric similarity? All three texts are written by different writers. So far we could only say (again – based on an insufficiently sized sample) that Hine’s writing style is closer to Lobdell’s than to Millar’s. The ultimate test for a stylometric indicator would be to take an additional example text that is written by one of the three authors, and see if its FRES is close to the one from the same author’s X-Men text.
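The comparison between the three writers boils down to the absolute differences between their scores. A quick sketch, in which `scores` and `pairwise_differences` are names made up for this illustration:

```python
from itertools import combinations

# FRES values from the list above, keyed by comic and writer.
scores = {
    "AXM (Lobdell)": 59.4,
    "UXM (Millar)": 40.3,
    "CW:XM (Hine)": 73.3,
}


def pairwise_differences(scores):
    """Absolute FRES difference for every pair of texts, rounded to one decimal."""
    return {
        (a, b): round(abs(scores[a] - scores[b]), 1)
        for a, b in combinations(scores, 2)
    }


differences = pairwise_differences(scores)
for (a, b), diff in sorted(differences.items(), key=lambda item: item[1]):
    print(f"{a} vs. {b}: {diff}")
```

The smallest gap is the one between Lobdell and Hine (13.9 points), the largest the one between Millar and Hine (33.0 points).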

Our fourth example is the rather randomly selected Nemesis by Millar (2010, art by Steve McNiven), from which we’ll again take all the text of the first few panels.

3 panels from Nemesis by Mark Millar and Steve McNiven

Part of the opening scene from Nemesis.

These are the numbers for the selected text fragment from Nemesis:

  • 8 sentences, 68 words, 88 syllables.
  • This translates to a FRES of 88.7!

In other words, Nemesis and UXM, the two comics written by Millar, appear to be the most dissimilar of the four! However, that was to be expected. Millar would be a poor writer if he always applied the same style to each character in each scene. And the two selected scenes are very different: a TV news report in UXM in contrast to a dialogue (or perhaps more like the typical villain’s monologue) in Nemesis.

Interestingly, there is a TV news report scene in Nemesis too (Part 3, p. 3). Wouldn’t that make for a more suitable comparison?

Here are the numbers for this TV scene which I’ll call N2:

  • 4 sentences, 81 words, 146 syllables.
  • FRES: 33.8

Now this looks more like Millar’s writing from UXM: the difference between the two scores (6.5) is less than half the next-smallest gap between any two of our texts (13.9, between AXM and CW:XM), so the two can be said to be fairly close.
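To make the test explicit, one could assign each Nemesis fragment to the writer whose X-Men score is nearest. This is only a naive sketch of such a nearest-score comparison, not an established attribution method; `reference`, `unknown` and `nearest_author` are hypothetical names.

```python
# FRES of the three X-Men openings, keyed by writer.
reference = {"Lobdell": 59.4, "Millar": 40.3, "Hine": 73.3}

# The two Nemesis fragments, treated as if their writer were unknown.
unknown = {"Nemesis (opening)": 88.7, "N2 (TV report)": 33.8}


def nearest_author(score, reference):
    """Return the writer whose reference score is closest to the given score."""
    return min(reference, key=lambda author: abs(reference[author] - score))


for title, score in unknown.items():
    guess = nearest_author(score, reference)
    gap = abs(reference[guess] - score)
    print(f"{title}: nearest is {guess} (difference {gap:.1f})")
```

N2 ends up nearest to Millar, whereas the opening of Nemesis ends up nearest to Hine, which again shows how much the score depends on the kind of scene.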

Still, we haven’t really proven anything yet. One possible interpretation of the scores is that the ~30-40 range is simply the usual range for this type of text, i.e. TV news reports. So perhaps these scores are not specific to Millar (or even to comics). One would have to look at similar scenes by Lobdell, Hine and/or other writers to verify that, and ideally also at real-world news transcripts.

On the other hand, one thing has worked well: two texts that we had intuitively identified as similar – UXM and N2 – indeed showed similar Flesch scores. That suggests FRES can serve not only as a measure of readability but also as an indicator of stylometric similarity – albeit a rather crude one which is, as always, best used in combination with other metrics.


“Presence in Comics” article published

detail of a panel from The Ultimates #3 by Mark Millar and Bryan Hitch

Remember the conference paper I announced on this weblog in 2012? It took some time, but the paper has now been published as an article in Studies in Visual Arts and Communication – an international journal, and it is available online for free: http://journalonarts.org/wp-content/uploads/2015/01/SVACij-Vol1_No2_2014-delaIGLESIA_Martin-Presence_in_comics.pdf

Here’s the abstract:

The term ‘presence’ is often used to denote a trait of an artwork that causes the feeling in a viewer that a depicted figure is a living being that is really there, although the viewer is aware that this is not actually the case. So far, scholars who have used this term have not explicitly provided criteria for the assessment of the degree of presence in a work of art. However, such criteria are implicitly contained in a number of theoretical texts. Three important criteria for presence appear to be:
1. size – the larger a figure is depicted, the more likely this artwork will instil a feeling of presence.
2. deixis – the more the work is deictically orientated towards the beholder, e.g. if figures seem to look or point at the beholder, the higher the degree of presence.
3. obtrusiveness of medium – if there is a clash of different diegetic levels within an artwork, the degree of presence is reduced.

These criteria can be readily applied to a single image like a painting or a photograph. A comic, however, consists of multiple images, and the presence of each panel is influenced by the panels that surround it by means of contrast and progression. Another typical feature of comics is written text: speech bubbles, captions etc. do not co-exist with the drawings on the same diegetic level, thus betraying the mediality of their panels and reducing their degree of presence. A comic that makes striking use of effects of presence, which makes it a suitable example here, is the superhero series The Ultimates by Mark Millar and Bryan Hitch (Marvel 2002 – 2004). The characters in this comic are often placed on splash pages and/or seemingly address the reader, resulting in a considerable experience of presence.


Upcoming talk: presence in comics

I’m going to present a paper at the conference Presence and Agency: Rhetoric, Aesthetics and the Experience of Art, which takes place in Leiden from Thursday to Saturday this week. Here’s a very short abstract for my talk:

Presence is the feeling in a viewer that a depicted figure is a living being that is really there. Theoretical literature suggests that the size of the figures and their deictical orientation are the most important factors of presence in single images. Comics, however, consist of a sequence of images, so the degree of presence in a comic panel is influenced by its surrounding panels, either by means of contrast or progression. Another typical feature of comics is written text, which betrays their mediality and consequently decreases presence. The example used here is the superhero comic The Ultimates (Marvel, 2002-2004).

I’ll post further information about this paper and its publication by and by.