The 2017 edition of the documenta art show ended on September 17 with a slight increase in visitors, but also a financial deficit. While the danger of a discontinuation of the exhibition series seems to have been averted, many visitors (including this one) felt disappointed or at least underwhelmed with regard to the majority of art that was on display.
Like five years ago, the documenta didn’t include any proper comics as far as I could see, but lots of sequential artworks that fit Scott McCloud’s definition of comics. Here are some of them (only from the Kassel portion of the show, not from Athens which co-hosted this documenta):
Due to internet connection problems, The 650-Cent Plague has been on hiatus for some weeks, but now it’s back with another anime-related news item that I just can’t resist sharing. In an otherwise serious and sad story, here’s a hilarious detail that doesn’t seem to have been picked up by any other media on the web: last month, the trial began of Sascha L. from Northeim (Germany), a former supporter of the Islamic State who had built a bomb which he planned to use against German policemen or soldiers. This part of the story is well known and has also been reported in international media (e.g. Washington Post).
The regional daily newspaper, Braunschweiger Zeitung, revealed some details of the court hearing in an article by Johannes Kaufmann in its September 21 issue (not freely available online), including this one (my translation):
By now he [Sascha L.] would have renounced all radical plans, and he would be ready to participate in an opt-out program. Why, then, had he put up a flag of the Islamic State and an oath of allegiance to the ‘caliph’ Abu Bakr al-Baghdadi in his cell, assessor Petra Bock-Hamel wanted to know. ‘I don’t like white walls,’ was Sascha L.’s reply. After speaking to a psychologist he would have actually wanted to take the IS flag down, ‘but then there was Dragon Ball Super on television, and unfortunately I forgot about it.’ Later, judicial officers had photographed the walls of his cell – with flag and oath.
Several things are remarkable about Sascha L.’s statement, but the most striking of all is the way in which it is reported in the newspaper article: no explanation at all is given of what “Dragon Ball Super” actually is. As comic experts, we know that it is a current anime series by Akira Toriyama, a sequel to his earlier series Dragon Ball / Dragon Ball Z, and even if we haven’t watched it ourselves, we have some idea what Toriyama’s art style looks like and what the story is about. But how many of the newspaper readers would know? One might have expected at least a gloss in brackets such as “… Dragon Ball Super [a Japanese animated series] on television…”, but no more is said about that subject in the article.
By leaving readers in the dark, Kaufmann relegates the nature of the TV show in question to an unimportant aspect – which it most likely is. But there are probably quite a few readers who wonder: what is this TV program that has the power to distract viewers from important tasks? And is there something about this Dragon Ball Super show that makes it particularly appealing to Islamists? Then again, maybe we should be thankful for every moral panic that did not happen. One can all too easily imagine alternative newspaper headlines for the same subject along the lines of: “JAPANESE CARTOON CREATES ISLAMIST BOMBERS”…
Today we come full circle and return to comics. While most anime are adapted from manga, many original anime have been adapted into manga. Although I haven’t read that many manga based on anime, I’d like to recommend some that I found particularly interesting. As always in my comic reviews, “volumes reviewed” indicates volumes I’ve recently re-read specifically for this blog post and which the review text refers to, i.e. not counting those I’ve read only once.
Neon Genesis Evangelion (新世紀エヴァンゲリオン / Shinseiki Evangelion)
Language: German (translated from Japanese)
Authors: Yoshiyuki Sadamoto / Studio Gainax
Publisher: Carlsen (originally Kadokawa Shoten)
Years: 1999-2015 (originally 1994-2013)
Number of volumes: 14
Volumes reviewed: 1
Pages per volume: ~165
Price per volume: € 6,00
Website: https://www.carlsen.de/serie/neon-genesis-evangelion/18147 (German)
I’ve never quite got my head around why Evangelion has become such a cult anime series. Its popularity might be due to having done a lot of things right at the right time. (For more on this aspect, see Sean O’Mara’s blog post on the early years of Studio Gainax.) Looking at the manga (drawn by Gainax character designer Yoshiyuki Sadamoto), there are two actual assets that Evangelion has going for it:
- Shinji the emo kid: in the distant future of the year 2015, this troubled teenage protagonist has some issues that quite a few readers of today can probably relate to. On the very first page, Shinji thinks, “I don’t have any dreams, hopes or anything like that. […] That’s why I thought, I didn’t care if I had an accident or died.”
But then he gets to pilot a mecha…
- Mecha design: at its core, Evangelion is still a story about giant robots, and as such, it has to feature mechas that look cool. And they do. The biomorphic or humanoid shape of the EVAs sets them apart from more angular designs in e.g. Mobile Suit Gundam or Transformers.
That being said, there are also many silly ideas in this manga, both in story and design, and a plot that verges on a tedious ‘monster of the week’ pattern. Things get more interesting from around vol. 5 on, when a conspiracy within NERV (the organisation operating the EVAs) is gradually revealed.
Ame & Yuki / Wolf Children (おおかみこどもの雨と雪 / Ōkami kodomo no Ame to Yuki)
Language: German (translated from Japanese)
Authors: Mamoru Hosoda / Yū / Yoshiyuki Sadamoto
Publisher: Tokyopop (originally Kōdansha)
Years: 2013-2014 (originally 2000)
Number of volumes: 3
Volumes reviewed: 1
Pages per volume: 155 (vol. 1-2) / 210 (vol. 3)
Price per volume: € 6,95 (box set: € 16,95)
Website: http://tokyopop.de/programm-winter-2013-2014/ame-und-yuki-die-wolfskinder/ (German)
For some years, thanks to a string of successful all-ages theatrical anime films (The Girl Who Leapt Through Time, Summer Wars), it looked as if director Mamoru Hosoda was going to be ‘the next Miyazaki’, although recently his popularity seems to have been eclipsed by Makoto Shinkai’s. The 117 minutes of Hosoda’s 2012 film Wolf Children (original script by Hosoda himself, character design by the aforementioned Yoshiyuki Sadamoto) have been adapted into a >500 page manga drawn by a newcomer artist who calls herself Yū (優).
In the beginning, the narration seems very fast-paced, as we witness in quick succession how university student Hana falls in love with a fellow student who turns out to be a werewolf, the birth of their two children, and the death of the werewolf guy. But this isn’t the story of Hana, it’s the story of her two children who grow up with the secret of being werewolves too, and who ultimately (in later volumes) have to decide whether they want to spend their lives as humans or as wolves. The supernatural element of the werewolf transformations is neither satisfactorily explained nor excitingly depicted, but as an emotional drama manga, Ame & Yuki works really well.
FLCL (フリクリ / Furi Kuri)
Language: German (translated from Japanese)
Authors: Studio Gainax / Hajime Ueda
Publisher: Carlsen (originally Kadokawa Shoten)
Year: 2003 (originally 2002)
Number of volumes: 3
Volumes reviewed: 1
Pages per volume: 192
Price per volume: € 6,00
Website: https://www.mangaupdates.com/series.html?id=1532 (Baka-Updates)
The OVA series FLCL (Gainax / Production I.G 2000-2001) has a reputation for being one of the weirdest anime ever, and the manga adaptation lives up to that. It’s hard even to give a plot summary, because sometimes you just don’t get what’s going on, it’s difficult to tell events that are important to the plot apart from those that are not (grandpa’s gateball match?!), and there’s a fair amount of non-linear storytelling and perhaps even unreliable narration involved. What we all can agree on, though, is that the story starts with teenager Naota getting hit in the head with a guitar by a woman on a scooter. To his surprise, he later finds this woman has moved in with his family as a housekeeper. Things become weirder and weirder for Naota as he is confronted with giant-robot attacks, an arson series, and romantic advances from two girls from his school.
All this is depicted in an art style that is really a multitude of art styles between which Ueda continually switches, often leaning to a seemingly crude look with broad, uneven outlines. A lot of the humour in FLCL operates on the verbal level – which works surprisingly well in translation – for instance when the woman riding a Vespa scooter gets nicknamed “the wasp woman”.
Honourable mention: Some years ago I read the one-volume adaptation of Makoto Shinkai’s Hoshi no koe / Voices of a Distant Star (art by Mizu Sahara) and liked it, but I don’t have a copy at hand to read it again.
A major difference between anime and manga is the representation of dialogue: in manga it’s written in speech bubbles, whereas in anime it’s human speech recorded and played back as part of the audio track. It’s important to bear in mind that dialogue in anime is still only a representation of a fictional dialogue – we can’t actually hear an anime character’s voice; what we hear is merely an actor speaking lines in a recording studio.
That being said, individual voice actors contribute a great deal towards our perception of a character through his or her voice, in addition to scriptwriters and directors on the one hand and the dialogue director (a.k.a. Automated Dialogue Replacement (ADR) director) on the other hand. And just as with theatre actors and film actors, the distinction between voice actors and the characters they portray gets blurred in the imagination of some viewers, which is probably why the latter develop an interest in voice actors as the ‘actual people’ behind the characters.
In contrast to voice actors in other countries, Japanese seiyū cater to this public interest and, in addition to their voice acting, often become pop singers, TV actors, radio show hosts, or generally ‘media personalities’, and some even become idols. If you’d like to get a more complete picture of a seiyū and his or her media appearances, try the following procedure: look up the voice actor you’re interested in on MyAnimeList (via the entry for the anime in which you’ve come across him or her), then enter his or her name in YouTube. Search for both the romanised and the kanji form of the name, as they will often lead to different results. Here are some examples of what you might discover (some of which might have been uploaded illegally, mind you):
- Yumi Uchiyama is a prolific seiyū in her late twenties who is currently perhaps most famous for having voiced the cat spirit, Puck, in Re:Zero, although personally I found her performance as Top Speed, a cackling witch in Magical Girl Raising Project, more memorable. She also performed many anime theme songs – here’s a live performance of “Next Legend” (written by ZAQ) from the Saki Achiga-hen anime:
- Hiroshi Kamiya is a veteran voice actor who has performed in a staggering number of famous anime such as One Piece and Attack on Titan, but also in less well-known anime like Fune wo Amu in which his mischievous supporting character provides a striking contrast to the earnest protagonist. Together with his colleague Daisuke Ono, Kamiya hosts a radio show called Dear Girl Stories of which there are episodes with English subtitles:
- Konomi Kohara is the youngest of the three and has starred in her second main role (in Tsuki ga Kirei) only this year, so there’s not much on YouTube except for this one talk show appearance that was uploaded multiple times. It’s hard to figure out what they’re talking about if you’re not fluent in Japanese (no subtitles here), but at least you can hear how similar Kohara’s way of speaking is to her very natural-sounding, sometimes ‘breathless’ voice acting performance in Tsuki ga Kirei:
This is the second blog post of a series on the occasion of ‘100 Years of Anime’. Read the first post here.
On this day three months ago, the memorial service for Jaden F. was held in Herne, Germany. Jaden had been the first of two victims stabbed to death by Marcel H., whom the media has linked to anime. One German news magazine in particular, Stern (No. 12, March 16), has emphasised the ostensible connections of the murders to anime.
The events were also covered by international media (e.g. Daily Mail, Telegraph, Independent), but none of them even mentioned anime. Therefore, the (thankfully limited and short-lived) ‘moral panic’ regarding anime doesn’t seem to have reached the Anglophone anime blogosphere either, which is why I’ll sum up the story here.
These are the facts: Marcel H. is a 19-year-old NEET who had unsuccessfully applied to join the Army in February. On March 6, he lured the neighbours’ nine-year-old son into his house and killed him with a knife. Then he went to the home of an acquaintance, 22-year-old Christopher W., and killed him early in the morning on March 7. Marcel H. stayed at Christopher W.’s apartment until March 9, when he set it on fire, went to a Greek diner, told the owner to call the police, and let himself be arrested.
So far, these events have nothing to do with anime. But Barbara Opitz and Lisa McMinn, the authors of the Stern article, point out the following details: when Marcel H. was arrested at the diner, he carried an umbrella and a bag of onions with him. These items are mentioned in other news articles too, but only Stern offers an explanation, according to which the umbrella and the onions refer to two cards from the Yu-Gi-Oh! Trading Card Game, “Rain of Mercy” and “Glow-Up Bulb” (“Aufblühende Blumenzwiebel” in German; “Zwiebel” can also mean “onion”), respectively. Furthermore, in one of the pictures Marcel posted online, in which he poses with a knife, a poster of the anime series Yu-Gi-Oh! GX can be seen in the background. (Interestingly, in the Daily Mail article, the image – pictured below on the right hand side – was altered so that the poster doesn’t refer to Yu-Gi-Oh! anymore.)
Another connection to Yu-Gi-Oh! is Christopher W., Marcel H.’s second victim, who ran a Yu-Gi-Oh! site on Facebook; apparently they got to know each other through the game and used to play Yu-Gi-Oh! video games together. Finally, Stern points out that there are two characters in the Yu-Gi-Oh! anime with the same first names as Marcel H. and Jaden F.: Yu-Gi-Oh! GX protagonist Jaden Yuki and his antagonist Marcel Bonaparte. Stern implies that Marcel H. identified with the villain and acted out the Yu-Gi-Oh! story by attacking Jaden. The only detail that doesn’t quite fit is that the Stern article also says that Marcel H. had been learning Japanese in order to be able to read manga and watch anime in their original language; in the Japanese original version of Yu-Gi-Oh! GX, however, Jaden is called “Jūdai” and Marcel “Marutan” or “Martin”.
Apart from the Yu-Gi-Oh! connection, there’s not much that links Marcel H. to anime. Some chat messages have surfaced in which Marcel H. talks to another person about the murders at the time when he committed them, and in one message he says, “See you space cowboy”, which indeed is a quote from the anime Cowboy Bebop.
The other things mentioned in the Stern article are vague connections to Japan rather than to anime specifically: at the time of committing the murders, Marcel H. posted a picture of a handwritten note on which he had signed his name in Japanese, and he owned “bamboo swords which he kept under his bed like a treasure. Furthermore a wooden bow and five Japanese ceremonial knives” (all translations mine).
The sad and disturbing thing (apart from the murders themselves, of course) is how Stern chose to focus on Marcel H.’s anime fandom, instead of e.g. his obsession with martial arts, computer games, or 4chan (as other news outlets did, sometimes inaccurately calling it “darknet”). For instance, the entire Stern article is titled, “‘Viel Spaß in der Anime-Welt’” (“‘Have Fun in the Anime World’”), which isn’t even a quote by Marcel H. but by his unnamed chat partner. The way in which the Stern authors desperately try to link the content of anime to the murderer is simply journalistically unethical: “‘Space Cowboy’ refers to a character from the anime series, ‘Cowboy Bebob’ [sic], in which a hero says sentences like this one: ‘I don’t go to die, but to find out if I’m still alive.’ Marcel H. is obsessed with the world of anime, Japanese animated films, often dark dystopias, the protagonists have spiky hair and shiny, big eyes. […] the heroes […] are often outsiders, but with hidden powers. Quirky, awkward and at the same time infallible. Outsiders like Marcel H.”
Luckily, the Stern article has failed to start a witch hunt on anime fans like the ones that e.g. video gamers and heavy metal fans have had to endure in past decades. But the article shows that anime still has a long way to go before it can be said to be part of the mainstream.
Last month, “the most comprehensive exhibition about the genre to be held in Germany” opened at the venerable Bundeskunsthalle in Bonn, where it can be visited until September 10. Curated by Alexander Braun and Andreas Knigge, it is a remarkable exhibition, not only because of its size (300 exhibits) but also because it tries to encompass the whole history of comics without any geographic, chronological or other limits. To this end, it is organised in six sections.
The first section is about early American newspaper strips. The number of original newspaper pages and original drawings on display here would be impressive if there hadn’t been another major exhibition on the same topic not even a year ago. Still, it’s always interesting to see e.g. a Terry and the Pirates ink drawing alongside the corresponding printed coloured Sunday page (July 24, 1942). Another highlight in this section is an old Prince Valiant printing plate, or more precisely, a letterpress zinc cliché which would be transferred onto a flexible printing plate for the cylinder of a rotary press, as the label in the display case explains.
Section 2 stays in the US but moves on to comic books. In its first of two rooms we find mainly superhero comics, again often represented through original drawings e.g. from Watchmen or Elektra: Assassin. The second room of this section is about non-superhero comic books; outstanding exhibits here are the complete ink drawings to two short stories: a 7-page The Spirit story by Will Eisner from July 15, 1951, and a 6-page war story from Two-Fisted Tales by Harvey Kurtzman from 1952.
The next section of the exhibition is dedicated to Franco-Belgian comics. There’s an interesting display case with a side-by-side comparison of the same page of Tintin in various original and translated editions, and there are also original drawings by Hergé, but perhaps even more impressive is an original inked page from Spirou et Fantasio by Tome and Janry, who revitalised the series in the 80s. In the same section, half a room contains examples of old German comics, both from East and West Germany.
And then we get to section 4, the manga section. The biggest treat here are several Osamu Tezuka original drawings from Janguru Taitei, Tetsuwan Atomu and Buddha. There’s original Sailor Moon art by Naoko Takeuchi as well. Most of the other exhibits, however, are from manga that are far less famous, at least outside of Japan. In this section there’s also the only factual error I found in the exhibition: a label on Keiji Nakazawa’s Hadashi no Gen says, “Barefoot Gen is one of the earliest autobiographical comics ever.” While Hadashi no Gen was certainly inspired by Nakazawa’s own experiences, it is a fictional story, not an autobiography – that would be Nakazawa’s earlier, shorter manga, Ore wa Mita.
Section 5 is about underground and alternative comics from both the US and Europe. The highlight here is the famous Cheap Thrills record by Big Brother and the Holding Company, which can be listened to via headphones. Most comics enthusiasts are familiar with the record cover by Robert Crumb, but perhaps not with the music on the album.
The sixth and last section is titled “Graphic Novels”. It is already unfortunate enough to make the dreaded ‘g-word’ part of the exhibition title, but this section makes things worse by not actually problematising the term or even analysing the discourse around it. Instead, “graphic novel” is meant here to comprise a vast range of contemporary comic production, including Jirō Taniguchi’s manga, pamphlet comic books such as Eightball and Love & Rockets, and Raw magazine.
The exhibition as a whole offers a lot of interesting things to see, but maybe its aim to represent the whole comics medium was too ambitious in the first place. Nowadays, no one would dare to make an exhibition about the whole history of film, or photography, but apparently comics are still considered peripheral enough that the whole medium can be squeezed into one wing of a museum. The general public, at whom this exhibition is presumably targeted, will probably discover many new things about comics, but for people who are already comic experts, the knowledge to be gained from this exhibition will be much smaller.
Latent Dirichlet Allocation (LDA) is one of the most popular algorithms for Topic Modeling, i.e. having a computer find out what a text is about. LDA is also perhaps easier to understand than the other popular Topic Modeling approach, (P)LSA, but even though there are two well-written blog posts that explain LDA (Edwin Chen’s and Ted Underwood’s) to non-mathematicians, it still took me quite some time to grasp LDA well enough to be able to code it in a Perl script (which I have made available on GitHub, in case anyone is interested). Of course, you can always simply use software like Mallet that runs LDA over your documents and outputs the results, but if you want to know what LDA actually does, I suggest you read Edwin Chen’s and Ted Underwood’s blog posts first, and then, if you still feel you don’t really get LDA, come back here. OK?
Welcome back. Disclaimer: I’m not a mathematician and there’s still the possibility that I got it all wrong. That being said, let’s take a look at Edwin Chen’s first example again, and this time we’re going to work through it step by step:
- I like to eat broccoli and bananas.
- I ate a banana and spinach smoothie for breakfast.
- Chinchillas and kittens are cute.
- My sister adopted a kitten yesterday.
- Look at this cute hamster munching on a piece of broccoli.
We immediately see that these sentences are about either eating or pets or both, but even if we didn’t know about these two topics, we would still have to make an assumption about the number of topics within our corpus of documents. Furthermore, we have to make an assumption about how these topics are distributed over the corpus. (In real-life LDA analyses, you’d run the algorithm multiple times with different parameters and then see which fit best.) For simplicity’s sake, let’s assume there are 2 topics, which we’ll call A and B, and they’re distributed evenly: half of the tokens in the corpus belong to topic A and the other half to topic B.
What exactly is a word, though? I found the use of this term confusing in both Chen’s and Underwood’s text, so instead I’ll speak of tokens and lemmata: the lemma ‘cute’ appears as 2 tokens in the corpus above. Before we apply the actual LDA algorithm, it makes sense to not only tokenise but also lemmatise our 5 example documents (i.e. sentences), and also to remove stop words such as pronouns and prepositions, which may result in something like this:
- like eat broccoli banana
- eat banana spinach smoothie breakfast
- chinchilla kitten cute
- sister adopt kitten yesterday
- look cute hamster munch piece broccoli
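To make this preprocessing step concrete, here is a minimal sketch in Python (not the Perl script mentioned above). The stop word list and the lemma table are hand-made assumptions that only cover the five example sentences; a real analysis would use a proper lemmatiser and a fuller stop word list.

```python
import re

# Hypothetical, hand-made resources that only cover the five example sentences.
STOP_WORDS = {"i", "to", "and", "a", "for", "are", "my", "at", "this", "on", "of"}
LEMMA_OF = {
    "bananas": "banana", "ate": "eat", "chinchillas": "chinchilla",
    "kittens": "kitten", "adopted": "adopt", "munching": "munch",
}

def preprocess(sentence):
    """Lowercase, tokenise, drop stop words, and map each token to its lemma."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    return [LEMMA_OF.get(t, t) for t in tokens if t not in STOP_WORDS]

sentences = [
    "I like to eat broccoli and bananas.",
    "I ate a banana and spinach smoothie for breakfast.",
    "Chinchillas and kittens are cute.",
    "My sister adopted a kitten yesterday.",
    "Look at this cute hamster munching on a piece of broccoli.",
]
documents = [preprocess(s) for s in sentences]
for doc in documents:
    print(" ".join(doc))
```

With these particular tables, the five sentences come out as the five lemma lists shown above, 22 tokens in total.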
Now we randomly assign topics to tokens according to our assumptions (2 topics, 50:50 distribution). This may result in e.g. ‘cute’ getting assigned once to topic A and once to topic B. An initial random topic assignment may look like this:
- like -> A, eat -> B, broccoli -> A, banana -> B
- eat -> A, banana -> B, spinach -> A, smoothie -> B, breakfast -> A
- chinchilla -> B, kitten -> A, cute -> B
- sister -> A, adopt -> B, kitten -> A, yesterday -> B
- look -> A, cute -> B, hamster -> A, munch -> B, piece -> A, broccoli -> B
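A random starting assignment of this kind could be generated along the following lines. This is only a sketch: the function name and the seed are made up, and forcing an exact 11:11 split is my own simplification (independent coin flips per token would fit the 50:50 assumption just as well).

```python
import random

documents = [
    ["like", "eat", "broccoli", "banana"],
    ["eat", "banana", "spinach", "smoothie", "breakfast"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "adopt", "kitten", "yesterday"],
    ["look", "cute", "hamster", "munch", "piece", "broccoli"],
]

def initial_assignment(docs, topics=("A", "B"), seed=0):
    """Give every token a random topic while keeping the topics equally large."""
    n_tokens = sum(len(doc) for doc in docs)
    # Build a pool with the same number of labels per topic, then shuffle it.
    pool = [topics[i % len(topics)] for i in range(n_tokens)]
    random.Random(seed).shuffle(pool)
    labels = iter(pool)
    # Hand the shuffled labels out token by token, document by document.
    return [[next(labels) for _ in doc] for doc in docs]

assignment = initial_assignment(documents)
for doc, topics in zip(documents, assignment):
    print(", ".join(f"{lemma} -> {topic}" for lemma, topic in zip(doc, topics)))
```

The output will differ from the hand-made assignment above, which is fine: any random start will do, since the following steps are meant to improve on it.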
Clearly, this isn’t a satisfying result yet; words like ‘eat’ and ‘broccoli’ are assigned to multiple topics when they should belong to only one, etc. Ideally, all words connected to the topic of eating should be assigned to one topic and all words related to pets should belong to the other. Now the LDA algorithm goes through the documents to improve this initial topic assignment: for each token, it computes the probability of belonging to each topic, based on three criteria:
- Which topics are the other tokens in this document assigned to? Probably the document is about one single topic, so if all or most other tokens belong to topic A, then the token in question should most likely also get assigned to topic A.
- Which topics are the other tokens in *all* documents assigned to? Remember that we assume a 50:50 distribution of topics, so if the majority of tokens is assigned to topic A, the token in question should get assigned to topic B to establish an equilibrium.
- If there are multiple tokens of the same lemma: which topic is the majority of tokens of that lemma assigned to? If most instances of ‘eat’ belong to topic A, then the token in question probably also belongs to topic A.
The actual formulas to calculate the probabilities given by Chen and Underwood seem to differ a bit from each other, but instead of bothering you with a formula, I’ll simply describe how it works in the example (my understanding being closer to Chen’s formula, I think). Let’s start with the first token of the first document (although the order doesn’t matter), ‘like’, currently assigned to topic A.
Should ‘like’ belong to topic B instead? If ‘like’ belonged to topic B, 3 out of 4 tokens in this document would belong to the same topic, as opposed to 2:2 if we stay with topic A. On the other hand, changing ‘like’ to topic B would threaten the equilibrium of topics over all documents: topic B would consist of 12 tokens and topic A of only 10, as opposed to the perfect 11:11 equilibrium if ‘like’ remains in topic A. In this case, the former consideration outweighs the latter, as the two factors get multiplied: the probability for ‘change this token to topic B’ is 3/4 * 1/12 ≈ 6%, whereas the probability for ‘stay with topic A’ is 2/4 * 1/11 ≈ 4.5%. We can also normalise these numbers so that they add up to 100% and say: ‘like’ is roughly 58% topic B and 42% topic A (or 57% and 43% if you normalise the rounded figures).
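As a quick check of this arithmetic, here is a small Python sketch that uses exact fractions, so that no rounding creeps in before the normalisation step. The variable names are my own.

```python
from fractions import Fraction

# Score for moving 'like' to topic B: 3 of 4 tokens in the document would be B,
# and 'like' would be 1 of 12 tokens in topic B.
score_b = Fraction(3, 4) * Fraction(1, 12)
# Score for keeping 'like' in topic A: 2 of 4 tokens in the document are A,
# and 'like' is 1 of 11 tokens in topic A.
score_a = Fraction(2, 4) * Fraction(1, 11)

# Normalise so that the two shares add up to 1.
total = score_a + score_b
share_b = score_b / total
share_a = score_a / total

print(f"raw scores: B = {float(score_b):.4f}, A = {float(score_a):.4f}")
print(f"normalised: B = {float(share_b):.1%}, A = {float(share_a):.1%}")
```

The exact shares are 11/19 for topic B and 8/19 for topic A, i.e. just under 58% and just over 42%.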
What are you supposed to do with these percentages? We’ll get there in a minute. Let’s first calculate them for the next token, ‘eat’, because it’s one of those interesting lemmata with multiple tokens in our corpus. Currently, ‘eat’ in the first document is assigned to topic B, but in the second document it’s assigned to topic A. The probability for ‘eat stays in topic B’ is the same as for ‘like stays in topic A’ above: within this document, the ratio of ‘B’ tokens to ‘A’ tokens is 2:2, which gives us 2/4 or 0.5 for the first factor; ‘eat’ would be 1 out of 11 tokens that make up topic B across all documents, giving us 1/11 for the second factor. The probability for ‘change eat to topic A’ is much higher, though, because there is already another ‘eat’ token assigned to this topic in another document. The first factor is 3/4 again, but the second is 2/12, because out of the 12 tokens that would make up topic A if we changed this token to topic A, 2 tokens would be of the same lemma, ‘eat’. In percentages, this means: this first ‘eat’ token is roughly 73% topic A and only 27% topic B.
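The two hand calculations follow the same recipe, which can be written as one function. The sketch below implements only the simplified rule from this walkthrough (share of the label within the document, multiplied by the share of the lemma within the topic); it is not necessarily what Mallet or my Perl script does, and LDA implementations usually add smoothing parameters that are left out here. The function name `topic_shares` is made up for illustration.

```python
documents = [
    ["like", "eat", "broccoli", "banana"],
    ["eat", "banana", "spinach", "smoothie", "breakfast"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "adopt", "kitten", "yesterday"],
    ["look", "cute", "hamster", "munch", "piece", "broccoli"],
]
# The initial random assignment from the example above.
assignment = [
    ["A", "B", "A", "B"],
    ["A", "B", "A", "B", "A"],
    ["B", "A", "B"],
    ["A", "B", "A", "B"],
    ["A", "B", "A", "B", "A", "B"],
]

def topic_shares(docs, topics, d, i, labels=("A", "B")):
    """Normalised topic shares for token i of document d, as in the walkthrough."""
    lemma = docs[d][i]
    scores = {}
    for label in labels:
        # Pretend the token in question carries this label.
        trial = [list(row) for row in topics]
        trial[d][i] = label
        # Factor 1: share of this document's tokens that carry the label.
        in_doc = trial[d].count(label) / len(trial[d])
        # Factor 2: share of the topic's tokens (corpus-wide) with the same lemma.
        same_lemma = topic_size = 0
        for doc, row in zip(docs, trial):
            for other, t in zip(doc, row):
                if t == label:
                    topic_size += 1
                    same_lemma += other == lemma
        scores[label] = in_doc * same_lemma / topic_size
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}

print(topic_shares(documents, assignment, 0, 0))  # 'like' in the first document
print(topic_shares(documents, assignment, 0, 1))  # 'eat' in the first document
```

Note that the ‘equilibrium’ criterion is not a separate factor here: it enters through the topic size in the denominator of the second factor, just as in the 1/12 versus 1/11 comparison for ‘like’.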
In this way we can calculate probabilities for each token in the corpus. Then we randomly assign new topics to each token, only this time not on a 50:50 basis, but according to the percentages we’ve figured out before. So this time, it’s more likely that ‘like’ will end up in topic B, but there’s still a chance of more than 40% that it will get assigned to topic A again. The new distribution of topics might be slightly better than the first one, but depending on how lucky you were with the random assignment in the beginning, it’s still unlikely that all tokens pertaining to food are neatly put in one topic and the animal tokens in the other.
The solution is to iterate: repeat the process of probability calculations with the new topic assignments, then randomly assign new topics based on the latest probabilities, and so on. After a couple of thousand iterations, the probabilities should make more sense. Ideally, there should now be some tokens with high percentages for each topic, so that both topics are clearly defined.
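Put together, the iteration might look like the following Python sketch. It follows the description in this post (compute the shares for all tokens, then redraw all topics at once), which is an assumption about the procedure rather than a faithful copy of Mallet or of my Perl script; the function names, the seed and the low iteration count are chosen for illustration only.

```python
import random

documents = [
    ["like", "eat", "broccoli", "banana"],
    ["eat", "banana", "spinach", "smoothie", "breakfast"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "adopt", "kitten", "yesterday"],
    ["look", "cute", "hamster", "munch", "piece", "broccoli"],
]

def topic_shares(docs, topics, d, i, labels):
    """Simplified scoring from the walkthrough, normalised to sum to 1."""
    lemma, original = docs[d][i], topics[d][i]
    scores = {}
    for label in labels:
        topics[d][i] = label  # pretend the token carries this label
        in_doc = topics[d].count(label) / len(topics[d])
        pairs = [(w, t) for doc, row in zip(docs, topics) for w, t in zip(doc, row)]
        topic_size = sum(t == label for _, t in pairs)
        same_lemma = sum(t == label and w == lemma for w, t in pairs)
        scores[label] = in_doc * same_lemma / topic_size
    topics[d][i] = original  # restore the real assignment
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}

def run(docs, labels=("A", "B"), iterations=200, seed=1):
    """Repeat 'compute shares, then redraw every topic' a number of times."""
    rng = random.Random(seed)
    # Start with a random topic for every token (independent coin flips).
    topics = [[rng.choice(labels) for _ in doc] for doc in docs]
    shares = []
    for _ in range(iterations):
        # Step 1: compute the shares for every token from the current assignment.
        shares = [[topic_shares(docs, topics, d, i, labels) for i in range(len(doc))]
                  for d, doc in enumerate(docs)]
        # Step 2: draw a new topic for every token according to its shares.
        topics = [[rng.choices(labels, weights=[s[l] for l in labels])[0] for s in row]
                  for row in shares]
    # 'shares' are the percentages from which the final topics were drawn.
    return topics, shares

topics, shares = run(documents)
for doc, row in zip(documents, topics):
    print(", ".join(f"{lemma} -> {topic}" for lemma, topic in zip(doc, row)))
```

Because every step is random, a different seed or iteration count will produce a different final assignment.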
Only with this example, it doesn’t work out. After 10,000 iterations, the LDA script I’ve written produces results like this:
- topic A: cute (88%), like (79%), chinchilla (77%), hamster (76%), …
- topic B: kitten (89%), sister (79%), adopt (79%), yesterday (79%), …
As you can see, words from the ‘animals’ category ended up in both topics, so this result is worthless. The result given by Mallet after 10,000 iterations is slightly better:
- topic 0: cute kitten broccoli munch hamster look yesterday sister chinchilla spinach
- topic 1: banana eat piece adopt breakfast smoothie like
Topic 0 is clearly the ‘animal’ topic here. Words like ‘broccoli’ and ‘munch’ slipped in because they occur in the mixed-topic sentence, “Look at this cute hamster munching on a piece of broccoli”. No idea why ‘spinach’ is in there too though. It’s equally puzzling that ‘adopt’ somehow crept into topic 1, which otherwise can be identified as the ‘food’ topic.
The reason for this ostensible failure of the LDA algorithm is probably the small size of the test data set. The results become more convincing the greater the number of tokens per document.
For a real-world example with more tokens, I have selected some X-Men comics. The idea is that because they are about similar subject matters, we can expect some words to be used in multiple texts from which topics can be inferred. This new test corpus consists of the first 100 tokens (after stop word removal) from each of the following comic books that I more or less randomly pulled from my longbox/shelf: Astonishing X-Men #1 (1995) by Scott Lobdell, Ultimate X-Men #1 (2001) by Mark Millar, and Civil War: X-Men #1 (2006) by David Hine. All three comics open with captions or dialogue with relatively general remarks about the ‘mutant question’ (i.e. government action / legislation against mutants, human rights of mutants) and human-mutant relations, so that otherwise uncommon lemmata such as ‘mutant’, ‘human’ or ‘sentinel’ occur in all three of them. To increase the number of documents, I have split each 100-token batch into two parts at semantically meaningful points, e.g. when the text changes from captions to dialogue in AXM, or after the voice from the television is finished in CW:XM.
I then ran my LDA script (as described above) over these 6 documents with ~300 tokens, again with the assumption that there are 2 equally distributed topics (because I had carelessly hard-coded this number of topics in the script and now I’m too lazy to re-write it). This is the result after 1,000 iterations:
- topic A: x-men (95%), sentinel (93%), sentinel (91%), story (91%), different (90%), …
- topic B: day (89%), kitty (86%), die (86%), …
So topic A looks like the ‘mutant question’ issue with tokens like ‘x-men’ and two times ‘sentinel’, even though ‘mutant’ itself isn’t among the high-scoring tokens. Topic B, on the other hand, makes less sense (Kitty Pryde only appears in CW:XM, so that ‘kitty’ occurs in merely 2 of the 6 documents), and its highest percentages are also much lower than those in topic A. Maybe this means that there’s only one actual topic in this corpus.
Running Mallet over this corpus (2 topics, 10,000 iterations) yields an even less useful result. The first 5 words in each topic are:
- topic 0: mutant, know, x-men, ask, cooper
- topic 1: say, sentinel, morph, try, ready
(Valerie Cooper and Morph are characters that appear in only one comic, CW:XM and AXM, respectively.)
Topic 0 at least associates ‘x-men’ with ‘mutant’, but then again, ‘sentinel’ is assigned to the other topic. Thus neither topic can be related to an intuitively perceived theme in the comics. It’s clear how these topics were generated though: there’s only 1 document in which ‘sentinel’ doesn’t occur, the first half of the CW:XM excerpt, in which Valerie Cooper is interviewed on television. But ‘x-men’ and ‘mutant’ do occur in this document, the latter even twice, and also ‘know’ occurs more frequently (3 times) here than in other documents.
So the results from Mallet and maybe even my own Perl script seem to be correct, in the sense that the LDA algorithm has been properly performed and one can see from the results how the algorithm got there. But what’s the point of having ‘topics’ that can’t be matched to what we intuitively perceive as themes in a text?
The problem with our two example corpora here was that they were still not large enough for LDA to yield meaningful results. As with all statistical methods, LDA works better the larger the corpus. In fact, the idea of such methods is that they are best applied to amounts of text that are too large for a human to read. Therefore, LDA might not be that useful for disciplines (such as comics studies) in which it’s difficult to gather large text corpora in digital form. But do feel free to e.g. randomly download texts from Wikisource, and you’ll find that within them, LDA is able to successfully detect clusters of words that occur in semantically similar documents.