This is the P2PU Archive. If you want the current site, go to www.p2pu.org!

Open Journalism & the Open Web

Week 1: Fundamentals of Journalism on the Open Web

Phillip Smith
Fri, 2010-09-17 20:57

Fundamentals of Journalism on the Open Web

We are not here to tell the same stories in a different way. We are here to find new stories, using better tools, to tell better stories that are not being told now.

Topic leader:  Chris Amico, Interactive Editor at PBS NewsHour


Live meeting (audio & chat): Monday, September 20th, 1PM eastern (attendance highly recommended)

Goals:

  1. Identify areas where programmers and programmatic thinking can produce better journalism
  2. Introduce programmers to the fundamentals and values of journalism
  3. Introduce journalists to the possibilities of programming, and what can be accomplished with coders in the newsroom

 Core questions:

  1. What journalistic problems can programmers and programmatic thinking help solve?
  2. What problems can’t programming solve?
  3. What civic problems are journalists trying to solve?

 

Skills:

  1. Use machines for automation; use humans for creativity and editorial thinking
  2. Organize sprawling stories into mixable bites

 

Tools:

  1. Yahoo Pipes
  2. What else? http://toolkit.snd.org/
  3. API Playground? http://apiplayground.org/

 

Readings for journalists (pick 2):

  1. Adrian Holovaty: A fundamental way newspaper sites need to change
  2. Cindy Royal: The Journalist as Programmer
  3. Stijn Debrouwere: We’re in the information business
  4. Chicago Reader: Everybody’s a Reporter

 Reading for programmers:

  1. Jacqui Banaszynski: AIDS in the Heartland (pdf)

 

Questions to think about when writing your reaction paper on the readings:

  1. If you identify as a journalist: reflect on how you organize the information you gather for a story, both before and after producing it.
  2. If you identify as a hacker: read AIDS in the Heartland and reflect on what you can infer about the reporting process from the piece.

Assignments:

  1. Reading & reaction paper (due Monday, September 20th by noon eastern)
    1. If you identify as a journalist, please read 2 of the articles listed above for journalists. While you're reading, reflect on how you organize the information you gather for a story, both before and after producing it.
    2. If you identify as a programmer, please read Jacqui Banaszynski: AIDS in the Heartland (pdf) and reflect on what you can infer about the reporting process from the piece.
    3. Write a short, 100-300 word, reaction to the piece(s) that you read and post it as a comment on this page.
  2. Larger assignment: Will be posted on Monday after the lecture.

Comments

Reflections on Jacqui

david mason
Sat, 2010-09-18 19:10

Reflections on Jacqui Banaszynski's AIDS in the Heartland

Speaking honestly, I can't say I enjoyed this piece. I found it quite long (particularly since it needed to reintroduce the story for the serial format), I didn't enjoy the writing style and it didn't connect me to the individuals, and I felt in 22 pages it overemphasized some points. I found myself wondering if it might have been reduced to 140 characters.

Of course I'm being facetious. It is quite clear the writer spent a great deal of time and attention learning about the people and circumstances involved in an important period of time. It is presented as a very personal story, although the emphasized achievements of the individuals were never very clear to me. Therefore it seemed to be aimed at predisposed constituencies, although educative and humanizing for any receptive reader.

My reflections would be how journalism can engage people, with appropriate resonance, without pandering or being shallow. Stories have to be real, connecting, perhaps challenging, but they are rarely universal. With the Internet many more people can be included, so combining data with individual perspectives is most interesting to me. Hans Rosling's work combining interactive statistics with individual experience [1] works in this direction, although representations don't have the depth of a journalistic article and are limited to a few dimensions.

1. http://www.ted.com/talks/hans_rosling_reveals_new_insights_on_poverty.html

Not that I don't already

Marlon x
Sun, 2010-09-19 05:01

Not that I don't already respect journalists, but reading AIDS in the Heartland by Banaszynski made me respect an aspect of journalism which I don't often think about: the deeply human, personal storytelling, which lets us as readers connect with a world very different from our own.

My personal reaction to the story itself aside, what struck me most about the piece was how interdisciplinary an endeavor it must have been to create. This is a story that mixes medical science, regional politics, rural folk-ways, radical activism, and gay culture (to name a few) all into one coherent narrative. You could even say that despite the "old media" long-form text format, this piece is actually a collage of datapoints and vignettes from many different perspectives.

Compiling and managing all that information must be a huge task, and I can imagine two main stages. The writer first has to learn enough about various and sundry specialized fields to be able to give readers a competent account of the relevant highlights (for example: what AZT is and what it does in 1 paragraph). They then have to take all of this information presented in diverse forms and formats and distill it into a single written story.

I wonder if the first part, the learning, is transforming due to the web. Quick, easy access to specialized information is a given on the web. Perhaps more importantly, useful introductory information on virtually any subject is easily available especially in places like Wikipedia and specialized online messageboard communities. This makes it much easier for anybody to become a "15-minute expert" on a given subject.

The second part cries out for discussions of multimedia, mashup, non-linear formats for the presentation of a story. I'll be honest, I did not read every word of AIDS in the Heartland. At points I found myself scanning, picking out interesting sections and drilling into them for more detail. This is the default mode of information absorption on the web, and I wonder how a piece like this could be presented in a way that preserves its literary long-form majesty, but also acknowledges the reality of how people like me are going to interact with the story.

Both the Holovaty and

Rick Martin
Sun, 2010-09-19 07:01

Both the Holovaty and Debrouwere articles illustrate a point that I expect will be pretty key throughout this course. Viewing news as structured data that lends itself to repurposing seems to be the most logical way to proceed when building news sites, as Debrouwere points out in another post:

"A piece of news just doesn’t generate enough value on its day of birth to be worth the expense."

The story-centric view that Holovaty spoke of tends to result in yesterday's news being buried by today's, with few threads attached to it so we can pull it back up again. Categories and tags work to an extent, but organizing your information as Holovaty indicates ties it all together in an extensive web that lies just under the surface of your front page. Now when one string is pulled, it brings many more to the surface.

I'm pretty fascinated by some of the sports examples mentioned so far, particularly the ones that Rich mentioned (Narrative Science for example). I have to cover a volleyball tournament next month, and I'm wondering if there might be a way to plot rally scores to graphically show lead changes and point runs at a glance. Feeding a CSV into Google Fusion Tables' line graphs works (example), but it's far from elegant. Having read these posts, I need to give more thought to how best to structure the data (should points relate to sets? sets to matches? fields for notes, players?).

Perhaps it's time for some MYSQL study.
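
As a rough illustration of the structuring question raised in this comment, here is a minimal Python sketch in which rallies belong to sets and sets belong to matches. Every name in it (`Rally`, `VolleyballSet`, `Match`, and the `note` and `player` fields) is a hypothetical choice made for the example, not something from the readings, and a spreadsheet or database table could hold the same shape.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Rally:
    winner: str                   # team that won the point
    note: str = ""                # free-text reporter note (hypothetical field)
    player: Optional[str] = None  # player credited with the point, if known


@dataclass
class VolleyballSet:
    number: int
    rallies: List[Rally] = field(default_factory=list)

    def running_score(self, home: str, away: str) -> List[Tuple[int, int]]:
        """Return (home_points, away_points) after each rally."""
        scores, h, a = [], 0, 0
        for rally in self.rallies:
            if rally.winner == home:
                h += 1
            elif rally.winner == away:
                a += 1
            scores.append((h, a))
        return scores

    def lead_changes(self, home: str, away: str) -> int:
        """Count how often the lead passes from one team to the other."""
        changes, leader = 0, None
        for h, a in self.running_score(home, away):
            if h == a:
                continue  # a tie does not change who last held the lead
            current = home if h > a else away
            if leader is not None and current != leader:
                changes += 1
            leader = current
        return changes

    def longest_run(self) -> Tuple[Optional[str], int]:
        """Return (team, length) of the longest streak of consecutive points."""
        best_team, best_len = None, 0
        cur_team, cur_len = None, 0
        for rally in self.rallies:
            if rally.winner == cur_team:
                cur_len += 1
            else:
                cur_team, cur_len = rally.winner, 1
            if cur_len > best_len:
                best_team, best_len = cur_team, cur_len
        return best_team, best_len


@dataclass
class Match:
    home: str
    away: str
    sets: List[VolleyballSet] = field(default_factory=list)
```

With rallies stored this way, the running score from `running_score` is exactly the series a line graph would need, and lead changes and point runs fall out of the same records rather than being tallied by hand.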

“Everybody’s a reporter” My

Matt Carroll
Sun, 2010-09-19 19:31

“Everybody’s a reporter”
My takeaway is: How can existing news sites take advantage of this trend? More people are reporting – blogs, tweets, video – from the site of everything from shootings to Selectmen meetings. When New England Patriots quarterback Tom Brady was in a car wreck a week ago, The Boston Globe/boston.com used a picture on page 1 from a bystander. (Who doesn’t have a camera in their pocket these days?)

More reporters (or commentators) means a broader base of information for readers. Hopefully that translates into a more transparent society.

On the flip side, as more people report, the competition for readers’ eyeballs becomes more intense, which leads to more aggressive reporting, which leads to… what? It will be interesting to see which types of reporting/reporters predominate in the coming years.

”A fundamental way newspaper sites need to change.”
Everything Holovaty says makes sense. I do data nearly every day, and understand well how it can be used to enhance storytelling. It kills me how little data is used in the paper or online. For instance, I do a simple chart called “Snapshot” once a week that shows, for instance, avg taxes for towns in Greater Boston. The chart is popular online, for a day or two, then disappears. But the information could be recycled – or “repurposed”, as Adrian put it – endlessly in stories, pictures etc. There should be links to the chart every time a specific town is mentioned (or at least when taxes are mentioned). That’s a simple example, but there are tons of data that could be constantly used to help flesh out people’s understandings of the complex world we live in.
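
The town-linking idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `TOWN_CHART_URLS` lookup, the example.com URLs, and the `link_towns` function are all made up for the example, and the sketch links only the first mention of each town rather than every one.

```python
import re

# Hypothetical lookup: town name -> URL of that town's entry in the chart.
TOWN_CHART_URLS = {
    "Newton": "https://example.com/snapshot/taxes#newton",
    "Quincy": "https://example.com/snapshot/taxes#quincy",
}


def link_towns(text, urls=TOWN_CHART_URLS):
    """Wrap the first mention of each known town in a link to the chart."""
    if not urls:
        return text  # nothing to link against

    # One combined pattern, applied in a single pass, so markup inserted
    # for one town is never rescanned while looking for another.
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(town) for town in urls) + r")\b"
    )
    seen = set()

    def replace(match):
        town = match.group(1)
        if town in seen:
            return town  # later mentions stay as plain text
        seen.add(town)
        return f'<a href="{urls[town]}">{town}</a>'

    return pattern.sub(replace, text)
```

A real newsroom version would also have to cope with ambiguous names (a town that shares its name with a person, say), which is where an editor's judgment comes back in.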

“The journalist as programmer”
Two years ago I was speaking with Aron Pilhofer at a conf at MIT and we both agreed that if we could do college over again, we’d at least minor in computer science. This report does nothing but confirm my own feelings. I want to improve my programming skills. This is a trend that’s been building for two decades.

Whew. As a programmer from

Michael Roberts
Sun, 2010-09-19 20:01

Whew. As a programmer from the heartland (rural Indiana in my case) Banaszynski's article was hard to read - which is of course the point. I can only imagine how much work went into it. It's easy to get lost in the notion of all that data out there that could be managed, and entirely forget the necessity to tell the human stories behind the data. Banaszynski has done just that, not simply getting to know Hanson and his SO during his final days (and after!) but obviously talking to Hanson's entire family, his preacher, even his dentist! Then coming back to us and presenting all that in a mere 22 pages of condensed humanity, telling us a little of what it was to love Hanson, a little of what it was to be mortally wounded by his openness about his condition (his younger brother Tom), a little of what this big-city disease means in a small-town context.

I understand the desire to break this presentation down into something you can digest in three seconds, like everything else around us today - but you know? You can't. Sure, you could have popup pictures of Hanson and soundbites and did-you-know text blurbs like a children's museum presentation, but short of going and living next to Hanson and his family for a year, like Banaszynski, I think you need to just sit down for the hour and read. Even that is an extremely condensed version of the true reality; condense it any more and I think you'd just lose it entirely, leaving you just with the statistics. Male, white, aged 38, dead of AIDS in Minnesota.

I am part of a generation of

Mariano Blejman
Sun, 2010-09-19 20:02

I am part of a generation of journalists who grew up with traditional media as our reference. Or so we had to, for some time. It's strange, but we are living through a special period in which the media compete without any geographical boundaries, and in which the loyalty of readers, the possibility of building what we call a "reading contract," is questioned every day. While reading the four articles recommended by the course, I picked up four or five more. I'm not sure where they came from, but they were there. I do not know if I will ever find them again when I try to remember where they were, and I don't know where I will archive the materials that this course is offering me. While looking for information for an article, beyond the conventional access to sources (interviews, reports), we get saturated with information from uncertain local, regional and international sources. Many times, on the net, I honestly do not know how I got the information I was seeking, and I often get information that I was not looking for. The possibility of working with programmers, or of thinking as a programmer about journalistic content, is a way to build new types of reading contracts with digital readers as well: our own point of view. While Twitter seems to be breaking the way we relate to the news, journalists have to think about new ways to get readers to stay in front of our stories. Undoubtedly, creating digital tools and special teams to think about content for the online world will be a long struggle against the inertia of traditional media, especially at peripheral media such as Página/12, where I work in Argentina. However, my main concern is how we can think of new content without losing our valuable time to think for ourselves.

Holovaty and Royal are in

Fernando Alvarez
Sun, 2010-09-19 20:40

Holovaty and Royal are in fact part of the same story. Holovaty stresses the need to use the data around news. There is plenty of data in the topics usually covered by the media. When we don't process it correctly, we are wasting the opportunity to give our readers a better user experience.
By processing the data that surrounds our story, we give readers the chance to make up their own minds and to understand the whole thing. Plus, when we give them access to the raw data, creativity is unleashed. The published articles are just the first step. Readers will take the info and reuse it, creating new content. That is how media organizations become hubs of information, building a network around them.
But to get to this point some cultural changes are needed: changes in how workflow is arranged and in how people are hired. Bureaucracy should be kept as low as possible, and work teams should have a flat hierarchy. Flexibility is a must, and people should be self-motivated to learn on the fly whatever is needed to see their projects through.
Work teams in media innovation departments can learn a lot from the way open-source communities get the job done. These communities are able to cooperate, to teach one another, to solve problems and to keep a continuous flow of innovation.
We are living in a time of great change. If we don't use the available technology, we will lose opportunities to succeed. And in a very competitive world (like ours) there are not many opportunities...

Adrian Holovaty and Stijn

Mai Hoang
Mon, 2010-09-20 00:25

Adrian Holovaty and Stijn Debrouwere both have the same fundamental question and issue: What is the best way to portray information to our readers and why do news organizations believe there is only one way to do it?

In Holovaty’s 2006 piece, “A fundamental way newspaper sites need to change,” he writes that news organizations need to abandon the “story-centric” worldview: the belief that the only way to provide content is by collecting information and putting it into a story.

During our weekly meeting, my editor asks, “What stories are you working on?” Usually, I don’t think twice about that question. But after reading Holovaty’s piece, I think that perhaps my editor should ask, “What information are you seeking and how do you plan on sharing that information once you get it?”

Here is an example: There have been a ton of written stories about bank failures, but for me, The Wall Street Journal presents the information best with this regularly updated interactive graphic.

Four years later, Debrouwere writes in “We’re in the information business” that most news organizations still have that “story-centric” worldview. He also points out that news organizations need to develop a structure, a content management system, that makes it natural for news organizations to repurpose content for a variety of platforms and consumer needs.

The key problem, Debrouwere writes, is that most content management systems are really web publishing tools. They simply take a news story and put it on the web, with no consideration of other new ways to present that content.

This piece simply affirms why we need to have this course. There is a clear disconnect between content providers (journalists) and content producers (programmers). We have a real opportunity to work together to provide models that will better meet the goal in Holovaty's piece: "...more concern for important, focused information that is useful to people's lives and helps them understand the world."

***
On a separate note, regarding Matt's earlier post (see above) about minoring in computer science: when I began college a decade ago, my mother encouraged me to double major in computer science, noting my interest in technology. I brushed her off, saying "I need to focus on my journalism career." Now I wish that I had taken her advice!

>Many times, in the net, I

David Medinets
Mon, 2010-09-20 01:35

>Many times, on the net, I honestly do not know how I got the information I was seeking, and I often get information that I was not looking for.

A lot of people experience this issue. There is a Firefox plugin called Zotero (http://www.zotero.org) that is trying to provide a solution to it. From what I can tell, its feature set is oriented toward research. Their motto is "Goodbye 3x5 cards, Hello Zotero".

>Perhaps it's time for some

David Medinets
Mon, 2010-09-20 02:08

>Perhaps it's time for some MYSQL study.

Surely not. With websites like GeoCommons (http://geocommons.com/) you don't need to learn any database details. My only caveat: make sure you are not giving away rights to your information when you use online sites. For example, when a mashup is created with Google Maps, Google has the right to use that data forever without payment (I heard this from a trusted source, but have not read the licenses myself).

Buried towards the end of

Sarah Laskow
Mon, 2010-09-20 03:24

Buried towards the end of Debrouwere's post is an insight that should make a believer out of anyone who's ever had to redesign a media company's website:

"Structured content is like a big undo button that allows you to reverse decision and change how your website looks and behaves."

Working on the web, parts of stories that are static in other media can move. That can create endless frustrations, or it can be an opportunity for flexibility and creativity.

Debrouwere's insight is also about what users see when they go to a website, whereas Holovaty's points, which Debrouwere's riffing off of, point the focus, at least initially, to ways of thinking about information before anyone sees it. That seems like an important distinction to me, and it comes up again in Cindy Royal's study of the New York Times' interactive news team. She reports that a project generally has "at least one front-end and one back-end person assigned to it." The front-end person has more responsibility for what users see; the back-end person has more responsibility for organizing the information before anyone sees it. When considering how to re-jigger reporting practices to leverage the tools we have on the web, how important is it to think about the front-end and back-end separately? How much overlap is there between them? I'm not sure.

It seems like the structured content that Holovaty and Debrouwere are advocating for would have uses not just for communicating information to an audience but for reporting as well. Debrouwere talks about grouping information according to relationships between entities. I can imagine a CMS where you could tag quotes as you bold text and that would later let you easily pull up every unique quote from a certain source. This sort of information might not be useful to an audience but it could be a powerful piece of evidence to bring into an interview.
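
To make the quote-tagging idea concrete, here is a minimal Python sketch. It assumes a made-up inline markup, `[quote source="..."]...[/quote]`, that a reporter could apply as easily as bolding text. That markup, the `QUOTE_TAG` pattern, and the `index_quotes` function are inventions for this example and not features of any existing CMS.

```python
import re
from collections import defaultdict

# Hypothetical inline markup: [quote source="Name"]text[/quote]
QUOTE_TAG = re.compile(
    r'\[quote source="([^"]+)"\](.*?)\[/quote\]', re.DOTALL
)


def index_quotes(stories):
    """Map each source to their unique quotes, in order of first appearance."""
    by_source = defaultdict(list)
    for story in stories:
        for source, quote in QUOTE_TAG.findall(story):
            quote = " ".join(quote.split())  # collapse stray whitespace
            if quote not in by_source[source]:
                by_source[source].append(quote)
    return dict(by_source)
```

Calling `index_quotes` on a reporter's back catalogue would return a dictionary keyed by source name, so `index_quotes(stories).get("Jane Doe", [])` pulls up everything that (hypothetical) source has said on the record before an interview.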

On the other hand, I think it's important to consider where these tools are most efficient. The New York Times team works on stories that lend themselves to the sort of work the team does. I'm not altogether convinced that every story or every reporter's work would reap incredible benefits from employing structured content ideas, either. How can we incorporate these ideas without creating busy work for reporters and editors (or, say, asking interns to tag every interview in every story published)?

I'll be honest. I was bored

David Medinets
Mon, 2010-09-20 03:36

I'll be honest. I was bored by the AIDS article after the first ten pages. At 22 pages, it seemed predictable. Predictable is the wrong concept. Perhaps it was that the writing style or tempo didn't change? I wanted to see that picture of them when they were happy so that I could connect to them. I'm not sensitive by nature and so probably not the author's targeted demographic.

I'd be more interested to read about the contrast between two couples - one gay and one not - and how their experiences shaped their reactions and situations.

I don't know if my comments are going to help this forum or are relevant to the class. Let me know if I head off into tangential areas. Or simply don't reply. I won't be insulted.

In 2006, Holovaty wrote 'A

Chip Oglesby
Mon, 2010-09-20 05:01

In 2006, Holovaty wrote 'A fundamental way newspaper sites need to change.' We're taking this class in 2010 and there are still some newspapers who would have never thought of this type of transition.
Most of the work we do answers the basic questions of "Who, What, When, Where, Why, and How" but we never think of storing these in any sort of structured or linked data format.
The idea of the hyperlink is great, right? You can instantly get related info on any piece of information you want, but how many papers take the time to create topic pages for the same type of information?
All the information around us is data, and it's structured, and I believe that 'data is the answer to everything.' So why aren't stories done the same way?
To quote Mai: "This piece simply affirms why we need to have this course. There is a clear disconnect between content providers (journalists) and content producers (programmers). We have a real opportunity to work together to provide models that will better meet the goal in Holovaty's piece: "...more concern for important, focused information that is useful to people's lives and helps them understand the world.""
Mai is right, this problem is exactly why we need this course. TBL (Tim Berners-Lee) was on to something when he spoke at TED about the ideas of linked data. This is a data-rich world and we're missing out.
I can also hear in my head the voice of my old boss asking "How are we going to make money off of this?" My rebuttal is always "Imagine the money we'll lose if we don't implement this functionality."

The idea that most grabbed

Michael Morisy
Mon, 2010-09-20 05:53

The idea that most grabbed me, from all the four "journalist" articles, was Debrouwere's suggestion for a journalistic Markdown-like language, one that would finally capture the historical databases journalists slowly build over the months and years, helping spot trends that might otherwise go unnoted.

That piece and Holovaty's were obviously tackling a very complex, cumbersome problem, and while both were inspirational I'm not sure either really illuminated how we put the semantic into the semantic web.

Take one example: "A college graduate has a home state, a home town, a degree, a major and graduation year." This person also may or may not have parents, a job, debt, an original graduation year, a second degree, an arrest record, or millions of other data points that any given news article may or may not touch upon. And those data points can be incredibly useful, in fact critical, down the line if the paper decides to put together a LilSis or Muckity.com, for example.

If you're just putting the data away semantically for the sake of doing it, I worry you're creating busy work on an already taxed team, particularly since so many databases are started and then shortly abandoned, or else the methodology of how they are created is improved, thus breaking any sort of valid historical analysis that once could have been made.

I guess I'm just extra skeptical, because MuckRock.com has recently started juggling salary and budget data sets, and even those are so inconsistent across agencies as to be incomparable, so I wonder how we're possibly supposed to make these future datasets anything but neat gimmicks without some serious re-working of core journalist practices.

Those questions aside (and they're definitely questions, not criticisms!), the pieces were all fantastic and thought provoking.

I realized from the readings

Tim Groves
Mon, 2010-09-20 06:15

I realized from the readings that a lot of the work of journalist programmers is to work on the presentation of data. Up until now, I have mostly been thinking of the value of using programming in journalism in a different way. I think that in CAR (computer-assisted reporting) data is a tool to help you find the story, but not that the data is the story. For example, the FAA database of flights was used to track the planes that flew extraordinary rendition flights, but the data just gave journalists enough to ask the right questions.

I can see the value of structured information. I often want to dig deeper into a story or see how the subject of the story compares to other things. I clearly see how longitude and latitude data can be particularly useful in telling stories, and with less clarity I can see other ways that the data can be the story. But it is sometimes hard to picture concrete examples. I am interested to see how structured data becomes more commonly used in journalism, but I think of it as just one way to approach how programming and data can be integrated into journalism.
The piece on the New York Times gave a lot more insight into the practical way these projects work. A team like that can no doubt do amazing things. I like the model, but wonder what is possible for outlets that don't have the funds and resources of the Times. I was left asking myself what the pros and cons would be of opening the programming side up for anyone on the web to participate.

I read Holovaty's digital

Josh Wilson
Mon, 2010-09-20 10:23

I read Holovaty's digital prescription for newspapers, and the Chicago Reader item on the expanding definition of who exactly is a journalist.

Both pieces were about how media is changing, and about the resulting social friction. (Incidentally: They also embody that change, with a rich set of reader comments adding context, inquiry, clarification and further intellectual depth.)

However, I feel that the sense of social conflict identified by Holovaty -- of intransigence on the part of the journalists -- has turned out to be fairly fleeting. The news industry has been trashed, and with so many jobs gone, all bets are off. Folks are taking up the medium with more of a sense of appetite and opportunity -- and no doubt urgency.

On the other hand, the social conflict around "who is a journalist" is underplayed in the Chicago Reader piece. Remember that Feinstein and Schumer's federal shield law is being modified in the wake of the WikiLeaks affair to include only "traditional news-gathering activities," according to the New York Times. That discussion is far from over.

The real point of Holovaty's piece is the "atomization" and remixing of discrete, characteristic data that appear in newspaper articles ... to some degree, it seems like something of a technological inevitability. The comments pointed out that this might ultimately not require a human actor, just a good script that can scrape articles.

Others pointed out that this sort of interconnectivity between data points is exactly the point of the semantic Web, though whether newspaper websites embrace the open, semantic Web or do it all on a walled or tiered in-house system remains to be seen.

Data-linked story structure is a powerful narrative tool, but it's only in an open-Web context that the structure will give readers true agency.

(An introduction: I publish

Ikaika Hussey
Mon, 2010-09-20 11:34

(An introduction: I publish The Hawaii Independent, a new local news startup in Honolulu. We do investigative reporting, arts & entertainment, and 17 hyperlocal community pages. I handle marketing, editorial direction, and also much of our tech needs.)

I'm familiar with Holovaty and Debrouwere's writings, and have used their formulations to revamp our newsgathering practices. We started out 2 yrs ago as a simple local news blog, but I've been building out our CMS [we use ExpressionEngine] to accommodate multiple weblogs with custom fields for various data elements. So for example, we have a single weblog called "Story," which contains our basic articles. We then have a separate weblog called "People," with separate entries for the individuals in our stories, and with fields to document relationships between people, between people and corporations, dates of birth, etc. Other weblogs track "Organizations and Companies," "Places," "Events," etc.
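
The shape of that setup can be sketched outside any particular CMS. The Python below is only an illustration of stories, people, organizations, and the relationships between them; it is not ExpressionEngine code, and every class, field, and function name in it is a hypothetical choice for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union


@dataclass
class Person:
    name: str
    date_of_birth: Optional[str] = None  # ISO date string, if known


@dataclass
class Organization:
    name: str


@dataclass
class Relationship:
    subject: Person
    kind: str  # free-text label, e.g. "board member of"
    target: Union[Person, Organization]


@dataclass
class Story:
    headline: str
    body: str
    people: List[Person] = field(default_factory=list)
    organizations: List[Organization] = field(default_factory=list)


def stories_about(person: Person, stories: List[Story]) -> List[str]:
    """Return the headlines of every story that lists this person."""
    return [s.headline for s in stories if person in s.people]


def connections(
    person: Person, relationships: List[Relationship]
) -> List[Tuple[str, str]]:
    """Return (kind, target name) for each relationship the person has."""
    return [
        (r.kind, r.target.name)
        for r in relationships
        if r.subject == person
    ]
```

The design point is that a person is entered once and then referenced from each story, so a question like "what else have we written about this person, and who are they connected to?" becomes a lookup rather than an archive search.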

We're in the beginning stages of implementing this new system --- the next step is boiling it down into a series of clear procedures for our journalists, and integrating it into their existing workflow.

ExpressionEngine has worked well for us, particularly given that Django (and Python, a language with which I have zero familiarity) seemed pretty daunting.

I'm looking forward to this discussion!

"Everybody’s a Reporter"

David Crandall
Mon, 2010-09-20 14:10

"Everybody’s a Reporter" discusses how Chicago is redefining who is considered a reporter. The past definition was very narrow and limited to the few well-established news outlets. Nowadays, with the advent of the internet and the explosion of media outlets, there is a strong need to modernize the definition of a journalist to be more inclusive. While I think the steps taken in Chicago are good, I think it is important that there still be some limits in place. The requirement to provide proof that someone is connected to a media outlet should provide that check.

"A fundamental way newspaper sites need to change" lays out the direction that the author believes newspaper web sites need to go. It is important to remember most newspapers have a long history of being just that, papers. Not too long ago, simply being on the web was a huge step forward for many. If more papers pushed hard to do “repurposing” I think that would go a long ways towards making the news accessible to more people. While the point of “structured data” and not posting everything as a “news article” has some merit, I think it also has some drawbacks. I wouldn’t want to start reading news articles that were simply assembled based on a handful of data pieces. To be effective, news needs to be so much more than just a handful of data points put together. Maybe it is just semantics, but it feels like the story still needs to be the foundation, and then added layers of data could be available for those that want to learn more or dig deeper.

Hi All, These reactions are

Phillip Smith
Mon, 2010-09-20 15:22

Hi All,

These reactions are *great*. Please keep them coming. :-)

One question that I'd like to present to Chris today, taken from a reaction above, is:

+ Should editors be asking information-focused questions at story meetings, like “What information are you seeking and how do you plan on sharing that information once you get it?”

The other, based on feedback to the AIDS in the heartland piece, is:

+ What opportunities are there (if any) to package that story in a variety of formats, where each format plays to a different type of audience / reader?

If questions came up for you during the readings, please take a moment to note them down and be ready to put them in the chat box during today's lecture.

Thanks for all the hard work folks. Greatly appreciated.

Phillip.

Adrian Holovaty, in clear,

Terri Langford
Mon, 2010-09-20 15:36

Adrian Holovaty, in clear, concise language, presents how today’s media establishments refuse to see the wealth of information contained in the data that make up stories. His call for this change in his “A Fundamental Way Newspaper Sites Need to Change” asks media managers/reporters to find ways to collect that data and repurpose it in a way that gives readers actual information instead of a daily text lecture about what is going on in the world.

Four years later, Stijn Debrouwere weighs in, discussing how in the four years since Holovaty’s call to arms, there’s been no large movement on the structured data front by the media. But instead of offering a concrete program or method for loosening the bottleneck, he goes on at length about how the media needs to get even more micro about the data and find ways to repurpose it.

In other words, no one has come up with an easy, CMS answer to how any one in a newsroom could capture this data and repurpose it. And no, Caspio isn’t that plan no matter how much some newsroom managers say it is.
So what’s missing here is the unwritten third essay with “the plan.”

Not the “plan” that saves newsrooms (the old how-do-we-make-news-profitable plan, where everyone waits around for someone else to come up with an answer we can copy like someone’s 10th grade geometry homework). I’m talking about an as-yet-unwritten “plan” that gives us some sort of template, an add-on that incorporates into the publishing tools already installed in newsrooms to allow anyone in a newsroom – from the lowliest agate clerk to the highest-ranking editor – to both mark and cull these objects in a story for repurposing now or later.

This lack of incorporation of Excel and Access into a publishing system is what is keeping newsrooms from being more efficient. No one has come up with that. That’s the essay I want to read. While I’ll always read the countless essays in which another academic infocrat proclaims that MSM is “reluctant to change” (as if this industry is the only one where that’s a problem), what I want now is a real plan, a real template that does more than identify the problem.

Burt Herman comes closest to making the sort of publishing/culling tool a reality with Storify. He’s taken an old AP verb and made it new again. He’s done what many newsrooms need: a tool that places the culling process into a publishing template that anyone can use.

It's not that newsrooms don't want to change and don't want to repurpose the massive amounts of data that come at them in the reporting process. It's that they have a profound lack of efficient tools that can be utilized quickly. The first programmer to fix the efficiency problem, where marking and culling the data is as easy as writing the story that provides context, all in one program (no whipping out of Excel or a database manager needed), will be the one credited with truly bringing publishing into a new era.

The Old way 1. Know what I

nancy cardozo
Mon, 2010-09-20 16:40

The Old way

1. Know what I want to do: Inform, explain, influence, entertain, deceive, or some combination of those.

2. Know who I'm writing for: Readers have different 'learning styles' and how I present information varies on who I'm talking to. Some people can't get into a science article unless there's a human interest hook, other people just want the research abstract. Knowing this lets me know who I have to interview.

3. How much background is needed for context (by both reader and writer)?

4. What is the trust/suspicion level? How weird, unfamiliar or unpopular is the idea I'm presenting and who do readers and editors trust?

5. How much time and space do I have?

Which of these have relevance to the systems we're talking about? I'm totally "story-centric." To me, data is information without purpose or context. Writers are the original repackagers, collecting information and giving it life. Some of Holovaty's ideas are good. I love the idea that info from a piece can then be extracted and reused, but why all the animosity toward journalists? Blob-of-Text writers are arrogant Luddites standing in the way of sleek graphics and mesmerizing animations depicting the ages of Guantanamo detainees who got married? Ha. This is the kind of article that deepens the divide between the Two Cultures.

Contrast that with the anthropology in the Times piece. The team there looks for stories that lend themselves to graphics and webby stuff. Both writers and computer people get better at finding new ways to present information. The cultural change seems more organic and collaborative. Language matters when different cultures merge.

Is there any precursor to this in the old journalism world?
In print times, at some publications, a fact checker would take an article that had been turned in, and highlight everything that needed to be checked. The things that would be checked seem very close to the types of things that would go into the fields in the CMS.  It might look like this:

"With an expected completion date of 2013, 1 World Trade Center is the most expensive skyscraper ever constructed in the United States, with a price tag currently estimated at $3.3 billion. By contrast, the spanking new [Bank of America Tower] in Midtown Manhattan cost about $2 billion. That is pretty much the going rate for building new skyscrapers in New York City. Just to break even, 1 World Trade Center will require rents far higher than the [going rate in Midtown], much less downtown New York, where the building is located and where rents are considerably lower."

They'd check the dates, the costs, find out whose expectation, whose estimate. They'd make sure the Bank of America Tower was really "spanking" new and find out whether or not the t in Tower was really supposed to be upper case. They'd find out what it would cost for 1 WTC to "break even" and make sure that all those assertions about rents in NYC were correct. They would not be looking at the structure of the article or the style in which the article was written. By looking at a news article as a deliberately ordered collection of discrete pieces of  information, which is the way fact checkers do,  it becomes much simpler to put each element in the proper category.
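A minimal sketch of that idea, using only the facts quoted in the paragraph above. The `Claim` record and its category names are hypothetical, invented here for illustration; they are not taken from any particular CMS.

```python
from dataclasses import dataclass


@dataclass
class Claim:
    """One checkable item a fact checker might highlight (hypothetical shape)."""
    category: str          # assumed field type, e.g. "date", "cost", "name"
    text: str              # wording as it appears in the article
    value: object          # normalized value a CMS field could store
    checked: bool = False  # flipped once someone has verified it


# Items pulled by hand from the quoted 1 World Trade Center paragraph.
claims = [
    Claim("date", "expected completion date of 2013", 2013),
    Claim("cost", "currently estimated at $3.3 billion", 3_300_000_000),
    Claim("name", "Bank of America Tower", "Bank of America Tower"),
    Claim("cost", "cost about $2 billion", 2_000_000_000),
]


def by_category(items):
    """Group claims by category, the way fields would be grouped in a CMS."""
    grouped = {}
    for claim in items:
        grouped.setdefault(claim.category, []).append(claim)
    return grouped


costs = [claim.value for claim in by_category(claims)["cost"]]
print(costs)
```

The narrative itself is untouched; the list simply sits beside it, which is roughly what the highlighted printout did in a print newsroom.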

Other questions: Are the CMSs people have mentioned standardized in a way that they will be able to feed into other data management systems? I've played around with Wolfram/Alpha [http://www.wolframalpha.com/] and it made me wonder if the databases and programs and systems being discussed here are being designed with the idea that they would feed into a giant central program that would hold all kinds of data from everywhere. (Kind of scary and cool to me.)

How long until a standard emerges? Are we close?

After plowing through

Jeff Severns Guntzel
Mon, 2010-09-20 17:05

After plowing through everything on the journalist reading list there are roughly one zillion threads I want to pursue with the group and my colleagues. One thing I want to put right up top, however, is the issue of passion.

I was struck by this nugget from "The Journalist as Programmer":

"The department sought and hired people with a passion for journalism, an interest in telling an important story and the ability to work across departments. These are not necessarily traits that automatically come with someone from a computer science or software development background."

Hell, these are not necessarily traits that automatically come with someone from a journalism background. I'm of a mind that the world doesn't need more people who call themselves journalists--it needs more people who are passionate about journalism. Thinking of it this way it becomes easier to bring hacks and hackers together and call what they do journalism rather than some clunky hybrid like "programmer journalist."

If you broaden things a bit to include people with a passion for storytelling, all of the resistance and issues of arrogance discussed in "A fundamental way newspaper sites need to change" and "We're in the information business" can be beat, so long as we are talking about storytelling by any means necessary.

Reading Holovaty and Debrouwere, I thought of the bins and bins of printed data I have in my basement--information I collected while reporting stories for newspapers and alt-weeklies before I was giving any thought to story forms beyond the "blob of text."

The information in those bins is what I didn't use in dozens of news stories and investigative pieces. I keep them to remember exactly the kinds of information that I wasted when I did stories the old-fashioned way and to remind myself to always be thinking of new ways for using that kind of information. In some cases, I am actually going back to that data in my new job with MinnPost and building something new from it.

I'll stop there, though it's not for any lack of a burning desire to go on and on and on...

Thoughts on Holovaty and

Steve Myers
Mon, 2010-09-20 17:45

Thoughts on Holovaty and Debrouwere

If I hadn't seen the date on Adrian's post, I could've thought that it was written yesterday. The problems he identifies persist because the fundamental ways that journalists do their jobs haven't changed in decades: gathering information and constructing a narrative around it.

In the process, as Adrian points out, they often take structured data -- or data that could easily be structured -- and transform it into unstructured data. They see a date of Sept. 19, 2010 on a police report, for example, and write that the crime happened "Sunday."

Consider how that one change makes the data much more difficult to repurpose. To transform Sunday back to "Sept. 19, 2010," a piece of software (or a human) would have to know that most journalists write dates within a week of publication (in the past or the future) as the corresponding day of the week. Software would then have to process the date of publication and the day of the week in the story. But how would the program figure out if the day was in the future or the past? By searching for verb tense?
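To make the ambiguity concrete, here is a rough sketch in Python. The function name `candidate_dates` and the Monday, Sept. 20, 2010 publication date are assumptions chosen for illustration. Given a weekday name and a publication date, the code can only narrow things down to two candidates; picking between them still needs something like verb tense or a human.

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def candidate_dates(day_name, published):
    """Return (past, future) dates on `day_name` nearest to `published`.

    If `published` itself falls on `day_name`, the "past" candidate is the
    publication date and the "future" candidate is one week later.
    """
    target = WEEKDAYS.index(day_name.lower())
    days_back = (published.weekday() - target) % 7
    past = published - timedelta(days=days_back)
    future = past + timedelta(days=7)
    return past, future


# Assumed publication date: Monday, Sept. 20, 2010.
published = date(2010, 9, 20)
past, future = candidate_dates("Sunday", published)
print(past.isoformat(), future.isoformat())  # 2010-09-19 2010-09-26
```

Storing "2010-09-19" in the first place, and rendering it as "Sunday" at display time, avoids the guess entirely.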

I hadn't thought of this as an information architecture problem; before reading Stijn Debrouwere's "We're in the information business," I thought this was mostly a matter of getting journalists to change how they work.

Workflows vary widely from journalist to journalist, and so much of the working material -- notes, phone calls -- is pretty much impossible to catalog and repurpose. So, as Derek Willis wrote in 2005, "Like much of the other information that passes through a newsroom, data too often falls through the floor instead of being reused."

Editors don't care too much about the varying practices of journalists as long as they get a product -- a story of some kind -- that does adhere to their standards (which includes being factually correct).

If we think about how journalists do their jobs in terms of entities and relationships, the same way you'd model a database, I think we could make some strides toward structuring how journalism is done. The structuring has to occur in the process of doing journalism; it can't be done retroactively.
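As a rough illustration of what "entities and relationships" could mean for a reporter's notes, here is a small Python sketch. The class names (`Entity`, `Relationship`, `Notebook`) and the sample entries are hypothetical; the only detail borrowed from above is the police report dated Sept. 19, 2010.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    kind: str  # assumed categories, e.g. "document", "event", "person"
    name: str


@dataclass(frozen=True)
class Relationship:
    subject: Entity
    verb: str
    obj: Entity
    when: str = ""  # ISO date if known, kept structured rather than "Sunday"


@dataclass
class Notebook:
    """A reporter's structured notes, recorded while reporting."""
    relationships: list = field(default_factory=list)

    def record(self, subject, verb, obj, when=""):
        rel = Relationship(subject, verb, obj, when)
        self.relationships.append(rel)
        return rel

    def about(self, entity):
        """Everything recorded that involves `entity`, in the order noted."""
        return [r for r in self.relationships
                if entity in (r.subject, r.obj)]


report = Entity("document", "police report")
crime = Entity("event", "reported crime")
story = Entity("story", "Monday crime story")  # hypothetical story slug

notes = Notebook()
notes.record(report, "documents", crime, when="2010-09-19")
notes.record(story, "cites", report)

for rel in notes.about(report):
    print(rel.subject.name, rel.verb, rel.obj.name, rel.when)
```

The point of the sketch is the timing: each `record` call happens during reporting, so nothing has to be reverse-engineered from the finished story.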

By the way, that post of Derek's is part of a series of essays called "Fixing Journalism," and they're all worthwhile. Other relevant posts are:

* How newsrooms lose institutional knowledge when people leave the staff and potential solutions
* The need to create more powerful archives for employees -- annotated, inter-linking archives that help provide the context that journalists need to report today's news with the proper historical background

Something that strikes me

Chris Nicholson
Mon, 2010-09-20 17:52

Something that strikes me about this piece is the polemic employed. Inflamed or emotional polemic is less common in journalistic pieces that are liberal, which is why someone like Michael Moore stands out in recent documentary film-making. Emotional and inflammatory polemic tends to come in the shape of illiberal rabble-rousing, by the likes of Fox News and Rush Limbaugh. Indeed, the thing that strikes me about "AIDS In The Heartland" is exactly this very emotional polemic running through a storytelling structure.

This brings me to my next point. Is this a good thing to do? This is a crass generalisation, but bear with me; part of the theory behind the success of right-wing campaigns, such as the Tea Party movement, comes from the fact that a lot of them are run with an impassioned emotional narrative with scant regard for facts and figures. In contrast, a lot of liberal viewpoints are illustrated rationally with frequent recourse to statistics (e.g. this is why a disadvantaged poor family would do better under us, with the figures to prove it). As sad as this is, a lot of people tend to be persuaded by the irrational narrative, even if it's against their best interests. But would "AIDS In The Heartland" really benefit from being reduced to an article about the facts and figures of people living with AIDS? Well, obviously it wouldn't. This is why the author has given the article that title; to show that AIDS isn't some metropolitan liberal disease, but can strike in the heartland of rural America. This is a hook to people who might not otherwise read it, rather than just something that might look like preaching to the converted.

I'm imagining that, as a web developer, reading "AIDS In The Heartland" is to do with ascertaining how useful social media is in promoting news stories and journalistic pieces. On one extreme, do we really need a human factor in reporting some stories? As was illustrated in the second lecture, you can actually assemble a sports story with a mix of open (sport) data and clever news feeds. You effectively make the human sports journalist redundant. It doesn't really surprise me that sports journalism *could* be as reductive as this. However, does this mean we can translate any news story into atomistic data? The other extreme is to figure in only the human factor with no access to data. Then you're likely to get either introspective pieces, which would either produce the journalistic equivalent of navel-gazing, daily gonzo journalism (which would eventually just kill the journalist) or the equivalent of an extreme version of Fox News (some people would say there's no such thing as an "extreme version of Fox News"!).

The conclusion? Journalism by its very nature is human, so the storytelling narrative is vital to get other people to actually read it for the emotional content. But in the current world of expanding open data, journalism that doesn't tap the Open Web will seem increasingly poor and irrelevant. Journalism doesn't need to abandon its human heart either; the richness of the data should enhance the richness of deeper storytelling. Despite the fact that "AIDS In The Heartland" is a great example of heartstring-tugging observational documentary and confessional interview (which you can agree with or not), there's still enough journalistic integrity in hanging real facts* on the tapestry, albeit sparingly.

* "He has his blood tested to determine his white blood cell count his body’s natural defense system. It often is below 1,000; a healthy person’s count would be closer to 5,000".

Though I read all of the

Kay Steiger
Mon, 2010-09-20 17:52

Though I read all of the pieces, I'll write my reaction on two of them, Adrian Holovaty's "A fundamental way newspaper sites need to change" and Cindy Royal's "The Journalist as Programmer" [PDF]. There are three main themes I saw emerge from these two pieces: 1) the breakdown of traditional "walls" in newsrooms coupled with increased collaboration, 2) a need for more diversity in approaches, and 3) the expansion of skills required. (Apologies in advance for making my 100-300 word reaction into a 1,000+ word analysis).

Breakdown of traditional "walls" in newsrooms and increased collaboration across departments

Royal addressed this best when she wrote that the objective of the New York Times' interactive department was to collaborate across all departments and elevate the role of the programmer. Newspapers -- and journalism more generally -- had become predicated on the theme of competition. Newspapers and magazines compete with one another to write the definitive story on a given topic, collect Pulitzer prizes, and be "first" to report something. In some ways, the modern Internet era has amplified some aspects of this idea of competition. Posting a story ten minutes earlier can make all the difference in attracting traffic to one's site.

Still, as the New York Times found, the principles of Open Web -- particularly transparency and increased collaboration -- could be more beneficial to journalism than competition has been. Furthermore, as Holovaty pointed out in his essay, he quickly realized that inter-department collaboration could be hugely beneficial when little league information was coupled with weather forecasts. The result was a small but extremely beneficial cross-collaboration that can fundamentally change the user experience for the better. Fundamentally, the biggest change required on the part of journalists here is to change the way they think about their jobs. Royal's message seems to be that the user benefits most from collaboration rather than from the deeply ingrained attitude of competition. Secondly, as Holovaty points out, journalists need to stop thinking of themselves as producers of a "big blob of text" and instead break down what they do into pieces of information that can be re-purposed and re-integrated into other kinds of media in other ways.

Perhaps one of the most shocking findings of Royal's (which I can attest to from my own experience in journalism school) was that "only about half of journalism programs were teaching spreadsheet and database skills."

A need for more diversity in approaches

One of the key themes in both of these pieces is that journalism has gotten itself into an attitude that is narrow. What is needed from both programmers and journalists is to think about what they're doing in new ways, and media today demands a constant revision of standards. The pace of the changes in user experiences is quicker than ever -- and neither journalists nor programmers can continue thinking about media in old ways. This is the "blob of text" error that Holovaty found was so limiting.

Furthermore, that likely means that journalism and programming must both face an old nemesis: Working to be more inclusive to people who have not traditionally been included in either journalism or programming. Royal -- the only female author in our assigned readings -- was the only one to point out that programming has a severe lack of women. When she began her study of the NYT's interactive department, she found that "the entire team, including it's leader, were male." Royal made no note of racial, ethnic, religious, political, or sexual orientation diversity among the staff, but it's been pretty widely realized that "nerd culture" (the kind that many programmers might identify with) is overwhelmingly white and male.

One of the problems might be in the path many have taken into programming. Royal herself points out that "Most described their skill acquisition as 'self-taught.'" While this may seem a wide-open path for women and people of color to enter the field (I have often heard the same argument made of the political blogosphere, that it is fundamentally a "meritocracy," with little to no barrier to entry), there are certainly invisible barriers to women particularly entering such a programming area.

As I've written before, women are less and less likely to pursue STEM (science, technology, engineering, and math) fields, and the number only diminishes with age. There seem to be two areas of dropoff: First at the entry point, when women are first considering what types of careers they might pursue, then once they have families and realize the increased demand in time required for such fields isn't compatible with their expected time devoted to family responsibilities. Fundamentally, the "self-taught" notion requires a great deal of free time (i.e. working at learning something without pay), which is often not an easily accessible commodity for some.

Fundamentally, as journalists and programmers consider diversifying skills and critically examining the roles of journalists and programmers, I hope they consider some of the more fundamental questions of diversity.

The expansion of skills required

Finally, as was discussed in our lecture on Friday, the types of skills required of both journalists and programmers are expanding. Even if a journalist doesn't know XHTML or JavaScript, for example, he or she must understand what these technologies can do so he or she can think about the user experience in a holistic way. That is, even if a journalist cannot build the story in a completely interactive way, he or she must consider the ways that story can be interactive. Holovaty's breakdown of stories that consist of pieces of information was useful in illustrating this principle. Furthermore, as Royal noted, "While RoR skills specifically were not required of new hires, most had worked in an environment where they were introduced to object oriented programming concepts." Specific skills can be taught on the job, but it is the concepts behind them that make a difference.

It is for this reason that I am most excited to be in this class. I came from a journalism school that taught me journalism in a good but rather antiquated way. The disdain for blogs and web more generally was palpable among many of my instructors. It's clear that many of them, pushed out of the journalism field in various ways, viewed the web as a threat rather than as an incredibly transformative tool that can make the user experience richer. I look forward to helping to break out of that mentality.

I presently work in a shop

Joe Johnson
Mon, 2010-09-20 18:06

I presently work in a shop that has been in survival mode for the past three or four years, which is not really a new concept for newsrooms across the country. It is a challenge every day to do more with less but when you don't have the tools or people around to implement the ideas brought up in the readings, it is discouraging to see your effort withering on the vine.

My paper still is behind the curve of being able to develop these data-driven reports. Our web presence is less than compelling, though it does look nice. It leaves me with the "So what" that is supposed to come next. Being able to collect good data, write interesting stories and then have it all saved into a newsroom system that can find new and innovative ways to present the information should be the goal of journalists. News organizations that can achieve this goal will flourish (find a way to make money), while those that don't will be shuttered.

Even though I'm a newspaper guy, newspaper websites face increasing competition from television news websites as more put their content online. TV news websites can horn in on the news space newspapers once held almost exclusively but it is harder for newspapers to do the reverse. You can get headlines and stories from both but most newspaper websites aren't going to have the volume of video.

Newspapers can reclaim their news gathering superiority by capitalizing on their in-house talent to produce interesting content that can be delivered in print or electronically. But there has to be that commitment on the part of newspapers to delivering the "So what does it mean to me" aspect to their readers.

Reaction: Journalists, you

Grant Hamilton
Mon, 2010-09-20 18:11

Reaction: Journalists, you aren't what you thought you were

Having read "A fundamental way newspaper sites need to change" and "The Journalist as Programmer", I find that there is both pessimism and optimism in abundance.

But I also found both pieces to be somewhat limited, and a little flawed in how they looked at journalism.

Adrian Holovaty's "fundamental" change to newspapers was the suggestion that they drop the story-centric worldview. I work in a CMS program that is as limiting as he says, and it would be wonderful to have more flexibility in content types.

But a "story" in journalism terms is more about the interpretation and contextualization of data, and less about its presentation in words. Most journalists I know would be perfectly happy to take information, repurpose it, remix it, and present it in a non-story form. Of course, they're not trained in how to do that, but they're not against it.

What they would be against, however, is the notion that they be reduced to merely tagging data, making sure that different databases, from different sources, can be cleanly brought together. They don't want to be data custodians, merely handing out to reader/users the information that they ask for; they want to be data interpreters, finding important information that reader/users didn't even know they wanted. That's news.

Better was the look at the New York Times' INT department, which produces a lot of work that I admire.

Small teams of programmer/journalists work with the rest of the editorial department to create data-driven graphics and interactive pieces. There is room for user input and for people to remix the data themselves, but it is the team that makes editorial decisions, and which picks the best way to interpret the data.

From the perspective of a small-paper employee, though, I don't see news organizations spending a lot of time or effort (or money) replicating what the New York Times is doing.

Far better, I think, would be a data framework that journalists could just plug in to.

No journalist would build his or her own typewriter. No journalist would code a Microsoft Word clone from scratch. Journalists are looking for tools that will let them quickly absorb new data, to find the interesting parts, and to pass on those parts to their readers.

Create that killer app and journalists will flock to it. But I don't think it's realistic for even programmer-journalists to create a new app every time they need to present data.

I would argue that the

Jorge Rivas
Mon, 2010-09-20 18:15

I would argue that the easiest way to win an argument is to establish a pattern that illustrates "the issue" you're trying to prove. Oftentimes when I gather information for news stories I try to identify any patterns, because I find my work can be that much more useful and compelling when there is a pattern identified. For example, if there is a car accident at intersection X, I'd be interested in reporting the news but would then also look at other reports to see if there is a pattern there: have other accidents happened there, and why?

In "A fundamental way newspaper sites need to change" Adrian Holovaty argues newspapers should have a system in which readers can access raw data and facts alongside the print story. Although Holovaty wrote this piece in 2006, parts of his proposal are still innovative. Allowing readers to identify patterns for themselves can help build strong emotional connections to both newspapers and the stories they print. In "Everybody's a Reporter" Fullerton writes about the importance of press passes, but just as important is access to data. Having a system where readers can access raw facts, like the one Holovaty writes about, can also make a tremendous difference for bloggers and independent writers trying to access historical data to write more comprehensive stories.

As the web continues to

Jason Dean
Mon, 2010-09-20 19:03

As the web continues to change and evolve, newspaper websites remain static and unchanged. Most continue to post stories on the web the same way they do in a print version.

Adrian Holovaty and Stijn Debrouwere discuss the importance of the data that makes up an article. As Debrouwere says, there are two main parts to what he calls the domain model: entities and relationships. While these may not be important to traditional journalism, they are the backbone of creating new and innovative web journalism.

The tension is between publishing a story quickly and having it look nice for the readers, versus having the data easily available throughout a site to create something more later. Holovaty uses the example of pages for little league teams that used the weather forecast from a local weatherman he created earlier.

Debrouwere takes the conversation a step further and discusses the importance of building a content management system (CMS) from the ground up using a framework like Django or Ruby on Rails. Using these rather than a traditional CMS makes it easier to get information both in and out of the system.

As new news websites continue to experiment and try new ways to present the news, it is imperative for newspapers to advance their technology as well. “Innovation should be aimed at the heart of what we do,” says Debrouwere. “Innovation should allow us to do exciting things with everyday news.”

Response to: Holovaty piece,

Solomon Lieberman
Mon, 2010-09-20 19:13

Response to: Holovaty piece, and Royal piece.

It was interesting to read Holovaty's torch-bearing 2006 piece, and then to read the formalized case study from earlier this year; the former is short and sweet, but it hits upon most (if not all) of the areas discussed in the latter: platform vs. content, foundational news culture vs. evolving news culture, and user experience.

Looking at my own opening paragraph here, it's telling that I naturally use the adversarial "vs." to describe the concepts being explored in these pieces. For most veteran journalists, the rules for finding, refining and constructing a story have been, until recently, absolute. But now, journos must re-evaluate that process and adapt in order to survive; the ones who can't get beyond their aversion will not.

One concept that I responded to was User Experience: Holovaty acknowledges this by emphasizing that a "big blob of text" has value, and should not be wholly replaced by data; Royal hits upon this by describing how the NYT team builds its interactives with the user experience at the forefront: they limit instructions, avoid excessive manipulation, etc. This is key, as the fundamentals of a "news story," i.e. inverted pyramid, nut graph, etc., need parallel conventions in the technological realm.

Here's my question: What are the new rules in hacker journalism that guide the structure of the final product? (THEN: Inverted Pyramid; NOW:?)

Good Stuff!

Sol

My (late) comments about AIDS

Vítor Baptista
Mon, 2010-09-27 02:14

My (late) comments about AIDS in the Heartland.

I felt connected to Hanson and Henningson. The writing style is very descriptive, and I could see both at their farm in the 80s. But, even though I was enjoying the text, at some point I began to skim over it. It's clearly not meant to be read on a monitor screen; trying to do so began to hurt my eyes. Unfortunately, I couldn't print it.

As many noted, there's much data condensed in those 22 pages. As I was reading, I wasn't just learning their story, but also a bit about their city's structure and the reasoning of the people at that time. At times, it felt like parts of a history book. I liked that.

Organizing all that information, and all the different people involved, is a skill that I don't have (yet), but that I admire and would really like to learn.

All in all, I enjoyed the text.