I live in worlds outside of this

So far this year, I’ve had the privilege of attending three #FlashHacks events, data liberation hackathons run by OpenCorporates, which runs the world’s largest open companies database. Attending #FlashHacks has brought me into contact with the world of corporate data, which isn’t something I would ever have considered or had contact with otherwise.

In our Data Journalism classes at City University, we’re taught how to scrape websites and social media networks to get at data; to look for stories in scraped or released data; to visualise data in a number of different ways. But we’ve never really needed to liberate data, that is, to comb the internet for data sources just for the sake of cataloguing them, or trawl through endless company files looking for connections. That work is more akin to Investigative Journalism than Data Journalism, but it’s also all about data, and I think it may become part of the everyday work of a data journalist as the data journalism and open journalism movements both advance. Here’s what I’ve learned so far by dipping my toe into the world of corporate data.

Liberating data involves a lot of legwork

This is true of all forms of data journalism, where stories can result from hours or even days of poring over spreadsheets, scraping data, cleaning data and looking for connections. But liberating data in particular is about 90% legwork, 10% payoff. #FlashHacks events are usually split into two sections, or teams; one team works on coding bots, which are programs that crawl over PDFs containing data and parse it into a human-readable format. One day I aspire to learn how to code a bot, but my coding background isn’t quite strong enough yet. It can take hours to push (run) a bot that will convert one PDF into a human-readable format, to say nothing of actually doing anything with the data.


An example of data parsed by a bot during a #FlashHacks meet-up

Then again, the non-bot method of liberating data takes even longer. It involves simply trawling through PDFs, registers and other places which house company data, finding what’s relevant and manually inputting it into a huge spreadsheet, from which the data can be visualised and connections can be found. At past #FlashHacks, we’ve made finding data sources into a competition, with a prize awarded to the person who can log the most entries on the group spreadsheet.

Liberating corporate data is best done collaboratively, in other words, with a group of people to share leads, divide up the work, and egg each other on.

The prize for the most spreadsheet entries in this #FlashHacks was an Easter Egg. You can see I never really stood a chance...

Corporations won’t make it easy on you

This one might not come as a surprise, but even here in the UK – which is considered a world leader in terms of making data accessible, thanks to Companies House – corporations don’t make it easy on the public to use their data. The vast majority of companies, for example, file their tax returns in PDF format, which as I’ve mentioned needs special measures before it can be easily edited and exported elsewhere. HM Revenue and Customs recommends that companies file their tax returns in the more readable XBRL format, which they could just as easily do, but the majority continue to publish in PDF format in spite of this.

You also need to know where to look, which again is why collaboration is a key part of liberating corporate data: recording good data sources and advising one another on where to look for the right documents. Data journalism can often seem like a solitary activity (though hopefully less so in the future when data skills are more commonplace and data journalists seen as less of a world apart from regular reporters), but liberating corporate data is always best done in groups (which is why #FlashHacks exists!).

Visualisation can be a challenge (but a rewarding one!)

Visualising data isn’t always possible or necessary when working with corporate data, but it can be a useful tool, for example in mapping companies and their connections to better understand their relationships. OpenCorporates uses an in-house visualisation tool called Octopus to visualise company connections.

An in-progress map of the connections between different companies owned by Aviva PLC, visualised using Octopus

This map was visualised after an evening’s hard graft at the most recent #FlashHacks event, and as you can see, it’s still far from complete. A company with as many different connections as Aviva PLC (and most multi-national corporations for that matter) can be extremely challenging to visualise and scrutinise, and any map of its connections would probably need to be interactive before it can be explored properly. It can also take a lot of work before any of the links and hierarchies begin to appear.

However, it’s extremely satisfying once it starts to come together, and visualisation can be the best way to get an overview of interlocking corporate networks like these – not to mention it looks cool!

It’s loads of fun!

This might seem odd to say after all the “drawbacks” I’ve listed here, but I find #FlashHacks events really, really fun. I enjoy working in a group towards a larger goal, especially as that larger goal is one of social good which benefits everyone in the long run. I also love the sensation of being part of a greater movement towards open data, which is making some huge strides in 2016 in the UK with the creation of a centralised register of beneficial ownership (showing exactly who owns and controls what). It gives me the chance to work alongside other interested minds in the world of data and go “behind the scenes” with data in a way I normally wouldn’t as a journalist.

Also, the free snacks are a big bonus!

If all this sounds like your cup of tea as well, head on over to Meetup and join us!

You can also read my interview with Hera Hussain, organiser of #FlashHacks events, about the need for open data and the role of journalists in the open data movement, on the Interhacktives website.

Doffy Weir is an artist and photographer who specialises in transforming derelict industrial sites and canals in east London into “beautiful, tranquil other worlds”. Earlier this month she held an exhibition at the Hundred Years Gallery in Hackney which experimented with a new form: improvised live music interpretations of her surreal photography.

The exhibition was titled ‘Lesney’s Ghosts I’, and will be followed up by a second performance in the same format, both held in the bare basement of Hundred Years Gallery. Speaking before the exhibition began, Doffy explained how the name for her series of photographs was inspired by a man she met whilst out taking photographs in east London.

“I bumped into a man who was looking for Lesney’s Matchbox Factory,” she recalled. The factory had in fact been demolished in 2010, to make way for a colourful apartment complex known as Matchmakers Wharf. Many of Doffy’s photographs focus on the bold colours of the apartment buildings reflected in the nearby river.

“We chatted for a while, and then went our separate ways,” she said. “My husband said to me, ‘Why don’t you call the collection “Lesney’s Ghosts”?’

“Afterwards, I looked again at the photos, and started to see something in them.”

The photograph shows jumbled red, yellow and blue streaks reflected in a rippling river

A multi-coloured close-up of Matchmakers Wharf, reflected in the surface of the river

One film, three versions

Different individual interpretations of the images were a key theme of the exhibition. Three musicians in turn played improvised live accompaniments to a short, silent film made up of Doffy’s photographs, played three times in succession. First, Marcio Mattos played a haunting cello accompaniment; this was followed by John Butcher’s saxophone interpretation, which mixed guttering and rushing noises with piercing screeches and whistles. Finally, Dave Draper played a melodious, unearthly accompaniment using a guitar and a laptop. Each time, the images seemed to tell different stories. A picture that looked peaceful in one version would suddenly become sinister the second time around. Sometimes, a period of silence was all that was needed to let the images speak for themselves.

Doffy spoke of the challenges that came with assembling her vast array of photographs into a film sequence. The soundless film was originally commissioned by Steve McInerney for the launch of Psychè Tropes Record Label 2014, and took four months to put together. The experience was long and frustrating. “Some days I just sat and cried,” Doffy admitted candidly.

She decided that after so much hard work, the film deserved to be shown at more than just one event, and started to conceive of an experimental new idea: improvised live soundtracks, conceived by the musicians on the spot, with no rehearsal before the event.

Timing the film’s transitions without knowing what the musicians might do was an even bigger challenge, but Doffy was thrilled with the results.

As for her inspiration in taking the photographs to begin with, she enthused, “I’m fascinated with rubbish, with the social history of it. Even though I come from suburbia!”

Without a doubt, no-one who viewed her photography could ever look at rubbish in quite the same way again.

#mynameis: A Timeline

It’s been a few days since the climax of Facebook’s “real name” saga, and the furore seems to have mostly died down. Facebook has officially apologised to the hundreds of drag queens, members of the LGBTQIA community, DJs, stage performers and others who use pseudonyms on Facebook for the policy which forced them to switch to their “real”, legal names or face being locked out of their accounts and networks. It has also made noises to the effect of revising, reinterpreting or otherwise adapting the policy to account for the ways in which a good portion of Facebook’s demographic use its service.

A lot took place in a short space of time, by way of protests, petitions, heartfelt personal accounts, hashtags, a suddenly viral new social network and more. So how exactly was it that we got from then to now? And where does the future of identity on Facebook really stand? In this post, I’ll attempt to break down what happened with a timeline of key events, and get to the bottom of where the policy is going.

If you’ve been following events in the crowdfunding or online tech startup worlds at all, you might have heard of the controversy surrounding Healbe GoBe, an Indiegogo campaign that raised over $1m to fund a device that medical science says can’t possibly work. A startup-focused news site called PandoDaily has been leading the charge on the investigation into Healbe’s possibly fraudulent campaign, and it hasn’t looked good for Indiegogo at all.

A little under a fortnight after the first article ran, Indiegogo responded by modifying the anti-fraud guarantee on its website so that the wording was less absolute. A day later, Pando reported on another Indiegogo campaign, funded less than six months earlier, which also appears to be medically impossible if it can do everything it claims. The company responsible for that device, TellSpec, has since completely reset the clock on the development of a product it originally claimed to have nearly perfected. A parody of the Healbe GoBe campaign called ‘Miracle Health Bracelet: Vaguely Track Your Health, Fitness and More’ was created and made it past Indiegogo’s supposed anti-fraud algorithm, though it has since been removed. Finally, to cap everything off, Pando reported yesterday that the undisclosed donation which pushed Healbe’s calorie counter over the brink of $1 million came from none other than Indiegogo’s chief of hardware, Kate Drane. Clearly, Indiegogo is determined to throw its full weight behind this campaign in spite of all the negative press, scientific debunking and waves of requests for backer refunds.

There have been arguments made on both sides, some saying that Indiegogo needs to take responsibility for the campaigns promoted on its platform and others saying that it can’t be held liable for what the crowd decides to put its money behind. Either way, the Healbe controversy is bound to have a knock-on effect on Indiegogo’s credibility and the willingness of consumers to back other products on its site. After all, there are plenty of other crowdfunding sites out there with innovative projects, products and ventures. But how can you be sure that they won’t have the same problem?

As you might have gathered from previous posts on this blog, I’m constantly fascinated by the innovative ways in which different projects on the Internet make use of crowds. I’ve covered Tomnod’s use of crowdsourcing in global crises and the “crowdplaying” of Twitch Plays Pokémon. Then on 1st April, xkcd – “A webcomic of romance, sarcasm, math and language” – upped the ante with an interactive, crowdsourced comic strip called Lorenz.

Edward Lorenz, after whom the comic is titled, was an American mathematician and meteorologist. He was a pioneer of chaos theory and coined the term “butterfly effect”, which describes a tiny variable altering events to eventually produce a much more dramatic result, such as a butterfly flapping its wings and eventually causing a hurricane. The comic’s title text (a caption produced by hovering over the strip with your mouse) directly references the butterfly effect, reading, “Every choice, no matter how small, begins a new story.” The comic’s storyline is also dependent on user-submitted dialogue and click statistics and is therefore chaotic in nature.

The comic begins with an image of an individual at their computer. (Their gender has been the subject of much discussion on the Explain xkcd Talk Page, and they are largely agreed to be female because one line of dialogue refers to them as a “lady”. However, as the dialogue is user-submitted, this is not definitive.) The user is given a choice of four phrases, randomly ordered, for the character to say:


Last Friday was my birthday (hooray, another year older!) and as I move more into the world of journalism with the aim of hopefully doing a postgraduate course in it next academic year, I asked for a couple of books on the subject. One of those was We the Media: Grassroots Journalism By the People, For the People by Dan Gillmor. First printed in 2004, with the paperback edition that I now own printed in 2006, it’s a little out of date for the fast-moving world of information technology, but is still held in extremely high regard, and I think it’ll be a worthwhile read. It was also written with a rather different audience to myself in mind: one for whom the idea of receiving real-time journalistic updates from inside a press conference or significant world event would be a revelation; who would find the idea that ordinary people might have a voice and views of importance equal to that of designated newsmakers to be controversial; and to whom the Internet must seem like a chaotic, volatile upstart muscling in on the orderly, established world of media.

I grew up on the Internet, and its language and ways are second nature to me. The journalism I’m most familiar with is not heavily-regulated and polished ‘Big Media’ but ever-fluctuating, ever-evolving online journalism, which is not a lecture but a conversation, and one in which the people play a part as important as that of the newsmakers. Online, news breaks first via the people, and newsmakers have to listen to them to find out what’s going on, instead of the other way around.

In an amazing feat, the chaotic channel chatters at Twitch Plays Pokémon have succeeded in completing Pokémon Crystal in just thirteen days. That’s all sixteen badges, plus a win against the Johto League, rival Silver, and finally Red – and not the Red of the canon Pokémon games, but Red of Twitch Plays, with the iconic team of Zapdos, Lapras, Nidoking, Venomoth, Omastar and Pidgeot. As with the first Twitch Plays, a wealth of fanworks has been created around the new team and their individual personalities, their struggles and their losses. If you missed the action, here are ten fancomics which together tell the story of Gold’s – and his Pokémon’s – journey across Johto and Kanto, all the while struggling with the legends of their predecessors and a constant stream of contradictory feedback from “the voices”. (I might have sneaked in an epilogue as well ;) )

