In case you hadn’t heard, the United States’ National Security Agency (NSA) has been gathering metadata from U.S. domestic communications about U.S. citizens and people world-wide, without a specific warrant, as one part of a surveillance project called PRISM. As the scandal has unfolded, a bad joke among us Information and Library Science (ILS) professionals has been, “at least now the average person knows what ‘metadata’ is”.
I lived, breathed, ate and slept the Dublin Core Metadata Element Set (DCMES) and the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) when I wrote my master’s paper. I thought it might be worthwhile to post some information about metadata for those who would like to learn more about it.
In general, “metadata” is “data about data”. So, what exactly does that mean? Let’s say you go online to purchase a book you would like to read. You want to buy The Hunger Games by Suzanne Collins, which was published in 2008. You fill out a search form. You enter “The Hunger Games” in the title field, “Suzanne Collins” in the author field, and “2008” in the publication date field. In this example, the data are “The Hunger Games”, “Suzanne Collins”, and “2008”. The metadata are “title”, “author”, and “publication date”. Metadata fields stay the same, while the data entered may change. For example, you may wish to search for a different book, by a different author, that was published on a different date. You will enter this different data into the same metadata fields (title, author, and publication date) you used to find the first book. You may find an example of this on the Advanced Search page on Amazon.com.
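For readers who think in code, here is a minimal Python sketch of the same idea. The field names and the `build_search` helper are my own illustrative assumptions, not how any real bookstore's search form works, and the second book is simply an example I picked.

```python
# The metadata fields (the labels) stay fixed from one search to the next.
# These names are assumptions chosen for illustration.
METADATA_FIELDS = ("title", "author", "publication_date")


def build_search(title, author, publication_date):
    """Pair each metadata field with the data entered for it."""
    return dict(zip(METADATA_FIELDS, (title, author, publication_date)))


# Same metadata fields, different data each time.
first_search = build_search("The Hunger Games", "Suzanne Collins", "2008")
second_search = build_search("The Giver", "Lois Lowry", "1993")

for search in (first_search, second_search):
    for field, value in search.items():
        print(f"{field}: {value}")
```

The dictionary keys are the metadata and the values are the data: both searches share exactly the same keys, and only the values differ.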
The Guardian (UK) defines metadata within the context of technology and the PRISM scandal as follows.
Metadata is information generated as you use technology, and its use has been the subject of controversy within the general public since Edward Snowden revealed the NSA’s secret surveillance program. Examples include the date and time you called somebody or the location from which you last accessed your email. The data collected generally does not contain personal or content-specific details, but rather transactional information about the user, the device and activities taking place. In some cases you can limit the information that is collected – by turning off location services on your cell phone for instance – but many times you cannot.
The author of the article then details the metadata you generate with each technology you use. The examples given include a Web browser, cell phone, email, camera, search engine, Twitter, and Facebook.
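To make the email example concrete, here is a rough Python sketch that separates an email's metadata (its headers) from its content (its body) using the standard library's `email` module. The message itself, including the addresses, the IP address, and the timestamps, is made up for illustration and comes from no real account.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# A made-up message: every address, IP, and timestamp here is hypothetical.
RAW_MESSAGE = """\
From: alice@example.com
To: bob@example.com
Date: Mon, 10 Jun 2013 09:30:00 -0400
Subject: Lunch?
Received: from mail.example.com ([192.0.2.10]) by mx.example.org; Mon, 10 Jun 2013 09:30:05 -0400

See you at noon.
"""

message = message_from_string(RAW_MESSAGE)

# The headers are the metadata: who, when, and via which server.
metadata = {name.lower(): value for name, value in message.items()}
sent_at = parsedate_to_datetime(message["Date"])

# The body is the content, which is not part of the metadata.
content = message.get_payload()

print(metadata["from"], "->", metadata["to"], "at", sent_at.isoformat())
```

Even without reading the body, the headers alone show who wrote to whom, when, and through which mail server.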
To use a prior scandal as an example, investigators discovered via Internet Protocol (IP) addresses that Paula Broadwell was the source of the threatening emails sent to the woman she perceived as a rival for the affection of then-CIA Director General Petraeus. She thought she had protected her identity by using anonymous email accounts. The IP addresses revealed the physical locations from which the threatening emails were sent, and the one person common to all locations was Paula Broadwell.
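The "one person common to all locations" step is, at its core, a set intersection. The Python sketch below shows the idea with entirely hypothetical locations and placeholder names. It is not a reconstruction of how the investigators actually worked, only an illustration of why location metadata can identify someone.

```python
# Hypothetical data: for each location an email was sent from, the set of
# people known to have been there. All names are placeholders.
people_at_location = {
    "location_a": {"person_1", "person_2", "person_3"},
    "location_b": {"person_2", "person_4"},
    "location_c": {"person_2", "person_3", "person_5"},
}


def common_people(people_by_location):
    """Return the people who appear at every location."""
    groups = list(people_by_location.values())
    if not groups:
        return set()
    return set.intersection(*groups)


print(common_people(people_at_location))
```

Each location on its own points to several people, but intersecting the lists narrows the field quickly.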
Prior to the PRISM scandal, information technology professionals referred to this type of technology-related metadata as one’s “digital exhaust” or “digital footprint”. Your digital footprint or digital exhaust is simply the trail you leave behind as you use various technology tools. This information can tell someone when and where you used a particular tool, such as email.
The use of “digital exhaust” by technology companies, such as Google, has been controversial within the technology and law communities for years as many people have been concerned about citizens’ privacy rights. There is even a “Data Privacy Day”, which is held every year on January 28th. This day has not received much attention from the general public or the press. It has been an event celebrated by lawyers, information professionals, and geeks. (I imagine that next January 28th, Data Privacy Day will receive a lot of attention from the press and the general public.)
The National Information Standards Organization (NISO) offers this 20-page document called “Understanding Metadata” that provides a more standard, Information and Library Science-oriented view of metadata. The document covers metadata from a non-technology standpoint, and includes sections that define metadata, what it does, the different metadata schemes and element sets, future directions, and examples. If you have ever looked up something from a library Web site or in an old-fashioned card catalog, then you have used this kind of metadata.
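As a small taste of what an element set looks like in practice, here is a Python sketch that describes the book from my earlier example using three Dublin Core elements (title, creator, and date). The `dublin_core_record` helper and the plain `record` wrapper element are my own assumptions for illustration. Real catalog records are usually richer and wrapped differently.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dublin_core_record(title, creator, date):
    """Build a tiny record; the 'record' wrapper is a hypothetical container."""
    record = ET.Element("record")
    for element, value in (("title", title), ("creator", creator), ("date", date)):
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


record = dublin_core_record("The Hunger Games", "Suzanne Collins", "2008")
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The element names play the same role as the fields on a catalog card: they stay the same for every item, while the values change.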
If you prefer not to read a document about metadata and would rather watch a video, here are a few I found. Each video is only 3-5 minutes long.
Timelapse: What is Metadata? by TortoiseButler
What is Metadata? by B2Bwhiteboard
What is Metadata? Dr. Jane Greenberg*, Director SILS Metadata Research Center, by UC3M
The Electronic Frontier Foundation (EFF) has posted this list of free software that may provide enough protection to enable users to “opt-out” of PRISM. The list includes search engines, such as DuckDuckGo, that don’t collect information about their users. (DuckDuckGo has seen its user base soar since the NSA PRISM scandal became public knowledge.)
I referenced the following links above, but I’ll provide them again. If you are concerned about your privacy in general, in addition to the EFF’s list of free software tools that enable privacy, this link will take you to information about your online privacy rights. This link provides general information on how to stay safe online. If you would like to calculate your online digital footprint, EMC offers this personal Digital Footprint Calculator.
Were you familiar with the term “metadata” prior to the PRISM scandal?