Plenary debates of the European Parliament as Linked Open Data
The LinkedEP dataset
The Talk of Europe project curates Linked Open Data about the European Parliament (EP). The dataset covers all plenary debates held in the EP between July 1999 and July 2017, and biographical information about the members of parliament. The dataset includes: information on the monthly sessions of the EP, the agenda of debates, the spoken words and translations thereof in 23 languages; the speakers, their role and the country they represent; membership of national parties, European parties and commissions.

Links
LinkedEP contains links to GeoNames, DBpedia and the official RDF database of the Italian parliament. The European Union Data Portal provides links between Member of Parliament instances in LinkedEP and their named entity resource JRC-Names, available through their SPARQL endpoint.

Enrichments
In the second Talk of Europe Creative Camp, Adam Funk and Wim Peters (University of Sheffield) used their in-house text engineering infrastructure GATE to annotate the speeches with the concepts in them and their degree of occurrence across the proceedings. They also interconnected these concepts based on their semantic relationship. The resulting RDF (N-Triples) is available for download here.

Origin of the data
To obtain data about the plenary debates, we generated RDF from the HTML pages published on the official website of the EP. We collaborated with the Political Mashup project by Maarten Marx at the University of Amsterdam, who provided scripts to scrape the HTML pages. The biographical data about members of parliament come from the Automated Database of the European Parliament of the University of Oslo [Høyland et al., 2009]. We translated this database to RDF, linked it to the debate data, and made it available as Linked Data as part of the LinkedEP dataset.
UPDATES & CHANGES
2 May 2018: Rerun of the entire dataset using the new biographical data from the new version of the Automated Database of the European Parliament.
6 March 2018: The Talk of Europe dataset (1999-2017) is permanently stored at DANS, the Netherlands Institute for Permanent Access to Digital Research Resources. DOI: https://doi.org/10.17026/dans-x62-ew3m
July 2017: Major update. Added data up to July 2017 and fixed many known and reported bugs. Note that all lpv:number triples have been removed, as the numbering was found to be influenced by small changes in the source data.
6 December 2016: The publication about the Talk of Europe dataset has been published:
28 January 2016: We had a film made about the Talk of Europe project! Available on YouTube (5 min.)
26 January 2016: The example SPARQL queries below are now clickable. Click to see the results in the YASGUI SPARQL editor.
23 June 2015: We have had to reset the server, but all is up and running again.
15 April 2015: The data are now marked up with provenance information and other metadata using the PROV, VoID and OMV vocabularies.
2 March 2015: Problems with incorrect language tags fixed. On http://europarl.europa.eu/, speeches are sometimes displayed in a language other than the one the user selected. This happens when translations are not available. Until now, this problem persisted in LinkedEP. In the current version, we have fixed the majority of the incorrect language tags of speeches, although some remain.
18 Feb 2015: The dataset now covers the complete fifth, sixth, and seventh terms (1999-2014) of the European Parliament. Note that the declared prefixes have changed; see the updated model depiction and example queries below.
Access to the data
We provide access in several ways:
The concepts of 'Linked Dataset' and 'named graph' are registered in the CLARIN Component Registry. The CMDI file describing these resources can be found here.
Data model
The schema of classes and properties used in the LinkedEP dataset is displayed in the figure below. For a description see here.

Example queries
RDF can be queried using the SPARQL query language. This data portal implements SPARQL version 1.1.

Example query 1:
Select at most 100 English spoken texts in a given date range, ordered by date, agenda item and speech.
SELECT ?date ?speechnr ?text
WHERE {
?sessionday rdf:type lpv_eu:SessionDay .
?sessionday dcterms:date ?date.
?sessionday dcterms:hasPart ?agendaitem.
?agendaitem dcterms:hasPart ?speech.
?speech lpv:docno ?speechnr.
?speech lpv:spokenText ?text.
FILTER ( ?date >= "2009-05-06"^^xsd:date && ?date <= "2010-05-06"^^xsd:date )
FILTER(langMatches(lang(?text), "en"))
} ORDER BY ?date ?speechnr LIMIT 100
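
The language filter in query 1 accepts regional variants as well as the bare tag: langMatches with the range "en" matches "en" and "en-GB", but not "fr" or an untagged literal. The Python sketch below mimics that basic matching rule for post-processing results on the client side. It is an illustration rather than the endpoint's implementation, and the sample tag/text pairs are made up.

```python
def lang_matches(tag: str, lang_range: str) -> bool:
    """Basic language-range filtering in the style of SPARQL's langMatches."""
    tag = tag.lower()
    lang_range = lang_range.lower()
    if lang_range == "*":
        # "*" matches any non-empty language tag.
        return tag != ""
    # Otherwise the tag must equal the range or extend it with a "-" subtag.
    return tag == lang_range or tag.startswith(lang_range + "-")


# Hypothetical (language tag, text) pairs standing in for query results.
speeches = [
    ("en", "Madam President, ..."),
    ("en-GB", "Mr President, ..."),
    ("fr", "Monsieur le Président, ..."),
    ("", "untagged literal"),
]
english = [text for tag, text in speeches if lang_matches(tag, "en")]
```

Comparing tags case-insensitively matters in practice, because language tags may be written as "en-gb" or "en-GB".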
Example query 2:
For a particular agenda item (the fourth item of 16 December 2010), find the frequency distribution of the speaking slots over the EU parties of the speakers involved.
SELECT ?partyname (COUNT(DISTINCT ?speech) AS ?speechno)
WHERE {
<http://purl.org/linkedpolitics/eu/plenary/2010-12-16_AgendaItem_4> dcterms:hasPart ?speech.
?speech lpv:spokenAs ?function.
?function lpv:institution ?party.
?party rdf:type lpv:EUParty.
?party lpv:acronym ?partyname.
} GROUP BY ?partyname
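
Query 2 returns absolute counts per party. To read them as a distribution, the counts can be normalised on the client side. The following Python sketch assumes the endpoint can return results in the W3C SPARQL 1.1 Query Results JSON Format; the party acronyms and counts in the sample response are invented for illustration and are not taken from the dataset.

```python
import json

# Hypothetical response in the SPARQL 1.1 Query Results JSON Format.
# Acronyms and counts are placeholders, not real LinkedEP data.
SAMPLE_RESPONSE = """
{
  "head": {"vars": ["partyname", "speechno"]},
  "results": {"bindings": [
    {"partyname": {"type": "literal", "value": "PARTY-A"},
     "speechno": {"type": "literal", "value": "6",
                  "datatype": "http://www.w3.org/2001/XMLSchema#integer"}},
    {"partyname": {"type": "literal", "value": "PARTY-B"},
     "speechno": {"type": "literal", "value": "3",
                  "datatype": "http://www.w3.org/2001/XMLSchema#integer"}},
    {"partyname": {"type": "literal", "value": "PARTY-C"},
     "speechno": {"type": "literal", "value": "1",
                  "datatype": "http://www.w3.org/2001/XMLSchema#integer"}}
  ]}
}
"""


def speaking_slot_shares(response_text):
    """Map each party acronym to its share of the speaking slots."""
    bindings = json.loads(response_text)["results"]["bindings"]
    # Every value arrives as a string, so the counts are converted explicitly.
    counts = {b["partyname"]["value"]: int(b["speechno"]["value"])
              for b in bindings}
    total = sum(counts.values())
    if total == 0:
        return {}
    return {party: n / total for party, n in counts.items()}


shares = speaking_slot_shares(SAMPLE_RESPONSE)
```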
Example query 3:
Count the agenda items in which at least one MEP from France spoke.
SELECT (COUNT (DISTINCT ?ai) as ?count)
WHERE {
?ai rdf:type
Example query 4:
Get the transcripts of an arbitrary selection of 10 speeches that contain the word "agriculture". This query uses efficient indexed text search by deploying ClioPatria's Text Property Functions (tpf) SPARQL extensions.
SELECT ?speech ?text
WHERE {
?speech tpf:match (lpv:text 'agriculture' ?text)
} LIMIT 10
Example query 5:
Get the number of speeches held in each language (counting only speeches for which the language was indicated).
SELECT DISTINCT ?language (COUNT(DISTINCT ?speech) AS ?speechno)
WHERE {
?speech dcterms:language ?language .
?speech a lpv_eu:Speech .
} GROUP BY ?language
Example query 6:
Get the 10 most recent agenda items annotated with the Eurovoc SKOS concept "Syria".
SELECT ?date ?agendaItem
WHERE {
?concept skos:prefLabel "Syria"@en .
?annot dcterms:subject ?concept .
?agendaItem lpv:topicAnnotation ?annot .
?agendaItem dcterms:date ?date .
} ORDER BY DESC(?date) LIMIT 10
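
The example queries above can also be sent programmatically. The Python sketch below builds a SPARQL 1.1 Protocol GET request using only the standard library. The endpoint address is a placeholder that must be replaced with the portal's actual SPARQL endpoint, and the sample query avoids prefixed names because it is an assumption that the server predeclares the prefixes used in the examples.

```python
import json
import urllib.parse
import urllib.request


def build_sparql_request(endpoint_url, query):
    """Build a SPARQL 1.1 Protocol GET request asking for JSON results."""
    url = endpoint_url + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )


def run_query(endpoint_url, query, timeout=30):
    """Send the query and return the list of result bindings."""
    request = build_sparql_request(endpoint_url, query)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


# Placeholder only: substitute the real endpoint address of the data portal.
ENDPOINT = "https://example.org/sparql"
# Prefix-free query, so it does not depend on server-side prefix declarations.
QUERY = "SELECT * WHERE { ?s ?p ?o } LIMIT 5"

request = build_sparql_request(ENDPOINT, QUERY)
```

Calling run_query(ENDPOINT, QUERY) with a real endpoint address would return one dictionary per result row, keyed by variable name.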
License & citations
The LinkedEP dataset is available under a CC0 license. To acknowledge us, please cite us as:
References
Bjørn Høyland, Indraneel Sircar, Simon Hix. An Automated Database of the European Parliament. European Union Politics, 2009, Vol 10, Issue 1, 143-152.

