I have compiled some data from the U.S. Federal Election Commission website. This dataset represents federal electoral campaign donations in the United States for the election years 1980 through 2006. The data are available to the public on the FEC website, but in several files in varying formats, so parsing takes some time.
The data, fully built, form a tripartite directed graph. Donors (individuals and corporations) make contributions to Committees, which in turn make contributions to Candidates. There is a many-to-many relationship between Donors and Committees, and also between Committees and Candidates. Each donor, committee, and candidate is identified by a unique integer in this dataset.
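To make the structure concrete, here is a minimal Python sketch of the donor-to-committee-to-candidate graph as a plain adjacency map. The IDs and amounts are made up, and the function names (`build_graph`, `candidates_reachable`) are hypothetical. Nodes are keyed by a (type, id) pair, since this sketch does not assume that an integer is unique across the three node types.

```python
from collections import defaultdict

# Made-up toy edges for illustration only; these are not real FEC records.
# Donor -> Committee contributions: (donor_id, committee_id, amount)
donor_to_committee = [(1, 10, 500.0), (2, 10, 250.0), (1, 11, 100.0)]
# Committee -> Candidate contributions: (committee_id, candidate_id, amount)
committee_to_candidate = [(10, 100, 600.0), (11, 100, 50.0), (11, 101, 50.0)]


def build_graph(d2c, c2cand):
    """Return a directed adjacency map keyed by (node_type, id) tuples."""
    graph = defaultdict(list)
    for donor, committee, amount in d2c:
        graph[("donor", donor)].append((("committee", committee), amount))
    for committee, candidate, amount in c2cand:
        graph[("committee", committee)].append((("candidate", candidate), amount))
    return graph


def candidates_reachable(graph, donor_id):
    """Return the IDs of candidates reachable from a donor in exactly two hops."""
    reached = set()
    for committee_node, _amount in graph.get(("donor", donor_id), []):
        for candidate_node, _amount2 in graph.get(committee_node, []):
            reached.add(candidate_node[1])
    return reached


graph = build_graph(donor_to_committee, committee_to_candidate)
print(candidates_reachable(graph, 1))
```

Because both relationships are many-to-many, a single donor can reach several candidates through different committees, which is what the two-hop lookup illustrates.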
While my work involves graph mining/network analysis, I can imagine this data being useful for a number of machine learning/data mining projects. Please feel free to use it for these purposes. If you find something interesting, let us know and we can cite you here. This is currently available to CMU only, to avoid helpdesk getting mad at me because of bandwidth. In the future it will be hosted on the DBLab webserver, and available publicly. Corresponding R files may also become available.
campaign-contribution-matlab.tar (353MB) Matlab variables.
campaign-contribution-text.tar.gz (309MB, ~1.4GB unzipped) Identical to the Matlab version, just in tab-separated text format.
The file election-matlab.tar is an archive of 8 files, each saved as a .mat file to be loaded into Matlab. The files are:
campaign-contribution-text.tar.gz is an archive of similar files, only in text format. Here instead of donors#.mat the donor index is named contributors#.txt. Column headers are described in the readme.
The file readme.doc explains what some of the columns in the indices mean.
Additional files, notes
9/25/07- Two new files are now available. They contain the same candidate and committee indices, except that a candidate/committee has multiple entries if it is listed in the FEC's files for multiple election years. Each entry is labeled with a year. I did this because committee/candidate data sometimes change over the years (new treasurers in committees, different election seats for candidates). This is explained in the (now updated) readme. The files, each in Matlab and text format, are here:
candidates2.mat (4MB) Big candidate file- matlab
candidates2.txt (11MB) Big candidate file- text
committees2.mat (12MB) Big committee file- matlab
committees2.txt (41MB) Big committee file- text
10/1/07- At the last update I accidentally uploaded the wrong candidates.mat file. The .tar archive should be updated now. However, if you downloaded before 10/1, you can get just the fixed version of candidates.mat here:
candidates.mat (2MB) New smaller candidate file- matlab
10/23/07- I corrected the readme's description of the donor-to-committee edge file: its columns are committee_id, donor_id, amount, date.
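As a rough illustration of reading the donor-to-committee edges from the tab-separated text version, here is a small Python sketch that uses the column order from the 10/23/07 note. It rests on several assumptions not confirmed here: that the file has no header row, that amounts parse as numbers, and that the date can be kept as an opaque string (its format is not specified on this page). The sample rows and the function names are made up.

```python
import csv
import io
from collections import defaultdict

# Made-up sample rows in the assumed column order:
# committee_id <TAB> donor_id <TAB> amount <TAB> date
SAMPLE = "10\t1\t500\t19960315\n10\t2\t250\t19960402\n11\t1\t100\t19980110\n"


def read_edges(handle):
    """Yield (committee_id, donor_id, amount, date) from a tab-separated stream."""
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or short lines
        committee_id, donor_id, amount, date = row[:4]
        # The date is left as a string because its format is an unknown here.
        yield int(committee_id), int(donor_id), float(amount), date


def total_by_committee(edges):
    """Sum the contribution amounts received by each committee."""
    totals = defaultdict(float)
    for committee_id, _donor_id, amount, _date in edges:
        totals[committee_id] += amount
    return dict(totals)


totals = total_by_committee(read_edges(io.StringIO(SAMPLE)))
print(totals)
```

For a real file, the `io.StringIO(SAMPLE)` handle would be replaced by an opened file object. Check the readme for the actual column layout before relying on this.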
Disclaimer
I am not affiliated with the FEC. I was as faithful as I could be to the original data, but there may well be errors in the parser, not to mention in the raw FEC data. I did not do much cleaning of the data, since I'm not a domain expert. If you run into any issues, please do not hesitate to contact me at email@example.com.