by Andrew Pendleton and Bob Lannon
Dec. 16, 2014, 1:23 p.m.
A letter-writing campaign that appears to have been organized by a shadowy organization with ties to the Koch Brothers inundated the Federal Communications Commission with missives opposed to net neutrality (NN), an analysis by the Sunlight Foundation reveals.
Over the past several months, the Federal Communications Commission has been working towards a new set of rules around net neutrality, and a large part of that process has been accepting comments from the public. In September, we reported on our analysis of the comments from the first comment period of this rulemaking, and we’d now like to take a look at the comments from the second, which the FCC released in bulk in October. We again used natural language processing techniques to examine the approximately 1.6 million comments we successfully extracted from this batch, helping to expose important topics discussed in the comments and to group similar comments together.
Among our key findings from round two:
In marked contrast to the first round, anti-net neutrality commenters mobilized in force for this round, and comprised the majority of overall comments submitted, at 60%. We attribute this shift almost entirely to the form-letter initiatives of a single organization, American Commitment, who are single-handedly responsible for 56.5% of the comments in this round.
Who’s behind the group that flooded the FCC with anti-net neutrality comments?
American Commitment, the group behind a majority of the recent anti-net neutrality comments, is affiliated with the Koch brothers’ network.
In large part because of this campaign, the percentage of comments submitted that we believe to have been form letter submissions was significantly higher for this round than the last one, at 88%.
Non-form-letter submissions had a sentiment distribution similar to that of comments in the first round, with less than 1% opposed to net neutrality.
In general, many more comments were difficult to classify in this round than in the first round. Some of the new campaigns on the anti-net neutrality side appear to have been crafted to use similar language to the successful pro-neutrality campaigns of the first round, while supporting opposite conclusions, and many non-form-letter comments used talking points from both camps, making their ultimate intents unclear.
As with the last round, the corpus also included submissions on behalf of telecommunications firms, advocacy organizations, etc., which were written using formal legal language that set them apart from the bulk of the comments. Again, these were a tiny fraction of a percent of overall comments.
Combined with the first round comments, we characterize 41% of the total comments submitted as being anti-net neutrality (with the balance being a mix of pro-NN and comments with no clear opinion), and we estimate that 79% of submissions came as part of form letter campaigns.
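The combined figures above can be roughly reproduced as a volume-weighted average of the two rounds. The sketch below is only a sanity check, not the calculation used for this analysis: the round-one inputs are the approximate values quoted elsewhere in this post, and the 1% anti-NN share for round one is an assumption.

```python
# Approximate inputs taken from this post. The round-one anti-NN share of 1%
# is an assumption made for illustration, not a reported figure.
ROUND_ONE = {"comments": 800_000, "anti_share": 0.01, "form_share": 0.60}
ROUND_TWO = {"comments": 1_670_000, "anti_share": 0.60, "form_share": 0.88}


def combined_share(key, rounds):
    """Average a per-round share, weighting each round by its comment volume."""
    total = sum(r["comments"] for r in rounds)
    return sum(r["comments"] * r[key] for r in rounds) / total


rounds = [ROUND_ONE, ROUND_TWO]
combined_anti = combined_share("anti_share", rounds)
combined_form = combined_share("form_share", rounds)
print(f"anti-NN: {combined_anti:.1%}, form letters: {combined_form:.1%}")
```

With these rounded inputs, the weighted averages land close to the combined percentages reported above.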
Below is a revised version of our comment visualization tool, this time exploring the data from the second comment period.
Graphic credit: Sunlight Foundation.
We again did a deep dive into the topics that came to light from this model. As expected, many of the same topics recurred in the comments in this round:
Opposition to paid priority or tiered speed was again commonly discussed in pro-NN comments. Form letter campaigns discussing this topic included those from FreePress, BattleForTheNet, Credo, Daily Kos and the Sierra Club.
Many commenters again discussed various legal rationales for net neutrality, with phrases like “common carrier,” “title II,” and “public utility.” Such phrases occurred in about half of comments in this iteration.
Arguments about the economy were common in both pro-NN and anti-NN comments, with disagreements as to which policy best favored economic growth.
Additionally, particularly on the now-better-represented anti-net neutrality side, some new framings were apparent:
A set of arguments that we believe originated with the conservative activist organization American Commitment dominated the anti-NN comments in round two. The messaging was similar to, but less ambiguous than, what emerged from tea-partier groups in round one. Comments from this campaign shared a template, with different targeted messages inserted between the second and third paragraphs. Those targeted messages centered on topics as far-ranging as personal freedoms, economic threats, the poor state of US public utilities, and the characterization of pro-NN advocates as extreme leftists (Free Press’s Robert McChesney is portrayed as a Communist).
A separate, smaller contingent that opposed FCC action on net neutrality suggested that while net neutrality regulation might be within the government’s purview, it would be better left to Congress. Most of the comments in this group came from a form letter campaign organized by TechFreedom.
Our identification of form letters followed the same approach as last round: identify clusters with particularly low variance and peruse them to confirm shared boilerplate language. This task was much easier with the second round, however, because there was less noise within each cluster. Because the corpus as a whole contained mostly form letters, partitioning it into clean “neighborhoods” was not difficult. Also, the uniformity of the comments submitted through campaigns like American Commitment’s, TechFreedom’s and BattleForTheNet’s made clustering them together fairly straightforward. American Commitment’s clusters were very well behaved because their shared boilerplate was distinctive enough to exclude them from other groupings, hence the large blue supercluster that houses nearly all of their clusters. American Commitment’s tendency to have clusters of approximately 32,000 comments made spotting them easy, too.
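To make the "low variance" idea concrete, here is a minimal standard-library sketch. It is not the gensim-based pipeline used for this analysis: the bag-of-words vectors, the variance definition (mean squared distance from the cluster centroid), the cluster names, the toy texts and the threshold are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a comment into L2-normalized bag-of-words term weights."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values()))
    if norm == 0:
        return {}
    return {term: c / norm for term, c in counts.items()}


def cluster_variance(texts):
    """Mean squared Euclidean distance of each comment from the cluster centroid."""
    vectors = [vectorize(t) for t in texts]
    vocab = set().union(*vectors)
    centroid = {
        term: sum(v.get(term, 0.0) for v in vectors) / len(vectors)
        for term in vocab
    }
    distances = [
        sum((v.get(term, 0.0) - centroid[term]) ** 2 for term in vocab)
        for v in vectors
    ]
    return sum(distances) / len(distances)


# Toy clusters (made up): one near-identical group, one free-form group.
clusters = {
    "form_like": [
        "please stop the fcc regulations and protect internet freedom john",
        "please stop the fcc regulations and protect internet freedom mary",
        "please stop the fcc regulations and protect internet freedom alex",
    ],
    "free_form": [
        "net neutrality protects small businesses and free speech online",
        "my cable bill keeps rising while speeds stay slow",
        "congress not regulators should decide how broadband is governed",
    ],
}

VARIANCE_THRESHOLD = 0.2  # hypothetical cutoff; a real one would be tuned by inspection
variances = {name: cluster_variance(texts) for name, texts in clusters.items()}
likely_form_letters = [n for n, v in variances.items() if v < VARIANCE_THRESHOLD]
print(variances, likely_form_letters)
```

Clusters flagged this way would still need to be read by a person to confirm that they really share boilerplate language.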
Graphic credit: Sunlight Foundation. Note: this visualization shows groupings of textually similar comments. Organized letter-writing campaigns that didn’t involve form letters won’t appear here.
For comparison purposes, here are simplified versions of the form letter visuals from parts one and two, side-by-side:
A new get-out-the-comments player
The clear takeaway in examining the comments from round two is the way in which the campaigns we attribute to American Commitment completely changed the balance of opinions expressed. With their comments excluded, the corpus would have looked quite a bit like the first round:
About 728,000 total comments (vs. about 800,000 in round one)
75% of comments would have been form letters (compared to about 60% from the first round)
About 4% of comments would have opposed net neutrality, only a slight increase from the first comment round
Perhaps just as striking as the scale of American Commitment’s efforts was the breadth; most form letter organizers drove large-scale submission of a single comment template, and while many allowed submitters to customize their comments, most submitters apparently chose not to do so. This resulted in one group of nearly identical submissions for most campaign organizers (this kind of behavior is also typical of our experience with form letters in other regulatory arenas). A few more sophisticated campaigners had more than one template, or allowed submitters to plug variant sentences into a single template, but this was generally the extent of the per-submitter variation.
American Commitment, by contrast, had at least 30 different comment variants, many offering wildly different rationales justifying their positions, and taking positions across the political spectrum in their specifics. The number and timing of submissions are almost identical across comment templates, which we believe most likely suggests random assignment of prospective submitters to different comment pools, perhaps as a means of testing which messaging drew more submitters, or possibly to try to evade the kind of automated form letter grouping we and others did in the first round. Here is the comment template:
Dear Mr. Wheeler,
As an American citizen, I wanted to voice my opposition to the FCC’s crippling new regulations that would put federal bureaucrats in charge of internet freedom, and urge you to stop these regulations before they’re enacted.
If the federal government goes through these plans to regulate the internet, I know that the internet will change — and not for the better.
[ INSERT VARIANT PARAGRAPH COMMENT HERE ]
Like many Americans, I believe that the internet should remain free of government control and unnecessary regulation — just as it has for the last twenty years of unprecedented growth.
Please stop the FCC’s dangerous new regulations, and protect the future of internet freedom here in America.
[APPLICANT HOME ADDRESS]
…and here’s a sampling of the variant comments, along with their submission counts and timelines:
The Internet is not broken, and does not need to be fixed. Left-wing extremists have been crying wolf for the past decade about the harm to the Internet if the Federal government didn’t regulate it. Not only were they wrong, but the Internet has exploded with innovation. Do not regulate the Internet. The best way to keep it open and free is what has kept it open and free all along — no government intervention. (150,654 submissions)
Americans have been getting faster and faster Internet speeds because of competition in the free economy, not because of anything the government has done. The Internet does not need the federal government’s “help” and neither the American people nor their elected representatives are asking for the federal government to place political controls over the Internet. The people calling for government control over the Internet are a tiny minority of far-left political activists, and the FCC knows it. Any effort by the FCC to regulate the Internet will be seen by the vast majority of the American people for what it is — another lawless Obama Administration power grab. (32,281 submissions)
The Internet is the biggest economic, intellectual, and artistic success story of the century, and it rose up because of free people, not stifling government. The federal government needs to keep its hands off the Internet. It is not broken, and it does not need to be fixed. It is the federal government, not the Internet, that is broken, and in need of fixing. (32,257 submissions)
Before our government can handcuff a citizen, it must have some reasonable evidence that they have done something wrong. Before the FCC places regulatory handcuffs on Internet providers, shouldn’t the government present evidence that they have actually done something wrong? If the police were to handcuff someone because they might, theoretically, maybe, kind of do something wrong someday, there would be justifiable outrage. Such is the case with the FCC’s attempt to place regulatory handcuffs on Internet providers — just in case they might do something wrong someday. The FCC’s rulemaking in the absence of any actual problem, any actual misbehavior on the part of Internet providers, or any consumer harm is beneath the dignity of an expert agency. (32,412 submissions)
The ideological leader of the angry liberals calling for you to reduce the Internet to a public utility is Robert McChesney, the avowed Marxist founder of the socialist group Free Press. In an interview with SocialistProject.ca, McChesney said: “What we want to have in the U.S. and in every society is an Internet that is not private property, but a public utility…At the moment, the battle over network neutrality is not to completely eliminate the telephone and cable companies. We are not at that point yet. But the ultimate goal is to get rid of the media capitalists in the phone and cable companies and to divest them from control.” In a country of over 300 million people, even an extremist like McChesney can find, perhaps, millions of followers. But you should know better than to listen to them. (32,198 submissions)
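One way to tally variants like those above is to treat paragraphs that appear in nearly every comment as shared template text and count whatever is left over. The sketch below is an illustration of that idea only, not the method used for this analysis; the `shared_fraction` cutoff, the blank-line paragraph splitting and the placeholder strings are all assumptions.

```python
from collections import Counter


def split_paragraphs(comment):
    """Split a comment on blank lines and drop empty paragraphs."""
    return [p.strip() for p in comment.split("\n\n") if p.strip()]


def variant_counts(comments, shared_fraction=0.9):
    """Return (shared template paragraphs, Counter of the remaining variant paragraphs)."""
    doc_freq = Counter()
    for comment in comments:
        doc_freq.update(set(split_paragraphs(comment)))
    cutoff = shared_fraction * len(comments)
    shared = {p for p, n in doc_freq.items() if n >= cutoff}
    variants = Counter(
        p for comment in comments for p in split_paragraphs(comment) if p not in shared
    )
    return shared, variants


# Placeholder strings stand in for the real template and variant paragraphs.
OPENING = "opening paragraph shared by every letter"
CLOSING = "closing paragraph shared by every letter"
comments = (
    ["\n\n".join([OPENING, "variant paragraph A", CLOSING])] * 3
    + ["\n\n".join([OPENING, "variant paragraph B", CLOSING])] * 2
)

shared, variants = variant_counts(comments)
print(variants.most_common())
```

Per-submitter text such as names and addresses would also show up as one-off "variants" under this approach, so in practice it would need to be stripped or ignored first.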
Estimating sentiment percentages in non-form comments
Our overall estimate for the roughly 60/40 split between anti-NN and pro-NN was relatively easy to make, since we could confidently classify 88% of comments after reading the 50 form letters that served as their respective prototypes. Still, we were curious to see what the makeup of the non-form-letter comments was. Not only do the remaining documents represent a significant chunk of the corpus, but they’re also potentially the most interesting. These comments reflect the personal interpretations of their authors and give a sense as to how different advocacy messages are shaping how the public thinks about this complex issue.
A brief aside: of the 12% of documents that were not form letters, 14,999 (about 1% of the corpus) looked like this:
To Chairman Tom Wheeler and the FCC Commissioners,
No Content Found — Please specify some content
This submission is obviously an error. Submissions like this appear as the lone gray circle in the form letter visual above. It appears that all of these submissions were just filled in with name and address information, and no actual content. We were able to locate what we think was the source of this phenomenon: Daily Kos specifically directed participants to write their own comment, rather than using a form letter, in this campaign. It appears that about 15,000 respondents didn’t read the instructions and submitted what were essentially blank documents.
The final 11% of comments (184,120 documents) presented a problem. There were only two of us working on this project, and reading the whole bunch would have kept us busy for quite a while. We decided, instead, to manually read and classify a random sample of roughly 1,840 documents (about 1% of the 11%) to make a training set for an automatic classifier, which is a typical text-mining approach to addressing this type of problem. We trained a similar text classifier in our earlier post to try to estimate the number of expert and non-expert comments in non-form letters.
We selected a random 20% of comments from each of the high-variance clusters, which were predominantly non-form-letter clusters. Of those, we selected a random 1,844 documents to classify by hand. Unfortunately, anti-NN examples were very rare (9 documents), and the rest of the set was split between pro-NN (1,575 documents) and those that were either too vague or inscrutable (260 documents). This data is too unbalanced for training an automatic classifier, so we treated it as a rough estimate of the makeup of the non-form comment pool: 85.4% pro-NN, 14.1% unclear, and 0.5% anti-NN.
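The shares follow directly from the hand-labeled counts. As a rough sketch, the snippet below recomputes them and scales them up to the 184,120 non-form documents; the scaling step assumes the sample is representative of that pool, which is an assumption rather than something established here.

```python
# Hand-labeled sample counts reported in the text.
sample_counts = {"pro-NN": 1575, "unclear": 260, "anti-NN": 9}
NON_FORM_DOCUMENTS = 184_120  # size of the non-form-letter pool

sample_size = sum(sample_counts.values())
shares = {label: n / sample_size for label, n in sample_counts.items()}

# Extrapolation assumes the sample is representative of the whole pool.
estimated_totals = {
    label: round(share * NON_FORM_DOCUMENTS) for label, share in shares.items()
}

for label, share in shares.items():
    print(f"{label}: {share:.1%} of sample, ~{estimated_totals[label]:,} documents")
```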
This is hardly a scientific approach, but it’s not very surprising to find a preponderance of pro-NN sentiment in the non-form-letter comments. Free Press organized the submission of over 100,000 comments that included the applicant’s name and a short, unprompted message. Furthermore, as mentioned above, there is evidence that Daily Kos charged its participants specifically to write non-form-letter comments.
Public dialogue or public rant?
Our experience analyzing these comments has given us a unique vantage point on the public’s relationship to regulatory bodies like the FCC, and the role that advocacy organizations play in mediating that relationship. The FCC’s Electronic Comments Filing System is not primarily designed to serve as a platform for debating regulators’ role in serving the public. Nonetheless, when the public was invited to comment upon rules that many believe would have serious consequences for the business community and consumers alike, it naturally gave rise to one of those elusive “national conversations” about a complex and contentious issue.
The term “conversation” might be a bit generous in this case, but if there was one, it’s easy to imagine that the original participant (the FCC itself) might consider it to have been completely derailed. Very few of the comments address specific elements of Chairman Wheeler’s proposed rules. Instead, they focus on the general notion that network neutrality is something that should be either protected or eschewed, depending on a commenter’s personal or professional concerns. These concerns, however, are not always directly relevant to the issue at hand.
On the pro-NN side, arguments include network neutrality’s role in protecting our right to free speech and preventing Internet providers from charging consumers higher fees for faster service.
There can be no freedom if you favor one product over another. Net neutrality is important for protecting free speech, innovation, and healthy competition. Don’t let something this unjust happen to the world. (6018210841-8285)
Without net neutrality, people of the lower middle class wouldn’t be able to afford internet fees, so they’d be stunted on their growth as a race of technology. In this day and age, internet connection is so unbelievably vital to being with the goings on, whether it be email, internet news, articles, job applications, the internet play an enormous role in today’s society, that if most of America couldn’t afford, we’d be setting back our progress as a nation. Plus, you might get riots. (6018211177-9593)
Needless to say, private companies are under no obligation to uphold the First Amendment, and it’s already an ISP’s prerogative to charge its customers more or less according to the speed of their connections. These areas of discussion are at best secondary to the main issues that Wheeler’s proposed rules would tackle. The FCC has also shown no willingness (or, frankly, technological capacity) to fulfill the surveillance-culture nightmares mentioned in other pro-NN comments:
Protect A Free Net. We Don’t Need The FCC To Turn Into An NSA 2.0. (6018211039-5702)
Arguments from anti-NN commenters are at times similarly outside the scope of the FCC’s request. Some commenters seem to understand an “open Internet” to be an Internet without any security:
Open Internet sounds in theory like the right thing to do. Of course! But what about terrorists who creep into our every day lives, no matter how much we protect ourselves? Who’s going to protect the Open Internet?
Anti-NN comments are also sometimes fearful of invasions of privacy:
The internet is fine the way it is. please leave it alone so the common people can enjoy it. big brother NSA should concentrate on the true enemy of the country not all Americans! (6018305588)
But on the other hand, the majority of anti-NN comments seem primarily to take issue with the fact that the FCC regulates anything at all:
I do not understand my government’s “need” to fix what isn’t broken. Please keep your hands and your laws off the Internet. I see the Internet as a place where the best and worst can exist side by side without hurting anyone. Please, considering it is possibly the last vestige of free speech in the world, allow it to create itself according to the needs of its varied users. Thank you. (02-047-005216)
As is often the case with complex issues in the public sphere, framing is everything. Both sides in this digital debate appeal to universally cherished values like freedom, personal choice, security, and economic prosperity. It’s easy to see how those foundational American ideals can be used to generate the submission of millions of passionate responses. What’s less clear is whether or not these concerns, often tangential to the issue at hand, are likely to aid the FCC (which, as we’ve pointed out before, is under no obligation to read all comments) in making its final ruling.
A note about data quality
As with the first round of filings, the number of comments we’re including here is short of what the FCC says it released. The bulk download on which this archive was based contains, according to the FCC, about 2.5 million comments, but as best as we can determine, there simply aren’t that many comments in the archive. It’s difficult for us to be sure, however, because the format in which the comments were released was extremely challenging to parse.
The first seven files in the zip archive contained about 725,000 comments, which aligns with what the FCC announcement told us was the number of submissions posted to the agency’s Electronic Comment Filing System. But the FCC also said it is including email comments that didn’t make it into their main system, and we surmise that the remaining files in the zip archive were these comments. This chunk of comments, however, was concatenated together and then arbitrarily chunked into output files, with no delimiter characters between either one comment and the next or one metadata field and the next, such that it was almost impossible to separate comments from one another.
As was true in round one, we fail to see how the FCC arrived at the count that was widely publicized. Clearly, 1.67 million documents is far short of 2.5 million (the number reported in the commission’s blog post). We spent enough time with these files that we’re reasonably sure that the FCC’s comment counts are incorrect and that our analysis is reasonably representative of what’s there. Still, the fact that it’s impossible for us to know for sure is problematic, and while we laud the FCC for its good intentions in releasing this data in bulk, we expect better-quality releases from federal agencies to the public. The technical difficulties plaguing the FCC that have hampered their collection of public feedback in this rulemaking are, at this point, well-documented, and it’s clearer now than ever that the FCC needs to make a serious investment in technical infrastructure if it wants the community to seriously engage with its data. Thankfully, it seems that FCC technical staff is aware of these problems, because this kind of release just isn’t good enough.
As with the first round, we’re pleased to make available a cleaned up version of the bulk comments for this round of comments. We’ve split the comments from the FCC dump into individual JSON files (one per comment), including both the ECFS comments and the mangled email messages, and also parsed and split an aggregate submission from FreePress representing several thousand comments that showed up as one unintelligible comment in the FCC data.
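For readers who want to produce a similar one-file-per-comment layout from their own parsed records, a minimal sketch follows. The field names (`id`, `source`, `text`), the file-naming scheme and the sample records are hypothetical and are not the schema of the released files.

```python
import json
import tempfile
from pathlib import Path


def write_comments(comments, out_dir):
    """Write each comment dict to its own UTF-8 JSON file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, comment in enumerate(comments, start=1):
        # Hypothetical naming scheme: zero-padded sequence numbers.
        path = out_dir / f"comment-{number:07d}.json"
        path.write_text(
            json.dumps(comment, ensure_ascii=False, indent=2), encoding="utf-8"
        )
        paths.append(path)
    return paths


# Made-up records with hypothetical field names.
sample = [
    {"id": "example-1", "source": "ecfs", "text": "Please protect the open Internet."},
    {"id": "example-2", "source": "email", "text": "Leave the Internet alone."},
]

with tempfile.TemporaryDirectory() as tmp:
    written = write_comments(sample, tmp)
    loaded = [json.loads(p.read_text(encoding="utf-8")) for p in written]

print(len(written), "files written")
```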
As a general rule in processing and counting documents, we treated each document submitted to ECFS or received by email as one submission. In certain cases where it was clear that a single submission contained large numbers of distinct comments aggregated together, we made a best effort where feasible to separate those comments into individual records in our data. Petitions, or other circumstances where a single comment was paired with a list of names, were treated as single comments.
We weren’t able to explore many of the ideas we had during the first round about possible avenues for further investigation, and would heartily encourage researchers interested in this data to download the scrubbed versions and consider doing so.
We’d again like to thank Radim Řehůřek, maintainer of the gensim library, which was crucial to our text analysis.
5 powerful Python libraries
If you have decided to learn Python as your programming language, the next question in your mind will be:
“What are the different Python libraries available to perform data analysis?”
There are many libraries available to perform data analysis in Python. Don’t worry; you don’t have to learn all of those libraries. You have to know only five Python libraries to do most of the data analysis tasks. I will give a short introduction to each of these libraries, and I will point you to some of the best tutorials to learn them.
So let’s get started.
NumPy is the foundation on which all higher-level tools for scientific Python are built. Here are some of the functionalities it provides:
N-dimensional array, a fast and memory-efficient multidimensional array providing vectorized arithmetic operations.
You can apply standard mathematical operations on entire arrays of data without writing loops.
It is very easy to transfer data to external libraries written in a low-level language (such as C or C++), and also for external libraries to return data to Python as NumPy arrays.
Linear algebra, Fourier transforms and random number generation.
Although NumPy does not provide high-level data analysis functionality, having an understanding of NumPy arrays and array-oriented computing will help you use tools like Pandas much more effectively.
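As a small illustration of the vectorized, loop-free style that NumPy arrays allow, here is a minimal sketch. The numbers and variable names are made up for the example.

```python
import numpy as np

# Made-up data: unit prices and quantities for four items.
prices = np.array([10.0, 12.5, 8.0, 20.0])
quantities = np.array([3, 1, 4, 2])

# Element-wise arithmetic on whole arrays, with no explicit Python loop.
revenue = prices * quantities
total = revenue.sum()

# Boolean masks select elements that satisfy a condition.
above_average = prices[prices > prices.mean()]

# The same operations apply to multidimensional arrays.
matrix = np.arange(6).reshape(2, 3)
doubled = matrix * 2

print(total, above_average, doubled)
```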
Scipy.org provides a brief description of the NumPy package.
Here is an amazing tutorial that focuses entirely on the usability of NumPy.
The SciPy library depends on NumPy, which provides convenient and fast N-dimensional array manipulation. The SciPy library is built to work with NumPy arrays, and provides many user-friendly and efficient numerical routines, such as routines for numerical integration and optimization. SciPy has modules for optimization, linear algebra, integration and other common tasks in data science.
I couldn’t find any good tutorial other than the one on Scipy.org. It is the best tutorial for learning SciPy.
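To give a feel for the integration and optimization routines SciPy offers, here is a minimal sketch using `scipy.integrate.quad` and `scipy.optimize.minimize_scalar`. The functions being integrated and minimized are arbitrary examples.

```python
import math

from scipy import integrate, optimize

# Numerically integrate sin(x) from 0 to pi; quad returns the estimate
# together with an estimate of its absolute error.
area, abs_error = integrate.quad(math.sin, 0.0, math.pi)

# Find the minimum of a simple one-variable function.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2 + 1.0)

print(area, result.x, result.fun)
```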
Pandas contains high-level data structures and tools designed to make data analysis fast and easy. Pandas is built on top of NumPy, which makes it easy to use in NumPy-centric applications.
Data structures with labeled axes, supporting automatic or explicit data alignment. This prevents common errors resulting from misaligned data and working with differently-indexed data coming from different sources.
Using Pandas, it is easier to handle missing data.
Merge and other relational operations found in popular databases (SQL-based, for example).
Pandas is the best tool for doing data munging.
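The three Pandas features listed above (label alignment, missing-data handling and database-style merges) can be sketched in a few lines. The data and column names below are made up for illustration.

```python
import pandas as pd

# Two Series with different labels: arithmetic aligns on the index,
# and labels missing from one side produce NaN.
a = pd.Series([1.0, 2.0, 3.0], index=["x", "y", "z"])
b = pd.Series([10.0, 20.0], index=["y", "z"])
total = a + b

# Missing values can be filled (or dropped) explicitly.
filled = total.fillna(0.0)

# A SQL-style inner join on a shared key column.
left = pd.DataFrame({"key": ["k1", "k2", "k3"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["k2", "k3", "k4"], "right_val": [20, 30, 40]})
joined = pd.merge(left, right, on="key", how="inner")

print(total, filled, joined, sep="\n")
```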
Quick intro to pandas
Alfred Essa has a series of videos on Pandas. These videos should give you a good idea of basic concepts.
Also, don’t miss this tutorial by Shane Neeley; the video gives you a comprehensive intro to NumPy, SciPy and Matplotlib.
Matplotlib is a Python module for visualization. Matplotlib allows you to easily make line graphs, pie charts, histograms and other professional-grade figures. Using Matplotlib, you can customize every aspect of a figure. When used within IPython, Matplotlib has interactive features like zooming and panning. It supports different GUI back ends on all operating systems, and can also export graphics to common vector and graphics formats: PDF, SVG, JPG, PNG, BMP, GIF, etc.
ShowMeDo has a good tutorial on Matplotlib.
I also recommend the cookbook from Packt Publishing. This is an amazing book for someone getting started in Matplotlib.
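As a quick taste of Matplotlib, the sketch below draws a line graph and a histogram side by side and exports the figure as PNG. It selects the non-interactive "Agg" back end so it can run without a display; the data is made up.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive back end, so no GUI is required
import matplotlib.pyplot as plt

# Made-up data for illustration.
days = [1, 2, 3, 4, 5]
counts = [120, 340, 560, 410, 230]

fig, (line_ax, hist_ax) = plt.subplots(1, 2, figsize=(8, 3))

# Line graph with customized markers and axis labels.
line_ax.plot(days, counts, marker="o")
line_ax.set_xlabel("Day")
line_ax.set_ylabel("Count")
line_ax.set_title("Line graph")

# Histogram of the same values.
hist_ax.hist(counts, bins=3)
hist_ax.set_title("Histogram")

fig.tight_layout()

# Export to PNG in memory; passing a filename to savefig writes to disk instead.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
png_bytes = buffer.getvalue()
```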
Scikit-learn is a Python module for machine learning built on top of SciPy. It provides a set of common machine learning algorithms to users through a consistent interface. Scikit-learn helps you quickly implement popular algorithms on your dataset. Have a look at the list of algorithms available in scikit-learn, and you will quickly realize that it includes tools for many standard machine-learning tasks (such as clustering, classification, regression, etc.).
Introduction to Scikit-learn
Tutorials from Scikit-learn.org
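The "consistent interface" of scikit-learn mostly means that estimators share the same `fit` and `predict` pattern. A minimal sketch with one classifier and one clustering algorithm follows; the toy data is made up.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Made-up one-feature data with two well-separated groups.
X = [[0.0], [0.5], [1.0], [9.0], [9.5], [10.0]]
y = [0, 0, 0, 1, 1, 1]

# Supervised learning: fit on labeled data, then predict new points.
classifier = LogisticRegression().fit(X, y)
predictions = classifier.predict([[0.2], [9.8]])

# Unsupervised learning follows the same fit pattern, without labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
cluster_labels = kmeans.labels_

print(predictions, cluster_labels)
```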
There are also other libraries such as NLTK (Natural Language Toolkit), Scrapy for web scraping, Pattern for web mining, and Theano for deep learning. But if you are getting started in Python, I would recommend that you first get familiar with these five libraries. I have mentioned tutorials that are beginner-friendly; before going through them, ensure that you are familiar with the basics of Python programming.
Jetty provides a Web server and javax.servlet container, plus support for SPDY, WebSocket, OSGi, JMX, JNDI, JAAS and many other integrations. These components are open source and available for commercial use and distribution.
Jetty is used in a wide variety of projects and products, both in development and production. Jetty can be easily embedded in devices, tools, frameworks, application servers, and clusters. See the Jetty Powered page for more uses of Jetty.
The current recommended version for use is Jetty 9 which can be obtained here: Jetty Downloads. Also available are the latest maintenance releases of Jetty 8 and Jetty 7.
The Jetty project is hosted entirely at the Eclipse Foundation and has been for a number of years. Prior releases of Jetty have existed in part or completely under the Jetty project at the Codehaus. See the About page for more information about the history of Jetty.
You can benefit from committer knowledge and get training, consulting services, professional support and even production SLAs, just ask us about it!
Open source software has emerged as the driving force of technology innovation, from cloud and big data to social media and mobile. The Future of Open Source Survey, sponsored by Black Duck and North Bridge Venture Partners, is an annual assessment of open source industry trends that drives broad industry discussion around key issues for new and established software-related organizations and the open source community.
In addition, the three industries expected to be impacted most by open source were identified as education (76 percent), government (67 percent), and health care (45 percent), demonstrating how entrenched OSS has become in our social fabric.
New Technologies – As data from the Black Duck® KnowledgeBase™ shows, with nearly one million open source projects to date, the rate of innovation is spurring new technologies such as the Internet of Things (IoT) and the continued rise of Software as a Service (SaaS). When asked what industries open source technology was leading, 63 percent cited cloud computing/virtualization, 57 percent answered content management, 52 percent selected mobile technology, and 51 percent answered security. A change in the way enterprises view open source was signaled by 56 percent of respondents expecting corporations to contribute to more open source projects in 2014.
Cranston Software rose out of the ashes like the phoenix in 1988 to be at the service of humanity.
In the early years we were a pure software firm, designing and writing software for the legal industry. In 1989 we incorporated “Shareware” as part of our business model, and in the early 90s we added Linux as part of our networking component.
Our roots became more hardware-centric in those days. We are a leading advocate in the Open Movement.
In the mid-nineties it was the Internet.
Then security became a cornerstone of our business model, as it remains to this day.
Open Source was a logical taking-off point at this juncture.
Security is central to all Open Source products.