The H-1B program has seen its share of controversies over time. Every time there is a recession or a major layoff announcement, a lot of noise gets made about the pros and cons of this program. Here are some quotes…
“There is no doubt that the [H-1B] program is a benefit to their employers, enabling them to get workers at a lower wage, and to that extent, it is a subsidy” — Nobel economist Milton Friedman
“Allowing more skilled workers into the country would bring down the salaries of top earners in the United States, easing tensions over the mounting wage gap, Greenspan said” — Bloomberg News, March 14, 2007, quoting Fed Chair Alan Greenspan; the median wage for software engineers at the time was $83,130
“Some employers said that they hired H-1B workers in part because these workers would often accept lower salaries than similarly qualified U.S. workers; however, these employers said they never paid H-1B workers less than the required wage” — GAO, 2003; note the italicized portion (emphasis added), illustrating that the abuse is due to loopholes, not violations of the law
You can find the quotes above, and more like them, compiled by Norm Matloff at UC Davis. We will not take sides here but will try to see what the published data says about the trends; whether they are good or bad is left for the reader to decide. The objective, as always in my posts here, is twofold: go through the exercise of employing some tools & techniques for data analysis while gaining some insights into an interesting dataset. For those who may not know what the H-1B program is, here is a bit from Wikipedia:
The H-1B is a non-immigrant visa in the United States under the Immigration and Nationality Act, section 101(a)(15)(H). It allows U.S. employers to temporarily employ foreign workers in specialty occupations… a “specialty occupation” as requiring theoretical and practical application of a body of highly specialized knowledge in a field of human endeavor including but not limited to biotechnology, chemistry, architecture, engineering, mathematics, physical sciences, social sciences, medicine and health, education, law, accounting…
Those who oppose the program claim that it depresses wages for American workers, and that businesses employ H-1B workers supplied by outsourcing firms as a means to cut costs even while laying off similarly skilled American workers. The proponents of the program point to any number of high-profile tech firms being led or driven by employees who came here on an H-1B visa. Their point is that the new ideas and energies these workers bring, and the new products they help create, enhance the well-being of all Americans, not to mention the creation of net new jobs & wealth.
A thorough examination of these claims & counterclaims is beyond the scope of this series of posts. It would need more specific data than I can get my hands on at this time. For now, we lay the groundwork with an analysis of the disclosure data published by the Department of Labor (DOL). The idea for this post came from reading this excellent write-up by Robert Seaton on interesting data sets for statistical analysis. Specifically, I am looking at the LCA (Labor Condition Application) data that reports who the sponsors are for these foreign workers, what they are offering to pay (a range), the job title/area, where they will work, etc… The data is laid out in spreadsheets/CSVs by year, for the period 2002-2015.
Kudos to DOL for taking the pains to piece this information together from their e-filing system. But I guess they may have simply extracted the details without having the resources to ‘clean’ the data for typos, variant spellings/abbreviations, etc… For example, a sponsor like ‘Hewlett Packard’ is reported at various times as ‘HP’, ‘Hewlett-Packard’, ‘Hewlett Packard Enterprise’, ‘Hewlett Packard Company’ and so forth… While we can easily read all these as the same sponsor, they are a problem for automated analysis. Some massaging of the data is needed before we can reliably answer a question like: how many applicants did HP sponsor in 2013? Regex scripting comes in handy for repeated massaging & for testing for oddities. In the end, the available data allows one to ask questions like:
- Who are the most prolific sponsors over the years & what do they pay? Also if you are a foreign worker, which companies would you target so as to maximize your wages?
- What are the top job areas that the sponsors are seeking foreign workers for?
- What are the top destinations, and how are they distributed across the states?
- etc…
Is it useful to get preliminary answers to questions like these? Useful or not, let us just say that I am curious to find out, given all the drama. But it will definitely be useful later, when we take this analysis further and ask questions like:
- Does the stock price of the sponsor firm show any correlation with the number of foreign workers it seeks?
- Have the wages of the American workers gone down in these sponsoring firms?
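As a rough illustration of the regex massaging of sponsor names mentioned earlier, here is a minimal Python sketch. The alias table, the suffix list and the function name `normalize_sponsor` are assumptions made up for illustration; they are not the actual scripts used in this series, and the real LCA files will need a much longer list of patterns.

```python
import re
from collections import Counter

# Hypothetical alias table: canonical name -> pattern matching its variants.
# Only the Hewlett Packard example from this post is covered here.
ALIASES = {
    "HEWLETT PACKARD": re.compile(r"^(HP|HEWLETT[\s-]*PACKARD)\b"),
}

# A few common corporate suffixes to drop (illustrative, not exhaustive).
SUFFIXES = re.compile(r"\b(INC|LLC|LTD|CORP|CORPORATION|COMPANY|CO)\b")


def normalize_sponsor(raw: str) -> str:
    """Map a raw sponsor string to a canonical upper-case name."""
    name = raw.upper().strip()
    name = re.sub(r"[.,]", " ", name)          # drop periods and commas
    name = SUFFIXES.sub(" ", name)             # drop corporate suffixes
    name = re.sub(r"\s+", " ", name).strip()   # collapse whitespace
    for canonical, pattern in ALIASES.items():
        if pattern.search(name):
            return canonical
    return name


# The variant spellings quoted in this post, counted after normalization.
variants = ["HP", "Hewlett-Packard", "Hewlett Packard Enterprise",
            "Hewlett Packard Company"]
counts = Counter(normalize_sponsor(v) for v in variants)
print(counts["HEWLETT PACKARD"])  # 4
```

Note that this sketch lumps ‘Hewlett Packard Enterprise’ in with the rest, in line with the example above; whether such entities should be merged or kept apart is a judgment call to make per sponsor.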
With this background & preamble out of the way, we can now get started in earnest on the mechanics of obtaining some answers to these questions. The second post in this series will be devoted to that. The third and later posts will use that mechanism to answer some of these questions. Those who are reading this only for the analysis can safely skip the second post.
Thanks for reading & feel free to leave a comment if you liked it (or not!)