I’m a senior data scientist at The Washington Post. At The Post, I’m a member of the newsroom engineering team, where we are currently focused on building elections infrastructure. The things I work on include:
building live forecasting models for election nights using partial data. You can read blog posts about the model we used for the general election in Virginia in 2019 here, the model we used for the 2020 Democratic primary here, and the model we built for the 2020 general election here.
parsing and processing voter file data. I also build tools to help journalists access the data and find interesting story leads. Along with Nick Diakopoulos and Madison Dong, I co-authored a paper on this topic, which you can read here.
building and maintaining our Federal Election Commission campaign finance database.
Previously, I was on The Post’s Big Data and Personalization Team. There I worked with stakeholders from across our newsroom and marketing team to build products and develop algorithms. My main focus was integrating artificial intelligence into newsroom processes. Projects I worked on included:
algorithmic metadata tagging for articles, including topic modeling to support our recommendation and ad targeting services
text-based image recommendation, to allow journalists to quickly find relevant images for their stories
unstructured data search tools that let journalists quickly search over multiple data sources to find relevant ledes
Generally, I’m interested in natural language processing and quantitative text analysis, though my current focus is election-related data. Prior to joining The Post, I was a part-time data scientist at Sidewire, a San Francisco-based media startup that sadly no longer exists. My main focuses at Sidewire were recommender engines and network analyses of users.
I have an M.Sc in Statistics and a B.Sc in Mathematical and Computational Science, both from Stanford University.
If you want to learn more about me or have a cool project you’d like to collaborate on, please feel free to send me an email at email@example.com.
Click here to download my CV.
Here are some of my most recent projects and posts:
One of my favorite things to do with political datasets is to ideologically scale them. The underlying assumption of scaling is that some high dimensiona...
With guidance from Ariel Schwartzman and Ben Nachman from the Stanford Linear Accelerator (SLAC) and the European Organization for Nuclear Research (CERN...
Here you can find the writeup for my first natural language processing project. It was the final project for CS224d: Deep Learning for Natural Language P...