In this project, I use a corpus of earnings announcements sourced from the websites of large financial institutions. For each institution, I attempt to automatically generate a short summary of its quarterly earnings announcements. The academic finance literature has shown that the textual information in firm earnings announcements has significant explanatory power for firms' stock returns, volatility, and earnings. I try three approaches to generate single-document summaries:
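A common extractive baseline for single-document summarization scores each sentence by how frequent its content words are across the whole document and keeps the top-scoring sentences. The sketch below is a minimal, dependency-free version of that idea. It is not necessarily one of the three approaches used in this project, and the stopword list, the `summarize` helper, and the sample text are hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (hypothetical; a real pipeline would use a fuller list).
STOPWORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "for", "on", "our",
    "we", "is", "was", "were", "with", "by", "as", "at", "this", "that",
}


def split_sentences(text):
    """Naively split text into sentences on ., !, or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase word tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z0-9']+", sentence.lower()) if w not in STOPWORDS]


def summarize(text, n_sentences=2):
    """Score each sentence by the average document-wide frequency of its content
    words and return the top-scoring sentences in their original order."""
    sentences = split_sentences(text)
    freqs = Counter(w for s in sentences for w in tokenize(s))
    scores = []
    for i, s in enumerate(sentences):
        words = tokenize(s)
        score = sum(freqs[w] for w in words) / len(words) if words else 0.0
        scores.append((score, i))
    # Keep the highest-scoring sentence indices, then restore document order.
    top = sorted(i for _, i in sorted(scores, reverse=True)[:n_sentences])
    return [sentences[i] for i in top]
```

The sentence splitter is deliberately naive: figures such as "$9.1" survive because the period is not followed by whitespace, but abbreviations such as "Inc." would cause spurious splits.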
I used the 2017 Yelp Academic Dataset to build a recommendation system using only NLP methods. Past Yelp recommendation systems have been built with matrix completion methods and restaurant attributes; I wanted to show that similar recommendation results can be achieved using measures of linguistic tone and topic models. Specifically, I used sentiment analysis based on the Hu & Liu (2004) word dictionary, Latent Dirichlet Allocation, Latent Semantic Analysis, and a TF-IDF feature matrix of 2-word n-grams (bigrams). The machine learning models I tested on each user are Random Forest, a linear Support Vector Machine, and Naive Bayes.
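Both the dictionary-based tone measure and the bigram TF-IDF matrix are simple to compute. The sketch below is a minimal standard-library version under stated assumptions: `POSITIVE` and `NEGATIVE` are tiny hypothetical stand-ins for the Hu & Liu (2004) positive and negative word lists, the tone formula (positive - negative) / (positive + negative) is one common choice rather than necessarily the one used in this project, and the idf uses the smoothed formula ln((1 + N) / (1 + df)) + 1 that scikit-learn's `TfidfVectorizer` applies by default.

```python
import math
import re
from collections import Counter

# Tiny illustrative subsets; the real Hu & Liu (2004) lexicon has thousands of entries.
POSITIVE = {"great", "delicious", "friendly", "love", "amazing"}
NEGATIVE = {"bad", "rude", "slow", "cold", "terrible"}


def tokens(text):
    """Lowercase alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tone(text):
    """Net tone in [-1, 1]: (positive - negative) / (positive + negative); 0 if no hits."""
    words = tokens(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / (pos + neg) if pos + neg else 0.0


def bigrams(words):
    """Adjacent word pairs, e.g. ['a', 'b', 'c'] -> ['a b', 'b c']."""
    return [f"{a} {b}" for a, b in zip(words, words[1:])]


def bigram_tfidf(docs):
    """Return (vocabulary, rows): each row is an L2-normalized TF-IDF vector over
    2-word n-grams, with smoothed idf = ln((1 + N) / (1 + df)) + 1."""
    counts = [Counter(bigrams(tokens(d))) for d in docs]
    vocab = sorted(set().union(*counts))
    n = len(docs)
    df = Counter(g for c in counts for g in c)  # number of docs containing each bigram
    idf = {g: math.log((1 + n) / (1 + df[g])) + 1 for g in vocab}
    rows = []
    for c in counts:
        row = [c[g] * idf[g] for g in vocab]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        rows.append([v / norm for v in row])
    return vocab, rows
```

For example, "great food but slow service" contains one positive and one negative word, so its tone is 0.0. The resulting rows can be fed as features to any of the classifiers listed above.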
A short script that pulls the entire earnings history of any user-specified companies listed on the NASDAQ. The script takes stock tickers as its input. Note that it relies on the selenium, BeautifulSoup, and pdfkit modules, which can be installed with 'pip install [module_name]' (BeautifulSoup is published on PyPI as beautifulsoup4). The usual open-source/noncommercial license terms apply.
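Two of the steps such a script needs, parsing the ticker input and collecting document links from a fetched page, can be sketched with the standard library alone. This sketch uses `html.parser` in place of BeautifulSoup so it runs without extra installs, and it omits the page rendering (selenium) and PDF saving (pdfkit) steps. The ticker validation rule, the assumption that filings are linked as `.pdf` files, and the helper names `parse_tickers`, `PdfLinkCollector`, and `pdf_links` are all hypothetical.

```python
import re
from html.parser import HTMLParser


def parse_tickers(raw):
    """Turn user input like 'aapl, msft  GOOG' into ['AAPL', 'MSFT', 'GOOG']."""
    tickers = [t.upper() for t in re.split(r"[,\s]+", raw.strip()) if t]
    # Hypothetical validation rule: 1-5 letters, optionally with a '.X' share-class suffix.
    bad = [t for t in tickers if not re.fullmatch(r"[A-Z]{1,5}(\.[A-Z])?", t)]
    if bad:
        raise ValueError(f"invalid ticker(s): {bad}")
    return tickers


class PdfLinkCollector(HTMLParser):
    """Collect href targets of <a> tags that point at PDF files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            self.links.append(href)


def pdf_links(html):
    """Return PDF links from an HTML string, in document order."""
    parser = PdfLinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

In the full script, the HTML passed to `pdf_links` would come from a selenium-rendered page, and each link would then be downloaded or converted.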
A brief exercise exploring the time series properties of UK inflation from 1986 to 2015. We plot the annual and quarterly breakdowns of the series, examine its autocorrelation, and estimate its power spectral density. Using the Phillips-Perron and augmented Dickey-Fuller tests, we cannot reject the null hypothesis of a unit root in the series at the 5% significance level. We use VAR and OLS techniques to study the relationship between UK unemployment and inflation, and we run Granger causality tests in both directions (inflation on unemployment and unemployment on inflation). At the 5% significance level, we reject, in each direction, the null hypothesis that one series does not Granger-cause the other. We also plot the impulse response function of each series to a shock in the other and decompose the forecast error variance of both series. We conclude by developing a state space time-varying parameter model, setting inflation as the state and unemployment as the latent variable.
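A Granger causality test compares two OLS regressions of one series on its own lags, with and without lags of the other series, and asks whether the extra lags significantly reduce the residual sum of squares. The sketch below computes that F statistic with NumPy. The lag order `p` and the synthetic data in the usage example are assumptions, not the UK data; in practice statsmodels provides this test as `statsmodels.tsa.stattools.grangercausalitytests`.

```python
import numpy as np


def lag_matrix(series, p):
    """Columns are the series lagged 1..p, aligned with series[p:]."""
    n = len(series)
    return np.column_stack([series[p - k : n - k] for k in range(1, p + 1)])


def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def granger_f(y, x, p):
    """F statistic for H0: p lags of x add no explanatory power for y beyond a
    constant and p lags of y (i.e. x does not Granger-cause y)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[p:]
    const = np.ones((len(target), 1))
    X_r = np.hstack([const, lag_matrix(y, p)])  # restricted: own lags only
    X_u = np.hstack([X_r, lag_matrix(x, p)])    # unrestricted: plus lags of x
    rss_r, rss_u = rss(target, X_r), rss(target, X_u)
    df_denom = len(target) - X_u.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / df_denom)
```

The statistic is compared with the upper 5% critical value of an F distribution with `p` and `df_denom` degrees of freedom; running it twice with the arguments swapped gives the two directions tested above.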
We test basic time series models on UK unemployment data from 1986 to 2015. We first note that the null hypothesis of a unit root in the series cannot be rejected. This suggests that the series is not stationary, which precludes the use of ARMA models without first differencing the data. Nonetheless, as an exercise, we demonstrate the results using various AR and ARMA processes. We use Monte Carlo methods to verify our estimation procedures. Next, we perform a model selection exercise over ARMA(p,q) models, testing combinations of lag orders p and q ranging from 0 to 5. The AIC and AICc select an ARMA(5,5) model, with ARMA(4,3) having very close values. The BIC indicates very large lag structures, but the ARMA(2,1) model is not far from the minimum value. Finally, we conclude with a forecasting exercise using the AR(2) and ARMA(1,1) processes.
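The Monte Carlo check and the AR(2) forecast can be sketched in a few lines of NumPy. This sketch assumes a zero-mean (demeaned) series with unit-variance Gaussian innovations, estimates the AR coefficients by OLS rather than maximum likelihood, and uses illustrative coefficients (0.5, 0.3) instead of anything estimated from the UK data. The helper names are hypothetical.

```python
import numpy as np


def simulate_ar(phi, n, rng, burn=200):
    """Simulate x_t = sum_i phi_i * x_{t-i} + e_t with e_t ~ N(0, 1), dropping a burn-in."""
    p = len(phi)
    x = np.zeros(n + burn)
    e = rng.normal(size=n + burn)
    for t in range(p, n + burn):
        x[t] = sum(phi[i] * x[t - 1 - i] for i in range(p)) + e[t]
    return x[burn:]


def fit_ar(x, p):
    """OLS estimate of a zero-mean AR(p); returns (coefficients, residual variance)."""
    X = np.column_stack([x[p - k : len(x) - k] for k in range(1, p + 1)])
    y = x[p:]
    phi, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ phi
    return phi, float(resid @ resid) / len(y)


def monte_carlo(phi, n=300, reps=200, seed=0):
    """Average OLS estimates over repeated simulations; should land close to phi."""
    rng = np.random.default_rng(seed)
    estimates = [fit_ar(simulate_ar(phi, n, rng), len(phi))[0] for _ in range(reps)]
    return np.mean(estimates, axis=0)


def forecast_ar(x, phi, h):
    """Recursive h-step-ahead point forecasts from a zero-mean AR(p)."""
    history = list(x[-len(phi):])
    out = []
    for _ in range(h):
        nxt = sum(phi[i] * history[-1 - i] for i in range(len(phi)))
        history.append(nxt)
        out.append(nxt)
    return out
```

If the average estimate from `monte_carlo` drifts far from the true coefficients, the estimation code is suspect. Given the unit-root result, forecasting the actual unemployment series this way would mean fitting on the differenced, demeaned data and then undoing both transformations.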