As you might guess, Integrity Research has interacted with many buy-side investors who have either already started a “big data” initiative or are considering implementing one in the near term. So, what issues should buy-side investors be aware of as they plan a “big data” effort to enhance their current investment research processes?
Volume: Consistent with the term “big data,” one of the obvious characteristics of any big data initiative is the volume of data that investors must be prepared to collect and analyze. As you can see from the slide above, 2.3 trillion gigabytes of data are created every day, with estimates suggesting that 43 trillion gigabytes of data will be created by 2020. Consequently, buy-side investors looking to develop a big data strategy must be prepared to warehouse and analyze huge amounts of data, considerably more than they have ever worked with in the past.
Velocity: Not only is the volume of data huge, but most big data initiatives require that investors analyze this data in real time to identify meaningful signals. Fortunately, most buy-side investors are used to working with real-time data.
Variety: One of the key characteristics of “big data” initiatives is the variety of data types that buy-side investors can collect, including both structured and unstructured data. A few of the major external data types that we have identified for buy-side clients include data from public sources, social media sites, crowdsourcing efforts, various transaction types, sensors, commercial industry sources, primary research vendors, exchanges and market data vendors.
Veracity: All investors understand the problem of poor data quality when trying to build a reliable research process. Clearly, this becomes a far more difficult issue for buy-side investors as they try to identify and ingest terabytes of data from numerous public and private sources, each of which has its own data collection and cleansing processes. Consequently, investors often have to implement sophisticated data quality checks to make sure that the data they warehouse is reasonably accurate.
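To make the idea of a data quality check concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the `Record` fields, the `quality_issues` function and the value thresholds are all hypothetical, and a real pipeline would add many source-specific rules. It flags three basic problems before data is warehoused: duplicate observations, missing values and values outside a plausible range.

```python
from dataclasses import dataclass
from math import isnan
from typing import List, Optional, Tuple


@dataclass
class Record:
    """One observation from an external source (hypothetical layout)."""
    source: str
    ticker: str
    date: str               # ISO date string, e.g. "2014-03-31"
    value: Optional[float]  # None when the source supplied nothing


def quality_issues(records: List[Record],
                   min_value: float = 0.0,
                   max_value: float = 1e9) -> List[Tuple[int, str]]:
    """Return (index, reason) pairs for records that fail basic checks.

    The range limits are illustrative assumptions, not recommended values.
    """
    issues = []
    seen = set()
    for i, r in enumerate(records):
        # The same source reporting the same ticker and date twice is suspect.
        key = (r.source, r.ticker, r.date)
        if key in seen:
            issues.append((i, "duplicate"))
        seen.add(key)

        # Missing values and implausible values are reported separately.
        if r.value is None or isnan(r.value):
            issues.append((i, "missing"))
        elif not (min_value <= r.value <= max_value):
            issues.append((i, "out_of_range"))
    return issues
```

Records that come back with issues could be quarantined for review rather than dropped, so that a recurring problem with one source's collection process stays visible.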
Validity: One important concern for buy-side investors when deciding what data they want to acquire or collect is whether this data is actually useful in helping predict the movement of securities or asset prices. Warehousing irrelevant data only increases cost and complexity without contributing value to the research process. Consequently, buy-side investors need to think through the potential validity of a dataset clearly before it is acquired.
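One simple first-pass screen for validity, sketched below in Python, is to check whether a candidate dataset's signal in one period is correlated with the asset's return in the following period. This is an assumption-laden illustration rather than a recommended methodology: the function names `pearson` and `signal_vs_next_return` are hypothetical, and a correlation on a short sample says nothing about statistical significance, transaction costs or overfitting.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation of two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        # A constant series carries no information; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(vx * vy)


def signal_vs_next_return(signal: Sequence[float],
                          returns: Sequence[float]) -> float:
    """Correlate signal[t] with returns[t + 1].

    Both series are assumed to be aligned on the same dates.
    """
    return pearson(signal[:-1], returns[1:])
```

A value near zero on a reasonable history would suggest the dataset adds cost without adding predictive value, while a persistently non-zero value would justify a more rigorous test.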