(Excerpt from original post on the Taneja Group News Blog)
Google’s AlphaGo program has just whipped a top human Go champion (Lee Sedol, a 9-dan professional and winner of many top titles) four games to one in a million-dollar match. You might shrug if you are not familiar with the subtleties of playing Go at a champion level, but believe me, this is a significant milestone in machine learning. Software has long been able to master games whose moves can be calculated out to the endgame given enough computing power (like checkers and chess), but Go is, as of yet, not fully computable because of its board size (19×19) and its seemingly “intuitive” opening and even mid-game move options.
If you think of an HTML page as a structured, “marked up” document, it’s basically a form of data. The structure, in this case represented by HTML tags like <a> and <div>, identifies various document elements that can be interpreted as data fields. In fact, HTML written to strict XML rules (known as XHTML) is well-formed XML, a well-understood data format.
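To make the "page as data" idea concrete, here is a minimal sketch that reads a small XHTML page with Python's standard XML parser and pulls the <a> elements out as data fields. The page content, the URLs, and the `extract_links` helper are made up for illustration, and it only works because the sample is well-formed XHTML; loosely written HTML would need a more forgiving parser.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this XML namespace.
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical sample page, used only for illustration.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample page</title></head>
  <body>
    <div id="links">
      <a href="https://example.com/a">First link</a>
      <a href="https://example.com/b">Second link</a>
    </div>
  </body>
</html>"""


def extract_links(xhtml_text):
    """Return each <a> element as a small record (href plus link text)."""
    root = ET.fromstring(xhtml_text)
    namespaces = {"x": XHTML_NS}
    return [
        {"href": a.get("href"), "text": (a.text or "").strip()}
        for a in root.iterfind(".//x:a", namespaces)
    ]


links = extract_links(page)
for link in links:
    print(link["href"], "->", link["text"])
```

Each tag name becomes a field name and each attribute or text node becomes a value, which is all "interpreting elements as data fields" amounts to here.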
Think of “documents” as structured data where the structure is included in the document itself. The structure is free-form in the sense that the document author decides what data fields are included and how they are organized, ordered, nested, related, and so on. If you are object-oriented, you can also view each document as an object (technically, for web pages this is referred to as the DOM, the “Document Object Model”).
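As a rough illustration of structure that travels with the document, the sketch below loads two small XML documents as DOM objects and lists the field paths each author chose. The two sample documents and the `field_paths` helper are hypothetical; the point is only that no external schema is consulted, because the layout is discovered from each document itself.

```python
from xml.dom import minidom

# Two hypothetical documents whose authors chose different structures.
docs = [
    "<post><title>Post A</title><author><name>Ann</name></author></post>",
    "<note><headline>Note B</headline>"
    "<body><section><heading>Intro</heading></section></body></note>",
]


def field_paths(node, prefix=""):
    """List the path to every leaf element beneath a DOM element node."""
    path = f"{prefix}/{node.tagName}"
    children = [c for c in node.childNodes if c.nodeType == c.ELEMENT_NODE]
    if not children:
        return [path]
    paths = []
    for child in children:
        paths.extend(field_paths(child, path))
    return paths


# Parse each document into a DOM object and read its structure back out.
structures = [field_paths(minidom.parseString(d).documentElement) for d in docs]
for paths in structures:
    print(paths)
```

The two documents yield different sets of paths, which is the free-form quality described above: a consumer has to discover the fields rather than assume them.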
Big Data in a Small World
Google perhaps led the way into this brave new world with its proprietary “Bigtable” database architecture backing many of its services. Apache’s Hadoop project (including HDFS and HBase) is fundamentally based on Google’s published papers.
Most big data sets in our small world are going to be produced by numerous (countless?) authors and applications. Most big data is going to be in the form of documents rather than standardized (i.e., described by an external schema) transactional data. Since most current data handling, storage, and analysis technology is aimed at transactional, schema-controlled data, I’m thrilled to explore today’s emerging market of new “big data” solutions.