01/08/2009 VectorWise from Ingres promises a big database step for analytics

I was fortunate enough to speak with Marcin Zukowski earlier about VectorWise. If you missed it, VectorWise came out of stealth mode a day or two ago. They have announced a partnership with Ingres and are essentially claiming impressive analytic RDBMS performance gains on conventional hardware.

To start with, a key message that I think needs to be communicated here is that this is not a product announcement. Ingres and VectorWise have announced a partnership in which they of course plan to build products together, but today those products are still in the works.

VectorWise is a spin-out of CWI, based on research undertaken by Marcin and others that centered on MonetDB. Explaining the essence of VectorWise is difficult because it is largely internal DBMS data storage & processing logic, but I will have a go.

The modern RDBMS is based around design principles that stem from general purpose OLTP roots and historical hardware architectures (this is partially true even for some of the newest analytic platforms). In a nutshell, these design principles assume that disk is slow & CPU is fast. Data is sought or partially scanned off disk and cached. Row-by-row (tuple-by-tuple) operators process that data, each passing its output to the next as part of a query’s execution plan, until the result is ultimately produced.
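
The tuple-by-tuple model described above can be sketched as a chain of iterators, each pulling one row at a time from the operator below it. This is a minimal illustration, not code from any particular database: the `(customer_id, amount)` schema, the operator names and the sample table are all assumptions made for the example.

```python
from typing import Callable, Iterable, Iterator, Tuple

# Hypothetical schema for illustration: (customer_id, amount)
Row = Tuple[int, float]


def scan(table: Iterable[Row]) -> Iterator[Row]:
    """Scan operator: hands rows to its parent one tuple at a time."""
    for row in table:
        yield row


def select(child: Iterator[Row], predicate: Callable[[Row], bool]) -> Iterator[Row]:
    """Selection operator: passes on only the tuples that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row


def sum_amount(child: Iterator[Row]) -> float:
    """Aggregate operator: consumes tuples one by one and sums the amount field."""
    total = 0.0
    for _, amount in child:
        total += amount
    return total


# Made-up sample data
table = [(1, 10.0), (2, 25.5), (1, 4.5), (3, 100.0)]

# Plan: scan -> select(customer_id == 1) -> sum(amount)
plan = select(scan(table), lambda row: row[0] == 1)
result = sum_amount(plan)
print(result)
```

Every row passes through a separate call at each operator, so the per-tuple overhead is paid once per row per operator in the plan.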

Traditionally I/O is the main bottleneck, so to make the database faster you add more I/O bandwidth. Today, the disk capacity deployed may be up to 100x what the data actually needs, because many disks are necessary to achieve the I/O bandwidth that an analytical RDBMS implementation requires. Even though RDBMSs may parallelize query operators across cores, this typically works by partitioning data between cores, and each core still processes on a tuple-by-tuple basis.

Conventional wisdom? Well, maybe. You see, disk is only really “slow” when it is doing random seeks. Give a disk something sequential to do, on the other hand, and things are very different. Modern disks are able to scan sequentially in the range of 150MB per second. An array of 10 disks should therefore be able to return sequentially read data in the range of 1GB per second.

When it comes to databases, column based storage has been found to effectively structure data for a) high levels of compression and b) sequential access. VectorWise makes use of both of these to help it achieve high levels of sequential I/O. The problem now, however, is that disk may no longer be the bottleneck. While we can get 1GB a second sequentially off disk relatively easily & cheaply, processing tuple-by-tuple at this rate is very difficult. As it turns out, an RDBMS may only achieve a data processing rate of 50MB a second per CPU core. This makes CPU processing limitations a big bottleneck for analytic data sets. Assuming the above figures, we would need over 20 cores to keep up with 10 disks (and of course CPU cores don’t scale linearly).
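
The arithmetic behind the "over 20 cores" figure can be worked through as a short sketch. The disk and core rates are the ones quoted in the article; treating 1GB as 1024MB and assuming perfectly linear scaling across cores are simplifications made here for illustration.

```python
import math

# Figures quoted in the article
DISK_MB_PER_S = 150   # sequential scan rate of one disk
NUM_DISKS = 10
CORE_MB_PER_S = 50    # tuple-by-tuple processing rate of one CPU core

# Assumption: 1GB = 1024MB, used for the article's rounded "1GB per second"
QUOTED_ARRAY_MB_PER_S = 1024


def cores_needed(io_mb_per_s: float, core_mb_per_s: float) -> int:
    """Cores required to keep up with the I/O rate, assuming ideal linear scaling."""
    return math.ceil(io_mb_per_s / core_mb_per_s)


# Theoretical aggregate of the array: 10 x 150MB/s
raw_array_mb_per_s = DISK_MB_PER_S * NUM_DISKS

cores_at_quoted_rate = cores_needed(QUOTED_ARRAY_MB_PER_S, CORE_MB_PER_S)
cores_at_raw_rate = cores_needed(raw_array_mb_per_s, CORE_MB_PER_S)

print(raw_array_mb_per_s, cores_at_quoted_rate, cores_at_raw_rate)
```

At the rounded 1GB per second the sketch gives 21 cores, and at the full 1500MB per second it gives 30. Since real cores do not scale linearly, both numbers are lower bounds.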

If we step out of the database world for a moment into the world of high end computer games, or high end scientific processing, we find their use of current CPU technology is much more advanced than what we are used to. They are using new CPU extensions (MMX, SSE, SSE2, SSE4.2 etc) to parallelize & pipeline computation within a CPU core, meaning they are processing orders of magnitude more instructions per core than a traditional RDBMS has typically been able to. The exact details are too low level to discuss here (many of the research papers are available online), but it is fair to say that modern CPU architectures contain advanced features that to date haven’t been effectively exploited by database vendors.

Enter VectorWise. Their aim is to marry storage technologies that allow high levels of sequential I/O with query processing logic that is designed for modern CPU architectures. Rather than process tuple-by-tuple, they process “vectors” (groups of tuples), leveraging modern CPU extensions and high levels of on-chip cache to give the CPU higher data processing throughput. The result is that instead of the 50MB a second of a tuple-by-tuple approach, VectorWise are able to achieve processing rates in the range of 500MB-1GB a second per core in some situations. This means processing rates of 8GB a second or more could be possible with relatively low end hardware.
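
The idea of operators exchanging vectors rather than single tuples can be sketched as follows. This is only an illustration of the control flow under assumed names and sizes, not VectorWise's implementation: `VECTOR_SIZE`, the operator names and the sample column are hypothetical, and plain Python lists stand in for the cache-resident arrays and CPU extensions a real engine would use.

```python
from typing import Iterator, List, Sequence

# Hypothetical batch size; a real engine would pick one so a vector fits in CPU cache
VECTOR_SIZE = 1024


def scan_vectors(column: Sequence[float], size: int = VECTOR_SIZE) -> Iterator[Sequence[float]]:
    """Scan operator: hands out one slice (vector) of a column per call."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def select_gt(vectors: Iterator[Sequence[float]], threshold: float) -> Iterator[List[float]]:
    """Selection operator: filters a whole vector in one tight loop per call."""
    for vec in vectors:
        yield [v for v in vec if v > threshold]


def sum_vectors(vectors: Iterator[Sequence[float]]) -> float:
    """Aggregate operator: sums each vector, then adds up the partial sums."""
    return sum(sum(vec) for vec in vectors)


# Made-up sample column: 0.0 .. 9.0
amounts = [float(i) for i in range(10)]

# Plan: scan -> select(amount > 4.0) -> sum, with a tiny vector size to show batching
num_vectors = len(list(scan_vectors(amounts, size=4)))
total = sum_vectors(select_gt(scan_vectors(amounts, size=4), 4.0))
print(num_vectors, total)
```

The per-call overhead between operators is now paid once per vector instead of once per tuple, and the inner loops run over contiguous values, which is the shape of work that CPU extensions and on-chip cache favor.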

“In some situations” is the key point to stress here. This obviously isn’t a blanket gain that applies to all analytic data sets, workloads and query requirements. Just what those situations are, and how well the technology actually applies to real world data sets and queries, will be the key to its success. I wouldn’t expect to see too many specific examples of this until a product beta appears. But the theory is that VectorWise can offer high levels of processing capability with existing mainstream hardware. At this point VectorWise isn’t even focusing on MPP; instead they are single node focused. If their scalability claims pan out, you can imagine how this could allow a single node solution to be competitive with existing low to mid scale MPP solutions that are based on a more conventional query processing architecture.

This isn’t the only trick up VectorWise’s sleeve. They are also leveraging research around column based storage, compression, piggy-backed (shared) scans and so on. Much of the research that has been adopted by VectorWise is referenced from their web site.

VectorWise have impressive technology, so why partner with Ingres rather than a larger vendor (or go it alone)? Marcin offers a few reasons. Firstly, as academics they feel strongly that open source is cool, so this path was greatly preferred over a relationship with a non-open vendor. Secondly, Ingres will allow them to deliver their technology in an uncompromised fashion. Marcin mentioned that if they had partnered with one of the big three vendors, that vendor’s existing product strategies and investments would likely have meant their ideas could only have been implemented in partial form. Ingres, on the other hand, is going to allow them more of a green field. And of course, a partnership with Ingres makes sense from a go-to-market perspective, as Ingres already has a worldwide reputation, a global customer base, sales & marketing capabilities etc.

Marcin confirmed that Ingres have an exclusive license to their technology, and first option to acquire them for a certain period of time. This allows Ingres to really invest in the relationship without the fear of the rug being pulled out from under them.

VectorWise clearly are applying innovative research to analytical RDBMS requirements. But as interesting as the technology sounds, the proof of the pudding will be how well these design principles translate to real-world analytical processing requirements in mainstream product form. This remains to be seen, but Ingres and their community clearly have high hopes.

VectorWise is clearly differentiated when compared with a traditional mainstream RDBMS running on mainstream hardware. However, in the current market we have lots of different approaches to the problems described. Kickfire, for example, use their own SQL Chip processor to increase data processing rates, and other appliance vendors are using FPGAs etc for similar purposes. The relative effectiveness of these different approaches still needs to be examined; however, a mainstream hardware approach has obvious benefits.
