Almost exactly two months after the second developer preview, I'm delighted to announce that we've shipped the first (and hopefully only) beta release of the Couchbase Spark Connector. It is a major step forward, bringing Spark 1.4 support, official documentation, and many smaller enhancements. In particular:

  1. Support for Spark 1.4
  2. Overhauled Spark SQL DataFrame support
  3. Java APIs
  4. saveToCouchbase() supports StoreModes

You can get it from the Couchbase Maven Repository right away:

Documentation is now officially available here!

Spark 1.4 Support

Spark 1.4 has been selected as the target Spark version for the 1.0 GA release. As a result, all Spark dependencies have been bumped. Since 1.4 brings a new API for DataFrames, the connector has changed its own API to fit in with it.

The DataFrame API has changed so that the underlying source is now accessed through Spark's DataFrameReader and DataFrameWriter. Other than that, it feels very similar to the previous API.

Here is an example of how to read data out of the travel-sample bucket:

You can also write a DataFrame into Couchbase:

Java APIs

Many people use Spark through its Java API, so of course we also want to support it. Since the connector's API surface is very small by design, not much needed to be converted. The Java API lives under the com.couchbase.spark.java namespace and can be used like this:

StoreModes

Previously, the saveToCouchbase() method always used the underlying upsert operation to store its data. Since there are scenarios where you want to only insert new documents, or only replace existing ones, rather than overwrite unconditionally, more flexibility is needed. This is why we've introduced the StoreMode enum, which supports the following values:

  • UPSERT: Insert the document if it doesn't exist and overwrite it if it does.
  • INSERT_AND_FAIL: Try to insert and fail if the document already exists.
  • INSERT_AND_IGNORE: Try to insert and ignore the failure if the document already exists.
  • REPLACE_AND_FAIL: Try to replace and fail if the document doesn't exist.
  • REPLACE_AND_IGNORE: Try to replace and ignore the failure if the document doesn't exist.

Using it is straightforward. The following correctly fails since the document already exists:
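To make these semantics concrete without a running cluster, here is a minimal sketch in plain Java that models the five modes against an in-memory map. It is not the connector's implementation: the class StoreModeSketch and the method store are hypothetical names chosen for illustration, and only the enum values come from the list above. The second write uses INSERT_AND_FAIL on an id that is already present, so it fails.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, self-contained model of the store modes. Not connector code.
public class StoreModeSketch {

    // The five values listed in the post.
    public enum StoreMode {
        UPSERT, INSERT_AND_FAIL, INSERT_AND_IGNORE, REPLACE_AND_FAIL, REPLACE_AND_IGNORE
    }

    /**
     * Applies one write to an in-memory "bucket" (assumption: a plain Map
     * stands in for Couchbase). Returns true if the content was stored and
     * false if the write was silently skipped. The *_AND_FAIL modes throw
     * IllegalStateException when their precondition is not met.
     */
    public static boolean store(Map<String, String> bucket, String id,
                                String content, StoreMode mode) {
        boolean exists = bucket.containsKey(id);
        switch (mode) {
            case UPSERT:
                break; // always write
            case INSERT_AND_FAIL:
                if (exists) throw new IllegalStateException("Document already exists: " + id);
                break;
            case INSERT_AND_IGNORE:
                if (exists) return false; // keep the existing document
                break;
            case REPLACE_AND_FAIL:
                if (!exists) throw new IllegalStateException("Document does not exist: " + id);
                break;
            case REPLACE_AND_IGNORE:
                if (!exists) return false; // nothing to replace
                break;
            default:
                throw new IllegalArgumentException("Unknown mode: " + mode);
        }
        bucket.put(id, content);
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> bucket = new HashMap<>();

        // First write creates the document.
        store(bucket, "doc1", "{\"val\":\"first\"}", StoreMode.UPSERT);

        // Second write must fail: the id is already taken.
        try {
            store(bucket, "doc1", "{\"val\":\"second\"}", StoreMode.INSERT_AND_FAIL);
        } catch (IllegalStateException e) {
            System.out.println("Insert failed: " + e.getMessage());
        }
    }
}
```

With the connector itself, the equivalent choice is made by passing the desired mode to saveToCouchbase() instead of relying on the upsert default.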

The Road Towards GA

The 1.0.0 GA release of the connector is planned a month from now, leaving room to fix bugs and improve documentation. Please help us kick the tires as much as possible so we can ship an awesome GA release!

Author

Posted by Michael Nitschinger

Michael Nitschinger works as a Principal Software Engineer at Couchbase. He is the architect and maintainer of the Couchbase Java SDK, one of the first completely reactive database drivers on the JVM. He also authored and maintains the Couchbase Spark Connector. Michael is active in the open source community and contributes to various other projects like RxJava and Netty.
