JSON is a helpful way to transmit data. Now, N1QL can query it thanks to a CURL update in Couchbase 5.0. See how to connect to various endpoints to do just that.

N1QL has many functions that allow you to perform a specific operation. One such function that has been added into the new Couchbase 5.0 DP is CURL.

CURL allows you to use N1QL to interact with external JSON endpoints; namely, Rest API’s that return results and data in JSON format. This function will allow N1QL to have a conservative set of curl functionality built into the language. Interaction primarily consists of data transfer to and from a server using the http and https protocols. In short, the CURL function in N1QL provides you, the user, a subset of standard curl functionality (https://curl.haxx.se/docs/manpage.html) within a query language.

In order to retrieve data from different servers (such as Google Maps, Yahoo Finance etc), we can issue either GET or HTTP POST requests using the CURL function. This is seen in the diagram below.

Function Definition

CURL (URL, [options])

The first argument is the URL, which represents any URL that points to a JSON endpoint. Only URLs with the http:// or the https:// protocol are supported. Redirection is disabled.

The input arguments to the CURL() function can be both static values and N1QL expressions that can be evaluated.

Later in the article we shall see examples that query from the Google Geocode API, the Yahoo Finance API, Couchbase full text search and the Github API. The second argument is a list of options. This is a JSON object that contains a list of curl options and their corresponding values.

We support a variety of options that allow you to interact with any endpoint effectively. In general these can be categorized into security related options and general options. A table of the supported options is given at the end of the article.

 

Security features/enhancements for CURL

With the addition of the CURL function, to avoid security vulnerabilities and control and minimize the risks associated with it, multiple security measures have been implemented.

Using CA certificates with N1QL’s CURL function

Certificates used by the N1QL CURL function should be stored on every query node within the cluster in the n1qlcerts directory. The location where this directory needs to be created by the user depends upon the location of the couchbase installation. (It is OS specific). The following assume default installation location and show where the n1qlcerts directory has been created.

Linux                        /opt/couchbase/var/lib/couchbase/n1qlcerts                                                         
Mac OSX /Users/couchbase/Library/Application\ Support/Couchbase/var/lib/couchbase/n1qlcerts
Windows C:\Program Files\Couchbase\Server\var\lib\couchbase\n1qlcerts

For non-default installation locations, the relative path – “../var/lib/couchbase/n1qlcerts directory” from the bin directory where cbq-engine is run from needs to be created.

This directory must be created for every query node.

Once this directory has been created, add the certificate for CURL to use in here. In order to use this certificate, we use the option cacert and pass in the name of the certificate.

For example if n1qlcerts/user1.pem is the name of the certificate, use the cacert option –

“cacert”:”user1.pem”

Only names are valid, paths are invalid and passing one will cause an error. CURL() throws an error in the case of invalid and expired certificates.

NOTE : The n1qlcerts directory and its contents need to be replicated for each query node within the cluster.

Custom headers and user-agent

CURL() runs on the query node within a cluster. This enables the function to gain access to all REST endpoints that are accessible through the Query service (since that is where the function is executed). In order to avoid access to such insecure endpoints, a custom header, that cannot be changed by the user, is added to all requests sent using the N1QL curl function. This is of the format “X-N1QL-User-Agent: couchbase/n1ql/1.7.0-N1QL”.

A user-agent is also always set by default. This can be reset using the -user-agent option. The value set by default is “couchbase/n1ql/1.7.0-N1QL”.

Both these values are designed to allow both internal and external endpoints to check for the header/user-agent and disallow access in their code and deny access to the function. The one caveat when using this however, is that we still cannot protect against any software that doesnt check for this header and the existing versions of locally installed software (both Couchbase and non-Couchbase software). For such cases, the curl whitelist feature has been added (see below).

 

Creating a Whitelist to restrict CURL access.

A whitelist is a JSON document, that lists out the permitted REST endpoints and URLs for the CURL() function to access. The URL’s themselves, need to be a prefix match. The whitelist document is created within the n1qlcerts directory (for the location see above), and is named curl_whitelist.json (this name is fixed and cannot be changed by the user). The file (curl_whitelist.json) needs to be created by the administrator (or a user with access to the machine where couchbase is installed).

If the whitelist is not setup (…./n1qlcerts/curl_whitelist.json doesnt exist) or if it exists but is empty then the CURL function cannot be used.

Any whitelist needs to have the following fields –

Field Type Description Default value
all_access boolean This will decide whether the user has access to all urls or only the urls specified in the allowed_urls array. false
allowed_urls array List of prefixes for urls that we wish to allow. empty
disallowed_urls array List of prefixes for urls that will be restricted no matter what empty

If the all_access field is false, then the usage for the CURL function has been fully restricted. In order to be able to use CURL() with any endpoint in N1QL, the administrator needs to set allowed_urls and disallowed_urls accordingly. In order to allow access to all urls, we can set all_access to true. This essentially gives us full CURL access.

Say, for example we wish to allow access to all google maps api endpoints but restrict access to all other endpoints. The following will be the contents of the whitelist.

Curl_whitelist.json

Any url in CURL() prefixed by https://maps.googleapis.com/ will be allowed.

NOTE : The whitelist needs to be replicated for each query node within the cluster.

Role based access to the CURL function

An important thing worth mentioning here is that CURL is designed so that it cannot be arbitrarily invoked. In order to avoid injection of data from an external source using the UPDATE statement, a new role QUERY_EXTERNAL_ACCESS has been added. Only a user assigned this role has access to the CURL function. By default this role membership is empty. The CURL function can only be accessed by a FULL_ADMIN or any user that has been granted the QUERY_EXTERNAL_ACCESS role by the FULL_ADMIN. For previous versions of couchbase that dont support role-based access control, a password protected bucket can be used. Also for the CURL() functionality, internally a specific set of SSL ciphers (MEDIUM or HIGH) are supported. This is dependent on the COUCHBASE_SSL_CIPHER_LIST.

 

Restricting the result size for CURL()

An important concern with using the CURL() function is when a user crafts a really long file –  greater than 64 MB, and tries to read from it. Since the data is loaded into memory, if the result size is not capped the query service could crash. Due to this possibility, the maximum result size for data that can be returned by CURL() is 64MB (67 108 864 bytes). The user can restrict the amount of memory for CURL results by using the result-cap option.The minimum (default)value for the result-cap option is 20MB ( 20 971 520 bytes).

For example if the query sets result-cap to 20MB – “result-cap”:20971520 then any response that has a larger size will return an error.

 

Interaction With Different Endpoints

Let us see how to query different endpoints using the CURL function in N1QL.

Google Maps Geocoding API

The Geocoding API from Google Maps allows you to convert static addresses into coordinates and vice versa using HTTP request. (For more information refer to https://developers.google.com/maps/documentation/geocoding/intro )

Say you want to search for Santa Cruz in Spain using your Google Dev API key. In order to do this, you can use the following query:

Curl Request

 

Corresponding Query

This query retrieves the address and geographic location bounds of the address, Santa Cruz, ES. We use the address, components and key parameters from the Google Maps Geocoding REST API. The “data” option represents the curl data option that represents HTTP POST data.  However, because this is a get request we set the “get” option to true. You can provide values to all the REST parameters within the data option.

Now let’s search for Half Moon Bay

Yahoo Finance API

The Yahoo Finance API allows you to use the Yahoo Query Language (YQL) to fetch stock quotes (as seen in http://meumobi.github.io/stocks%20apis/2016/03/13/get-realtime-stock-quotes-yahoo-finance-api.html ). Below is the YQL SELECT statement to access the stock of Hortonworks Inc (HDP).

In order to get the results in JSON you can use the following URL:

https://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20yahoo.finance.quotes%20where%20symbol%20in%20(%22HDP%22)&format=json&diagnostics=true&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=

Curl Request

Corresponding Query

For this query, the value of the data option contains the Yahoo REST parameters, q (for the YQL query), format (to return data in JSON) and some other parameters.

Couchbase Full Text Search

Couchbase’s Full Text Search allows you to apply fuzzy search to data stored in Couchbase. For more information see https://blog.couchbase.com/2016/february/couchbase-4.5-developer-preview-couchbase-fts .

Suppose you create a FTS index called beers on the beer-sample bucket in Couchbase. You can now search for beer type pale ale using this index, using the CURL function in N1QL. It is important to note that FTS currently accepts HTTP POST instead of GET. To explicitely specify the POST request method, use the request option.

Curl Request

Corresponding Query

We give multiple options in this query. The header option allows you to pass a custom header to server. Content-Type : application/json tells the server that the data is provided in JSON format. If we have a password protected bucket in Couchbase, then we need to pass its credentials with the query. The user option can be used to pass in a colon-separated username and password. The request option specifies that POST request method is used.

If you want to retrieve only those documents from beer-sample that are returned by the search above, you can write a N1QL JOIN query as follows.

This will retrieve the ids of the documents returned by the FTS query that searches for pale ale, along with the total hits and all the details from the corresponding document in beer-sample.

Github API

Github’s API is a bit different from the previous API’s, in that it returns multiple results in the form of a JSON array of result values. This can be treated as multiple documents.Refer to the Github API docs in https://developer.github.com/v3/ for more details on what can be queried.

Say you want to find out all the repositories linked to a Github account. The following query does this

Curl Request

Corresponding Query

If the account has three repositories, the query gives three results (here I have added limit 1). The RAW keyword is used  to return the array of documents that the query returns, without a wrapper object. One point you will notice is that the header option contains the User-Agent with a github username. This is now mandatory for all Github API requests.

Now from this list, say you would like to know, what is the clone url for each of these repos. The following query accomplishes this

Summary

As you can see with the above examples, using the CURL function, N1QL users can now interact with any external API’s that return results in JSON format. This opens up many possibilities. For example, if Couchbase contains data corresponding to different hotels, then you can use the Google Maps API to find nearby locations to each of the corresponding hotels.

In order to have a secure environment with the addition of CURL() multiple security enhancements have been added. The following is a short list

  • CURL runs on the query node within a cluster.
  • CURL function is disabled by default.
  • CURL supports HTTP and HTTPS only. All other protocols are disabled.
  • Redirection of URLs is not allowed.
  • Custom header for N1QL CURL is “X-N1QL-User-Agent: couchbase/n1ql/1.7.0-N1QL”.
  • User-Agent is  “couchbase/n1ql/1.7.0-N1QL”.
  • Restrict amount of memory for CURL results using result-cap. Min result cap will be 20MB, max result-cap is 64MB.
  • FULL_ADMIN role will allow access to CURL. QUERY_EXTERNAL_ACCESS role can be assigned to a user by the FULL ADMIN. This will enable the user to use the CURL functionality.
  • Certificates should be stored on the local machine – each query node within a cluster. Use …./Couchbase/var/lib/couchbase/n1qlcerts to store certificates.Use cacert to pass in the “name” of the certificate to use. Only names are valid, paths are invalid. (passing in a path will cause an error.)
  • CURL throws an error in case of invalid/expired certificates.
  • User has the ability to “Whitelist” endpoints.

The N1QL implementation of CURL uses the golang libcurl API  – https://github.com/andelf/go-curl

List of Available Options

Security Options

Option Description value
user Server user and password

When password is empty it is treated as an empty password string.

USERNAME[:PASSWORD]
basic Use HTTP Basic Authentication BOOLEAN (TRUE/ FALSE)
insecure   Allow connections to SSL sites without certs (H) BOOLEAN (TRUE/ FALSE)
anyauth curl to figure out authentication method by itself, and use the most secure one BOOLEAN (TRUE/ FALSE)
cacert Specify CA signed certificate filename

Certificates should be stored on the local machine – each query node within a cluster.

/Couchbase/var/lib/couchbase/n1qlcerts to store certificates. This is not visible to the user.

The Filename cannot contain a path. If it is not a match to the existing contents of n1qlcerts directory, the function throws an error.

For expired and invalid certificates throw an error.

FILENAME (This is the certificate, pem file for aws for example)
result-cap Set capacity for buffer that stores result of CURL operation Number of MB. (Minimum is 20MB)

 

Other Transfer-Related Options

Option Description Value
get, G Get request for CURL BOOLEAN (true/false)
X, request Set the request method. This only accepts GET or POST and is case sensitive.

For all other cases it errors out.

{“request”:”POST”}
connect-timeout Maximum time allowed for connection. It should contain the maximum time in seconds that you allow the connection phase to the server to take. This only limits the connection phase, it has no impact once it has connected. Set to zero to switch to the default built-in connection timeout – 300 seconds.

If float value, we truncate it to the integer value.

For all other types (not a number) throw error.

SECONDS
max-time Maximum time allowed for the transfer operation.

Default timeout is 0 (zero) which means it never times out during transfer.

If float value, we truncate it to the integer value.

For all other types (not a number) throw error.

SECONDS
data      HTTP POST data (H)

Allows us to set all the rest api parameters for the given endpoint.

STRING

OR

[…string,string….]

header    Pass custom header string to server (H) STRING

OR

[…string,string….]

show-error Show error.

When true show errors when they occur.

When false suppress the errors

BOOLEAN (TRUE/ FALSE)
silent Silent mode (don’t output anything) BOOLEAN (TRUE/ FALSE)
keepalive-time Wait SECONDS between keepalive probes for low level TCP connectivity. (Does not affect HTTP level keep-alive) SECONDS
user-agent Value for the User-Agent to send to the server. STRING
data-urlencode Encode the data, and send to server.

This is a test => this%20is%20a%20test  

STRING

OR

[…string,string….]

 

Posted by Isha Kandaswamy

Leave a reply