Failed to update 6M+ documents on Couchbase Community Server 3.0.1

I am trying to update 6 million+ documents on a Couchbase 3.0.1 Community Server cluster. I am using the latest Java SDK and have tried various ways of reading a batch of documents from a view, updating them, and replacing them back into the bucket.

It seems that as the process progresses, throughput becomes very slow, not even 300 ops/sec. I tried many ways to speed it up using the bulk operation approach (with Observables), but to no avail. I even let the process run for hours, only to see a timeout exception later.
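
For reference, the bulk (Observable) variant I tried looks roughly like the sketch below, written against the SDK 2.x async API; applyUpdate() is a hypothetical method standing in for whatever change is applied to each document's JSON:

// Bulk-update one batch of document IDs: fetch, mutate and replace
// asynchronously, then block until the whole batch has completed.
private void updateBatch(Bucket statsBucket, List<String> ids)
{
    Observable
        .from(ids)
        .flatMap(id -> statsBucket.async().get(id))
        .map(doc -> {
            applyUpdate(doc.content()); // hypothetical mutation of the JSON content
            return doc;
        })
        .flatMap(doc -> statsBucket.async().replace(doc))
        .toBlocking()
        .lastOrDefault(null);
}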

The last option I tried was to read all document IDs from the view into a temp file, so that I could later read the file back and update the records. But after 3 hours and only 1.7M IDs read (roughly 157 IDs/sec overall!) from the view, the DB gave a timeout exception.
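
The second pass, reading the IDs back from the temp file and updating the documents in batches, was meant to look roughly like this (again just a sketch, reusing the updateBatch() helper above; the batch size of 1,000 is arbitrary):

// Read the saved IDs back and update the documents in batches of 1,000.
try (DataInputStream in = new DataInputStream(new FileInputStream("rowIds.tmp")))
{
    List<String> batch = new ArrayList<>();

    while (true)
    {
        try
        {
            batch.add(in.readUTF()); // readUTF() throws EOFException at end of file
        }
        catch (EOFException eof)
        {
            break;
        }

        if (batch.size() == 1000)
        {
            updateBatch(statsBucket, batch);
            batch.clear();
        }
    }

    if (!batch.isEmpty())
    {
        updateBatch(statsBucket, batch);
    }
}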

Note that the Couchbase cluster consists of 3 servers (Ubuntu 14.04), each with 8 cores, 24 GB RAM and a 1 TB SSD. The Java code doing the update runs on the same network on a machine with 4 cores, 16 GB RAM and a 1 TB SSD. There is no other load on this cluster.

It seems that even reading all the IDs from the view is not possible. I checked the network bandwidth, and the DB server was delivering barely 1 Mbps of data.

Below is a sample of the code used to read all document IDs from the view:

import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.util.Iterator;

import com.couchbase.client.java.Bucket;
import com.couchbase.client.java.view.Stale;
import com.couchbase.client.java.view.ViewQuery;
import com.couchbase.client.java.view.ViewResult;
import com.couchbase.client.java.view.ViewRow;

final Bucket statsBucket = db.getStatsBucket();
int skipCount = 0;
int limitCount = 10000;

System.out.println("reading stats ids ...");

try (DataOutputStream out = new DataOutputStream(new FileOutputStream("rowIds.tmp")))
{
    while (true)
    {
        // Page through the view with skip/limit, 10,000 rows at a time
        ViewResult result = statsBucket.query(ViewQuery.from("Stats", "AllLogs")
                .skip(skipCount)
                .limit(limitCount)
                .stale(Stale.TRUE));

        Iterator<ViewRow> rows = result.iterator();

        if (!rows.hasNext())
        {
            break; // no more rows: all IDs have been written
        }

        // Write every document ID in this page to the temp file
        while (rows.hasNext())
        {
            out.writeUTF(rows.next().id());
        }

        skipCount += limitCount;
        System.out.println(skipCount);
    }
}


I have tried this with the bulk operation (Observable) approach as well, without any success. I also tried changing the limit to 1,000 (with no limit, the Java application goes nuts after a while and even SSH stops responding).

Is there a way to do this?



1 answer


I found a solution. ViewQuery.skip() does not actually skip rows and should not be used for pagination. It reads all rows from the beginning of the view and only starts producing output after discarding the requested number of records, much like traversing a linked list. With a limit of 10,000, each successive page re-scans everything before it, so reading all 6M IDs ends up scanning on the order of 1.8 billion rows in total, which is why throughput collapses as skipCount grows.

The solution is to use startKey() and startKeyDocId(). The value passed to these methods is the key/ID of the last row you read. I got this solution from here: http://tugdualgrall.blogspot.in/2013/10/pagination-with-couchbase.html



So the final code to read all the items from the view:

final Bucket statsBucket = db.getStatsBucket();
int limitCount = 10000;
int skipCount = 0; // progress counter only

System.out.println("reading stats ids ...");

try (DataOutputStream out = new DataOutputStream(new FileOutputStream("rowIds.tmp")))
{
    String lastKeyDocId = null;

    while (true)
    {
        ViewResult result;

        if (lastKeyDocId == null)
        {
            // First page: start from the beginning of the view
            result = statsBucket.query(ViewQuery.from("Stats", "AllLogs")
                    .limit(limitCount)
                    .stale(Stale.FALSE));
        }
        else
        {
            // Subsequent pages: resume from the last row read;
            // skip(1) avoids emitting that row a second time
            result = statsBucket.query(ViewQuery.from("Stats", "AllLogs")
                    .limit(limitCount)
                    .stale(Stale.TRUE)
                    .startKey(lastKeyDocId)
                    .skip(1));
        }

        Iterator<ViewRow> rows = result.iterator();

        if (!rows.hasNext())
        {
            break; // no more rows: all IDs have been written
        }

        while (rows.hasNext())
        {
            lastKeyDocId = rows.next().id();
            out.writeUTF(lastKeyDocId);
        }

        skipCount += limitCount;
        System.out.println(skipCount);
    }
}
