
Bigtable

Bigtable is a sparsely populated table that can scale to billions of rows and thousands of columns, enabling you to store terabytes or even petabytes of data. A single value in each row is indexed; this value is known as the row key. Bigtable is ideal for storing large amounts of single-keyed data with low latency. It supports high read and write throughput at low latency, and it's an ideal data source for MapReduce operations.
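To make the access pattern concrete, here is a minimal in-memory sketch (not the Bigtable API; all names and data are illustrative) of a sorted key-value map that supports the two reads Bigtable is built around: a point lookup by row key, and a range scan over a row-key prefix.

```python
import bisect

# Hypothetical rows, keyed by a single row key, as in Bigtable.
rows = {
    "device#42": {"status": "ok"},
    "user#alice": {"name": "Alice"},
    "user#bob": {"name": "Bob"},
}
sorted_keys = sorted(rows)  # Bigtable keeps rows sorted by row key

def lookup(row_key):
    """Point read: fetch one row by its row key."""
    return rows.get(row_key)

def scan_prefix(prefix):
    """Range scan: all rows whose key starts with `prefix`, in key order."""
    start = bisect.bisect_left(sorted_keys, prefix)
    result = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break  # keys are sorted, so we can stop at the first mismatch
        result.append((key, rows[key]))
    return result
```

Because rows are stored in sorted order, the prefix scan touches only contiguous keys rather than the whole table, which is why row-key design matters so much in practice.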
Bigtable is exposed to applications through multiple client libraries, including a supported extension to the Apache HBase library for Java. As a result, it integrates with the existing Apache ecosystem of open source big data software.
Bigtable's powerful backend servers offer several key advantages over a self-managed HBase installation:
Incredible scalability. Bigtable scales in direct proportion to the number of machines in your cluster. A self-managed HBase installation has a design bottleneck that limits performance after a certain threshold is reached. Bigtable does not have this bottleneck, so you can scale your cluster up to handle more reads and writes.
Simple administration. Bigtable handles upgrades and restarts transparently, and it automatically maintains high data durability. To replicate your data, add a second cluster to your instance, and replication starts automatically. No more managing replicas or regions; just design your table schemas, and Bigtable will handle the rest for you.
Cluster resizing without downtime. You can increase the size of a Bigtable cluster for a few hours to handle a large load, then reduce the size of the cluster again—all without any downtime. After you change a cluster's size, it typically takes just a few minutes under load for Bigtable to balance performance across all of the nodes in your cluster.

What it's good for

Bigtable is ideal for applications that need high throughput and scalability for key-value data, where each value is typically no larger than 10 MB. Bigtable also excels as a storage engine for batch MapReduce operations, stream processing/analytics, and machine-learning applications.
You can use Bigtable to store and query all of the following types of data:
Time-series data, such as CPU and memory usage over time for multiple servers.
Marketing data, such as purchase histories and customer preferences.
Financial data, such as transaction histories, stock prices, and currency exchange rates.
Internet of Things data, such as usage reports from energy meters and home appliances.
Graph data, such as information about how users are connected to one another.
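For time-series data like the server metrics above, a common pattern is to encode both the entity and the timestamp in the row key so that readings for one server sort chronologically and can be fetched with a single prefix scan. A minimal sketch (the key format and names here are hypothetical, not a Bigtable requirement):

```python
from datetime import datetime, timezone

def timeseries_row_key(server_id, ts):
    """Build a hypothetical row key '<server_id>#<YYYYMMDDHHMMSS>'.

    All keys sharing the same server_id prefix sort together, and within
    that prefix the fixed-width timestamp makes them sort chronologically.
    """
    return f"{server_id}#{ts.strftime('%Y%m%d%H%M%S')}"

key = timeseries_row_key(
    "server-001",
    datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc),
)
```

Scanning the prefix "server-001#" would then return that server's readings in time order without touching other servers' rows.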


Bigtable storage model

Bigtable stores data in massively scalable tables, each of which is a sorted key-value map. The table is composed of rows, each of which typically describes a single entity, and columns, which contain individual values for each row. Each row is indexed by a single row key, and columns that are related to one another are typically grouped into a column family. Each column is identified by a combination of the column family and a column qualifier, which is a unique name within the column family.
Each intersection of a row and column can contain multiple cells. Each cell contains a unique timestamped version of the data for that row and column. Storing multiple cells in a column provides a record of how the stored data for that row and column has changed over time. Bigtable tables are sparse; if a column is not used in a particular row, it does not take up any space.
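The storage model described above can be sketched as a nested map: table, then row key, then column family, then column qualifier, then a list of timestamped cells. This is an illustrative in-memory model only (all names and timestamps are made up), but it shows both cell versioning and sparseness:

```python
# table -> row key -> column family -> column qualifier -> cells,
# where each cell is a (timestamp, value) pair kept newest-first.
table = {
    "user#alice": {
        "profile": {                                      # column family
            "name": [(300, "Alice A."), (100, "Alice")],  # two versions
        },
        "stats": {
            "logins": [(200, "7")],
        },
    },
    "user#bob": {
        "profile": {
            "name": [(150, "Bob")],
        },
        # no 'stats' family here: the row is sparse and this costs no space
    },
}

def read_cell(row_key, family, qualifier, at=None):
    """Return the newest cell value, or the newest at or before `at`."""
    cells = table.get(row_key, {}).get(family, {}).get(qualifier, [])
    for ts, value in cells:  # cells are kept newest-first
        if at is None or ts <= at:
            return value
    return None
```

Reading without a timestamp returns the latest version; passing an earlier timestamp recovers how the cell looked at that point, which is the record-over-time behavior described above.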
[Illustration: an example Bigtable table showing rows indexed by row key, columns grouped into column families, and timestamped cells; some columns are unused in some rows.]
A few things to notice in this illustration:
Columns can be unused in a row.
Each cell in a given row and column has a unique timestamp (t).


Cloud Bigtable is optimized for time-series data. It is cost-efficient, highly available, and low-latency. It scales well. Best of all, it is a managed service that does not require significant operations work to keep running.