Although the speed and capacity of computers have improved immensely over the last 20 years, there is often still a notable lag between ingesting data and being able to act on the results of processing and querying it.
It’s still relatively common, for example, for batch processes that query large data sets to be queued up and, in extreme cases, run overnight. In contexts like business intelligence gathering, the relatively glacial speed of surfacing insights isn’t a show-stopper. But in many other contexts, the time between information arriving at a database and an action being taken on the processed data can be critical.
One of the main sectors in which CrateDB works, for example, is in manufacturing. Here, IIoT sensors on fast-moving machinery report data in real-time. Speed of ingestion, processing, and attenuation of devices – to take a single example – is critical. Similar capabilities are needed for geospatial applications, and in certain financial processing functions.
We spoke to CrateDB’s Developer Relations Lead, Simon Prickett, about the platform and some of its unique features.
“Where we fit well is minimising the lag between receiving information and making it available to query,” he said. “If you receive data, put it in a queue, and it’s not processed until sometime later, then your time to figure out that, say, your conveyor belt that’s been working 24/7 […] is starting to get too warm, is greatly increased. So we optimise to a concurrent read and write workload.
“It has an incredibly high ingestion rate. The way we do that is through clustering. The cluster figures out what it needs to do and you can add more nodes to your cluster and scale it up. So we say it’s a real-time analytics database.”
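To illustrate why adding nodes raises capacity, here is a simplified Python model of hash-based sharding: each row is routed to one of a fixed number of shards by hashing a routing key, and shards are spread over the cluster's nodes. It is a sketch under stated assumptions, not CrateDB's implementation. CRC32 stands in for whatever hash CrateDB really uses, the round-robin placement is a simplification of real shard allocation, and the function names and sensor keys are hypothetical. In CrateDB's own DDL, the shard count and routing column are set with a clause such as `CLUSTERED BY (col) INTO 8 SHARDS`.

```python
import zlib
from collections import Counter
from typing import Dict, Iterable, List


def shard_for(routing_key: str, num_shards: int) -> int:
    """Map a row's routing key to a shard number.

    CRC32 is a stable stand-in hash here; it is not necessarily
    the hash function CrateDB uses internally.
    """
    return zlib.crc32(routing_key.encode("utf-8")) % num_shards


def assign_shards(num_shards: int, nodes: List[str]) -> Dict[int, str]:
    """Place shards on nodes round-robin (a simplification of real allocation)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(num_shards)}


def rows_per_node(keys: Iterable[str], num_shards: int, nodes: List[str]) -> Counter:
    """Count how many rows each node would store for the given routing keys."""
    placement = assign_shards(num_shards, nodes)
    return Counter(placement[shard_for(key, num_shards)] for key in keys)


if __name__ == "__main__":
    # Hypothetical sensor readings keyed by sensor ID, split into 8 shards.
    keys = [f"sensor-{i}" for i in range(10_000)]
    print(rows_per_node(keys, 8, ["node-a", "node-b"]))
    # With four nodes, each key still hashes to the same shard;
    # only whole shards change which node hosts them.
    print(rows_per_node(keys, 8, ["node-a", "node-b", "node-c", "node-d"]))
```

In this model the shard count stays fixed, so growing the cluster relocates whole shards rather than re-hashing every row, and the ingestion load is divided among more machines.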
Because ingestion and processing capacity can be scaled by adding or removing nodes, CrateDB users can grow or shrink their data operations as needed. The platform is also flexible in the kinds of input it accepts.
“You might have something structured like a relational table, like Postgres, where you can describe a schema and fields, [data] has types and it’s all pretty fixed. Or you might have something semi-structured like, say, a JSON document that has a nominal schema but nothing’s enforcing it. And you might have something completely unstructured, so like binary or vector data. CrateDB is one place to put all of those things.”
CrateDB can replace multiple database instances thanks to its flexible schemas, which also leave it open-ended with regard to new sources of data. “A user can describe a table schema for what you know about now, and then if you put records in that don’t match that schema, you can have the database auto-extend the schema and start indexing everything immediately,” Simon said.
“Either, store everything and index it, or only index the described part of the schema. Or you can have CrateDB reject records that strictly don’t match the schema like a traditional database would. […] If you want everything to be dynamic, it can be dynamic, but at a table and a field level you can also lock it down or only index parts.”
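To make the three behaviours Simon describes concrete, here is a toy model in Python: auto-extend and index, store but don't index unknown fields, or reject. It is only an illustration, not how CrateDB works internally; `ToyTable`, its policy names and its dict-based index are hypothetical. In CrateDB itself, as we understand its documentation, the corresponding switches are the table setting `column_policy` (`'dynamic'` or `'strict'`) and object columns declared as `OBJECT(DYNAMIC)`, `OBJECT(STRICT)` or `OBJECT(IGNORED)`; check the CrateDB reference for the exact syntax.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Set

POLICIES = {"dynamic", "ignored", "strict"}


@dataclass
class ToyTable:
    """A toy table with a declared schema and one of three column policies.

    dynamic: unknown fields are added to the schema and indexed immediately
    ignored: unknown fields are stored with the record but not indexed
    strict:  records containing unknown fields are rejected
    """

    columns: Set[str]
    policy: str = "dynamic"
    rows: List[Dict[str, Any]] = field(default_factory=list)
    # Minimal inverted index: field name -> value -> positions of matching rows.
    index: Dict[str, Dict[Any, List[int]]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.policy not in POLICIES:
            raise ValueError(f"policy must be one of {sorted(POLICIES)}")
        self.columns = set(self.columns)  # copy so the caller's set is untouched

    def insert(self, record: Dict[str, Any]) -> None:
        unknown = set(record) - self.columns
        if unknown and self.policy == "strict":
            raise ValueError(f"unknown columns: {sorted(unknown)}")
        if self.policy == "dynamic":
            self.columns |= unknown  # auto-extend the schema
        position = len(self.rows)
        self.rows.append(dict(record))  # the whole record is always stored
        for name, value in record.items():
            if name in self.columns:  # only schema columns get indexed
                self.index.setdefault(name, {}).setdefault(value, []).append(position)

    def lookup(self, name: str, value: Any) -> List[Dict[str, Any]]:
        """Find rows where name == value, using the index only."""
        return [self.rows[i] for i in self.index.get(name, {}).get(value, [])]
```

Inserting a hypothetical reading such as `{"machine": "belt-1", "temp_c": 71.5, "rpm": 1200}` into a table declared with only `machine` and `temp_c` adds and indexes `rpm` under `"dynamic"`, keeps `rpm` in the stored row but out of the index under `"ignored"`, and raises `ValueError` under `"strict"`.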
Indexing everything carries some overhead, mostly in storage, but in production contexts such as real-time financial transaction checks or monitoring a fast-moving production line, it’s negligible.
“We believe that the benefits are the flexible querying and lower cost of operations. You don’t have to get a DBA to add an index to something after somebody’s slowed your database by running a complex query. [The slowdown] didn’t happen,” he said.
Adding CrateDB to existing environments is simpler because it is compatible with the PostgreSQL wire protocol, so existing staff are likely to be able to hit the ground running. For those who need help, CrateDB follows the familiar open-source business model, monetised through paid support.
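Wire compatibility means CrateDB speaks the same network protocol as PostgreSQL, so many clients and drivers written for Postgres can talk to it. As an illustration, the sketch below builds the very first message such a client sends, the StartupMessage of PostgreSQL's frontend/backend protocol version 3.0. The function name is our own; the byte layout follows the PostgreSQL protocol documentation, and nothing here opens a network connection.

```python
import struct
from typing import Optional

# Protocol version 3.0: major version in the high 16 bits, minor in the low 16.
PROTOCOL_V3 = 3 << 16  # 196608


def startup_message(user: str, database: Optional[str] = None) -> bytes:
    """Build a PostgreSQL v3 StartupMessage, the first packet a client sends."""
    params = {"user": user}
    if database is not None:
        params["database"] = database
    body = struct.pack("!i", PROTOCOL_V3)
    for key, value in params.items():
        # Each parameter name and value is a NUL-terminated string.
        body += key.encode("utf-8") + b"\x00" + value.encode("utf-8") + b"\x00"
    body += b"\x00"  # a final zero byte terminates the parameter list
    # The leading Int32 length counts itself plus the body, in network byte order.
    return struct.pack("!i", len(body) + 4) + body
```

In a default installation, as we understand it, CrateDB's PostgreSQL interface listens on port 5432 and the built-in superuser is `crate`, so `startup_message("crate")` resembles how a Postgres client would open a session. Real drivers typically add further parameters (such as `application_name`) and may negotiate SSL first.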
CrateDB is typically installed alongside other database technologies, often running locally or in hybrid topologies, or in settings where a solution has to keep running even if the internet fails or the connection to cloud services stutters.
“Would CrateDB unseat an existing system of record database for something? Almost certainly not. […] Would you use it as a companion to something else to be able to do specialised, new and different things? Yes, absolutely.”
You can find out more about CrateDB here, and we recommend looking up Simon Prickett on GitHub or contacting a member of the advocacy team to discuss the technology in more detail.
There’s a video-based online learning academy, and potential users can (perhaps should) use their own data to run specific tests to see how the platform performs in simulated production environments. There’s also a fully-managed cloud-based offering as an option. The open-source basis of CrateDB and a vibrant community help ensure that the platform will not just continue to be around, but grow in power and capability.
(Imagery by CrateDB)