Google continues its expansion of big data management services by launching Google Cloud Dataproc. The service, now in beta, will serve as a full-scale data management tool that orchestrates data pipelines on Google’s platform.
Dataproc will be able to spin up Hadoop clusters in less than 90 seconds, making this service one of the fastest available. Users will be billed by the minute, with a 10-minute minimum, at a rate of 1 cent per virtual CPU per hour in the cluster.
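To make the pricing concrete, here is a minimal sketch of the arithmetic in Python. It covers only the 1-cent Dataproc rate quoted above. The function name, the rounding of partial minutes up to the next whole minute, and the assumption that the fee scales linearly with the number of virtual CPUs are illustrative, not confirmed billing rules.

```python
import math

# Figures taken from the article.
RATE_PER_VCPU_HOUR = 0.01   # USD per virtual CPU per hour
MIN_BILLED_MINUTES = 10     # minimum billing increment

def dataproc_fee(vcpus: int, runtime_minutes: float) -> float:
    """Estimate the Dataproc fee in USD for one cluster run (hypothetical helper)."""
    # Assumption: partial minutes round up, and short runs are billed
    # at the 10-minute minimum.
    billed_minutes = max(MIN_BILLED_MINUTES, math.ceil(runtime_minutes))
    return vcpus * RATE_PER_VCPU_HOUR * billed_minutes / 60
```

Under these assumptions, a 24-vCPU cluster that runs for 90 minutes would come to about $0.36, while the same cluster running for only 3 minutes would be billed for the 10-minute minimum, or about $0.04.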
Dataproc uses standard Spark and Hadoop distributions, so it should be compatible with existing Hadoop tools, and users should be able to move existing projects to the new service easily. Dataproc is also integrated with Google’s other cloud services.
Because Dataproc is a managed service, some users may find that it offers less flexibility or control than managing their own virtual machines. Greg DeMichillie, director of product management for Google Cloud Platform, said he believes users won’t have to make any real tradeoffs when using Dataproc compared to setting up their own infrastructure.