Showing posts with label Bigdata. Show all posts

Thursday, November 7, 2013

Facebook open-sources Presto (Faster than Hive)

Facebook is open-sourcing Presto, an SQL query engine that it developed in-house to help analysts, data scientists and engineers pick apart the information stored in its enormous data warehouses.
Development of Presto began in the fall of 2012, and the system was released to all Facebook employees last spring. It is now used by over 1,000 employees, who run more than 30,000 queries processing at least one petabyte of data daily. Facebook says it’s “ten times better” than alternatives such as Hive and MapReduce in terms of CPU efficiency and latency for the majority of queries submitted by its employees.
“It currently supports a large subset of ANSI SQL, including joins, left/right outer joins, subqueries, and most of the common aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest),” Martin Traverso, a software engineer at Facebook, said.
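
To give a feel for the "approximate distinct counts (using HyperLogLog)" mentioned in the quote, here is a minimal toy sketch of the general HyperLogLog idea in Python. It is not Presto's implementation; the class name, the choice of SHA-1 as the hash, and the register count are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog estimator with 2**p registers (illustrative only)."""

    def __init__(self, p: int = 12) -> None:
        self.p = p
        self.m = 1 << p                      # number of registers
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item: str) -> None:
        # Assumption: a 64-bit hash taken from the first 8 bytes of SHA-1.
        digest = hashlib.sha1(item.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self) -> float:
        # Harmonic-mean estimate over all registers.
        raw = self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(20000):
        hll.add(f"user-{i}")
    print(round(hll.count()))  # an estimate close to 20000, not an exact count
```

The point of the technique is that memory stays fixed (here 4,096 small registers) no matter how many values are added, at the cost of a typical error of a couple of percent. Adding the same value twice leaves the registers unchanged, which is why duplicates do not inflate the estimate.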

Thursday, October 17, 2013

Search Engine For Hadoop

Vertascale Announces SimpleSearch (beta), The Search Engine For Hadoop - See more at: http://www.toolsjournal.com/cloud-articles/item/1449-vertascale-simplesearch#sthash.i44tZugb.dpuf

Big Data software developer Vertascale has announced the opening of a private beta for the company’s inaugural product, SimpleSearch. The SimpleSearch software provides real-time query and summary analysis for structured and mixed-structure data stored in Amazon S3 or the Hadoop Distributed File System (HDFS).

SimpleSearch, The Search Engine For Hadoop, is aimed at engineers, data scientists and business analysts looking for a faster “time to answer” on Big Data, the company said.

While Hadoop has been successful in addressing the problems of Big Data storage and batch processing, Vertascale’s founders recognized early on that in order to democratize access to Big Data, users needed an intuitive ad-hoc query capability. Vertascale is based in Menlo Park, California, USA.

“Today’s problem is less about storing data and more about being able to actually find what you’re looking for in the data. Everyone working with Big Data is challenged by the ‘I don’t know what I don’t know’ problem, and the prohibitively long iteration cycles. SimpleSearch lets you find, explore and export large data sets quickly and easily in a way that’s scalable and cost effective,” said Vertascale CTO Geoffrey Hendrey.

Added Vertascale President James Ladd, “With the rapid adoption of Hadoop, companies that recognize the value inherent in their data are also looking for simplicity and speed in querying Big Data.”
Vertascale will be demonstrating SimpleSearch, The Search Engine For Hadoop, at the Innovators Pavilion P23 at Strata 2013 in Santa Clara, USA.

Big Data: Increasing the Speed

Graphics Chips Help Process Big Data Sets in Milliseconds: A new database tool dramatica


Visualizing Big Data in Milliseconds on Cheap Computers

An Overview of MapD (Massively Parallel Database)

MapD is a software system that is designed to run on a hybrid architecture of GPUs and CPUs. Every modern computer has at least one CPU, and increasingly computer CPUs are divided into more than one sub-processing unit, called cores. It is not uncommon, as of early 2013,

Known as MapD, or massively parallel database, the new technology achieves big speed gains by storing the data in the onboard memory of graphics processing units (GPUs) instead of in the main memory attached to central processing units (CPUs), as is conventional. Using a single high-performance GPU card can make data processing up to 70 times faster.

Tuesday, October 1, 2013

Big Data

Everyone in the industry is talking about Big Data, and it is one of the major growing industries worldwide. I attended a few sessions on Big Data from Microsoft and some other vendors.
It's really interesting stuff and adds real value to business, which is why I am writing to share what I know about Big Data with everyone.

Big Data:
This is simply a massive volume of structured and unstructured data that cannot be handled by a normal relational database management system and requires tens or hundreds of servers to handle.

Example:
Facebook data, Twitter data.