Skip to main content

What is Big Data




What is Data?

The quantities, characters, or symbols on which operations are performed by a computer, which may be stored and transmitted in the form of electrical signals and recorded on magnetic, optical, or mechanical recording media.

Now, let’s learn Big Data introduction

Big data defined

What exactly is big data?

The definition of big data is data that contains greater variety, arriving in increasing volumes and with more velocity. This is also known as the three Vs.

Put simply, big data is larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can’t manage them. But these massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before.


The three Vs of big data

VolumeThe amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a web page or a mobile app, or sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.
VelocityVelocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest velocity of data streams directly into memory versus being written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.
VarietyVariety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata.

The value—and truth—of big data

Two more Vs have emerged over the past few years: value and veracity. Data has intrinsic value. But it’s of no use until that value is discovered. Equally important: How truthful is your data—and how much can you rely on it?

Today, big data has become capital. Think of some of the world’s biggest tech companies. A large part of the value they offer comes from their data, which they’re constantly analyzing to produce more efficiency and develop new products.

Recent technological breakthroughs have exponentially reduced the cost of data storage and compute, making it easier and less expensive to store more data than ever before. With an increased volume of big data now cheaper and more accessible, you can make more accurate and precise business decisions.

Finding value in big data isn’t only about analyzing it (which is a whole other benefit). It’s an entire discovery process that requires insightful analysts, business users, and executives who ask the right questions, recognize patterns, make informed assumptions, and predict behavior.


Comments

Popular posts from this blog

Jenkins

Pre-requisites 1. Install a Webserver https://gitlab.com/Azam-devops/webserver/-/blob/main/README.md Code for index.html https://gitlab.com/Azam-devops/webserver 2. Maven Code https://gitlab.com/Azam-devops/imperial-maven-project 1. Install & configure Jenkins Automation Server on Linux Vm. 2. Go through at some of the important options in Jenkins. 3. Manage Jenkins. 4. Plugins 5. Global Tools Configuration. 6. Credentials 7. Users 8. Slave Nodes 9. Configuring CI pipeline using Gitlab. 10. Configuring standalone CICD pipeline using. 11. Automating the CICD pipeline. 12. Jenkins log 13. Introduction to Jenkins file. 14. Basic groovy syntax & file formation. 15. Launching a Pipeline using Jenkins file. 3. DevOps Architecture Description of above DevOps plan. Create Maven based source code in Gitlab. Create a Jenkins job which will execute below stages. Checkout code from Gitlab Build/compile the source code using Maven as a build tool. scan the code virtually. Test...

Docker In Details

  Course Contents:- 1. Overview of Docker 2. Difference between Virtualization & Containerization 3. Installation & Configuration of Docker Runtime on Linux & Windows 4. Practice on Docker commands 5. launch a Webserver in a container 6. Launch public & official images of application like Jenkins, Nginx, DB etc.. 7. Launch a base OS Container 8. How to save changes inside the container & create a fresh image(commit) 9. How to ship image & container from one hardware to another. 10. How to remove stop/rm multiple container/images 11. Docker Registry 12. Docker Networking       Check current docker network                  Docker Network Bridge                     Docker Network Weaving                  Launch our own Docker Cluster with our defined Network             ...

Ansible

  Ansible is an open-source software provisioning, configuration management, and application-deployment tool. It runs on many Unix-like systems, and can configure both Unix-like systems as well as Microsoft Windows. It includes its own declarative language to describe system configuration. Ansible was written by Michael DeHaan and acquired by Red Hat in 2015. Ansible is agentless, temporarily connecting remotely via SSH or Windows Remote Management (allowing remote PowerShell execution) to do its tasks. Platform support Control machines have to be a Linux/Unix host (for example SUSE Linux Enterprise, Red Hat Enterprise Linux, Debian, CentOS, macOS, BSD, Ubuntu, and Python 2.7 or 3.5 is required. Managed nodes, if they are Unix-like, must have Python 2.4 or later. For managed nodes with Python 2.5 or earlier, the python-simplejson package is also required. Since version 1.7, Ansible can also manage Windows nodes. In this case, native PowerShell remoting supported by the WS-Managemen...

Basic Linux Commands

  Linux Command Cheat Sheet Hello All, Below are the most common commands used in a day to day life of  linux user. if you are new to linux i will recommend you to go through all of the commands.  this commands will help you to troubleshoot linux issues.   Command Description ls Lists all files and directories from present working directory ls-R Lists files in sub-directories ls-a to list down hidden files. ls-al Lists files and directories with complete details like permissions, size, owner cd or cd ~ To go back to home directory cd .. Move one level up cd To change to a particular directory cd / Move to the root directory cat > filename Creates a new file cat filename Displays the content of a file cat file...

Kubernetes-Update

                                                    https://kubernetes.io/ Kubernetes (K8s)  is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon  15 years of experience of running production workloads at Google , combined with best-of-breed ideas and practices from the community. Latest Verion:-  1.19 Kubernetes Objects Kubernetes defines a set of building blocks ("primitives"), which collectively provide mechanisms that deploy, maintain, and scale applications based on CPU, memory or custom metrics. Kubernetes is loosely coupled and extensible to meet different workloads. This extensibility is provided in large part by the Kubernetes API, which is used by int...