Data Lake vs Data Warehouse

Andy Patrizio, Datamation, August 9, 2023

Data lakes and data warehouses are two ways of storing data in large quantities with very different approaches, each with its own strengths and weaknesses. Rather than being mutually exclusive, they’re complementary solutions that can work effectively to provide business intelligence for organizations that implement them wisely. This article compares both storage solutions on their features and use cases to help you better understand the difference.

What is a Data Lake?

A data lake is a storage repository that holds a vast amount of raw data in its native format until it is needed. Data lakes use a flat architecture and impose no schema when data is written, retaining structured, semi-structured, and unstructured data in the format in which it was originally ingested.

Each data element in the lake is assigned a unique identifier and a set of extended metadata tags. When a query runs, it can target the smaller subset of data carrying the relevant tags rather than processing all the data stored in the lake.
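As a rough illustration of tag-based retrieval, the sketch below keeps a hypothetical in-memory catalog in which each raw object gets a unique ID and a set of metadata tags, and a query filters on tags before touching any payload. The `LakeObject` class, the `find_by_tags` helper, and the tag naming scheme are illustrative assumptions, not any product's API; real data lakes usually keep this metadata in a separate catalog service.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class LakeObject:
    """A raw object stored as-is, plus the metadata used to find it."""
    payload: bytes                                   # raw data in its native format
    tags: set[str] = field(default_factory=set)      # extended metadata tags
    object_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # unique identifier


def find_by_tags(catalog: list[LakeObject], required: set[str]) -> list[LakeObject]:
    """Return only the objects carrying every required tag.

    The filter looks at metadata only; payloads of non-matching
    objects are never inspected or parsed.
    """
    return [obj for obj in catalog if required <= obj.tags]


catalog = [
    LakeObject(b'{"user": 1, "text": "great product"}', {"source:social", "format:json", "year:2023"}),
    LakeObject(b"2023-08-01 12:00:01 GET /index.html 200", {"source:weblog", "format:text", "year:2023"}),
    LakeObject(b"\x89PNG...", {"source:marketing", "format:png", "year:2022"}),
]

hits = find_by_tags(catalog, {"source:weblog", "year:2023"})
print(len(hits), hits[0].payload.decode())
```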

Data lakes are well-suited to storing unstructured data from disparate sources and in different formats, for example social media posts, multimedia files, log files, emails, and data from Internet of Things (IoT) devices. They are cost-effective storage solutions for businesses that need to rapidly capture and retain huge amounts of data without much transformation.

More complicated than data warehouses, data lakes typically need to be set up by data scientists or engineers with the expertise to interpret and organize raw data before it can be processed. They’re also more flexible than data warehouses—users can more easily add and store more data and configure data models and applications—but they’re also less secure and require more expertise to use. 
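Part of that flexibility comes from applying structure when data is read rather than when it is written, often called schema-on-read. A minimal sketch, assuming JSON-lines records from hypothetical sources: ingestion simply appends raw lines, and each query decides which fields it needs and skips records that lack them. The `raw_zone` list and `read_temperatures` function are illustrative names only.

```python
import json

# Ingestion: append raw records unchanged; no schema is enforced on write.
raw_zone: list[str] = []


def ingest(line: str) -> None:
    raw_zone.append(line)


ingest('{"device": "sensor-1", "temp_c": 21.5, "ts": "2023-08-01T10:00"}')
ingest('{"device": "sensor-2", "humidity": 40, "ts": "2023-08-01T10:00"}')  # different shape
ingest('{"user": "a@example.com", "subject": "Order question"}')           # unrelated source


def read_temperatures(lines: list[str]) -> list[tuple[str, float]]:
    """Apply a schema at read time: keep only records with the fields this query needs."""
    rows = []
    for line in lines:
        record = json.loads(line)
        if "device" in record and "temp_c" in record:
            rows.append((record["device"], float(record["temp_c"])))
    return rows


print(read_temperatures(raw_zone))
```

The cost of this flexibility is that every query has to cope with inconsistent records itself, which is part of why lakes demand more expertise.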

Because data stored in lakes tends to be disorganized, over time it can stagnate and turn the lake into a "data swamp," a repository so poorly organized that its data is hard to find, trust, or use.

What is a Data Warehouse?

A data warehouse is a hierarchical repository of structured data, such as customer relationship management (CRM) data and financial records, integrated from multiple sources and organized for analysis. Data warehouses often use multiple databases for different stages of data handling, such as ingestion, staging, and transformation, as well as for processing.
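The ingestion, staging, and transformation stages can be sketched with Python's built-in `sqlite3` module standing in for a warehouse engine, an assumption made purely for illustration since production warehouses run on dedicated platforms. Raw rows land in a staging table as text, a transformation step casts and filters them into a typed fact table whose schema and constraints are fixed up front, and analysts then run aggregate queries such as a monthly sales report. All table and column names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Staging: raw extracts from source systems, loaded as untyped text.
conn.execute("CREATE TABLE stg_sales (order_id TEXT, order_date TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO stg_sales VALUES (?, ?, ?)",
    [("1001", "2023-01-15", "250.00"),
     ("1002", "2023-01-20", "99.50"),
     ("1003", "2023-02-03", "not available"),   # bad value from a source system
     ("1004", "2023-02-10", "410.25")],
)

# Warehouse table: schema and constraints are defined before any data is loaded.
conn.execute("""
    CREATE TABLE fact_sales (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT    NOT NULL,
        amount     REAL    NOT NULL CHECK (amount >= 0)
    )
""")

# Transformation: cast types and drop rows whose amount does not start
# with a digit (a deliberately crude validation rule).
conn.execute("""
    INSERT INTO fact_sales (order_id, order_date, amount)
    SELECT CAST(order_id AS INTEGER), order_date, CAST(amount AS REAL)
    FROM stg_sales
    WHERE amount GLOB '[0-9]*'
""")

# Analysis: a monthly sales report over the structured data.
report = conn.execute("""
    SELECT substr(order_date, 1, 7) AS month, ROUND(SUM(amount), 2) AS total
    FROM fact_sales
    GROUP BY month
    ORDER BY month
""").fetchall()
print(report)

# The fixed schema rejects data that violates it.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1005, '2023-03-01', -5)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```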

The structured database environment makes data warehouses better suited than data lakes for analytics, business intelligence (BI), and online analytical processing (OLAP), and they can be used in conjunction with data lakes to provide the data processing capabilities that lakes lack.

Business or data analysts with some awareness of the functions and outcomes of a specific processed data set can typically set up a data warehouse, while data lakes are far more complicated and require more specialized knowledge. Less flexible than data lakes, data warehouses have a more rigid structure that is difficult to change once it is built. They’re more expensive, but they’re also more secure.

Data Lake vs Data Warehouse

Bottom Line: Data Lake vs. Data Warehouse

While both data lakes and data warehouses are repositories for storing large amounts of data, their differences make them better suited to different use cases. It comes down to what users want out of the data. 

If they know what they are looking for, such as monthly sales reports or in-store vs. website traffic, a data warehouse is the better choice. If an organization wants more flexibility to search for more amorphous information, such as what time of day web traffic peaks or how weather patterns affect sales, a data lake is a better fit.

Healthcare organizations, educational institutions, and businesses in the transportation industry could benefit from the flexibility to store both structured and unstructured data—all three industries generate massive amounts of raw data used for a wide range of purposes. 

But if the goal is strictly business analysis, a data warehouse is a better choice. Data warehouses are designed to process structured data and provide insights and reports that can give organizations a better understanding of their customer base, pricing models, historical sales data, market trends over time, and more.

With a data warehouse, enterprises in the financial or business sectors that use vast volumes of structured data can make that data available across the organization rather than limiting it to a handful of data scientists, which makes it far more useful for their needs.

For many enterprises, the choice should not be "data lake or data warehouse," because the two are complementary. In some cases the best approach is to implement both and use them in tandem. An organization already using a data warehouse might, for example, add a data lake to store new data sources or to serve as a repository for archival data moved out of the warehouse.


Top 75 Software-as-a-Service (SaaS) Companies in 2023

February 9, 2023

Software-as-a-service (SaaS) companies are clearly on the upswing, aided by the rapid growth of the larger cloud computing market. Years ago, much of the emphasis in the tech industry was on packaged software. Today, it's nearly impossible to launch a startup based on packaged software alone.

Although some companies are experiencing significant SaaS churn, and some on this list have recently been acquired or absorbed by others, many are transforming their products to meet new customer needs.

Top Software-as-a-Service Companies

The flood of activity among software-as-a-service companies shows no signs of slowing down. New SaaS companies keep launching, and, most notably, old-guard software firms are reinventing themselves as SaaS companies.

Below is a list of 75 SaaS companies organized by the industries they usually fall under:

Cloud SaaS Companies

Cloud technology and software as a service (SaaS) are used within enterprises across industries. SaaS can be expensive, but delivering it through the cloud can lower costs and improve efficiency.

Big technology companies such as Salesforce and Microsoft have built their own cloud SaaS offerings, and they compete with others, including Box, Oracle, and Intuit.

Salesforce

Salesforce is a top SaaS company that pioneered the concept with its customer relationship management (CRM) software. It has since expanded into platform development, marketing, machine learning (ML), analytics, and social networking.

The company is considered one of the most innovative cloud software providers on the market and generates most of its annual recurring revenue from its cloud SaaS tool set. As of March 2022, Salesforce had a market cap of $208.91 billion.

Microsoft

One of the pioneers in business technology, Microsoft became one of the largest SaaS companies in the world after moving its desktop productivity suite, Office, to the cloud as Office 365. Office 365 now outsells the packaged client version, and Microsoft also offers Dynamics CRM, SharePoint collaboration, and even SQL Server databases on demand.

The Microsoft team also invests heavily in customer acquisition, onboarding, success, and retention, which makes it a strong contender for a wide range of enterprise customers.

Intuit

Another software company that made a successful pivot to the cloud, Intuit offers its flagship finance and tax prep software, QuickBooks, TurboTax, and Mint, as on-demand, cloud-based versions. These solutions now account for three-quarters of all company revenue and are central to its planned revenue growth.

Veeva Systems

Veeva Systems develops several types of cloud-based applications targeted at industries like life sciences. These apps help organizations manage data, customer relationships, and content, and because they are cloud-based, they can be accessed easily across an organization.

Oracle

Another software giant that turned to SaaS, Oracle has moved its line-of-business on-premises apps to the cloud, including enterprise resource planning (ERP), CRM, supply chain management (SCM), human resources (HR), and payroll.

The company also acquired NetSuite in 2016, which sells cloud ERP and CRM software to small and midsize businesses (SMBs) not normally served by Oracle and Salesforce. As of March 2022, Oracle had a market cap of $218.25 billion.

Cvent

Cvent, a company acquired by Vista Equity Partners, is a cloud-based event management and planning platform. It offers features that allow planners to manage all aspects of an event, such as online event registration, venue selection, event management, mobile apps for events, email marketing, and web surveys.

Druva

Druva offers comprehensive cloud-based backup, recovery, and archival services for cloud business apps like Office 365, Google Workspace (formerly G Suite), Box, and Salesforce, with full data visibility, access, and compliance monitoring.

Box

Box started as a cloud storage firm but has expanded to offer file collaboration and editing services for files stored on its servers.

E-Commerce SaaS Companies

Companies use e-commerce SaaS tools to support their sales, marketing, and payment operations and to increase revenue.

Google Workspace

Google Workspace, formerly called G Suite, is mostly a collection of services Google already offers, like Gmail, Drive, and Calendar, with added features like custom business email addresses and 24/7 support. Unlike the free basic services, it requires a paid subscription, but it also comes with the benefits of Google Cloud and its large collection of enterprise solutions.

Zendesk

Zendesk offers cloud-based customer service and customer support solutions, with features like support tracking, purchase tracking, billing, shipping, and other customer data. As of March 2022, Zendesk had a market cap of $14.68 billion.

RingCentral

RingCentral provides business communications solutions to various organizations. Its product portfolio includes messaging and video options that allow organizations to communicate internally. Its premium product, RingCentral Office, provides multiple utilities, including conferencing and messaging. As of March 2022, RingCentral had a market cap of $11.57 billion.

GoToConnect

GoToConnect, from GoTo (the company formerly known as LogMeIn), helps people connect and build business relationships. It provides cloud-based communication services, including conferencing, as well as options for customer engagement and support. The platform is widely used by businesses for remote management and support.

Avangate

Avangate, owned by VeriFone, is an e-commerce platform for software sales. It provides software registration services and also functions as a software reseller. It supports multilingual ordering and offers buyers multiple payment options. Avangate also manages subscription billing for direct sales and channel sales.

Teem

Teem, now part of iOFFICE and SpaceIQ, is a platform for booking rooms or venues and for conference scheduling. It provides a snapshot of the real-time availability of rooms. Based on that availability, a meeting or conference can be scheduled and the room booked. It also manages visits for external people attending conferences or meetings.

Pipeliner CRM

Pipeliner CRM, as the name suggests, offers CRM solutions for sales enhancement. It automates the sales process with a host of apps available in its app marketplace, which can help sales teams hit stretch targets. Key offerings include contact and account management, email integration, and even offline apps.

Meistertask

Meistertask is an intuitive platform for online task management. Its web-based solution is easily accessible from mobile platforms, such as iPads and iPhones. It features kanban-style project boards that can be used for flexible task management.

Travelperk

Travelperk provides several types of travel management services, including expense management. It automates spending limits and ensures compliance with organizational policies. It’s a one-stop solution for business and leisure travel.

Practical Ignition

Practical Ignition provides client management software with multiple features and benefits. It automates recurring billing on a specified date and in a particular payment channel. It also performs proposal management, including the creation of proposals themselves.

This platform offers multiple tools for client engagement and for facilitating better customer management while showcasing a business dashboard. These features particularly help to enhance forecasting.

Anaplan

Anaplan is a planning and performance management platform used across departments for business planning and for tracking SaaS metrics. It uses a variety of databases to generate models based on business rules, which can be changed to make instant adjustments. Anaplan was acquired by Thoma Bravo in 2022.

Domo

Domo delivers a SaaS-based platform that helps CEOs and business leaders obtain business intelligence (BI) from business data without requiring the executives to have BI skills.

DocuSign

A SaaS company based in San Francisco, DocuSign started out as a platform for electronic signatures on legal documents. It has since expanded to help SMBs collect information, automate data workflows, and sign documents on various devices.

Slack

Slack is one of the most popular enterprise collaboration platforms, and it has significantly expanded its capabilities after being acquired by Salesforce. Based in San Francisco, this SaaS company offers messaging, archiving, and search for modern teams.

Twilio

Twilio is a cloud communications SaaS company that enables customers to use standard web languages to build a variety of telephony apps, including voice, voice over Internet Protocol (VoIP), calls that bridge IP and traditional telecom networks, and SMS. Developers can embed voice, video, messaging, and authentication into their apps using the Twilio platform.

GitHub

GitHub is a hosted software development and version control platform that is especially popular with open-source projects. It supports full project management, including version control and branch and fork management.

GoToMeeting

A Citrix Systems spin-off, GoToMeeting is a popular online meeting program that supports secure connections from any device.

Cisco

Cisco offers a SaaS solution portfolio that keeps network security and business needs at the forefront. Among its offerings are Webex, a professional video conferencing service, and Spark (since rebranded under the Webex name), a collaboration service for teams to work together on projects. The two are often pitched in tandem.

Human Resources SaaS Companies

HR SaaS companies are numerous, but only some offer a comprehensive suite of services for employers to manage their workforce; others specialize in particular aspects of HR.

Personio

With its simplified HR solutions, Personio eases human resources management, from hiring to full and final settlement. Its four key offerings are Recruit, Manage, Develop, and Pay.

ADP

ADP, one of the world's best-known payroll management brands, has moved to the cloud to offer human capital management support, which covers HR, payroll, and employee benefits.

Clearlake Capital

Clearlake Capital, a private equity firm, earns its spot through its acquisition of Cornerstone and that company's cloud-based talent management software. These solutions go beyond basic HR applications to cover recruitment, training, succession management, and career guidance.

Workday

Workday is a SaaS company that provides financial management and HR management to enterprise customers, with emphasis on complex, global industries as well as government.

ServiceNow

ServiceNow specializes in IT services management (ITSM), IT operations management (ITOM), and IT business management (ITBM). It offers real-time communication, collaboration, and resource sharing and primarily covers IT, HR, security, customer service, software development, facilities, field service, marketing, finance, and legal enterprise needs.

Workable

Based in Boston, Workable is a SaaS HR platform that helps companies hire new employees. With security features, application management, and assisted onboarding, it aims to deliver a smooth HR experience with less work for the HR team.

Namely

Namely, based in New York, is a human resources management system (HRMS) SaaS company. Its HR software lets businesses customize the platform to their needs.

Data Analytics SaaS Companies

Companies use SaaS data analytics tools to track their data and make smart decisions about their operations and business goals.

Apptio

Apptio, another company acquired by Vista Equity Partners, is a provider of business management solutions for CIOs to better manage the business of IT. Its suite of applications uses analytics to provide information and insight about technology cost, value, and quality for making faster, data-driven decisions.

GoodData

GoodData provides a business analytics platform for enterprises to create smart business applications using existing data to automate, recommend, and make better business decisions.

Tableau

Tableau is a giant in its own right, and since being acquired by Salesforce, it has significantly expanded its reach and use cases. Tableau Online is the SaaS version of the company's popular interactive data visualization and data analytics products focused on business intelligence.

New Relic

New Relic is a leading digital intelligence company, delivering visibility and analytics into web application and mobile app performance, along with real-time monitoring.

NapoleonCat

NapoleonCat is a SaaS analytics platform for social media management. It helps customers moderate, publish, and analyze their social media content and grow their audiences.

Financial SaaS Companies

Financial SaaS solutions are being used by companies to help with their employee expenses, accounting, and payroll as well as invoicing and payments.

FreshBooks

FreshBooks is a cloud-based accounting SaaS product designed for sole proprietors and small business owners to bill clients for time and services as well as to track time spent with the client.

Paychex

Despite being one of the older companies on this list, Paychex has avoided much customer churn by leaning on a SaaS business model. Along with its subsidiaries, Paychex is a finance and payroll vertical SaaS company that provides payroll, human resources, and benefits outsourcing solutions for SMBs. It launched SaaS services for payroll, time and attendance, training, HR, and benefits in 2013.

Xero

Xero provides cloud accounting software for accounting professionals and small businesses. Its key features are automatic bank and credit card account feeds, invoicing, accounts payable, expense claims, fixed asset depreciation, purchase orders, and standard business and management reporting.

2Checkout

An online payment processing platform acquired by VeriFone, 2Checkout helps organizations accept global payments as easily as local transactions, enabling businesses to expand and operate globally.

It also provides global tax management and compliance services, including professional management of VAT (value-added tax), sales tax, and remittances that are applicable globally. It features subscription billing to manage customer subscriptions, along with sending timely reminders for renewal.

Spendesk

Managing employee expenses and reimbursements can prove challenging. Spendesk takes a SaaS approach to end-to-end employee expense management, including approvals, automated card spending limits, automated accounting, and invoice management. Spendesk also helps produce intuitive, interactive reports on expenses and reimbursements.

Bill.com

Bill.com provides users with tools to automate business bill payment and invoicing processes as well as cash flow management. These solutions integrate with accounting and banking systems, including QuickBooks, Xero, NetSuite, and Intacct.

Vista Equity Partners

Vista Equity Partners, a private equity firm, acquired the SaaS player Xactly, which offers a suite of products designed around sales and finance management to design, build, manage, audit, and optimize sales compensation programs. Xactly measures sales performance and effectiveness as well as employee engagement.

Zuora

Zuora is a SaaS company that serves customers that rely on a subscription-based business model to automate billing, commerce, financial operations, subscription payments tracking, invoicing, products, and catalogs.

Coupa Software

Coupa is a cloud platform that offers a strong SaaS model for tracking business spend. It offers a fully unified suite of financial applications for business spend management, such as procurement, invoicing, expenses, and sourcing.

Marketing SaaS Companies

SaaS is used in marketing to help companies run email and ad campaigns as well as manage customer relationships and communications.

Skyword

Skyword, which acquired TrackMaven, is a cloud-based marketing analytics solution that helps organizations design well-informed, data-driven marketing campaigns. It uses big data collected from multiple channels to develop marketing insights.

Mobiniti

As text marketing becomes increasingly important in digital marketing, Mobiniti, acquired by PennSpring Capital, provides user-friendly, customized solutions to marketing agencies and resellers. Mobiniti also provides email marketing solutions customized for various industry segments that can be easily integrated with the existing applications and CRM tools in order to better promote customer engagement.

Intercom

Intercom is an intuitive communication platform that facilitates an organization’s interaction with potential and existing customers. It has multiple messaging apps for sales and marketing campaigns as well as others that support clients. Intercom aims to make communication easy through its cloud-based solutions.

Mailchimp

Mailchimp is a cloud-based email marketing solution that was acquired by Intuit in 2021. Widely known and used, it offers end-to-end solutions, from creating and managing the mailing list to automating campaigns.

Mailchimp can integrate contacts from an existing CRM tool and create a fresh mailing list. It also provides various analytical services through integration with Google Analytics.

Aurea

Aurea, formerly known as XANT and InsideSales.com, offers a sales acceleration platform built on a predictive and prescriptive self-learning engine, designed to help with a sale from first engagement until close. Its machine learning feature predicts and prescribes optimized sales activities, enhances performance and motivation, and increases the number of high-quality live conversations.

Act-On Software

Act-On Software is a popular adaptive marketing automation provider for SMBs, providing email marketing, landing pages, and social media prospecting.

Marketo

Marketo, a subsidiary of Adobe, develops marketing automation software that provides inbound marketing, social marketing, CRM, and other related services.

NextRoll

NextRoll, previously known as AdRoll, is one of the top SaaS companies for marketing, with a marketing platform that enables companies of any size to create personalized ad campaigns based on their own website data.

HubSpot

HubSpot develops cloud-based, inbound marketing software that provides businesses with tools for social media marketing, content management, web analytics, customer service, customer support, customer experience, and search engine optimization (SEO).

Software Development SaaS Companies

Companies use collaborative SaaS tools to help manage, iterate, and deploy applications compatible with their environments.

Adobe Creative Cloud

Another reinvention based in San Jose, Adobe was the king of desktop creativity software and has pivoted to make Photoshop and its other audio and video editing tools available through a cloud-based SaaS subscription. Creative Cloud offers graphic design, video editing, web development, and photo editing.

Bitnami

With a wide range of services, Bitnami, acquired by VMware, enables faster and easier deployment of software on a selected platform, from native and on-premises to cloud deployment. It provides packaged applications that can be run on any preferred platform while offering a broad catalog of applications ready to run on the servers.

Blackbaud

Blackbaud is a provider of software and services designed specifically to help nonprofit organizations more efficiently operate and engage in things like fundraising, building relationships, marketing, advocacy, and web management.

Splunk

Splunk provides software that monitors, reports on, and analyzes real-time machine data, including logs and big data sources, for operational intelligence. There is also talk of Splunk being acquired by fellow SaaS giant Cisco.

Amazon Web Services (AWS) SaaS

Amazon's SaaS offerings include both a platform for companies that need to build their own SaaS applications and the resale of SaaS solutions from third-party vendors, many of which are on this list.

MuleSoft

MuleSoft, headquartered in San Francisco, is another SaaS company that was acquired by Salesforce in the last few years. MuleSoft provides a platform for application building.

Through its directory that functions as a social network, customers can share updates and information on application programming interfaces (APIs). It also provides integration for applications, data, and devices that are both on-premises and in the cloud.

Atlassian

Atlassian is an enterprise software company that develops products for software developers, project managers, and content management teams. Its flagship product is Jira, an issue-tracking and project management tool.

Cybersecurity SaaS Companies

Cybersecurity companies use SaaS models to help customers identify, predict, and respond to cyberthreats on their networks.

Okta

Okta offers identity and access management solutions that help organizations manage and secure user authentication in modern applications. It also helps developers identify controls that make an application more secure. These controls can be built into web-based apps as well as devices. As of March 2022, Okta had a market capitalization of $22.64 billion.

Proofpoint

Proofpoint, acquired by Thoma Bravo, provides various cybersecurity solutions to protect organizations from cyberattacks, including coverage for campaigns and data. Some of the innovative offerings are prevention of outbound data loss, inbound email and mobile device security, electronic discovery, email encryption, and archiving.

KnowBe4

KnowBe4 provides security awareness training related to various classes of security issues. These include problems arising from phishing, ransomware attacks, and social engineering. The service offers enterprise-class training on risk management and compliance (RMAC). Overall, KnowBe4 is a one-stop solution for all types of security training.

Rapid7

Rapid7's IT security solutions collect data across a company's users, assets, services, and networks. The tool is accessible on-premises, through a mobile device, and through the cloud, making instantaneous security decisions possible. As of March 2022, Rapid7 had a market capitalization of $6.39 billion.

Veracode

Veracode, acquired by Thoma Bravo, provides cloud-based application intelligence and security verification services for internally developed and/or purchased software applications as well as third-party components. It conducts testing and uses machine learning to identify and eradicate vulnerabilities.

Education SaaS Companies

Companies are using education SaaS tools to help students and teachers by offering assessments, course management, and training.

CodeSignal

CodeSignal, a subsidiary of the American company BrainFlights, performs skills-based assessments. It helps clients recruit the best technical talent by running tests tailored to their recruitment requirements. CodeSignal also facilitates certifications, which are standardized tests specified by clients.

ServiceMax

ServiceMax, acquired by GE, is built on Salesforce’s Force.com platform and serves field service technicians through cloud and mobile software. This software solution offers workforce optimization, advanced scheduling and dispatch, parts logistics, inventory and depot repair, and installed base entitlement.

SurveyMonkey

SurveyMonkey, whose parent company is now Momentive, offers a cloud-based online survey and questionnaire platform for companies to gather survey-related information.

Blackboard

Blackboard, now owned by Anthology, provides Blackboard Learn, a SaaS virtual learning environment and course management system for online schooling. It brings online elements to classes that were traditionally available only in the classroom.

Eloomi

Eloomi is a software platform that can upgrade personnel skill sets within any organization and improve their workplace performance. By increasing employees’ skills, the platform aims to grow their output in areas such as compliance and strategy.

Grammarly

Grammarly, based in San Francisco, is a software platform that helps with communication and grammar. Used by students and professionals alike, Grammarly analyzes submitted text to help users express what they mean clearly.

Kahoot!

Kahoot! is an assessment tool used in classrooms and on educational platforms. Results are presented during the assessment, and reports are generated for teachers and hosts.

Bottom Line: Top SaaS Companies


Whether your company needs cloud, e-commerce, HR, data analytics, cybersecurity, or education SaaS tools, this list has some of the top SaaS companies to date.

Best Data Modeling Tools & Software

May 26, 2022

What Is Data Modeling?

Data modeling is the process of applying structures and methodologies to data in order to convert it into a useful form for analysis and insight. By preparing a model of the data involved in an information system, you optimize the database design and gain an understanding of the data flow within the system.

The data modeling process converts complex software design into a simple, easy-to-understand diagram of the data flow. Data modeling tools then help to create a database structure from these diagrams. In this article, we will be covering the following data modeling tools:

  1. Erwin Data Modeler
  2. Apache Spark
  3. RapidMiner
  4. SAP PowerDesigner
  5. Edraw Max’s Database Model Diagram
  6. Oracle SQL Developer Data Modeler
  7. SQLDBM
  8. MySQL Workbench
  9. Enterprise Architect
  10. IBM Infosphere Data Architect

What Is Data Modeling Software? 

Data modeling is an important step before developing a database for an application; it's hard to build a database without first working out its underlying structure. A good data model is an abstract representation of the database's specifics, such as how data is captured, how it flows within the system, how it is entered into individual tables, and what checks and constraints apply to it before it is stored in the database.
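To make that concrete, here is a minimal sketch of what a modeling tool automates when it forward-engineers a model into a database: a small logical model (entities, attributes, keys, a foreign key, and a check constraint) is declared as plain Python data, turned into `CREATE TABLE` statements, and executed against SQLite. The `MODEL` format and the `to_ddl` helper are hypothetical; commercial tools use their own model files and database-specific DDL generators.

```python
import sqlite3

# A tiny logical model: entity -> list of (column, type, constraints).
MODEL = {
    "customer": [
        ("customer_id", "INTEGER", "PRIMARY KEY"),
        ("email", "TEXT", "NOT NULL UNIQUE"),
    ],
    "orders": [
        ("order_id", "INTEGER", "PRIMARY KEY"),
        ("customer_id", "INTEGER", "NOT NULL REFERENCES customer(customer_id)"),
        ("quantity", "INTEGER", "NOT NULL CHECK (quantity > 0)"),
    ],
}


def to_ddl(model: dict) -> list[str]:
    """Generate one CREATE TABLE statement per entity, in declaration order."""
    statements = []
    for table, columns in model.items():
        cols = ",\n    ".join(f"{name} {ctype} {rules}".strip() for name, ctype, rules in columns)
        statements.append(f"CREATE TABLE {table} (\n    {cols}\n)")
    return statements


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
for stmt in to_ddl(MODEL):
    conn.execute(stmt)

conn.execute("INSERT INTO customer VALUES (1, 'ana@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1, 3)")  # satisfies every constraint

# Rows that break the model's rules are rejected by the database itself.
for bad_row in [(11, 1, 0), (12, 99, 1)]:  # zero quantity, unknown customer
    try:
        conn.execute("INSERT INTO orders VALUES (?, ?, ?)", bad_row)
    except sqlite3.IntegrityError as exc:
        print(bad_row, "rejected:", exc)
```

Keeping the model separate from the generated DDL is what lets a tool regenerate the schema for a different target database or compare it against an existing one.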

Setting up the database can be tedious, the kind of scutwork that programmers, database administrators, and data scientists don’t want to bother with. To make this job simpler, or at least tolerable, data modeling tools are available for beginners and experts alike. Below are 10 of the most popular.

How to Select a Data Modeling Tool

  1. Usage and Needs: This is your first deciding factor. How are you going to use the tool, how extensively, and for what specific purpose? Not all data modeling tools are created equal, and some have different areas of focus. Know your business requirements first so you can pick the right tool.
  2. Scalability: Your project might start small but grow in requirements. At that point, you don’t want to be hamstrung by an incapable modeling tool. Give yourself a little room to grow.
  3. Features: Once you have a grasp of your business requirements, the next step is to find a tool whose features match them. Some tools are very niche or specific, so shop around.
  4. Integration: Some tools create a data model in a proprietary format, while others use common or open file formats that are easily read into other tools. Make sure your modeling tool plays nicely with your database.
  5. The User Community: Almost every business tool has a user community you can turn to for help. See what kind of community surrounds the tools you are interested in.

10 Data Modeling Tools

This is not a complete list of tools, but we did try to include the most popular and widely used. The list is in no particular order.

  1. Erwin Data Modeler

This data modeling tool is known for being powerful yet less expensive for businesses than others, and it also complies with database governance rules. It is considered one of the best data modeling tools because it includes automated schema generation, cloud-based data solutions, and the ability to create hybrid architectures.

 

  2. Apache Spark

As an Apache project, Spark works well with other Apache products, such as Hadoop. It is good at handling large data sets and parallel tasks.

  3. RapidMiner

RapidMiner is ideal for those who have never used a data modeling tool before because it is easy to use. It can connect to many data source types, including Access, Teradata, Excel, Oracle, Microsoft SQL Server, Ingres, IBM SPSS, IBM DB2, and MySQL. It uses visual pipelines to build analytics on real-world data.

  4. SAP PowerDesigner

SAP PowerDesigner is capable of capturing, analyzing, and presenting business data. It follows industry best practices to give comprehensive coverage of metadata storage and help users understand the input data. It supports a very wide range of databases.

  5. Edraw Max’s Database Model Diagram

Edraw Max’s Database Model Diagram is among the best of the free and open source tools to create a database model diagram. It uses a drag-and-drop interface to rapidly build tables and easily redesign them.

  6. Oracle SQL Developer Data Modeler

Made for the Oracle environment, Oracle SQL Developer Data Modeler is a data modeling tool which also supports physical database design. It covers capturing data, exploring data, managing data and getting insights from the data.

  7. SQLDBM

Ideal for beginners, this design tool is used to design SQL databases without writing a single line of code. It lets you manage large and small databases and data models seamlessly, import existing database schemas, and create a physical model or ERD of your database.

  8. MySQL Workbench

Designed specifically for the MySQL database, the MySQL Workbench is a unified data modeling tool for database architects, developers and database admins. It provides tools for configuration, visual database design, administration, backup and deployment. 

  9. Enterprise Architect

Enterprise Architect is an ideal tool for entry-level and advanced modelers alike. It comes with many functions and strategies for analyzing, visualizing, testing, and maintaining all the data in any enterprise landscape. It uses diagram-based modeling and can pull in data from various domains and locations throughout the enterprise to create a single, unified version of the model.

  10. IBM Infosphere Data Architect

Infosphere Data Architect is a data modeling tool from IBM built on the Eclipse Integrated Development Environment. InfoSphere is known for its ability to discover patterns within the data, model the data, find relations and also standardize the interfaces between various applications, servers and existing databases.

]]>
The IoT Cloud Market https://www.datamation.com/cloud/iot-cloud-market/ Fri, 02 Jul 2021 20:20:30 +0000 https://www.datamation.com/?p=21363

Cloud computing and the Internet of Things (IoT) have become inseparable whenever one or the other is discussed, and with good reason: you really can’t have IoT without the cloud. The cloud, a grander idea that stands on its own, is nonetheless integral to the IoT platform’s success.

The Internet of Things is a system of interrelated computing devices, mechanical and digital machines, objects, and other devices that are provided with unique identifiers (such as an IP address) and the ability to transfer data over a network without requiring human-to-human or human-to-computer interaction.

Whereas the traditional internet consists of clients – PCs, tablets, and smartphones, primarily – the Internet of Things could include cars, street signs, refrigerators, or watches. And whereas traditional internet interaction relies on human input, IoT is almost totally automated.

Because the bulk of IoT devices are not in traditional data centers and almost all are connected wirelessly, they are reliant on the cloud for connectivity. For example, connected cars that send up terabytes of telemetry aren’t always going to be near a data center to transmit their data, so they need cloud connectivity.

The Role of Cloud Computing in IoT

IoT has been embraced by many industries around the world, including precision agriculture, health care, energy, transportation, building management, and so forth. Regardless of industry, IoT generates a huge amount of data, which needs to be processed, and most IoT devices do not have the compute or storage capacity to do it locally.

Therefore, data must be sent up the chain to a data center for processing. Managing the flow and storage of this data is a time-consuming task for enterprises. At best, the IoT device can decide what should be processed and what can be discarded, but that still leaves a large amount of data to move to data centers for processing.
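As a rough illustration of a device deciding what to send and what to discard, the sketch below applies a simple deadband filter: a reading is uploaded only when it differs from the last uploaded value by more than a threshold. The function name, threshold, and sample readings are hypothetical; real devices may use other filtering or aggregation strategies.

```python
from typing import Iterable, List, Tuple

def deadband_filter(readings: Iterable[Tuple[float, float]], threshold: float) -> List[Tuple[float, float]]:
    """Keep only readings that differ from the last *uploaded* value by more than threshold.

    readings: (timestamp, value) pairs in time order.
    """
    to_send = []
    last_sent = None
    for ts, value in readings:
        if last_sent is None or abs(value - last_sent) > threshold:
            to_send.append((ts, value))
            last_sent = value  # compare future readings against what the cloud last saw
    return to_send

# Hypothetical temperature readings: small jitter is dropped, real changes are kept.
readings = [(0, 20.0), (1, 20.1), (2, 20.2), (3, 21.5), (4, 21.6), (5, 19.0)]
to_upload = deadband_filter(readings, threshold=0.5)
```

Here six readings shrink to three uploads; the trade-off is that changes smaller than the threshold never reach the cloud.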

Connectivity in and of itself does not benefit people; it’s what connected devices enable that makes IoT valuable. Again, many of these devices are compute-constrained, so connectivity to cloud services is what lets them deliver that value.

Therefore, the cloud provides:

Services: IoT depends on the cloud for its services, since most IoT devices cannot provide them on their own.

Scalability: Someday, every car will be a smart car, generating gigabytes if not terabytes of data. That data has to go somewhere to be processed, and one central data center would be overwhelmed. So smaller, redundant IoT edge data centers scattered around the city would allow for scale-out capacity.

Increased performance: IoT devices produce large amounts of data and need extreme performance to interact and connect with one another. IoT in the cloud provides the connectivity needed to share information between devices and make sense of it quickly.

Types of IoT

There are three major types of IoT: industrial, commercial, and home. Let’s break them down:

Industrial Internet of Things: Called IIoT for short, industrial Internet of Things covers things like connected factory equipment. IIoT devices are primarily sensors used to monitor equipment in case of malfunction but can also include everything from remote monitoring of computer-chipped livestock on a commercial farm to a commercial delivery truck.

The idea behind IIoT is to provide much more useful information than a flashing red light. IIoT sensor data is used to provide specific, actionable insights into physical events and the environment. It could warn of hardware failure, power fluctuations, heat buildup, and other points of failure. While most IoT applications operate in the public cloud, most IIoT systems operate mainly in private clouds.
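To show how raw sensor data can become an actionable warning rather than a flashing light, this sketch flags sustained heat buildup using a moving average over the last few readings, so a single spike does not raise an alarm. The window size, temperature limit, and readings are illustrative assumptions, not values from any IIoT product.

```python
from collections import deque
from typing import List, Sequence

def sustained_high(values: Sequence[float], window: int, limit: float) -> List[int]:
    """Return the indices at which the average of the last `window` readings exceeds `limit`."""
    recent = deque(maxlen=window)  # the oldest reading drops out automatically
    alerts = []
    for i, value in enumerate(values):
        recent.append(value)
        if len(recent) == window and sum(recent) / window > limit:
            alerts.append(i)
    return alerts

# Hypothetical motor temperatures (degrees C), one reading per minute.
temps = [60, 61, 95, 62, 63, 80, 85, 90, 92]
alerts = sustained_high(temps, window=3, limit=80)
```

The lone 95-degree spike at index 2 is ignored, while the steady climb at the end triggers alerts at indices 7 and 8.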

Commercial IoT: Commercial IoT, sometimes called consumer IoT, is a term applied to IoT for business enablement. Think of it as Bluetooth and RFID all grown up. Sensors, micro-controllers, actuator devices, and other systems allow for business-oriented use like intelligent asset tracking, smart offices and buildings, connected lighting, sensing and monitoring of all types, and location services.

Home IoT: Also called smart home IoT, it covers a range of smart devices that you can control remotely for automated house maintenance. It’s one thing to program the security alarm to arm and disarm at certain hours, but with a smart home, you can control devices from your smartphone, including locking doors you left unlocked. Or have your refrigerator send you an alert when food is running low, or turn on the air conditioner when the home detects you have left work and are headed home.


Major IoT Cloud Providers

AWS IoT

Amazon Web Services (AWS) has a broad array of IoT offerings covering industrial, commercial, and home uses. These services include:

  • Amazon FreeRTOS: helps the user program, deploy, secure, connect, and manage microcontrollers for edge devices.
  • AWS IoT Device Management: a service that organizes, monitors, and remotely manages IoT devices at scale.
  • AWS IoT Device Defender: a managed service that helps the user secure their IoT devices by auditing them to make sure they comply with security practices.
  • AWS IoT Analytics: a fully managed service that makes it simple to run analytics on large volumes of IoT data.
  • AWS IoT 1-Click: a service that triggers AWS Lambda functions that execute a particular action.
  • AWS Greengrass: performs functions such as making sure connected devices can run AWS Lambda functions, keep device data in sync, and communicate with other devices securely.

Microsoft Azure IoT

Microsoft also covers all three areas of IoT, with emphasis on supporting its infrastructure. There is strong support from Microsoft tools and on-premises connections. Services include:

  • Azure IoT Central: The main foundation for building IoT solutions
  • Azure IoT solution accelerators: Custom solution templates for common IoT uses, like remote monitoring, industrial IoT (IIoT), predictive maintenance, and device simulation. 
  • Azure IoT Edge: Provides connectivity from the central hub to your edge devices
  • Azure IoT Hub: Connect, monitor, and control all of your IoT assets
  • Azure Digital Twins: Create a digital model of your physical space or assets.
  • Azure Time Series Insights: Explore and gain insights from time-series IoT data in real-time.
  • Azure Sphere: Build and connect highly secure MCU-powered devices
  • Azure RTOS: Making embedded IoT development and connectivity easy.
  • Azure SQL Edge: A relational database engine optimized for IoT and edge deployments.

Google Cloud IoT

Google Cloud IoT leverages Google Cloud’s areas of emphasis, such as big data, analytics, and machine learning (ML). Its core services focus on commercial IoT, but Google does play in industrial and home IoT as well. Key IoT services include:

  • Tools for building IoT applications: ranges from data ingestion to intelligence using Google’s IoT building blocks.
  • Predictive maintenance: Lets you automatically predict when equipment will need maintenance and optimize its performance in real-time, while predicting downtime, detecting anomalies, and tracking device status, state, and location.
  • Real-time asset tracking: Track valuable assets in real-time and perform complex analytics and machine learning on the data you collect to deliver actionable business insights.
  • Logistics and supply chain management: Perform fleet management, inventory tracking, cargo integrity monitoring, and other business-critical functions with Google Cloud IoT’s logistics solution.
  • Smart cities and buildings: Bring new levels of intelligence and automation to entire homes, buildings, or cities by building a comprehensive solution that spans billions of sensors and edge devices.

IBM Watson IoT

IBM is fully invested in its Watson AI initiative, and its IoT offering is no different. IBM Watson IoT is heavily focused on predictive analytics and all the related subsections, like supply chain management, enterprise asset management, real estate and facilities management, and climate and weather technology. IBM Watson for IoT includes:

  • Watson IoT Platform starter: a starter platform for building the basics of IoT. Not available for production environments.
  • Platform Service: the basic platform service that is used as the IoT device message broker for secure device registration, real-time analytics, and more. 
  • Analytics Service: add-on component for analytics features
  • IoT Registration Service: a service to maintain a registry of IoT appliances and consumers. 
  • Cloudant NoSQL DB: a managed NoSQL database service that captures device measurements and events and stores them for use by real-time applications. 
  • Db2 Warehouse on Cloud: the Watson IoT Platform data lake.
  • IBM Cloud Object Storage: unstructured cloud data storage for long-term storage. 
  • IBM Cloud App ID: authentication for mobile and web apps that connect to your Watson IoT Platform. 
  • IBM Secure Gateway for IBM Cloud: a secure way to access your on-premises or cloud data from your IBM Cloud application by using a secure passage. 

Alibaba IoT

The Alibaba Group is frequently called “The Chinese AWS” for its similar structure and way of working, and its cloud service is no different. The Alibaba IoT platform provides several services, along with flexible discount plans for new customers, covering all the key cloud services, such as hosting, object storage, elastic computing, a relational database (SQL), big data (Hadoop), artificial intelligence, machine learning, and NoSQL databases.

It is the largest cloud vendor in China but has expanded to serve international customers as well. However, the common view is that Alibaba’s foreign offerings aren’t as robust as what it has in China. But it does comply with international regulations, including PCI DSS and HIPAA in the U.S., Germany’s C5 and the European Union’s General Data Protection Regulation (GDPR).

Alibaba’s IoT Platform breaks down into four major categories:

  1. Device connection: Allows the IoT platform to connect a large number of devices to the cloud. This includes device development; access for NB-IoT, cellular (2G, 3G, 4G, and 5G), Wi-Fi, and LoRaWAN devices; SDKs that support various protocols, such as MQTT, CoAP, HTTP, and HTTPS; and support for various open source programming languages.
  2. Device management: For managing the life cycle of devices: device registration, feature definition, data parsing, online debugging, remote configuration, OTA update, real-time monitoring, device grouping, and device deletion.
  3. Security: The IoT platform provides multiple protections to effectively secure devices and data in the cloud, such as chip-level secure storage for identity authentication and a device key management mechanism to prevent device keys from being cracked. It provides both unique-certificate-per-device and unique-certificate-per-product authentication mechanisms, and it supports TLS encryption for MQTT and HTTPS and DTLS encryption for CoAP to ensure data confidentiality and integrity.
  4. Rules engine: Offers server-side subscription to one or more message types from all devices of a specific product, plus data forwarding, which sends specified fields of topic messages to a destination for storage and computing according to a forwarding rule.
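A rules engine of this kind can be sketched in a few lines. This is a simplified, hypothetical model, not Alibaba’s API: each rule has an MQTT-style topic filter (`+` matches one level, `#` matches all remaining levels), a list of payload fields to keep, and a destination, which here is just a Python list standing in for a database or message queue. The simplified matcher ignores MQTT’s special handling of topics that begin with `$`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

def topic_matches(topic_filter: str, topic: str) -> bool:
    """Simplified MQTT-style matching: '+' matches one level, '#' matches the rest."""
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            return True  # matches all remaining levels, including none
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)

@dataclass
class ForwardingRule:
    topic_filter: str
    fields: List[str]  # payload fields to forward
    destination: List[Dict[str, Any]] = field(default_factory=list)  # stand-in for storage

def dispatch(rules: List[ForwardingRule], topic: str, payload: Dict[str, Any]) -> None:
    """Forward the selected fields of a message to every rule whose filter matches."""
    for rule in rules:
        if topic_matches(rule.topic_filter, topic):
            record = {k: payload[k] for k in rule.fields if k in payload}
            record["topic"] = topic
            rule.destination.append(record)

# Hypothetical topics and rules.
temperature_rule = ForwardingRule("factory/+/temperature", ["celsius"])
firmware_rule = ForwardingRule("factory/#", ["fw"])
rules = [temperature_rule, firmware_rule]

dispatch(rules, "factory/line1/temperature", {"celsius": 71.5, "fw": "1.2"})
dispatch(rules, "factory/line1/humidity", {"pct": 40})
```

One message can fan out to several destinations, and each destination receives only the fields its rule asks for.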

Unlike its competitors, Alibaba also makes IoT device chips and an operating system, AliOS, though these are aimed at the Chinese market.

Oracle IoT Cloud

Oracle is fully invested in the industrial and commercial sides of IoT, oriented toward manufacturing and logistics operations and supporting integration with Oracle and non-Oracle applications and IoT devices using the REST API.

Oracle is focused on helping companies bring their products to market as quickly as possible through high-speed messaging and endpoint management, and on extending and improving supply chain, customer experience, and operational efficiency applications, all of which Oracle sells. The Oracle IoT cloud platform offers real-time analysis and integrates acquired data with the company’s apps or web services.

Oracle is focused on four areas of IoT intelligent applications:

  1. Asset monitoring: Effectively manage enterprise assets and reduce overall maintenance costs. Track the real-time location, health, and utilization of your assets. View your assets and analytics on the dashboard, and automate actions based on predictive insights from your business applications.
  2. Production monitoring: Effectively meet your product deliveries. Understand the real-time status of your factories and machines, and diagnose performance issues. Apply predictive analytics to the health of your factory, products, and machines to get recommendations and take action.
  3. Fleet monitoring: Monitor your vehicles, drivers, and trips in real-time. View your vehicles’ location, costs of operation, usage, and driving behavior history.
  4. Connected worker: Improve the safety and health of your workers. Comply with safety regulations. Gain real-time visibility into your workers’ health, location, and work environment.


]]>
The Data Capture Market https://www.datamation.com/big-data/data-capture/ Sat, 05 Jun 2021 03:33:56 +0000 https://www.datamation.com/?p=21311

Data capture is the process of collecting, ingesting, or otherwise acquiring structured and unstructured data and either converting it into a data format usable by a computer or merely storing it, for the purpose of using that data to gain some form of insight.

Data capture covers both physical data sources – paper documents of all kinds, primarily – and digital sources. Optical character recognition has been around for decades, but the technology has advanced, as has the technology for storing and analyzing the content. Physical sources that can be captured automatically include, but are not limited to, documents, mail, faxes, and receipts, which are converted into a readable digital format.

Digital sources range from database applications, and applications in general, to data streams like RSS feeds, to sensors and other input devices. The data is then either processed for storage (in a data warehouse) or simply stored for later use (in a data lake).

The Data Capture Process

Data capture is the process that allows you to collect information, whether manually on your part, manually on the part of a third party, or automatically.

Manually, on your part: scanning documents, reading in data files to save in a database.

Manually, on someone else’s part: an e-commerce customer filling in their relevant information (name, address, etc.)

Automatic: Logging customer sales, logging Website visits, storing security video, taking in sensor data.

Automatic data capture is clearly the preferred method for multiple reasons. By utilizing an enterprise information platform like a database with data capture features, businesses can:

  • Greatly increase data volume compared with manual capture
  • Reduce costs
  • Accelerate processes
  • Eliminate input errors caused by tedium
  • Maintain and support a single system
  • Set rules and policies for what is and is not to be ingested
  • Set rules and policies for who can access what data

As a general rule of thumb, a business might spend $1 to prevent an error in data capture. Correcting that error later could cost up to $10, and not catching it at all could result in up to $100 in lost revenue.
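To see how the rule of thumb compounds, this sketch prices a hypothetical batch of 1,000 capture errors under different outcomes. The per-error costs are the $1, $10, and $100 figures above; the split between prevented, corrected, and uncaught errors is an invented example.

```python
# Per-error costs from the 1-10-100 rule of thumb (dollars).
COST_PER_ERROR = {"prevented": 1, "corrected": 10, "uncaught": 100}

def data_quality_cost(prevented: int, corrected: int, uncaught: int) -> int:
    """Total cost of a batch of capture errors, given how each error was handled."""
    return (prevented * COST_PER_ERROR["prevented"]
            + corrected * COST_PER_ERROR["corrected"]
            + uncaught * COST_PER_ERROR["uncaught"])

# Hypothetical batch of 1,000 errors.
all_prevented = data_quality_cost(1000, 0, 0)   # 1,000
mixed = data_quality_cost(700, 250, 50)         # 700 + 2,500 + 5,000 = 8,200
all_uncaught = data_quality_cost(0, 0, 1000)    # 100,000
```

In the mixed case, the 50 uncaught errors (5% of the batch) account for well over half of the total cost.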


Data Capture Methods

You need to use a variety of data capture methods to handle digital and physical data. Scanning a document is different from creating a PDF or filling out a Web form.

Hence, there are many data capture methods, including:

  • Manual data capture
  • Automated data capture
  • OCR (Optical Character Recognition)
  • ICR (Intelligent Character Recognition)
  • Barcode/QR Code Recognition
  • Voice Capture
  • IDR (Intelligent Document Recognition)
  • Digital forms (Both Web and App)
  • Digital signatures
  • Image & video capture
  • Paperless Forms
  • Double Blind Data Entry
  • Smart Cards
  • Magnetic Stripe Cards
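Double-blind data entry usually means two operators key the same source independently, and any field where their entries disagree is flagged for review. A minimal sketch of that comparison step follows; the field names and values are hypothetical.

```python
from typing import Dict, Optional, Tuple

def compare_entries(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {field: (first_value, second_value)} for every field where the two keyings disagree."""
    all_fields = first.keys() | second.keys()
    return {
        name: (first.get(name), second.get(name))
        for name in sorted(all_fields)
        if first.get(name) != second.get(name)
    }

# Two operators key the same (hypothetical) invoice without seeing each other's work.
operator_a = {"invoice_no": "INV-1042", "amount": "1250.00", "zip": "94107"}
operator_b = {"invoice_no": "INV-1042", "amount": "1205.00", "zip": "94107"}
mismatches = compare_entries(operator_a, operator_b)  # only "amount" differs
```

A transposed digit like this one is easy for a single operator to miss but stands out immediately when two independent entries are compared.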


Benefits of Automated Capture

  • Reduces the amount of manual data entry required
  • Reduces costs and speeds the entry of content into the designated business and organizational processes
  • Improves accuracy by avoiding mistyping and missed data fields
  • Greatly increases the rate of data entry
  • Automates the process of delivering data to the destination or target
  • Enhances productivity all around
  • Checks for data accuracy
  • Enhances visibility by offering the same input resources to all staff

Different Types of Data Capture

“Data capture” is an umbrella term for a wide range of processes.

Here are a few examples of different types of data capture:

Change Data Capture

Many businesses still rely on batch processing, which runs data integration jobs at regular intervals but not in constant real time. So what happens if the data set changes between now and the last update? That’s where you would use change data capture (CDC). 

Change data capture comprises the processes and techniques that detect changes made to a source table or source database, usually in real time. The changed entries are then moved to a target location, usually one separate from the primary data store.

There are two main ways of performing change data capture: log-based CDC and trigger-based CDC. In log-based CDC, the CDC solution reads the database’s transaction log. That log exists to help databases recover from failure, and reading it lets CDC pick up changes with low latency. However, some databases use very complex logs, and each database has its own proprietary log file format, which makes it harder to build a robust, generic solution.
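A minimal sketch of the log-based approach, assuming a simplified, hypothetical log format: each entry has a log sequence number (LSN), an operation, a key, and the new row. Real transaction logs and their formats are database-specific, which is the difficulty described above. The consumer keeps a checkpoint (the last LSN it applied) so it can resume without reapplying changes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class LogEntry:
    lsn: int               # log sequence number: position in the (hypothetical) log
    op: str                # "insert", "update", or "delete"
    key: int               # primary key of the changed row
    row: Optional[dict]    # new row image; None for deletes

def apply_changes(log: List[LogEntry], target: Dict[int, dict], last_lsn: int) -> int:
    """Apply entries newer than last_lsn to target, in log order, and return the new checkpoint."""
    for entry in sorted(log, key=lambda e: e.lsn):
        if entry.lsn <= last_lsn:
            continue  # already replicated on an earlier run
        if entry.op == "delete":
            target.pop(entry.key, None)
        else:  # inserts and updates both carry the full new row
            target[entry.key] = dict(entry.row)
        last_lsn = entry.lsn
    return last_lsn

log = [
    LogEntry(1, "insert", 1, {"name": "Ada"}),
    LogEntry(2, "insert", 2, {"name": "Lin"}),
    LogEntry(3, "update", 1, {"name": "Ada L."}),
    LogEntry(4, "delete", 2, None),
]

replica: Dict[int, dict] = {}
checkpoint = apply_changes(log[:2], replica, last_lsn=0)  # first run sees two entries
checkpoint = apply_changes(log, replica, checkpoint)      # next run applies only LSNs 3 and 4
```

Because the source database writes its log anyway, the consumer adds no work to the write path; the cost moves to parsing the log format.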

In trigger-based CDC, the CDC solution uses database triggers, which are functions that run when another event occurs like entering new data or performing a table update. Database triggers decrease the overhead for extracting changes when doing CDC, but they also add overhead to the source system because they need to run every time the database updates.
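Trigger-based CDC can be demonstrated end to end with SQLite through Python’s sqlite3 module. In this sketch, which uses hypothetical table names, triggers copy every insert, update, and delete on `customers` into a `customers_changes` table; a consumer then reads the changes after the last `change_id` it saw. Each trigger adds an extra write to every change on the source table, which is the overhead mentioned above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL);

-- Change table written by the triggers; change_id gives consumers a resume point.
CREATE TABLE customers_changes (
    change_id INTEGER PRIMARY KEY AUTOINCREMENT,
    op        TEXT NOT NULL,
    id        INTEGER NOT NULL,
    email     TEXT
);

CREATE TRIGGER customers_ins AFTER INSERT ON customers BEGIN
    INSERT INTO customers_changes (op, id, email) VALUES ('insert', NEW.id, NEW.email);
END;
CREATE TRIGGER customers_upd AFTER UPDATE ON customers BEGIN
    INSERT INTO customers_changes (op, id, email) VALUES ('update', NEW.id, NEW.email);
END;
CREATE TRIGGER customers_del AFTER DELETE ON customers BEGIN
    INSERT INTO customers_changes (op, id, email) VALUES ('delete', OLD.id, NULL);
END;
""")

def read_changes(since_id: int):
    """Return captured changes newer than since_id, oldest first."""
    return conn.execute(
        "SELECT change_id, op, id, email FROM customers_changes "
        "WHERE change_id > ? ORDER BY change_id",
        (since_id,),
    ).fetchall()

# Ordinary application writes; the triggers capture them as a side effect.
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO customers (id, email) VALUES (2, 'b@example.com')")
conn.execute("UPDATE customers SET email = 'a2@example.com' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 2")
conn.commit()

changes = read_changes(0)

# Replay the changes into a target (here, a dict standing in for a second store).
replica = {}
for _change_id, op, key, email in changes:
    if op == "delete":
        replica.pop(key, None)
    else:
        replica[key] = email
```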

Declared Data Capture

Declared data is information that is freely and actively given to your company from your customers. This includes the obvious facts, like customer information, mailing address, and credit card, but also their motivations, intentions, interests, and preferences. It is also known as first-party data because it comes directly from the source, the customer. That’s the strength of declared data: the customer gives it to you willingly.

The benefits are knowledge and context. Your customer is telling you about themselves and it enables more direct contact and marketing. It means providing a more personalized experience because declared data removes the guesswork. You know what your customers want because they told you so. 

Intelligent Data Capture

Intelligent capture is the process of identifying and extracting critical information from incoming paper and electronic documents without extensive input from a user. When used in conjunction with content management or business process automation software, an organization can use the extracted data for digital routing and delivery of relevant documents.
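As a much-simplified stand-in for intelligent capture, the sketch below pulls a few fields out of OCR’d invoice text with regular expressions. Commercial intelligent-capture products generally rely on trained models and document layouts rather than hand-written patterns; the patterns, field names, and sample text here are hypothetical. A field that cannot be found comes back as `None` so it can be routed to a person.

```python
import re
from typing import Dict, Optional

# Hypothetical field patterns; \bTotal avoids matching the "total" inside "Subtotal".
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:?\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"\bTotal(?:\s+Due)?\s*:?\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}

def extract_fields(text: str) -> Dict[str, Optional[str]]:
    """Return each field's first match, or None so a person can review it."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1) if match else None
    return result

ocr_text = """ACME Supply Co.
Invoice No: INV-2041
Date: 2021-06-01
Subtotal: $1,180.00
Total Due: $1,250.00
"""
fields = extract_fields(ocr_text)
```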

Invoice Data Capture

Invoice data capture is the process of entering invoice details into an accounting system. Paper trails are important in finance, but for any large company, dealing with paper records would be a logistical headache. Digital invoice entry allows for easy routing and storage of invoices without requiring any paper.

Data Storage, Warehouse vs. Lake

When it comes to mass ingestion of data, you have two non-RDBMS ways of storing it: a data warehouse or a data lake. Both are helpful for storing data for later processing to gain business insight, but they operate very differently.

Data warehouses – around for decades – perform what is known as schema on write. This is where the data is processed for organized, structured storage: errors are fixed, duplicates are removed, and so on. When the data is called on later, it can be used right away because it was prepared when it was stored.

Data lakes – a concept barely a decade old – do schema on read. A data lake simply stores everything, in a variety of formats, even plain text. The data is processed as it is read into whatever application is using it, such as a business intelligence or analytics app. This slows queries down because the data must be processed first. Data lakes are a great way to take in large amounts of data very fast, but you need to process it eventually.
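The difference between schema on write and schema on read can be sketched by handling the same raw records both ways. The records and field names are hypothetical: the warehouse-style loader validates and types each record before storing it (rejecting the bad one up front), while the lake-style query keeps the raw text and only parses it when a question is asked.

```python
import json
from typing import Dict, List, Tuple

# Hypothetical raw events as they might arrive from an app or device.
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.5"}',
    '{"user": "u2", "amount": "n/a"}',   # bad value
    '{"user": "u1", "amount": "5.25"}',
]

def load_schema_on_write(raw_lines: List[str]) -> Tuple[List[Dict], List[str]]:
    """Warehouse-style: parse, validate, and type every record before storing it."""
    table, rejected = [], []
    for line in raw_lines:
        try:
            record = json.loads(line)
            table.append({"user": str(record["user"]), "amount": float(record["amount"])})
        except (KeyError, ValueError):  # json.JSONDecodeError is a subclass of ValueError
            rejected.append(line)
    return table, rejected

def total_for_user_schema_on_read(raw_store: List[str], user: str) -> float:
    """Lake-style: the raw text was stored untouched; structure is applied only now."""
    total = 0.0
    for line in raw_store:
        try:
            record = json.loads(line)
            if record.get("user") == user:
                total += float(record["amount"])
        except (KeyError, ValueError):
            continue  # bad records surface (or are skipped) only at read time
    return total

warehouse_table, rejected = load_schema_on_write(RAW_EVENTS)
lake_store = list(RAW_EVENTS)  # the lake keeps the raw text as-is
u1_total = total_for_user_schema_on_read(lake_store, "u1")
```

The warehouse pays the parsing cost once, at load time; the lake pays it on every query but never turns data away at the door.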


]]>
Data-Driven Decision Making: Top 9 Best Practices https://www.datamation.com/big-data/data-driven-decision-making/ Fri, 09 Apr 2021 21:19:21 +0000 https://www.datamation.com/?p=21067

The phrase data-driven decision making – certainly popular in the field of data analytics – may seem redundant. After all, nearly everything is driven by Big Data or we wouldn’t have petabytes of databases in public and private data centers around the world.

So what exactly does it mean to be “data-driven?” It’s quite straightforward. Data-driven decision making (DDDM) is the process of making organizational decisions based on actual data analytics rather than intuition, anecdote, or observation.

Business intelligence (BI), another popular data term, is entirely data-driven decision making. Using enterprise data analytics applications like Teradata or Microsoft Power BI, IT managers and business people process data, extract facts, figures, and patterns from that data, and make decisions based on the cold, hard facts, not gut feelings.

DDDM is the art and science of using facts, metrics, and other data to guide strategic business decisions to meet your company’s goals and objectives. Done right, DDDM helps you make better business decisions and spot strategic opportunities.

So how do you do DDDM right? You start by making it the norm. Your organization needs to make data-driven decision making standard operating procedure. Sure, there is room for gut instinct, but first and foremost you need a culture of analytics. That’s why analytics has become so predominant in technology: as data has exploded, so has the opportunity for insight from that data, whether it’s through business intelligence, Big Data, data warehouses, or data lakes.

Data mining results generally fall into two distinct types, qualitative analysis and quantitative analysis, and both are equally valuable for making data-driven decisions.

Quantitative data analysis is what DDDM is all about. It is measured analysis that focuses on numbers, statistics, and measures such as the median and standard deviation. Qualitative analysis focuses on data that isn’t defined by numbers or metrics, such as images, videos, and social media.

Qualitative data analysis is observational, while quantitative analysis is factual. Both qualitative and quantitative data should be analyzed to make smarter data-driven business decisions.
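For a feel of quantitative analysis, Python’s standard `statistics` module computes the measures mentioned above. The week of order counts is hypothetical and includes one outlier day, which shows why the median can describe a typical day better than the mean.

```python
import statistics

# Hypothetical daily order counts for one week; day 4 had a one-off bulk order.
daily_orders = [120, 135, 128, 410, 131, 126, 140]

summary = {
    "mean": statistics.mean(daily_orders),      # 170: pulled up by the outlier
    "median": statistics.median(daily_orders),  # 131: the middle value once sorted
    "stdev": statistics.stdev(daily_orders),    # sample standard deviation, about 106
}
```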

The good news is that employees across the board can participate in DDDM. While some of the more arcane data science disciplines belong to data scientists with advanced degrees, there are plenty of DDDM-related business applications for mere mortals, starting with Microsoft Excel.

Data-Driven Decision Making: Best Practices

Beyond apps, companies have to develop data skills through practice, apply them through best practices and business models, and put security and governance in place to watch over things. To use data effectively, professionals must take several steps:

1. Know your end game

If you don’t know your destination, how can you get there? That should be the first step in any DDDM scenario: ask yourself what you are trying to solve. Identify and understand your goals thoroughly. Do this before you begin collecting data, so you know which data to collect and which to leave out.

To get the most out of your data, companies should define their objectives before beginning their analysis. As Sun Tzu said in The Art of War, “Victorious warriors win first and then go to war, while defeated warriors go to war first and then seek to win.” Set a strategy, with Key Performance Indicators (KPIs) as measures of success or failure, to avoid falling into traps.
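KPIs are most useful when they are written down with explicit targets before the analysis starts. This hypothetical sketch records each KPI’s target and direction (whether higher or lower is better) and checks actual results against it; the KPI names and numbers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Dict

@dataclass(frozen=True)
class KPI:
    name: str
    target: float
    higher_is_better: bool = True

    def met(self, actual: float) -> bool:
        """True if the actual value reaches the target in the right direction."""
        return actual >= self.target if self.higher_is_better else actual <= self.target

# Hypothetical KPIs defined before any analysis starts.
kpis = [
    KPI("conversion_rate_pct", target=3.0),
    KPI("churn_rate_pct", target=5.0, higher_is_better=False),
]
actuals: Dict[str, float] = {"conversion_rate_pct": 3.4, "churn_rate_pct": 6.1}

scorecard = {kpi.name: kpi.met(actuals[kpi.name]) for kpi in kpis}
```

Recording the direction explicitly avoids a common trap: treating a rising churn rate as good news because the number went up.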

2. Coordinate among teams

Your DDDM project will involve at least two stakeholders: the business unit looking for insight and the IT people who will run the computing. But there may be others with a vested interest. Other departments or C-level executives might want to know the results as well, and adding new people might mean changes in the data collected. A new stakeholder could mean a new data variable added to the mix.

3. Democratize the process

We all have unconscious biases and we all have blind spots. We might even be guilty of seeing the data we wish were there instead of what’s really there. Therefore, make this a team effort and bring multiple eyes to the project. They will have their own biases, sure, but hopefully they won’t be the same as yours.

A 2010 McKinsey study of more than 1,000 major business investments showed that when organizations worked at reducing the effect of bias in their decision making processes, they achieved returns up to 7% higher. By eliminating bias, you open yourself up to discovering more opportunities.

4. Clean and organize your data

According to Gartner, data scientists spend 79% of their time collecting, cleaning, and organizing data and only 20% actually performing analysis. Not surprisingly, this is the least favorite part of the job for data scientists but it must be done.

“Data cleaning” is the process of preparing raw data for analysis by removing or correcting incorrect, incomplete, or irrelevant data. In data warehousing, this happens as part of “schema on write,” where you apply such filters before you store the data. The process involves creating a data dictionary, a table that defines each of your variables and translates them into what they mean to you in the context of this particular database. Once you have a dictionary, it is available for reuse on other projects.
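A data dictionary can also drive the cleaning itself. In this minimal sketch, the dictionary (its column names, types, and descriptions are hypothetical) defines which columns to keep, what type each should be, and whether it is required. Rows that cannot be converted or are missing required values are dropped, as are exact duplicates and any columns the dictionary does not define.

```python
from typing import Any, Dict, List, Optional

# Hypothetical data dictionary: column -> (type, required?, what it means to us)
DATA_DICTIONARY = {
    "customer_id": (int, True, "Internal customer number"),
    "region": (str, True, "Sales region code, such as EMEA"),
    "revenue": (float, False, "Quarterly revenue in U.S. dollars"),
}

def clean(rows: List[Dict[str, Any]]) -> List[Dict[str, Optional[Any]]]:
    cleaned = []
    seen = set()
    for row in rows:
        out: Dict[str, Optional[Any]] = {}
        ok = True
        for column, (col_type, required, _meaning) in DATA_DICTIONARY.items():
            raw = row.get(column)
            if isinstance(raw, str):
                raw = raw.strip()
            if raw is None or raw == "":
                if required:
                    ok = False  # incomplete: a required value is missing
                    break
                out[column] = None
                continue
            try:
                out[column] = col_type(raw)
            except (TypeError, ValueError):
                ok = False  # incorrect: value cannot be converted to the declared type
                break
        if not ok:
            continue
        fingerprint = tuple(out.values())
        if fingerprint in seen:
            continue  # exact duplicate of a row already kept
        seen.add(fingerprint)
        cleaned.append(out)  # columns not in the dictionary were never copied
    return cleaned

raw_rows = [
    {"customer_id": " 17 ", "region": "EMEA", "revenue": "1200.5", "notes": "ignore me"},
    {"customer_id": "17", "region": "EMEA", "revenue": "1200.5"},  # duplicate once cleaned
    {"customer_id": "x9", "region": "APAC", "revenue": "300"},     # bad customer_id
    {"customer_id": "21", "region": "", "revenue": "50"},          # missing required region
    {"customer_id": "22", "region": "AMER", "revenue": ""},        # optional revenue missing
]
clean_rows = clean(raw_rows)
```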

5. Find the data needed to solve these questions

Look at the data you have already gathered and try to focus on your ideal data, that which will help you answer the questions you are asking. Once you determine the data needed, check if you already have this data or if you need to set up a way to collect it or acquire it externally.

6. Perform basic statistical analysis

If you are new to analytics and DDDM, it really isn’t a good idea to involve a multi-petabyte database for your first project. Start small and learn. You are testing your models to see if they are providing the answers you need. Testing different models such as linear regressions, decision trees, random forest modeling, and others can help you determine which method is best suited to your data set.
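As a starting point for the model testing described above, here is a minimal ordinary least-squares linear regression with an R² score, written in plain Python so each step is visible. The ad spend and sales figures are hypothetical. On Python 3.10 and later, `statistics.linear_regression` returns the same slope and intercept.

```python
from typing import Sequence, Tuple

def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs: Sequence[float], ys: Sequence[float], slope: float, intercept: float) -> float:
    """Share of the variance in ys explained by the fitted line (1.0 is a perfect fit)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical weekly figures, in thousands of dollars.
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(ad_spend, sales)     # about 1.97 and 1.09
fit = r_squared(ad_spend, sales, slope, intercept)
forecast = slope * 6.0 + intercept               # predicted sales at 6.0 spend
```

Comparing a score like R² (ideally on data the model has not seen) across candidate models is one simple way to judge which suits the data set.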

From there, you can come up with three types of reports:

  • Descriptive: Just the facts.
  • Inferential: The facts, plus an interpretation to provide context.
  • Predictive: An inference based upon results.
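The model testing described in this step can be sketched without any libraries. Decision trees and random forests usually come from a package such as scikit-learn, so this sketch compares only two simple candidates, a mean baseline and a least-squares line, on hypothetical numbers. Part of the data is held back so each model is judged on values it never saw. The mean is the kind of figure a descriptive report uses; the fitted line is what a predictive report would rest on.

```python
from statistics import mean

def fit_mean(xs, ys):
    """Baseline model: always predict the training average."""
    avg = mean(ys)
    return lambda x: avg

def fit_linear(xs, ys):
    """Ordinary least-squares line y = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x

def mean_squared_error(model, xs, ys):
    return mean((model(x) - y) ** 2 for x, y in zip(xs, ys))

def pick_model(candidates, train, test):
    """Fit each candidate on the training split and score it on held-out data."""
    scores = {name: mean_squared_error(fit(*train), *test)
              for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

For example, `pick_model({"mean": fit_mean, "linear": fit_linear}, train, test)` returns the name of the candidate with the lowest held-out error along with every score, so adding a third model is one more dictionary entry.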

7. Draw conclusions

The last step in data-driven decision making is coming to a conclusion or conclusions based on the findings. The conclusions you draw from your analysis will be the basis on which your organization makes an informed business decision and plots strategy moving forward. The final step of DDDM is always the human element.

8. Present the data in a meaningful way

Digging through data is what computers do, but humans want something more than rows and columns of numbers, or their eyes glaze over. With the help of a great data visualization application, you can present the data in a meaningful way to people who are less technically skilled. Apps like Databox, Zoho Analytics, Tableau, Infogram, ChartBlocks, Datawrapper, and many more provide an easy-to-use GUI environment to tell your data story.

9. Revisit, review, revise, and reevaluate

Once you have your models and data dictionaries in place, you can’t rest on your laurels. Do not be afraid to step back and rethink your decisions, review your models, revise them, and ask, “How can I do this better?” Optimization is always possible and nothing is bug-free. Keep revising your work and make it better.

IoT Security: 10 Tips to Secure the Internet of Things https://www.datamation.com/networks/iot-security-10-tips-to-secure-the-internet-of-things/ Thu, 25 Mar 2021 00:20:31 +0000

Clearly, IoT security is more important than ever before – but unfortunately, IoT security is also more challenging than ever before.

Some background: the COVID-19 pandemic and lockdown of 2020 threw all of the analyst predictions into chaos, but as the economy starts to emerge from the crisis, IT spending is expected to resume, and that includes the growth of the Internet of Things (IoT).

IoT is not a monolithic category but breaks down into numerous industries and use cases. Forrester Research predicts that in 2021, the IoT market growth will be driven by healthcare, smart offices, location services, remote-asset monitoring, and new networking technologies.

Much of that is being driven by the fallout from COVID-19. The explosion in remote work is driving some of this, as is remote medicine. New devices are coming out to do diagnosis for patients who can’t or won’t go in to see their doctor.

One of the key concerns related to the successful adoption of the IoT is having sufficient security mechanisms in place. This is especially important for IoT medical devices, because health care is so heavily regulated for privacy reasons.

And it’s not just about securing specific IoT devices; it’s about securing all devices. Your connected refrigerator might not appear to be much of a threat to the security of your home if it is compromised by a hacker, but it can act as a gateway to more important devices on your home network. Then it becomes as significant as a heart monitor.

The same applies to industrial IoT (IIoT). Last summer it was revealed that Russian hackers penetrated the control systems of U.S. nuclear power plants. The consequences of such a compromise to a global manufacturing operation would be considerable.

So it’s no surprise IoT security has been top of mind for IT managers for some time.

Understanding IoT – and Its Complexity

The Internet of Things (IoT) is a collection of devices that are connected to the Internet; these IoT devices are not traditional computing devices. Think of electronic devices that haven’t historically been connected, like copy machines, refrigerators, heart and glucose meters, or even the coffee pot.

The IoT is a hot topic because of its potential to connect previously unconnected devices and bring connectivity to places and things normally isolated. Research suggests improved employee productivity, better remote monitoring, and streamlined processes are among the top benefits for companies that embrace IoT.

What Should You Know About IoT Security?

There is an unfortunate pattern in technology that has repeated over the years: we rush to embrace the new and secure it later. Such has been the case with IoT devices. They often make the news because of hacks, ranging from mild to severe in their threat.

IoT security is top of mind at the Department of Homeland Security, which produced a lengthy paper on securing IoT devices. While the document is five years old and much has changed in the IoT world, many of the principles and best practices outlined are still valid and worthy of consideration.

Data from 451 Research shows 55% of IT professionals list IoT security as their top priority – and the figure is likely growing. So what can you do to secure your IoT devices? A lot, and over many areas. Let’s dig in.

1) Assume Every IoT Device Needs Configuring

When the market sees the advent of smart cat litter boxes and smart salt shakers, you know we’re at or approaching peak adoption for IoT devices. But don’t just ignore such features or assume they are securely configured out of the box. Leaving them unconfigured and not locked down is an opening to a hacker, whatever the device.

2) Know your Devices

It’s imperative that you know which types of devices are connected to your network and keep a detailed, up-to-date inventory of all connected IoT assets.

You should always keep your asset map current with each new IoT device connected to the network and know as much as possible about it. Facts to know include manufacturer and model ID, the serial number, software and firmware versions, and so forth.
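A minimal sketch of such an inventory in Python follows, assuming each device is keyed by its serial number and that your monitoring can report which serial numbers it has seen and when. The class and field names are illustrative, not a specific product's schema. It flags two gaps an asset map should catch: devices on the network that nobody inventoried, and inventoried devices that have gone quiet.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List

@dataclass
class IoTAsset:
    manufacturer: str
    model_id: str
    serial_number: str
    firmware_version: str
    software_version: str
    last_seen: date  # when monitoring last saw the device on the network

class AssetInventory:
    def __init__(self) -> None:
        self._assets: Dict[str, IoTAsset] = {}  # keyed by serial number

    def register(self, asset: IoTAsset) -> None:
        """Add a device when it joins the network, or update its record."""
        self._assets[asset.serial_number] = asset

    def unknown_devices(self, observed_serials: Iterable[str]) -> List[str]:
        """Serial numbers seen on the network that are not in the inventory."""
        return sorted(set(observed_serials) - set(self._assets))

    def stale(self, today: date, max_age_days: int = 30) -> List[str]:
        """Inventoried devices not seen recently: retired, moved, or offline?"""
        return sorted(serial for serial, asset in self._assets.items()
                      if (today - asset.last_seen).days > max_age_days)
```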

3) Require Strong Login Credentials

People tend to use the same login and password for all of their devices, and often the passwords are simple.

Make sure every login is unique for every employee, and require strong passwords. Use two-factor authentication if it’s available, and always change the default password on new devices. To ensure trusted connections, use public key infrastructure (PKI) and digital certificates to provide a secure underpinning for device identity and trust.
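Here is a minimal password check in Python, loosely in the spirit of the length-plus-blocklist approach of NIST SP 800-63B. The 12-character minimum and the tiny blocklist are assumptions for illustration; a real deployment would check against a much larger list of factory defaults and breached passwords. It does not cover two-factor authentication or certificates.

```python
# Hypothetical blocklist; in practice use a large list of factory defaults
# and known-breached passwords.
BLOCKLIST = {"admin", "password", "12345", "123456", "default", "root", "changeme"}

def password_problems(username: str, password: str, min_length: int = 12) -> list:
    """Return the reasons a password should be rejected (empty list = acceptable)."""
    problems = []
    if password.lower() in BLOCKLIST:
        problems.append("factory default or common password")
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if username and username.lower() in password.lower():
        problems.append("contains the username")
    return problems
```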

4) Use End-to-End Encryption

Connected devices talk to one another, and when they do, data is transferred from one point to another, all too often without encryption. You need to encrypt data at every transmission to protect against packet sniffing, a common form of attack. Devices should offer encrypted data transfer as an option. If they don’t, consider alternatives.
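On a device or gateway that runs Python, encrypting a transmission can be as simple as wrapping the socket in TLS with the standard `ssl` module. This sketch assumes a hypothetical collector that accepts TLS connections (port 8883 is the conventional MQTT-over-TLS port). It requires certificate verification and TLS 1.2 or newer, and optionally presents a device certificate for the PKI-based identity mentioned in the previous tip.

```python
import socket
import ssl
from typing import Optional

def make_tls_context(certfile: Optional[str] = None,
                     keyfile: Optional[str] = None) -> ssl.SSLContext:
    """Client-side TLS: verify the server certificate and refuse old protocol versions."""
    ctx = ssl.create_default_context()         # system CAs, CERT_REQUIRED, hostname checks
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:                               # optional device certificate (mutual TLS)
        ctx.load_cert_chain(certfile, keyfile)
    return ctx

def send_reading(host: str, port: int, payload: bytes) -> Optional[str]:
    """Send one reading over an encrypted channel; return the negotiated TLS version."""
    ctx = make_tls_context()
    with socket.create_connection((host, port), timeout=5) as raw:
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(payload)
            return tls.version()
```

A call would look like `send_reading("telemetry.example.com", 8883, b'{"temp_c": 21.5}')`, where the host name is a placeholder.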

5) Make Sure to Update the Device

Make sure to update the device on first use, as the firmware and software may have been updated between when the device was made and when you bought it. If the device has an auto-update feature, enable it so you don’t have to do it manually. And check the device regularly to see if it needs updating.

On the network side, change the name and password of the router. Routers are often named after the manufacturer by default. It’s also recommended that you avoid using your company name in the network name.
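Checking for updates across a fleet can be scripted against the inventory. This Python sketch assumes firmware versions are plain dotted integers such as `2.10.1` and that you maintain a per-model table of the newest vendor release; both tables are hypothetical. Versions are compared numerically, because as text `"2.9.4"` sorts after `"2.10.0"`.

```python
from typing import Dict, List, Tuple

def parse_version(version: str) -> Tuple[int, ...]:
    """'2.10.1' -> (2, 10, 1), so versions compare numerically rather than as text."""
    return tuple(int(part) for part in version.split("."))

def needs_update(installed: Dict[str, Tuple[str, str]],
                 latest: Dict[str, str]) -> List[Tuple[str, str]]:
    """installed maps serial -> (model, firmware); latest maps model -> newest firmware."""
    findings = []
    for serial, (model, firmware) in sorted(installed.items()):
        newest = latest.get(model)
        if newest is None:
            findings.append((serial, "no vendor data"))  # unknown model: investigate
        elif parse_version(firmware) < parse_version(newest):
            findings.append((serial, newest))
    return findings
```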

6) Disable Features You Don’t Need

A good step in protecting a device is to disable any feature or function you don’t need. That includes open TCP/UDP ports, open serial ports, open password prompts, unencrypted communications, unsecured radio connections or any place a code injection can be done, like a Web server or database.
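A quick way to find features worth disabling is to see which TCP ports a device actually accepts connections on and compare that list with what its job requires. This is a minimal Python sketch using plain TCP connects; only run it against devices you own or are authorized to test. The address in any call and the "needed" port list are assumptions you would fill in per device.

```python
import socket
from typing import Iterable, List

def open_tcp_ports(host: str, ports: Iterable[int], timeout: float = 0.5) -> List[int]:
    """Return the ports on `host` that accept a TCP connection."""
    found = []
    for port in ports:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            try:
                if s.connect_ex((host, port)) == 0:  # 0 means the connection succeeded
                    found.append(port)
            except OSError:
                pass  # unreachable host, name resolution failure, timeout, ...
    return found

def ports_to_close(observed: Iterable[int], needed: Iterable[int]) -> List[int]:
    """Anything listening that the device's job does not require."""
    return sorted(set(observed) - set(needed))
```

For a camera that only needs HTTPS, `ports_to_close(open_tcp_ports("192.168.50.20", [22, 23, 80, 443]), needed=[443])` would list Telnet, SSH, or plain HTTP if they were left open.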

7) Avoid Using Public Wi-Fi

Using public Wi-Fi, such as the Wi-Fi at Starbucks, is rarely a good idea, and especially not when connecting to your company network.

All too often public Wi-Fi access points are old, outdated, not upgraded, and have easily broken security. If you must use public Wi-Fi, use a Virtual Private Network (VPN).

8) Build a Guest Network

A guest network is a good security solution for visitors who want to use your Wi-Fi either at home or in the office. A guest network gives them network access but walls them off from the main network so they can’t access your systems.

You can also use a guest network for your IoT devices, so if a device is compromised, the hacker will be stuck in the guest network.

9) Use Network Segmentation

Network segmentation divides a network into two or more subsections to enable granular control over lateral movement of traffic between devices and workloads. In an unsegmented network, nothing is walled off. Every endpoint can communicate with every other, so once hackers breach your firewall, they have total access. In a segmented network, it becomes much harder for hackers to move around.

Enterprises should use virtual local area network (VLAN) configurations and next-generation firewall policies to implement network segments that keep IoT devices separate from IT assets. This way, both groups can be protected from the possibility of a lateral exploit.

Also consider deploying a Zero Trust Network architecture. As its name implies, Zero Trust secures every digital asset and grants no implicit trust to any other digital asset, thus limiting the movement of someone with unauthorized access.
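The core idea behind segment policies is default deny: traffic crossing segments is blocked unless a rule explicitly allows it. The Python sketch below models that logic with hypothetical segment names and rules. Real enforcement lives in VLAN access lists and firewall policy, not application code. A full Zero Trust design would go further and verify identity on every request, even inside a segment.

```python
from typing import Set, Tuple

# Hypothetical segments and rules; real enforcement is in switch ACLs and firewalls.
ALLOWED_FLOWS: Set[Tuple[str, str, int]] = {
    ("iot", "iot-gateway", 8883),  # devices publish telemetry (MQTT over TLS) to the gateway only
    ("iot-gateway", "it", 443),    # gateway forwards data to an internal HTTPS API
    ("it", "iot-gateway", 22),     # administrators manage the gateway over SSH
}

def is_allowed(src_segment: str, dst_segment: str, port: int) -> bool:
    """Default deny: traffic between segments needs an explicit rule."""
    if src_segment == dst_segment:
        return True  # simplification: traffic inside a segment is not filtered here
    return (src_segment, dst_segment, port) in ALLOWED_FLOWS
```

With these rules, a compromised camera in the `iot` segment cannot reach an IT server directly on any port. It can only talk to its gateway.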

10) Actively Monitor IoT Devices

We can’t stress this enough: real-time monitoring, reporting and alerting are imperative for organizations to manage their IoT risks.

Traditional endpoint security solutions often don’t work with IoT devices, so a new approach is required. That means real-time monitoring for unusual behavior. Just as with a Zero Trust network, don’t let IoT devices access your network without keeping a constant eye on them.
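One simple form of monitoring for unusual behavior is to learn each device's normal traffic volume and alert when a reading falls far outside it. The Python sketch below uses a z-score against recent history. The metric (bytes per minute), the sample numbers, and the three-standard-deviation threshold are assumptions; commercial IoT monitoring tools use richer models.

```python
from statistics import mean, stdev

def is_anomalous(history, current, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the device's baseline."""
    if len(history) < 2:
        return False              # not enough data for a baseline yet
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return current != mu      # perfectly steady device: any change is unusual
    return abs(current - mu) / sigma > threshold
```

A thermostat that usually sends about 125 bytes a minute and suddenly sends 5,000 would be flagged, which is the kind of change that can indicate a device has been recruited into a botnet.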

Guide to Database Management https://www.datamation.com/data-center/guide-to-database-management/ Thu, 11 Mar 2021 18:49:08 +0000

A Database Management System (DBMS) is a system for managing digital databases, standalone or multiple, including the storage of database content, creation and maintenance of data, search, and other core data functions.

An effective Database Management System is essential for competitive use of data analytics software. To data mine their vast store of information, companies rely heavily on their DBMS. Indeed, the DBMS is a core tool in the enterprise data center.

There are many Database Management Systems on the market, the most prominent being Oracle, IBM DB2, and Microsoft SQL Server. These products and many more give multiple users access based on different privilege levels and are controlled centrally by a single administrator, the database administrator, or DBA.

A DBA is in charge of maintaining the database(s), with their primary responsibility to maintain data integrity. This means the DBA has to ensure that data does not get corrupted, is backed up and immediately recoverable and is secure from unauthorized access.

Four Key Components to a DBMS

Four data technologies define any given Database Management System.

1) The modeling language, which defines the language of each database hosted in the DBMS. There are a number of approaches – hierarchical, network, relational, and object – but relational is the most popular and ubiquitous.

2) Data structures, which help organize the data such as individual records, files, fields and their definitions and objects such as visual media.

3) Data query language, which handles how queries are made to the system and maintains the security of the database by monitoring data access rights of users. The overwhelming favorite is the SQL language, especially in Relational Database Management Systems.

4) Finally, the transaction mechanism, which protects data integrity and makes sure the same record will not be modified by multiple users at the same time.
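The query language and the transaction mechanism work together. This sketch uses Python's built-in `sqlite3` module as a stand-in DBMS, with a hypothetical accounts table. A transfer issues two SQL updates inside one transaction, so if the second violates an integrity rule (no negative balances), the first is rolled back too. SQLite serializes writers by locking the whole database, while server DBMSs such as the ones named above use finer-grained locking or multiversion concurrency control, but the all-or-nothing guarantee is the same.

```python
import sqlite3
from typing import Dict

def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE accounts (
        owner   TEXT PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0))""")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn

def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Both updates commit together or not at all."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE owner = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE owner = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False  # CHECK failed, so the credit to dst was rolled back as well

def balances(conn: sqlite3.Connection) -> Dict[str, int]:
    return dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY owner"))
```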

Database Management Key Issues

Your database administrator (DBA) has several issues that are top of mind in the day-to-day operation of his or her job. They include:

1. Scalability

No one likes a slow response time from a database. Speed is of the essence. The question is: do you scale up, using a very large server and database and increasing the CPUs, memory, and storage as demand grows? Or do you scale out, distributing the workload across multiple machines and geographic locations?
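Scaling out requires a rule for which machine owns which record. A common choice is hash-based sharding, sketched here in Python with hypothetical customer keys. The hash must be stable across processes, which rules out Python's built-in `hash()` for strings because it is randomized per process, so the sketch uses SHA-256. With plain modulo, changing the server count remaps most keys; consistent hashing is the usual refinement when servers are added often.

```python
import hashlib
from collections import defaultdict
from typing import Dict, Iterable, List

def shard_for(key: str, shard_count: int) -> int:
    """Map a record key to a server index with a hash that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count

def route(keys: Iterable[str], shard_count: int) -> Dict[int, List[str]]:
    """Group keys by the server that owns them."""
    plan: Dict[int, List[str]] = defaultdict(list)
    for key in keys:
        plan[shard_for(key, shard_count)].append(key)
    return dict(plan)
```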

2. Cybersecurity

Data breaches inevitably involve a database. Your security has to work every time; the bad guys only need to be right once. That means constant monitoring and staying on top of patches and vulnerabilities. And educate your staff on corporate data security policies.

3. New data sources

It used to be that databases were filled with traditional data sources, like user input and transactions. But in recent years, data sources have expanded to include IoT, sensors, and social media. This data explosion has companies struggling to cope with the influx of data and rapidly expanding databases.

4. Decentralized data

Databases have traditionally been held in a single, giant repository, but there has been a trend toward decentralized storage, especially as companies expand globally. There are benefits to decentralized data management, such as redundancy and load balancing, but there are challenges as well, such as how the data will be distributed, what the best decentralization method is, and regulatory/compliance issues.

5. Database Talent

DBAs are rare. Good DBAs are even more rare. The result is salaries for experienced DBAs can easily run into the six-figure range, and that’s if they are on the market for a job to begin with. But you can’t live without them, either, so DBAs are going to be a necessary expense.

6. Siloing

This is one of those issues that requires considerable foresight and planning. Companies start small and grow. Or perhaps they merge with another. The result is multiple databases with multiple silos of information that have no connection to each other. This means a major overhaul of the back end to erase the silos and give a full picture. A little planning in the beginning can save a lot of heavy lifting later.

7. Commercial or Open Source

Most industries consolidate over time, but there is a rich choice of open source databases on the market, including PostgreSQL, MariaDB, CockroachDB, Neo4j, MongoDB, Redis, Cassandra, SQLite and more. The main reason for the variety is they all have specialties; CockroachDB, for example, specializes in distributed clusters rather than a single monolithic database.

While free to use, open source databases aren’t as well supported as Oracle or SQL Server and talent might prove harder to come by. As the saying goes, you get what you pay for.

8. Disaster recovery

In the good old days of a decade or two ago, DBAs’ backup plans typically consisted of making regular database backups and storing them off-site. If a disaster occurred, the backup could then be used to restore databases at their last point of backup.

That doesn’t work anymore. Too much data is coming in to allow for a once-a-week backup, because you could lose almost a week of data. Databases may also be too large to back up conveniently or in a timely manner.

There are two options for DBAs: hefty on-premises backup, including high-capacity disk arrays, high-capacity tape storage, and even hybrid hardware and software solutions, or the cloud. The former has the advantage of being immediately available but the drawback of being expensive, while the latter is convenient and requires no hardware purchase but may be subject to latency. A number of vendors provide data protection as a service or disaster recovery as a service, including IBM and Microsoft.
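Frequent backups only work if they can run while the database is in use. As a small-scale illustration, Python's `sqlite3` module exposes SQLite's online backup API, which copies a live database in chunks without stopping writers. Server DBMSs have their own backup tooling and log-based point-in-time recovery; this sketch, with hypothetical paths, only shows the pattern of a timestamped hot copy that a scheduler could run every hour instead of every week, shrinking the window of data at risk.

```python
import os
import sqlite3
from datetime import datetime, timezone

def backup_sqlite(src_path: str, backup_dir: str) -> str:
    """Copy a live SQLite database to a timestamped file using the online backup API."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    dest_path = os.path.join(backup_dir, f"backup-{stamp}.db")
    src = sqlite3.connect(src_path)
    dst = sqlite3.connect(dest_path)
    try:
        # Copy 1,024 pages per step so other connections are not locked out for long.
        src.backup(dst, pages=1024)
    finally:
        dst.close()
        src.close()
    return dest_path
```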

Moving to the Cloud

Moving a database to the cloud creates multiple issues, hence it warrants its own segment separate from the other key issues. The good news is that all of the major on-premises databases – Oracle, DB2, SQL Server, MySQL, and PostgreSQL – are available from major cloud services providers, along with smaller players like MariaDB and Apache Cassandra. This makes migration from on-premises much easier.

Moving to the cloud solves several on-prem issues. For starters, you don’t have to worry about DBAs because the cloud service provider manages the database for you. The cloud service provider also handles backup, patching, load balancing, and distribution, taking a number of chores off your hands.

But then there is the issue of data movement. Everything in the cloud is metered. Moving a multi-terabyte or even petabyte database to the cloud can become very expensive. And it can get even more expensive if you have to pull that data down on-prem for whatever reason.

It can also get difficult. There have been anecdotal stories of companies having to fight their cloud service provider (CSP) to get data back. Once your data has moved to the cloud, you become more dependent upon the provider and could be locked into its services, giving the CSP leverage over you in negotiating contract terms.

In moving to the cloud, don’t assume permanent residence. Make sure the contract describes the process by which your data will be returned, quantify the costs and set a time expectation for when the data is to be returned.

And once again the regulatory issue comes into play. The major CSPs all say they are HIPAA and financial regulatory compliant but you still might want to keep particularly sensitive data in-house.

The general consensus is to leave data on-premises if it’s already there, build new databases with new data in the cloud, and do all of the processing there. In short, move data back and forth as little as possible.

Database vs. Data Warehouse

Databases and data warehouses are sometimes confused, in no small part because of their names, but they are very different. Databases are either SQL (relational) or NoSQL (non-relational). A SQL database organizes information into rows and columns and is highly structured. A NoSQL or non-relational database uses any paradigm for storing data besides rows and columns and is ideal and popular for unstructured data.

A data warehouse is a system that aggregates, stores, and processes information in a structured format for the express purpose of analyzing business data to gain insight, a technique commonly referred to as Business Intelligence (BI).

So a standard database might hold customer information for a business but a data warehouse holds their transactions and is used to gain consumer insight; is there a particular time of day or day of the week when sales are very good or very bad? Is there a preferred method of purchase? What products have a high return rate? And so on.

So the primary purpose of a data warehouse is for companies to analyze all of their data to derive the most accurate business insights and forecasting models, whereas a database is more general storage.
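The business questions above map directly onto aggregate SQL queries over a transactions table. This sketch uses Python's `sqlite3` as a stand-in for a warehouse engine and a handful of hypothetical sales rows. Production warehouses use columnar storage and far larger tables, but the queries have the same shape.

```python
import sqlite3

def build_demo_warehouse() -> sqlite3.Connection:
    """A tiny fact table of hypothetical sales transactions."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE sales (
        product        TEXT,
        sold_at        TEXT,     -- ISO timestamp
        payment_method TEXT,
        amount         REAL,
        returned       INTEGER   -- 1 if the item came back
    )""")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?, ?)", [
        ("widget", "2021-03-01 09:15:00", "card", 20.0, 0),
        ("widget", "2021-03-01 18:40:00", "card", 20.0, 1),
        ("gadget", "2021-03-02 18:05:00", "cash", 35.0, 0),
        ("gadget", "2021-03-02 18:55:00", "card", 35.0, 0),
    ])
    return conn

# Is there a time of day when sales are very good or very bad?
SALES_BY_HOUR = """
    SELECT CAST(strftime('%H', sold_at) AS INTEGER) AS hour, SUM(amount) AS revenue
    FROM sales GROUP BY hour ORDER BY revenue DESC"""

# Is there a preferred method of purchase?
PAYMENT_MIX = """
    SELECT payment_method, COUNT(*) AS n
    FROM sales GROUP BY payment_method ORDER BY n DESC"""

# What products have a high return rate?
RETURN_RATE = """
    SELECT product, AVG(returned) AS return_rate
    FROM sales GROUP BY product ORDER BY return_rate DESC"""
```

On this sample, `conn.execute(SALES_BY_HOUR).fetchall()` shows the 6 p.m. hour well ahead of the morning.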

Database vs. Data Lake

A data lake is a repository, yet it bears little to no resemblance to a standard database. A data lake is a large storage repository that holds a huge amount of raw data in its original format and you don’t use it until you need it. Whereas your data is neatly structured in a data warehouse, data stored in data lakes isn’t processed until you use it.

Data lakes are primarily used by data scientists and engineers rather than business users. If you think DBAs are expensive, wait until you hire a data scientist. Data lakes are used for processing very large amounts of data of all types, so they are of little use to small and medium-sized businesses.
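The difference between the two approaches is when structure gets applied. In a lake, raw records are stored as received, and each query imposes its own structure when it runs, an approach usually called schema on read. This Python sketch uses hypothetical JSON event lines to show one query pulling page views out of mixed, partly malformed data without any of that data having been cleaned first.

```python
import json
from collections import Counter

# Raw events land in the lake exactly as received: no schema is enforced on write.
RAW_EVENTS = [
    '{"type": "click", "user": "u1", "page": "/pricing"}',
    '{"type": "sensor", "device": "t-17", "temp_c": 21.5}',
    '{"type": "click", "user": "u2", "page": "/pricing"}',
    'not json at all',
    '{"type": "click", "user": "u1", "page": "/docs"}',
]

def read_clicks(lines):
    """Schema on read: structure is imposed only when this query runs."""
    for line in lines:
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # malformed records stay in the lake; this query skips them
        if isinstance(event, dict) and event.get("type") == "click" and "page" in event:
            yield event["user"], event["page"]

page_views = Counter(page for _, page in read_clicks(RAW_EVENTS))
```

A different query could read the same lines for sensor temperatures instead, which is the flexibility the lake trades for the up-front cleaning a warehouse does.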

Guide to Network as a Service (NaaS) https://www.datamation.com/data-center/guide-to-networking-as-a-service/ Thu, 18 Feb 2021 00:29:45 +0000

In the era of cloud computing, many compute functions are being offered as on-demand services with metered use. While on-demand software could rightfully be traced back to the mainframes of the 1960s, arguably the modern era started in 1999, when Salesforce launched its CRM as software-as-a-service (SaaS).

Since then there have been numerous other as-a-service offerings, from virtualized operating environments, storage, application development, desktops and even disaster recovery.

Include network as a service (NaaS) in that collection. NaaS is the sale of network services from a third-party to companies that don’t want to build their own networking infrastructure. Like all of the other as-a-service offerings, network as a service offers its functionality on a subscription basis, through the cloud, and with metered use, so you only pay for what you use.

Foundation of Network as a Service

NaaS is a series of network and value-added services – plus computing and network resources – sold as a service by communications service providers, including IT hardware vendors, cloud service providers, and telcos.

NaaS is presented to the customer through a single self-service portal where the customer can order, deploy, and manage services on demand. This reduces the work needed and the level of expertise required to deploy services.

Like the other as-a-service offerings, it provides a networking setup through a subscription (an operating expense, or OPEX) rather than a large up-front acquisition cost (a capital expense, or CAPEX). That means that while you start off much cheaper, you have to watch your usage so that cumulative consumption costs don’t exceed what the CAPEX purchase would have cost.
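A rough way to watch that line is to find the month in which cumulative subscription spend overtakes buying the equipment outright. The Python sketch below assumes flat monthly figures and ignores financing, depreciation, and hardware refresh cycles, all of which a real comparison would include. Subscription spend after m months is s·m, ownership costs capex + u·m, so the subscription becomes more expensive once m > capex / (s − u).

```python
import math

def breakeven_month(capex, capex_monthly_upkeep, subscription_monthly):
    """First month in which cumulative subscription spend exceeds owning the gear.

    Returns None if the subscription never becomes the more expensive option.
    """
    extra_per_month = subscription_monthly - capex_monthly_upkeep
    if extra_per_month <= 0:
        return None
    return math.floor(capex / extra_per_month) + 1
```

With hypothetical figures of $120,000 up front plus $1,000 a month in upkeep versus a $6,000 monthly subscription, the two options are even after 24 months and the subscription costs more from month 25 onward.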

Networking as a service arose around 2015, as companies started to embrace software-defined networking (SDN). SDN decouples the network control and forwarding functions, enabling the network control to become directly programmable. This makes an SDN more adaptable, dynamic, manageable, and cost-effective than traditional networking because the SDN can adapt to networking changes on the fly as needed.

The abstraction of applications from the hardware layer enables the use of application programming interfaces (APIs) to orchestrate and manage the network infrastructure in a more flexible and extensible way. It became possible to program your network.

As SDN grew, companies started to virtualize the network process and use virtual logic entities to control the network instead of utilizing hardware switches and nodes. Companies were able to reduce complexity, increase network automation, eliminate manual configuration, centralize control and monitoring, and deploy applications and services faster by leveraging open APIs.
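The split between control and forwarding can be shown in a few lines. In this Python sketch, loosely modeled on the match-action flow tables of OpenFlow-style SDN (it is not the OpenFlow API), switches only look up packets in a table, while a central controller decides policy and programs every switch through method calls standing in for a northbound API. The addresses and action strings are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FlowRule:
    priority: int
    match: Dict[str, str]  # header fields that must match, e.g. {"dst_ip": "10.0.2.5"}
    action: str            # e.g. "forward:port3" or "drop"

@dataclass
class Switch:
    """Data plane: forwards by table lookup and makes no decisions of its own."""
    table: List[FlowRule] = field(default_factory=list)

    def handle(self, packet: Dict[str, str]) -> str:
        for rule in sorted(self.table, key=lambda r: -r.priority):
            if all(packet.get(k) == v for k, v in rule.match.items()):
                return rule.action
        return "send_to_controller"  # table miss: ask the control plane

class Controller:
    """Control plane: decides policy centrally and programs every switch."""
    def __init__(self, switches: Dict[str, Switch]):
        self.switches = switches

    def install(self, switch_id: str, rule: FlowRule) -> None:
        self.switches[switch_id].table.append(rule)

    def quarantine(self, ip: str) -> None:
        """One call changes forwarding behavior network-wide."""
        for switch in self.switches.values():
            switch.table.append(FlowRule(priority=100, match={"src_ip": ip}, action="drop"))
```

The `quarantine` method is the point of the design: isolating a host is one programmatic call instead of logging in to every switch to edit its configuration by hand.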

Three Services of Network as a Service

As the name implies, networking is sold as a service. There are three primary services under the NaaS umbrella:

  • Virtual private networks (VPN): NaaS extends a VPN and the resources contained in the network across other networks, like the public Internet. A traditional VPN is only a point-to-point connection, from a remote worker’s laptop to the company network; once traffic leaves the corporate network, it has no VPN protection. NaaS extends that protection outside the corporate firewall.
  • Bandwidth on demand (BoD): A technique by which network capacity is assigned based on requirements between different nodes or users. An app or user that suddenly needs more bandwidth can have its allocation adjusted dynamically.
  • Mobile network virtualization: This is a model where a telecommunications manufacturer or independent network operator – many of whom are NaaS providers – builds and operates a network and sells its communication access capabilities to third parties.
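The article does not say how providers divide capacity for bandwidth on demand, and their actual mechanisms vary. One common textbook policy is max-min fairness, sketched here in Python with hypothetical link capacity and demands: users asking for less than an equal share get everything they ask for, and the leftover capacity is split evenly among the rest. Re-running it whenever demands change is the "dynamic adjustment" in miniature.

```python
def allocate_bandwidth(capacity, demands):
    """Max-min fair share: satisfy small demands fully, split the rest evenly."""
    allocation = {}
    remaining = dict(demands)  # user -> requested bandwidth (e.g. Mbps)
    left = capacity
    while remaining:
        share = left / len(remaining)
        satisfied = {user: d for user, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone left wants more than an equal share: give each that share.
            for user in remaining:
                allocation[user] = share
            break
        for user, d in satisfied.items():
            allocation[user] = d
            left -= d
            del remaining[user]
    return allocation
```

On a 100 Mbps link with requests of 60 (backup), 10 (VoIP), and 40 (web), VoIP and web get all they asked for and the backup job receives the remaining 50.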

Benefits of NaaS

SDN set the stage for NaaS because it broke the dependency on physical servers and networking hardware in traditional networking setups. With NaaS, a lot of the network administration can also be outsourced, giving a company flexibility and freedom to manage a network with less in-house technical expertise.

This means a company can offload day-to-day maintenance of equipment and network administration and focus on their line of business.

That’s just the beginning. NaaS can include services such as Wide Area Networking (WAN) connectivity, data center connectivity, cloud connectivity, bandwidth on demand, security services, and other applications.

Because your networking is provided by a service provider, you are protected by service-level agreements (SLAs) that guarantee levels of availability, network uptime, and response and resolution times for addressing issues. A good NaaS provider will optimize your network for your needs, so it’s important to establish performance expectations before signing the contract.

NaaS Pros and Cons 

As with any as-a-service offering, you need to watch your usage. Data is being moved around more than ever and datasets are growing exponentially in this era of Big Data and artificial intelligence. You can easily run up a bill that eats up any potential savings realized from migrating to NaaS.

Another challenge is the tradeoffs that come with any kind of outsourcing, such as ceding too much control of your assets to the provider. Such issues have already come up with regard to storage and who owns the data. Setting expectations in the contract and SLA is key.

Legacy data centers may prove challenging to upgrade. If you are heavily dependent on pre-cloud MPLS technology and have deployed little SD-WAN, for example, you may have trouble migrating. Older hardware, like switches and routers, or on-premises applications not written for the cloud, may also prove problematic.

Finally there is always the risk of vendor lock-in. As mentioned earlier, cloud service providers have different specialties and moving off one may prove difficult because other providers don’t offer the same services. And there is always the risk that an organization may become too reliant on a particular service provider and become stuck with them.

Best Practices for NaaS

As with any as-a-service model, the best use is to make your firm more agile and responsive to any changes in your environment. Many companies with seasonal rushes, such as Christmas, rely on AWS, Azure, etc., for bursts of compute power when needed and then dial back their usage after the need has passed.

Rely on your NaaS provider to offer the network administration skills you might not have, so you can focus on your core business competency and not worry about network administration. Shifting from your enterprise network to NaaS lets you scale bandwidth much more quickly for increased mobile use or for extra capacity when needed. It also allows customers to use features and services they might not otherwise use because they lack the necessary skill sets in-house.

Likewise, the provider can help address capacity limitations behind the scenes so all you have to concern yourself with is rolling out line-of-business services and not setting up the network.

Who Sells NaaS and Where Do You Buy It?

Assessments of the growth of NaaS vary by research firm. Market Insights Reports predicts the NaaS market will grow from $6.5 billion in 2020 to $23.6 billion by 2026, at a compound annual growth rate (CAGR) of 38.2% during the forecast period.

According to Market Research Future, the NaaS market is growing at a scorching 28 percent CAGR.
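CAGR is the constant yearly rate that links a starting and an ending value: (end / start)^(1 / years) - 1. Applying that standard formula to the Market Insights Reports endpoints quoted above ($6.5 billion in 2020, $23.6 billion in 2026) gives roughly 24% a year, not 38.2%, so those figures do not reconcile as quoted and are worth checking against the original report. The difference may come from a different base year or definition, which the quote does not state. Either way, the estimates are best treated as directional.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate: the constant yearly rate linking two values."""
    return (end_value / start_value) ** (1 / years) - 1

# Endpoints as quoted above, in billions of US dollars, over 2020-2026.
implied = cagr(6.5, 23.6, 2026 - 2020)
print(f"{implied:.1%}")  # about 24% per year
```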

So who is selling NaaS? For starters, all of the major public cloud vendors – AWS, Azure, Google Cloud Platform, IBM Cloud, Rackspace, and so on. Major networking vendors like Cisco Systems, Juniper Networks, VMware, Aryaka Networks, and Brocade offer it, as do top communications firms like Alcatel-Lucent, AT&T, Ciena, Akamai Technologies, Broadcom, CenturyLink, Citrix Systems, and Verizon.

NaaS offerings vary from one provider to the next, depending on the provider’s specialty. For example, Aryaka offers WAN and secure virtual private networks (VPN) as a service, since it specializes in SD-WAN. Akamai offers CDN as a service because it is a content delivery network. Amazon, with hundreds of services, offers the widest variety.

Future of NaaS

NaaS is built on three emerging and growing technologies: SD-WAN, 5G, and Zero Trust networks.

To be clear, NaaS is the realization of SDN as the middle of the network for a cloud-centric enterprise architecture, or the “middle mile.” It brings SDN with its programmable networking to WAN services. But it only handles the middle mile. It does not address what is known as the “last mile,” and while some NaaS vendors may offer last mile connections between customers and POPs, not all do.

5G holds much promise but is still more vapor than product. As the expensive rollout continues, more services can be deployed around it. Of particular value is the network slicing of 5G, which virtualizes the network to ensure an isolated end-to-end connection that delivers all of the needed services.

The same holds true for Zero Trust, the next step beyond the VPN in securing a network. Zero Trust networks are only now being rolled out and will grow over the next few years as a much more secure replacement for the VPN.

So as SD-WAN, 5G, and Zero Trust networks grow – and they will grow rapidly – so will Network as a Service.

What is a Data Fabric? https://www.datamation.com/big-data/what-is-a-data-fabric/ Thu, 04 Feb 2021 00:32:39 +0000

Data fabrics are a new type of networking based on a very familiar design concept.

If you haven’t heard of the concept of a “data fabric,” you will – the trend is growing rapidly. Allied Market Research projects the data fabric market will grow to $4.5 billion by 2026, with much of that in the North American market, due to hyperscale cloud growth.

Data fabrics are emerging as an approach to help organizations better deal with fast growing volumes of data, ever-changing application requirements and distributed processing scenarios.

Think of data fabric as a web stretched over a large network that connects multiple locations, types, and sources of data, both on-premises and in the public cloud, with a variety of methods for accessing that data to process, move, manage, and store the data within the confines of the fabric.

To be a “fabric,” it must have redundant pathways and not depend on a single point-to-point connection, so if one connection is overloaded with data or otherwise unavailable, there are other pathways to the destination.

Does this sound familiar? It should: that’s how the Internet operates. The original Internet, ARPANET, was designed by the military to create a resilient and redundant network capable of surviving a nuclear attack by shifting from traditional hub-and-spoke network design to multiple pathways. Data fabrics work in the same manner.
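The value of redundant pathways is easy to demonstrate with a route search that skips failed links. This Python sketch uses breadth-first search over hypothetical site names. In a meshed topology, losing the direct datacenter-to-cloud link still leaves a route through the edge site, while in a hub-and-spoke topology, losing one spoke cuts the destination off entirely.

```python
from collections import deque

def find_path(links, src, dst, down=frozenset()):
    """Breadth-first search for a route that avoids failed links."""
    graph = {}
    for a, b in links:
        if (a, b) in down or (b, a) in down:
            continue  # this connection is unavailable
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no surviving route
```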

Elements of a Data Fabric

For the longest time, apps had their own unique approach to storing and retrieving data. Unifying all of this data can be quite a challenge, since every application stores the data in different formats. Data is stored in many places around the network and in different application silos. Sometimes the data is redundant and deduplication is required. All of this adds to the burden of unifying data.

A data fabric, then, crosses all the data stores and brings together the right data for the right application. We have touched on what makes a data fabric, but now let’s go into detail. What is needed to constitute a data fabric includes the following:

  • Create a unified data environment. A data fabric is not a connection to a storage array or database. It is a holistic connection of multiple and disparate data sources and should not omit any data source.
  • Combine data from multiple systems. It should pull data from everything from a mainframe to an AWS S3 storage repository.
  • Support multiple locations. It should support the on-premises data center, edge networks, and cloud computing environments.
  • Provide high availability and reliability. The fabric must be available at all times, resilient to high loads and even self-healing when there is a problem such as an unexpected outage.
  • Connect to any data source via connectors and components, eliminating the need for hard coding connections.
  • Provide seamless data ingestion and integration capabilities between the different data sources.
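The last requirement, integration between sources, can be illustrated by merging what different systems know about the same entity into one unified record. This is a minimal sketch under stated assumptions: records arrive as Python dicts, they share a join key (here a hypothetical `customer_id`), and the first value seen for a field wins while disagreements are logged rather than silently overwritten. Production tools use far richer matching and conflict-resolution rules.

```python
from typing import Any, Dict, Iterable


def integrate(sources: Dict[str, Iterable[dict]], key: str) -> Dict[Any, dict]:
    """Merge records from several systems into one unified record per key value.

    The first value seen for a field wins; later, disagreeing values are
    recorded under "_conflicts" instead of overwriting it.
    """
    unified: Dict[Any, dict] = {}
    for source_name, records in sources.items():
        for rec in records:
            merged = unified.setdefault(rec[key], {"_sources": [], "_conflicts": []})
            merged["_sources"].append(source_name)
            for field, value in rec.items():
                if field not in merged:
                    merged[field] = value
                elif merged[field] != value:
                    merged["_conflicts"].append((field, source_name, value))
    return unified


if __name__ == "__main__":
    # Hypothetical data: a mainframe billing system and a cloud CRM.
    billing = [{"customer_id": "C1", "balance": 120.0}]
    crm = [
        {"customer_id": "C1", "email": "ada@example.com", "balance": 95.0},
        {"customer_id": "C2", "email": "alan@example.com"},
    ]
    for cid, record in integrate({"billing": billing, "crm": crm}, "customer_id").items():
        print(cid, record)
```

Recording conflicts instead of resolving them silently is a deliberate choice in this sketch: it keeps the unified view usable while flagging the mismatched balance for someone to review.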

Data Fabrics and Big Data

Digital transformation is a major strategic priority for most organizations, and that means tapping into all resources, legacy and modern, across a variety of formats. The goal is to create a converged platform that supports the storage, processing, analysis, and management of disparate forms of data. This data can be drawn from a variety of sources, including files, database tables, data streams, objects, images, sensor data, and even container-based applications.

In short, a data fabric is a network of Big Data sources tied together through high-speed, redundant connections and standard interfaces such as the Network File System (NFS), the Portable Operating System Interface (POSIX), REST APIs, the Hadoop Distributed File System (HDFS), Open Database Connectivity (ODBC), and Apache Kafka.
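One common way to hide many interfaces behind a single access point is a connector registry that routes each request by URI scheme, so adding a source means registering a connector rather than hard-coding a new connection. The sketch below assumes that design; `connector`, `read`, `CONNECTORS`, and the `mem://` scheme are hypothetical. The `file` connector uses ordinary POSIX file access, which works the same whether the path is on local disk or an NFS mount; connectors for HDFS, ODBC, or Kafka would wrap their respective client libraries and are not shown.

```python
from typing import Callable, Dict, Iterator
from urllib.parse import urlparse

# A connector takes a URI and yields records as dicts.
Connector = Callable[[str], Iterator[dict]]
CONNECTORS: Dict[str, Connector] = {}


def connector(scheme: str) -> Callable[[Connector], Connector]:
    """Decorator that registers a connector for a URI scheme."""
    def register(fn: Connector) -> Connector:
        CONNECTORS[scheme] = fn
        return fn
    return register


def read(uri: str) -> Iterator[dict]:
    """Route a request to whichever connector handles the URI's scheme."""
    scheme = urlparse(uri).scheme
    if scheme not in CONNECTORS:
        raise ValueError(f"no connector registered for scheme {scheme!r}")
    return CONNECTORS[scheme](uri)


@connector("file")
def file_connector(uri: str) -> Iterator[dict]:
    # Plain POSIX file access: the same call works on local disk or an NFS mount.
    with open(urlparse(uri).path) as handle:
        for line in handle:
            yield {"line": line.rstrip("\n")}


# Hypothetical in-memory store standing in for a real backend.
_FAKE_STORE = {"orders": [{"id": 1, "total": 40}], "sensors": [{"id": "t1", "temp": 21.5}]}


@connector("mem")
def memory_connector(uri: str) -> Iterator[dict]:
    # For "mem://orders", urlparse puts "orders" in the netloc field.
    yield from _FAKE_STORE[urlparse(uri).netloc]


if __name__ == "__main__":
    for record in read("mem://sensors"):
        print(record)
```

Callers only ever use `read()`, so swapping a backend or adding a new protocol touches the registry, not the applications, which is the vendor- and protocol-neutral property described in the next paragraph.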

To truly engage in digital transformation and make the most of your data, you need to be able to access all of it, wherever it resides, without being overwhelmed by its sheer volume. That requires a data-centric organization that is not tied to a single vendor or protocol.

Why Data Fabrics Matter to Business

It’s often said that “data is the new oil.” Data drives competitive advantage for every business, and organizations need to deliver it quickly to serve business and customer needs. If you are not pursuing rapid data access from every part of your enterprise, from the mainframe to AWS stores, you aren’t serious about being competitive.

IT systems are more complex than ever, and they must work across environments that mix legacy applications and data with new microservice-based applications. A data fabric is a versatile, powerful way to achieve this important goal.

Data Fabrics Vendors

A data fabric is a software solution with significant hardware connectivity and networking requirements. Most of the big names in networking have an offering of some kind:

]]>