AWS Guide
    Amazon Neptune

    Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune supports the popular graph query languages Apache TinkerPop Gremlin and W3C’s SPARQL, which enable customers to build queries that efficiently navigate highly connected datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.

    • Amazon Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones. 
    • Amazon Neptune provides data security features, with support for encryption at rest and in transit. Neptune is fully managed, which means customers do not need to handle hardware provisioning, software patching, setup, configuration, or backups.

    Table of Contents

    • Amazon Neptune features
    • Amazon Neptune Component
      • Neptune replica
      • Primary DB
      • Cluster volume
    • Gremlin Console
    • Graph database
      • Recommendation Engines
      • Fraud Detection
      • Life Sciences
      • Knowledge Graphs

    Amazon Neptune features

    Customers can launch a database instance and connect their application within minutes without additional configuration. Database Parameter Groups provide granular control for fine-tuning the database.

    • Amazon Neptune provides Amazon CloudWatch metrics for customers' database instances, so they can use the AWS Management Console to view over 20 key operational metrics, including compute, memory, storage, query throughput, and active connections.
    • Customers can control if and when their instance is patched via Database Engine Version Management. Amazon Neptune can notify customers via email or SMS of important database events such as automated failover.
    • Amazon Neptune supports quick, efficient cloning operations, where entire multi-terabyte database clusters can be cloned in minutes. Cloning is useful for a number of purposes including application development, testing, database updates, and running analytical queries.

    Customers can scale the compute and memory resources powering their production cluster up or down by creating new replica instances of the desired size, or by removing instances. Compute scaling operations typically complete in a few minutes.

    • Amazon Neptune automatically grows the size of the database volume as storage needs grow. The volume can grow in increments of 10 GB up to a maximum of 64 TB.
    • Neptune read replicas increase read throughput to support high-volume application requests, and a cluster can have up to 15 of them. Because replicas share the storage volume of the primary instance, they do not need to perform writes, which frees up processing power to serve read requests and often reduces replica lag to single-digit milliseconds.

    Amazon Neptune allows fast, parallel bulk loading of Property Graph data stored in S3. Customers can use a REST interface to specify the S3 location of the data, which is loaded into nodes and edges from a CSV (comma-delimited) format.
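
    As a rough illustration of the CSV layout, the sketch below writes one vertex file and one edge file with Python's csv module. The `~id`, `~label`, `~from`, and `~to` system column headers follow Neptune's Gremlin load format; the sample rows, the `name:String` and `since:Int` property columns, and the helper name `build_csv` are invented for this example.

```python
import csv
import io

def build_csv(header, rows):
    """Render rows as CSV text with the given header line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()

# Vertex file: system columns ~id and ~label, then typed property columns.
vertices_csv = build_csv(
    ["~id", "~label", "name:String"],
    [["p1", "person", "Alice"], ["p2", "person", "Bob"]],
)

# Edge file: ~from and ~to hold the IDs of the source and target vertices.
edges_csv = build_csv(
    ["~id", "~from", "~to", "~label", "since:Int"],
    [["e1", "p1", "p2", "knows", 2019]],
)

print(vertices_csv)
print(edges_csv)
```

    In practice these files would be uploaded to S3 before a load is requested.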

    • RDF Bulk Loading: Amazon Neptune enables fast, parallel bulk loading of RDF data stored in S3. Customers can use a REST interface to specify the S3 location of the data.
      • The N-Triples (NT), N-Quads (NQ), RDF/XML, and Turtle RDF 1.1 serializations are supported.
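
    The REST call that starts a bulk load can be sketched as follows. The code only builds the request and does not send it. The host name, bucket, and IAM role ARN are placeholders, `build_load_request` is a hypothetical helper, and port 8182 is assumed to be the cluster port. The JSON field names (`source`, `format`, `iamRoleArn`, `region`, `failOnError`) are taken to be those of the Neptune loader API and should be checked against the current documentation.

```python
import json
import urllib.request

def build_load_request(endpoint, s3_uri, data_format, role_arn, region):
    """Build (but do not send) a POST request for the Neptune bulk loader."""
    payload = {
        "source": s3_uri,        # S3 location of the data files
        "format": data_format,   # e.g. "csv", "ntriples", "nquads", "rdfxml", "turtle"
        "iamRoleArn": role_arn,  # role that lets Neptune read the bucket
        "region": region,
        "failOnError": "TRUE",
    }
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/loader",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Placeholder values, for illustration only.
request = build_load_request(
    endpoint="my-neptune.example.com",
    s3_uri="s3://example-bucket/graph-data/",
    data_format="turtle",
    role_arn="arn:aws:iam::123456789012:role/ExampleNeptuneLoadRole",
    region="us-east-1",
)
print(request.get_method(), request.full_url)
```

    Passing the request to `urllib.request.urlopen` from a host that can reach the cluster would submit the load job.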

    Resource Description Framework (RDF) provides flexibility for modeling complex information domains. There are a number of existing free or public datasets available in RDF including Wikidata and PubChem, a database of chemical molecules. 

    • Amazon Neptune supports the W3C’s Semantic Web standards of RDF and SPARQL, and it provides an HTTP REST endpoint that implements the SPARQL Protocol.
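
    A SPARQL Protocol query can be sent as a form-encoded HTTP POST. The sketch below builds such a request without sending it; the host name is a placeholder, `build_sparql_request` is a hypothetical helper, and the `/sparql` path on port 8182 is an assumption about how the cluster is exposed.

```python
import urllib.parse
import urllib.request

def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query via HTTP POST."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/sparql",
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )

sparql_request = build_sparql_request(
    "my-neptune.example.com",  # placeholder host
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
)
print(sparql_request.full_url)
```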

    Amazon Neptune supports the Property Graph model using the open source Apache TinkerPop Gremlin traversal language and provides a Gremlin WebSocket server that supports TinkerPop version 3.3.

    • Using Neptune, AWS customers can quickly build fast Gremlin traversals over property graphs. Existing Gremlin applications can easily use Neptune by changing the Gremlin service configuration to point to a Neptune instance.
    • Neptune supports W3C’s Resource Description Framework (RDF) and SPARQL. RDF is popular because it provides flexibility for modeling complex information domains.

    Amazon Neptune uses graph structures such as nodes (data entities), edges (relationships), and properties to represent and store data. The relationships are stored as first-order citizens of the data model. This allows data in nodes to be directly linked, dramatically improving the performance of queries that navigate relationships in the data. The interactive performance at scale in Neptune enables a broad set of graph use cases.

    • A graph in a graph database can be traversed along specific edge types, or across the entire graph.
    • Graph databases can represent how entities relate by using actions, ownership, parentage, and so on.
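
    The difference between traversing specific edge types and traversing the whole graph can be shown with a toy in-memory model. This is only a sketch of the idea, not how Neptune stores or queries data; the `PropertyGraph` class and the sample people are invented for the example.

```python
from collections import defaultdict, deque

class PropertyGraph:
    """A toy in-memory property graph: nodes, labelled edges, properties."""

    def __init__(self):
        self.nodes = {}                     # node id -> property dict
        self.out_edges = defaultdict(list)  # node id -> [(edge label, target id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, label, target):
        self.out_edges[source].append((label, target))

    def reachable(self, start, edge_label=None):
        """Breadth-first search from start; optionally follow one edge label only."""
        seen, queue = {start}, deque([start])
        while queue:
            current = queue.popleft()
            for label, target in self.out_edges[current]:
                if edge_label is not None and label != edge_label:
                    continue
                if target not in seen:
                    seen.add(target)
                    queue.append(target)
        seen.discard(start)
        return seen

graph = PropertyGraph()
for person in ("alice", "bob", "carol"):
    graph.add_node(person, kind="person")
graph.add_node("car-1", kind="vehicle")
graph.add_edge("alice", "parent_of", "bob")    # parentage
graph.add_edge("bob", "parent_of", "carol")
graph.add_edge("alice", "owns", "car-1")       # ownership

print(sorted(graph.reachable("alice", edge_label="parent_of")))  # ['bob', 'carol']
print(sorted(graph.reachable("alice")))  # ['bob', 'car-1', 'carol']
```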

    Graph databases are useful for connected, contextual, relationship-driven data, such as social media data, recommendation engines, driving directions (route finding), logistics, diagnostics, and scientific data analysis in fields like neuroscience.

    • Another use case for graph databases is detecting fraud. For example, you can track credit card purchases and purchase locations to detect uncharacteristic use. Detecting fraudulent accounts is another example.

    Amazon Neptune enables customers to build knowledge graph applications. A knowledge graph lets them store information in a graph model and use graph queries to help their users navigate highly connected datasets more easily. Neptune supports open source and open standard APIs so that customers can quickly use existing information resources to build their knowledge graphs and host them on a fully managed service.

    • For example, suppose that a user is interested in the Mona Lisa by Leonardo da Vinci. The user can discover other works of art by the same artist or other works located in The Louvre. Using a knowledge graph, it is possible to add topical information to product catalogs, build and query complex models of regulatory rules, or model general information, like Wikidata.
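
    The Mona Lisa example can be sketched with a handful of subject-predicate-object facts held in a Python list. The predicate names `created_by` and `located_in` and the helper `related_works` are made up for illustration; a real knowledge graph would be queried with Gremlin or SPARQL rather than list comprehensions.

```python
# Each fact is a (subject, predicate, object) triple; predicate names are illustrative.
facts = [
    ("Mona Lisa", "created_by", "Leonardo da Vinci"),
    ("Mona Lisa", "located_in", "The Louvre"),
    ("The Last Supper", "created_by", "Leonardo da Vinci"),
    ("Venus de Milo", "located_in", "The Louvre"),
]

def related_works(work, predicate):
    """Return other works that share the same object for the given predicate."""
    shared = {o for s, p, o in facts if s == work and p == predicate}
    return sorted(
        s for s, p, o in facts
        if p == predicate and o in shared and s != work
    )

print(related_works("Mona Lisa", "created_by"))  # ['The Last Supper']
print(related_works("Mona Lisa", "located_in"))  # ['Venus de Milo']
```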

    Amazon Neptune helps customers build applications that store and navigate information in the life sciences, and process sensitive data easily using encryption at rest. For example, using Neptune, customers can store models of disease and gene interactions and search for graph patterns within protein pathways to find other genes that might be associated with a disease.

    • Amazon Neptune helps integrate information to tackle challenges in healthcare and life sciences research. With Neptune, customers can create and store patient relationships from medical records across different systems. They can also organize research publications by topic to find relevant information quickly.

    With Amazon Neptune, clients can store relationships between information categories such as customer interests, friends, and purchase history in a graph. They can then quickly query it to make recommendations that are personalized and relevant.

    • Using a highly available graph database, clients can make product recommendations to a user based on which products are purchased by others who follow the same sport and have similar purchase history. Or they can identify people who have a friend in common but don’t yet know each other, and make a friendship recommendation.

    With Amazon Neptune, you can use relationships to process financial and purchase transactions in near-real time to easily detect fraud patterns. Neptune provides a fully managed service to execute fast graph queries to detect that a potential purchaser is using the same email address and credit card as a known fraud case.

    • If you are building a retail fraud detection application, Neptune can help you build graph queries. These queries can help you easily detect relationship patterns, such as multiple people associated with a personal email address or multiple people who share the same IP address but reside at different physical addresses.

    Amazon Neptune Component

    The type of instance that a client specifies determines the hardware of the host computer used for the instance. Each instance type offers different compute, memory, and storage capabilities, and instance types are grouped into instance families based on these capabilities. Each instance type provides higher or lower minimum performance from a shared resource.

    Neptune replica

    A Neptune replica connects to the same storage volume as the primary DB instance and supports only read operations. Each Amazon Neptune DB cluster can have up to 15 Neptune replicas in addition to the primary DB instance. This provides high availability by locating Neptune replicas in separate Availability Zones and distributing load from reading clients.

    Primary DB

    The primary DB instance supports read and write operations and performs all of the data modifications to the cluster volume. Each Neptune DB cluster has one primary DB instance that is responsible for writing (that is, loading or modifying) graph database contents.

    Cluster volume

    Amazon Neptune data is stored in the cluster volume, which is designed for reliability and high availability. A cluster volume consists of copies of the data across multiple Availability Zones in a single AWS Region. Because your data is automatically replicated across Availability Zones, it is highly durable, and there is little possibility of data loss.

    Gremlin Console

    The Gremlin Console is a fairly standard REPL (Read Eval Print Loop) shell. It is based on the Groovy console, and if you have used other console environments such as those for Scala, Python, or Ruby, you will feel right at home. The console offers a low-overhead (you can set it up in seconds), low-barrier way to start experimenting with graphs on your local computer. It can work with graphs that are running locally or remotely, although local graphs are the simplest place to start.

    In Neptune, a Gremlin edge statement implies the existence of an edge between two vertices in a graph. The subject (S) of an edge statement is the source (from) vertex. The predicate (P) is a user-supplied edge label. The object (O) is the target (to) vertex. The graph (G) is a user-supplied edge identifier.

    • A Gremlin property statement in Neptune asserts an individual property value for a vertex or edge. The subject is a user-supplied vertex or edge identifier.
    • The predicate is the property name (key), and the object is the individual property value.
    • The graph (G) is again the default graph identifier, the null graph, displayed as <~>.
    • A property can be represented by storing the element identifier in the S position, the property key in the P position, and the property value in the O position.

    Property graph data in Amazon Neptune is composed of four-position (quad) statements. Each of these statements represents an individual atomic unit of property graph data.  Each quad is a statement that makes an assertion about one or more resources. A statement can assert the existence of a relationship between two resources, or it can attach a property (key-value pair) to a resource. One can think of the quad predicate value generally as the verb of the statement. It describes the type of relationship or property that’s being defined. The object is the target of the relationship, or the value of the property.  

    • User-facing values in a quad statement are usually stored separately in a dictionary index, where the statement indexes reference them using an 8-byte long term identifier.
    • The exception to this is numeric values, including date and datetime values (represented as milliseconds from the epoch). These can be stored inline directly in the statement indexes.
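
    The edge and property statements described above can be written out as four-position tuples. The sketch below models only the logical S, P, O, G layout given in the text; it says nothing about Neptune's physical storage or dictionary encoding, and the `Quad` type, helper functions, and sample identifiers are hypothetical.

```python
from typing import Any, NamedTuple

class Quad(NamedTuple):
    subject: str    # S: source vertex, or the element that owns a property
    predicate: str  # P: edge label or property key
    obj: Any        # O: target vertex or property value
    graph: str      # G: edge identifier, or the default graph for properties

DEFAULT_GRAPH = "<~>"  # the text says the null graph is displayed this way

def edge_statement(edge_id, from_vertex, label, to_vertex):
    """An edge: source vertex, label, target vertex, edge identifier."""
    return Quad(from_vertex, label, to_vertex, edge_id)

def property_statement(element_id, key, value):
    """A property on a vertex or edge, placed in the default graph."""
    return Quad(element_id, key, value, DEFAULT_GRAPH)

statements = [
    edge_statement("e1", "v1", "knows", "v2"),
    property_statement("v1", "name", "Alice"),
    property_statement("e1", "since", 2019),
]
for statement in statements:
    print(statement)
```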

    Amazon Neptune provides a Gremlin feature named explain, a self-service tool for understanding the execution approach taken by the Neptune engine. It is invoked by adding an explain parameter to an HTTP call that submits a Gremlin query. The explain feature provides information about the logical structure of query execution plans and can be used to identify potential evaluation and execution bottlenecks.
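
    An explain call might be assembled as in the sketch below, which builds the HTTP request without sending it. The host name is a placeholder and `build_explain_request` is a hypothetical helper. The `/gremlin/explain` path, port 8182, and the `{"gremlin": ...}` JSON body are assumptions based on the Neptune documentation and should be verified for your engine version.

```python
import json
import urllib.request

def build_explain_request(endpoint, gremlin_query):
    """Build (but do not send) a POST to the Gremlin explain endpoint."""
    return urllib.request.Request(
        url=f"https://{endpoint}:8182/gremlin/explain",
        data=json.dumps({"gremlin": gremlin_query}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

explain_request = build_explain_request(
    "my-neptune.example.com",  # placeholder host
    "g.V().hasLabel('person').limit(1)",
)
print(explain_request.full_url)
print(explain_request.data.decode("utf-8"))
```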

    Graph database

    Amazon Neptune is a graph database, purpose-built to store and navigate relationships. Graph databases have advantages over relational databases for certain use cases, including social networking, recommendation engines, and fraud detection, where you need to create relationships between data and quickly query those relationships. Building these types of applications on a relational database poses a number of challenges: it requires multiple tables with multiple foreign keys, and the SQL queries that navigate this data need nested queries and complex joins that quickly become unwieldy. Neptune uses graph structures such as nodes (data entities), edges (relationships), and properties to represent and store data. The relationships are stored as first-order citizens of the data model.

    • This allows data in nodes to be directly linked, which dramatically improves the performance of queries that navigate relationships in the data. The interactive performance at scale in Neptune enables a broad set of graph use cases.

    Graph databases can represent how entities relate by using actions, ownership, parentage, and so on. Whenever connections or relationships between entities are at the core of the data, a graph database is a natural choice.

    • Graph databases are useful for modeling and querying social networks, business relationships, dependencies, shipping movements, and similar items.

    Graph databases are useful for connected, contextual, relationship-driven data. Other use cases include recommendation engines, driving directions (route finding), logistics, diagnostics, and scientific data analysis in fields like neuroscience.

    Recommendation Engines

    Amazon Neptune enables customers to store relationships between information categories such as customer interests, friends, and purchase history in a graph. They can then quickly query it to make recommendations that are personalized and relevant. 

    • Customers can use a highly available graph database to make product recommendations to a user based on which products are purchased by others who follow the same sport and have similar purchase history. Or they can identify people who have a friend in common but don’t yet know each other, and make a friendship recommendation.
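
    The friend-in-common recommendation can be sketched in plain Python. This toy version counts mutual friends over an in-memory adjacency map, standing in for what would be a graph traversal in Neptune; the names and the `recommend_friends` helper are invented.

```python
from collections import Counter

# Undirected friendships; the names are invented for this example.
friendships = [("ana", "ben"), ("ben", "cy"), ("ana", "dee"), ("dee", "cy"), ("cy", "eli")]

friends = {}
for a, b in friendships:
    friends.setdefault(a, set()).add(b)
    friends.setdefault(b, set()).add(a)

def recommend_friends(person):
    """Rank non-friends by how many friends they share with person."""
    mutual = Counter()
    for friend in friends.get(person, ()):
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    # Sort by mutual-friend count (descending), then by name for stable output.
    return sorted(mutual.items(), key=lambda item: (-item[1], item[0]))

print(recommend_friends("ana"))  # [('cy', 2)]
print(recommend_friends("ben"))  # [('dee', 2), ('eli', 1)]
```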

    Fraud Detection

    Using Amazon Neptune, customers can use relationships to process financial and purchase transactions in near-real time to easily detect fraud patterns. Neptune provides a fully managed service to execute fast graph queries to detect that a potential purchaser is using the same email address and credit card as a known fraud case. 

    • With Neptune, customers can build graph queries. These queries can help them detect relationship patterns, such as multiple people associated with a personal email address or multiple people who share the same IP address but reside at different physical addresses.
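
    A minimal sketch of the shared-attribute pattern follows. It groups accounts by a common email address or IP address in memory; in Neptune the same question would be a graph query over account and attribute vertices. The records and the `shared_attributes` helper are made up for the example.

```python
from collections import defaultdict

# (account, attribute type, value) records; all values are made up.
records = [
    ("acct-1", "email", "shared@example.com"),
    ("acct-2", "email", "shared@example.com"),
    ("acct-3", "email", "carol@example.com"),
    ("acct-1", "ip", "203.0.113.7"),
    ("acct-3", "ip", "203.0.113.7"),
    ("acct-1", "address", "1 Main St"),
    ("acct-3", "address", "9 Oak Ave"),
]

def shared_attributes(records, attribute):
    """Map each attribute value to the accounts using it, keeping only shared ones."""
    accounts_by_value = defaultdict(set)
    for account, kind, value in records:
        if kind == attribute:
            accounts_by_value[value].add(account)
    return {v: sorted(a) for v, a in accounts_by_value.items() if len(a) > 1}

print(shared_attributes(records, "email"))  # {'shared@example.com': ['acct-1', 'acct-2']}
print(shared_attributes(records, "ip"))     # {'203.0.113.7': ['acct-1', 'acct-3']}
```

    Here acct-1 and acct-3 share an IP address but list different physical addresses, which is the kind of pattern the text describes.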

    Life Sciences

    Amazon Neptune enables customers to build applications that store and navigate information in the life sciences, and process sensitive data easily using encryption at rest. 

    • Using Neptune, they can store models of disease and gene interactions. It can be used to search graph patterns within protein pathways to find other genes that might be associated with a disease. 
    • Neptune helps integrate information to tackle challenges in healthcare and life sciences research. It can be used to create and store patient relationships from medical records across different systems.

    Knowledge Graphs

    Amazon Neptune allows customers to build knowledge graph applications. A knowledge graph lets them store information in a graph model and use graph queries to help their users navigate highly connected datasets more easily.

    • Neptune supports open source and open standard APIs so that customers can quickly use existing information resources to build knowledge graphs and host them on a fully managed service.
    • Using a knowledge graph, customers can add topical information to product catalogs, build and query complex models of regulatory rules, or model general information, like Wikidata.

    © 2021