Hexagonal Architecture in the Frontend: A Real Case
A Python list is a versatile data structure that allows you to easily store a large amount of data in a compact manner. Lists are widely used by Python developers and support many useful functions out of the box. Often you may need to work with multiple lists, or a list of lists, and iterate over them sequentially, one after another. There are several simple ways to do this. In this article, we will learn how to go through multiple Python lists in a sequential manner. Let us say you have the following 3 lists. Python L1 = [1, 2, 3] L2 = [4, 5, 6] L3 = [7, 8, 9] 1. Using itertools.chain() itertools is a very useful standard-library module that provides many functions for working with iterable data structures such as lists. You can use the itertools.chain() function to quickly go through multiple lists sequentially. Here is an example of iterating through lists L1, L2, and L3 using the chain() function. Python >>> import itertools >>> for i in itertools.chain(L1, L2, L3): print(i) 1 2 3 4 5 6 7 8 9 Using itertools is one of the fastest and most memory-efficient ways to go through multiple lists because it works with iterators: chain() yields one item at a time instead of first building a new combined list in memory, as concatenating the lists (for example, L1 + L2 + L3) would. 2. Using a for Loop Sometimes you may have a list of lists, as shown below. Python L4 = [L1, L2, L3] print(L4) [[1, 2, 3], [4, 5, 6], [7, 8, 9]] In such cases, you can use a nested for loop to iterate through these lists. Python >>> for i in L4: for j in i: print(j) 1 2 3 4 5 6 7 8 9 Alternatively, you can also use itertools.chain() to flatten a list of lists by unpacking it (itertools.chain.from_iterable(L4) works the same way). Python >>> for i in itertools.chain(*L4): print(i) 1 2 3 4 5 6 7 8 9 3. Using the Star Operator The above-mentioned methods work with most Python versions. But if you use Python 3+, you can also use the star (*) operator to unpack the lists into a single new list. Note that, unlike chain(), this builds the combined list in memory. Python for i in [*L1, *L2, *L3]: print(i) 1 2 3 4 5 6 7 8 9 4. Using zip() So far, in each of the above cases, all items of the first list are displayed, followed by all items of the second list, and so on. But sometimes you may need to sequentially process the first item of each list, followed by the second item of each list, and so on. For this kind of sequential order, you can use the built-in zip() function (in Python 2, the equivalent was itertools.izip()). Here is an example to illustrate it. Python for i in zip(*L4): for j in i: print(j) 1 4 7 2 5 8 3 6 9 Notice the difference in sequence. In this case, the output is the first item of each list (1, 4, 7), followed by the second item of each list (2, 5, 8), and so on. This is different from the sequence of the first list's items (1, 2, 3) followed by the second list's items (4, 5, 6), and so on. Conclusion In this article, we have learned several simple ways to sequentially iterate over multiple lists in Python. Basically, there are two ways to do this. The first approach is when you need to process all items of one list before moving to the next one. The second approach is where you need to process the first item of each list, then the second item of each list, and so on. In the first case, you can use the itertools.chain() function, a for loop, or the star (*) operator. In the second case, you can use the built-in zip() function.
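One caveat the article above does not cover: zip() stops as soon as the shortest list is exhausted, so trailing items of longer lists are silently dropped. A small sketch (not from the original article; the list names are illustrative) shows how itertools.zip_longest() keeps those items by padding with a fill value:

Python
import itertools

A = [1, 2, 3]
B = [4, 5]

# zip() stops after the shorter list runs out
for pair in zip(A, B):
    print(pair)   # (1, 4) (2, 5)

# zip_longest() pads the shorter list with fillvalue instead
for pair in itertools.zip_longest(A, B, fillvalue=0):
    print(pair)   # (1, 4) (2, 5) (3, 0)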
While debugging in an IDE or using simple command line tools is relatively straightforward, the real challenge lies in production debugging. Modern production environments have enabled sophisticated self-healing deployments, yet they have also made troubleshooting more complex. Kubernetes (aka k8s) is probably the most well-known orchestration production environment. To effectively teach debugging in Kubernetes, it's essential to first introduce its fundamental principles. This part of the debugging series is designed for developers looking to effectively tackle application issues within Kubernetes environments, without delving deeply into the complex DevOps aspects typically associated with its operations. Kubernetes is a big subject: it took me two videos just to explain the basic concepts and background. Introduction to Kubernetes and Distributed Systems Kubernetes, while often discussed in the context of cloud computing and large-scale operations, is not just a tool for managing containers. Its principles apply broadly to all large-scale distributed systems. In this post I want to explore Kubernetes from the ground up, emphasizing its role in solving real-world problems faced by developers in production environments. The Evolution of Deployment Technologies Before Kubernetes, the deployment landscape was markedly different. Understanding this evolution helps us appreciate the challenges Kubernetes aims to solve. The image below represents the road to Kubernetes and the technologies we passed along the way. In the image, we can see that initially, applications were deployed directly onto physical servers. This process was manual, error-prone, and difficult to replicate across multiple environments. For instance, if a company needed to scale its application, it involved procuring new hardware, installing operating systems, and configuring the application from scratch. This could take weeks or even months, leading to significant downtime and operational inefficiencies. Imagine a retail company preparing for the holiday season surge. Each time they needed to handle increased traffic, they would manually set up additional servers. This was not only time-consuming but also prone to human error. Scaling down after the peak period was equally cumbersome, leading to wasted resources. Enter Virtualization Virtualization technology introduced a layer that emulated the hardware, allowing for easier replication and migration of environments but at the cost of performance. However, fast virtualization enabled the cloud revolution. It lets companies like Amazon lease their servers at scale without compromising their own workloads. Virtualization involves running multiple operating systems on a single physical hardware host. Each virtual machine (VM) includes a full copy of an operating system, the application, necessary binaries, and libraries—taking up tens of GBs. VMs are managed via a hypervisor, such as VMware's ESXi or Microsoft's Hyper-V, which sits between the hardware and the operating system and is responsible for distributing hardware resources among the VMs. This layer adds additional overhead and can lead to decreased performance due to the need to emulate hardware. Note that virtualization is often referred to as "virtual machines," but I chose to avoid that terminology due to the focus of this blog on Java and the JVM where a virtual machine is typically a reference to the Java Virtual Machine (JVM). 
Rise of Containers Containers emerged as a lightweight alternative to full virtualization. Tools like Docker standardized container formats, making it easier to create and manage containers without the overhead associated with traditional virtual machines. Containers encapsulate an application’s runtime environment, making them portable and efficient. Unlike virtualization, containerization encapsulates an application in a container with its own operating environment, but it shares the host system’s kernel with other containers. Containers are thus much more lightweight, as they do not require a full OS instance; instead, they include only the application and its dependencies, such as libraries and binaries. This setup reduces the size of each container and improves boot times and performance by removing the hypervisor layer. Containers operate using several key Linux kernel features: Namespaces: Containers use namespaces to provide isolation for global system resources between independent containers. This includes aspects of the system like process IDs, networking interfaces, and file system mounts. Each container has its own isolated namespace, which gives it a private view of the operating system with access only to its resources. Control groups (cgroups): Cgroups further enhance the functionality of containers by limiting and prioritizing the hardware resources a container can use. This includes parameters such as CPU time, system memory, network bandwidth, or combinations of these resources. By controlling resource allocation, cgroups ensure that containers do not interfere with each other’s performance and maintain the efficiency of the underlying server. Union file systems: Containers use union file systems, such as OverlayFS, to layer files and directories in a lightweight and efficient manner. This system allows containers to appear as though they are running on their own operating system and file system, while they are actually sharing the host system’s kernel and base OS image. Rise of Orchestration As containers began to replace virtualization due to their efficiency and speed, developers and organizations rapidly adopted them for a wide range of applications. However, this surge in container usage brought with it a new set of challenges, primarily related to managing large numbers of containers at scale. While containers are incredibly efficient and portable, they introduce complexities when used extensively, particularly in large-scale, dynamic environments: Management overhead: Manually managing hundreds or even thousands of containers quickly becomes unfeasible. This includes deployment, networking, scaling, and ensuring availability and security. Resource allocation: Containers must be efficiently scheduled and managed to optimally use physical resources, avoiding underutilization or overloading of host machines. Service discovery and load balancing: As the number of containers grows, keeping track of which container offers which service and how to balance the load between them becomes critical. Updates and rollbacks: Implementing rolling updates, managing version control, and handling rollbacks in a containerized environment require robust automation tools. To address these challenges, the concept of container orchestration was developed. Orchestration automates the scheduling, deployment, scaling, networking, and lifecycle management of containers, which are often organized into microservices. 
Efficient orchestration tools help ensure that the entire container ecosystem is healthy and that applications are running as expected. Enter Kubernetes Among the orchestration tools, Kubernetes emerged as a frontrunner due to its robust capabilities, flexibility, and strong community support. Kubernetes offers several features that address the core challenges of managing containers: Automated scheduling: Kubernetes intelligently schedules containers on the cluster’s nodes, taking into account the resource requirements and other constraints, optimizing for efficiency and fault tolerance. Self-healing capabilities: It automatically replaces or restarts containers that fail, ensuring high availability of services. Horizontal scaling: Kubernetes can automatically scale applications up and down based on demand, which is essential for handling varying loads efficiently. Service discovery and load balancing: Kubernetes can expose a container using the DNS name or using its own IP address. If traffic to a container is high, Kubernetes is able to load balance and distribute the network traffic so that the deployment is stable. Automated rollouts and rollbacks: Kubernetes allows you to describe the desired state for your deployed containers using declarative configuration, and can change the actual state to the desired state at a controlled rate, such as to roll out a new version of an application. Why Kubernetes Stands Out Kubernetes not only solves practical, operational problems associated with running containers but also integrates with the broader technology ecosystem, supporting continuous integration and continuous deployment (CI/CD) practices. It is backed by the Cloud Native Computing Foundation (CNCF), ensuring it remains cutting-edge and community-focused. There used to be a site called "doyouneedkubernetes.com," and when you visited that site, it said, "No." Most of us don't need Kubernetes and it is often a symptom of Resume Driven Design (RDD). However, even when we don't need its scaling capabilities the advantages of its standardization are tremendous. Kubernetes became the de-facto standard and created a cottage industry of tools around it. Features such as observability and security can be plugged in easily. Cloud migration becomes arguably easier. Kubernetes is now the "lingua franca" of production environments. Kubernetes For Developers Understanding Kubernetes architecture is crucial for debugging and troubleshooting. The following image shows the high-level view of a Kubernetes deployment. There are far more details in most tutorials geared towards DevOps engineers, but for a developer, the point that matters is just "Your Code" - that tiny corner at the edge. In the image above we can see: Master node (represented by the blue Kubernetes logo on the left): The control plane of Kubernetes, responsible for managing the state of the cluster, scheduling applications, and handling replication Worker nodes: These nodes contain the pods that run the containerized applications. Each worker node is managed by the master. Pods: The smallest deployable units created and managed by Kubernetes, usually containing one or more containers that need to work together These components work together to ensure that an application runs smoothly and efficiently across the cluster. Kubernetes Basics In Practice Up until now, this post has been theory-heavy. Let's now review some commands we can use to work with a Kubernetes cluster. 
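The commands below assume that at least one workload is already running in the cluster. As a purely hypothetical illustration (the names and image are placeholders, not something deployed earlier in this series), such a workload could be created by applying a minimal Deployment manifest with kubectl apply -f deployment.yaml:

YAML
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-first-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: nginx:1.25   # placeholder image; any containerized app works
          ports:
            - containerPort: 80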
First, we would want to list the pods we have within the cluster which we can do using the get pods command as such: $ kubectl get pods NAME READY STATUS RESTARTS AGE my-first-pod-id-xxxx 1/1 Running 0 13s my-second-pod-id-xxxx 1/1 Running 0 13s A command such as kubectl describe pod returns a high-level description of the pod such as its name, parent node, etc. Many problems in production pods can be solved by looking at the system log. This can be accomplished by invoking the logs command: $ kubectl logs -f <pod> [2022-11-29 04:12:17,262] INFO log data ... Most typical large-scale application logs are ingested by tools such as Elastic, Loki, etc. As such, the logs command isn't as useful in production except for debugging edge cases. Final Word This introduction to Kubernetes has set the stage for deeper exploration into specific debugging and troubleshooting techniques, which we will cover in the upcoming posts. The complexity of Kubernetes makes it much harder to debug, but there are facilities in place to work around some of that complexity. While this article (and its follow-ups) focus on Kubernetes, future posts will delve into observability and related tools, which are crucial for effective debugging in production environments.
Do you need to write a lot of mapping code in order to map between different object models? MapStruct simplifies this task by generating mapping code. In this blog, you will learn some basic features of MapStruct. Enjoy! Introduction In a multi-layered application, one often has to write boilerplate code in order to map different object models. This can be a tedious and an error-prone task. MapStruct simplifies this task by generating the mapping code for you. It generates code during compile time and aims to generate the code as if it was written by you. This blog will only give you a basic overview of how MapStruct can aid you, but it will be sufficient to give you a good impression of which problem it can solve for you. If you are using IntelliJ as an IDE, you can also install the MapStruct Support Plugin which will assist you in using MapStruct. Sources used in this blog can be found on GitHub. Prerequisites Prerequisites for this blog are: Basic Java knowledge, Java 21 is used in this blog Basic Spring Boot knowledge Basic Application The application used in this blog is a basic Spring Boot project. By means of a Rest API, a customer can be created and retrieved. In order to keep the API specification and source code in line with each other, you will use the openapi-generator-maven-plugin. First, you write the OpenAPI specification and the plugin will generate the source code for you based on the specification. The OpenAPI specification consists out of two endpoints, one for creating a customer (POST) and one for retrieving the customer (GET). The customer consists of its name and some address data. YAML Customer: type: object properties: firstName: type: string description: First name of the customer minLength: 1 maxLength: 20 lastName: type: string description: Last name of the customer minLength: 1 maxLength: 20 street: type: string description: Street of the customer minLength: 1 maxLength: 20 number: type: string description: House number of the customer minLength: 1 maxLength: 5 postalCode: type: string description: Postal code of the customer minLength: 1 maxLength: 5 city: type: string description: City of the customer minLength: 1 maxLength: 20 The CustomerController implements the generated Controller interface. The OpenAPI maven plugin makes use of its own model. In order to transfer the data to the CustomerService, DTOs are created. These are Java records. The CustomerDto is: Java public record CustomerDto(Long id, String firstName, String lastName, AddressDto address) { } The AddressDto is: Java public record AddressDto(String street, String houseNumber, String zipcode, String city) { } The domain itself is used within the Service and is a basic Java POJO. The Customer domain is: Java public class Customer { private Long customerId; private String firstName; private String lastName; private Address address; // Getters and setters left out for brevity } The Address domain is: Java public class Address { private String street; private int houseNumber; private String zipcode; private String city; // Getters and setters left out for brevity } In order to connect everything together, you will need to write mapper code for: Mapping between the API model and the DTO Mapping between the DTO and the domain Mapping Between DTO and Domain Add Dependency In order to make use of MapStruct, it suffices to add the MapStruct Maven dependency and to add some configuration to the Maven Compiler plugin. 
XML <dependency> <groupId>org.mapstruct</groupId> <artifactId>mapstruct</artifactId> <version>${org.mapstruct.version}</version> </dependency> ... <build> <plugins> ... <plugin> <groupId>org.apache.maven.plugins</groupId> <artifactId>maven-compiler-plugin</artifactId> <version>3.8.1</version> <configuration> <annotationProcessorPaths> <path> <groupId>org.mapstruct</groupId> <artifactId>mapstruct-processor</artifactId> <version>${org.mapstruct.version}</version> </path> </annotationProcessorPaths> </configuration> </plugin> ... </plugins> </build> Create Mapper The CustomerDto, AddressDto and the Customer, Address domains do not differ very much from each other. CustomerDto has an id while Customer has a customerId. AddressDto has a houseNumber of the type String while Address has a houseNumber of the type integer. In order to create a mapper for this using MapStruct, you create an interface CustomerMapper, annotate it with @Mapper, and specify the component model with the value spring. Doing this will ensure that the generated mapper is a singleton-scoped Spring bean that can be retrieved via @Autowired. Because both models are quite similar to each other, MapStruct will be able to generate most of the code by itself. Because the customer id has a different name in both models, you need to help MapStruct a bit. Using the @Mapping annotation, you specify the source and target mapping. For the type conversion, you do not need to do anything, MapStruct can sort this out based on the implicit type conversions. The corresponding mapper code is the following: Java @Mapper(componentModel = "spring") public interface CustomerMapper { @Mapping(source = "customerId", target = "id") CustomerDto transformToCustomerDto(Customer customer); @Mapping(source = "id", target = "customerId") Customer transformToCustomer(CustomerDto customerDto); } Generate the code: Shell $ mvn clean compile In the target/generated-sources/annotations directory, you can find the generated CustomerMapperImpl class. 
Java @Generated( value = "org.mapstruct.ap.MappingProcessor", date = "2024-04-21T13:38:51+0200", comments = "version: 1.5.5.Final, compiler: javac, environment: Java 21 (Eclipse Adoptium)" ) @Component public class CustomerMapperImpl implements CustomerMapper { @Override public CustomerDto transformToCustomerDto(Customer customer) { if ( customer == null ) { return null; } Long id = null; String firstName = null; String lastName = null; AddressDto address = null; id = customer.getCustomerId(); firstName = customer.getFirstName(); lastName = customer.getLastName(); address = addressToAddressDto( customer.getAddress() ); CustomerDto customerDto = new CustomerDto( id, firstName, lastName, address ); return customerDto; } @Override public Customer transformToCustomer(CustomerDto customerDto) { if ( customerDto == null ) { return null; } Customer customer = new Customer(); customer.setCustomerId( customerDto.id() ); customer.setFirstName( customerDto.firstName() ); customer.setLastName( customerDto.lastName() ); customer.setAddress( addressDtoToAddress( customerDto.address() ) ); return customer; } protected AddressDto addressToAddressDto(Address address) { if ( address == null ) { return null; } String street = null; String houseNumber = null; String zipcode = null; String city = null; street = address.getStreet(); houseNumber = String.valueOf( address.getHouseNumber() ); zipcode = address.getZipcode(); city = address.getCity(); AddressDto addressDto = new AddressDto( street, houseNumber, zipcode, city ); return addressDto; } protected Address addressDtoToAddress(AddressDto addressDto) { if ( addressDto == null ) { return null; } Address address = new Address(); address.setStreet( addressDto.street() ); if ( addressDto.houseNumber() != null ) { address.setHouseNumber( Integer.parseInt( addressDto.houseNumber() ) ); } address.setZipcode( addressDto.zipcode() ); address.setCity( addressDto.city() ); return address; } } As you can see, the code is very readable and it has taken into account the mapping of Customer and Address. Create Service The Service will create a domain Customer taken the CustomerDto as an input. The customerMapper is injected into the Service and is used for converting between the two models. The other way around, when a customer is retrieved, the mapper converts the domain Customer to a CustomerDto. In the Service, the customers are persisted in a basic list in order to keep things simple. Java @Service public class CustomerService { private final CustomerMapper customerMapper; private final HashMap<Long, Customer> customers = new HashMap<>(); private Long index = 0L; CustomerService(CustomerMapper customerMapper) { this.customerMapper = customerMapper; } public CustomerDto createCustomer(CustomerDto customerDto) { Customer customer = customerMapper.transformToCustomer(customerDto); customer.setCustomerId(index); customers.put(index, customer); index++; return customerMapper.transformToCustomerDto(customer); } public CustomerDto getCustomer(Long customerId) { if (customers.containsKey(customerId)) { return customerMapper.transformToCustomerDto(customers.get(customerId)); } else { return null; } } } Test Mapper The mapper can be easily tested by using the generated CustomerMapperImpl class and verify whether the mappings are executed successfully. 
Java class CustomerMapperTest { @Test void givenCustomer_whenMaps_thenCustomerDto() { CustomerMapperImpl customerMapper = new CustomerMapperImpl(); Customer customer = new Customer(); customer.setCustomerId(2L); customer.setFirstName("John"); customer.setLastName("Doe"); Address address = new Address(); address.setStreet("street"); address.setHouseNumber(42); address.setZipcode("zipcode"); address.setCity("city"); customer.setAddress(address); CustomerDto customerDto = customerMapper.transformToCustomerDto(customer); assertThat( customerDto ).isNotNull(); assertThat(customerDto.id()).isEqualTo(customer.getCustomerId()); assertThat(customerDto.firstName()).isEqualTo(customer.getFirstName()); assertThat(customerDto.lastName()).isEqualTo(customer.getLastName()); AddressDto addressDto = customerDto.address(); assertThat(addressDto.street()).isEqualTo(address.getStreet()); assertThat(addressDto.houseNumber()).isEqualTo(String.valueOf(address.getHouseNumber())); assertThat(addressDto.zipcode()).isEqualTo(address.getZipcode()); assertThat(addressDto.city()).isEqualTo(address.getCity()); } @Test void givenCustomerDto_whenMaps_thenCustomer() { CustomerMapperImpl customerMapper = new CustomerMapperImpl(); AddressDto addressDto = new AddressDto("street", "42", "zipcode", "city"); CustomerDto customerDto = new CustomerDto(2L, "John", "Doe", addressDto); Customer customer = customerMapper.transformToCustomer(customerDto); assertThat( customer ).isNotNull(); assertThat(customer.getCustomerId()).isEqualTo(customerDto.id()); assertThat(customer.getFirstName()).isEqualTo(customerDto.firstName()); assertThat(customer.getLastName()).isEqualTo(customerDto.lastName()); Address address = customer.getAddress(); assertThat(address.getStreet()).isEqualTo(addressDto.street()); assertThat(address.getHouseNumber()).isEqualTo(Integer.valueOf(addressDto.houseNumber())); assertThat(address.getZipcode()).isEqualTo(addressDto.zipcode()); assertThat(address.getCity()).isEqualTo(addressDto.city()); } } Mapping Between API and DTO Create Mapper The API model looks a bit different than the CustomerDto because it has no Address object and number and postalCode have different names in the CustomerDto. Java public class Customer { private String firstName; private String lastName; private String street; private String number; private String postalCode; private String city; // Getters and setters left out for brevity } In order to create a mapper, you need to add a bit more @Mapping annotations, just like you did before for the customer ID. Java @Mapper(componentModel = "spring") public interface CustomerPortMapper { @Mapping(source = "street", target = "address.street") @Mapping(source = "number", target = "address.houseNumber") @Mapping(source = "postalCode", target = "address.zipcode") @Mapping(source = "city", target = "address.city") CustomerDto transformToCustomerDto(Customer customerApi); @Mapping(source = "id", target = "customerId") @Mapping(source = "address.street", target = "street") @Mapping(source = "address.houseNumber", target = "number") @Mapping(source = "address.zipcode", target = "postalCode") @Mapping(source = "address.city", target = "city") CustomerFullData transformToCustomerApi(CustomerDto customerDto); } Again, the generated CustomerPortMapperImpl class can be found in the target/generated-sources/annotations directory after invoking the Maven compile target. Create Controller The mapper is injected in the Controller and the corresponding mappers can easily be used. 
Java @RestController class CustomerController implements CustomerApi { private final CustomerPortMapper customerPortMapper; private final CustomerService customerService; CustomerController(CustomerPortMapper customerPortMapper, CustomerService customerService) { this.customerPortMapper = customerPortMapper; this.customerService = customerService; } @Override public ResponseEntity<CustomerFullData> createCustomer(Customer customerApi) { CustomerDto customerDtoIn = customerPortMapper.transformToCustomerDto(customerApi); CustomerDto customerDtoOut = customerService.createCustomer(customerDtoIn); return ResponseEntity.ok(customerPortMapper.transformToCustomerApi(customerDtoOut)); } @Override public ResponseEntity<CustomerFullData> getCustomer(Long customerId) { CustomerDto customerDtoOut = customerService.getCustomer(customerId); return ResponseEntity.ok(customerPortMapper.transformToCustomerApi(customerDtoOut)); } } Test Mapper A unit test is created in a similar way as the one for the Service and can be viewed here. In order to test the complete application, an integration test is created for creating a customer. Java @SpringBootTest @AutoConfigureMockMvc class CustomerControllerIT { @Autowired private MockMvc mockMvc; @Test void whenCreateCustomer_thenReturnOk() throws Exception { String body = """ { "firstName": "John", "lastName": "Doe", "street": "street", "number": "42", "postalCode": "1234", "city": "city" } """; mockMvc.perform(post("/customer") .contentType("application/json") .content(body)) .andExpect(status().isOk()) .andExpect(jsonPath("firstName", equalTo("John"))) .andExpect(jsonPath("lastName", equalTo("Doe"))) .andExpect(jsonPath("customerId", equalTo(0))) .andExpect(jsonPath("street", equalTo("street"))) .andExpect(jsonPath("number", equalTo("42"))) .andExpect(jsonPath("postalCode", equalTo("1234"))) .andExpect(jsonPath("city", equalTo("city"))); } } Conclusion MapStruct is an easy-to-use library for mapping between models. If the basic mapping is not sufficient, you are even able to create your own custom mapping logic (which is not demonstrated in this blog). It is advised to read the official documentation to get a comprehensive list of all available features.
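The custom mapping logic mentioned in the conclusion is typically added as a default (or static) method on the mapper interface and referenced from a @Mapping via a qualifier. As a minimal, hypothetical sketch (not part of the example project in this blog), trimming whitespace from the first name could look like this:

Java
@Mapper(componentModel = "spring")
public interface CustomerMapper {

    @Mapping(source = "customerId", target = "id")
    @Mapping(source = "firstName", target = "firstName", qualifiedByName = "trimmed")
    CustomerDto transformToCustomerDto(Customer customer);

    // Custom mapping logic: MapStruct invokes this method for every mapping
    // that references the "trimmed" qualifier via qualifiedByName.
    @Named("trimmed")
    default String trimmed(String value) {
        return value == null ? null : value.trim();
    }
}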
Synopsis Many databases contain bitmaps stored as blobs or files: photos, document scans, medical images, etc. When these bitmaps are retrieved by various database clients and applications, it is sometimes desirable to uniquely watermark them as they are being retrieved, so that they can be identified later. In some cases, you may even want to make this watermark invisible. This kind of dynamic bitmap manipulation can easily be done by a programmable database proxy, without changing the persisted bitmaps. This approach has the following benefits: The watermark can be customized for each retrieval and can contain information about the date, time, user identity, IP address, etc. The image processing is done by the proxy, which puts no extra load on the database. This requires no changes to the database or to the database clients. The End Result Given a bitmap stored in a database, a programmable database proxy can modify the bitmap on its way to the client to include a watermark containing any desired information. How This Works The architecture is simple: instead of the normal direct connection between database clients and servers, the clients connect to the proxy, and the proxy connects to the server. The proxy can then manipulate the bitmaps as needed when they are retrieved. For instance, it can watermark only some bitmaps, or it can use different styles of watermarks, depending on the circumstances. The bitmaps stored in the database are completely unaffected: they are modified on the fly as they are forwarded to the clients. Advantages The clients and the database are blissfully unaware - this is completely transparent to them. Each image can be watermarked uniquely when it is retrieved (e.g., date/time, user name, IP address of client, etc.). No additional load is put on the database server(s). Disadvantages The system is more complex with the addition of the proxy. There will be a (typically modest) increase in latency, depending mostly on the size of the images, but this should be compared to the alternatives. Example Using a proxy, we can create a simple filter to add a watermark to certain bitmaps.
If we assume that our database contains a table called images, with a column called bitmap of type blob or varbinary (depending on your database), we can create a result set filter in the proxy with the following parameter: Query pattern: regex:select.*from.*images.* and a bit of JavaScript code (which also uses the underlying Java engine): JavaScript // Get the value of the bitmap column as a byte stream let stream = context.packet.getJavaStream("bitmap"); if (stream === null) { return; } // The text to use as watermark const now = new Date(); const watermark = "Retrieved by " + context.connectionContext.userName + " on " + now.getFullYear() + "/" + (now.getMonth()+1) + "/" + now.getDate(); // Read the bitmap const ImageIO = Java.type("javax.imageio.ImageIO"); let img = ImageIO.read(stream); // Create the Graphics to draw the text let g = img.createGraphics(); const Color = Java.type("java.awt.Color"); g.setColor(new Color(255, 255, 0, 150)); const Font = Java.type("java.awt.Font"); const textFont = new Font("sans-serif", Font.BOLD, 16); g.setFont(textFont); // Draw the text at the bottom of the bitmap let textRect = textFont.getStringBounds(watermark, g.getFontRenderContext()); g.drawString(watermark, (img.getWidth() / 2) - (textRect.getWidth() / 2), img.getHeight() - (textRect.getHeight() / 2)); // Write the bitmap to the column value const ByteArrayOutputStream = Java.type("java.io.ByteArrayOutputStream"); let outStream = new ByteArrayOutputStream(); ImageIO.write(img, "png", outStream); context.packet.bitmap = outStream.toByteArray(); With this filter in place, bitmaps retrieved from this table will include a watermark containing the name of the database user and a timestamp. The database is never affected: the bitmaps stored in the database are completely unchanged. They are modified on the fly as they are delivered to the client. Obviously, we can watermark bitmaps selectively, we can change the text of the watermark depending on any relevant factors, and we can play with fonts, colors, positioning, transparency, etc. See this example for details. Secret Watermarks In some cases, it might be desirable to mark the bitmaps in a way that is not visible to the naked eye. One trivial way to do this would be to edit the image's metadata, but if we need something more subtle, we can use steganography to distribute a secret message throughout the bitmap in a way that makes it difficult to detect. The example above can be modified to use the Adumbra library: JavaScript // Get the value of the bitmap column as a byte stream let inStream = context.packet.getJavaStream("bitmap"); if (inStream === null) { return; } // The hidden message const now = new Date(); const message = "Retrieved by " + context.connectionContext.userName + " on " + now.getFullYear() + "/" + (now.getMonth()+1) + "/" + now.getDate(); const messageBytes = context.utils.getUTF8BytesForString(message); const keyBytes = context.utils.getUTF8BytesForString("This is my secret key"); // Hide the message in the bitmap const Encoder = Java.type("com.galliumdata.adumbra.Encoder"); const ByteArrayOutputStream = Java.type("java.io.ByteArrayOutputStream"); let outStream = new ByteArrayOutputStream(); let encoder = new Encoder(1); encoder.encode(inStream, outStream, "png", messageBytes, keyBytes); context.packet.bitmap = outStream.toByteArray(); With this in place, the modified bitmaps served to the clients will contain a secret watermark that will be difficult to detect, and almost impossible to extract without the secret key. What Else Can You Do With This?
This watermarking technique can also be applied to documents other than bitmaps: Documents such as PDF and MS Word can be given some extra metadata on the fly, or they can be given a visible or invisible watermark - see this example for PDF documents. All text documents can be subtly marked using techniques such as altering spacing, spelling, layout, fonts and colors, zero-width characters, etc. All digital documents that can sustain minor changes without losing any significant meaning, such as bitmaps, audio files, and sample sets, can be altered in a similar way. In fact, entire data sets can be watermarked by subtly modifying some non-critical aspects of the data, making it possible to identify these datasets later on and know exactly their origin. This is beyond the scope of this article, but there are many ways to make data traceable back to its origin. Conclusion When you need to have a custom watermark for every retrieval of some bitmaps or documents from a database, the technique shown here is a solid approach that avoids any additional load on the database and requires no changes to the clients or servers.
Dynamic query building is a critical aspect of modern application development, especially in scenarios where the search criteria are not known at compile time. In this publication, let's take a deep dive into dynamic query building in Spring Boot applications using JPA criteria queries. We'll explore a flexible and reusable framework that allows developers to construct complex queries effortlessly. Explanation of Components Criteria Class The Criteria class serves as the foundation for our framework. It implements Specification<T> and provides a standardized way to build dynamic queries. By implementing the toPredicate method, the Criteria class enables the construction of predicates based on the specified criteria. Java package com.core.jpa; import java.util.ArrayList; import java.util.List; import org.springframework.data.jpa.domain.Specification; import jakarta.persistence.criteria.CriteriaBuilder; import jakarta.persistence.criteria.CriteriaQuery; import jakarta.persistence.criteria.Predicate; import jakarta.persistence.criteria.Root; public class Criteria<T> implements Specification<T> { private static final long serialVersionUID = 1L; private transient List<Criterion> criterions = new ArrayList<>(); @Override public Predicate toPredicate(Root<T> root, CriteriaQuery<?> query, CriteriaBuilder builder) { if (!criterions.isEmpty()) { List<Predicate> predicates = new ArrayList<>(); for (Criterion c : criterions) { predicates.add(c.toPredicate(root, query, builder)); } if (!predicates.isEmpty()) { return builder.and(predicates.toArray(new Predicate[predicates.size()])); } } return builder.conjunction(); } public void add(Criterion criterion) { if (criterion != null) { criterions.add(criterion); } } } Criterion Interface The Criterion interface defines the contract for building individual predicates. It includes the toPredicate method, which is implemented by various classes to create specific predicates such as equals, not equals, like, etc. Java public interface Criterion { public enum Operator { EQ, IGNORECASEEQ, NE, LIKE, GT, LT, GTE, LTE, AND, OR, ISNULL } public Predicate toPredicate(Root<?> root, CriteriaQuery<?> query, CriteriaBuilder builder); } LogicalExpression Class The LogicalExpression class facilitates the combination of multiple criteria using logical operators such as AND and OR. By implementing the toPredicate method, this class allows developers to create complex query conditions by chaining together simple criteria. Java public class LogicalExpression implements Criterion { private Criterion[] criterion; private Operator operator; public LogicalExpression(Criterion[] criterions, Operator operator) { this.criterion = criterions; this.operator = operator; } @Override public Predicate toPredicate(Root<?> root, CriteriaQuery<?> query, CriteriaBuilder builder) { List<Predicate> predicates = new ArrayList<>(); for (int i = 0; i < this.criterion.length; i++) { predicates.add(this.criterion[i].toPredicate(root, query, builder)); } if (null != operator && operator.equals(Criterion.Operator.OR)) { return builder.or(predicates.toArray(new Predicate[predicates.size()])); } return builder.and(predicates.toArray(new Predicate[predicates.size()])); } } Restrictions Class The Restrictions class provides a set of static methods for creating instances of SimpleExpression and LogicalExpression. These methods offer convenient ways to build simple and complex criteria, making it easier for developers to construct dynamic queries.
Java public class Restrictions { private Restrictions() { } public static SimpleExpression eq(String fieldName, Object value, boolean ignoreNull) { if (ignoreNull && (ObjectUtils.isEmpty(value))) return null; return new SimpleExpression(fieldName, value, Operator.EQ); } public static SimpleExpression ne(String fieldName, Object value, boolean ignoreNull) { if (ignoreNull && (ObjectUtils.isEmpty(value))) return null; return new SimpleExpression(fieldName, value, Operator.NE); } public static SimpleExpression like(String fieldName, String value, boolean ignoreNull) { if (ignoreNull && (ObjectUtils.isEmpty(value))) return null; return new SimpleExpression(fieldName, value.toUpperCase(), Operator.LIKE); } public static SimpleExpression gt(String fieldName, Object value, boolean ignoreNull) { if (ignoreNull && (ObjectUtils.isEmpty(value))) return null; return new SimpleExpression(fieldName, value, Operator.GT); } public static SimpleExpression lt(String fieldName, Object value, boolean ignoreNull) { if (ignoreNull && (ObjectUtils.isEmpty(value))) return null; return new SimpleExpression(fieldName, value, Operator.LT); } public static SimpleExpression gte(String fieldName, Object value, boolean ignoreNull) { if (ignoreNull && (ObjectUtils.isEmpty(value))) return null; return new SimpleExpression(fieldName, value, Operator.GTE); } public static SimpleExpression lte(String fieldName, Object value, boolean ignoreNull) { if (ignoreNull && (ObjectUtils.isEmpty(value))) return null; return new SimpleExpression(fieldName, value, Operator.LTE); } public static SimpleExpression isNull(String fieldName, boolean ignoreNull) { if (ignoreNull) return null; return new SimpleExpression(fieldName, null, Operator.ISNULL); } public static LogicalExpression and(Criterion... criterions) { return new LogicalExpression(criterions, Operator.AND); } public static LogicalExpression or(Criterion... criterions) { return new LogicalExpression(criterions, Operator.OR); } public static <E> LogicalExpression in(String fieldName, Collection<E> value, boolean ignoreNull) { if (ignoreNull && CollectionUtils.isEmpty(value)) return null; SimpleExpression[] ses = new SimpleExpression[value.size()]; int i = 0; for (Object obj : value) { if(obj instanceof String) { ses[i] = new SimpleExpression(fieldName, String.valueOf(obj), Operator.IGNORECASEEQ); } else { ses[i] = new SimpleExpression(fieldName, obj, Operator.EQ); } i++; } return new LogicalExpression(ses, Operator.OR); } public static Long convertToLong(Object o) { String stringToConvert = String.valueOf(o); if (!"null".equals(stringToConvert)) { return Long.parseLong(stringToConvert); } else { return Long.valueOf(0); } } } SimpleExpression Class The SimpleExpression class represents simple expressions with various operators such as equals, not equals, like, greater than, less than, etc. By implementing the toPredicate method, this class translates simple expressions into JPA criteria predicates, allowing for precise query construction.
Java public class SimpleExpression implements Criterion { private String fieldName; private Object value; private Operator operator; protected SimpleExpression(String fieldName, Object value, Operator operator) { this.fieldName = fieldName; this.value = value; this.operator = operator; } @Override @SuppressWarnings({ "rawtypes", "unchecked" }) public Predicate toPredicate(Root<?> root, CriteriaQuery<?> query, CriteriaBuilder builder) { Path expression = null; if (fieldName.contains(".")) { String[] names = StringUtils.split(fieldName, "."); if(names!=null && names.length>0) { expression = root.get(names[0]); for (int i = 1; i < names.length; i++) { expression = expression.get(names[i]); } } } else { expression = root.get(fieldName); } switch (operator) { case EQ: return builder.equal(expression, value); case IGNORECASEEQ: return builder.equal(builder.upper(expression), value.toString().toUpperCase()); case NE: return builder.notEqual(expression, value); case LIKE: return builder.like(builder.upper(expression), value.toString().toUpperCase() + "%"); case LT: return builder.lessThan(expression, (Comparable) value); case GT: return builder.greaterThan(expression, (Comparable) value); case LTE: return builder.lessThanOrEqualTo(expression, (Comparable) value); case GTE: return builder.greaterThanOrEqualTo(expression, (Comparable) value); case ISNULL: return builder.isNull(expression); default: return null; } } } Usage Example Suppose we have a User entity and a corresponding UserRepository interface defined in our Spring Boot application. Note that the repository also extends JpaSpecificationExecutor<User>, which provides the findAll(Specification) method used below: Java @Entity public class User { @Id @GeneratedValue(strategy = GenerationType.IDENTITY) private Long id; private String name; private int age; private double salary; // Getters and setters } public interface UserRepository extends JpaRepository<User, Long>, JpaSpecificationExecutor<User> { } With these entities in place, let's demonstrate how to use our dynamic query-building framework to retrieve a list of users based on certain search criteria: Java Criteria<User> criteria = new Criteria<>(); criteria.add(Restrictions.eq("age", 25, true)); criteria.add(Restrictions.like("name", "John", true)); criteria.add(Restrictions.or( Restrictions.gt("salary", 50000, true), Restrictions.isNull("salary", false) )); List<User> users = userRepository.findAll(criteria); In this example, we construct a dynamic query using the Criteria class and various Restrictions provided by our framework. We specify criteria such as age equals 25, name starts with "John" (the LIKE restriction appends a trailing %), and salary greater than 50000 or salary is null. Finally, we use the UserRepository to execute the query and retrieve the matching users. Conclusion Dynamic query building with JPA criteria queries in Spring Boot applications empowers developers to create sophisticated queries tailored to their specific needs. By leveraging the framework outlined in this publication, developers can streamline the process of constructing dynamic queries and enhance the flexibility and efficiency of their applications. Additional Resources Spring Data JPA Documentation
In an era of heightened data privacy concerns, the development of local Large Language Model (LLM) applications provides an alternative to cloud-based solutions. Ollama offers one solution, enabling LLMs to be downloaded and used locally. In this article, we'll explore how to use Ollama with LangChain and SingleStore using a Jupyter Notebook. The notebook file used in this article is available on GitHub. Introduction We'll use a Virtual Machine running Ubuntu 22.04.2 as our test environment. An alternative would be to use venv. Create a SingleStoreDB Cloud Account A previous article showed the steps required to create a free SingleStore Cloud account. We'll use Ollama Demo Group as our Workspace Group Name and ollama-demo as our Workspace Name. We’ll make a note of our password and host name. For this article, we'll temporarily allow access from anywhere by configuring the firewall under Ollama Demo Group > Firewall. For production environments, firewall rules should be added to provide increased security. Create a Database In our SingleStore Cloud account, let's use the SQL Editor to create a new database. Call this ollama_demo, as follows: SQL CREATE DATABASE IF NOT EXISTS ollama_demo; Install Jupyter From the command line, we’ll install the classic Jupyter Notebook, as follows: Shell pip install notebook Install Ollama We'll install Ollama, as follows: Shell curl -fsSL https://ollama.com/install.sh | sh Environment Variable Using the password and host information we saved earlier, we’ll create an environment variable to point to our SingleStore instance, as follows: Shell export SINGLESTOREDB_URL="admin:<password>@<host>:3306/ollama_demo" Replace <password> and <host> with the values for your environment. Launch Jupyter We are now ready to work with Ollama and we’ll launch Jupyter: Shell jupyter notebook Fill Out the Notebook First, some packages: Shell !pip install langchain ollama --quiet --no-warn-script-location Next, we’ll import some libraries: Python import ollama from langchain_community.vectorstores import SingleStoreDB from langchain_community.vectorstores.utils import DistanceStrategy from langchain_core.documents import Document from langchain_community.embeddings import OllamaEmbeddings We'll create embeddings using all-minilm (45 MB at the time of writing): Python ollama.pull("all-minilm") Example output: Plain Text {'status': 'success'} For our LLM we'll use llama2 (3.8 GB at the time of writing): Python ollama.pull("llama2") Example output: Plain Text {'status': 'success'} Next, we’ll use the example text from the Ollama website: Python documents = [ "Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels", "Llamas were first domesticated and used as pack animals 4,000 to 5,000 years ago in the Peruvian highlands", "Llamas can grow as much as 6 feet tall though the average llama between 5 feet 6 inches and 5 feet 9 inches tall", "Llamas weigh between 280 and 450 pounds and can carry 25 to 30 percent of their body weight", "Llamas are vegetarians and have very efficient digestive systems", "Llamas live to be about 20 years old, though some only live for 15 years and others live to be 30 years old" ] embeddings = OllamaEmbeddings( model = "all-minilm", ) dimensions = len(embeddings.embed_query(documents[0])) docs = [Document(text) for text in documents] We’ll specify all-minilm for the embeddings, determine the number of dimensions returned for the first document, and convert the documents to the format required by 
SingleStore. Next, we’ll use LangChain: Python docsearch = SingleStoreDB.from_documents( docs, embeddings, table_name = "langchain_docs", distance_strategy = DistanceStrategy.EUCLIDEAN_DISTANCE, use_vector_index = True, vector_size = dimensions ) In addition to the documents and embeddings, we’ll provide the name of the table we want to use for storage, the distance strategy, that we want to use a vector index, and the vector size using the dimensions we previously determined. These and other options are explained in further detail in the LangChain documentation. Using the SQL Editor in SingleStore Cloud, let’s check the structure of the table created by LangChain: SQL USE ollama_demo; DESCRIBE langchain_docs; Example output: Plain Text +----------+------------------+------+------+---------+----------------+ | Field | Type | Null | Key | Default | Extra | +----------+------------------+------+------+---------+----------------+ | id | bigint(20) | NO | PRI | NULL | auto_increment | | content | longtext | YES | | NULL | | | vector | vector(384, F32) | NO | MUL | NULL | | | metadata | JSON | YES | | NULL | | +----------+------------------+------+------+---------+----------------+ We can see that a vector column with 384 dimensions was created for storing the embeddings. Let’s also quickly check the stored data: SQL USE ollama_demo; SELECT SUBSTRING(content, 1, 30) AS content, SUBSTRING(vector, 1, 30) AS vector FROM langchain_docs; Example output: Plain Text +--------------------------------+--------------------------------+ | content | vector | +--------------------------------+--------------------------------+ | Llamas weigh between 280 and 4 | [0.235754818,0.242168128,-0.26 | | Llamas were first domesticated | [0.153105229,0.219774529,-0.20 | | Llamas are vegetarians and hav | [0.285528302,0.10461951,-0.313 | | Llamas are members of the came | [-0.0174482632,0.173883006,-0. | | Llamas can grow as much as 6 f | [-0.0232818555,0.122274697,-0. | | Llamas live to be about 20 yea | [0.0260244086,0.212311044,0.03 | +--------------------------------+--------------------------------+ Finally, let’s check the vector index: SQL USE ollama_demo; SHOW INDEX FROM langchain_docs; Example output: Plain Text +----------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------------+---------+---------------+---------------------------------------+ | Table | Non_unique | Key_name | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment | Index_comment | Index_options | +----------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------------+---------+---------------+---------------------------------------+ | langchain_docs | 0 | PRIMARY | 1 | id | NULL | NULL | NULL | NULL | | COLUMNSTORE HASH | | | | | langchain_docs | 1 | vector | 1 | vector | NULL | NULL | NULL | NULL | | VECTOR | | | {"metric_type": "EUCLIDEAN_DISTANCE"} | | langchain_docs | 1 | __SHARDKEY | 1 | id | NULL | NULL | NULL | NULL | | METADATA_ONLY | | | | +----------------+------------+------------+--------------+-------------+-----------+-------------+----------+--------+------+------------------+---------+---------------+---------------------------------------+ We’ll now ask a question, as follows: Python prompt = "What animals are llamas related to?" 
docs = docsearch.similarity_search(prompt) data = docs[0].page_content print(data) Example output: Plain Text Llamas are members of the camelid family meaning they're pretty closely related to vicuñas and camels Next, we’ll use the LLM, as follows: Python output = ollama.generate( model = "llama2", prompt = f"Using this data: {data}. Respond to this prompt: {prompt}" ) print(output["response"]) Example output: Plain Text Llamas are members of the camelid family, which means they are closely related to other animals such as: 1. Vicuñas: Vicuñas are small, wild relatives of llamas and alpacas. They are native to South America and are known for their soft, woolly coats. 2. Camels: Camels are also members of the camelid family and are known for their distinctive humps on their backs. There are two species of camel: the dromedary and the Bactrian. 3. Alpacas: Alpacas are domesticated animals that are closely related to llamas and vicuñas. They are native to South America and are known for their soft, luxurious fur. So, in summary, llamas are related to vicuñas, camels, and alpacas. Summary In this article, we’ve seen that we can connect to SingleStore, store the documents and embeddings, ask questions about the data in the database, and use the power of LLMs locally through Ollama.
Binary Search Trees (BSTs) are fundamental hierarchical data structures in computer science, renowned for their efficiency in organizing and managing data. Each node in a BST holds a key value, with left and right child nodes arranged according to a specific property: nodes in the left subtree have keys less than the parent node, while those in the right subtree have keys greater than the parent node. This property facilitates fast searching, insertion, and deletion operations, making BSTs invaluable in applications requiring sorted data. Traversal algorithms like in-order, pre-order, and post-order traversals further enhance their utility by enabling systematic node processing. BSTs find extensive use in databases, compilers, and various computer science algorithms due to their simplicity and effectiveness in data organization and manipulation. This article delves into the theory and practical implementation of BSTs, highlighting their significance in both academic and real-world applications. Understanding Binary Search Trees Binary Search Trees (BSTs) are hierarchical data structures commonly used in computer science to organize and manage data efficiently. Unlike linear structures like arrays or linked lists, which store data sequentially, BSTs arrange data in a hierarchical manner. Each node in a BST contains a key value and pointers to its left and right child nodes. The key property of a BST is that for any given node, all nodes in its left subtree have keys less than the node's key, and all nodes in its right subtree have keys greater than the node's key. This property enables quick searching, insertion, and deletion operations, as it allows the tree to be efficiently navigated based on the value of the keys. BSTs are particularly useful in applications where data needs to be stored in a sorted order. For example, in a phonebook application, BSTs can be used to store names alphabetically, allowing for fast lookups of phone numbers based on names. Similarly, in a file system, BSTs can be employed to store files in sorted order based on their names or sizes, facilitating efficient file retrieval operations. Traversal algorithms, such as in-order, pre-order, and post-order traversals, allow us to systematically visit each node in the BST. In an in-order traversal, nodes are visited in ascending order of their keys, making it useful for obtaining data in sorted order. Pre-order and post-order traversals visit nodes in a specific order relative to their parent nodes, which can be helpful for various operations such as creating a copy of the tree or evaluating mathematical expressions. Operations on Binary Search Trees Operations on Binary Search Trees (BSTs) involve fundamental actions such as insertion, deletion, and searching, each essential for managing and manipulating the data within the tree efficiently. Insertion When inserting a new node into a BST, the tree's hierarchical structure must be maintained to preserve the ordering property. Starting from the root node, the algorithm compares the new node's key with the keys of existing nodes, recursively traversing the tree until an appropriate position is found. Once the correct location is identified, the new node is inserted as a leaf node, ensuring that the BST's ordering property is maintained. 
Python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def insert(root, key):
    if root is None:
        return TreeNode(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

In the insertion operation, we recursively traverse the BST starting from the root node. If the tree is empty (root is None), we create a new node with the given key. Otherwise, we compare the key with the current node's key and traverse left if the key is smaller or right if it's greater. We continue this process until we find an appropriate position to insert the new node.

Deletion

Removing a node from a BST requires careful consideration to preserve the tree's integrity. The deletion process varies depending on whether the node to be removed has zero, one, or two children. In cases where the node has no children or only one child, the deletion process involves adjusting pointers to remove the node from the tree. However, if the node has two children, a more intricate process is required to maintain the BST's ordering property. Typically, the node to be deleted is replaced by its successor (either the smallest node in its right subtree or the largest node in its left subtree), ensuring that the resulting tree remains a valid BST.

Python
def minValueNode(node):
    current = node
    while current.left is not None:
        current = current.left
    return current

def deleteNode(root, key):
    if root is None:
        return root
    if key < root.key:
        root.left = deleteNode(root.left, key)
    elif key > root.key:
        root.right = deleteNode(root.right, key)
    else:
        if root.left is None:
            return root.right
        elif root.right is None:
            return root.left
        temp = minValueNode(root.right)
        root.key = temp.key
        root.right = deleteNode(root.right, temp.key)
    return root

In the deletion operation, we recursively traverse the BST to find the node with the key to be deleted. Once found, we handle three cases: a node with no children, a node with one child, and a node with two children. For a node with no children or one child, we simply remove the node from the tree. For a node with two children, we find the in-order successor (smallest node in the right subtree), copy its key to the current node, and then recursively delete the in-order successor.

Searching

Searching for a specific key in a BST involves traversing the tree recursively based on the key values. Starting from the root node, the algorithm compares the target key with the keys of nodes encountered during traversal. If the target key matches the key of the current node, the search is successful. Otherwise, the algorithm continues searching in the appropriate subtree based on the comparison of key values until the target key is found or determined to be absent.

Python
def search(root, key):
    if root is None or root.key == key:
        return root
    if root.key < key:
        return search(root.right, key)
    return search(root.left, key)

In the search operation, we recursively traverse the BST starting from the root node. If the current node is None or its key matches the target key, we return the current node. Otherwise, if the target key is greater than the current node's key, we search in the right subtree; otherwise, we search in the left subtree.
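Putting these operations together, a minimal usage sketch might look like the following; the sample keys are arbitrary and chosen only for illustration.

Python
# Build a small BST with the insert() function defined above.
root = None
for key in [50, 30, 70, 20, 40, 60, 80]:
    root = insert(root, key)

# search() returns the node holding the key, or None if it is absent.
print(search(root, 40) is not None)  # True
print(search(root, 25) is not None)  # False

# Delete a node with two children; the tree remains a valid BST.
root = deleteNode(root, 30)
print(search(root, 30) is not None)  # False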
Traversal Techniques

Traversal techniques in Binary Search Trees (BSTs) are methods used to visit and process all nodes in the tree in a specific order. There are three main traversal techniques: in-order traversal, pre-order traversal, and post-order traversal.

In-Order Traversal

In in-order traversal, nodes are visited in ascending order of their keys. The process begins at the leftmost node (the node with the smallest key), then visits the parent node, and finally the right child node. In a BST, an in-order traversal will visit nodes in sorted order.

Python
def inorder_traversal(root):
    if root:
        inorder_traversal(root.left)
        print(root.key)
        inorder_traversal(root.right)

Pre-Order Traversal

In pre-order traversal, nodes are visited starting from the root node, followed by its left subtree, and then its right subtree. This traversal method is useful for creating a copy of the tree or for prefix expressions.

Python
def preorder_traversal(root):
    if root:
        print(root.key)
        preorder_traversal(root.left)
        preorder_traversal(root.right)

Post-Order Traversal

In post-order traversal, nodes are visited starting from the left subtree, then the right subtree, and finally the root node. This traversal method is useful for deleting nodes from the tree or evaluating postfix expressions.

Python
def postorder_traversal(root):
    if root:
        postorder_traversal(root.left)
        postorder_traversal(root.right)
        print(root.key)

Practical Applications of Binary Search Trees

BSTs find applications in various real-world scenarios, including:

Binary search in sorted data for efficient search operations.
Symbol tables for fast retrieval of key-value pairs.
Expression trees for evaluating mathematical expressions.

Best Practices and Considerations

Efficient implementation and usage of BSTs require attention to:

Balancing the tree to ensure optimal performance, especially in scenarios with skewed data.
Handling edge cases such as duplicate values or deleting nodes with two children.
Avoiding common pitfalls like memory leaks or infinite loops in recursive functions.

Conclusion

Binary Search Trees (BSTs) are powerful data structures that play a fundamental role in computer science. Their hierarchical organization and key property make them efficient for storing, retrieving, and manipulating data in sorted order. By understanding the theory behind BSTs and their various operations, including insertion, deletion, searching, and traversal techniques, developers can leverage their capabilities to solve a wide range of computational problems effectively. Whether in database systems, compilers, or algorithm design, BSTs offer a versatile and elegant solution for managing data and optimizing performance. Embracing the versatility and efficiency of BSTs opens up a world of possibilities for innovation and problem-solving in the realm of computer science.
Healthcare has ushered in a transformative era dominated by artificial intelligence (AI) and machine learning (ML), which are now central to data analytics and operational utilities. The transformative power of AI and ML is unlocking unprecedented value by rapidly converting vast datasets into actionable insights. These insights not only enhance patient care and streamline treatment processes but also pave the way for groundbreaking medical discoveries. With the precision and efficiency brought by AI and ML, diagnoses and treatment strategies become significantly more accurate and effective, accelerating the pace of medical research and marking a fundamental shift in healthcare.

Benefits of AI in Healthcare

AI and ML influence the healthcare industry's entire ecosystem, from more accurate diagnostic procedures to personalized treatment recommendations and operational efficiency. AI technologies give healthcare providers real-time data analytics, predictive analysis, and decision-support capabilities, enabling a more proactive and highly personalized approach to patient care. For instance, AI algorithms can increase diagnostic accuracy by studying medical images, while ML models can analyze historical data to predict patient outcomes and inform the treatment approach used.

Machine Learning in Health Data Analysis

Machine learning sits at the heart of the revolution in health data, providing powerful tools that identify patterns and predict future outcomes based on historical data. Of prime importance are algorithms that forecast disease progression, improve treatment methodologies, and streamline healthcare delivery. These findings enable more personalized medicine, with better strategies for slowing disease progression and improving patient care. Just as importantly, ML algorithms optimize healthcare operations by analyzing trends such as patient admission levels and resource utilization, streamlining hospital workflows to improve service delivery.

Example: Predicting Patient Admission Rates With Logistic Regression

A minimal sketch of this workflow is shown after the explanation below.

Explanation

Data loading: Load your data from a CSV file. Replace 'patient_data.csv' with the path to your actual data file.
Feature selection: Only the features relevant to the hospital admission target, such as age, blood pressure, heart rate, and previous admissions, are selected.
Data splitting: Split the data into training and testing sets to evaluate model performance.
Feature scaling: Rescale the features so that the model weighs them equally, because logistic regression is sensitive to feature scaling.
Model training: Train a logistic regression model using the training data, then make admission predictions with the model on the test set.
Evaluation: Evaluate the model using accuracy, a confusion matrix, and a detailed classification report on the test set to validate its predictions of patient admission.
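Below is a minimal sketch of the workflow described above, using scikit-learn. The file name 'patient_data.csv', the column names (age, blood_pressure, heart_rate, previous_admissions, admitted), and the hyperparameters are placeholders to adapt to your own data.

Python
# Minimal sketch of the admission-prediction workflow described above.
# File name and column names are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

# Data loading
df = pd.read_csv('patient_data.csv')

# Feature selection
features = ['age', 'blood_pressure', 'heart_rate', 'previous_admissions']
X = df[features]
y = df['admitted']

# Data splitting
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature scaling (logistic regression is sensitive to feature scales)
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# Model training and prediction
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Evaluation
print(accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))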
Natural Language Processing in Health Data Analysis

Natural Language Processing (NLP) is another critical capability, allowing useful information to be extracted from sources such as clinical notes, patient feedback, and medical journals. NLP tools help analyze and interpret the overwhelming amount of text data produced in healthcare settings daily, easing access to the appropriate information. This capability is especially valuable for supporting clinical decisions and research, providing fast insights from existing patient records and literature and improving the speed and accuracy of medical diagnostics and patient management.

Example: Deep Learning Model for Disease Detection in Medical Imaging

A minimal sketch of the kind of model described here is shown after the explanation below.

Explanation

ImageDataGenerator: Automatically adjusts the image data during training for augmentation (such as rotation, width shift, and height shift), which helps the model generalize better from limited data.
flow_from_directory: Loads images directly from a directory structure, resizing them as necessary and applying the transformations specified in ImageDataGenerator.
Model architecture: The model stacks several convolutional (Conv2D) and pooling (MaxPooling2D) layers in sequence. Convolutional layers help the model learn features in the images, and pooling layers reduce the dimensionality of each feature map.
Dropout: This layer randomly sets a fraction of the input units to 0 at each update during training, which helps prevent overfitting.
Flatten: Converts the pooled feature maps into a single column that is passed to the densely connected layers.
Dense: Fully connected layers whose neurons take input from all the features produced by the previous layer. The final layer uses a sigmoid activation function to produce a binary classification output.
Compilation and training: The model is compiled with a binary cross-entropy loss function, which suits this binary classification task, along with the chosen optimizer, and is then trained with the .fit method on the data from train_generator, with validation using validation_generator.
Saving the model: Save the trained model for later use, whether for deployment in medical diagnostic applications or further refinement.
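Below is a minimal sketch of such a model, using Keras. The directory paths ('data/train', 'data/val'), image size, layer sizes, and number of epochs are illustrative assumptions, not values from the original article.

Python
# Minimal sketch of the CNN described above; paths and hyperparameters are hypothetical.
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Augment training images; only rescale validation images.
train_datagen = ImageDataGenerator(rescale=1./255, rotation_range=20,
                                   width_shift_range=0.1, height_shift_range=0.1)
val_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
    'data/train', target_size=(150, 150), batch_size=32, class_mode='binary')
validation_generator = val_datagen.flow_from_directory(
    'data/val', target_size=(150, 150), batch_size=32, class_mode='binary')

# Convolutional layers learn image features; pooling layers downsample the feature maps.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation='relu'),
    MaxPooling2D((2, 2)),
    Dropout(0.5),          # reduces overfitting
    Flatten(),
    Dense(64, activation='relu'),
    Dense(1, activation='sigmoid'),  # binary output: disease present or not
])

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(train_generator, epochs=10, validation_data=validation_generator)

# Save the trained model for later use.
model.save('disease_detection_model.h5')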
Deep Learning in Health Data Analysis

Deep learning is a specialized branch of machine learning that uses neural networks to analyze highly complex data structures. The technology has proven helpful in areas such as medical imaging, where deep learning models detect and diagnose diseases from images with a level of precision that sometimes exceeds that of human experts. In genomics, deep learning aids in parsing and understanding genetic sequences, offering insights central to personalized medicine and treatment planning.

Example: Deep Learning for Genomic Sequence Classification

Explanation

Data preparation: We simulate sequence data where each base of the DNA sequence (A, C, G, T) is represented as a one-hot encoded vector, meaning each base is converted into a vector of four elements. The sequences and corresponding labels (binary classification) are randomly generated for demonstration.
Model architecture and Conv1D layers: These convolutional layers are especially useful for sequence data (like time series or genetic sequences) because they process the data in a way that respects its temporal or sequential nature.
MaxPooling1D layers: These layers reduce the spatial size of the representation, decreasing the number of parameters and computation in the network and thereby helping to prevent overfitting.
Flatten layer: Flattens the output of the convolutional and pooling layers so it can be used as input to the densely connected layers.
Dense layers: Fully connected layers; dropout between these layers reduces overfitting by preventing complex co-adaptations on the training data.
Compilation and training: The model is compiled with the 'adam' optimizer and the 'categorical_crossentropy' loss function, typical for multi-class classification tasks. It is trained using the .fit method, and performance is validated on a separate test set.
Evaluation: After training, the model's performance is evaluated on the test set to see how well it generalizes to new, unseen data.

AI Applications in Diagnostics and Treatment Planning

AI has dramatically improved the speed and accuracy of diagnosing diseases by using medical images, genetic indicators, and patient histories to detect even the most subtle signs of disease. AI algorithms also help develop personalized treatment regimens by filtering through enormous amounts of treatment data and patient responses to provide tailored care, optimizing therapeutic effectiveness while minimizing side effects.

Challenges and Ethical Considerations in AI and Health Data Analysis

Integrating AI and ML into healthcare also brings ethical considerations. The main areas of concern are data privacy, algorithmic bias, and transparent decision-making processes, all of which must be addressed for the proper, responsible use of AI in healthcare. It is necessary to ensure that patient data is kept safe and protected, and any deployment should guard against bias so that trust in and fairness of the service are not lost.

Conclusion

The future of healthcare is promising, with AI and ML technologies bringing new sophistication to the spectrum of analytical tools, from augmented reality in surgical procedures to virtual health assistants powered by AI. These advances will make better diagnosis and treatment possible while keeping operations running smoothly, ultimately contributing to more tailored and effective patient care. As AI/ML technologies continue to develop and become further integrated, healthcare delivery will become more efficient, accurate, and patient-centric. This also means that several regulatory constraints need to be considered, in addition to the business and technical challenges discussed.
Recently, while working on a project at work, we faced the architectural choice of whether to use an API Gateway as the interface of a backend service, put the service behind a load balancer, or have the API Gateway route requests to a load balancer fronting the service. While debating these architectural choices with my peers, I realized this is a problem many software development engineers face while designing solutions in their domain. This article will hopefully simplify these concepts and help you choose the option that works best for your individual use cases.

Callout

Please understand the requirements or the problem you are working on first, as the choice you make will depend heavily on your use cases and requirements.

Application Programming Interface (API)

Let's understand what an API is first. An API is how two actors in software systems (software components or users) communicate with each other. This communication happens through a defined set of interfaces and protocols. For example, the weather bureau's software system contains daily weather data; the weather app on your phone "talks" to this system via APIs and shows you daily weather updates.

API Gateway

An API Gateway is a component of the app-delivery infrastructure that sits between clients and services and provides centralized handling of API communication between them. In very simplistic terms, an API Gateway is the gateway to the API. It is the channel that helps users of the APIs communicate with the APIs while abstracting away complex details such as the services they live in, access control (authentication and authorization), security (preventing DDoS attacks), and so on. Imagine it as the switchboard operator of a manual telephone exchange, whom users call and ask to be connected to a specific number (analogous to the software component here).

Let's discuss the pros and cons of an API Gateway.

Pros

Access control: Provides support for authenticating and authorizing clients before requests reach the backend systems.
Security: Provides security and potential mitigations against DDoS (Distributed Denial of Service) attacks out of the box.
Abstraction: Abstracts away the internal hosting details of the backend APIs and provides clean routing to backend services based on multiple techniques, such as path-based routing and query-string-parameter-based routing.
Monitoring and analytics: An API Gateway can provide additional support for API-level monitoring and analytics to help scale infrastructure gracefully.

Cons

Additional layer between users and services: An API Gateway adds another layer between users and services, adding complexity to the orchestration of requests.
Performance impact: Since an additional layer is added to the service architecture, requests now have to pass through one more hop before reaching backend services, which can affect performance.

Load Balancing and Load Balancers

Load balancing is the technique of distributing load across multiple backend servers based on their capacity and the actual request pattern. Today's applications can receive requests at a high rate (hundreds or thousands of requests per second), asking backend services to perform actions (e.g., data processing, data fetching, etc.). This requires services to be hosted on multiple servers at once.
This means we need a layer sitting on top of these backend servers (a load balancer) that can route incoming requests to them based on what they can handle "efficiently," while keeping the customer experience and service performance intact. The load balancer also ensures that no single server is overworked, as that could lead to failing requests or higher latencies. At a very high level, the load balancer does the following:

Routes incoming requests to backend servers to "efficiently" distribute the load across the servers.
Maintains the performance of the service by ensuring no single server is overworked.
Lets the service scale up or down efficiently and independently, routing requests only to active hosts (the load balancer determines the set of active hosts via heartbeat checks).

Pros

Performance: Load balancers help maintain service performance by ensuring that the request load is distributed across the servers.
Availability: Load balancers help maintain the availability of the service, since multiple servers can now host the same service.
Support for scalability: Helps the service scale up or down cleanly (horizontally) by letting new servers be added or removed whenever needed.

Cons

Potential single point of failure: Since all requests have to flow through a load balancer, it can become a single point of failure for the whole service if not configured with enough redundancy.
Additional overhead: Load balancers use various algorithms to route requests to the backend servers, e.g., round robin, least connections, adaptive, etc.; a minimal round-robin sketch is shown at the end of this article. Each request has to pass through this additional logic in the load balancer to determine which server it should be forwarded to, which can add some performance overhead.

When To Use What

Let's come to the crux of the article. When do we use a load balancer for a service, and when do we use an API Gateway?

When To Use an API Gateway

Following the pros of an API Gateway mentioned above, an API Gateway is best suited in the following cases:

When we need central access control (authentication and authorization) in front of the backend services/APIs.
When we need central security mechanisms against issues like DDoS attacks.
When we are exposing APIs to external customers (i.e., the Internet) and don't want to expose internal details of the services or infrastructure.
When we need out-of-the-box monitoring and analytics on the APIs and need insights on how to scale the backend services.

When To Use a Load Balancer

When the service receives a high number of requests per second that a single server can't handle, and therefore must be hosted on more than one server.
When the service has defined availability targets and SLAs and needs to adhere to them.
When the service needs to be able to scale up or down as required.
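To make the routing idea concrete, here is a tiny, hypothetical sketch of the round-robin strategy mentioned above. Real load balancers combine such algorithms with health checks, connection tracking, and weighting; the server addresses below are placeholders.

Python
from itertools import cycle

# Hypothetical pool of backend servers behind the load balancer.
servers = ["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"]
round_robin = cycle(servers)

def route(request_id):
    # Round robin: each request goes to the next server in the pool.
    target = next(round_robin)
    print(f"request {request_id} -> {target}")
    return target

for i in range(6):
    route(i)
# Requests 0..5 are spread evenly across the three servers, then the cycle repeats.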
I've always liked GUIs, both desktop-based and browser-based (the latter before you needed five years of training to build one). That's the reason I loved, and still love, Vaadin: you can develop web UIs without writing a single line of HTML, JavaScript, or CSS. I'm still interested in the subject; a couple of years ago, I analyzed the state of JVM desktop frameworks. I also like the Rust programming language a lot. Tauri is a Rust-based framework for building desktop applications. Here's my view.

Overview

Build an optimized, secure, and frontend-independent application for multi-platform deployment. — Tauri website

A Tauri app is composed of two modules: the client-side module, written in standard web technologies (HTML, JavaScript, and CSS), and the backend module, written in Rust. Tauri runs the UI in a dedicated webview instance. Users interact with the UI as usual. Tauri offers a binding between the client-side JavaScript and the backend Rust via a specific JS module, i.e., window.__TAURI__.tauri. It also offers other modules for interacting with the local system, such as the filesystem, OS, clipboard, window management, etc. The binding is based on strings. Here's the client-side code:

JavaScript
const { invoke } = window.__TAURI__.tauri;

let greetInputEl;  // input element, assigned elsewhere (e.g., via document.querySelector)
let greetMsgEl;    // message element, assigned elsewhere

// 1. Invoke the Tauri command named greet
greetMsgEl.textContent = await invoke("greet", { name: greetInputEl.value });

Here's the corresponding Rust code:

Rust
// 2. Define a Tauri command named greet
#[tauri::command]
fn greet(name: &str) -> String {
    format!("Hello, {}! You've been greeted from Rust!", name)
}

In the following sections, I'll list Tauri's good, meh, and bad points. Remember that this is my subjective opinion based on my previous experiences.

The Good

Getting Started

Fortunately, it is becoming increasingly rare, but some technologies need to remember that before you're an expert, you're a newbie. The first section of any site should be a quick explanation of the technology, and the second a getting started guide. Tauri succeeds in this; I got my first Tauri app running in a matter of minutes by following the Quick Start guide.

Documentation

Tauri's documentation is comprehensive, extensive (as far as my browsing took me), and well-structured.

Great Feedback Loop

I've experienced exciting technologies where the feedback loop, the time it takes to see the results of a change, makes the technology unusable. GWT, I'm looking at you. Short feedback loops contribute to a great developer experience. In this regard, Tauri scores points. One can launch the app with a simple cargo tauri dev command. If the front end changes, Tauri reloads it. If any metadata changes, e.g., anything stored in tauri.conf.json, Tauri restarts the app. The only downside is that both behaviors lose the UI state.

Complete Lifecycle Management

Tauri doesn't only help you develop your app; it also provides the tools to debug, test, build, and distribute it.

The Meh

At first, I wanted to create my usual showcase for desktop applications, a file renamer app. However, I soon hit an issue when I wanted to select a directory using the file browser button. First, Tauri doesn't allow you to use the regular JavaScript file-related APIs; instead, it provides a more limited API. Worse, you need to explicitly configure which file system paths are available at build time, and they are part of an enumeration. I understand that security is a growing concern in modern software. Yet, I fail to understand this limitation in a desktop app, where every other app can access any directory.
The Bad

However, Tauri's biggest problem is its design, more precisely its separation between the front end and the back end. What I love in Vaadin is its management of all things frontend, leaving you only the framework to learn. It allows your backend developers to build web apps without dealing with HTML, CSS, and JavaScript. Tauri, though a desktop framework, made precisely the opposite choice: your developers will need to know frontend technologies. Worse, the separation reproduces the request-response model that browser technologies imposed on UIs. As a reminder, early desktop apps used the Observer pattern, which better fits user interactions. We designed apps around the request-response model only after we ported them to the web. Using this model in a desktop app is a regression, in my opinion.

Conclusion

Tauri has many things to like, mainly everything that revolves around the developer experience. If you or your organization uses and likes web technologies, try Tauri. However, it's a no-go for me: to create a simple desktop app, I don't want to learn how to center a div or about the flexbox layout.

To Go Further

Tauri