Defining ICT Infrastructure for Big Data Success

  • With the continued growth in big data opportunities, organisations that strategically invest in and optimise their IT and network infrastructure will be the ones primed to harness its transformative potential. Here, we speak with three experts from eir evo on how organisations grappling with massive data volumes can ensure effective data storage, accessibility, scalability and security.

    Jim Montgomery, Digital Transformation Business Development Lead

    Q. With the growing emphasis on real-time data analytics in big data, are there any data networking best practices or technologies that should be prioritised?

    The potential value of big data is immense, but it depends on being able to analyse and derive insight from it in real time. The network is the foundation for delivering the required performance, and there are four critical components to this: latency, bandwidth, redundancy and monitoring.

    Low latency across the network is paramount: it delivers faster transaction times, improves application performance and frees up processing resources. When dealing with real-time data, high latency or delays can mean the difference between success and failure.

    In terms of bandwidth, speeds of 100G and up to 400G are commonly deployed within and between data centres, and technologies such as MPLS, DWDM and point-to-point fibre provide the required high-bandwidth wide-area network connectivity.
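    To put those link speeds in context, a back-of-the-envelope calculation shows how the transfer time for a large dataset scales with bandwidth (a sketch assuming roughly 80% effective throughput after protocol overhead; the function name and figures are illustrative):

```python
def transfer_time_seconds(data_bytes: float, link_bits_per_sec: float,
                          efficiency: float = 0.8) -> float:
    """Estimate wall-clock time to move `data_bytes` over a link.

    `efficiency` discounts protocol overhead and contention
    (an assumed 80% of the nominal line rate).
    """
    return (data_bytes * 8) / (link_bits_per_sec * efficiency)

# Replicating 10 TB over a 10G link versus a 100G link:
TEN_TB = 10 * 10**12
hours_10g = transfer_time_seconds(TEN_TB, 10 * 10**9) / 3600    # ~2.8 hours
hours_100g = transfer_time_seconds(TEN_TB, 100 * 10**9) / 3600  # ~17 minutes
```

    The tenfold difference in link rate translates directly into a tenfold difference in replication time, which is why 100G+ links matter between data centres.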

    Building network redundancy into every layer of the design means that, even in the event of failure, data can keep flowing: when one device fails, for instance, another can automatically take over. Lastly, monitoring tools allow for quick issue identification, and network automation lets the network adapt to congestion, delays and failures dynamically, reducing the time to resolution.
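    The failover behaviour described above can be sketched at the application level as a simple try-next-endpoint loop (a minimal illustration; `with_failover` and the endpoint names are hypothetical):

```python
def with_failover(endpoints, fetch):
    """Try each endpoint in priority order and return the first
    successful result: when one path fails, the next takes over."""
    last_error = None
    for endpoint in endpoints:
        try:
            return fetch(endpoint)
        except ConnectionError as exc:
            last_error = exc  # record the failure, fall through to the next
    raise RuntimeError("all endpoints failed") from last_error
```

    In a production network this switchover happens in the infrastructure itself, via routing protocols, redundant links and device clustering, rather than in application code.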

    Q. Data security is a significant concern. Could you highlight specific networking technologies and protocols that are essential for securing data in transit?

    To enhance data security in transit, organisations must ensure three fundamentals are adequately incorporated into the network. Firstly, end-to-end encryption of the data by implementing protocols like IPsec and SSL/TLS. Secondly, using firewalls and intrusion detection/prevention systems (IDS/IPS) to monitor and filter network traffic. Thirdly, ensuring strict access controls to protect sensitive information in transit.

    In big data environments, these security fundamentals are integrated by implementing them at various levels of the data pipeline. For example, encryption may be applied to data at rest and in transit between Hadoop clusters using SSL/TLS. Firewalls and IDS/IPS systems are set up to monitor traffic between data nodes and external sources. Access controls are applied not only to data storage but also to data processing frameworks. Failures in implementing these can result in data breaches, compliance issues, and reputational damage so a comprehensive and well-managed security strategy is essential for organisations dealing with big data.
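    As a concrete example of the in-transit encryption fundamentals, Python's standard `ssl` module can build a client context that enforces certificate verification and a modern TLS floor (a minimal sketch, not a complete hardening guide):

```python
import ssl

def make_tls_context() -> ssl.SSLContext:
    """Client-side TLS context for encrypting data in transit:
    verify the server certificate, check the hostname, and refuse
    anything older than TLS 1.2."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.check_hostname = True
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

    A context like this would then wrap the sockets carrying data between cluster nodes or out to external consumers.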

    Q. Data integration is a key challenge in big data projects. Can you share insights on how data network design can facilitate data movement between diverse sources?

    A well-structured network infrastructure facilitates data integration by efficiently and reliably connecting data sources, which are often geographically dispersed, for seamless data transfer. It is characterised by high bandwidth, low latency and critical security measures such as encryption. A well-structured data network is also built with inherent scalability, so it can adapt to growing data volumes and additional data sources without degrading performance.

    To facilitate data movement between diverse sources, properly configured data routers and switches enable intelligent data routing based on predefined rules. This allows organisations to direct data to the appropriate processing or storage systems for integration, enhancing efficiency and reducing the risk of data loss or duplication. With QoS mechanisms, organisations can prioritise data packets based on their importance and the criticality of the data integration task, meaning those with high priority receive the necessary network resources for optimal performance.
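    The QoS idea of prioritising traffic can be pictured with a toy scheduler: a priority queue that dequeues the highest-priority class first while preserving arrival order within a class (an illustrative sketch, not a router implementation):

```python
import heapq
import itertools

class QosQueue:
    """Toy QoS scheduler: a lower priority number means more
    important, mirroring DSCP-style class maps."""

    def __init__(self):
        self._heap = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order per class

    def enqueue(self, packet, priority: int):
        heapq.heappush(self._heap, (priority, next(self._seq), packet))

    def dequeue(self):
        return heapq.heappop(self._heap)[2]
```

    Under congestion, a scheduler of this shape ensures the critical integration traffic is serviced before bulk transfers.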

    All of these are fundamental characteristics for building robust data pipelines that allow for the automation of data ingestion, transformation and loading processes.
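    Those ingestion, transformation and loading steps reduce to a simple core loop (a minimal sketch; the function names are illustrative):

```python
def run_pipeline(sources, transform, load):
    """Ingest -> transform -> load over every source; returns the
    number of records loaded."""
    loaded = 0
    for source in sources:
        for record in source():       # ingest
            load(transform(record))   # transform, then load
            loaded += 1
    return loaded
```

    Real pipeline frameworks add scheduling, retries and checkpointing around this loop, but the network characteristics above are what keep each stage fed.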

    John Doyle, Director of Managed Services and Cloud

    Q. Can you explain how public and private cloud services complement each other within an organisation's IT infrastructure strategy and how they can be optimised in big data initiatives?

    The choice between on-premises, public cloud or hybrid deployment for big data solutions depends on several factors. On-premises offers full control, data sovereignty and predictable costs, but limited scalability and a high initial outlay for hardware, software and infrastructure setup. On the other hand, public cloud offers scalability, cost efficiency and the availability of managed services, and is suitable for variable workloads and for organisations looking to offload infrastructure management.

    A hybrid approach is ideal for achieving the balance between on-premises control and cloud scalability. Hybrid solutions are beneficial when dealing with legacy systems, sensitive data, or specific compliance requirements and provide for additional redundancy and critical data backup. Ultimately, the best choice often involves a careful evaluation of your organisation's specific needs, budget constraints, and long-term IT and big data strategy.

    Q. Data backup and disaster recovery are paramount for business continuity. What best practices should organisations with diverse data environments consider when leveraging cloud services in big data scenarios?

    Organisations can establish reliable and secure backup and disaster recovery strategies, either within a public cloud or by incorporating private cloud options such as eir evo’s Digital Planet for added redundancy and control. These strategies should incorporate documented policies around backup frequency and retention, aligning with compliance requirements; automated backups to ensure consistency and reduce human error; regular IT staff training on backup procedures; and data redundancy facilitated by the cloud provider's replication features, including across regions.
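    A documented retention policy ultimately reduces to a rule the backup tooling can enforce, for example flagging snapshots that fall outside the retention window (a sketch with an assumed 30-day window):

```python
from datetime import date, timedelta

def expired_backups(backup_dates, today, retention_days=30):
    """Return backup dates older than the retention window and
    therefore candidates for deletion."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in backup_dates if d < cutoff)
```

    Automating this check removes the human error the backup policy is designed to guard against.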

    Remember that backing up to a different region within the same cloud is a good option, but it carries some risk because the data still sits with a single provider. Private cloud or alternate hybrid cloud backup options give an extra layer of resiliency.

    When it comes to costs, you can optimise these by choosing suitable storage tiers and efficient data transfer methods and by periodically reviewing and removing unnecessary backups.

    Q. How can organisations monitor performance and ensure optimal resource allocation and scalability for their data processing needs in the cloud?

    Monitoring with alerts on resource usage anomalies can help identify areas where resources are over-provisioned. Implementing resource tagging allows organisations to track costs by department or project, and educating teams helps promote efficient resource usage.
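    Resource tagging pays off when costs are rolled up by tag; the aggregation itself is straightforward (an illustrative sketch with hypothetical resource records):

```python
from collections import defaultdict

def cost_by_tag(resources, tag_key):
    """Sum monthly cost per value of `tag_key`; untagged resources
    are grouped together so the gaps stay visible."""
    totals = defaultdict(float)
    for res in resources:
        totals[res["tags"].get(tag_key, "untagged")] += res["cost"]
    return dict(totals)
```

    The "untagged" bucket is deliberate: a large untagged total is itself a signal that the tagging policy is not being followed.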

    Automation can be utilised to scale resources up or down based on demand, keeping costs low by ensuring resources are active only when needed. To further save costs, consider spot instances or preemptible VMs for non-critical workloads such as testing, rapid batch processing and fault-tolerant systems, and use optimised storage tiers (Cool, Archive) and serverless options.
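    The scale-up/scale-down decision behind such automation is often a simple proportional rule, loosely the shape used by Kubernetes' horizontal pod autoscaler (a sketch; the target utilisation and bounds are assumptions):

```python
import math

def desired_replicas(current, utilisation, target=0.6, lo=1, hi=20):
    """Proportional scaling: grow when observed utilisation exceeds
    the target, shrink when it falls below, clamped to [lo, hi]."""
    desired = math.ceil(current * utilisation / target)
    return max(lo, min(hi, desired))
```

    The clamp matters in practice: the floor keeps the service available, and the ceiling caps the cost exposure of a runaway workload.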

    To help organisations confidently manage their budgets, we offer expert guidance on cost-effective licensing strategies, including considerations for volume purchasing, optimising software bundles and potential savings during licence migration to the cloud. However, while cost optimisation is essential, we prioritise maintaining high performance standards to ensure clients get the most value from their cloud investments.

    Joe Carlyle, Microsoft Practice Director

    Q. As Microsoft Ireland Azure Infrastructure Partner of the Year, can you tell us how organisations can leverage Azure's cloud-native services to enhance their IT infrastructure for big data initiatives, and what are the key advantages of doing so?

    Adoption of Azure brings with it two key benefits – scale and security. Leveraging cloud-native solutions like Azure Synapse allows you to not only get started more efficiently but also scale to the capacity needed in a secure and governed manner. This leaves you with a modern data footprint, managed by Azure, allowing your teams to focus on productivity and leverage the power of a modern data estate.

    Q. Scalability is crucial in the context of growing data volumes. Can you explain how Azure's auto-scaling capabilities can be integrated into an organisation's IT infrastructure strategy to efficiently handle increasing demands for big data processing?

    A move to cloud, more specifically Azure, brings with it a shift in thinking. Gone are the concerns around your fixed-capacity SAN, or the performance limitations of hardware bought a few years ago. The focus now shifts to requirements versus cost. Azure has near-limitless capacity to offer the elite performance and storage you may need, and this is often enticing for developers, data engineers and the like.

    However, just because it can now complete an action in 10 seconds rather than 120 doesn’t mean the business needs that. And, often more importantly, can the business afford it? Integrating this shift in thinking and frontloading design and planning efforts ensures this does not become an issue, but simply part of the process.

    Q. Data governance and compliance are vital. How can Azure's compliance and security features be effectively incorporated into the IT infrastructure to ensure data governance, protection, and regulatory compliance throughout the big data analytics lifecycle?

    Adoption of Azure not only brings cloud-native data solutions, it also brings native integration with massively scaled security solutions such as Microsoft Defender for Cloud and core governance services such as Azure Policy. Leveraging the core controls of Azure, which include both of these services, via a correctly planned and designed Azure Landing Zone is vital to successful security and governance. Combining all of this with Azure Monitor allows you to report on your compliance in real time and remediate gaps with minimal effort when they appear, ensuring your required standards are reported on and adhered to.
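    The audit-and-report loop that policy services perform can be pictured with a toy evaluator: each resource is checked against every rule, and non-compliant pairs are collected for remediation (a plain-Python illustration of the concept, not the Azure Policy API):

```python
def audit(resources, policies):
    """Evaluate each resource against every policy rule; return
    (resource_id, policy_name) pairs for non-compliant findings."""
    findings = []
    for res in resources:
        for name, rule in policies.items():
            if not rule(res):
                findings.append((res["id"], name))
    return findings
```

    In Azure, the rules are declarative policy definitions and the findings surface as compliance state, but the evaluate-report-remediate cycle is the same.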

    eir evo designs, delivers and manages end-to-end digital infrastructure from edge to cloud for businesses of all sizes. As a tier-1 partner with many leading technology brands, including Microsoft, HPE, Cisco and Fortinet, it is committed to helping organisations across Ireland, North and South, get the most out of their technology investments.

    This article appears in the Big Data edition of Sync NI magazine.
