$ whoami

Bernardo Gallegos Vallejo

Network Operations & Data Center Infrastructure Engineer

7+ years of experience supporting enterprise infrastructure. From AWS data centers to AI/ML workloads at Cirrascale, I bridge hardware and software to create reliable, scalable solutions.

GPU Cluster Management

Infrastructure Automation

5,000+ Servers Managed

From Hardware to Code, Building the Future

My journey began in Amazon's warehouses, supporting network infrastructure that keeps operations running for thousands of users. I learned that behind every seamless experience lies carefully orchestrated hardware and the people who understand both its capabilities and limitations.

Moving to AWS data centers opened my eyes to enterprise-scale infrastructure. Managing 20,000+ devices taught me that reliability isn't just about robust hardware—it's about predictive monitoring, automated responses, and building systems that recover gracefully from failures. Contributing to 5MW data center expansions showed me how physical planning and logical architecture must work in harmony.

Now at Cirrascale Cloud Services, I'm at the intersection of traditional data center operations and cutting-edge AI infrastructure. Developing automation scripts for 5,000+ servers and supporting NVIDIA GPU deployments means every line of code I write directly impacts the compute power driving tomorrow's breakthroughs.

What drives me is turning complex infrastructure challenges into elegant, automated solutions. Whether it's creating monitoring systems that prevent failures before they happen, or building the network backbone that connects AI clusters, I find purpose in making the invisible foundation of technology both reliable and scalable.

"I believe the most impactful infrastructure engineers understand both the physics of hardware and the power of automation. In our AI-driven future, the data centers and networks we build today are the foundation for innovations we can barely imagine tomorrow."

Ready to build the future of infrastructure together?

Technical Skills

A comprehensive toolkit built through years of hands-on experience managing enterprise infrastructure and GPU clusters.

Data Center Technologies

• Dell PowerEdge & HPE ProLiant Servers
• NVIDIA GPU Installation & Configuration
• Cisco Catalyst/Nexus & Arista Switches
• Liquid Cooling Systems & Hot/Cold Aisle
• Fiber Optic Testing & Structured Cabling
• Rack Layout Design & Power Planning
• PXE Boot & Imaging Systems

Network & Systems

• TCP/IP, VLANs, OSPF, Spanning Tree
• DHCP, DNS, ACL Configuration
• Linux Command Line & Administration
• Python & Bash Scripting
• SNMP & IPMI Monitoring
• InfiniBand & High-Speed Interconnects
• Zero Touch Provisioning (ZTP)

Infrastructure Management

• ServiceNow & Ticketing Systems
• Network Diagramming & Documentation
• Security Best Practices
• Project Coordination & Vendor Management
• Asset Management Systems
• Performance Analysis & Optimization
• Incident Response & Troubleshooting

Technical Philosophy

Infrastructure as Code

Everything should be version-controlled, reproducible, and automated.

Monitoring First

You can't improve what you can't measure. Observability is crucial.

Hardware + Software

Understanding both physical infrastructure and intelligent automation.

Experience

A track record of delivering reliable infrastructure solutions and leading automation initiatives in enterprise environments.

Network Operations Engineer

Cirrascale Cloud Services

02/2025 - Present

Austin, TX

Key Achievements:

▸ Developed Python and Bash scripts for BMC monitoring across 5,000+ servers, automating health checks and predictive failure analysis
▸ Configured and maintained Cisco and Arista switches for computing clusters, implementing VLAN segmentation and routing protocols
▸ Assisted with NVIDIA GPU deployments for AI/ML workloads, including driver updates and thermal monitoring
▸ Built monitoring solutions integrating IPMI, SNMP, and custom APIs to track power and performance metrics
▸ Created comprehensive documentation including runbooks, network diagrams, and SOPs in ServiceNow

Technologies:

Python, Bash, Cisco, Arista, NVIDIA GPUs, IPMI, SNMP, ServiceNow, BMC, VLAN

Data Center Engineer

Amazon Web Services (AWS)

10/2019 - 02/2025

New Albany, OH

Key Achievements:

▸ Participated in new data center construction including rack layout planning, power distribution design, and network topology for 5MW expansion
▸ Supported data center operations for 20,000+ devices including server deployment, network infrastructure, and power/cooling systems
▸ Assisted with AI/ML compute cluster design and deployment including rack placement, InfiniBand topology, and GPU server configurations
▸ Maintained PXE boot infrastructure for automated OS deployment and implemented ZTP for network equipment
▸ Designed and installed fiber optic and copper cabling for new builds including pathway planning and cable tray installation
▸ Maintained 99.9% uptime through proactive monitoring and rapid incident response

Technologies:

Dell PowerEdge, HPE ProLiant, Cisco Catalyst/Nexus, Arista, NVIDIA GPUs, InfiniBand, PXE, ZTP, Fiber Optics
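
The 99.9% uptime figure listed for this role can be put in concrete terms by working out how much downtime such a target allows. The snippet below is only an arithmetic sketch: the function name and the 365-day year are assumptions for illustration, not details of how uptime was measured in the role.

```python
def allowed_downtime_hours(uptime_percent: float, period_hours: float = 365 * 24) -> float:
    """Return the hours of downtime permitted over a period at a given uptime target."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return period_hours * (1 - uptime_percent / 100)


# A 99.9% target over one 365-day year leaves a budget of under nine hours.
print(f"{allowed_downtime_hours(99.9):.2f} hours per year")  # prints "8.76 hours per year"
```

The same function gives the monthly budget by passing a shorter period, for example `allowed_downtime_hours(99.9, 30 * 24)`.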

IT Technician

Amazon

06/2017 - 10/2019

Etna, OH

Key Achievements:

▸ Provided first-level support for campus network infrastructure serving 3,000+ users
▸ Tracked and maintained inventory of network equipment using asset management systems
▸ Identified network bottlenecks and implemented fixes to improve warehouse system response times
▸ Coordinated with vendors for RMA processing and equipment orders

Technologies:

Network Infrastructure, Asset Management, Vendor Relations, Performance Analysis, Campus Networking

7+

Years Experience

99.9%

Uptime Achieved

5,000+

Servers Managed

20+

GPU Clusters

Projects

Infrastructure automation and optimization projects that demonstrate practical solutions to real-world challenges.

BMC Monitoring Automation for 5,000+ Servers

Python and Bash automation suite for comprehensive server health monitoring and predictive failure analysis

Challenge:

Manual server health checks across thousands of servers were time-consuming and reactive, leading to unexpected failures and extended downtime that impacted critical AI/ML workloads.

Solution:

Developed comprehensive Python scripts integrating IPMI, SNMP, and custom APIs to automate BMC monitoring. Created predictive algorithms analyzing temperature trends, power consumption, and hardware sensor data to identify potential failures before they occur.

Impact:

Automated health checks across 5,000+ servers, reduced manual monitoring time by 90%, and implemented predictive maintenance that prevented dozens of critical failures. System now provides 24/7 autonomous monitoring with intelligent alerting.

• 5,000+ servers monitored
• 90% reduction in manual checks
• 24/7 autonomous monitoring
• Predictive failure prevention

Technologies:

Python, Bash, IPMI, SNMP, BMC, Custom APIs, ServiceNow Integration
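
To illustrate what analyzing temperature trends can look like in practice, here is a minimal sketch that fits a least-squares line to a series of sensor readings and flags a sustained rise. It is not the production code from this project: the function names, the sample readings, and the 0.5 degrees-per-sample threshold are all hypothetical, and the readings are assumed to have been collected already (for example over IPMI) at equal intervals.

```python
from typing import Sequence


def trend_slope(readings: Sequence[float]) -> float:
    """Least-squares slope of equally spaced readings, in units per sample."""
    n = len(readings)
    if n < 2:
        raise ValueError("need at least two readings")
    mean_x = (n - 1) / 2
    mean_y = sum(readings) / n
    numerator = sum((i - mean_x) * (y - mean_y) for i, y in enumerate(readings))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def is_heating_up(readings: Sequence[float], max_slope: float = 0.5) -> bool:
    """Flag a sensor whose readings climb faster than max_slope per sample."""
    return trend_slope(readings) > max_slope


# Hypothetical inlet temperatures in degrees C, one per polling interval.
steady = [41.0, 41.2, 40.9, 41.1, 41.0]
rising = [41.0, 42.0, 43.0, 44.0, 45.0]
print(is_heating_up(steady), is_heating_up(rising))  # prints "False True"
```

A slope check like this reacts to the direction of change rather than to a fixed limit, so it can raise an alert while the absolute temperature is still within range.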

AI Infrastructure Network Architecture

Network design and implementation for high-performance AI/ML compute clusters

Challenge:

Supporting NVIDIA GPU deployments required specialized network configurations with VLAN segmentation, optimized routing protocols, and thermal monitoring to ensure maximum performance for distributed AI workloads.

Solution:

Configured Cisco and Arista switches for computing clusters, implementing strategic VLAN segmentation and routing protocols. Designed network topology optimized for GPU-to-GPU communication and integrated thermal monitoring systems for cluster health management.

Impact:

Successfully deployed network infrastructure supporting multiple AI/ML clusters, achieved optimal GPU communication performance, and established monitoring frameworks that ensure consistent thermal management and network reliability.

• Multiple GPU clusters supported
• Optimized AI workload performance
• Zero network bottlenecks
• Comprehensive thermal monitoring

Technologies:

Cisco Catalyst/Nexus, Arista Switches, VLAN Configuration, NVIDIA GPUs, Thermal Monitoring, Network Protocols
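
One planning step behind VLAN segmentation is giving each segment its own non-overlapping subnet. The sketch below shows that step with Python's standard `ipaddress` module. The address range, the VLAN names, and the `plan_vlan_subnets` helper are hypothetical examples, not the addressing used on these clusters.

```python
import ipaddress


def plan_vlan_subnets(supernet: str, vlan_names: list[str], new_prefix: int) -> dict:
    """Assign each named VLAN the next free subnet carved out of the supernet."""
    network = ipaddress.ip_network(supernet)
    subnets = network.subnets(new_prefix=new_prefix)
    plan = {}
    for name in vlan_names:
        try:
            plan[name] = next(subnets)
        except StopIteration:
            raise ValueError("supernet is too small for the requested VLANs") from None
    return plan


# Hypothetical example: split a /22 into /24 segments for three traffic classes.
plan = plan_vlan_subnets("10.20.0.0/22", ["management", "compute", "storage"], 24)
for name, subnet in plan.items():
    print(f"{name:<12}{subnet}")
```

Because every subnet comes from the same iterator, the segments cannot overlap, and running out of space raises an error instead of silently reusing a range.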

5MW Data Center Expansion Project

Comprehensive infrastructure planning and implementation for enterprise-scale data center expansion

Challenge:

Contributing to a 5MW data center expansion required coordinating rack layout planning, power distribution design, network topology implementation, and fiber optic cabling while maintaining zero safety incidents and meeting aggressive deadlines.

Solution:

Participated in full greenfield data center buildout including rack placement optimization, power distribution planning, InfiniBand network topology design, and structured cabling installation. Coordinated with construction teams and vendors to ensure on-time delivery.

Impact:

Contributed to successful on-time delivery of 5MW expansion with zero safety incidents, supporting deployment of AI/ML compute infrastructure and establishing scalable foundation for future growth.

• 5MW capacity expansion
• Zero safety incidents
• On-time project delivery
• Scalable infrastructure foundation

Technologies:

Rack Layout Design, Power Distribution, InfiniBand Topology, Fiber Optic Cabling, Structured Cabling, CDU Installation

PXE Boot Infrastructure & Zero Touch Provisioning

Automated deployment system for rapid server provisioning and network equipment configuration

Challenge:

Manual server provisioning and network device configuration was creating deployment bottlenecks, taking hours per device and increasing the risk of configuration errors in large-scale deployments.

Solution:

Maintained and optimized PXE boot infrastructure for automated OS deployment and implemented Zero Touch Provisioning (ZTP) for network equipment using vendor tools and Python scripts. Created standardized deployment workflows integrated with asset management systems.

Impact:

Dramatically reduced deployment time from hours to minutes per device, eliminated configuration errors through automation, and established scalable provisioning processes supporting rapid infrastructure scaling.

• Hours to minutes deployment time
• Zero configuration errors
• Automated provisioning
• Rapid infrastructure scaling

Technologies:

PXE Boot, ZTP, Python Scripting, Vendor Tools, Asset Management, Automated Workflows
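
As a small example of the glue code that automated provisioning tends to need, the sketch below derives the per-host configuration filename that a PXELINUX boot server looks up under `pxelinux.cfg/`: the ARP hardware type `01` for Ethernet, followed by the MAC address in lowercase hex separated by dashes. That PXELINUX was the bootloader in this project is an assumption, and the function name and sample MAC address are hypothetical.

```python
import re


def pxelinux_config_name(mac: str) -> str:
    """Return the per-host PXELINUX config filename for an Ethernet MAC address."""
    digits = re.sub(r"[^0-9A-Fa-f]", "", mac)  # drop ':', '-', '.' separators
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    octets = [digits[i:i + 2].lower() for i in range(0, 12, 2)]
    return "01-" + "-".join(octets)


# Hypothetical MAC address as it might appear in an asset inventory export.
print(pxelinux_config_name("88:99:AA:BB:CC:DD"))  # prints "01-88-99-aa-bb-cc-dd"
```

Normalizing the address in one place lets inventory data in different notations (colon, dash, or dotted) map to the same boot file.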

Campus Network Performance Optimization

Network troubleshooting and optimization tools for enterprise campus infrastructure

Challenge:

Network bottlenecks were impacting warehouse system performance for 3,000+ users, causing delays in operations and reduced productivity across multiple facilities.

Solution:

Developed systematic approach to identify network bottlenecks using performance analysis tools and implemented targeted fixes to improve system response times. Created monitoring frameworks to prevent future performance degradation.

Impact:

Significantly improved warehouse system response times, eliminated network bottlenecks affecting thousands of users, and established proactive monitoring to maintain optimal performance standards.

• 3,000+ users supported
• Eliminated network bottlenecks
• Improved system response times
• Proactive performance monitoring

Technologies:

Network Performance Analysis, Campus Networking, Performance Monitoring, Bottleneck Identification, System Optimization
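
A common way to spot a bottlenecked link is to compare two readings of an interface's octet counter and express the difference as a share of the link speed. The sketch below shows that calculation. It assumes the counter values were already collected (for example over SNMP), and the function name, parameters, and sample numbers are hypothetical and not taken from the tooling used in this project.

```python
def link_utilization_percent(
    octets_start: int,
    octets_end: int,
    interval_seconds: float,
    link_speed_bps: float,
    counter_bits: int = 64,
) -> float:
    """Average utilization of a link between two octet-counter readings."""
    if interval_seconds <= 0 or link_speed_bps <= 0:
        raise ValueError("interval and link speed must be positive")
    # Modulo arithmetic tolerates a single counter wrap between readings.
    delta_octets = (octets_end - octets_start) % (2 ** counter_bits)
    return delta_octets * 8 / (interval_seconds * link_speed_bps) * 100


# Hypothetical sample: 30 GB transferred in 5 minutes on a 1 Gb/s uplink.
usage = link_utilization_percent(0, 30_000_000_000, 300, 1_000_000_000)
print(f"{usage:.1f}%")  # prints "80.0%"
```

Links that stay near their capacity across several polling intervals are the first candidates to investigate when users report slow response times.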

Documentation & Knowledge Management System

Comprehensive documentation platform for infrastructure operations and maintenance procedures

Challenge:

Lack of standardized documentation was leading to inconsistent procedures, extended troubleshooting times, and knowledge gaps when team members were unavailable, impacting overall operational efficiency.

Solution:

Created comprehensive runbooks, network diagrams, and standard operating procedures in ServiceNow. Developed training materials and knowledge base that reduced new technician onboarding time and standardized operational procedures across teams.

Impact:

Reduced new technician onboarding time by 50%, standardized operational procedures, and created sustainable knowledge management system that improved overall team efficiency and reduced troubleshooting time.

• 50% faster onboarding
• Standardized procedures
• Comprehensive knowledge base
• Improved team efficiency

Technologies:

ServiceNow, Documentation Systems, Network Diagramming, Process Standardization, Knowledge Management

Interested in Collaborating?

I'm always excited to work on new infrastructure challenges and automation projects. Let's discuss how we can build something amazing together.

Ready to Build Reliable Infrastructure?

Whether you need expertise in data center operations, network infrastructure, or automation solutions for enterprise-scale environments, let's connect and build something solid together.

Availability

Available for new projects

Open to consulting opportunities

Location: Austin, TX 78728

Specializing in:

• GPU cluster management & optimization
• Infrastructure automation & IaC
• Predictive maintenance systems
• Data center operations
• High-performance computing