Daniel Pérez Rubio, Developer in Guadalajara, Spain

Daniel Pérez Rubio

Verified Expert in Engineering

Data Scientist and Developer

Location
Guadalajara, Spain
Toptal Member Since
November 29, 2021

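Daniel is an experienced data scientist with a master's degree in signal theory (telecommunications). He has eight years of professional experience, ranging from impactful seed-stage startups such as Ketekelo (as CTO) to multinational companies such as BASF (as a senior data scientist for two years). Daniel thrives on challenges, so he decided to become a freelance data scientist and help Toptal clients excel in developing machine learning, deep learning, NLP, and big data solutions.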

Portfolio

Undisclosed NLP Startup from MIT (Toptal Engagement)
Python, Jupyter Notebook, Topic Modeling...
Daimler
Python, Databricks, PySpark, Spark SQL, Scikit-learn, Pandas, NumPy, SciPy...
BASF
Python, Pandas, Scikit-learn, NumPy, SciPy, Docker, MongoDB, SpaCy...

Experience

Availability

Part-time

Preferred Environment

Windows, Windows Subsystem for Linux (WSL), Visual Studio Code (VS Code), Docker

The most amazing...

...product I've developed is an internal service desk ticket prioritization model, which helped reduce conflicts between employees by 60%.

Work Experience

Data Scientist

2021 - 2022
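Undisclosed NLP Startup from MIT (Toptal Engagement)
  • Developed a production pipeline for model explainability based on Shapley values, used to analyze complex dependencies between linguistic and cultural trends in companies.
  • Designed a robust, reproducible setup that ensures AutoML capabilities with multiple overfitting, dimensionality, and signal-to-noise control processes, such as SMOTE, hyperparameter tuning, SHAP-based feature selection, cross-validation, and seed control.
  • Refactored and optimized two existing big data pipelines, improving stability and reducing resource allocation, which cut costs by 75%.
  • Implemented and optimized a topic modeling pipeline based on a large language model (BERT), which helped validate the client's custom topic modeling approach.
  • Implemented a flexible fine-tuning process for large language model architectures such as BERT and GPT-2 and used it to train multiple topic classification models, which were used to improve the custom topic modeling pipeline.
  • Implemented a clause parsing and classification pipeline based on large language models for a clause-level sentiment classification tool.
  • Designed and proved a semi-supervised learning concept based on auto-labeling techniques for the iterative improvement of large language models.
  • Implemented an efficient runtime-predictive resource allocation concept to avoid GPU and system memory issues. The process is based on resource usage logging and a polynomial interpolation pipeline, and it helped reduce most memory allocation errors.
  • Performed multiple feasibility analyses over nine months for different features designed by the client, followed up with implementation according to the client's decisions, and completed all open points on their product roadmap.
  • Kept daily contact with the CTO and CEO, providing them with all the necessary insights and low-level details so they could steer product development, always offering suggestions and my expert opinion but prioritizing their wishes.
Technologies: Python, Jupyter Notebook, Topic Modeling, Generative Pre-trained Transformers (GPT), GPT, Natural Language Processing (NLP), Amazon Web Services (AWS), Amazon S3 (AWS S3), Amazon Elastic MapReduce (EMR), Amazon SageMaker, Shapely, Spark SQL, PySpark, SciPy, SpaCy, Scikit-learn, PyTorch, BERT, Docker, Language Models, Deep Learning, StatsModels, Matplotlib, Pandas, NumPy, Clustering, Unsupervised Learning, ETL, HyperOpt, Data Analysis, Dashboards, Data Visualization, Product Leadership, Python 3, Predictive Modeling, Data Engineering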

Senior Data Scientist

2021 - 2021
Daimler
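  • Developed three big data after-sales time series forecasting products: the timing of tire replacement, the timing of brake disc replacement, and the timing of brake pad replacement.
  • Kickstarted the creation of an experimentation library that allows multiple data scientists to experiment on the same product so that the results of those experiments can be compared, replicated, and easily communicated to business partners.
  • Promoted improvements to the branching model and CI/CD pipelines to eliminate human error and operational overhead, and opened up the possibility of developing packages instead of scripts in notebooks.
  • Created two new data sources for the team's data lake: global elevation at 30-meter resolution (Aster 30) and place-name localization, including common countries, cities, provinces, and names written in over ten languages.
  • Took part in organizing Daimler Innovation Days 2021, a two-day event focused on creating fresh product designs and getting familiar with the most modern technologies.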
Technologies: Python, Databricks, PySpark, Spark SQL, Scikit-learn, Pandas, NumPy, SciPy, Matplotlib, MLflow, Plotly, Azure Data Lake, Azure Data Factory, Azure DevOps, GitHub, Jira, Seaborn, GIS, Spark ML, Docker, Data Science, SQL, Predictive Maintenance, Data Visualization, Data Analysis, Python 3, Predictive Modeling, Data Engineering

Senior Data Scientist

2019 - 2021
BASF
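  • Developed two successful NLP products: a fuzzy logic expert system for customer name matching and a topic modeling dashboard for patent search engine monitoring.
  • Introduced three general-purpose products: a recommender system for healthy inventory management, a threat-level classifier for domain-name trademark fraud detection, and an escalation probability predictor for service desk ticket prioritization.
  • Produced a topic and sentiment analysis report on the 2020-2021 Spain employee survey for HR, helping them process thousands of valuable free-text feedback fields.
  • Ran multiple workshops on machine learning, Git, open-source software, and remote Docker environments.
  • Supported the company culture by promoting and co-organizing local and global initiatives: 10% innovation time, cross-squad collaboration initiatives, and custom training plans.
  • Led, together with my colleagues, the introduction of modern Python workflows in a global company, seamlessly using best code practices, CI/CD pipelines, containerization, and remote environments.
  • Supported the hiring process by conducting multiple technical interviews.
  • Held a shared product owner role for more than half of the team's lifetime.
  • Worked successfully and efficiently under the Scrum and Kanban agile frameworks, launching five successful products in two years.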
Technologies: Python, Pandas, Scikit-learn, NumPy, SciPy, Docker, MongoDB, SpaCy, Natural Language Toolkit (NLTK), Django, FastAPI, Apache Airflow, Databricks, PyTorch, Helm, Kubernetes, Multiprocessing, R, PySpark, Spark SQL, Microsoft SQL Server, SAP HANA SQL Script, Beautiful Soup, lxml, Plotly, Matplotlib, TextRank, GitLab CI/CD, Seaborn, Data Science, SQL, ETL, Data Analysis, Data Visualization, Python 3, Predictive Modeling, Data Engineering

Senior Data Scientist

2018 - 2019
Rebold
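  • Implemented and maintained a daily CD pipeline for model training, optimization, and deployment for an ad-buying agent.
  • Developed a complete audience enrichment analytics solution for email campaigns.
  • Owned three daily-running big data products: machine learning ad-buying agent training, audience enrichment analytics for email campaigns, and cookie-based audience classification.
  • Supported business intelligence (BI) colleagues by implementing custom Python scripts and SQL queries to improve their processes and help them work more efficiently.
  • Developed, with the help of a freelance DevOps engineer, a Python tool for creating and running playbook-based Ansible templates.
  • Kickstarted a Flask-based web platform for citizen development.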
Technologies: PySpark, Spark SQL, PostgreSQL, Apache Airflow, Amazon Web Services (AWS), Ansible, Git, Python, Scikit-learn, NumPy, Pandas, HyperOpt, Amazon Elastic MapReduce (EMR), Amazon S3 (AWS S3), Spark ML, Flask, Apache Superset, Google Data Studio, Continuous Delivery (CD), ETL, Data Science, SQL, Data Analysis, Dashboards, Data Visualization, Python 3, Predictive Modeling, Data Engineering

Data Scientist

2016 - 2018
Human Forecast
  • Worked independently as the only technical person in the company.
  • Developed several PoC solutions for machine-learning-based value propositions, most of which can be found on my GitHub profile.
  • Sold and developed four final products: a topic discovery engine for market research, a real-time social brand image observatory, an Edge AI handrail use advisor, and a chatbot-based smart contract solution for international trade tracking.
  • Developed the presales strategy together with the CEO.
  • Gave several product presentations to big companies such as Airbus, Navantia, Vall d'Hebron Hospital, and Cemex Ventures.
  • Worked in diverse fields such as topic modeling, human pose recognition, hyperspectral imaging, Edge AI, sentiment analysis, chatbots, smart contracts, data mining, dashboarding, and APIs.
Technologies: Python, Machine Learning, Google Cloud, Scikit-learn, Natural Language Toolkit (NLTK), OpenCV, Node.js, Git, TensorFlow, Pandas, NumPy, Raspberry Pi, Arduino, Solidity, Flask, Matplotlib, Bokeh, Plotly, D3.js, Asyncio, Supervisord, Apache HTTP Server, NGINX, Web3.js, SciPy, Beautiful Soup, Docker, Express.js, Tableau, GIS, C, Data Science, SQL, Dashboards, Data Analysis, Data Visualization, Product Leadership, Predictive Modeling

CTO

2014 - 2015
Ketekelo
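  • Served as tech lead and full-stack developer, setting the development roadmap and executing it together with an intern student.
  • Implemented several custom WordPress/WooCommerce components, API integrations, and a scraping tool.
  • Pitched at multiple events. Was awarded best pitch by the Madrid local government and attracted the interest of investors such as Kike Sarasola and Fundación.
  • Obtained acceleration programs from IE Business School, Lanzadera, and the Madrid local government.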
Technologies: PHP, JavaScript, HTML, Linux Administration, WooCommerce, jQuery, Bootstrap, MySQL, Scraping, APIs, Ajax, SQL, Amazon Web Services (AWS), Product Leadership

Topic Discovery Engine for Market Research

http://github.com/danielperezr88/TOM
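A web service I created, consisting of an integration between the Google Custom Search API, a web scraper, and a topic modeling pipeline, all managed and consumed through an interactive front end. The service features the creation of new search terms, D3.js visualizations of the different topics identified per search term and date, and navigation across different aggregation levels (topic importance per day, n-gram importance per topic, article weight per topic, etc.). It also includes user management with different levels of permissions and access to articles.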

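It lets users define fine-grained searches over fields of interest, track the different topics found in each field and their relevance over time, and quickly spot when a new topic of interest emerges in that field.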

Logistics Dapp: A Smart Contract Chatbot App for Freight Tracking

http://github.com/danielperezr88/logistics-dapp
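An app I developed that runs on the legacy Coinbase Toshi (currently replaced by Wallet), intended to support and record all the transactions involved in an international shipment. The app's interface is fully conversational and features multi-party functionality, role-based permissions, and step-by-step follow-up of the whole process.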

The app is in the active MVP phase. It's been tested and proved useful, but it is currently unsupported due to changes in Coinbase's Dapp platform and the end of the relationship with the sponsor.

Handrail Advisor: An On-site Human Pose Tracking Camera for Improving Worker Safety

http://github.com/danielperezr88/idoonet-rpi-mvncs
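I worked on an Edge AI device loaded with standalone human pose tracking software (a modified fork of a FOSS pose estimation project) and composed of a small, low-consumption computing unit, an attached camera, and, optionally, a warning lightbulb, a screen, and a sound system.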

Once placed at a point with good visibility of a handrail-guarded area and configured with labels of the handrail positions and associated areas of use, it tracks the correct use of the handrail by all workers in the area and shows them real-time feedback in their preferred way (sound, video, and lightbulb feedback).

Languages

Python, Python 3, C, C++, SQL, PHP, JavaScript, HTML, Solidity, R, Java

Libraries/APIs

Scikit-learn, Natural Language Toolkit (NLTK), Pandas, NumPy, Beautiful Soup, PySpark, PyTorch, Shapely, Matplotlib, Spark ML, jQuery, OpenCV, Node.js, TensorFlow, D3.js, Asyncio, Web3.js, SciPy, SpaCy

Tools

Spark SQL, Amazon Elastic MapReduce (EMR), Git, Supervisord, Apache HTTP Server, Apache Airflow, Named-entity Recognition (NER), GitLab CI/CD, GitHub, Jira, Seaborn, MATLAB, Amazon SageMaker, Plotly, NGINX, Tableau, GIS, Ansible, Helm, StatsModels

Paradigms

Data Science, Continuous Delivery (CD), ETL, Azure DevOps, Agile, Dynamic Programming

Platforms

Visual Studio Code (VS Code), Jupyter Notebook, Windows, Docker, Unix, Raspberry Pi, Arduino, Amazon Web Services (AWS), Databricks, WooCommerce, Kubernetes

Other

Statistics, Natural Language Processing (NLP), Machine Learning, K-nearest Neighbors (KNN), TextRank, Data Analysis, Data Visualization, Predictive Modeling, GPT, Generative Pre-trained Transformers (GPT), Windows Subsystem for Linux (WSL), Numerical Methods, Programming, Embedded Systems, Optimization, Computer Vision, Signal Processing, Deep Learning, Linux Administration, Scraping, APIs, HyperOpt, Apache Superset, Google Data Studio, Support Vector Machines (SVM), Neural Networks, K-means Clustering, Bayesian Statistics, Information Retrieval, Transformers, Word Embedding, Linguistic Tagging, FastAPI, Multiprocessing, lxml, MLflow, Time Series Analysis, Recurrent Neural Networks (RNN), Sentiment Analysis, Business Planning, Chatbots, Topic Modeling, BERT, Dashboards, Product Leadership, Data Engineering, Electronics, Telematics, Evolutionary Computation, Ajax, Bokeh, Robotics, Motion Planning, Language Models, Azure Data Lake, Azure Data Factory, Encoder-Decoder Neural Architecture, Sequence Models, Fundamental Analysis, Quantitative Analysis, Portfolio Optimization, Risk Models, Attribution Modeling, Backtesting Trading Strategies, Negotiation, Tax Accounting, Business Modeling, Business Model Canvas, Partnerships, Google Custom Search, Clustering, Unsupervised Learning, Predictive Maintenance, Reinforcement Learning, Monte Carlo Simulations, Deep Reinforcement Learning, Temporal Difference Learning, Monte Carlo

Frameworks

Flask, Django, Bootstrap, Express.js, Jinja

Storage

MySQL, PostgreSQL, Amazon S3 (AWS S3), MongoDB, Google Cloud, Microsoft SQL Server, SAP HANA SQL Script

2020 - 2020

Graduate Program in Artificial Intelligence

Stanford University - Stanford, CA

2006 - 2014

Bachelor's and Master's Degree in Telecommunications

University of Alcalá - Alcalá de Henares, Madrid, Spain

OCTOBER 2022 - PRESENT

Stanford Reinforcement Learning

Stanford University | Online

DECEMBER 2020 - PRESENT

Stanford Natural Language Processing with Deep Learning

Stanford University | Online

AUGUST 2020 - PRESENT

AI for Trading Nanodegree

Udacity

MAY 2015 - PRESENT

Startup Acceleration and Consolidation

IE Business School

DECEMBER 2011 - PRESENT

Introduction to Artificial Intelligence

Sebastian Thrun and Peter Norvig