Full-Stack & Multimodal AI Engineer (Text · Voice · Image · Video)


Hourly Rate

15.00 USD

1. Chatbots
Description: Build a conversational AI to answer user queries or provide customer support.
• Technologies / Frameworks: Python, Rasa, Hugging Face Transformers, GPT API, Flask/Django (backend), React (frontend).
• Model / Approach: LLMs (e.g., GPT-3/4) for response generation, BERT-style encoders for intent classification, or sequence-to-sequence models for dialogue.
• Risk: Misunderstanding user intent, generating incorrect responses.
• Solution: Use intent classification, fallback responses, and test with real user data.
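
As a rough illustration of intent classification with a fallback response, here is a minimal sketch. The intent names, keyword sets, threshold, and the classify_intent helper are all hypothetical; a production bot would more likely use a trained classifier (for example in Rasa) than keyword overlap.

```python
import re

# Hypothetical intents and keywords, chosen only for illustration.
INTENT_KEYWORDS = {
    "order_status": {"order", "track", "delivery", "shipped"},
    "refund": {"refund", "return", "money"},
    "greeting": {"hello", "hi", "hey"},
}

FALLBACK_REPLY = "Sorry, I did not understand that. Could you rephrase?"


def classify_intent(message, threshold=0.2):
    """Return (intent, score), or ("fallback", score) when confidence is low."""
    tokens = set(re.findall(r"[a-z']+", message.lower()))
    if not tokens:
        return "fallback", 0.0
    best_intent, best_score = "fallback", 0.0
    for intent, keywords in INTENT_KEYWORDS.items():
        # Score = fraction of the message's tokens that match this intent.
        score = len(tokens & keywords) / len(tokens)
        if score > best_score:
            best_intent, best_score = intent, score
    if best_score < threshold:
        return "fallback", best_score
    return best_intent, best_score
```

When the returned intent is "fallback", the bot would send FALLBACK_REPLY instead of guessing at an answer.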

2. Voice Agents
Description: Create a voice assistant that understands speech and executes commands.
• Technologies / Frameworks: Python, SpeechRecognition, PyTorch/TensorFlow, Google Cloud Speech-to-Text, gTTS for text-to-speech.
• Model / Approach: ASR (Automatic Speech Recognition) + NLP for intent detection.
• Risk: Noise, accent variation, misinterpretation.
• Solution: Use noise reduction, speech pre-processing, and training on diverse datasets.
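
One simple form of speech pre-processing is trimming near-silent audio before it reaches the recognizer. The sketch below assumes samples are floats in the range -1.0 to 1.0; the trim_silence helper and the threshold value are illustrative assumptions, not part of any library named above.

```python
def trim_silence(samples, threshold=0.02):
    """Drop leading and trailing samples whose absolute amplitude is below threshold."""
    start = 0
    end = len(samples)
    # Advance past quiet samples at the beginning.
    while start < end and abs(samples[start]) < threshold:
        start += 1
    # Walk back past quiet samples at the end.
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]
```

Real noise reduction (spectral gating, for instance) is considerably more involved; this only removes quiet padding around an utterance.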

3. Lip Sync
Description: Synchronize facial and lip movements with speech or audio input.
• Technologies / Frameworks: Python, OpenCV, Dlib, Mediapipe, PyTorch.
• Model / Approach: GANs or LSTM-based temporal models for facial animation.
• Risk: Uncanny or unrealistic facial movement.
• Solution: Pre-train on large datasets, fine-tune on specific faces, use smoothing filters.
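
The smoothing step can be sketched as an exponential moving average over per-frame landmark positions. This assumes each frame is a list of (x, y) points in a fixed order, such as the output of a face-landmark detector; smooth_landmarks and alpha are hypothetical names chosen for illustration.

```python
def smooth_landmarks(frames, alpha=0.5):
    """Exponentially smooth a sequence of frames; each frame is a list of (x, y) points."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    previous = None
    for frame in frames:
        if previous is None:
            # The first frame has no history, so it passes through unchanged.
            current = [(float(x), float(y)) for x, y in frame]
        else:
            # Blend each point with its smoothed position in the previous frame.
            current = [
                (alpha * x + (1 - alpha) * px, alpha * y + (1 - alpha) * py)
                for (x, y), (px, py) in zip(frame, previous)
            ]
        smoothed.append(current)
        previous = current
    return smoothed
```

A lower alpha damps jitter more strongly but makes the mouth lag further behind the audio, so the value usually needs tuning per video.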

4. Q&A Systems
Description: Build a system that answers user questions from a document or knowledge base.
• Technologies / Frameworks: Python, Hugging Face Transformers (BERT, RoBERTa), FAISS, LangChain.
• Model / Approach: Embedding-based retrieval + LLM for generating answers.
• Risk: Answering incorrectly due to ambiguous queries or insufficient data.
• Solution: Use document embeddings, context windows, and confidence scoring.
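
Retrieval with confidence scoring might look like the sketch below. To stay self-contained it uses a toy bag-of-words vector in place of real embeddings; an actual system would more likely use a neural encoder with an index such as FAISS. The embed, cosine, and retrieve helpers and the min_score cutoff are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding'; a real system would use a neural encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question, passages, min_score=0.2):
    """Return (best_passage, score), or (None, score) below the confidence cutoff."""
    q = embed(question)
    scored = [(cosine(q, embed(p)), p) for p in passages]
    if not scored:
        return None, 0.0
    score, passage = max(scored, key=lambda pair: pair[0])
    if score < min_score:
        # Too weak a match: better to decline than to answer from the wrong passage.
        return None, score
    return passage, score
```

Returning None on a low score lets the calling code reply that it does not know rather than pass an unrelated passage to the LLM.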

5. Web-Browsing / Tool-Driven Assistants
Description: Automate web tasks or extract information, combined with AI reasoning.
• Technologies / Frameworks: Python, Selenium, Playwright, LangChain, OpenAI API.
• Model / Approach: LLM orchestrates tasks + web scraping/automation.
• Risk: Website structure changes, bot detection, or incorrect extraction.
• Solution: Handle exceptions, use APIs when available, and regularly update scripts.
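
Exception handling for flaky web tasks is often wrapped in a retry helper with exponential backoff. The sketch below is library-agnostic: task stands for any zero-argument callable (a Selenium or Playwright step, or an API call), and with_retries and its parameters are hypothetical names.

```python
import time


def with_retries(task, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task() until it succeeds or attempts run out, backing off between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error to the caller
            # Wait base_delay, then 2x, then 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

The sleep parameter is injectable so the helper can be tested without actually waiting.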

6. Price Predictio (+0 days)

1.00 USD


Description

I am an AI Engineer and Full-Stack Developer with deep expertise in building intelligent systems across text, voice, image, and video modalities. I design production-grade AI applications, from chatbots and multimodal assistants to automation pipelines and computer-vision systems. I have a strong foundation in LLMs, RAG, fine-tuning, and scalable web/mobile development, with a proven ability to integrate AI into real-world products.

Skills

AI/ML & NLP

• LLMs, Chat AI, NLP, text generation & summarization
• RAG, vector databases, dataset labeling
• Fine-tuning, LangChain, LangGraph
• Sentiment analysis, classification, NER, topic modeling

Multimodal AI

• Text-to-Image (DALL·E, Midjourney, SDXL)
• Text-to-Video (Sora, Runway, Pika Labs)
• Image-to-Video (Runway, Stable Video Diffusion)
• Speech-to-Text (Whisper, AssemblyAI), Text-to-Speech (GPT-TTS, Murf.ai)
• Voice cloning, audio enhancement, podcast cleanup

Computer Vision

• Object detection, face recognition
• Medical imaging analysis
• CCTV analytics, pose estimation
• Robotics perception for navigation and manipulation

Software & Full-Stack Development

• React.js, Next.js
• Node.js
• Python
• Web development, mobile development
• AI integration & automation pipelines

Business Focus

• Experience creating AI-powered tools that optimize support, reduce costs, and automate workflows.
• Deep understanding of integrating AI into business environments such as customer service, operations, logistics, marketing, legal, and research.
• Ability to design systems that improve decision-making, streamline processes, and increase ROI through intelligent automation.

Experience

• Chatbots (support bots, FAQ, internal assistants)
• Voice agents for call centers, assistants, and IVR systems
• Multimodal agents combining text, voice, and images
• Image generation apps, video generation tools, avatar creation, photo editing
• AI music generation workflows
• Smart meeting transcribers and podcast/audio cleanup systems
• PDF Q&A and document understanding solutions
• Company knowledge assistants and domain-specific RAG systems
• Search + answer engines and research agents
• Coding agents, email/automation agents, web-browsing agents
• CV systems for robotics, drones, warehouses, and surveillance
• AI monitoring, analytics, and model hosting setups

Others

• Strong background in web scraping for data pipelines
• Experience building automation systems across cloud and on-prem
• Capable of designing scalable, production-ready AI architectures
• Cross-platform development skills for both web and mobile apps

About the seller

jacob777999

Seller

Not rated yet

From

United Kingdom

Last Seen

2 months ago

Member Since

December 11, 2025



FAQ

What is your nickname

12345678


Feedback

This job has no reviews.
