How to Count Words, Characters, and Lines in Text (Online and in Code)

May 14, 2026 4 min read

Counting words seems easy until the edge cases show up: is a hyphenated word one word or two? Do URLs count? What about text in several languages? This guide covers ways to get an accurate count in each environment: the browser, JavaScript, Python, and the command line.

Try Word Counter →

In the browser

Paste text into the word counter to get words, characters, characters without spaces, sentences, paragraphs, and estimated reading time instantly. The counts update as you type, which is useful when writing to a limit (a Twitter thread, a grant application, an article submission).

What counts as a word?

The standard definition: a sequence of non-whitespace characters delimited by whitespace. Under this definition:

hello world → 2 words
well-known → 1 word (hyphenated)
C++ → 1 word
https://example.com/path?q=1 → 1 word
"quoted text" → 2 words (punctuation stays attached to the adjacent word)

This is fine for most purposes. To exclude URLs, strip them before counting. The definition assumes words are separated by whitespace, which does not hold for languages such as Chinese, Japanese, or Thai.
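The whitespace definition can be checked directly against the examples above. This is a minimal sketch in Python; `split_words` is a name chosen here for illustration, not part of any tool mentioned in this guide.

```python
def split_words(text: str) -> list[str]:
    """Return the whitespace-separated tokens of text."""
    # With no argument, str.split() treats any run of whitespace as one separator.
    return text.split()


examples = [
    "hello world",
    "well-known",
    "C++",
    "https://example.com/path?q=1",
    '"quoted text"',
]

for example in examples:
    tokens = split_words(example)
    print(f"{example!r}: {len(tokens)} word(s) -> {tokens}")
```

Hyphens, plus signs, slashes, and quotation marks are not whitespace, so they never split a token under this rule.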

JavaScript

Basic word count:

function wordCount(text) {
    return text.trim().split(/\s+/).filter(Boolean).length;
}

console.log(wordCount("Hello world"));        // 2
console.log(wordCount("  spaces   matter  ")); // 2
console.log(wordCount(""));                    // 0

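After trim(), splitting on /\s+/ already collapses runs of whitespace. filter(Boolean) covers the remaining case: for empty or whitespace-only input, split() returns [""], which would otherwise be counted as one word.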
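Empty strings are easy to pick up in Python as well, when splitting on a literal space instead of on whitespace in general. The short sketch below contrasts the two calls on the same input used above; the variable names are illustrative only.

```python
text = "  spaces   matter  "

# Splitting on a literal space yields an empty string for every extra space.
naive = text.split(" ")

# Splitting with no argument collapses whitespace runs and drops empty strings.
robust = text.split()

print(naive)        # ['', '', 'spaces', '', '', 'matter', '', '']
print(len(naive))   # 8
print(len(robust))  # 2
```

Filtering the naive result, for example with `[t for t in naive if t]`, gives the same two tokens, which is what filter(Boolean) does in the JavaScript version.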

Character count and other text statistics:

function textStats(text) {
    const trimmed = text.trim();
    // Split once; filter(Boolean) drops the "" produced by empty input
    const words = trimmed.split(/\s+/).filter(Boolean);
    return {
        characters: text.length, // UTF-16 code units
        charactersNoSpaces: text.replace(/\s/g, '').length,
        words: words.length,
        sentences: (text.match(/[.!?]+/g) || []).length, // runs of . ! ?
        paragraphs: trimmed === '' ? 0 : trimmed.split(/\n\s*\n/).length,
        readingTimeMinutes: Math.ceil(words.length / 200), // 0 for empty text
    };
}

const stats = textStats("Hello world. This is a test.\n\nSecond paragraph.");
console.log(stats);
// {
//   characters: 47,
//   charactersNoSpaces: 39,
//   words: 8,
//   sentences: 3,
//   paragraphs: 2,
//   readingTimeMinutes: 1
// }

Reading time assumes 200 words per minute, a commonly cited average for reading text online; for books, 250 wpm is a common figure.
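The reading-time arithmetic can be isolated into a small helper so the words-per-minute figure is easy to change. This is a sketch under the assumptions stated above (200 wpm by default, rounded up); `reading_time_minutes` and its parameter names are hypothetical.

```python
import math


def reading_time_minutes(word_count: int, words_per_minute: int = 200) -> int:
    """Estimate reading time in whole minutes, rounded up."""
    if word_count < 0 or words_per_minute <= 0:
        raise ValueError("word_count must be >= 0 and words_per_minute must be > 0")
    return math.ceil(word_count / words_per_minute)


print(reading_time_minutes(1500))       # 8  (1500 / 200 = 7.5, rounded up)
print(reading_time_minutes(1500, 250))  # 6
print(reading_time_minutes(0))          # 0
```

Rounding up means any non-empty text reports at least one minute, while empty text reports zero.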

Word frequency (which words occur most often):

function wordFrequency(text) {
    const words = text.toLowerCase().match(/\b[a-z']+\b/g) || [];
    return words.reduce((freq, word) => {
        freq[word] = (freq[word] || 0) + 1;
        return freq;
    }, {});
}

const freq = wordFrequency("the cat sat on the mat the cat");
const sorted = Object.entries(freq).sort((a, b) => b[1] - a[1]);
console.log(sorted);
// [['the', 3], ['cat', 2], ['sat', 1], ['on', 1], ['mat', 1]]

Python

Word and character count:

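import math
import re

def text_stats(text: str) -> dict:
    words = text.split()  # splits on any whitespace run; never yields empty strings
    # Treat '!' and '?' as '.', then count the non-empty pieces between periods
    sentences = len([s for s in text.replace('!', '.').replace('?', '.').split('.') if s.strip()])
    # Paragraphs are separated by one or more blank lines
    paragraphs = len([p for p in re.split(r'\n\s*\n', text.strip()) if p.strip()])

    return {
        'characters': len(text),
        'characters_no_spaces': len(''.join(words)),  # excludes tabs and newlines too
        'words': len(words),
        'sentences': sentences,
        'paragraphs': paragraphs,
        'reading_time_minutes': math.ceil(len(words) / 200),  # rounded up; 0 for empty text
    }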

sample = "Hello world. This is a test.\n\nSecond paragraph here."
print(text_stats(sample))

Count words in a file:

def count_words_in_file(filepath: str) -> dict:
    with open(filepath, encoding='utf-8') as f:
        text = f.read()
    return text_stats(text)

print(count_words_in_file('essay.txt'))
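Reading the whole file with f.read() is fine for essays and articles. For very large files, a line-by-line variant keeps memory use flat. The sketch below counts words only; `count_words_streaming` is a hypothetical helper name, and UTF-8 encoding is assumed as in the function above.

```python
def count_words_streaming(filepath: str) -> int:
    """Count whitespace-separated words without loading the whole file."""
    total = 0
    with open(filepath, encoding="utf-8") as f:
        # Iterating over the file object yields one line at a time.
        for line in f:
            total += len(line.split())
    return total
```

Because a newline is itself whitespace, no word can span two lines, so the per-line totals add up to the same number as len(text.split()) on the full text.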

Word frequency:

from collections import Counter
import re

def word_frequency(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    words = re.findall(r"\b[a-z']+\b", text.lower())
    return Counter(words).most_common(top_n)

sample = "the cat sat on the mat the cat"
print(word_frequency(sample))
# [('the', 3), ('cat', 2), ('sat', 1), ('on', 1), ('mat', 1)]
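The [a-z'] pattern only recognizes unaccented ASCII letters, so words such as "café" are not counted correctly. For text in other languages, one possible alternative is to split on whitespace and trim punctuation from each token. The sketch below assumes whitespace-separated words and strips ASCII punctuation only; `word_frequency_any_script` is a name made up for this example.

```python
from collections import Counter
import string


def word_frequency_any_script(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    """Count whitespace-separated tokens after trimming ASCII punctuation."""
    # casefold() is a more aggressive lower() intended for caseless comparison.
    tokens = (token.strip(string.punctuation) for token in text.casefold().split())
    # Skip tokens that were punctuation only and are now empty.
    return Counter(token for token in tokens if token).most_common(top_n)


print(word_frequency_any_script("Café café, naïve! cafe"))
# [('café', 2), ('naïve', 1), ('cafe', 1)]
```

Punctuation outside ASCII, such as curly quotes, stays attached to its token in this sketch and would need to be added to the characters being stripped.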

Command line

Linux/macOS:

# Count words
wc -w file.txt

# Count lines, words, and bytes
wc file.txt
# Output: lines words bytes filename (use wc -m to count characters instead of bytes)

# Count words in a string
echo "hello world" | wc -w
# 2

# Count words in several files, with a total
wc -w *.txt

Most frequent words (top 20, with counts):

cat file.txt | tr '[:upper:]' '[:lower:]' | tr -cs '[:alpha:]' '\n' | sort | uniq -c | sort -rn | head -20

The pipeline: lowercase → one word per line → sort → count each unique word → sort by count, descending → keep the top 20.
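The same steps can be sketched in Python, which also makes it easy to get the vocabulary size (the number of distinct words) as a single figure. The function names are illustrative, and the [a-z]+ pattern is an assumption that approximates tr's [:alpha:] class for plain ASCII text.

```python
import re
from collections import Counter


def top_words(text: str, n: int = 20) -> list[tuple[str, int]]:
    """Return the n most frequent words with their counts."""
    # Lowercase first, then take each run of ASCII letters as one word.
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(words).most_common(n)


def vocabulary_size(text: str) -> int:
    """Return the number of distinct words."""
    return len(set(re.findall(r"[a-z]+", text.lower())))


sample = "The cat sat on the mat. The cat slept."
print(top_words(sample, 3))     # [('the', 3), ('cat', 2), ('sat', 1)]
print(vocabulary_size(sample))  # 6
```

Words with equal counts may come out in a different order than in the shell pipeline, since Counter keeps first-seen order for ties.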

Python one-liner:

python3 -c "import sys; text=open(sys.argv[1], encoding='utf-8').read(); print(len(text.split()))" file.txt

Windows PowerShell:

(Get-Content file.txt -Raw).Split() | Where-Object { $_ } | Measure-Object | Select-Object -ExpandProperty Count

Common character and word limits by platform

Platform / format	Limit	What is counted
Twitter / X post	280 characters	Characters (not words)
SMS	160 characters	Characters per segment (70 if the message needs Unicode)
Meta title (SEO)	50–60 characters	Characters
Meta description (SEO)	150–160 characters	Characters
LinkedIn post	3,000 characters	Characters
Instagram caption	2,200 characters	Characters
Google My Business post	1,500 characters	Characters
Medium article (optimal)	1,500–2,500 words	Words
Blog post (SEO average)	1,200–2,500 words	Words

For SEO titles and descriptions, character count matters more than word count. Google truncates by pixel width (roughly 580 px for titles), so the character ranges above are approximations rather than hard limits.
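A draft can be checked against these limits with a few lines of code. The sketch below copies some of the upper bounds from the table; the dictionary keys and the `check_limit` function are invented for this example. It counts Unicode code points with len(), which may differ from how a given platform counts, and it cannot measure pixel width.

```python
# Upper bounds taken from the table above; the keys are hypothetical labels.
CHAR_LIMITS = {
    "x_post": 280,
    "meta_title": 60,
    "meta_description": 160,
    "linkedin_post": 3000,
    "instagram_caption": 2200,
}


def check_limit(text: str, kind: str) -> tuple[int, int, bool]:
    """Return (length, limit, fits) for the given kind of text."""
    limit = CHAR_LIMITS[kind]
    length = len(text)  # number of Unicode code points
    return length, limit, length <= limit


length, limit, fits = check_limit("How to Count Words in Text", "meta_title")
print(f"{length}/{limit} characters, fits: {fits}")
```

An unknown kind raises KeyError, which seems preferable here to silently skipping the check.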

Stripping specific content before counting

Count words while ignoring URLs:

function countWordsNoUrls(text) {
    const noUrls = text.replace(/https?:\/\/\S+/g, '');
    return noUrls.trim().split(/\s+/).filter(Boolean).length;
}
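A Python counterpart might look like the sketch below. It uses the same http(s) pattern as the JavaScript function; `count_words_no_urls` and `URL_PATTERN` are names chosen for this example.

```python
import re

# Matches http:// or https:// followed by any run of non-whitespace characters.
URL_PATTERN = re.compile(r"https?://\S+")


def count_words_no_urls(text: str) -> int:
    """Count whitespace-separated words after deleting http(s) URLs."""
    return len(URL_PATTERN.sub("", text).split())


print(count_words_no_urls("Read the docs at https://example.com/path?q=1 before you start"))
# 7
```

Because the pattern runs to the next whitespace, punctuation that directly follows a URL is removed along with it, and links without a scheme (such as example.com) are still counted as words.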

Count words while ignoring code blocks (Markdown):

import re

def count_words_no_code(markdown: str) -> int:
    # Remove fenced code blocks
    no_code = re.sub(r'```[\s\S]*?```', '', markdown)
    # Remove inline code
    no_code = re.sub(r'`[^`]+`', '', no_code)
    return len(no_code.split())

Key takeaways

Word count: text.split() in Python, text.trim().split(/\s+/).filter(Boolean) in JavaScript.
Character count: len(text) in Python, text.length in JavaScript. The two can differ for characters such as emoji, because Python counts code points and JavaScript counts UTF-16 code units.
Command line: wc -w file.txt on Linux/macOS.
Reading time: words ÷ 200, rounded up.
For SEO metadata, count characters, not words: Google truncates by pixel width, not by word count.
