How to Count Words, Characters, and Lines in Text (Online and in Code)

May 14, 2026 3 min read

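Word counting sounds simple until you hit the edge cases. Does a hyphenated word count as one word or two? Do URLs count? How do you handle multilingual text? This article shows how to count accurately in each environment.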

Try Word Counter →

In the Browser

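Paste your text into the Word Counter tool to get the word count, character count, character count without spaces, sentence count, paragraph count, and estimated reading time instantly. The counts update as you type, which helps when you are writing against a length limit (a Twitter post, an application essay, an article submission).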

What Counts as a Word?

The standard definition: a sequence of non-whitespace characters separated by whitespace. By this definition:

hello world → 2 words
well-known → 1 word (hyphenated)
C++ → 1 word
https://example.com/path?q=1 → 1 word
"quoted text" → 2 words (the quotation marks stay attached to the adjacent words)

This is good enough in most cases. If you need to exclude URLs, strip them out before counting.
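Whitespace splitting breaks down for languages written without spaces between words, such as Chinese and Japanese: a whole sentence like 我今天很忙 comes out as a single "word". A common convention is to count each CJK character as one word instead. The sketch below assumes that convention for Han ideographs (U+4E00–U+9FFF) and Japanese kana (U+3040–U+30FF) only; everything else, including Korean, which does use spaces, falls back to whitespace splitting. The function name multilingual_word_count is illustrative.

```python
import re

# Assumption: each Han ideograph (U+4E00–U+9FFF) or Japanese kana (U+3040–U+30FF)
# counts as one word; all other text is split on whitespace.
CJK_CHAR = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]")


def multilingual_word_count(text: str) -> int:
    """Count words, treating each CJK character as a separate word."""
    cjk_count = len(CJK_CHAR.findall(text))
    # Replace CJK characters with spaces so the remaining text splits on whitespace
    remainder = CJK_CHAR.sub(" ", text)
    return cjk_count + len(remainder.split())


print(multilingual_word_count("hello world"))    # 2
print(multilingual_word_count("我今天很忙"))      # 5
print(multilingual_word_count("我用Python写代码"))  # 6
```

Mixed text works because Latin words embedded in CJK text (like Python above) are still separated from the surrounding characters once those characters become spaces.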

JavaScript

Basic word count:

function wordCount(text) {
    return text.trim().split(/\s+/).filter(Boolean).length;
}

console.log(wordCount("Hello world"));        // 2
console.log(wordCount("  spaces   matter  ")); // 2
console.log(wordCount(""));                    // 0

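filter(Boolean) drops empty strings. Because trim() already removes leading and trailing whitespace and /\s+/ treats a run of spaces as one separator, the only empty string left is the [""] that "".split(/\s+/) returns for empty or whitespace-only input, which would otherwise be counted as one word.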
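Python has the same pitfall, with the fix built in: str.split() with no argument splits on runs of whitespace and discards empty strings, while split(" ") splits on every single space and keeps the empty pieces. A short comparison:

```python
text = "  spaces   matter  "

# Splitting on a literal space keeps an empty string for every extra space
print(text.split(" "))    # ['', '', 'spaces', '', '', 'matter', '', '']
# No argument: split on whitespace runs and discard empty strings
print(text.split())       # ['spaces', 'matter']
print(len(text.split()))  # 2
```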

Character counts and other text statistics:

function textStats(text) {
    const trimmed = text.trim();
    // Split once and reuse; empty input gives 0 words (and 0 reading minutes)
    const words = trimmed === '' ? 0 : trimmed.split(/\s+/).length;
    return {
        characters: text.length,                            // UTF-16 code units
        charactersNoSpaces: text.replace(/\s/g, '').length, // removes spaces, tabs, newlines
        words,
        sentences: (text.match(/[.!?]+/g) || []).length,    // runs of . ! ? (naive)
        paragraphs: trimmed === '' ? 0 : trimmed.split(/\n\s*\n/).length,
        readingTimeMinutes: Math.ceil(words / 200),         // 200 wpm, rounded up
    };
}

const stats = textStats("Hello world. This is a test.\n\nSecond paragraph.");
console.log(stats);
// {
//   characters: 47,
//   charactersNoSpaces: 39,
//   words: 8,
//   sentences: 3,
//   paragraphs: 2,
//   readingTimeMinutes: 1
// }

Reading time assumes 200 words per minute (a commonly used average for reading on screen; 250 wpm is typical for books).
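As a worked example of the formula: 1,000 words at 200 wpm is exactly 5 minutes, while 1,001 words rounds up to 6. A minimal Python version with the reading rate as a parameter (the name reading_time_minutes is illustrative):

```python
import math


def reading_time_minutes(word_count: int, wpm: int = 200) -> int:
    """Estimated reading time, rounded up to the next whole minute."""
    return math.ceil(word_count / wpm)


print(reading_time_minutes(1000))           # 5
print(reading_time_minutes(1001))           # 6
print(reading_time_minutes(1000, wpm=250))  # 4 (book-reading pace)
print(reading_time_minutes(0))              # 0
```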

Word frequency (which words appear most often):

function wordFrequency(text) {
    // Only ASCII a-z and apostrophes match, so words with accented letters get split or truncated
    const words = text.toLowerCase().match(/\b[a-z']+\b/g) || [];
    return words.reduce((freq, word) => {
        freq[word] = (freq[word] || 0) + 1;
        return freq;
    }, {});
}

const freq = wordFrequency("the cat sat on the mat the cat");
const sorted = Object.entries(freq).sort((a, b) => b[1] - a[1]);
console.log(sorted);
// [['the', 3], ['cat', 2], ['sat', 1], ['on', 1], ['mat', 1]]

Python

Word and character counts:

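import math
import re

def text_stats(text: str) -> dict:
    words = text.split()  # splits on any whitespace run; no empty strings
    # Naive sentence count: non-empty segments between runs of . ! ?
    sentences = len([s for s in re.split(r'[.!?]+', text) if s.strip()])
    # Paragraphs are separated by one or more blank lines
    paragraphs = len([p for p in re.split(r'\n\s*\n', text.strip()) if p.strip()])

    return {
        'characters': len(text),
        'characters_no_spaces': len(re.sub(r'\s', '', text)),  # also drops tabs/newlines
        'words': len(words),
        'sentences': sentences,
        'paragraphs': paragraphs,
        'reading_time_minutes': math.ceil(len(words) / 200),  # 200 wpm, rounded up
    }

sample = "Hello world. This is a test.\n\nSecond paragraph here."
print(text_stats(sample))
# {'characters': 52, 'characters_no_spaces': 43, 'words': 9, 'sentences': 3, 'paragraphs': 2, 'reading_time_minutes': 1}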
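Python's len(text) counts Unicode code points. That differs both from JavaScript's text.length, which counts UTF-16 code units, and from what a reader perceives as one character. The sketch below shows the three cases with the standard library only; note that NFC normalization merges a combining sequence only when a precomposed character exists, so it is not a full user-perceived-character counter.

```python
import unicodedata

emoji = "😀"            # one code point outside the Basic Multilingual Plane
decomposed = "e\u0301"  # 'e' plus a combining acute accent, displayed as é

print(len(emoji))                           # 1 code point
print(len(emoji.encode("utf-16-le")) // 2)  # 2 UTF-16 code units (what JS text.length reports)
print(len(decomposed))                      # 2 code points
print(len(unicodedata.normalize("NFC", decomposed)))  # 1 after composing to U+00E9
```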

Counting words in a file:

def count_words_in_file(filepath: str) -> dict:
    with open(filepath, encoding='utf-8') as f:
        text = f.read()
    return text_stats(text)

print(count_words_in_file('essay.txt'))

Word frequency:

from collections import Counter
import re

def word_frequency(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    words = re.findall(r"\b[a-z']+\b", text.lower())
    return Counter(words).most_common(top_n)

sample = "the cat sat on the mat the cat"
print(word_frequency(sample))
# [('the', 3), ('cat', 2), ('sat', 1), ('on', 1), ('mat', 1)]
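The pattern [a-z'] matches only ASCII letters, so in Python a word like café is skipped entirely and non-English text is undercounted. One possible Unicode-aware variant, assuming a word is a run of letters in any script optionally joined by apostrophes (the pattern and the name word_frequency_unicode are illustrative, not a standard):

```python
import re
from collections import Counter

# [^\W\d_] = word characters minus digits and underscore, i.e. letters in any script;
# the optional group keeps contractions such as "don't" together.
WORD = re.compile(r"[^\W\d_]+(?:'[^\W\d_]+)*")


def word_frequency_unicode(text: str, top_n: int = 10) -> list[tuple[str, int]]:
    words = WORD.findall(text.lower())
    return Counter(words).most_common(top_n)


print(word_frequency_unicode("Café, café, CAFÉ! Don't stop. Naïve naïve."))
# [('café', 3), ('naïve', 2), ("don't", 1), ('stop', 1)]
```

Scripts written without spaces, such as Chinese, still come through as one token per unbroken run of characters.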

Command Line

Linux/macOS:

# Count words
wc -w file.txt

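# Count lines, words, and bytes
wc file.txt
# Output: lines words bytes filename
# (wc counts bytes, not characters; use wc -m to count characters in UTF-8 text)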

# Count words in a string
echo "hello world" | wc -w
# 2

# Count words in several files, plus a total
wc -w *.txt
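For very large files there is no need to read everything into memory at once. A streaming Python sketch that reports roughly what wc -l, -w, -m and -c report (the function name wc_like is hypothetical). Like wc -l, it counts newline characters, so a final line without a trailing newline is not counted; wc's own word rules can differ slightly by platform and locale.

```python
import os
import tempfile


def wc_like(path: str) -> dict:
    """Stream a UTF-8 text file line by line and tally wc-style counts."""
    counts = {"lines": 0, "words": 0, "chars": 0, "bytes": 0}
    # newline="" returns line endings untranslated, so char and byte counts match the file
    with open(path, encoding="utf-8", newline="") as f:
        for line in f:
            counts["lines"] += line.count("\n")           # like wc -l
            counts["words"] += len(line.split())          # like wc -w
            counts["chars"] += len(line)                  # like wc -m (code points)
            counts["bytes"] += len(line.encode("utf-8"))  # like wc -c
    return counts


# Usage with a throwaway file
with tempfile.NamedTemporaryFile("w", encoding="utf-8", newline="",
                                 suffix=".txt", delete=False) as tmp:
    tmp.write("hello world\nnaïve café\n")
print(wc_like(tmp.name))  # {'lines': 2, 'words': 4, 'chars': 23, 'bytes': 25}
os.remove(tmp.name)
```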

Word frequency (the 20 most common words):

cat file.txt | tr '[:upper:]' '[:lower:]' | tr -cs '[:alpha:]' '\n' | sort | uniq -c | sort -rn | head -20

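This pipeline: lowercase → one word per line → sort → count each unique word → sort by count → keep the top 20. To get the vocabulary size (the number of distinct words) instead, end the pipeline with sort | uniq | wc -l rather than sort | uniq -c | sort -rn | head -20.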
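The same idea in Python: the number of distinct words (vocabulary size) plus the type-token ratio, i.e. distinct words ÷ total words, a rough measure of lexical variety. This sketch mirrors the pipeline's assumption that a word is a lowercased run of letters; the function name vocabulary_stats is illustrative.

```python
import re


def vocabulary_stats(text: str) -> dict:
    # Runs of letters only, like tr -cs '[:alpha:]' '\n', after lowercasing
    words = re.findall(r"[^\W\d_]+", text.lower())
    unique = set(words)
    return {
        "total_words": len(words),
        "unique_words": len(unique),
        # Guard against division by zero on empty input
        "type_token_ratio": len(unique) / len(words) if words else 0.0,
    }


print(vocabulary_stats("The cat sat on the mat. The cat!"))
# {'total_words': 8, 'unique_words': 5, 'type_token_ratio': 0.625}
```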

Python one-liner:

python3 -c "import sys; text=open(sys.argv[1]).read(); print(len(text.split()))" file.txt

Windows PowerShell:

(Get-Content file.txt -Raw).Split() | Where-Object { $_ } | Measure-Object | Select-Object -ExpandProperty Count

Common Length Limits by Platform

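Platform / format	Limit or guideline	Counted in
Twitter / X post	280 characters	Characters (not words)
SMS	160 characters	Characters per segment (70 when Unicode encoding is needed)
Meta title (SEO)	50–60 characters	Characters
Meta description (SEO)	150–160 characters	Characters
LinkedIn post	3,000 characters	Characters
Instagram caption	2,200 characters	Characters
Google Business Profile (formerly Google My Business) post	1,500 characters	Characters
Medium article (optimal)	1,500–2,500 words	Words
Blog post (SEO average)	1,200–2,500 words	Words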
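The SMS row counts per segment because longer messages are split. The standard figures: one GSM-7 message holds 160 characters, but concatenated messages reserve room for a header, leaving 153 per segment; with UCS-2 encoding (needed for Chinese, emoji and many accented letters) the numbers drop to 70 and 67. The sketch below is a simplification: it assumes a message can use GSM-7 only if every character is printable ASCII or a line break, which ignores GSM-7's own non-ASCII letters, its two-slot extension characters such as [ and {, and the few ASCII characters it lacks. The function name sms_segments is hypothetical.

```python
import math


def sms_segments(text: str) -> int:
    """Approximate number of SMS segments needed to send text."""
    if not text:
        return 0
    # Simplification: GSM-7 only if every character is printable ASCII or a line break
    if all(" " <= ch <= "~" or ch in "\n\r" for ch in text):
        length, single, multi = len(text), 160, 153
    else:
        # Unicode messages are measured in UTF-16 code units, so an emoji takes two
        length, single, multi = len(text.encode("utf-16-le")) // 2, 70, 67
    return 1 if length <= single else math.ceil(length / multi)


print(sms_segments("a" * 160))  # 1
print(sms_segments("a" * 161))  # 2
print(sms_segments("你" * 70))   # 1
print(sms_segments("你" * 71))   # 2
```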

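For SEO titles and descriptions, count characters rather than words: Google truncates by pixel width (around 580 pixels for titles), and character count is a much closer proxy for width than word count. Even so, wide letters such as W take up more of that width than narrow ones such as i.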

Excluding Content Before Counting

Ignoring URLs when counting words:

function countWordsNoUrls(text) {
    const noUrls = text.replace(/https?:\/\/\S+/g, '');
    return noUrls.trim().split(/\s+/).filter(Boolean).length;
}
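The same exclusion in Python. The https?:// pattern misses bare addresses such as www.example.com, so the sketch below also strips tokens that start with www. This is an assumption that still will not catch every bare domain (example.com on its own is counted as a word). The function name count_words_no_urls is illustrative.

```python
import re

# http(s) URLs plus bare "www." hosts; \b keeps "www." from matching mid-word
URL = re.compile(r"(?:https?://|\bwww\.)\S+")


def count_words_no_urls(text: str) -> int:
    # Replace each URL with a space, then split on whitespace as usual
    return len(URL.sub(" ", text).split())


print(count_words_no_urls("Read https://example.com/a and www.example.org today"))  # 3
```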

Ignoring code blocks when counting words (Markdown):

import re

def count_words_no_code(markdown: str) -> int:
    # Remove fenced code blocks
    no_code = re.sub(r'```[\s\S]*?```', '', markdown)
    # Remove inline code
    no_code = re.sub(r'`[^`]+`', '', no_code)
    return len(no_code.split())

Key Takeaways

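Word count: len(text.split()) in Python; text.trim().split(/\s+/).filter(Boolean).length in JavaScript.
Character count: len(text) in Python (Unicode code points); text.length in JavaScript (UTF-16 code units, so an emoji such as 😀 counts as 2).
Command line: wc -w file.txt on Linux/macOS.
Reading time: word count ÷ 200, rounded up.
For SEO metadata, count characters rather than words: Google truncates by pixel width, not by word count.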
